Google hacking is the process of using complex search engine queries to locate sensitive information.
Because of various web server misconfigurations, sensitive information gets indexed by search engines when their spiders crawl a site. Such information may include password files, confidential directories, logon portals, log files, etc.
BASIC SEARCH
The most basic search involves typing the required terms into Google's web interface, e.g. (ethical hacker).
The important thing to note here is that Google searches are case-insensitive. Whether you search for (ethical hacker), (ETHICAL HACKER) or (EtHiCaL HaCkEr), Google returns the same number of results.
Phrase search involves enclosing the required terms in double quotes. Google then searches for all the words in the phrase in the exact order you provide them. This is very useful when you are searching for something specific and want to omit extraneous results. Phrase searches are case-insensitive too.
There are certain common words like "a", "and", "the", "for" etc. which Google ignores in a basic search. These words are called stop words. When used in a phrase search, stop words are not excluded by Google.
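As a rough illustration of the difference between a basic and a phrase search, here is a minimal Python sketch. The helper names `phrase` and `search_url` are made up for this example, and the `https://www.google.com/search?q=` URL pattern is an assumption about how the web interface receives a query, not something this article documents.

```python
from urllib.parse import quote_plus


def phrase(text: str) -> str:
    """Wrap text in double quotes so it is searched as an exact phrase."""
    return '"' + text.strip().strip('"') + '"'


def search_url(query: str) -> str:
    """Build a search URL for a query (the URL pattern is an assumption)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


basic = "ethical hacker"              # basic search: terms in any order
exact = phrase("the ethical hacker")  # phrase search: exact order, stop word kept

print(basic)
print(exact)
print(search_url(exact))
```

The quoting step is the only thing that separates the two searches; everything else is plain URL encoding.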
ADVANCED SEARCH
Google wildcard
Google treats the asterisk (*) as a placeholder for any unknown term(s) and tries to find the best match(es) for it. It can be used in both basic and phrase searches, but you have to separate it with a space from the preceding and following words, e.g.:
- you can * multiple *
- Here, Google will try to find the best match(es) for each wildcard.
Excluding words from search: the minus sign (-)
Sometimes you want to exclude pages containing certain words from the search results. You can do this by prefixing the unwanted word with the minus sign (-). The minus sign should be preceded by a space and placed immediately before the unwanted term.
A search such as (ethical hacker -cracker) will return pages containing ethical hacker but exclude those containing the term cracker.
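The placement rule for the minus sign (a space before it, no space after it) can be sketched in a few lines of Python. The `exclude` helper is hypothetical, and rejecting multi-word terms is a simplification made for this sketch only.

```python
def exclude(query: str, *unwanted: str) -> str:
    """Append each unwanted term as ' -term': space before the minus, none after."""
    parts = [query.strip()]
    for term in unwanted:
        term = term.strip().lstrip("-")
        if not term or " " in term:
            # Simplification for this sketch: one word per excluded term.
            raise ValueError(f"exclude one word at a time, got {term!r}")
        parts.append("-" + term)
    return " ".join(parts)


print(exclude("ethical hacker", "cracker"))
print(exclude("computer virus", "antivirus", "antispyware"))
```

Passing more than one unwanted term simply repeats the same rule for each of them.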
If you want to exclude multiple terms, you can do so by placing the minus sign before each term. E.g.
- ethical hacker -cracker -blackhat
- computer virus -antivirus -antispyware
Searching as is: the plus sign (+)
Sometimes you want to include stop words in your search. One way to do this is to use them in a phrase search, i.e. to enclose them in double quotes. Another way is to place a plus sign (+) before the stop word. The plus sign tells Google to include the word that follows it. As with the minus sign, the plus sign should be placed immediately before the word you want included and should be preceded by a space. E.g.
- +this +is +a test
- From this example, you can see that there is no limit to the number of plus signs you can use in a query.
Google's Boolean operators
Google allows you to use three Boolean operators: AND, OR and NOT.
The AND operator
The AND operator is used to search for multiple terms. E.g.
- If you want to search for pages containing the terms 'google', 'hacking' and 'tutorial', you can construct your query as:
- google AND hacking AND tutorial
Look at the above query carefully and you will see that it is just the basic Google query. Google applies the AND operator by default, so you do not have to type it.
The NOT operator
The NOT operator is used to exclude words from a search. Google does not support the NOT operator; instead, it uses the minus sign to exclude terms.
The OR operator
The default Google search performs an AND operation. You can override this behaviour with the OR operator. The OR operator (always written in capitals) tells Google to locate any one of several words.
You can use the pipe symbol (|) instead of OR to perform the OR operation. E.g.
- The query (google | microsoft) returns all pages containing either Google or Microsoft or both.
Query string length limit
Not counting stop words, you can search for only up to 32 words in a single Google query. Google ignores any words after the first 32 (excluding stop words) and returns a message.
153,000,000 Results!..really?
Though Google claims to find thousands, or even millions, of results for a query, it lets you view only the first 1,000. If you try to go beyond the first thousand results, Google displays an error message.
ADVANCED OPERATORS
Google provides a myriad of additional operators to enhance your search (or hacking!) experience.
We will cover some of the most useful operators here.
All the advanced Google operators have the syntax operator:search_term(s)
• There should not be any space between the operator, the colon and the search term.
• The search term can be a single term or a phrase.
Searching within a domain: site operator
This is perhaps the most useful Google operator for reconnaissance. The site operator is used to limit a search to a particular domain. E.g.
- site:ethicalhacker.net
This will return pages only from ethicalhacker.net.
- site:sans.org training
This will search only 'sans.org' for the term 'training'.
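The operator:search_term(s) syntax and the no-spaces rule can be captured in a small formatter. This is only a sketch: `operator` and `site_search` are hypothetical helper names, and quoting multi-word terms automatically is a convenience assumed here, not something Google requires of a tool.

```python
def operator(name: str, term: str) -> str:
    """Format name:term with no spaces around the colon; quote multi-word terms."""
    name, term = name.strip(), term.strip()
    if not name.isalpha():
        raise ValueError(f"unexpected operator name: {name!r}")
    if " " in term and not (term.startswith('"') and term.endswith('"')):
        term = f'"{term}"'  # a phrase has to be quoted to stay attached
    return f"{name}:{term}"


def site_search(domain: str, *terms: str) -> str:
    """Limit a search for the given terms to one domain."""
    return " ".join([operator("site", domain), *terms])


print(site_search("ethicalhacker.net"))
print(site_search("sans.org", "training"))
print(operator("intitle", "google hacking"))
```

Keeping the operator glued to its term is what makes Google read it as an operator rather than as an ordinary word followed by a colon.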
You can also exclude results from specific subdomains with the help of the minus operator. E.g.
- site:sans.org training -site:www.sans.org
This will search 'sans.org' for the term 'training' but omit results from the subdomain 'www'.
Searching the title: intitle and allintitle operators
The intitle operator is used to search the titles of pages. E.g.
- intitle:"google hacking"
This will list all the pages with 'google hacking' somewhere in their title.
It can also be combined with the site operator to limit the search to a specific domain, e.g.
- site:blogntweets.info intitle:"google hacking"
Locating directory listings
One of the most sinister uses of the intitle operator is locating directory listings. Directory listings include the phrase "index of" in their title. So, we can search for (intitle:"index of") or (intitle:index.of) to locate all the directory listings indexed by Google.
The period (.) in 'index.of' acts as a wildcard for a single character.
You can also use this operator to search for password files, e.g.
- intitle:Index.of etc shadow
This will search for UNIX /etc/shadow password files.
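If you want to generate such directory-listing queries programmatically, for instance to check your own domain, a sketch could look like this. The function name, the `backup` term and the `example.com` domain are all hypothetical.

```python
from typing import Optional


def directory_listing_query(*path_terms: str, site: Optional[str] = None) -> str:
    """Build an intitle:index.of query, optionally narrowed to one domain."""
    parts = ["intitle:index.of", *path_terms]
    if site:
        parts.insert(0, f"site:{site}")
    return " ".join(parts)


print(directory_listing_query())
print(directory_listing_query("etc", "shadow"))
print(directory_listing_query("backup", site="example.com"))
```

The extra path terms are ordinary search terms; they only narrow the listings down to those mentioning the file or folder names you care about.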
The allintitle operator is also used to match the title of a webpage, but it searches for all the words that follow it. E.g.
- allintitle:penetration testing
This query will search for webpages with both the words 'penetration' and 'testing' in their titles. Notice that, unlike the intitle operator, it does not require multiple words to be enclosed in quotes.
Similar to the intitle operator, the allintitle operator can also be used to discover directory listings. This operator does not gel well with other advanced operators; consequently, you should use the intitle operator instead.
Searching the URL: inurl and allinurl operators
The inurl operator is used to locate URLs containing the search term. E.g.
- inurl:hacker
This will list all the URLs containing the term 'hacker'.
The inurl operator can also be combined with the site operator to search URLs associated with a specific domain. It can also be used to discover vulnerable scripts if the script names are included in the URL.
Akin to the inurl operator, the allinurl operator is also used to match the URL of a webpage, but it searches for all the words following it. The allinurl operator also does not combine well with other advanced operators, so its use is best avoided.
Searching for a specific file type: filetype and ext operators
Google supports two operators to search for a specific type of file based on the file extension: filetype and ext. You can use either of the two. E.g.
- filetype:pdf "google hacking"
- ext:pdf "google hacking"
Either query will list all the .pdf files containing the phrase 'google hacking'.
You can use both of these operators together with the site operator to search for a specific type of file in a particular domain. E.g.
- site:sans.org ext:doc training
This will search 'sans.org' for all the Microsoft Word documents containing the term 'training'.
Links to a URL: link operator
The link operator is used to find all the webpages that have links to the specified URL.
Viewing modified pages: cache operator
Whenever Googlebot crawls a webpage, it caches a snapshot of that page. This cached version can be very useful if the page has recently been deleted or is inaccessible owing to other internet problems. If the page is deleted, you can still view its cached version, which is stored on Google's servers.
With every result that Google returns for your query, it also provides a link to the cached version of that page. You can browse the cached version via the cached link below the snippet for that result.
Another way to view the cached page is via the Google cache operator, which shows you the version of the page that was cached when Googlebot last crawled it. The current version of the page could be different from the cached version.
Google cache can be very useful for an attacker if the website has modified its original content, as it helps to view the old content of the page.
GOOGLE HACKING DEFENSES
Disable directory listings
Directory listings give away far more information than a visitor needs. Disabling them is always a good option.
Noindex your confidential files
Search engines can be told not to index a webpage with the 'noindex' meta tag, and web crawlers can be asked not to crawl a URL by listing it in the /robots.txt file.
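To see how a well-behaved crawler would interpret your /robots.txt rules, you can test them offline with Python's standard `urllib.robotparser` module. The sketch below is illustrative only: the robots.txt content, the `example.com` URLs and the `is_crawlable` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /backup/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_crawlable(ROBOTS_TXT, "https://example.com/private/report.pdf"))
print(is_crawlable(ROBOTS_TXT, "https://example.com/index.html"))
```

With these rules the first URL is reported as not crawlable and the second as crawlable.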
Note that /robots.txt is publicly available and can be viewed by anyone, so it shouldn't be used for hiding information.
Noarchive to forbid caching
You can prevent search engines from caching a webpage by using the 'noarchive' meta tag.
Employ Google-fu against your own site
You could perform these advanced searches against your website to discover any vulnerabilities.
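One way to make this a routine is to generate a fixed set of queries for your own domain and run them by hand. The following is a minimal sketch: `audit_queries` is a hypothetical helper, `example.com` stands in for your domain, and the particular checks (in particular `ext:log` and `inurl:login`) are illustrative picks built from the operators covered in this article, not a complete audit.

```python
from typing import Dict


def audit_queries(domain: str) -> Dict[str, str]:
    """Map a short check name to a query restricted to your own domain."""
    site = f"site:{domain}"
    return {
        "directory listings": f"{site} intitle:index.of",
        "password files": f"{site} intitle:index.of etc shadow",
        "log files": f"{site} ext:log",
        "word documents": f"{site} ext:doc",
        "logon portals": f"{site} inurl:login",
    }


for check, query in audit_queries("example.com").items():
    print(f"{check}: {query}")
```

Every query starts with the site operator, so the results stay limited to pages you are responsible for.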
Additionally, you can make use of automated tools like Wikto, which will thoroughly scan your site.
Removing an indexed page
If your confidential page has already been indexed by Google, you can remove it via Google's URL removal tool.
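Around the time you request a removal, it can be worth confirming that the page really carries the 'noindex' and 'noarchive' directives discussed above. Here is a minimal sketch using Python's standard `html.parser` module. The `RobotsMetaParser` class, the `robots_directives` helper and the sample page are hypothetical, and the sketch only looks at `<meta name="robots">` tags in the HTML you feed it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only to exercise the parser.
PAGE = (
    '<html><head><meta name="robots" content="noindex, noarchive"></head>'
    "<body>confidential</body></html>"
)

found = robots_directives(PAGE)
print("noindex" in found, "noarchive" in found)
```

For the sample page both checks come out true; a page without the meta tag yields an empty set.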
• Johnny "ihackstuff" Long maintains the Google Hacking Database, a list of numerous advanced Google queries which can be used to discover vulnerable targets.
• Johnny Long is also the author of Google Hacking for Penetration Testers, which is a must for any serious Google hacker.
• The SANS Institute maintains a very handy Google cheat sheet.