Title: Google Hacking: An Intro for Beginners
Post by: Equix3n- on September 22, 2009, 04:32:17 AM
Original article is at the following link-
Google hacking is the process of employing complex search engine queries to locate sensitive information.
Because of various web server misconfigurations, sensitive information gets indexed by search engines when their spiders crawl those servers. The sensitive information may include password files, confidential directories, logon portals, log files, etc.
The most basic search simply involves entering the required terms into Google's web interface.
Phrase search involves enclosing the required terms in double quotes. Google then searches for all the words in the phrase in the exact order you provide them. This is very useful when you are looking for something specific and want to omit extraneous results. Phrase search is case-insensitive too.
Google treats the asterisk (*) as a placeholder for any unknown term(s) and tries to find the best match(es) for it. It can be used in both basic and phrase searches, but it must be separated by a space from the preceding and succeeding words.
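The three forms above differ only in how the terms are written into the query. Here is a minimal Python sketch (the helper names basic_query and phrase_query are made up for this post) that builds the text you would type into Google; it does not contact Google at all.

```python
def basic_query(*terms: str) -> str:
    """Basic search: terms separated by spaces; Google looks for pages containing all of them."""
    return " ".join(terms)


def phrase_query(*words: str) -> str:
    """Phrase search: wrap the words in double quotes so Google keeps their exact order.

    A standalone "*" word is the placeholder for unknown term(s); joining with
    spaces keeps it separated from its neighbours, as the article requires.
    """
    return '"' + " ".join(words) + '"'


print(basic_query("google", "hacking"))         # google hacking
print(phrase_query("google", "hacking"))        # "google hacking"
print(phrase_query("google", "*", "database"))  # "google * database"
```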
Excluding words from search: the minus sign (-)
Sometimes you want to exclude pages containing certain words from the search results. You can do this by prefixing the unwanted word with a minus (-) sign. The minus sign should be preceded by a space and placed immediately before the unwanted term, with no space in between.
A search such as (ethical hacker -cracker) will return pages containing the words ethical and hacker, but not the term cracker.
If you want to exclude multiple terms, place a minus sign before each of them.
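The exclusion rules (a space before the minus sign, none after it, one minus per term) are easy to get wrong by hand. Here is a small sketch with a hypothetical exclude_terms helper; quoting multi-word exclusions as phrases goes slightly beyond what the article covers.

```python
from typing import Iterable


def exclude_terms(include: Iterable[str], exclude: Iterable[str]) -> str:
    """Keep the `include` terms and drop pages containing any `exclude` term.

    Each excluded term gets its own minus sign, preceded by a space and placed
    directly before the term. Multi-word exclusions are quoted as phrases.
    """
    parts = list(include)
    for term in exclude:
        if " " in term:
            term = f'"{term}"'
        parts.append("-" + term)
    return " ".join(parts)


print(exclude_terms(["ethical", "hacker"], ["cracker"]))
# ethical hacker -cracker
print(exclude_terms(["ethical", "hacker"], ["cracker", "script kiddie"]))
# ethical hacker -cracker -"script kiddie"
```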
Searching as is: the plus sign (+)
Sometimes you want to include stop words in your search. One way to do this is to use them in a phrase search, i.e. enclose them in double quotes. Another way is to place a plus (+) sign before the stop word, which tells Google to include the word that follows it. Like the minus sign, the plus sign should be placed immediately before the word you want included and should be preceded by a space.
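Here is a sketch of the plus-sign rule as the article describes Google's behaviour in 2009. The STOP_WORDS set is a tiny hypothetical sample, since Google never published its full list.

```python
from typing import Iterable

# Hypothetical sample of stop words; not Google's actual list.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "is", "how", "where", "what"}


def keep_stop_words(words: Iterable[str]) -> str:
    """Prefix each stop word with '+' so Google includes it instead of ignoring it."""
    return " ".join("+" + w if w.lower() in STOP_WORDS else w for w in words)


print(keep_stop_words(["the", "matrix"]))                         # +the matrix
print(keep_stop_words(["where", "is", "the", "admin", "page"]))  # +where +is +the admin page
```

The phrase-search alternative from the same paragraph would simply be "the matrix" in double quotes.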
Google's Boolean operators
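Google allows you to use three Boolean operators: AND, which is applied implicitly between search terms; OR, which must be typed in uppercase; and NOT, which is expressed with the minus sign.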
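Putting the three Boolean operators together in one query, sketched with hypothetical helpers all_of, any_of and none_of (the search words are only examples):

```python
def all_of(*terms: str) -> str:
    """AND: Google applies it implicitly, so the terms are just separated by spaces."""
    return " ".join(terms)


def any_of(*terms: str) -> str:
    """OR: must be written in uppercase; a lowercase 'or' is treated as an ordinary word."""
    return " OR ".join(terms)


def none_of(*terms: str) -> str:
    """NOT: expressed with a minus sign directly before each unwanted term."""
    return " ".join("-" + t for t in terms)


query = " ".join([all_of("admin", "login"), any_of("password", "passwd"), none_of("demo")])
print(query)  # admin login password OR passwd -demo
```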
Query string length limit
Excluding stop words, you can search for at most 32 words in a single Google query. Google ignores any words after the first 32 (not counting stop words) and displays a warning message.
Although Google claims to find thousands, or even millions, of results for a query, it lets you view only the first 1,000. If you try to go beyond the first thousand results, Google displays an error message.
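The 32-word rule can be sketched like this. The limit comes from the article (2009) and may not match Google today, and STOP_WORDS is again a small hypothetical sample.

```python
from typing import List, Tuple

MAX_WORDS = 32  # limit stated in the article; not verified against current Google

# Hypothetical sample of stop words; not Google's actual list.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "is"}


def split_at_word_limit(query: str, limit: int = MAX_WORDS) -> Tuple[str, List[str]]:
    """Return (kept_query, ignored_words).

    Only non-stop words count towards `limit`; once `limit` of them have been
    seen, the next non-stop word and everything after it are ignored.
    """
    kept: List[str] = []
    ignored: List[str] = []
    counted = 0
    for word in query.split():
        is_stop = word.lower() in STOP_WORDS
        if ignored or (counted == limit and not is_stop):
            ignored.append(word)
            continue
        kept.append(word)
        if not is_stop:
            counted += 1
    return " ".join(kept), ignored


long_query = " ".join(f"w{i}" for i in range(35))
kept, ignored = split_at_word_limit(long_query)
print(ignored)  # ['w32', 'w33', 'w34']
```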
Google provides a myriad of additional operators to enhance your search (or hacking!) experience.
We will cover some of the most useful operators here.
All the advanced Google operators follow the syntax operator:search_term(s).
• There should not be any space between the operator, the colon and the search term.
• The search term can be a single term or a phrase.
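The two syntax rules above can be captured in a hypothetical operator_term helper: it rejects a multi-word operator name, strips stray spaces around the colon, and quotes a multi-word term so the whole phrase stays attached to the operator. A sketch, not an official grammar:

```python
def operator_term(operator: str, term: str) -> str:
    """Format an advanced operator as operator:search_term with no spaces around the colon."""
    operator = operator.strip().lower()
    term = term.strip()
    if not operator.isalpha():
        raise ValueError(f"operator name must be a single word: {operator!r}")
    if not term:
        raise ValueError("search term must not be empty")
    # A phrase must be quoted, otherwise only its first word belongs to the operator.
    if " " in term and not (term.startswith('"') and term.endswith('"')):
        term = f'"{term}"'
    return f"{operator}:{term}"


print(operator_term("intitle", "index of"))   # intitle:"index of"
print(operator_term("site", " example.com "))  # site:example.com
```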
Searching within a domain: site operator
This is perhaps the most useful Google operator for reconnaissance. The site operator limits the search to a particular domain.
You can also exclude results from specific subdomains by putting the minus sign in front of a site term.
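Combining site with the minus sign, as a short sketch; example.com and www.example.com are placeholder domains, and site_query is a made-up helper name:

```python
from typing import Iterable


def site_query(domain: str, terms: Iterable[str] = (), exclude_subdomains: Iterable[str] = ()) -> str:
    """Restrict a search to `domain`, optionally dropping results from the given subdomains."""
    parts = list(terms) + [f"site:{domain}"]
    parts += [f"-site:{sub}" for sub in exclude_subdomains]
    return " ".join(parts)


print(site_query("example.com", ["login"], ["www.example.com"]))
# login site:example.com -site:www.example.com
```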
Searching the title: intitle and allintitle operators
The intitle operator restricts the search to the titles of pages.
It can also be combined with the site operator to limit the search to a specific domain.
Locating directory listings
One of the most sinister uses of the intitle operator is in locating directory listings. Directory listings include the phrase "index of" in their title. So, we can search for (intitle:"index of") or (intitle:index.of) to locate all the directory listings indexed by Google.
The period (.) in 'index.of' acts as a single-character wildcard.
You can also use this operator to search for password files.
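The two directory-listing spellings, optionally narrowed with an extra term and the site operator. In this sketch the extra term "passwd" and the domain example.com are purely illustrative assumptions, not queries taken from the article.

```python
from typing import List, Optional


def directory_listing_queries(extra: str = "", domain: Optional[str] = None) -> List[str]:
    """Build both 'index of' title searches, optionally adding a term and a site: restriction."""
    queries = []
    for title in ('intitle:"index of"', "intitle:index.of"):
        parts = [title]
        if extra:
            parts.append(extra)
        if domain:
            parts.append(f"site:{domain}")
        queries.append(" ".join(parts))
    return queries


for q in directory_listing_queries("passwd", "example.com"):
    print(q)
# intitle:"index of" passwd site:example.com
# intitle:index.of passwd site:example.com
```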
The allintitle operator also matches the title of a webpage, but it requires all the words that follow it to appear in the title.
Like intitle, allintitle can be used to discover directory listings. However, it does not combine well with other advanced operators, so you should generally use intitle instead.
Searching the URL: inurl and allinurl operators
The inurl operator locates URLs containing the search term.
The inurl: operator can also be combined with the site operator to search URLs associated with a specific domain. It can also be utilized to discover vulnerable scripts if the script names are included in the URL.
Like inurl, allinurl matches the URL of a webpage, but it requires all the words following it to appear in the URL. The allinurl operator also does not combine well with other advanced operators, so its use should be avoided.
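Since allintitle and allinurl don't mix well with other operators, one workaround is to spell them out as one intitle: or inurl: per word. This sketch assumes, as the article implies but does not state outright, that the expanded form matches the same pages; expand_allin is a hypothetical helper.

```python
from typing import Iterable


def expand_allin(operator: str, words: Iterable[str]) -> str:
    """Rewrite allintitle:/allinurl: as one intitle:/inurl: term per word.

    Every word must still appear in the title (or URL), but the expanded form
    can be mixed freely with site:, filetype: and the other operators.
    """
    single = {"allintitle": "intitle", "allinurl": "inurl"}.get(operator)
    if single is None:
        raise ValueError(f"unsupported operator: {operator!r}")
    return " ".join(f"{single}:{w}" for w in words)


print(expand_allin("allinurl", ["admin", "login"]) + " site:example.com")
# inurl:admin inurl:login site:example.com
```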
Searching for a specific file type: filetype and ext operators
Google supports two operators to search for a specific type of file based on its extension: filetype and ext. You can use either of them.
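Searching one site for several file types at once, sketched with a hypothetical filetype_queries helper; the domain and the extensions in the usage lines are only examples.

```python
from typing import Iterable, List


def filetype_queries(domain: str, extensions: Iterable[str], use_ext: bool = False) -> List[str]:
    """One query per extension; filetype: and ext: are interchangeable per the article."""
    op = "ext" if use_ext else "filetype"
    # Normalise ".XLS" or "xls" to "xls" before building the operator term.
    return [f"site:{domain} {op}:{e.lstrip('.').lower()}" for e in extensions]


print(filetype_queries("example.com", [".xls", "LOG"]))
# ['site:example.com filetype:xls', 'site:example.com filetype:log']
```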
Links to a URL: link operator
The link operator is used to find all the webpages that link to the specified URL.
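When you want to run a link: query (or any other query in this post) from a script or a bookmark, the colon, quotes and spaces need escaping. A minimal sketch with Python's urllib.parse.urlencode, assuming the usual https://www.google.com/search?q= URL format; google_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def google_search_url(query: str) -> str:
    """Turn a query string into a google.com search URL.

    urlencode() percent-escapes ':' and '"' and turns spaces into '+',
    so operators such as link: survive being placed in a URL.
    """
    return "https://www.google.com/search?" + urlencode({"q": query})


print(google_search_url("link:www.example.com"))
# https://www.google.com/search?q=link%3Awww.example.com
```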
Viewing modified pages: cache operator
Whenever Googlebot crawls a webpage, it caches a snapshot of that page. This cached version can be very useful if the page has recently been deleted or is inaccessible because of network problems: you can still view the copy stored on Google's servers.
For every result it returns for your query, Google also provides a link to the cached version of that page. You can browse the cached version via the "Cached" link below the snippet for that result.
Another way to view the cached page is via Google's cache operator.
Google's cache can be very useful to an attacker if a website has since modified its original content, because it shows the old content of the page.
GOOGLE HACKING DEFENSES
Disable directory listings
Directory listings give away far more information than a visitor needs. Disabling them is always a good idea.
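The article doesn't name a particular web server. As a stand-in chosen purely for illustration, here is how directory listings can be switched off in Python's built-in http.server by overriding SimpleHTTPRequestHandler.list_directory; make_server and NoListingHandler are hypothetical names.

```python
import http.server
from functools import partial


class NoListingHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files normally, but answer 403 instead of generating an "Index of" page."""

    def list_directory(self, path):
        # SimpleHTTPRequestHandler calls list_directory() only for a directory
        # without index.html; refusing here removes the auto-generated listing.
        self.send_error(403, "Directory listing disabled")
        return None


def make_server(root: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Build (but do not start) a server that serves files from `root` on localhost."""
    handler = partial(NoListingHandler, directory=root)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Start it with make_server("/path/to/site").serve_forever(). On Apache, the usual equivalent is the Options -Indexes directive.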
Noindex your confidential files
You can keep web crawlers away from a webpage by adding the 'noindex' robots meta tag to it (which forbids indexing) or by disallowing its URL in the /robots.txt file (which forbids crawling).
Note that /robots.txt is publicly available; it shouldn't be used for hiding information as it can be viewed by anyone.
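Python's standard urllib.robotparser shows both sides of the robots.txt advice: what a well-behaved crawler will skip, and what anyone reading the file learns. The robots.txt content and URLs below are made-up examples.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /backup/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler will stay out of the disallowed paths...
print(parser.can_fetch("Googlebot", "http://example.com/private/report.pdf"))  # False
print(parser.can_fetch("Googlebot", "http://example.com/index.html"))         # True

# ...but the file is public, so the Disallow lines themselves reveal those paths.
exposed = [line.split(":", 1)[1].strip() for line in ROBOTS_TXT.splitlines()
           if line.lower().startswith("disallow:")]
print(exposed)  # ['/private/', '/backup/']
```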
Noarchive to forbid caching
You can prevent search engines from caching a webpage by employing the 'noarchive' meta tag.
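To check that a page actually carries the noindex or noarchive meta tag (for example <meta name="robots" content="noarchive">), you can parse its HTML. A sketch using the standard html.parser module; RobotsMetaParser and robots_directives are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names; values may be None.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() == "robots":
            for directive in values.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, noarchive"></head></html>'
print(sorted(robots_directives(page)))  # ['noarchive', 'noindex']
```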
Employ Google-fu against your own site
You can run these advanced searches against your own website to discover vulnerabilities.
Additionally, you can use automated tools like Wikto (http://www.sensepost.com/research/wikto/) and Sitedigger (http://www.foundstone.com/us/resources/proddesc/sitedigger.htm), which will thoroughly scan your site.
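The manual searches suggested above can be generated for your own domain and then tried in Google by hand. The templates below are illustrative picks built from operators covered in this post (filetype:log and inurl:admin are assumptions, not queries from the article), and audit_queries is a hypothetical helper.

```python
from typing import List

# Illustrative templates; "{site}" is replaced with the domain you are auditing.
TEMPLATES = [
    'site:{site} intitle:"index of"',
    "site:{site} intitle:index.of",
    "site:{site} filetype:log",
    "site:{site} inurl:admin",
]


def audit_queries(domain: str) -> List[str]:
    """Fill every template with your own domain."""
    return [t.format(site=domain) for t in TEMPLATES]


for q in audit_queries("example.com"):
    print(q)
```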
Removing an indexed page
If your confidential page has already been indexed by Google, you can remove it via Google's URL removal tool (http://google.webatnet.de/support/webmasters/bin/answer.py?hl=en&answer=92865).
• Johnny "ihackstuff" Long maintains a Google Hacking Database (http://johnny.ihackstuff.com/ghdb/), a list of numerous advanced Google queries that can be used to discover vulnerable targets.
• Johnny Long is also the author of Google Hacking for Penetration Testers (http://www.amazon.com/Google-Hacking-Penetration-Testers-2/dp/B001UFP658/ref=sr_1_6?ie=UTF8&s=books&qid=1253202258&sr=8-6), which is a must-read for any serious Google hacker.
• The SANS Institute maintains a very handy Google cheat sheet (http://www.sans.org/mentor/GoogleCheatSheet.pdf).