Post Tue Sep 22, 2009 4:32 am

Google Hacking: An Intro for Beginners

Original article is at the following link:
http://www.blogntweets.info/google-hack ... beginners/

Google hacking is the process of using complex search engine queries to locate sensitive information.

Because of various web server misconfigurations, sensitive information gets indexed by search engines when their spiders crawl the server. The sensitive information may include password files, confidential directories, logon portals, log files, etc.

BASIC SEARCH
The most basic search involves entering the required terms in Google's web interface. E.g.
  • hacker
  • ethical hacker
The important thing to note here is that Google searches are case insensitive. Whether you search for (ethical hacker) or (ETHICAL HACKER) or (EtHiCaL HaCkEr), you get the same number of results.


PHRASE SEARCH

Phrase search involves enclosing the required terms in double-quotes. Google searches for all the words in the phrase in the "exact" order you provide them. This is very useful when you are searching for a specific thing and want to omit extraneous results. Case insensitivity is maintained in phrase search too. E.g.
  • "hacker"
  • "ethical hacker"
There are certain common words, such as "a", "and", "the" and "for", which Google ignores in a basic search. These words are called stop words. When used in a phrase search, stop words are not excluded by Google.


ADVANCED SEARCH


Google wildcard
Google treats the asterisk (*) as a placeholder for any unknown term(s) and tries to find the best match(es) for it. It can be used in both basic and phrase searches, but you have to separate it with a space from the preceding and following words. E.g.
  • you can * multiple *
    Here, Google will try to find the best match(es) for each wildcard.

Excluding words from search: the minus sign (-)
Sometimes you want to exclude pages containing certain words from the search results. You can do this by prefixing the unwanted word with a minus sign (-). The minus sign should be preceded by a space and placed immediately before the unwanted term.
A search such as (ethical hacker -cracker) will return pages containing ethical hacker but not the term cracker.
If you want to exclude multiple terms, place a minus sign before each term. E.g.
  • ethical hacker -cracker -blackhat
  • computer virus -antivirus -antispyware
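
The exclusion rule above is easy to script. Here is a minimal Python sketch; the function name exclude_terms is made up for this example and is not part of any Google tool.

```python
def exclude_terms(base_query, unwanted):
    """Append each unwanted term to the query, prefixed with a minus sign."""
    parts = [base_query.strip()]
    for term in unwanted:
        # A space goes before the minus sign, nothing between it and the term.
        parts.append("-" + term.strip())
    return " ".join(parts)

print(exclude_terms("ethical hacker", ["cracker", "blackhat"]))
```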

Searching as is: the plus sign (+)
Sometimes you want to include stop words in your search. One way to do this is to use them in a phrase search, i.e. to enclose them in double quotes. Another way is to place a plus sign (+) before the stop word. The plus sign tells Google to include the word that follows it. Like the minus sign, the plus sign should be placed immediately before the word you want included and should be preceded by a space. E.g.
  • +this +is +a test
    As this example shows, there is no limit to the number of plus signs you can use in a query.

Google's Boolean operators
Search engines commonly offer three Boolean operators: AND, OR and NOT. Google handles each of them differently.

  • The AND operator
    The AND operator is used to search for multiple terms. For example, to search for pages containing the terms 'google', 'hacking' and 'tutorial', you can construct your query as:
    • google AND hacking AND tutorial

    Look at the above query carefully and you will see that it is equivalent to the basic Google query. Google applies the AND operator by default, so you do not have to type it.

  • The NOT operator
    The NOT operator is normally used to exclude words from a search, but Google does not support it. Instead, Google uses the minus sign to exclude terms.

  • The OR operator
    The default Google search employs the AND operation. You can override this behaviour using the OR operator. The OR operator (written in all caps) tells Google to locate any one of several words.
    You can use the pipe symbol (|) instead of OR. E.g.
    • The query (google | microsoft) returns all pages containing either Google or Microsoft or both.
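
As a rough sketch of how an OR query ends up in a search URL, the Python snippet below joins the alternatives and URL-encodes the result. The helper names any_of and search_url are invented for this example; the /search?q= address is Google's ordinary web search URL.

```python
from urllib.parse import urlencode

def any_of(terms, use_pipe=False):
    """Join alternatives with OR (upper case) or with the pipe symbol."""
    separator = " | " if use_pipe else " OR "
    return separator.join(terms)

def search_url(query):
    """Encode a query string into a Google web search URL."""
    return "http://www.google.com/search?" + urlencode({"q": query})

query = any_of(["google", "microsoft"], use_pipe=True)
print(query)
print(search_url(query))
```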

Query string length limit
Not counting stop words, a single Google query can contain at most 32 words. Google ignores any words after the first 32 (excluding stop words) and displays a message.
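
To check a long query before pasting it in, something like the following Python sketch could help. It assumes the 32-word limit quoted above and, since Google's full stop word list is not given here, only the four stop words named earlier in this post. The name split_at_limit is hypothetical.

```python
# Assumption: only the stop words named in this post; Google's real list is not given.
STOP_WORDS = {"a", "and", "the", "for"}
MAX_WORDS = 32  # the limit quoted in this post

def split_at_limit(query, limit=MAX_WORDS):
    """Return (kept, ignored) word lists; stop words do not count toward the limit."""
    words = query.split()
    counted = 0
    for index, word in enumerate(words):
        if word.lower() in STOP_WORDS:
            continue
        counted += 1
        if counted > limit:
            # This word and everything after it would be ignored.
            return words[:index], words[index:]
    return words, []

long_query = " ".join("word%d" % n for n in range(40))
kept, ignored = split_at_limit(long_query)
print(len(kept), "words kept,", len(ignored), "ignored")
```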


153,000,000 Results!..really?
Though Google claims to find thousands - or millions - of results for any query, it lets you view only the first 1,000 results. If you try to go beyond the first thousand results, Google displays an error message.


ADVANCED OPERATORS
Google provides a myriad of additional operators to enhance your search (or hacking!) experience.
We will cover some of the most useful operators here.

All the advanced Google operators have the syntax operator:search_term(s)
• There should not be any space between the operator, the colon and the search term.
• The search term can be a single term or a phrase.
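
The two syntax rules above can be captured in a tiny Python helper. This is only a sketch and operator_term is a made-up name: it glues operator, colon and term together without spaces and wraps a multi-word term in double quotes so it is treated as a phrase.

```python
def operator_term(operator, term):
    """Format operator:term with no spaces around the colon; quote multi-word phrases."""
    term = term.strip()
    if " " in term and not term.startswith('"'):
        term = '"' + term + '"'
    return operator + ":" + term

print(operator_term("intitle", "google hacking"))
print(operator_term("site", "sans.org"))
```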

Searching within a domain: site operator
Syntax: site:Domain

This is perhaps the most useful Google operator for reconnaissance. The site operator is used to limit the search to a particular domain. E.g.
  • site:ethicalhacker.net
    This will return pages only from ethicalhacker.net
  • site:sans.org training
    This will search only 'sans.org' for the term 'training'.

You can also exclude results from specific subdomains with the help of the minus operator. E.g.
  • site:sans.org training -site:www.sans.org
    This will search 'sans.org' for the term 'training' but omit results from the subdomain 'www'.
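
When you want to skip several subdomains at once, the query can be assembled programmatically. A small Python sketch follows; site_query and its parameters are hypothetical names chosen for this example.

```python
def site_query(domain, terms="", skip_subdomains=()):
    """Build a site: query, excluding each listed subdomain with -site:."""
    parts = ["site:" + domain]
    if terms:
        parts.append(terms)
    for subdomain in skip_subdomains:
        parts.append("-site:" + subdomain + "." + domain)
    return " ".join(parts)

print(site_query("sans.org", "training", skip_subdomains=["www"]))
```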

Searching the title: intitle and allintitle operators
Syntax: intitle:search_term
Syntax: allintitle:search_term(s)

The intitle operator is used to search the titles of pages. E.g.
  • intitle:"google hacking"
    This will list all the pages with 'google hacking' somewhere in their title.

It can also be combined with the site operator to limit the search to a specific domain. E.g.
  • site:blogntweets.info intitle:"google hacking"

Locating directory listings
One of the most sinister uses of the intitle operator is in locating directory listings. Directory listings include the phrase "index of" in their title. So, we can search for (intitle:"index of") or (intitle:index.of) to locate all the directory listings indexed by Google.

The period (.) in 'index.of' acts as a wildcard for a single character.

You can also use this operator to search for password files. E.g.
  • intitle:Index.of etc shadow
    This will search for UNIX /etc/shadow password files.

The allintitle operator also matches the title of a webpage, but it searches for all the words that follow it. E.g.
  • allintitle:penetration testing
    This query will search for webpages with both 'penetration' and 'testing' in their title. Notice that, unlike the intitle operator, it does not require multiple words to be enclosed in quotes.

Like the intitle operator, the allintitle operator can be used to discover directory listings. However, it does not combine well with other advanced operators, so you should use the intitle operator instead.


Searching the URL: inurl and allinurl operators
Syntax: inurl:search_term
Syntax: allinurl:search_term(s)

The inurl operator is used to locate URLs containing the search term. E.g.
  • inurl:hacker
    This will list all the URLs containing the term 'hacker'.

The inurl operator can also be combined with the site operator to search URLs associated with a specific domain. It can also be used to discover vulnerable scripts if the script names are included in the URL.

Like the inurl operator, allinurl matches the URL of a webpage, but it searches for all the words following it. The allinurl operator also does not combine well with other advanced operators, so it is best avoided.


Searching for a specific file type: filetype and ext operators
Syntax: filetype:type_of_file
Syntax: ext:type_of_file

Google supports two operators to search for a specific type of file based on the file extension: filetype and ext. You can use either of the two. E.g.
  • filetype:pdf "google hacking"
  • ext:pdf "google hacking"
    This will list all the .pdf files containing the term 'google hacking'.
You can use either operator together with the site operator to search for a specific type of file in a particular domain. E.g.
  • site:sans.org ext:doc training
    This will search 'sans.org' for all the Microsoft Word documents containing the term 'training'.

Links to a URL: link operator
Syntax: link:URL

The link operator is used to find all the webpages that link to the specified URL. E.g.
  • link:www.blogntweets.info
  • link:www.ethicalhacker.net

Viewing modified pages: cache operator
Syntax: cache:URL

Whenever Googlebot crawls a webpage, it caches a snapshot of that page. This cached version can be very useful if the page has recently been deleted or is inaccessible owing to network problems: you can still view the cached copy stored on Google's servers.

For every result Google returns for your query, it also provides a link to the cached version of that page. You can open it via the cached link below the snippet for that result.

Another way to view the cached page is via the Google cache operator. E.g.
  • cache:www.blogntweets.info
    This will show you the page as it was when Googlebot last crawled it. The current version of the page could be different from the cached version.

Google cache can be very useful to an attacker if the website has since modified its original content, because it reveals the old content of the page.


GOOGLE HACKING DEFENSES

Disable directory listings
Directory listings give away far more information than a visitor needs. Disabling them is always a good option.

Noindex your confidential files
You can ask search engines not to index a webpage with the 'noindex' meta tag, or ask crawlers not to fetch it by listing its URL in the /robots.txt file.
Note that /robots.txt is publicly available; it shouldn't be used for hiding information as it can be viewed by anyone.
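
To see which paths a set of robots.txt rules actually covers, you can test them offline with Python's standard urllib.robotparser module. The rules and the example.com paths below are made up for illustration. Remember that this only tells you what a well-behaved crawler would skip; the file itself stays readable by anyone.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /backup/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/index.html", "/private/report.pdf"]:
    allowed = parser.can_fetch("Googlebot", "http://example.com" + path)
    print(path, "crawlable" if allowed else "blocked")
```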

Noarchive to forbid caching
You can prevent search engines from caching a webpage by employing the 'noarchive' meta tag.
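
A quick way to confirm that a page really carries these directives is to parse its HTML and read the robots meta tag. The Python sketch below uses the standard html.parser module; the class name RobotsMetaParser and the sample page are assumptions made for this example.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        for directive in (attributes.get("content") or "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())

# Hypothetical page used only for this example.
PAGE = '<html><head><meta name="robots" content="noindex, noarchive"></head></html>'

checker = RobotsMetaParser()
checker.feed(PAGE)
print(sorted(checker.directives))
```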

Employ Google-fu against your own site
You can run these advanced searches against your own website to discover any information it exposes.

Additionally, you can make use of automated tools like Wikto and SiteDigger, which will thoroughly scan your site.
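
If you prefer to start by hand, a short script can print a checklist of queries to run against your own domain. The Python sketch below reuses only queries shown earlier in this post; audit_queries is a made-up name and example.com stands in for your site.

```python
def audit_queries(domain):
    """Return a few queries from this post, each restricted to one domain."""
    site = "site:" + domain
    return {
        "directory listings": site + ' intitle:"index of"',
        "shadow password files": site + " intitle:index.of etc shadow",
        "PDF files": site + " ext:pdf",
        "Word documents": site + " ext:doc",
    }

for label, query in audit_queries("example.com").items():
    print(label + ": " + query)
```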

Removing an indexed page
If your confidential page has already been indexed by Google, you can remove it via Google's URL removal tool.


ADDITIONAL RESOURCES
• Johnny "ihackstuff" Long maintains the Google Hacking Database, a list of numerous advanced Google queries which can be used to discover vulnerable targets.

• Johnny Long is also the author of Google Hacking for Penetration Testers, which is a must for any serious Google hacker.

• The SANS Institute maintains a very handy Google cheat sheet.

http://www.google.com/support/websearch/?hl=en

http://www.googleguide.com/
Last edited by Xen on Thu Oct 08, 2009 2:18 am, edited 1 time in total.