Google looks like a benevolent guardian until you see the other side of it. Google may have answers to all your queries, but you need to frame your questions properly, and that is where GOOGLE DORKS come in. A dork is not a piece of software to install, execute and wait for results; it is a combination of search operators (intitle, inurl, site, intext, allinurl, filetype, etc.) with which you can ask Google for exactly what you are after.
For example, suppose your objective is to download PDF documents related to Java. The normal Google search would be “java pdf document free download” (“free” being the mandatory keyword without which no Google search is complete). With a Google dork, the search becomes “filetype:pdf intext:java” (note there is no space after the colon). With these operators, Google understands far more precisely what you are looking for than it did with the previous search, and you get more accurate results. That seems promising for an effective Google search.
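As a minimal sketch of how such queries are composed, here is a hypothetical helper (the function name and interface are my own, not an existing tool) that joins free-text terms with operator:value pairs in the dork syntax:

```python
def build_dork(*terms, **operators):
    """Compose a Google dork query string from free-text terms and
    operator:value pairs. Operator names are not validated; Google
    dork syntax requires no space between the operator and its value."""
    parts = list(terms)
    parts += [f"{op}:{value}" for op, value in operators.items()]
    return " ".join(parts)

print(build_dork(filetype="pdf", intext="java"))  # filetype:pdf intext:java
```

The same helper would express the password-hunting queries discussed below, e.g. `build_dork("passwords", filetype="xls", site="in")`.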
However, attackers can use these same operators for a very different purpose: to steal or extract information from your website or server. Suppose I want usernames and passwords that are cached on servers; I can use a simple query like “filetype:xls passwords site:in”. This returns cached spreadsheets from websites in India that have usernames and passwords saved in them. It is as simple as that. Against a particular shopping site, a query like “filetype:xls passwords inurl:onlineshopper.com” might dismay anyone with its results. In simple terms, your private or sensitive information can end up available on the internet, not because someone hacked it, but because Google was able to retrieve it free of cost.
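Such a dork can be turned into an ordinary search URL directly; a small sketch using Python's standard library (the query is the one from the example above, URL-encoded):

```python
from urllib.parse import quote_plus

# The dork from the example above; the resulting URL is just an
# ordinary Google search -- no special tooling is involved.
dork = "filetype:xls passwords site:in"
url = "https://www.google.com/search?q=" + quote_plus(dork)
print(url)  # https://www.google.com/search?q=filetype%3Axls+passwords+site%3Ain
```

This underlines the point of the paragraph: the "attack" is nothing more than a well-formed search.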
Web robots (often referred to as wanderers, crawlers, or spiders) are programs that traverse the web automatically. Search engines like Google, Bing, and Yahoo use these robots to scan websites and extract information, and the robots consult a file named “robots.txt” before doing so.
robots.txt is a file that tells search engines what they may and may not access on your website. It is a kind of control you have over search engines. Configuring robots.txt isn’t rocket science; you just need to know which information should and should not be exposed to search engines. A sample robots.txt configuration will look like this.
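For illustration, a minimal robots.txt might look like the following (the paths and the sitemap URL are placeholders, not taken from any real site):

```
# Apply these rules to all crawlers
User-agent: *

# Keep private areas out of search results
Disallow: /admin/
Disallow: /backup/

# Optionally tell crawlers where the sitemap lives
Sitemap: https://www.example.com/sitemap.xml
```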
Sadly, these robots.txt configurations are often missing or configured inappropriately by website designers. Shockingly, many government and college websites in India are prone to this attack, revealing sensitive information about their sites. With malware, remote attacks, botnets and other high-end threats flooding the internet, Google dorking can be even more threatening, since all it requires is a working internet connection on any device. And it doesn’t end with retrieving sensitive information: using Google dorks, anyone can find vulnerable CCTV cameras, modems, mail usernames, passwords and online order details just by searching Google.
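To see how a crawler interprets such rules, here is a small sketch using Python's standard-library robots.txt parser (the rules and URLs are made up for illustration; a missing Disallow rule means the path is fair game for indexing):

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules, mirroring the sample configuration discussed above
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Paths under /admin/ are blocked; everything else may be crawled and indexed
print(parser.can_fetch("*", "https://example.com/admin/passwords.xls"))  # False
print(parser.can_fetch("*", "https://example.com/index.html"))           # True
```

Note that robots.txt is purely advisory: well-behaved crawlers honor it, but it is not an access-control mechanism, so sensitive files should never rely on it alone.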