https://www.owasp.org/index.php/Testing:_Search_engine_discovery/reconnaissance_(OWASP-IG-002)
Summary
Search engine discovery and reconnaissance has direct and indirect elements. Direct methods involve searching the indexes and the associated content from caches. Indirect methods involve gleaning sensitive design and configuration information by searching forums, newsgroups, and tendering websites.
Once a search engine robot has finished crawling, it begins indexing the web page based on tags and associated attributes, such as <TITLE>, in order to return relevant search results. [1]
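As a rough illustration of the tags an indexer works from, the following Python sketch extracts the <TITLE> text and any robots meta directives from a page. It uses only the standard library; the class name, function name, and sample markup are hypothetical, and real search engines consider many more signals than these two.

```python
from html.parser import HTMLParser


class IndexSignals(HTMLParser):
    """Collect the <title> text and any <meta name="robots"> directives."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.robots_directives = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.robots_directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def index_signals(html):
    """Return (title, robots_directives) for an HTML document string."""
    parser = IndexSignals()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.robots_directives
```

A page that returns a descriptive title and no "noindex" directive is a candidate for appearing in search results under that title.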
If the robots.txt file is not updated during the lifetime of the website, and inline HTML meta tags that instruct robots not to index content have not been used, then indexes may contain web content that the owners did not intend to be included. Website owners may use the previously mentioned robots.txt, HTML meta tags, authentication, and tools provided by search engines to remove such content.
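One way to spot a stale robots.txt is to check which known sensitive paths it fails to exclude. The sketch below uses Python's standard urllib.robotparser on robots.txt text that has already been retrieved. The function name and the sample paths are hypothetical. A path that is crawlable is not necessarily indexed, so this only narrows down what to search for.

```python
from urllib.robotparser import RobotFileParser


def crawlable_paths(robots_txt, paths, user_agent="*"):
    """Return the paths that the given robots.txt text does NOT disallow."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if parser.can_fetch(user_agent, p)]


# Hypothetical robots.txt that was never updated after /staging/ was added.
robots = "User-agent: *\nDisallow: /admin/\n"
exposed = crawlable_paths(robots, ["/admin/login", "/staging/", "/index.html"])
```

Here /staging/ is left open to crawlers, so it is worth checking with a "site:" query whether it has been indexed.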
Test Objectives
To understand what sensitive design and configuration information about the application/system/organization is exposed, both directly (on the organization's website) and indirectly (on a third-party website).
How to Test
Using a search engine, search for:
- Network diagrams and configurations
- Archived posts and emails by administrators and other key staff
- Logon procedures and username formats
- Usernames and passwords
- Error message content
- Development, test, UAT and staging versions of the website
Black Box Testing
Using the advanced "site:" search operator, it is possible to restrict search results to a specific domain [2]. Do not limit testing to just one search engine provider, as different providers may return different results depending on when they crawled the content and on their own algorithms. Consider using the following search engines:
- Baidu
- binsearch.info
- Bing
- Duck Duck Go
- ixquick/Startpage
- Shodan
- PunkSpider
Duck Duck Go and ixquick/Startpage reduce the amount of information leaked about the tester.
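To repeat the same "site:" query across several providers, a small helper can build the search URLs. This is a minimal sketch: the URL templates below are assumptions based on each provider's public search page and should be verified before use, and only three of the listed engines are covered. The function name and the optional terms parameter are hypothetical.

```python
from urllib.parse import quote_plus

# Assumed URL templates; verify against each provider before relying on them.
SEARCH_URLS = {
    "Bing": "https://www.bing.com/search?q={query}",
    "Duck Duck Go": "https://duckduckgo.com/?q={query}",
    "Baidu": "https://www.baidu.com/s?wd={query}",
}


def site_queries(domain, terms=""):
    """Build one search URL per engine for a query restricted to `domain`."""
    query = f"site:{domain} {terms}".strip()
    return {
        name: template.format(query=quote_plus(query))
        for name, template in SEARCH_URLS.items()
    }


for engine, url in site_queries("owasp.org").items():
    print(engine, url)
```

The URLs are only constructed, not fetched, so the tester can open them manually and stay within each provider's terms of use.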
Google provides the advanced "cache:" search operator [2], but this is equivalent to clicking "Cached" next to each Google search result. Hence, using the advanced "site:" search operator and then clicking "Cached" is preferred.
The Google SOAP Search API supports the doGetCachedPage and the associated doGetCachedPageResponse SOAP messages [3] to assist with retrieving cached pages. An implementation of this is under development by the OWASP "Google Hacking" Project.
PunkSpider is a web application vulnerability search engine. It is of little use to a penetration tester doing manual work. However, it can be useful for demonstrating how easily script kiddies can find vulnerabilities.
Example
To find the web content of owasp.org indexed by a typical search engine, the syntax required is:
site:owasp.org
To display the index.html of owasp.org as cached, the syntax is:
cache:owasp.org
Google Hacking Database
The Google Hacking Database is a list of useful search queries for Google. Queries are grouped into several categories:
- Footholds
- Files containing usernames
- Sensitive Directories
- Web Server Detection
- Vulnerable Files
- Vulnerable Servers
- Error Messages
- Files containing juicy info
- Files containing passwords
- Sensitive Online Shopping Info
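Queries in these categories are usually generic, so in a test they are typically combined with "site:" to keep results within the target's domain. The sketch below shows the idea in Python. The sample queries are illustrative only and are not copied from the Google Hacking Database; the function name and example.com are placeholders.

```python
# Illustrative queries only; they are not taken from the Google Hacking Database.
SAMPLE_QUERIES = {
    "Sensitive Directories": 'intitle:"index of"',
    "Error Messages": '"Fatal error" "on line"',
    "Files containing passwords": 'filetype:log "password"',
}


def scope_to_domain(domain, queries):
    """Prefix each query with site:<domain> so results stay on the target."""
    return {category: f"site:{domain} {query}" for category, query in queries.items()}


for category, query in scope_to_domain("example.com", SAMPLE_QUERIES).items():
    print(f"{category}: {query}")
```

Each resulting string can be pasted into a search engine that supports these operators; support for operators such as "intitle:" and "filetype:" varies between providers.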
Gray Box testing and example
Gray Box testing is the same as Black Box testing above.
Vulnerability References
Web
[1] "Google Basics: Learn how Google Discovers, Crawls, and Serves Web Pages" - https://support.google.com/webmasters/answer/70897
[2] "Operators and More Search Help" - https://support.google.com/websearch/answer/136861?hl=en
[3] "Google Hacking Database" - http://www.exploit-db.com/google-dorks/
Tools
[4] FoundStone SiteDigger - http://www.mcafee.com/uk/downloads/free-tools/sitedigger.aspx
[5] Google Hacker - http://yehg.net/lab/pr0js/files.php/googlehacker.zip
[6] Stach & Liu's Google Hacking Diggity Project - http://www.stachliu.com/resources/tools/google-hacking-diggity-project/
[7] PunkSPIDER - http://punkspider.hyperiongray.com/
Remediation
Carefully consider the sensitivity of design and configuration information before it is posted online.
Periodically review the sensitivity of existing design and configuration information that is posted online.