判断IP是否为搜索引擎蜘蛛或爬虫
主要是通过向DNS服务器发送反向域名解析查询,获取指定ip的相关域名信息来判断是否为相应搜索引擎或爬虫.
通过 dig 或者 host 工具皆可查询.
常见搜索引擎蜘蛛及官方说明
百度robots协议
主要是通过向DNS服务器发送反向域名解析查询,获取指定ip的相关域名信息来判断是否为相应搜索引擎或爬虫.
通过 dig 或者 host 工具皆可查询.
Example:
> dig -x 8.8.8.8 +short
google-public-dns-a.google.com.
> dig google-public-dns-a.google.com +short
8.8.8.8
Example:
> host 8.8.8.8
8.8.8.8.in-addr.arpa domain name pointer google-public-dns-a.google.com.
> host google-public-dns-a.google.com
google-public-dns-a.google.com has address 8.8.8.8
google-public-dns-a.google.com has IPv6 address 2001:4860:4860::8888
常见搜索引擎蜘蛛及官方说明
Googlebot
http://www.google.com/bot.html
bingbot
http://www.bing.com/webmaster/help/which-crawlers-does-bing-use-8c184ec0
Baiduspider
http://www.baidu.com/search/spider.htm
Yahoo!
http://help.yahoo.com/help/us/ysearch/slurp
360Spider
http://www.so.com/help/help_3_2.html
YoudaoBot
http://www.youdao.com/help/webmaster/spider/
sogou spider
http://www.sogou.com/docs/help/webmasters.htm#07
EasouSpider
http://www.easou.com/search/spider.html
Applebot
http://www.apple.com/go/applebot
FacebookBot
https://developers.facebook.com/docs/sharing/webmasters/crawler
百度robots协议
> curl -i http://www.baidu.com/robots.txt
HTTP/1.1 200 OK
Date: Thu, 31 Mar 2016 04:27:29 GMT
Server: Apache
P3P: CP=" OTI DSP COR IVA OUR IND COM "
Set-Cookie: BAIDUID=178CAA8DA6084CFB2B1131C5BC48270B:FG=1; expires=Fri, 31-Mar-17 04:27:29 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
Last-Modified: Thu, 25 Dec 2014 04:29:36 GMT
ETag: "91e-50b02db060c00"
Accept-Ranges: bytes
Content-Length: 2334
Vary: Accept-Encoding,User-Agent
Connection: Keep-Alive
Content-Type: text/plain
User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?
Disallow: /ulink?
Disallow: /link?
User-agent: Googlebot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: MSNBot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: Baiduspider-image
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: YoudaoBot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: Sogou web spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: Sogou inst spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: Sogou spider2
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: Sogou blog
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: Sogou News Spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: Sogou Orion spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: ChinasoSpider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: Sosospider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: yisouspider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: EasouSpider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
User-agent: *
Disallow: /