Determining Whether an IP Belongs to a Search Engine Spider or Crawler



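The main approach is to send a reverse DNS (PTR) query to a DNS server, obtain the domain name associated with the given IP, and check whether that name belongs to the search engine or crawler in question. Resolving the returned name forward again should give back the same IP, which confirms the PTR record is genuine.
Either the dig or the host tool can be used for these queries.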


Example:


> dig -x 8.8.8.8 +short
google-public-dns-a.google.com.


>  dig google-public-dns-a.google.com +short
8.8.8.8




Example:


> host 8.8.8.8
8.8.8.8.in-addr.arpa domain name pointer google-public-dns-a.google.com.


> host google-public-dns-a.google.com
google-public-dns-a.google.com has address 8.8.8.8
google-public-dns-a.google.com has IPv6 address 2001:4860:4860::8888
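
The reverse-then-forward check shown with dig and host above can also be scripted. The following is a minimal Python sketch, not a definitive implementation. The function names, the injectable `reverse` and `forward` parameters, and the `GOOGLEBOT_SUFFIXES` tuple are assumptions made for illustration; the suffixes each engine really uses should be taken from its official documentation.

```python
import ipaddress
import socket

# Hypothetical suffix list for illustration only; check the engine's
# official documentation for the host names its crawlers actually use.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR host name for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def forward_lookup(hostname):
    """Return the set of address strings (IPv4 and IPv6) for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def _same_ip(a, b):
    """Compare two address strings after normalizing them."""
    try:
        return ipaddress.ip_address(a) == ipaddress.ip_address(b)
    except ValueError:
        return False


def is_verified_crawler(ip, suffixes, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse-resolve ip, check the domain suffix, then confirm forward."""
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    # The leading dot in each suffix keeps "evilgooglebot.com" from matching.
    if not hostname.endswith(tuple(suffixes)):
        return False
    # The forward lookup must map the name back to the original IP.
    return any(_same_ip(ip, addr) for addr in forward(hostname))


# Example call (needs network access, result depends on live DNS):
# is_verified_crawler("8.8.8.8", GOOGLEBOT_SUFFIXES)
```

The resolvers are passed in as parameters so the logic can be exercised without a network connection, for example by substituting functions that return fixed values.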


Common search engine spiders and their official documentation
Googlebot
http://www.google.com/bot.html


bingbot
http://www.bing.com/webmaster/help/which-crawlers-does-bing-use-8c184ec0


Baiduspider
http://www.baidu.com/search/spider.htm


Yahoo! Slurp
http://help.yahoo.com/help/us/ysearch/slurp


360Spider
http://www.so.com/help/help_3_2.html


YoudaoBot
http://www.youdao.com/help/webmaster/spider/


sogou spider
http://www.sogou.com/docs/help/webmasters.htm#07


EasouSpider
http://www.easou.com/search/spider.html


Applebot
http://www.apple.com/go/applebot


FacebookBot
https://developers.facebook.com/docs/sharing/webmasters/crawler




Baidu's robots.txt (robots exclusion protocol)
> curl -i  http://www.baidu.com/robots.txt
HTTP/1.1 200 OK
Date: Thu, 31 Mar 2016 04:27:29 GMT
Server: Apache
P3P: CP=" OTI DSP COR IVA OUR IND COM "
Set-Cookie: BAIDUID=178CAA8DA6084CFB2B1131C5BC48270B:FG=1; expires=Fri, 31-Mar-17 04:27:29 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
Last-Modified: Thu, 25 Dec 2014 04:29:36 GMT
ETag: "91e-50b02db060c00"
Accept-Ranges: bytes
Content-Length: 2334
Vary: Accept-Encoding,User-Agent
Connection: Keep-Alive
Content-Type: text/plain


User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?
Disallow: /ulink?
Disallow: /link?


User-agent: Googlebot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: MSNBot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Baiduspider-image
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: YoudaoBot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sogou web spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sogou inst spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sogou spider2
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sogou blog
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sogou News Spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sogou Orion spider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: ChinasoSpider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: Sosospider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?




User-agent: yisouspider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: EasouSpider
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?


User-agent: *
Disallow: /
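
To see how rules like the ones above apply to a particular crawler, the file can be fed to a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser` on a shortened excerpt of the listing; the `ROBOTS_EXCERPT` constant, the `build_parser` helper, and the user-agent name `SomeOtherBot` are assumptions introduced for illustration. The standard-library parser matches rules by simple path prefix, so its results may differ in detail from how a given search engine interprets the same file.

```python
from urllib import robotparser

# Shortened excerpt of the robots.txt shown above (three of its groups).
ROBOTS_EXCERPT = """\
User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?
Disallow: /ulink?
Disallow: /link?

User-agent: Googlebot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?

User-agent: *
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = robotparser.RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)

# Each call asks: may this user agent fetch this URL under the rules above?
checks = [
    ("Baiduspider", "http://www.baidu.com/baidu"),
    ("Baiduspider", "http://www.baidu.com/homepage/"),
    ("Googlebot", "http://www.baidu.com/homepage/"),
    ("SomeOtherBot", "http://www.baidu.com/index.html"),  # hypothetical agent
]
for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))
```

An agent with no group of its own falls through to the final `User-agent: *` group, which disallows everything.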








