robots.txt files of common blog sites
Websites change over time, so each robots.txt below is a snapshot of the file at one particular moment.
CSDN
http://www.csdn.net/robots.txt
Sitemap: http://www.csdn.net/article/sitemap.txt
Disallow: /article_preview.html*
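One way to see how a standards-following parser reads this file is to feed the lines to Python's standard-library `urllib.robotparser`. This is a minimal sketch, assuming Python 3.8+ (needed for `site_maps()`) and using the CSDN lines exactly as captured above. The sitemap URL is collected, but because no `User-agent` line comes before the `Disallow`, this parser does not attach the rule to any group and therefore treats every path as fetchable. If the live file had a `User-agent` line that this capture omits, the result would be different.

```python
from urllib.robotparser import RobotFileParser

# The CSDN lines as captured above (note: there is no User-agent line).
CSDN_ROBOTS = """\
Sitemap: http://www.csdn.net/article/sitemap.txt
Disallow: /article_preview.html*
"""

parser = RobotFileParser()
parser.parse(CSDN_ROBOTS.splitlines())  # parse text directly, no network access

# Sitemap lines are collected regardless of User-agent groups.
print(parser.site_maps())  # ['http://www.csdn.net/article/sitemap.txt']

# A Disallow that is not inside a User-agent group is dropped by this parser.
print(parser.can_fetch("*", "/article_preview.html"))  # True
```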
cnblogs (博客园)
http://www.cnblogs.com/robots.txt
User-Agent: *
Allow: /
BlogChina (中国博客网)
http://www.blogchina.com/robots.txt
User-agent: *
Disallow: /
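The cnblogs and BlogChina files are the two extremes: `Allow: /` opens the whole site to every crawler, while `Disallow: /` closes all of it. A small sketch with Python's `urllib.robotparser` makes the difference concrete; the helper name `make_parser` and the sample path `/some/post.html` are illustrative placeholders, not real pages or library functions.

```python
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt):
    """Build a parser from robots.txt text instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


cnblogs = make_parser("User-Agent: *\nAllow: /\n")
blogchina = make_parser("User-agent: *\nDisallow: /\n")

path = "/some/post.html"  # hypothetical path for illustration
print(cnblogs.can_fetch("Googlebot", path))    # True: every path is open
print(blogchina.can_fetch("Googlebot", path))  # False: every path is closed
```

Field names are matched case-insensitively by this parser, so `User-Agent` and `User-agent` behave the same.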
NetEase Blog (网易博客)
http://blog.163.com/robots.txt
User-agent: *
Disallow: /apps/
Disallow: /settings
Disallow: /dwr/
Disallow: /*/dwr/
Disallow: /unblock.do
Disallow: /feedback.do
Disallow: /*\${*}*
Disallow: *jsessionid=*
Disallow: /login.do
Disallow: /qiangbao
Disallow: /error.do
Sitemap: http://blog.163.com/sitemap.xml
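NetEase's file relies on wildcard patterns such as `/*/dwr/` and `*jsessionid=*`. Python's `urllib.robotparser` only does plain prefix matching, so it does not treat `*` inside a path as a wildcard. The sketch below uses hypothetical helpers (`pattern_to_regex` and `is_allowed` are illustrative names, not a library API) that follow the matching rules Google documents for its crawler, under these assumptions: `*` matches any run of characters, a `$` only anchors the end of the URL when it is the last character of the pattern, the longest matching pattern decides, and `Allow` wins a tie. Other crawlers may interpret these patterns differently, and the sample paths are made up.

```python
import re


def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a regex matched from the start of the path.

    '*' becomes '.*'; a '$' anchors the end only when it is the last character
    (any other '$' is matched literally, an assumption about crawler behaviour).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (allow, pattern) pairs from one User-agent group.
    The longest matching pattern wins; on equal length Allow beats Disallow;
    if nothing matches, the path is allowed.
    """
    best = None  # (pattern length, allow flag) of the winning rule so far
    for allow, pattern in rules:
        if not pattern:
            continue  # an empty "Disallow:" blocks nothing
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), allow)
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# The NetEase Blog Disallow rules from the file above (User-agent: *).
BLOG163_RULES = [(False, p) for p in [
    "/apps/", "/settings", "/dwr/", "/*/dwr/", "/unblock.do", "/feedback.do",
    r"/*\${*}*", "*jsessionid=*", "/login.do", "/qiangbao", "/error.do",
]]

# Hypothetical paths: prints False, False, True.
for sample in ["/someuser/dwr/engine.js", "/blog/post;jsessionid=ABC", "/someuser/blog/"]:
    print(sample, is_allowed(BLOG163_RULES, sample))
```

Using the pattern length as the precedence key mirrors Google's documented longest-match rule; Python's own `urllib.robotparser` instead applies the first matching line in file order.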
Sina Blog (新浪博客)
# User-Agent of the search engines being restricted; * means all ##############
User-agent: *
# Directories that may not be crawled; an empty Disallow: opens all directories ######
Allow: /admin/blogmove/
Disallow: /admin/
Disallow: /include/
Disallow: /html/
Disallow: /queue/
Disallow: /config/
# Directories open to crawling ####################################
# /
# /advice/
# /help/
# /lm/
# /main/
# /myblog/
# Search engine to User-Agent reference table ########################
# Search engine          User-Agent
# AltaVista              Scooter
# Infoseek               Infoseek
# Hotbot                 Slurp
# AOL Search             Slurp
# Excite                 ArchitextSpider
# Google                 Googlebot
# Goto                   Slurp
# Lycos                  Lycos
# MSN                    MSNBOT
# Netscape               Googlebot
# NorthernLight          Gulliver
# WebCrawler             ArchitextSpider
# Iwon                   Slurp
# Fast                   Fast
# DirectHit              Grabber
# Yahoo Web Pages        Googlebot
# Looksmart Web Pages    Slurp
# Baiduspider            Baidu
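Sina's file opens one subdirectory (`/admin/blogmove/`) inside an otherwise blocked directory (`/admin/`). Python's `urllib.robotparser` applies the first rule that matches, in file order, so placing the more specific `Allow` line before `Disallow: /admin/` is what keeps `/admin/blogmove/` crawlable for that parser. This sketch uses only the directive lines from the Sina file above (comments dropped, which the parser would skip anyway); the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Directive lines from the Sina Blog file above, comments removed.
SINA_ROBOTS = """\
User-agent: *
Allow: /admin/blogmove/
Disallow: /admin/
Disallow: /include/
Disallow: /html/
Disallow: /queue/
Disallow: /config/
"""

parser = RobotFileParser()
parser.parse(SINA_ROBOTS.splitlines())

# Hypothetical paths: the first hits the Allow line, the second falls through
# to "Disallow: /admin/", the third matches no rule and is therefore allowed.
for path in ["/admin/blogmove/start", "/admin/settings", "/myblog/"]:
    print(path, parser.can_fetch("Baiduspider", path))  # True, False, True
```

Google documents a different tie-break (the longest matching path wins), which gives the same answer for this file; with first-match semantics, swapping the two lines would block `/admin/blogmove/` as well.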