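http://news.sina.com.cn/robots.txt Sina (sina.com.cn)
User-agent: * // * is a wildcard: the rules below apply to every crawler (user agent)
Disallow: /wap/ // Disallow forbids crawling: URLs whose path starts with /wap/ must not be fetched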
Disallow: /iframe/
Disallow: /temp/
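
A minimal sketch of how a crawler could check these Sina rules before fetching a page, using Python's standard urllib.robotparser. The rules are pasted in as a string so nothing is downloaded; the crawler name "MyCrawler" and the example page URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The Sina rules listed above, embedded as text (no network access needed).
SINA_RULES = """\
User-agent: *
Disallow: /wap/
Disallow: /iframe/
Disallow: /temp/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


sina = build_parser(SINA_RULES)

# "MyCrawler" and the URLs below are hypothetical examples.
# The path starts with /wap/, so the rule matches and the fetch is refused.
print(sina.can_fetch("MyCrawler", "http://news.sina.com.cn/wap/index.html"))

# No rule matches this path, so the fetch is allowed.
print(sina.can_fetch("MyCrawler", "http://news.sina.com.cn/china/"))

# /wap/ appears in the middle of the path, not at the start: still allowed,
# because this parser matches Disallow values as path prefixes.
print(sina.can_fetch("MyCrawler", "http://news.sina.com.cn/china/wap/a.html"))
```

The third check is the one worth noting: this parser compares a Disallow value against the beginning of the URL path, so a path that merely contains /wap/ further along is not blocked.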
https://www.qq.com/robots.txt QQ (qq.com)
User-agent: *
Disallow: // an empty Disallow blocks nothing, so every path may be crawled
Sitemap: http://www.qq.com/sitemap_index.xml
https://news.qq.com/robots.txt
User-agent: *
Disallow:
Sitemap: //www.qq.com/sitemap_index.xml
Sitemap: http://news.qq.com/topic_sitemap.xml
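
A small sketch, again with Python's standard urllib.robotparser, of how the news.qq.com file above is read: the empty Disallow allows everything, and the Sitemap lines can be listed with site_maps() (available in Python 3.8 and later). The crawler name "MyCrawler" and the page URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The news.qq.com rules listed above, embedded as text.
QQ_NEWS_RULES = """\
User-agent: *
Disallow:
Sitemap: //www.qq.com/sitemap_index.xml
Sitemap: http://news.qq.com/topic_sitemap.xml
"""

qq_news = RobotFileParser()
qq_news.parse(QQ_NEWS_RULES.splitlines())

# An empty Disallow value blocks nothing, so any path is allowed.
# "MyCrawler" and this URL are hypothetical examples.
print(qq_news.can_fetch("MyCrawler", "https://news.qq.com/any/page.html"))

# site_maps() returns the Sitemap values in file order, exactly as written
# (the protocol-relative //www.qq.com/... entry is not expanded).
for sitemap_url in qq_news.site_maps() or []:
    print(sitemap_url)
```

The "or []" guard is there because site_maps() returns None, not an empty list, when a file has no Sitemap lines.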
https://www.baidu.com/robots.txt