2020-11-12 Python: The Robots Protocol

miyafung

Article link: https://blog.csdn.net/m0_38010621/article/details/109646721

The Robots Protocol

Take JD.com's robots.txt as an example: https://www.jd.com/robots.txt

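User-agent: *          # the rules below apply to every crawler
Disallow: /?*          # no crawler may fetch URLs starting with /? (the home page with a query string)
Disallow: /pop/*.html  # .html pages under /pop/
Disallow: /pinpai/*.html?*  # .html pages under /pinpai/ that carry a query string
# the four crawlers below are barred from the whole site
User-agent: EtaoSpider
Disallow: /
User-agent: HuihuiSpider
Disallow: /
User-agent: GwdangSpider
Disallow: /
User-agent: WochachaSpider
Disallow: /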
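The * inside paths such as /?* and /pop/*.html is a wildcard. To make these rules concrete, here is a simplified, hand-written matcher. It assumes the wildcard semantics of RFC 9309 (* matches any run of characters, a trailing $ anchors the end of the path, and otherwise a rule matches as a prefix). It only checks Disallow patterns and ignores Allow rules and longest-match precedence; the helper names pattern_to_regex and is_disallowed are hypothetical.

```python
import re

def pattern_to_regex(pattern):
    """Translate one robots.txt path pattern into a compiled regex.

    Simplified sketch assuming RFC 9309-style wildcards: '*' matches any run
    of characters, a trailing '$' anchors the end, and everything else must
    match literally at the start of the path (query string included).
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then glue the pieces together with '.*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path, disallow_patterns):
    """True if any Disallow pattern matches from the start of `path`."""
    # re.match anchors at position 0, which gives prefix-match semantics.
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)

# Disallow rules of JD's "User-agent: *" group, as quoted in this post.
JD_GENERIC_RULES = ["/?*", "/pop/*.html", "/pinpai/*.html?*"]

for path in ["/?keyword=phone", "/pop/123.html", "/pinpai/abc.html?page=2",
             "/pinpai/abc.html", "/item/100.html"]:
    print(path, "blocked" if is_disallowed(path, JD_GENERIC_RULES) else "allowed")
```

The first three paths come out blocked, while /pinpai/abc.html (no query string) and /item/100.html are allowed. This shows that /?* targets the home page followed by a query string, not every URL that contains a ?.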

Basic syntax

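# starts a comment; * means "all" (every crawler in a User-agent line, any run of characters in a path); / denotes the root directory of the site.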

User-agent:*
Disallow:/

Together, these two lines tell every crawler not to crawl anything on the site.
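Python's standard library can evaluate these two lines with urllib.robotparser. A minimal sketch, where the helper can_fetch and the agent name AnyBot are made up for illustration, contrasting Disallow:/ (block the whole site) with an empty Disallow: (block nothing):

```python
from urllib.robotparser import RobotFileParser

def can_fetch(robots_lines, agent, url):
    """Parse robots.txt lines and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # accepts any iterable of text lines
    return parser.can_fetch(agent, url)

block_all = ["User-agent:*", "Disallow:/"]  # the two-line example
allow_all = ["User-agent:*", "Disallow:"]   # empty Disallow value: nothing is blocked

print(can_fetch(block_all, "AnyBot", "http://example.com/index.html"))  # False
print(can_fetch(allow_all, "AnyBot", "http://example.com/index.html"))  # True
```

The single character / is what separates "keep out entirely" from "crawl freely".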

Using the Robots protocol
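Crawlers: a crawler identifies robots.txt, automatically or manually, and only then crawls the content.
Binding force: the Robots protocol is advisory, not binding; a crawler can ignore it, but doing so carries legal risk.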
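The "read robots.txt first, then crawl" step can be automated with urllib.robotparser.RobotFileParser. This sketch parses the JD rules quoted earlier from a string so it runs offline (the live file may have changed since 2020); MyCrawler is a hypothetical agent name. Against a live site you would call set_url() and read() instead of parse().

```python
from urllib.robotparser import RobotFileParser

# JD.com's robots.txt as quoted in this post (2020); the live file may differ.
JD_ROBOTS = """\
User-agent: *
Disallow: /?*
Disallow: /pop/*.html
Disallow: /pinpai/*.html?*
User-agent: EtaoSpider
Disallow: /
User-agent: HuihuiSpider
Disallow: /
User-agent: GwdangSpider
Disallow: /
User-agent: WochachaSpider
Disallow: /
"""

rp = RobotFileParser()
# Parse from a string so the sketch runs offline. Against the live site:
#   rp.set_url("https://www.jd.com/robots.txt"); rp.read()
rp.parse(JD_ROBOTS.splitlines())

# Consult the rules before each request; "MyCrawler" is a made-up agent name.
for agent in ["EtaoSpider", "MyCrawler"]:
    allowed = rp.can_fetch(agent, "https://www.jd.com/")
    print(agent, "may crawl" if allowed else "must skip", "https://www.jd.com/")
```

A named group takes priority: EtaoSpider is refused even the home page, while an unlisted crawler falls back to the User-agent: * group. One caveat, as far as I can tell from its implementation: robotparser compares rule paths as plain prefixes and does not expand *, so it will not enforce wildcard rules such as /pop/*.html; a crawler that must honor them needs a wildcard-aware matcher.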
