robots.txt:降低爬虫程序被网站的反爬虫机制封禁的风险
使用python自带robotparser
参考书:python网络爬虫实战 吕云翔 张扬
```RobotParser.py
import urllib.robotparser as urobot
import requests
import urllib
# --- Method 1: check robots.txt inline before fetching with requests ---
url = "https://www.taobao.com/"
rp = urobot.RobotFileParser()
# `url` already ends with "/"; strip it so we don't build ".../​/robots.txt".
rp.set_url(url.rstrip("/") + "/robots.txt")
rp.read()  # network I/O: downloads and parses the site's robots.txt
user_agent = 'Googlebot'
# Only fetch the target URL if robots.txt allows this user agent.
if rp.can_fetch(user_agent, 'https://www.taobao.com/item/'):
    site = requests.get(url)
    print("seem good")
else:
    print("cannot scrape because robots.txt banned you!")
# Method 2: wrap the robots.txt check in a reusable function.
def url_robots(url, newurl, user_agent):
    """Fetch *newurl* only if the site's robots.txt allows *user_agent*.

    Args:
        url: site root, e.g. "https://www.taobao.com/" (trailing "/" ok).
        newurl: the concrete URL we want to crawl.
        user_agent: user-agent string matched against the robots.txt rules.

    Returns:
        True if robots.txt permits the fetch (and it was performed),
        False otherwise. (Original returned None; True/False is
        backward-compatible for callers that ignored the result.)
    """
    rp = urobot.RobotFileParser()
    # Strip a trailing "/" so we never build ".../​/robots.txt".
    rp.set_url(url.rstrip("/") + "/robots.txt")
    rp.read()  # network I/O: download + parse robots.txt
    if rp.can_fetch(user_agent, newurl):
        # NOTE(review): the response object is never read or closed here —
        # this only proves the URL is reachable.
        urllib.request.urlopen(newurl)
        print("seem good")
        return True
    print("cannot scrape because robots.txt banned you!")
    return False
# Method 2 driver: same site/agent as method 1, but via the helper.
url="https://www.taobao.com/"
user_agent ='Googlebot'
newurl='https://www.taobao.com/item/'
# `test` only records that the check ran (the helper prints its verdict).
test=url_robots(url,newurl,user_agent)
```
运行结果:两种方法均输出 "seem good"(robots.txt 允许该 user-agent 抓取)。