Scraping Mainstream Websites: Blocked by Anti-Scraping Measures

Latest recommended article published 2021-05-28 16:25:30

香菇酱沙拉

Category: python. Tags: python, web scraping

Original link: https://www.cnblogs.com/yunlixingchen/p/12157848.html

Problem: urllib.error.HTTPError: HTTP Error 418:

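Description: when I used Python's urllib.request to scrape a Douban page, the server returned HTTP status code 418 instead of 200. Status codes of the form 4XX indicate client errors.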

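Explanation: after searching online, I learned that 418 is what the site's anti-scraping mechanism returns to requests it identifies as coming from a crawler. The status code is documented as "418 I'm a teapot":
The HTTP 418 I'm a teapot client error response code indicates that the server refuses to brew coffee because it is a teapot. This error is a reference to Hyper Text Coffee Pot Control Protocol which was an April Fools' joke in 1998.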
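To look at the status code itself rather than a traceback, the exception can be caught and inspected. The following is a minimal sketch using only the standard library; `fetch_status` and its `opener` parameter are hypothetical names introduced here for illustration and do not appear in the original post.

```python
from urllib import error, request


def fetch_status(url, opener=request.urlopen):
    """Return (status_code, reason) for url without raising on HTTP errors.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access.
    """
    try:
        with opener(url) as resp:
            # Successful responses expose the code as .status
            return resp.status, resp.reason
    except error.HTTPError as exc:
        # 4XX/5XX responses are raised as HTTPError, which carries
        # the numeric code and the reason phrase sent by the server
        return exc.code, exc.reason
```

Given the error shown above, calling `fetch_status(url)` on the Douban page without custom headers would be expected to return 418 as the first element.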

requests

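At the time I was using urllib's request module. I suspected the library was somewhat dated, so I switched to the requests library, sent the request again with header information added, and it worked. Without the header, the program returns an empty result: nothing is printed, but no error is raised either. The original urllib version: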

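from urllib import request

# url is the Douban page to fetch; the original post does not give its value.
# Sent with urllib's default User-Agent, this call raised HTTP Error 418.
r = request.urlopen(url)
html = r.read().decode("utf-8")
print(html)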

After switching to requests and adding headers:

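import requests

# Present the request as coming from a desktop Chrome browser.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}

# url is the same Douban page as in the urllib version.
r = requests.get(url, headers=headers)
html = r.text
print(html)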
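Unlike urllib, requests does not raise on a 4XX response by default, which may be why the header-less call appeared to return nothing without failing. A slightly more defensive variant checks the status explicitly. This is only a sketch: `fetch_html` and `ensure_ok` are hypothetical helper names, and the 10-second timeout is an assumed value.

```python
import requests

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
    )
}


def ensure_ok(response):
    """Raise requests.HTTPError on a 4XX/5XX status, else return the body text."""
    response.raise_for_status()
    return response.text


def fetch_html(url, headers=HEADERS, timeout=10):
    """Fetch url with a browser-like User-Agent and fail loudly on errors."""
    response = requests.get(url, headers=headers, timeout=timeout)
    return ensure_ok(response)
```

With this variant, a 418 response surfaces as a `requests.HTTPError` instead of an empty page.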

With the request headers added, the scrape succeeds.
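Since the change that mattered was the header, the same User-Agent can also be sent with urllib by wrapping the URL in a `Request` object. The sketch below has not been verified against Douban here; `build_request` and `fetch_html_urllib` are hypothetical names, and the timeout is an assumed value.

```python
from urllib import request

# Same User-Agent string as in the requests example above.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
    )
}


def build_request(url, headers=HEADERS):
    """Wrap url in a Request carrying the browser-like headers."""
    return request.Request(url, headers=headers)


def fetch_html_urllib(url, timeout=10):
    """Fetch url via urllib with custom headers and return the decoded body."""
    with request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Note that urllib normalizes header names on storage, so the header is retrieved with `req.get_header("User-agent")`.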
