Fixing Garbled Chinese Text in Web Scrapers

Latest recommended article published 2024-08-08 17:59:21

Great_lid


Original link: https://blog.csdn.net/qq330214001/article/details/103376748


Keywords: Python, scraper encoding, response, apparent_encoding


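If scraped Chinese text looks like '\x9d\x9cå\x8f\x8bç\x94', the response has most likely been decoded with the wrong encoding. One fix: after fetching the page with requests.get, check whether the response's encoding (which requests takes from the HTTP headers) matches its apparent_encoding (which is detected from the response body). If they differ, the encoding used to decode the text is probably wrong. The code is as follows (define url and headers yourself):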

import requests
response = requests.get(url, headers=headers)
# True if the header-declared encoding matches the one detected from the body
print(response.encoding == response.apparent_encoding)
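
To see why a mismatch produces text like the sample above, here is a small offline sketch that uses only the standard library. It assumes the page is really UTF-8 but gets decoded as ISO-8859-1; the string "朋友" is an illustrative choice, not taken from any particular page.

```python
# The text the server actually sent, as UTF-8 bytes
text = "朋友"
raw = text.encode("utf-8")

# Decoding those bytes with the wrong codec gives mojibake
garbled = raw.decode("iso-8859-1")
print(repr(garbled))  # 'æ\x9c\x8bå\x8f\x8b'

# Reversing the wrong decode and applying the right one restores the text
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired)  # 朋友
```

The `å\x8f\x8b` run in the output is the same sequence that appears in the garbled sample, which suggests that sample came from UTF-8 bytes read as ISO-8859-1.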

If this prints False, the two encodings differ, and the response's encoding needs to be changed after the response is fetched, as follows:

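response = requests.get(url, headers=headers)
print(response.encoding == response.apparent_encoding)  # False when they disagree
# Make response.text decode with the encoding detected from the body
response.encoding = response.apparent_encoding
print(response.encoding == response.apparent_encoding)  # True after the change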
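
One caveat: a plain `==` on the two names can report a mismatch for spellings of the same codec, such as "UTF-8" and "utf-8". A minimal sketch of a stricter check follows. The helper names `same_codec` and `use_detected_encoding` are hypothetical, and a `SimpleNamespace` stands in for a real response so the example runs without network access.

```python
import codecs
from types import SimpleNamespace


def same_codec(a, b):
    """Return True if two encoding names refer to the same Python codec."""
    if a is None or b is None:
        return a is b
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        # Name unknown to Python: fall back to a case-insensitive comparison
        return a.lower() == b.lower()


def use_detected_encoding(response):
    """Switch response.encoding to apparent_encoding when they really differ.

    Returns True if the encoding was changed.
    """
    detected = response.apparent_encoding
    if detected is None or same_codec(response.encoding, detected):
        return False
    response.encoding = detected
    return True


# Stand-in object with the two attributes used above (not a real response)
fake = SimpleNamespace(encoding="ISO-8859-1", apparent_encoding="utf-8")
changed = use_detected_encoding(fake)
print(changed, fake.encoding)
```

With requests, the same helper could be called on the object returned by requests.get(url, headers=headers) before reading response.text. Because apparent_encoding is a statistical guess, it can be wrong on very short pages, so it is worth checking the decoded text.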