Python Crawler Series, Part 2: requests and Garbled Text Handling (2)

Latest recommended article published 2024-07-12 16:58:11

qq_42787271


Category: Python crawler. Tags: Python, crawler garbled-text handling

Article link: https://blog.csdn.net/qq_42787271/article/details/81564413


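This article discusses how to deal with garbled text when using the requests library in a Python crawler, covering both manual conversion and automatic handling. It introduces the encoding attribute, with values such as gbk and utf-8, and how to determine the encoding from the charset in a page's meta tag. It explains how the decode and encode functions are used during conversion. For automatic handling, it mentions using the chardet module to detect the encoding of the content, checking the request status through res2.status_code, and obtaining the encoding with chardet.detect(rp.content)['encoding'].

Summary generated by CSDN using AI technology.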

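Before converting between encodings, a few basics:

The encoding attribute, encoding: holds the encoding in use, e.g. gbk or utf-8. To find out which encoding a page uses, look for the charset in its meta tag.
Conversion functions: decode, encode
rp.content is already a byte stream (bytes)
rp.text is a string (str)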
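As a minimal sketch of "look for the charset in the meta tag", the snippet below searches raw page bytes (what rp.content would hold) for a declared charset and then decodes with it. The helper name find_meta_charset, the regular expression, and the sample page are illustrative assumptions, not part of requests; real pages can declare their charset in ways this simple pattern misses.

```python
import re
from typing import Optional

# Matches charset=... inside a <meta> tag, for example
#   <meta charset="gbk">
#   <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
_META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([\w-]+)', re.IGNORECASE
)


def find_meta_charset(content: bytes) -> Optional[str]:
    """Return the charset declared in a <meta> tag, or None if none is found.

    Hypothetical helper for illustration; it works on raw bytes such as
    rp.content and only scans the first 4096 bytes, where <head> usually sits.
    """
    match = _META_CHARSET.search(content[:4096])
    return match.group(1).decode("ascii").lower() if match else None


# A made-up page, encoded as gbk to imitate the bytes a server might send.
sample_page = (
    '<html><head><meta charset="gbk"><title>测试</title></head></html>'
).encode("gbk")

declared = find_meta_charset(sample_page)
print(declared)

# bytes -> str, using the encoding the page itself declares
if declared is not None:
    print(sample_page.decode(declared))
```

With a real response, the same helper could be applied to rp.content, and the result passed to rp.content.decode(...) or assigned to rp.encoding before reading rp.text.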

Manual conversion

decode, encode, encoding
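Before touching the network, here is a small self-contained sketch of what decode and encode do, and of how garbled text arises when the codec used to decode differs from the one used to encode. The sample string is an arbitrary illustration; only built-in str and bytes methods are used.

```python
# Arbitrary sample text; any string with non-ASCII characters would do.
original = "你好，世界"

# encode: string (str) -> byte stream (bytes)
gbk_bytes = original.encode("gbk")
utf8_bytes = original.encode("utf-8")

# decode: byte stream (bytes) -> string (str), using the matching codec
assert gbk_bytes.decode("gbk") == original
assert utf8_bytes.decode("utf-8") == original

# Decoding gbk bytes as utf-8 fails outright for this sample.
try:
    wrong = gbk_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    wrong = None
    print("utf-8 cannot decode these gbk bytes:", exc.reason)

# Decoding utf-8 bytes as gbk with errors="replace" does not raise,
# but the result is garbled text rather than the original string.
garbled = utf8_bytes.decode("gbk", errors="replace")
print(garbled == original)
```

The same two situations show up with a response: a hard decode error when calling res.content.decode(...) with the wrong codec, or silently garbled res.text when the encoding attribute does not match the page.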

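# Manual conversion
# Conversion functions: decode(), encode()

import requests
res = requests.get("http://ibeifeng.com")
# print(res.content.decode("gbk"))  # byte stream (bytes) -> string (str)
# print(res.text.encode("gbk"))     # string (str) -> byte stream (bytes)

# If the output is garbled, switch to a different encoding.
# encoding: the attribute that holds the encoding; setting it controls how text is decoded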
res2=reques