Five Ways to Fix Garbled Text with Python requests

bigcarp

Last modified 2023-10-17 12:03:19

Tags: python, programming languages

First published 2023-07-16 13:54:03

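This article covers the garbled-text problem that can occur when fetching web pages with the requests module, and its fixes: apparent_encoding, decoding explicitly as UTF-8, detecting the encoding with the chardet and cchardet libraries, and the encode + decode approach. It highlights cchardet as a faster alternative to chardet for performance-sensitive scenarios.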

When you fetch a web page with the requests module, the text often comes out garbled, for example:

import requests
res = requests.get("https://www.baidu.com/")
print(res.text)

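Garbled text appears when the bytes are decoded with a different encoding from the one used to encode them. res.text decodes with res.encoding, which requests takes from the Content-Type header; when a text response declares no charset there, requests falls back to ISO-8859-1. The sections below describe several fixes.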
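
To make the cause concrete, here is a minimal offline sketch. It assumes a server that sends UTF-8 bytes without declaring a charset; the sample string is only an illustration and no network request is made.

```python
# Simulate a server that sends UTF-8 bytes without declaring a charset.
raw = "百度一下，你就知道".encode("utf-8")

# Decoding with the wrong codec does not raise; it silently yields mojibake.
wrong = raw.decode("iso-8859-1")

# Decoding with the codec that matches the bytes recovers the text.
right = raw.decode("utf-8")

print(repr(wrong))
print(right)
```

Because ISO-8859-1 maps every byte to some character, the wrong decode never fails, which is why the problem shows up as garbled output rather than as an exception.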

Method 1: apparent_encoding

import requests
res = requests.get("https://www.baidu.com/")
res.encoding = res.apparent_encoding
print(res.text)
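
apparent_encoding is a guess made from the response bytes, so overriding a charset that the server declared correctly can occasionally make things worse. The sketch below shows one possible policy: keep a declared charset and use the guess only when the encoding is missing or is the ISO-8859-1 fallback. resolve_encoding is a hypothetical helper, not part of requests, and the SimpleNamespace objects are stand-ins for real responses so the example runs offline.

```python
from types import SimpleNamespace


def resolve_encoding(res):
    """Return the encoding to use for decoding res (hypothetical helper)."""
    declared = res.encoding
    # requests falls back to ISO-8859-1 for text responses with no charset,
    # so treat that value (and None) as "not really declared".
    if declared is None or declared.lower() == "iso-8859-1":
        return res.apparent_encoding or "utf-8"
    return declared


# Stand-ins for requests.Response objects, so the sketch runs offline.
no_charset = SimpleNamespace(encoding="ISO-8859-1", apparent_encoding="utf-8")
declared_gbk = SimpleNamespace(encoding="gbk", apparent_encoding="GB2312")

print(resolve_encoding(no_charset))
print(resolve_encoding(declared_gbk))
```

With a real response the call would be res.encoding = resolve_encoding(res). One trade-off of this policy: a page that really is ISO-8859-1 and says so will still be re-detected.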

Method 2: decode content manually
A stopgap only. This approach is not recommended because it hard-codes the encodings.

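import requests
res = requests.get("https://www.baidu.com/")
try:
    # Try strict UTF-8 first: GBK accepts many UTF-8 byte sequences
    # without raising, so trying GBK first can silently produce garbage
    txt = res.content.decode('utf-8')
except UnicodeDecodeError:
    # Fall back to GBK for pages that are not valid UTF-8
    txt = res.content.decode('gbk')
print(txt)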
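
If you do hard-code encodings, the order in which they are tried matters. The sketch below generalizes the try/except into a loop. decode_with_fallback is a hypothetical helper and the candidate list is an assumption; put the strictest codec first, since strict UTF-8 rejects most byte sequences that are not UTF-8.

```python
def decode_with_fallback(data, encodings=("utf-8", "gb18030")):
    """Try each candidate codec in order; return (text, codec_used)."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the first codec and mark undecodable bytes.
    return data.decode(encodings[0], errors="replace"), encodings[0]


# Offline stand-ins for res.content.
gbk_bytes = "中文网页".encode("gbk")
utf8_bytes = "中文网页".encode("utf-8")

print(decode_with_fallback(gbk_bytes))
print(decode_with_fallback(utf8_bytes))
```

With a real response you would pass res.content as data.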

Method 3: chardet

import requests
import chardet
res = requests.get("https://www.baidu.com/")
# detect() returns a dict; 'encoding' is None when detection fails
encoding = chardet.detect(res.content)['encoding'] or 'utf-8'
print(res.content.decode(encoding))
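
The dict returned by detect() also carries a confidence value, and its encoding can be None when nothing is detected. The sketch below shows one way to guard against both cases. pick_encoding is a hypothetical helper, the 0.5 threshold is an arbitrary assumption, and the dicts are hand-written stand-ins for detection results so the example runs without chardet installed.

```python
def pick_encoding(result, default="utf-8", min_confidence=0.5):
    """Choose an encoding from a detect()-style result dict."""
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    # Fall back when detection failed or the detector is unsure.
    if not encoding or confidence < min_confidence:
        return default
    return encoding


# Hand-written stand-ins for what detect(res.content) might return.
print(pick_encoding({"encoding": "GB2312", "confidence": 0.99}))
print(pick_encoding({"encoding": None, "confidence": 0.0}))
print(pick_encoding({"encoding": "Windows-1252", "confidence": 0.3}))
```

In real use the argument would be chardet.detect(res.content), and the returned name would be passed to res.content.decode().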

Method 4: cchardet
cchardet must be installed first: pip install cchardet.

import requests
import cchardet
res = requests.get("https://www.baidu.com/")
# detect() returns a dict; 'encoding' is None when detection fails
encoding = cchardet.detect(res.content)['encoding'] or 'utf-8'
print(res.content.decode(encoding))

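The difference between chardet and cchardet: cchardet is a faster alternative to chardet, implemented as a binding to the C/C++ uchardet detector, so it performs better.

Both chardet and cchardet are Python libraries for character encoding detection. They are used to determine the encoding of text data (UTF-8, ISO-8859-1, and so on) so that it can be parsed and processed correctly. They differ mainly in performance and implementation language.

chardet:
- chardet is an encoding detection library written in Python.
- Because it is pure Python, it is relatively slow and not well suited to large inputs.
- chardet uses statistical models and heuristics, guessing the encoding from the distribution and frequency of characters.
- Install it with pip: pip install chardet.
cchardet:
- cchardet is a faster alternative to chardet built on compiled C/C++ code, so it performs better.
- Because the detection runs in compiled code, it is faster on large text files and suits applications that need high-performance encoding detection.
- cchardet is generally treated as a drop-in replacement for chardet: for the detect() call used in this article it offers the same interface.
- Install it with pip: pip install cchardet.

In short, if you need encoding detection and performance matters, consider cchardet. If performance is not the main concern, or you need a pure-Python library in some environments, chardet is still a good choice.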
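
Since both libraries expose a detect() function, one common pattern is to prefer cchardet and fall back to chardet when it is unavailable. The following is a minimal sketch; load_detector is a hypothetical helper, and the sample bytes are only an illustration.

```python
def load_detector():
    """Return a detect(bytes) function, preferring cchardet.

    Returns None when neither library is installed.
    """
    try:
        import cchardet
        return cchardet.detect
    except ImportError:
        pass
    try:
        import chardet
        return chardet.detect
    except ImportError:
        return None


detect = load_detector()
if detect is not None:
    # Offline stand-in for res.content.
    print(detect("编码检测示例文本".encode("utf-8")))
else:
    print("neither cchardet nor chardet is installed")
```

The rest of the code can then call detect(res.content) without caring which library provided it.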

Method 5: encode + decode

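import requests
import cchardet
res = requests.get("https://www.baidu.com/")
# Encoding requests used to build res.text (it uses apparent_encoding when encoding is None)
res_encoding = res.encoding or res.apparent_encoding
# Encoding detected from the raw bytes; 'encoding' is None when detection fails
con_encoding = cchardet.detect(res.content)['encoding'] or 'utf-8'
# Re-encode the text back to the original bytes, then decode them correctly
print(res.text.encode(res_encoding).decode(con_encoding))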
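
The round trip only works when the first, wrong decode lost no information. requests builds res.text with errors="replace", so bytes the wrong codec cannot represent are replaced with U+FFFD and cannot be recovered. The offline sketch below contrasts the two cases; the sample string and the choice of ASCII as the lossy codec are assumptions for illustration.

```python
raw = "你好，世界".encode("utf-8")

# Case 1: the wrong codec was ISO-8859-1, which maps every byte to a
# character, so encoding back yields the original bytes.
text_latin1 = raw.decode("iso-8859-1", errors="replace")
repaired = text_latin1.encode("iso-8859-1").decode("utf-8")

# Case 2: the wrong codec was ASCII; undecodable bytes became U+FFFD,
# so the original bytes are gone.
text_ascii = raw.decode("ascii", errors="replace")
recoverable = "\ufffd" not in text_ascii

print(repaired)
print(recoverable)
```

When the raw bytes are still available, decoding res.content directly with the detected encoding avoids this risk.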
