Encoding issues when using requests in Python

八戒爱飘柔

Article link: https://blog.csdn.net/a491057947/article/details/47292923

From the official documentation:

Compliance

Requests is intended to be compliant with all relevant specifications and RFCs where that compliance will not cause difficulties for users. This attention to the specification can lead to some behaviour that may seem unusual to those not familiar with the relevant specification.

Encodings

When you receive a response, Requests makes a guess at the encoding to use for decoding the response when you access the Response.text attribute. Requests will first check for an encoding in the HTTP header, and if none is present, will use chardet to attempt to guess the encoding.

The only time Requests will not do this is if no explicit charset is present in the HTTP headers and the Content-Type header contains text. In this situation, RFC 2616 specifies that the default charset must be ISO-8859-1. Requests follows the specification in this case. If you require a different encoding, you can manually set the Response.encoding property, or use the raw Response.content.

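In other words:

When you receive a response, Requests guesses its encoding and uses that guess to decode the body when you access the Response.text attribute. It first looks for an encoding in the HTTP headers; if none is given, it falls back to chardet to guess one.

The one exception is when the HTTP headers carry no explicit charset and the Content-Type header contains text. In that case Requests does not guess.

In that situation, RFC 2616 specifies that the default charset must be ISO-8859-1, and Requests follows the specification. If you need a different encoding, you can set the Response.encoding property manually or work with the raw Response.content.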
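The rule described above can be sketched with the standard library alone. This is a minimal illustration of the documented behaviour, not the code Requests actually runs; the helper name `encoding_from_content_type` is hypothetical.

```python
from email.message import Message


def encoding_from_content_type(content_type):
    """Return the charset implied by a Content-Type header value, or None.

    Hypothetical helper that mirrors the rule quoted above:
    1. an explicit charset parameter wins;
    2. otherwise a "text" media type falls back to ISO-8859-1 (RFC 2616);
    3. otherwise nothing is known and the caller has to guess.
    """
    if not content_type:
        return None

    # Let the stdlib header parser split the media type from its parameters.
    msg = Message()
    msg["Content-Type"] = content_type

    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        return charset.strip("'\"")

    if "text" in msg.get_content_type():
        return "ISO-8859-1"

    return None


print(encoding_from_content_type("text/html; charset=gbk"))
print(encoding_from_content_type("text/html"))
print(encoding_from_content_type("application/octet-stream"))
```

Only the third case, where the function returns None, leaves room for guessing from the body. A text response without a charset is pinned to ISO-8859-1 before any guessing takes place.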

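Test

Testing shows that this is not always accurate. Here is an example.

The response received looked like this:

The header section clearly specifies charset="gbk", so according to the documentation the default ISO-8859-1 encoding should not be used for decoding. That is not what happens, however.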

import requests

# `url` is the page being fetched; the original post does not give it
r = requests.get(url)
print(r.encoding)
# Result: ISO-8859-1
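One possible explanation for this result, offered as an assumption because the response itself is not reproduced here: Requests sets r.encoding from the HTTP Content-Type header only and does not read a charset declared inside the HTML, for example in a `<meta>` tag. If the gbk declaration was in the page's `<head>` rather than in the HTTP header, ISO-8859-1 is the documented outcome. The sketch below uses a hypothetical helper, `charset_from_meta`, to pull such a declaration out of raw bytes such as r.content.

```python
import re

# Matches declarations such as <meta charset="gbk"> or
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def charset_from_meta(body):
    """Return the charset named in an HTML <meta> tag, or None if absent.

    Hypothetical helper: `body` is the undecoded response body (bytes).
    Only the first 2048 bytes are searched, since the declaration
    normally sits near the top of the document.
    """
    match = META_CHARSET.search(body[:2048])
    return match.group(1).decode("ascii") if match else None


sample = b'<html><head><meta charset="gbk"><title>demo</title></head></html>'
print(charset_from_meta(sample))
```

A value found this way could then be assigned to r.encoding before reading r.text.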

The result is garbled text. The fix is to set the encoding manually; r.text will then decode the body with the encoding you specified.

r = requests.get(url)

# Override the guessed encoding before reading r.text
r.encoding = 'gbk'
print(r.headers['content-type'])
data = r.text
print(data)

# The printed text is no longer garbled
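Why the wrong codec produces garbage instead of an error can be shown without any network access. The sketch below uses a made-up sample string and plain byte strings in place of a real response body.

```python
# A short Chinese string, encoded the way a GBK page would send it.
original = "编码问题"
raw = original.encode("gbk")

# Decoding as ISO-8859-1 never raises: every byte maps to exactly one
# character, so the result is silently garbled instead of failing.
garbled = raw.decode("iso-8859-1")

# Decoding with the correct codec recovers the text.
fixed = raw.decode("gbk")

# ISO-8859-1 is lossless for bytes, so text that was already decoded
# the wrong way can still be repaired by reversing the step.
repaired = garbled.encode("iso-8859-1").decode("gbk")

print(repr(garbled))
print(fixed, repaired)
```

Setting r.encoding = 'gbk' before reading r.text corresponds to the `fixed` line: the same bytes, decoded with the right codec.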
