Python requests: the difference between text and content

Latest recommended article published on 2023-10-25 15:57:41

weixin_34281477

Original link: http://blog.51cto.com/10676568/2361654

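1 Introduction to text and content
resp.text returns the response body as a Unicode string (str).

resp.content returns the response body as bytes, that is, raw binary data.

To get text, use r.text.
To get an image or another file, use r.content.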
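
To make the difference concrete without a network call, the sketch below uses a hand-made byte string as a stand-in for resp.content and decodes it the way resp.text would once the encoding is known. The sample text and the variable names are assumptions for illustration, not data from a real response.

```python
# Stand-in for resp.content: the raw bytes a server might send (assumed sample data).
raw = "你好, requests".encode("utf-8")

# Roughly what resp.text gives you once the encoding is known: bytes decoded to str.
text = raw.decode("utf-8")

print(type(raw).__name__, len(raw))    # length is counted in bytes
print(type(text).__name__, len(text))  # length is counted in characters
```

Binary payloads such as images have no text encoding at all, which is why they should be written to disk straight from the bytes rather than passed through a decode step.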

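2 How to check a page's encoding
Method 1: detect the encoding from the raw bytes with chardet (a third-party package)
import requests
import chardet
s=requests.get('https://hao.360.cn/?h_lnk')
# chardet.detect returns a dict holding the guessed encoding and a confidence score
print(chardet.detect(s.content))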

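Method 2: read the encoding that requests inferred from the response headers

import requests

s=requests.get('https://hao.360.cn/?h_lnk')

# s.encoding is the encoding that s.text will use for decoding
print(s.encoding)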

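In other words, response.text decodes the body with an encoding that requests infers from the response headers. If the Content-Type header declares a text type but the server does not specify a charset, the default is "ISO-8859-1", which is why response.text sometimes comes back garbled.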
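
The header rule can be sketched as follows. This is a simplified illustration, not the actual code in requests: the function name encoding_from_content_type is hypothetical, and the standard library's email.message.Message is used only as a convenient header parser.

```python
from email.message import Message


def encoding_from_content_type(content_type):
    """Simplified sketch of the header rule (hypothetical helper, not requests' code)."""
    if not content_type:
        return None  # no header at all: nothing to infer from

    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins.
    charset = msg.get_param("charset")
    if isinstance(charset, str) and charset:
        return charset

    # A text type without a charset falls back to ISO-8859-1.
    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"

    return None


print(encoding_from_content_type("text/html; charset=GBK"))
print(encoding_from_content_type("text/html"))
print(encoding_from_content_type("image/png"))
```

When no encoding can be inferred from the headers, requests instead guesses from the body itself when .text is read, which is the same kind of detection Method 1 performs by hand.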

Use response.encoding to see which encoding requests has chosen, then assign response.encoding = 'utf-8' to override it before reading response.text.

For example:

import requests

response=requests.get('http://www.qq.com')

response.encoding

>>'GB2312'

response.encoding="UTF-8"

response.encoding

>> 'UTF-8'
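
Why assigning response.encoding changes what response.text returns can be shown with a toy model. ToyResponse below is a hypothetical stand-in written for this illustration, not the real requests.Response class, and the sample bytes are assumed; the only point is that the text is decoded from the stored bytes using whatever encoding is currently set.

```python
class ToyResponse:
    """Toy stand-in for requests.Response; hypothetical, for illustration only."""

    def __init__(self, content, encoding):
        self.content = content    # raw bytes, never modified
        self.encoding = encoding  # encoding used when decoding

    @property
    def text(self):
        # Decoded on each access, so changing .encoding changes the result.
        return self.content.decode(self.encoding, errors="replace")


# Assumed scenario: UTF-8 bytes, but the encoding was guessed as ISO-8859-1.
resp = ToyResponse("腾讯网".encode("utf-8"), "ISO-8859-1")
garbled = resp.text

resp.encoding = "UTF-8"
fixed = resp.text

print(repr(garbled))
print(fixed)
```

The bytes in content stay the same throughout; only their interpretation changes.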

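3 In most cases .text is the recommended choice, because it gives you a ready-to-use string in which Chinese characters display directly. Sometimes it comes out garbled, which means the encoding was guessed wrongly. In that case, choose the encoding by hand: common encodings for Chinese pages are utf-8, GBK, and GB2312.

In short, .text is a ready-made string, while .content still has to be decoded. Because .text does not always display correctly, you sometimes need to decode .content manually.

For example:

Method 1: decode content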

s=requests.get('https://hao.360.cn/?h_lnk').content.decode('utf-8')
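
When the right encoding is not known in advance, manual decoding can try a few candidates in order. The helper below is a minimal sketch under that assumption: decode_body and its candidate list are hypothetical names chosen for this example, and gb18030 is used because it is a superset of both GBK and GB2312.

```python
def decode_body(content, candidates=("utf-8", "gb18030")):
    """Try each candidate encoding in turn (hypothetical helper for illustration)."""
    for enc in candidates:
        try:
            return content.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Last resort: keep going, but mark undecodable bytes with U+FFFD.
    return content.decode("utf-8", errors="replace"), "utf-8 (lossy)"


# Assumed sample bytes standing in for resp.content.
print(decode_body("中文".encode("gbk")))
print(decode_body("中文".encode("utf-8")))
```

UTF-8 goes first because its strict validation rejects most non-UTF-8 input, whereas the GB family accepts many byte sequences and would happily mis-decode UTF-8 text.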

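Method 2: set the encoding, then read text (calling .text.encode('utf-8') would only return bytes and would not repair garbled text)

r=requests.get('https://hao.360.cn/?h_lnk')
r.encoding='utf-8'
s=r.text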
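
Encoding an already-garbled string as UTF-8 does not bring the original text back. The sketch below, which uses assumed sample bytes rather than a real response, shows what happens when UTF-8 bytes have been decoded as ISO-8859-1, and one way such a string can still be recovered.

```python
# Assumed sample: UTF-8 bytes that were wrongly decoded as ISO-8859-1.
raw = "中文".encode("utf-8")
garbled = raw.decode("ISO-8859-1")

# Encoding the garbled string as UTF-8 just yields different, longer bytes.
reencoded = garbled.encode("utf-8")
print(reencoded == raw)

# ISO-8859-1 maps every byte to one character, so the wrong step can be undone:
# encode back with ISO-8859-1 to recover the bytes, then decode them correctly.
recovered = garbled.encode("ISO-8859-1").decode("utf-8")
print(recovered)
```

This round trip only works for the ISO-8859-1 case, because that decoding loses no bytes; decoding the original bytes with the right encoding remains the simpler fix.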

Reposted from: https://blog.51cto.com/10676568/2361654
