python3 requests: the difference between content and text

Latest recommended article published on 2023-10-25 15:57:41

whatday


Original link: https://blog.csdn.net/whatday/article/details/107548587

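The difference

I kept wondering what the difference is between the content and text attributes of a requests response; judging from the printed output alone, they can look identical.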

response = requests.get(url)
response.content
response.text

The two properties are defined as follows:

    @property
    def text(self):
        """Content of the response, in unicode.

        If Response.encoding is None, encoding will be guessed using
        ``chardet``.

        The encoding of the response content is determined based solely on HTTP
        headers, following RFC 2616 to the letter. If you can take advantage of
        non-HTTP knowledge to make a better guess at the encoding, you should
        set ``r.encoding`` appropriately before accessing this property.
        """

  
    @property
    def content(self):
        """Content of the response, in bytes."""

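From these definitions we can see that:

resp.text returns the data as a Unicode string (str).
resp.content returns the data as bytes, that is, raw binary data.
Use text for textual responses and content for images and other files.

content holds the raw bytes, while text holds the string that requests produces by decoding content, using the encoding taken from the HTTP headers or, when none is available, a guessed one.

If you print content directly, the output starts with b', which marks a bytes literal; text has no such b prefix. For pure ASCII data the two look exactly the same. For other characters, the bytes must be decoded with the correct encoding before they display properly. In most cases .text is the better choice because it shows readable Chinese characters, but sometimes it comes out garbled, and then you need .content.decode('utf-8') (assuming the body really is UTF-8).

In short, .text is a ready-made string, while .content still has to be decoded. However, .text does not always display correctly, and in that case you need to decode .content by hand.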
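The points above can be seen without any network access. The following is a minimal sketch that uses plain bytes objects as stand-ins for response.content and their decoded form as a stand-in for response.text; the variable names are made up for illustration and UTF-8 is assumed as the body encoding.

```python
# Stand-ins for response.content: raw bytes of a UTF-8 encoded body (assumed).
ascii_body = "hello".encode("utf-8")
chinese_body = "中文".encode("utf-8")

# Stand-ins for response.text: the same bytes decoded into str.
ascii_text = ascii_body.decode("utf-8")
chinese_text = chinese_body.decode("utf-8")

# The types differ even when the printed values look alike.
print(type(ascii_body), type(ascii_text))

# For pure ASCII only the b prefix differs.
print(repr(ascii_body), repr(ascii_text))

# For non-ASCII text the bytes print as hex escapes until they are decoded.
print(repr(chinese_body))
print(chinese_text)
```

For ASCII data the two representations differ only by the b prefix, which is why printing them can give the impression that content and text are the same thing.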

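Example 1: Chinese text

The output is:

As soon as the response contains Chinese characters, text comes out garbled

That is not the result we want

In this case the only option is to decode content ourselves

The result is:

At this point the output is just the raw bytes shown as hexadecimal escapes. That is fine: decoding them with decode('utf-8') gives the readable text

Solution: decode the bytes with decode('utf-8')

Result:

Now the expected text is displayed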
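The garbling described in this example can be reproduced offline. The sketch below assumes the body is UTF-8 but gets decoded as ISO-8859-1, which is the fallback requests uses for text responses whose headers declare no charset. The helper decode_body is hypothetical and only approximates what response.text does.

```python
def decode_body(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: decode raw body bytes with the given encoding."""
    return raw.decode(encoding, errors="replace")


# Stand-in for response.content of a UTF-8 page (assumption).
raw = "中文".encode("utf-8")

# Wrong encoding: each byte becomes its own character, so the text is garbled.
garbled = decode_body(raw, "ISO-8859-1")

# Correct encoding: the original characters come back.
fixed = decode_body(raw, "utf-8")

print(repr(garbled))
print(fixed)
```

With a real response, the equivalent fixes are response.content.decode('utf-8'), or setting response.encoding = 'utf-8' before reading response.text.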

Example 2: saving images

import re
import requests

session = requests.Session()
# Match http URLs ending in .jpg; the dot is escaped so it only matches a literal '.'.
pattern = re.compile(r'http://.*?\.jpg', re.S)

# Read the saved page source; 'with' closes the file even if an error occurs.
with open('chouti.txt', 'r') as f:
    txt = f.read()

for index, url in enumerate(pattern.findall(txt), start=1):
    response = session.get(url)

    # content is bytes, so open the file with 'wb'; text is str and would use 'w'.
    with open(str(index) + '.jpg', 'wb') as f1:
        f1.write(response.content)
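The file-mode rule in the comment above (binary mode for content, text mode for text) can be checked without downloading anything. This is a minimal sketch under stated assumptions: image_bytes and page_text are invented stand-ins for response.content and response.text, and the files are written to a temporary directory.

```python
import os
import tempfile

# Invented stand-ins for response.content and response.text.
image_bytes = b"\xff\xd8\xff\xe0 fake jpeg data"
page_text = "中文 page"

with tempfile.TemporaryDirectory() as tmp:
    image_path = os.path.join(tmp, "1.jpg")
    text_path = os.path.join(tmp, "page.txt")

    # bytes go to a file opened in binary mode.
    with open(image_path, "wb") as f:
        f.write(image_bytes)

    # str goes to a file opened in text mode, with an explicit encoding.
    with open(text_path, "w", encoding="utf-8") as f:
        f.write(page_text)

    # Read both files back to confirm they round-trip unchanged.
    with open(image_path, "rb") as f:
        saved_image = f.read()
    with open(text_path, "r", encoding="utf-8") as f:
        saved_text = f.read()

print(saved_image == image_bytes, saved_text == page_text)
```

Writing bytes to a file opened with 'w', or str to a file opened with 'wb', raises a TypeError, which is why the mode has to match the attribute being saved.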