Python crawler: the response comes back as garbled text

Latest recommended article published on 2024-08-07 16:30:29

炽天使_晓

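This problem tormented me for almost a whole day; luckily I stubbornly kept searching for a fix.

“I finally found you; good thing I never gave up.”

On to the main topic. Thanks to the expert who shared the solution; here is the link: https://www.cnblogs.com/leomo/p/6869230.html

Below is the code. It requests the seal-script form of the Chinese character “一” and gets back the page source: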

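import requests
from urllib.parse import quote  # Python 3 replacement for urllib2.quote

# Fetch the page with a POST request
# Sort is the font ID to request; it must be defined before this runs (its value is not given here)

url = 'http://www.diyiziti.com/Builder'
data = {'Content': quote('一'),
        'FontInfoId': Sort}
# A bare 'charset=utf8' is not a valid media type, so the form type is spelled out
headers = {'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'}
response = requests.post(url=url, data=data, headers=headers)
print(response.content)  # raw bytes of the response body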

Process:

When I used the GET method without passing any parameters and printed the encoding of the returned page:

url = 'http://www.diyiziti.com/Builder'
response = requests.get(url)
print(response.encoding)

>>>utf-8

Result: utf-8
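For context, requests fills in response.encoding from the HTTP response headers rather than from the page body, so the charset parameter of Content-Type is what matters here. The sketch below is not requests' own code; it is a standard-library illustration of pulling a charset out of a Content-Type value. The function name charset_from_content_type and the two sample header values are made up for the example.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent.

    Hypothetical helper for illustration; requests uses its own logic.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None when
    # the header carries no charset parameter.
    return msg.get_content_charset()


# Sample header values (assumed, not captured from the real site)
print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("image/png"))
```

A header that declares a charset yields a usable encoding name; a header without one yields nothing to work with.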

But when I used the POST method and passed the parameters, then printed the encoding of the returned page:

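from urllib.parse import quote  # Python 3 replacement for urllib2.quote

url = 'http://www.diyiziti.com/Builder'
# wd is the character to look up and Sort is the font ID; both are defined elsewhere
data = {'Content': quote(wd), 'FontInfoId': Sort}
response = requests.post(url=url, data=data)
print(response.encoding)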

>>>None

Result: None

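I was completely baffled until I read the expert's solution and realized that, when I submitted data, the page source returned in the response did not specify an encoding.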
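When no encoding is declared, one option is to decode the raw bytes yourself. Here is a minimal sketch, assuming the page is actually UTF-8 (which the GET result above suggests but does not prove). decode_body is a hypothetical helper, not part of requests; with a real response you would pass response.content and response.encoding.

```python
from typing import Optional


def decode_body(raw: bytes, declared: Optional[str], fallback: str = "utf-8") -> str:
    """Decode response bytes, using `fallback` when no encoding was declared.

    Hypothetical helper; the UTF-8 fallback is an assumption about the page.
    """
    encoding = declared or fallback
    # errors="replace" substitutes undecodable bytes instead of raising
    return raw.decode(encoding, errors="replace")


# Stand-in for response.content: the UTF-8 bytes of the character being looked up
raw = "一".encode("utf-8")
print(decode_body(raw, None))     # no declared encoding, so the fallback is used
print(decode_body(raw, "utf-8"))  # declared encoding is used as-is
```

If the fallback guess is wrong the text will still look garbled, so it is worth checking the page's real encoding before hard-coding one.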

The article https://blog.csdn.net/sentimental_dog/article/details/52661974 points out: