Fixing garbled text when scraping with Python's requests library

andelk

Article link: https://blog.csdn.net/andelk/article/details/89378291

1. Fetching page content

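Analysis:
res = requests.get("http://www.baidu.com")
res.text returns a str (Unicode text): requests decodes the response body using res.encoding.
res.content returns the raw response body as bytes.
In other words, use res.text when you want text, and res.content when you want images or other files.
Garbled text appears when res.encoding does not match the page's real encoding. requests takes the encoding from the Content-Type response header, and for a text response that declares no charset it falls back to ISO-8859-1.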
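The cause of the garbling can be reproduced without any network request. The sketch below is only an illustration: the sample string and the variable names are made up, and ISO-8859-1 is chosen as the wrong codec purely as an assumption for the demo.

```python
# Illustrative sample; "raw" stands in for the bytes a server would send.
original = "中文网页"
raw = original.encode("utf-8")

# Decoding UTF-8 bytes with the wrong codec produces garbled text.
garbled = raw.decode("iso-8859-1")

# Decoding with the correct codec restores the text.
fixed = raw.decode("utf-8")

# ISO-8859-1 maps every byte to one character, so the damage is reversible:
# re-encode the garbled text and decode it again with the right codec.
recovered = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled == original)    # the wrong codec changed the text
print(fixed == original)
print(recovered == original)
```

With requests, raw corresponds to res.content and the decoded string corresponds to res.text.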
Method 1: use res.content to get bytes, then decode them to str
import requests
url = 'http://news.baidu.com'
res = requests.get(url)
html = res.content  # raw bytes
html_doc = str(html, 'utf-8')  # or: html_doc = html.decode("utf-8", "ignore")
print(html_doc)
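The commented-out alternative in Method 1 passes "ignore" as the error handler. As a small sketch of what that choice means, the example below decodes a hand-made byte string (an assumption for illustration, not data from a real page) containing one invalid byte.

```python
# Three valid UTF-8 bytes for "中" followed by one invalid byte (0xFF).
raw = b"\xe4\xb8\xad\xff"

# Strict decoding (the default, also used by str(raw, "utf-8")) raises.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "ignore" silently drops the bytes that cannot be decoded.
ignored = raw.decode("utf-8", "ignore")

# "replace" keeps a visible marker (U+FFFD) where decoding failed.
replaced = raw.decode("utf-8", "replace")

print(strict_failed)
print(len(ignored), len(replaced))
```

"ignore" hides the problem, so "replace" can be the safer choice while debugging because the damaged positions stay visible.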
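Method 2: use res.text, setting the encoding explicitly first
import requests
url = "http://news.baidu.com"
res = requests.get(url)
res.encoding = 'utf-8'  # must be set before res.text is read
print(res.text)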
Method 3: use res.text together with apparent_encoding
import requests
url = "http://news.baidu.com"
res = requests.get(url)
res.encoding = res.apparent_encoding  # encoding guessed from the response body
print(res.text)
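When the encoding is not known in advance, one alternative to detection is to try a short list of likely encodings in order. This is a minimal sketch of that idea, not part of requests: the helper name decode_with_fallback and the default candidate list are assumptions for illustration.

```python
from typing import Sequence


def decode_with_fallback(
    raw: bytes, candidates: Sequence[str] = ("utf-8", "gb18030")
) -> str:
    """Return raw decoded with the first candidate encoding that succeeds."""
    for name in candidates:
        try:
            return raw.decode(name)  # strict decoding: raises on bad bytes
        except UnicodeDecodeError:
            continue
    # Nothing matched: keep going with visible replacement characters.
    return raw.decode(candidates[0], errors="replace")


# Stand-ins for res.content from a UTF-8 page and from a GBK page.
utf8_page = "百度新闻".encode("utf-8")
gbk_page = "百度新闻".encode("gbk")

print(decode_with_fallback(utf8_page) == "百度新闻")
print(decode_with_fallback(gbk_page) == "百度新闻")
```

In a scraper this would be called as decode_with_fallback(res.content). Order matters: UTF-8 goes first because its strict decoder rejects most non-UTF-8 input, while gb18030 accepts many byte sequences that are not really Chinese text.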

2. Saving the content to a local file

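Method 1: res.content is bytes, so the file must be opened in binary mode with open(filename, "wb")
import requests
res = requests.get("http://music.baidu.com")  # the URL needs a scheme
html = res.content
with open('test.html', 'wb') as f:
    f.write(html)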
Method 2: res.content is bytes; decode it to str, then save it as text
import requests
res = requests.get("http://www.baidu.com")
html = res.content
html_doc = str(html, 'utf-8')  # or: html_doc = html.decode("utf-8", "ignore")
with open('test5.html', 'w', encoding="utf-8") as f:
    f.write(html_doc)
Method 3: res.text is already a str, so it can be saved directly
import requests
res = requests.get("http://www.baidu.com")
res.encoding = 'utf-8'
html = res.text
with open('test.html', 'w', encoding="utf-8") as f:
    f.write(html)
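The binary and text approaches above produce the same file when the text encoding matches the bytes. The sketch below checks this offline with a temporary directory; the sample string and file names are assumptions, and raw stands in for res.content.

```python
import tempfile
from pathlib import Path

text = "百度一下"             # stands in for res.text
raw = text.encode("utf-8")   # stands in for res.content

with tempfile.TemporaryDirectory() as tmp:
    bytes_path = Path(tmp) / "page_bytes.html"
    text_path = Path(tmp) / "page_text.html"

    # Bytes need binary mode; no encoding argument is allowed here.
    with open(bytes_path, "wb") as f:
        f.write(raw)

    # str needs text mode; pass encoding explicitly so the result does
    # not depend on the platform's default encoding.
    with open(text_path, "w", encoding="utf-8") as f:
        f.write(text)

    same_on_disk = bytes_path.read_bytes() == text_path.read_bytes()
    read_back = text_path.read_text(encoding="utf-8")

print(same_on_disk)
print(read_back == text)
```

Omitting encoding="utf-8" in text mode makes Python use the platform default, which is a common source of garbled or failing writes on systems whose default is not UTF-8.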
