[Python crawler] Detecting an HTML page's encoding and converting it to UTF-8

时光在身后挡住去路

Category: python. Tags: python, html, url, crawler encoding

Article link: https://blog.csdn.net/qq_34369618/article/details/53463206


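from urllib.request import urlopen
import chardet

# url, PROJECT_NAME and ALLNUM are defined elsewhere in the crawler
response = urlopen(url, timeout=3)
html_byte = response.read()
# detect() returns a dict whose 'encoding' key holds the guessed encoding
chardit1 = chardet.detect(html_byte)
# 'encoding' can be None when detection fails, so fall back to UTF-8
encoding = chardit1['encoding'] or 'utf-8'
# Decode with the detected encoding, then re-encode as UTF-8
html_string = html_byte.decode(encoding).encode('utf-8')
with open(PROJECT_NAME + '/' + str(ALLNUM) + '.html', 'wb') as file:
    file.write(html_string)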

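This uses the detect method from chardet: chardit1['encoding'] reports which encoding the page is in, so the raw bytes are first decoded with that encoding and then re-encoded as UTF-8.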
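
The decode-then-encode step can be wrapped in a small helper so that a failed or wrong guess does not crash the crawler. The following is only a sketch under some assumptions: the name `to_utf8`, the `detect` parameter and the `fallback` default are hypothetical and not part of the original post. By default it calls `chardet.detect`, but any callable that returns a dict with an `'encoding'` key can be passed in instead.

```python
def to_utf8(raw, detect=None, fallback="utf-8"):
    """Return `raw` re-encoded as UTF-8 bytes, guessing its source encoding.

    `to_utf8`, `detect` and `fallback` are illustrative names (assumptions).
    """
    if detect is None:
        # chardet is a third-party package; import it only when it is needed
        import chardet
        detect = chardet.detect

    guess = detect(raw)
    # 'encoding' may be None when the detector cannot decide
    encoding = guess.get("encoding") or fallback
    try:
        text = raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong guess or unknown codec name: decode with the fallback and
        # replace undecodable bytes instead of raising
        text = raw.decode(fallback, errors="replace")
    return text.encode("utf-8")
```

In the crawler above, `html_string = to_utf8(html_byte)` would then replace the manual decode and encode line. Using `errors="replace"` trades a few lost characters for a page that is always saved; whether that trade is acceptable depends on the crawl.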
