A universal key for solving encoding problems when scraping with Python (Unicode, UTF-8, GBK, ...)

西门大盗

Published 2018-07-11 23:06:23


Tags: encoding, decode, chardet, detect

Article link: https://blog.csdn.net/xiongzaiabc/article/details/81008330


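Whatever encoding a web page uses, the following approach handles it the same way in most cases. Keep in mind that chardet guesses the encoding statistically from the raw bytes, so its answer is a best guess rather than a guarantee: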

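import chardet
import requests

url = 'https://example.com'  # placeholder: the page you want to scrape
headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder request headers

response = requests.get(url, headers=headers).content  # raw response body as bytes
cod = chardet.detect(response)  # guess the encoding; returns a dict with 'encoding' and 'confidence' keys
coding = cod['encoding'] or 'utf-8'  # 'encoding' is None when nothing can be detected (e.g. an empty body), so fall back to UTF-8
html = response.decode(coding, 'ignore')  # decode the bytes into str with the detected encoding, dropping undecodable bytes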
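The snippet above trusts whatever chardet reports. Below is a slightly more defensive variation, a sketch under my own assumptions rather than the author's method, that works on bytes you already have. It tries strict UTF-8 first (bytes that decode cleanly as UTF-8 very likely are UTF-8), accepts chardet's guess only when its confidence clears a threshold, then falls back to GB18030 (which also decodes GBK/GB2312 pages), and finally decodes UTF-8 with 'replace' so it never raises. The function name decode_page, the 0.5 threshold, the use_chardet flag and the fallback order are all hypothetical choices for illustration.

```python
# Optional dependency: use chardet when it is installed, otherwise skip that step.
try:
    import chardet
except ImportError:
    chardet = None


def decode_page(raw: bytes, min_confidence: float = 0.5, use_chardet: bool = True) -> str:
    """Hypothetical helper: decode a scraped page body (bytes) into str."""
    # 1. Strict UTF-8: if every byte sequence is valid UTF-8, that is a strong signal.
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        pass

    # 2. chardet's statistical guess, accepted only when it is confident enough.
    if use_chardet and chardet is not None:
        guess = chardet.detect(raw)
        encoding = guess.get('encoding')  # None when nothing could be detected
        if encoding and (guess.get('confidence') or 0) >= min_confidence:
            try:
                return raw.decode(encoding)
            except (UnicodeDecodeError, LookupError):
                pass  # wrong guess, or a codec name Python does not know

    # 3. GB18030 also decodes GBK/GB2312 text, common on Chinese sites.
    try:
        return raw.decode('gb18030')
    except UnicodeDecodeError:
        pass

    # 4. Last resort: never fail, mark undecodable bytes with U+FFFD.
    return raw.decode('utf-8', errors='replace')
```

With the code above, html = decode_page(response) can replace the last three lines of the original snippet. Unlike 'ignore', the final 'replace' step leaves a visible U+FFFD wherever bytes were lost, which makes bad guesses easier to spot.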

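Note: in Python 3, decode() is a method of bytes (str.decode() only existed in Python 2). Its signature is:

bytes.decode(encoding='utf-8', errors='strict')

errors -- how decoding errors are handled. The default is 'strict', which raises a UnicodeError (specifically its subclass UnicodeDecodeError) at the first invalid byte sequence. Other possible values include 'ignore', 'replace', 'backslashreplace' (supported for decoding since Python 3.5), and any name registered with codecs.register_error(). 'xmlcharrefreplace' only works when encoding, i.e. with str.encode().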
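To make the error handlers concrete, here is a small standard-library sketch that decodes the same invalid input (0xFF can never appear in valid UTF-8) with each handler, registers a custom one via codecs.register_error(), and shows 'xmlcharrefreplace' on the encoding side. The names show_error_handlers, question_mark and strict_error_start are illustrative only.

```python
import codecs

raw = b'abc\xffdef'  # 0xFF is never valid in UTF-8


def show_error_handlers(data: bytes) -> dict:
    """Decode data as UTF-8 with each built-in decoding handler."""
    return {h: data.decode('utf-8', errors=h)
            for h in ('ignore', 'replace', 'backslashreplace')}


def question_mark(exc):
    """Custom handler: replace the bad byte span with '?' and resume after it."""
    if isinstance(exc, UnicodeDecodeError):
        return ('?', exc.end)
    raise exc  # this sketch only handles decoding errors


codecs.register_error('question_mark', question_mark)

# errors='strict' is the default: the first bad byte raises UnicodeDecodeError.
try:
    raw.decode('utf-8')
    strict_error_start = None
except UnicodeDecodeError as exc:
    strict_error_start = exc.start  # index of the first invalid byte

results = show_error_handlers(raw)
custom = raw.decode('utf-8', errors='question_mark')
# 'xmlcharrefreplace' is only valid for encoding (str -> bytes).
xml_ref = 'é'.encode('ascii', errors='xmlcharrefreplace')
```

'ignore' silently drops the byte ('abcdef'), 'replace' inserts U+FFFD, and 'backslashreplace' keeps a readable '\xff' escape, which is handy when you want to see exactly which bytes failed.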