After fetching a web page in Python, decoding the response with decode raised the following error:
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    html = html.decode("gb2312")
UnicodeDecodeError: 'gb2312' codec can't decode byte 0x8f in position 6018: illegal multibyte sequence
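My initial guess is that some bytes in the page are invalid, or are not in the encoding declared by the charset attribute of the <meta> tag. This can be worked around with the second parameter of decode, 'errors': setting it to 'ignore' makes the codec skip the bytes it cannot decode instead of raising an exception.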
Example:
html = html.decode("gb2312", errors="ignore")
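To make the effect of 'errors' easier to see, here is a minimal sketch. The sample bytes and the helper name decode_lenient are made up for illustration; they are not from the original page. The sample puts a stray 0x8f byte (the same value as in the traceback above) in the middle of otherwise valid GB2312 data.

```python
def decode_lenient(raw: bytes, encoding: str = "gb2312") -> str:
    """Try a strict decode first; fall back to errors='ignore' if it fails.

    decode_lenient is a hypothetical helper, not part of any library.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Report where the strict decode stopped before retrying leniently.
        print(f"strict decode failed at byte {exc.start}: {exc.reason}")
        return raw.decode(encoding, errors="ignore")


# Hypothetical sample: valid GB2312 text, one stray byte, then ASCII text.
sample = "编码".encode("gb2312") + b"\x8f" + b" test"

# 'ignore' silently drops the byte that cannot be decoded.
text = decode_lenient(sample)

# 'replace' keeps a visible U+FFFD marker where the bad byte was.
replaced = sample.decode("gb2312", errors="replace")

print(repr(text))
print(repr(replaced))
```

Keep in mind that 'ignore' discards data without telling you. 'replace' may be a better choice when you want to see where the damage is. If the page was actually saved as GBK or GB18030, both supersets of GB2312 (an assumption the original does not confirm), decoding with "gb18030" might keep characters that 'ignore' would drop.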
Note: do not mistype 'ignore' as 'ignone', otherwise Python raises an error!
Error message:
LookupError: unknown error handler name 'ignone'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "D:\Personal\Desktop\测试.py", line 8, in <module>
    html = rep.read().decode("gb2312",errors="ignone")
LookupError: decoding with 'gb2312' codec failed (LookupError: unknown error handler name 'ignone')
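One possible way to catch this kind of typo early is to validate the handler name with codecs.lookup_error from the standard library before decoding. The wrapper name checked_decode below is hypothetical; this is only a sketch, not something the original post does.

```python
import codecs


def checked_decode(raw: bytes, encoding: str, errors: str = "strict") -> str:
    """Validate the error-handler name, then decode.

    checked_decode is a hypothetical helper for illustration.
    """
    # codecs.lookup_error raises LookupError for an unregistered handler name.
    codecs.lookup_error(errors)
    return raw.decode(encoding, errors=errors)


# A misspelled handler name is rejected before any decoding is attempted,
# even though this input is plain ASCII and would decode cleanly.
try:
    checked_decode(b"abc", "gb2312", errors="ignone")
except LookupError as exc:
    typo_message = str(exc)
else:
    typo_message = None

# The correctly spelled handler passes the check.
ok_text = checked_decode(b"abc", "gb2312", errors="ignore")

print(typo_message)
print(ok_text)
```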