I recently ran into a problem: Chinese text scraped from a web page is displayed as hexadecimal. The code is as follows:
import urllib.request as rst

response = rst.urlopen('http://hq.sinajs.cn/list=s_sz000001')
stockStr = response.read()  # read() returns raw bytes, not a decoded str
print("dest text=", stockStr)
The output is:
dest text= b'var hq_str_s_sz000001="\xc6\xbd\xb0\xb2\xd2\xf8\xd0\xd0,8.88,0.00,0.00,603378,53540";\n'
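The `b'...'` prefix in that output means `stockStr` is a `bytes` object, not a `str`. When `print` receives bytes it shows their `repr`, which writes every non-ASCII byte as a `\xNN` escape, so the "hexadecimal" is only a display form of the raw bytes. A minimal offline sketch of the same effect, using an illustrative string `"中文"` and the GBK codec (both chosen here for demonstration, not taken from the original):

```python
# Illustrative only: a str and its encoded bytes print differently.
text = "中文"                # a str: print() shows the characters themselves
data = text.encode("gbk")    # bytes: print() shows the repr with \x escapes

print(text)
print(data)        # same text as repr(data): b'...' with \x escapes
print(repr(data))

# The escapes are just how bytes are displayed; decoding with the
# codec that produced them restores the original characters.
print(data.decode("gbk"))
```

The point is that the characters are not lost; they still need to be decoded with the right codec before printing.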
I tried decoding it as UTF-8 instead, but that raised an error:
print("dest text=", stockStr.decode('utf-8'))
The decode fails with:
UnicodeDecodeErr
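UTF-8 rejects these bytes because they are not valid UTF-8: `\xc6\xbd` happens to form a legal two-byte sequence, but the following `\xb0` is a continuation byte with no lead byte, so the decoder stops there. The byte pattern is consistent with a GB-family encoding, and the sketch below assumes the server sends GBK (an assumption, not something confirmed above). Under GBK the eight bytes should read 平安银行, the name of stock sz000001. The sketch runs offline on the bytes copied from the output above; the field-splitting step is illustrative and assumes the `"name,price,..."` layout visible in that output.

```python
# Raw bytes copied from the output shown above (no network call).
raw = b'var hq_str_s_sz000001="\xc6\xbd\xb0\xb2\xd2\xf8\xd0\xd0,8.88,0.00,0.00,603378,53540";\n'

# UTF-8 rejects these bytes, reproducing the error from the question.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)

# Assumption: the response is GBK-encoded. Decoding with GBK yields a str.
text = raw.decode("gbk")
print("dest text=", text)

# Illustrative parsing: take the quoted payload and split it on commas.
payload = text.split('"')[1]
fields = payload.split(",")
print(fields[0])  # stock name, now printed as Chinese characters
```

With a live request the same idea applies: call `.decode("gbk")` on the result of `response.read()` instead of on the copied literal.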