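Problem:
When scraping Chinese-language pages from a large number of websites with Python's requests library, the sites do not share one encoding. If a text response's Content-Type header carries no charset, requests falls back to ISO-8859-1 when building response.text, so the Chinese text comes out garbled.
Solution:
Brute-force re-decoding: turn the garbled string back into bytes with ISO-8859-1, then try each likely encoding.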
# field is the garbled string, e.g. response.text
text = field
# Try utf-8, then gb2312, then gbk; keep the first one that decodes cleanly
for encoding in ("utf-8", "gb2312", "gbk"):
    try:
        text = field.encode("iso-8859-1").decode(encoding)
        break  # stop here so a later encoding cannot overwrite a good result
    except UnicodeError:
        print("No need to convert to " + encoding)
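
The brute-force idea above can be wrapped in a small helper. This is only a minimal sketch: the name `redecode` and its `encodings` parameter are made up for illustration, and it assumes the garbled string came from bytes that were wrongly decoded as ISO-8859-1. It uses gb18030 as the fallback because that codec is a superset of both gb2312 and gbk.

```python
def redecode(field, encodings=("utf-8", "gb18030")):
    """Undo an ISO-8859-1 mis-decoding by trying candidate encodings in order.

    `redecode` and `encodings` are hypothetical names for this sketch.
    """
    try:
        # Recover the original bytes the server sent.
        raw = field.encode("iso-8859-1")
    except UnicodeEncodeError:
        # The string holds characters outside ISO-8859-1,
        # so it was not mis-decoded in the first place.
        return field
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
    # Nothing matched; hand back the input unchanged.
    return field


# Simulate text that was wrongly decoded as ISO-8859-1.
garbled_utf8 = "中文".encode("utf-8").decode("iso-8859-1")
garbled_gbk = "中文".encode("gbk").decode("iso-8859-1")

fixed_utf8 = redecode(garbled_utf8)
fixed_gbk = redecode(garbled_gbk)
```

utf-8 goes first because it is strict: most gbk byte sequences are not valid utf-8 and fail quickly, while the reverse order would often "succeed" with the wrong characters. With requests, it may be simpler to apply the same loop directly to response.content (the raw bytes) and skip the ISO-8859-1 round trip.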
References: