Python: decoding a downloaded web page fails
Exception has occurred: UnicodeDecodeError
'gb2312' codec can't decode byte 0x9f in position 7976: illegal multibyte sequence
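Fix: when the detected encoding is gb2312, decode as gb18030 instead. GB18030 is a superset of GB2312 (and of GBK), so it also decodes characters that a page labelled gb2312 uses from outside the GB2312 set.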
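# coding=utf-8
import os

import chardet
import requests

if __name__ == "__main__":
    url = "http://www.ysxs8.com/yousheng/12032_228.html"
    result = requests.get(url)
    base = os.path.basename(url)
    if result.status_code == 200:
        # chardet guesses the encoding from the raw bytes; the guess can be None.
        en = chardet.detect(result.content)
        # Falling back to utf-8 when nothing is detected is an assumption.
        encode = en["encoding"] or "utf-8"
        # Treat gb2312 as gb18030, which also covers the bytes gb2312 rejects.
        if encode.lower() == "gb2312":
            encode = "gb18030"
        content = result.content.decode(encode)
        # Write as UTF-8 explicitly so the platform default encoding is not used.
        with open(base, "w", encoding="utf-8") as f:
            f.write(content)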
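
The gb2312-to-gb18030 substitution can be checked offline, without downloading anything. The following is a minimal sketch, not part of the original script: the helper name `decode_with_fallback` and the sample character are illustrative assumptions. It encodes a character that exists in GBK/GB18030 but not in GB2312, shows that the strict gb2312 codec rejects the bytes, and then decodes them through the substitution.

```python
def decode_with_fallback(raw, detected):
    """Decode raw bytes, treating a 'gb2312' guess as 'gb18030'.

    Hypothetical helper for illustration; 'detected' stands in for
    the value chardet.detect(...)["encoding"] would return.
    """
    # Assumption: fall back to utf-8 when no encoding was detected.
    encoding = detected or "utf-8"
    if encoding.lower() == "gb2312":
        encoding = "gb18030"
    return raw.decode(encoding)


# U+9555 is in GBK/GB18030 but outside the GB2312 character set.
sample = "\u9555".encode("gbk")

# The strict gb2312 codec raises UnicodeDecodeError on these bytes.
try:
    sample.decode("gb2312")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The same bytes decode once gb2312 is swapped for gb18030.
text = decode_with_fallback(sample, "GB2312")
print(strict_ok, text == "\u9555")
```

Keeping the substitution in one small function also makes it easy to reuse if more pages from the same site need to be fetched.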