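Today I was writing a beginner Python crawler, and every time execution reached html=response.read().decode('utf8')
it raised this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte
I tried a lot of tutorials from the web and none of them helped. In the end I looked at the output of print(response.info())
which was: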
C:\Users\21565\AppData\Local\Programs\Python\Python37-32\python.exe "D:/pycharm/PyCharm Community Edition 2019.1.3/jre64/script.py"
http://www.baidu.com/
Bdpagetype: 1
Bdqid: 0xee8547da0008b738
Cache-Control: private
Content-Type: text/html
Cxy_all: baidu+7b9f4839a960139644644560adfd5884
Date: Wed, 24 Jul 2019 06:49:03 GMT
Expires: Wed, 24 Jul 2019 06:48:47 GMT
P3p: CP=" OTI DSP COR IVA OUR IND COM "
Server: BWS/1.1
Set-Cookie: BAIDUID=212E504168171A3E7DE4002F3F7906B0:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BIDUPSID=212E504168171A3E7DE4002F3F7906B0; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: PSTM=1563950943; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: delPer=0; path=/; domain=.baidu.com
Set-Cookie: BDSVRTM=0; path=/
Set-Cookie: BD_HOME=0; path=/
Set-Cookie: H_PS_PSSID=1469_21089_29518_28518_29098_29568_28838_29221_22158; path=/; domain=.baidu.com
Vary: Accept-Encoding
X-Ua-Compatible: IE=Edge,chrome=1
Connection: close
Transfer-Encoding: chunked
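When I saw chrome=1
in that output, I opened Chrome. I had been using SSR before, and Chrome was showing everything in Traditional Chinese (I have no idea why), so I tried uninstalling Chrome and running the script again...
Damn Chrome. It worked: no more error, and the text decoded correctly.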
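
For anyone hitting the same error: the header dump above shows Content-Type: text/html with no charset, and no Content-Encoding line, so the script has to guess the encoding. Below is a minimal sketch of a more defensive way to decode the body, one that does not depend on what is installed locally. The names pick_charset and decode_body are hypothetical helpers made up for this example, and the UTF-8 fallback with replacement characters is an assumption, not something the original script did.

```python
import gzip
from email.message import Message
from typing import Optional


def pick_charset(content_type: Optional[str], default: str = "utf-8") -> str:
    """Return the charset named in a Content-Type value, or `default` if there is none."""
    msg = Message()
    msg["Content-Type"] = content_type or "text/html"
    # get_content_charset() returns the lowercased charset parameter, or None.
    return msg.get_content_charset() or default


def decode_body(raw: bytes,
                content_type: Optional[str] = None,
                content_encoding: Optional[str] = None) -> str:
    """Decode an HTTP response body to str without raising UnicodeDecodeError."""
    # A gzip-compressed body is not valid UTF-8 until it is decompressed.
    if (content_encoding or "").lower() == "gzip":
        raw = gzip.decompress(raw)
    charset = pick_charset(content_type)
    try:
        return raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # Assumed fallback: keep going and mark undecodable bytes with U+FFFD.
        return raw.decode("utf-8", errors="replace")


# Hypothetical usage with urllib (needs network access, so left commented out):
# from urllib import request
# response = request.urlopen("http://www.baidu.com/")
# html = decode_body(response.read(),
#                    response.headers.get("Content-Type"),
#                    response.headers.get("Content-Encoding"))
```

With this approach a page that declares charset=gbk is decoded as GBK, a gzip body is decompressed first, and anything still undecodable shows up as replacement characters instead of crashing the script.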