UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

You may fetch a page with urllib and print it with print(res.read().decode('utf-8')). This raises:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

Example:

from urllib import request
url = 'https://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=index&fr=&hs=0&xthttps=111110&sf=1&fmq=&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=%E7%8B%97&oq=%E7%8B%97&rsp=-1'

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',  # tells the server it may send a compressed body
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Cookie': 'BDqhfp=%E7%8B%97%26%260-10-1undefined%26%260%26%261; BIDUPSID=4B61D634D704A324E3C7E274BF11F280; PSTM=1624157516; BAIDUID=4B61D634D704A324C7EA5BA47BA5886E:FG=1; __yjs_duid=1_f7116f04cddf75093b9236654a2d70931624173362209; indexPageSugList=%5B%22%E7%8B%97%22%2C%22%E7%8C%AB%E5%92%AA%22%2C%22%E5%B0%8F%E9%80%8F%E6%98%8E%22%5D; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BAIDUID_BFESS=5DD3805F1A4CC3C9562CEAC3C22A1408:FG=1; __yjs_st=2_YTMzN2ZlYWQwNjg5NTFlNGY4NTMxMDBhOTc0ZDQxZjYwZWI0NzBiNjU1N2UyOGRiY2MzNWQ4OTM2YjU4MGU4MmNjYTNiZTk4ZDFkMWE1YmU2ODZhNGMwYzQ3OGE1YjcxZjNmZTEzYWY2ZjNiNGYxNjc0NWNlYjY5YmRhMTI3MmI2N2ZjOTkyYWUwYTZlZDUyMzY3NTc3YmU0MWUwNGM3MDk5NWE1ZTRhNzE4NjQwYWJlMjE3OTg5YzdkYjc0NmE4MjBhMjA2MDBkZmIwNDhjMjYzZjYxMTcyOGM2OTZmYjRlOGUwNTc1N2ZhYWI5YzEwZTVkODg0ZjI4OWM2ZjcyZF83XzM0OWQ2ZTJh; H_PS_PSSID=34268_34099_33969_34222_31660_34226_33848_34113_34073_34107_26350_22159; delPer=0; PSINO=6; BA_HECTOR=al21a125ag2l25851j1genv370q; BDRCVFR[X_XKQks0S63]=mk3SLVN4HKm; firstShowTip=1; cleanHistoryStatus=0; BDRCVFR[dG2JNJb_ajR]=mk3SLVN4HKm; BDRCVFR[-pGxjrCMryR]=mk3SLVN4HKm; userFrom=null; ab_sr=1.0.1_NzczYjg1NGJiOWUwOGQwM2E4YTE0MDJkM2E0YjQ4M2E1ZDk0YWQ1MGUyMmNjZTg4NzhjZDNkZDI0YjcwMjU5N2MxYmQxNWIwZmRjMWEwZjVkNmZkYzkwYTNiYTE3NDUwYWFkZDkyZWM3Njg3ZjQ0OGQ5ZWU3YTkxNDk1M2FiZTAxZTY5NmY3ZjA1NDgxODE3ZWE4MWQxOWUwMmIwYmUxZA==',
    'Host': 'image.baidu.com',
    'Referer': 'https://image.baidu.com/',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'sec-ch-ua-mobile': '?0',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36'
}

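req = request.Request(url, headers=headers)
res = request.urlopen(req)
# Because of the Accept-Encoding header, the server returns a gzip-compressed body.
# Gzip data starts with the bytes 1f 8b, and 0x8b at position 1 is not valid UTF-8.
print(res.read().decode('utf-8'))

# Result: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte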
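You can reproduce the failure offline. In this sketch, gzip.compress stands in for the compressing server. The HTML string is made up for illustration.

```python
import gzip

# Made-up page body, compressed the way a server compresses it
# when the request advertises "Accept-Encoding: gzip".
body = gzip.compress("<html>狗</html>".encode("utf-8"))

# Every gzip stream begins with the magic bytes 0x1f 0x8b.
print(body[:2])  # b'\x1f\x8b'

error = None
try:
    body.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x1f is a valid one-byte UTF-8 character, but 0x8b is a
    # continuation byte, so decoding stops at position 1.
    error = exc
    print(exc)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```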

There are two ways to fix this:

Method 1: decompress the gzip data
Import the modules:

import gzip
from io import BytesIO

Decompress the response body before decoding it:

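req = request.Request(url, headers=headers)
res = request.urlopen(req)
# Add the imports above to the example, plus the following lines:
buff = BytesIO(res.read())        # wrap the compressed bytes in a file-like object
f = gzip.GzipFile(fileobj=buff)   # read them through the gzip decompressor
data = f.read().decode('utf-8')   # the decompressed bytes are UTF-8 HTML
print(data)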
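Method 1 assumes the body is always gzip. The sketch below is a more defensive variant, and the helper name decode_body is hypothetical. It checks the response's Content-Encoding header first, then calls gzip.decompress, which does the same job as the BytesIO + GzipFile pair in one call. It also handles deflate via zlib. The example request also advertises br, so a server could in principle answer with Brotli. The standard library cannot decode Brotli, so the sketch raises an error in that case rather than guessing.

```python
import gzip
import zlib
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str], charset: str = "utf-8") -> str:
    """Hypothetical helper: undo the HTTP Content-Encoding, then decode the text."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        # Same result as gzip.GzipFile(fileobj=BytesIO(raw)).read()
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        try:
            # HTTP "deflate" is specified as zlib-wrapped data ...
            raw = zlib.decompress(raw)
        except zlib.error:
            # ... but some servers send raw deflate without the zlib header.
            raw = zlib.decompress(raw, -zlib.MAX_WBITS)
    elif encoding != "identity":
        # e.g. "br" (Brotli): no decoder in the standard library
        raise ValueError(f"unsupported Content-Encoding: {encoding!r}")
    return raw.decode(charset)
```

With the response from the example, you would call it as decode_body(res.read(), res.headers.get('Content-Encoding')).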

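Method 2: do not request compression

Remove "Accept-Encoding": "gzip, deflate, br" from the request headers. The server then normally sends the body uncompressed, and res.read().decode('utf-8') works unchanged.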
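Here is a minimal sketch of method 2. The header set is shortened and hypothetical, and so are the helper names build_plain_request and fetch_text. If the request has no Accept-Encoding header, http.client (which urllib uses underneath) adds "Accept-Encoding: identity" itself. That header asks the server for an uncompressed body. The last lines only build the request and inspect its headers, so they run without network access.

```python
from urllib import request

# Hypothetical, shortened header set; the original example sends many more.
headers = {
    "Accept-Encoding": "gzip, deflate, br",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}


def build_plain_request(url: str, headers: dict) -> request.Request:
    """Build a Request that does not advertise compression support."""
    # Drop Accept-Encoding regardless of how its name is capitalised.
    plain = {k: v for k, v in headers.items() if k.lower() != "accept-encoding"}
    return request.Request(url, headers=plain)


def fetch_text(url: str, headers: dict, charset: str = "utf-8") -> str:
    """Fetch a page without compression and decode it (needs network access)."""
    req = build_plain_request(url, headers)
    with request.urlopen(req) as res:
        return res.read().decode(charset)


req = build_plain_request("https://image.baidu.com/", headers)
# urllib stores header names via str.capitalize(), e.g. "Accept-encoding".
print(req.has_header("Accept-encoding"))  # False
```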
