UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 6608: invalid start byte

1. Request the page
import requests

url = 'https://tech.china.com/article/20190829/kejiyuan0129355913.html'

response = requests.get(url).text

print(response)


2. The output is garbled, so check the page source

The page source declares utf-8 as its encoding.
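One common cause of garbled output from `.text` is that the bytes are decoded with a codec other than the one the page was written in. The sketch below illustrates that effect offline. The sample string is hypothetical, and ISO-8859-1 as the wrong codec is an assumption for illustration, not something confirmed for this page.

```python
# Illustrative only: the sample text is hypothetical, not taken from the page.
sample = "科技"
raw = sample.encode("utf-8")  # the bytes as they would travel over the wire

# Decoding UTF-8 bytes with the wrong codec raises no error here, because
# every byte value is a valid ISO-8859-1 character. The result is mojibake.
garbled = raw.decode("iso-8859-1")

# Decoding with the codec the page declares recovers the original text.
fixed = raw.decode("utf-8")

print(repr(garbled))
print(fixed)
```

This is why switching from `.text` to `.content.decode('utf-8')` is a reasonable next step once the page source says utf-8.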

3. Modify the code
import requests

url = 'https://tech.china.com/article/20190829/kejiyuan0129355913.html'

response = requests.get(url).content.decode('utf-8')

print(response)


4. An error is raised

The error is clearly raised during the decode call.
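The same failure can be reproduced without a network request. This is a minimal sketch using a hypothetical payload (a few ASCII bytes with a stray 0xC0 in the middle); it also shows that the exception object reports which byte failed and where.

```python
# Hypothetical payload: valid ASCII with one stray 0xC0 byte in the middle.
data = b"hello \xc0 world"

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    # start/end delimit the undecodable slice; reason is the codec's message.
    bad_byte = data[exc.start:exc.end]
    position = exc.start
    reason = exc.reason
    print(f"byte {bad_byte!r} at position {position}: {reason}")
```

On the real response, the same attributes could be used to inspect the bytes around the failing position in `requests.get(url).content`.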

5. Look at the decode source
def decode(self, *args, **kwargs): # real signature unknown
   """
   Decode the bytes using the codec registered for encoding.
   
     encoding
       The encoding with which to decode the bytes.
     errors
       The error handling scheme to use for the handling of decoding errors.
       The default is 'strict' meaning that decoding errors raise a
       UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
       as well as any other name registered with codecs.register_error that
       can handle UnicodeDecodeErrors.
   """
   pass
6. Solution

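The docstring spells it out: the errors argument controls how decoding failures are handled. It names three built-in modes:
(1) strict: raise UnicodeDecodeError
(2) ignore: silently drop the undecodable bytes
(3) replace: substitute the replacement character U+FFFD for the undecodable bytes

The default is strict. If decoding raises UnicodeDecodeError, switch to one of the other two modes. Both discard the offending bytes, so the decoded text may be missing a few characters.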
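To make the difference between the three modes concrete, here is a small sketch on a hypothetical byte string containing one invalid byte. The helper name `try_strict` is made up for this example.

```python
# Hypothetical input: ASCII text with one invalid byte (0xC0) in the middle.
data = b"abc\xc0def"

def try_strict(raw):
    """Return the decoded text, or None if strict decoding fails."""
    try:
        return raw.decode("utf-8", "strict")
    except UnicodeDecodeError:
        return None

strict_result = try_strict(data)            # strict: decoding fails
ignored = data.decode("utf-8", "ignore")    # ignore: the bad byte is dropped
replaced = data.decode("utf-8", "replace")  # replace: bad byte becomes U+FFFD

print(strict_result)
print(repr(ignored))
print(repr(replaced))
```

With replace, the position of the damage stays visible in the output, which can make it easier to spot what was lost.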

7. Fixed code
import requests

url = 'https://tech.china.com/article/20190829/kejiyuan0129355913.html'

response = requests.get(url).content.decode('utf-8', 'ignore')

print(response)

Or

import requests

url = 'https://tech.china.com/article/20190829/kejiyuan0129355913.html'

response = requests.get(url).content.decode('utf-8', 'replace')

print(response)
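Since both modes discard bytes, it can be worth checking how much was lost. The following is a rough sketch, not part of the original fix: `decode_with_report` is a hypothetical helper that decodes with replace and counts the replacement characters it finds. The count is only an estimate, because a page can legitimately contain U+FFFD.

```python
def decode_with_report(raw, encoding="utf-8"):
    """Decode raw bytes leniently and count the U+FFFD replacement characters."""
    text = raw.decode(encoding, errors="replace")
    return text, text.count("\ufffd")

# Hypothetical payload with two invalid bytes.
text, lost = decode_with_report(b"ok \xc0 and \xc1")
print(f"{lost} replacement character(s) in {text!r}")
```

For the page above, the call would be `decode_with_report(requests.get(url).content)`. A small count suggests a few stray bytes, while a large one may mean the declared encoding is wrong for the whole page.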