Python 3 Unicode error: handling Unicode errors with readlines() in Python 3

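This article explains how to handle a UnicodeDecodeError in Python: specify the 'utf-8' encoding when opening the file and set the errors parameter to 'ignore' or 'replace', so that bytes that cannot be decoded are skipped or substituted. In Python 2, you first need to decode the bytes into a string, handling undecodable bytes in a similar way.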

I keep getting this error while reading a text file. Is it possible to handle/ignore it and proceed?

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 7827: character maps to <undefined>

Solution

In Python 3, pass an appropriate errors= value (such as errors='ignore' or errors='replace') when creating your file object (presuming it is an io.TextIOWrapper or a subclass of it; if it isn't, consider wrapping it in one). Also consider passing a more likely encoding than the default that produced the 'charmap' error (typically a Windows code page such as cp1252); when you aren't sure, utf-8 is a good place to start.

For instance:

f = open('misc-notes.txt', encoding='utf-8', errors='ignore')
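To make the effect of errors= concrete, here is a minimal sketch that reads a file with readlines() while tolerating undecodable bytes. The helper name read_lines_lenient and the temporary demo file are assumptions made for illustration; they are not part of the original question.

```python
import os
import tempfile


def read_lines_lenient(path, encoding="utf-8", errors="replace"):
    """Read all lines from path, applying the given error handler to bad bytes."""
    with open(path, encoding=encoding, errors=errors) as f:
        return f.readlines()


# Build a small demo file containing the byte 0x81, which is not valid UTF-8 here.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first line\nbad byte: \x81\n")
    demo_path = tmp.name

# errors='replace' substitutes U+FFFD; errors='ignore' drops the byte silently.
lines = read_lines_lenient(demo_path)
lines_ignored = read_lines_lenient(demo_path, errors="ignore")
os.remove(demo_path)

print(lines)
print(lines_ignored)
```

With 'replace' the damage stays visible in the output as U+FFFD, which is usually easier to debug than 'ignore', where data disappears without a trace.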

In Python 2, the read() operation simply returns bytes; the trick, then, is decoding them to get them into a string (if you do, in fact, want characters as opposed to bytes). If you don't have a better guess for their real encoding:

your_string.decode('utf-8', 'replace')

...to replace unhandled characters, or

your_string.decode('utf-8', 'ignore')

to simply ignore them.
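The same two-argument decode call is available on bytes objects in Python 3, for example when a file was opened in binary mode. A short sketch, using a made-up byte string as sample data:

```python
# Sample bytes (an assumption for illustration): valid UTF-8 for "café",
# followed by a stray 0x81 byte that cannot be decoded as UTF-8.
raw = b"caf\xc3\xa9 \x81 end"

# 'replace' inserts U+FFFD where the bad byte was.
replaced = raw.decode("utf-8", "replace")

# 'ignore' drops the bad byte entirely.
ignored = raw.decode("utf-8", "ignore")

print(repr(replaced))
print(repr(ignored))
```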

That said, finding and using their real encoding (rather than guessing utf-8) would be preferred.
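When the real encoding is unknown, one rough heuristic is to try a short list of candidate encodings in strict mode and keep the first that decodes cleanly. The sketch below assumes the candidates utf-8, cp1252, and latin-1; both the list and the helper name decode_with_fallbacks are illustrative choices, and a clean decode does not prove the guess is right.

```python
def decode_with_fallbacks(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding_used) for the first candidate that decodes strictly."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Reached only if every candidate failed (not possible when latin-1 is
    # in the list, since latin-1 maps every byte to a character).
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"


text_a, used_a = decode_with_fallbacks(b"caf\xe9")
text_b, used_b = decode_with_fallbacks(b"\x81")
print(used_a, repr(text_a))
print(used_b, repr(text_b))
```

Because latin-1 accepts any byte sequence, it works as a last resort that never raises, but the resulting text may be wrong for bytes that were really written in another encoding.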
