python3 unicode error: handling Unicode errors with readlines() in Python 3

Latest recommended article published on 2023-07-27 14:01:18

人事星球

Article tags: python3 unicode error

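This article describes how to handle a UnicodeDecodeError raised while reading a file in Python: specify the 'utf-8' encoding when opening the file and set the errors parameter to 'ignore' or 'replace', so that bytes which cannot be decoded are skipped or substituted. In Python 2, the bytes must first be decoded into a string, and undecodable bytes can be handled in a similar way.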

I keep getting this error while reading a text file. Is it possible to handle/ignore it and proceed?

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 7827: character maps to <undefined>

Solution

In Python 3, pass an appropriate errors= value (such as errors='ignore' or errors='replace') when creating your file object (presuming it is an instance of io.TextIOWrapper; if it isn't, consider wrapping it in one). Also consider passing a more likely encoding than the one behind the charmap codec, which is typically a legacy platform default such as cp1252 on Windows; when you aren't sure, utf-8 is a good place to start.

For instance:

f = open('misc-notes.txt', encoding='utf-8', errors='ignore')
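To see what the two errors= values do with readlines(), here is a minimal sketch. The sample file contents are made up for illustration, and the file is written to a temporary directory so the snippet runs on its own; it is not taken from the original question.

```python
import os
import tempfile

# Build a small sample file containing a byte (0x81) that is not valid UTF-8.
# The contents are hypothetical, chosen only to trigger the decoding problem.
sample_path = os.path.join(tempfile.mkdtemp(), "misc-notes.txt")
with open(sample_path, "wb") as raw:
    raw.write(b"first line\nbad \x81 byte\nlast line\n")

# errors='replace' substitutes U+FFFD for each undecodable byte,
# so readlines() completes instead of raising UnicodeDecodeError.
with open(sample_path, encoding="utf-8", errors="replace") as f:
    replaced_lines = f.readlines()

# errors='ignore' silently drops the undecodable byte.
with open(sample_path, encoding="utf-8", errors="ignore") as f:
    ignored_lines = f.readlines()

print(repr(replaced_lines[1]))
print(repr(ignored_lines[1]))
```

Using 'replace' leaves a visible marker where data was lost, which is usually easier to debug later than the silent removal done by 'ignore'.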

In Python 2, the read() operation simply returns bytes; the trick, then, is decoding them to get them into a string (if you do, in fact, want characters as opposed to bytes). If you don't have a better guess for their real encoding:

your_string.decode('utf-8', 'replace')

...to replace unhandled characters, or

your_string.decode('utf-8', 'ignore')

to simply ignore them.
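The same two error handlers are also accepted by bytes.decode() in Python 3, for example on data read from a file opened in binary mode. A short sketch, using a made-up byte string as the assumption:

```python
# Bytes as they might come back from a binary read; 0x81 is not valid UTF-8.
# The content is hypothetical and only serves to show the three behaviours.
raw_bytes = b"caf\xc3\xa9 \x81 notes"

replaced = raw_bytes.decode("utf-8", "replace")  # bad byte becomes U+FFFD
ignored = raw_bytes.decode("utf-8", "ignore")    # bad byte is dropped

# The default handler is 'strict', which raises on the first bad byte.
try:
    raw_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(replaced), repr(ignored), strict_failed)
```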

That said, finding and using their real encoding (rather than guessing utf-8) would be preferred.
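When the real encoding is unknown, one rough way to narrow it down is to try a short list of candidates in strict mode and keep the first that decodes without error. The helper name decode_with_fallback and the candidate list below are assumptions made for this sketch, not part of any library. A clean decode does not prove the guess is right, so the result should still be checked by eye.

```python
def decode_with_fallback(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding_name) for the first candidate that decodes cleanly.

    Hypothetical helper: the candidate list is only a guess and should be
    adapted to where the file actually came from.
    """
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    # Only reached if every candidate failed; latin-1 accepts any byte,
    # so with the default list this acts as a safety net for custom lists.
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"


text, used = decode_with_fallback(b"caf\xe9")
print(text, used)
```

Putting latin-1 last matters: it maps every byte to some character, so it never fails and would mask any better candidate listed after it.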