Code example:
with open(TargetFile, 'r', encoding='utf-8') as inputdata:
    contents = inputdata.readlines()
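When the file contains bytes that are not valid UTF-8 (for example, because it was saved in a different encoding), reading it fails with UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8f. A leading UTF-8 BOM on its own does not trigger this error; it is decoded as the character U+FEFF at the start of the first line.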
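Solution:
Pass the errors parameter to open(), set to 'ignore' or 'replace'. 'ignore' silently drops the bytes that cannot be decoded, while 'replace' substitutes each of them with the replacement character U+FFFD (�):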
with open(TargetFile, 'r', encoding='utf-8', errors='ignore') as inputdata:
    contents = inputdata.readlines()
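If the only problem is a BOM at the start of an otherwise valid UTF-8 file, nothing needs to be discarded: Python's 'utf-8-sig' codec strips a leading BOM when one is present. The sketch below illustrates the difference; the file name with_bom.txt and its contents are made up for the demonstration.

```python
import codecs
import os
import tempfile

# Hypothetical sample data: a UTF-8 file that starts with a byte order mark.
data = codecs.BOM_UTF8 + "first line\nsecond line\n".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "with_bom.txt")
    with open(path, "wb") as f:
        f.write(data)

    # Plain utf-8 decodes the BOM as the character U+FEFF and keeps it.
    with open(path, "r", encoding="utf-8") as f:
        plain_lines = f.readlines()

    # utf-8-sig drops a leading BOM if the file has one.
    with open(path, "r", encoding="utf-8-sig") as f:
        sig_lines = f.readlines()

print(repr(plain_lines[0]))
print(repr(sig_lines[0]))
```

Unlike errors='ignore', this loses no data from the file body, so it is worth trying first when a BOM is the suspected cause.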
The following session, borrowed from a Stack Overflow answer, shows the effect:
>>> s = b'\xe5abc\nline2\nline3'
>>> with open('evil_unicode.txt','wb') as f:
...     f.write(s)
...
16
>>> with open('evil_unicode.txt', 'r') as f:
...     lines = f.readlines()
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: invalid continuation byte
>>> with open('evil_unicode.txt', 'r', errors='replace') as f:
...     lines = f.readlines()
...
>>> lines
['�abc\n', 'line2\n', 'line3']
>>> with open('evil_unicode.txt', 'r', errors='ignore') as f:
...     lines = f.readlines()
...
>>> lines
['abc\n', 'line2\n', 'line3']
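The session above omits encoding=, so it relies on the platform default being UTF-8. As a minimal sketch that does not depend on that default, the same bytes can be decoded in memory with bytes.decode, which accepts the same error handler names. The 'backslashreplace' handler is an addition not covered above (available for decoding since Python 3.5): it keeps the offending byte visible as an escape sequence instead of dropping or masking it.

```python
raw = b'\xe5abc\nline2\nline3'  # same bytes as in the session above

# Decode the same data with each error handler and keep the results.
results = {}
for handler in ("replace", "ignore", "backslashreplace"):
    results[handler] = raw.decode("utf-8", errors=handler)
    print(handler, results[handler].splitlines(keepends=True))

# The default handler is "strict", which raises instead of recovering.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict:", exc.reason)
```

'replace' or 'backslashreplace' is usually the safer choice while investigating a file, because 'ignore' leaves no trace that data was lost.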