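Text files are not all stored in the same encoding. Reading one with the wrong encoding raises an error such as: UnicodeDecodeError: 'gbk' codec can't decode byte.
So detect the file's encoding first, then read the file using that encoding: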
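import os
import chardet

myfile = r'c:\test.cpp'
encoding = 'utf-8-sig'  # fallback if chardet cannot name an encoding
# Only the first bytes are sampled; a larger sample usually gives a more reliable guess.
sample_size = min(32, os.path.getsize(myfile))
with open(myfile, 'rb') as f:
    raw = f.read(sample_size)
result = chardet.detect(raw)
# result['encoding'] can be None (e.g. for an empty sample), so keep the fallback then.
encoding = result['encoding'] or encoding
if encoding == 'ascii':
    encoding = None  # let open() use the platform default encoding
with open(myfile, 'r', encoding=encoding) as file:
    lines = file.readlines()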
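
Detection from a short sample is only a guess, so the read can still fail with UnicodeDecodeError. As a complement to the chardet approach above (not a replacement for it), here is a minimal standard-library sketch that tries a list of candidate encodings in order. The function name read_lines_with_fallback and the candidate list ("utf-8-sig", then "gbk") are assumptions chosen for illustration; adjust them to the encodings you actually expect.

```python
import os
import tempfile


def read_lines_with_fallback(path, candidates=("utf-8-sig", "gbk")):
    """Return (lines, encoding), using the first candidate that decodes the whole file.

    Hypothetical helper: the candidate order is an assumption, not a rule.
    """
    for enc in candidates:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.readlines(), enc
        except UnicodeDecodeError:
            continue  # this candidate does not fit; try the next one
    # Nothing decoded cleanly: reread with the first candidate and
    # replace undecodable bytes instead of raising.
    with open(path, "r", encoding=candidates[0], errors="replace") as f:
        return f.readlines(), candidates[0]


# Usage: write a small GBK-encoded file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "wb") as f:
        f.write("中文\n".encode("gbk"))
    lines, used = read_lines_with_fallback(demo_path)
    print(used, lines)
```

The order of candidates matters: a strict encoding such as UTF-8 should come first, because permissive multi-byte encodings like GBK can decode bytes that were never meant for them without raising an error.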