I have a UTF-16 LE file with a BOM. I'd like to convert this file to UTF-8 without a BOM so I can parse it using Python.
The usual code that I use didn't do the trick; it returned unknown characters instead of the actual file contents.
f = open('dbo.chrRaces.Table.sql').read()
f = str(f).decode('utf-16le', errors='ignore').encode('utf8')
print f
What would be the proper way to decode this file so I can parse through it with f.readlines()?
Solution
First, read the file in binary mode so you get the raw bytes exactly as stored; otherwise things will get confusing.
Then, check for and remove the BOM, since it is part of the file, but not part of the actual text.
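As a rough, self-contained sketch of these two steps (separate from the answer's own code, which follows), the conversion could look something like this. The function names `utf16le_bytes_to_utf8` and `convert_file` are made up for illustration, and the sketch assumes the input really is UTF-16 LE.

```python
import codecs
import io


def utf16le_bytes_to_utf8(raw):
    """Decode UTF-16 LE bytes, dropping a leading BOM if present, and return UTF-8 bytes."""
    bom = codecs.BOM_UTF16_LE  # the two bytes b'\xff\xfe'
    if raw.startswith(bom):
        # The BOM is part of the file, not part of the text, so strip it.
        raw = raw[len(bom):]
    text = raw.decode('utf-16-le')
    # Encoding with 'utf-8' (not 'utf-8-sig') does not write a BOM.
    return text.encode('utf-8')


def convert_file(src_path, dst_path):
    """Hypothetical helper: read src_path as raw bytes and write a BOM-less UTF-8 copy."""
    with io.open(src_path, 'rb') as src:  # binary mode keeps the bytes untouched
        raw = src.read()
    with io.open(dst_path, 'wb') as dst:
        dst.write(utf16le_bytes_to_utf8(raw))
```

Once the converted copy is written, it can be opened normally and iterated line by line, which is what `readlines()` needs.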
import codecs
encoded_text = open('dbo.chrRaces.Table.sql', 'rb').read()  # read in binary mode so the BOM bytes come through intact