I have a Python 3 program that reads some strings from a Windows-1252 encoded file:
with open(file, 'r', encoding="cp1252") as file_with_strings:
    some_string = file_with_strings.read()  # save some strings
I later want to write these strings to stdout. I've tried:
print(some_string)
# => UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 180: ordinal not in range(128)
print(some_string.decode("utf-8"))
# => AttributeError: 'str' object has no attribute 'decode'
sys.stdout.buffer.write(some_string)
# => TypeError: 'str' does not support the buffer interface
print(some_string.encode("cp1252").decode("utf-8"))
# => UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 180: invalid continuation byte
print(some_string.encode("cp1252"))
# => has the unfortunate result of printing the bytes literal (b'...') instead of just the string
I'm scratching my head here. I'd like to print the string I got from the file just as it appears there, in cp1252. (In my terminal, when I do more $file, these characters appear as question marks, so my terminal is probably ascii.)
Would love some clarification! Thanks!
Solution
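Bytes that were encoded with cp1252 have to be decoded with cp1252 as well; decoding them as UTF-8 is what raises the UnicodeDecodeError shown in the question.
For example: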
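import sys
# Encode the text to cp1252 bytes.
txt = "hi hello\n".encode("cp1252")
# Alternative: decode with the same codec and print as text.
# print(txt.decode("cp1252"))
# Write the raw bytes to stdout, bypassing the text layer's encoding.
sys.stdout.buffer.write(txt)
sys.stdout.flush()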
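This writes the cp1252-encoded bytes of "hi hello\n" straight to stdout without decoding them, so Python's own stdout encoding (ASCII in your case) is never involved. The commented-out line shows the alternative: decode the bytes with cp1252 and print the resulting str.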
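To tie this back to the string from the question, here is a minimal sketch with a non-ASCII character. The helper name write_encoded and the sample text "café" are made up for illustration and are not part of the original code; the demo writes to an in-memory io.BytesIO so it does not depend on your terminal.

```python
import io


def write_encoded(text, stream, encoding="cp1252"):
    """Encode text and write the raw bytes to a binary stream (hypothetical helper)."""
    data = text.encode(encoding)
    stream.write(data)
    stream.flush()
    return data


# Sample text containing '\xe9' (e with acute accent), the character from the error.
sample = "caf\xe9\n"
raw = sample.encode("cp1252")

# Round trip: decoding with the same codec restores the original text.
assert raw.decode("cp1252") == sample

# Decoding the same bytes as UTF-8 fails, as in the question.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Write the encoded bytes to an in-memory binary stream for demonstration.
demo = io.BytesIO()
write_encoded(sample, demo)
```

For real output you would pass sys.stdout.buffer instead of the BytesIO, e.g. write_encoded(some_string, sys.stdout.buffer) after import sys. Whether the accented characters then display correctly still depends on the encoding your terminal expects.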