I'm learning how to read text files. This is what I tried:
f=open("sample.txt")
print(f.read())
It worked fine if I typed the txt file myself. But when I copied text from a news article on the web, it produced the following error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2014' in position 738: character maps to <undefined>
I tried changing the Encoding setting in Notepad++ to UTF-8, since I read somewhere that the encoding was the cause.
I also tried using:
f=open("sample.txt",encoding='utf-8')
But it still didn't work.
Solution
You're on Windows and printing to the console. Reading the file succeeds; it is print() that raises the exception, because it has to encode the text for the console.
The Windows console only natively supports 8-bit code pages, so any character that is not in your region's code page fails to encode (despite what people say about chcp 65001).
One fix is to install and use https://github.com/Drekin/win-unicode-console. This module talks to the console API at a low level, giving support for multi-byte characters for both input and output.
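The failure can be reproduced without a console at all, which may help confirm that the file itself is fine. The sketch below is only an illustration: cp437 is assumed as an example of a legacy 8-bit code page (your console may use a different one), and can_encode is a hypothetical helper name.

```python
# U+2014 is the character named in the error message (an em dash).
text = "before \u2014 after"


def can_encode(text, codec):
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# cp437 is an assumed example of an 8-bit console code page.
print(can_encode(text, "cp437"))  # False: cp437 has no em dash
print(can_encode(text, "utf-8"))  # True: UTF-8 covers all of Unicode
```

This is the same check print() effectively performs when it encodes text for a console that uses such a code page.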
Alternatively, don't print to the console and write your output to a file, opened with an encoding. For example:
with open("sample.txt", encoding="utf-8") as f:
    body = f.read()  # read the source text as UTF-8
with open("myoutput.log", "w", encoding="utf-8") as my_log:
    my_log.write(body)
Ensure you open the file with the correct encoding.
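If you still want a preview on the console, one possible workaround is to write the full text to a UTF-8 file and print only a version with unsupported characters replaced. This is a rough sketch, not the only way to do it: save_as_utf8 and console_safe are hypothetical helper names, the sample text is made up, and cp437 is again just an assumed console code page.

```python
import os
import tempfile


def save_as_utf8(text, path):
    """Write text to path as UTF-8 instead of printing it."""
    with open(path, "w", encoding="utf-8") as out:
        out.write(text)


def console_safe(text, console_encoding):
    """Replace characters the console encoding cannot represent with '?'."""
    encoded = text.encode(console_encoding, errors="replace")
    return encoded.decode(console_encoding)


body = "Markets fell \u2014 sharply"  # made-up sample containing U+2014

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "myoutput.log")
    save_as_utf8(body, log_path)
    # Reading back with the same encoding returns the original text.
    with open(log_path, encoding="utf-8") as check:
        round_trip = check.read()

preview = console_safe(body, "cp437")
print(preview)  # Markets fell ? sharply
```

The file keeps the original characters intact, while the preview loses only the ones the code page cannot show.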