UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>

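Reading a text file with Python 3 raised a 'charmap' codec error. The error means that byte X at position Y cannot be decoded because the codec has no character assigned to it. Solutions include determining and specifying the file's correct encoding (for example, by using Notepad++ to identify it), or passing errors='ignore' or errors='replace' when opening the file to handle undecodable bytes.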

This article is translated from: UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to

I'm trying to get a Python 3 program to do some manipulations with a text file filled with information. However, when trying to read the file I get the following error:

Traceback (most recent call last):
File "SCRIPT LOCATION", line NUMBER, in
text = file.read()
File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2907500: character maps to <undefined>
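The error can be reproduced without any file at all. The following is a minimal sketch, assuming a made-up byte string (`data` is purely illustrative) that contains 0x90, a byte that Python's cp1252 codec leaves unassigned:

```python
# Byte 0x90 has no character assigned in Python's cp1252 codec,
# so decoding a byte string that contains it raises UnicodeDecodeError.
data = b"abc\x90def"  # illustrative sample, not from the original file

position = None
failing_byte = None
try:
    data.decode("cp1252")
except UnicodeDecodeError as exc:
    # exc.start is the offset of the offending byte, as reported in the traceback.
    position = exc.start
    failing_byte = data[exc.start]
    print(f"cannot decode byte {failing_byte:#x} in position {position}")
```

The position reported in the traceback is therefore a byte offset into the file, which can help when inspecting the file in a hex viewer.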


Reply #1

Reference: https://stackoom.com/question/cjvn/UnicodeDecodeError-charmap-编解码器无法解码位置Y的字节X-字符映射到-undefined


Reply #2

As an extension to @LennartRegebro's answer:

If you can't tell what encoding your file uses, the solution above does not work (the file is not utf8), and you find yourself merely guessing, there are online tools that can identify the encoding. They aren't perfect, but they usually work just fine. Once you have figured out the encoding, you should be able to use the solution above.
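When no detection tool is at hand, one rough alternative is to try a short list of candidate encodings in order. This is only a sketch: `guess_encoding` and its candidate list are hypothetical names chosen for illustration, and a successful decode does not prove the guess is right (cp437, for instance, accepts every byte).

```python
def guess_encoding(raw, candidates=("utf-8", "cp1252", "cp437")):
    """Return the first candidate that decodes `raw` without error, else None.

    Hypothetical helper: the candidate list is an assumption and should be
    adapted to the encodings that are plausible for the data at hand.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next
        return name
    return None


print(guess_encoding(b"caf\xc3\xa9"))  # valid UTF-8
print(guess_encoding(b"caf\xe9"))      # not valid UTF-8, but valid cp1252
print(guess_encoding(b"\x90"))         # rejected by utf-8 and cp1252
```

Putting the strictest encoding (UTF-8) first matters, because permissive single-byte codecs accept almost anything and would otherwise hide a better match.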

EDIT: (Copied from comment)

Sublime Text, a quite popular text editor, has a command that displays the encoding if it has been set...

  1. Go to View -> Show Console (or Ctrl + ` )


  2. Type view.encoding() into the field at the bottom and hope for the best (I was unable to get anything but Undefined, but maybe you will have better luck...)



Reply #3

Just adding this in case file = open(filename, encoding="utf8") doesn't work: try file = open(filename, errors='ignore')


Reply #4

Alternatively, if you don't need to decode the file (for example, when uploading the file to a website), use open(filename, 'rb'), where r = reading and b = binary.
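A minimal sketch of the binary-mode approach, assuming a throwaway temporary file created only for the demonstration (the file contents are made up):

```python
import os
import tempfile

# Write a few raw bytes, including 0x90, to a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header\x90payload")
    path = tmp.name

# In binary mode no codec is involved, so read() returns the bytes unchanged
# and cannot raise UnicodeDecodeError.
with open(path, "rb") as f:
    payload = f.read()

os.remove(path)  # clean up the temporary file
print(type(payload), payload)
```

The result is a bytes object rather than a str, so it suits uploading, hashing, or copying, but not text processing.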


Reply #5

For those working in Anaconda on Windows: I had the same problem, and Notepad++ helped me solve it.

Open the file in Notepad++. In the bottom right it will tell you the current file encoding. In the top menu, next to "View", locate "Encoding". In "Encoding", go to "Character sets" and patiently look for the encoding that you need. In my case the encoding "Windows-1252" was found under "Western European".
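The name shown by the editor can usually be passed to Python directly, since Python's codec lookup accepts common aliases. A small sketch, assuming the editor reported "Windows-1252" as in the answer above (the commented-out file name is hypothetical):

```python
import codecs

# Python normalises codec names, so the editor's "Windows-1252" resolves
# to the same codec as "cp1252".
codec_name = codecs.lookup("windows-1252").name
print(codec_name)

# The identified name can then be passed to open(), for example:
# with open("data.txt", encoding="windows-1252") as f:  # hypothetical file
#     text = f.read()
```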


Reply #6

TLDR? Try: file = open(filename, encoding='cp437')

Why? When one uses:

file = open(filename)
text = file.read()

Python assumes the file uses the same codepage as the current environment (cp1252 in the case of the opening post) and tries to decode its bytes into Python's own Unicode string type. If the file contains byte values that are not defined in this codepage (like 0x90), we get UnicodeDecodeError. Sometimes we don't know the encoding of the file, sometimes the file's encoding is not handled by Python (e.g. cp790), and sometimes the file contains mixed encodings.
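To see which encoding the environment supplies, the locale module can be queried. This is a sketch of one way to check; the value differs between machines, and newer Python versions or UTF-8 mode may change what open() actually falls back to:

```python
import locale

# The locale's preferred encoding, which text-mode open() has traditionally
# used when no encoding argument is given. Typically "cp1252" on
# Western-European Windows and "UTF-8" on most Linux and macOS systems.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

Because this value depends on the machine, passing an explicit encoding to open() is what makes a script behave the same everywhere.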

If such characters are unneeded, one may decide to replace them with the Unicode replacement character (U+FFFD, usually displayed as a question mark), with:

file = open(filename, errors='replace')

Another workaround is to use:

file = open(filename, errors='ignore')

The undecodable bytes are then silently dropped and the remaining characters are left intact, but other errors will be masked too.
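The difference between the two error handlers is easiest to see on a small byte string. A minimal sketch, assuming a made-up sample (`raw`) with one byte that cp1252 cannot decode:

```python
raw = b"caf\x90e"  # illustrative sample; 0x90 is undefined in cp1252

# errors="replace" substitutes U+FFFD for each undecodable byte.
replaced = raw.decode("cp1252", errors="replace")

# errors="ignore" drops the undecodable byte entirely.
ignored = raw.decode("cp1252", errors="ignore")

print(repr(replaced))
print(repr(ignored))
```

With 'replace' the damage stays visible in the text, whereas with 'ignore' there is no trace that anything was lost, which is why the latter can hide real problems.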

A quite good solution is to specify the encoding, though not just any encoding (like cp1252), but one that has ALL byte values defined (like cp437):

file = open(filename, encoding='cp437')

Codepage 437 is the original DOS encoding. All codes are defined, so there are no errors while reading the file, no errors are masked out, and the characters are preserved (not quite left intact, but still distinguishable).
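The claim that every code is defined can be checked directly. The sketch below decodes all 256 byte values with cp437 and, for contrast, collects the values that Python's cp1252 codec rejects; the variable names are illustrative only:

```python
all_bytes = bytes(range(256))

# cp437 assigns a character to every byte value, so this never raises.
text = all_bytes.decode("cp437")

# Encoding back reproduces the original bytes, so no information is lost.
round_trip = text.encode("cp437")

# Collect the byte values that cp1252 cannot decode.
undefined_in_cp1252 = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined_in_cp1252.append(value)

print(len(text), round_trip == all_bytes)
print([hex(v) for v in undefined_in_cp1252])
```

Note that the decoded characters are only correct if the file really is cp437. Otherwise byte 0x90, for example, shows up as "É", which is distinguishable but probably not what the author wrote.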
