What to do when a file saved from Python code won't open: Python cannot open a UTF-8-encoded text file

I have a .py script that contains the following code to open a specific text file (which was generated by Exchange PowerShell):

import codecs

with codecs.open("C:\\Temp\\myfile.txt", encoding="utf_8", mode="r", errors="replace") as myfile:
    content = myfile.readlines()  # read all lines into a list

print(content)

I also tried utf-16-be and utf-16-le (and plain ASCII, obviously), but the output still looks like this (this is only part of it):

['��\r', '\x00\n', '\x00D\x00o\x00m\x00a\x00i\x00n\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00\r', '\x00\n', '\x00-\x00-\x00-\x00-\x00-\x00-\x00

The file I am trying to open is located here

Does anybody know what I am doing wrong? Is this some different kind of encoding?

Solution

First, this text is definitely not UTF-8, so that's why Python can't open it as a UTF-8-encoded text file.

Second, you claim you "tried also utf-16-be and utf-16-le", but didn't show how you did that, and I suspect you did it wrong.

From the output, this is very likely UTF-16-LE with a BOM.

Because of the way you've printed them, we can't tell exactly which bytes the first two are, but this is what it looks like when you print out \xFF and \xFE bytes. The rest of the strings are a bunch of NUL bytes in every second position, alternating with reasonable-looking bytes, which almost always means UTF-16-LE. Plus, the most common two-byte encoding with a BOM in the wild is UTF-16-LE, and the fact that you're using all Microsoft tools makes that even more likely.
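Rather than guessing from the printed output, the leading bytes can be checked against the known BOM constants. The following is a minimal sketch, not a full encoding detector; the helper name sniff_bom and the sample bytes are made up for illustration.

```python
import codecs

# Longer BOMs are checked first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(raw):
    """Return the encoding implied by a leading BOM, or None if there is none.

    Hypothetical helper: it only looks at the BOM and does not validate
    the rest of the data.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


# Sample bytes built by hand to resemble the start of the file in the question.
sample = codecs.BOM_UTF16_LE + "Domain\r\n".encode("utf-16-le")
detected = sniff_bom(sample)
print(detected)
```

For a real file, the first few bytes can be read in binary mode (mode "rb") and passed to the same helper.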

So, if you'd really tried utf-16-le, you would almost certainly have gotten the right string, but with an extra \ufeff at the start.
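The difference between the two codecs can be seen on a few hand-built bytes. This is a small sketch using made-up sample data, not the actual file from the question.

```python
import codecs

# BOM followed by UTF-16-LE text, as a Microsoft tool would typically write it.
raw = codecs.BOM_UTF16_LE + "Domain\r\n".encode("utf-16-le")

# 'utf-16-le' treats the BOM as ordinary data, so it survives as U+FEFF.
as_le = raw.decode("utf-16-le")

# 'utf-16' reads the BOM to pick the byte order and then discards it.
as_utf16 = raw.decode("utf-16")

print(repr(as_le))
print(repr(as_utf16))
```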

But of course the right answer is to just decode it as 'utf-16', which will consume and use the BOM properly.
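Applied to a file, that could look like the sketch below. It assumes Python 3 and uses the built-in open rather than codecs.open; the function name read_lines_utf16 and the temporary file are stand-ins for illustration, since the real file from the question is not available here.

```python
import os
import tempfile


def read_lines_utf16(path):
    """Read a BOM-prefixed UTF-16 text file and return its lines."""
    # encoding="utf-16" consumes the BOM and uses it to choose the byte order.
    with open(path, encoding="utf-16") as f:
        return f.read().splitlines()


# Build a small stand-in file: BOM + UTF-16-LE text with Windows line endings.
raw = b"\xff\xfe" + "Domain\r\n------\r\n".encode("utf-16-le")
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    tmp_path = tmp.name

try:
    lines = read_lines_utf16(tmp_path)
finally:
    os.remove(tmp_path)

print(lines)
```

In the original script, the equivalent change would be replacing encoding="utf_8" with encoding="utf-16" in the codecs.open call.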
