What is Python's default encoding for? What is the default encoding in Python 2.7.8?

weixin_39849287

Published 2020-11-24 07:44:02


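When no encoding is passed, codecs.open does essentially no transparent encoding or decoding; it just opens the file and returns it.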

Here is the library code:

    def open(filename, mode='rb', encoding=None, errors='strict', buffering=1):

        """ Open an encoded file using the given mode and return
            a wrapped version providing transparent encoding/decoding.

            Note: The wrapped version will only accept the object format
            defined by the codecs, i.e. Unicode objects for most builtin
            codecs. Output is also codec dependent and will usually be
            Unicode as well.

            Files are always opened in binary mode, even if no binary mode
            was specified. This is done to avoid data loss due to encodings
            using 8-bit values. The default file mode is 'rb' meaning to
            open the file in binary read mode.

            encoding specifies the encoding which is to be used for the
            file.

            errors may be given to define the error handling. It defaults
            to 'strict' which causes ValueErrors to be raised in case an
            encoding error occurs.

            buffering has the same meaning as for the builtin open() API.
            It defaults to line buffered.

            The returned wrapped file object provides an extra attribute
            .encoding which allows querying the used encoding. This
            attribute is only available if an encoding was specified as
            parameter.

        """
        if encoding is not None:
            if 'U' in mode:
                # No automatic conversion of '\n' is done on reading and writing
                mode = mode.strip().replace('U', '')
                if mode[:1] not in set('rwa'):
                    mode = 'r' + mode
            if 'b' not in mode:
                # Force opening of the file in binary mode
                mode = mode + 'b'
        file = __builtin__.open(filename, mode, buffering)
        if encoding is None:
            return file
        info = lookup(encoding)
        srw = StreamReaderWriter(file, info.streamreader, info.streamwriter, errors)
        # Add attributes to simplify introspection
        srw.encoding = encoding
        return srw

As you can see, if encoding is None, it simply returns the opened file.
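The two return paths can be checked directly. The following is a minimal sketch rather than part of the original answer: the sample bytes and the temporary file are made up for illustration, and the code is intended to run on both Python 2.7 and Python 3.

```python
import codecs
import os
import tempfile

# Hypothetical sample containing one byte outside the ASCII range (0xB4 = 180).
raw = b'Harvard Cup 30\xb4'

# Write the raw bytes to a temporary file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(raw)

# No encoding: codecs.open returns the plain file object, so read() is undecoded.
with codecs.open(path, 'rb') as f:
    undecoded = f.read()

# With an encoding: the file is wrapped and read() returns decoded text.
with codecs.open(path, 'rb', encoding='windows-1252') as f:
    decoded = f.read()

os.remove(path)

print(type(undecoded).__name__)
print(type(decoded).__name__)
```

Without an encoding, the data comes back byte for byte as written; only the wrapped version turns byte 180 into a text character.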

Here is your file, with each byte shown in decimal alongside its corresponding ASCII character:

[Byte listing missing from this copy of the post.]
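A listing of this kind takes only a few lines to produce. The sketch below is illustrative only: `dump_bytes` is a hypothetical helper, and the sample bytes are invented, not taken from the original file.

```python
def dump_bytes(data):
    """Return (decimal value, character) pairs for each byte in data."""
    rows = []
    for value in bytearray(data):  # bytearray yields ints on Python 2 and 3
        # Show printable ASCII as-is; mark every other byte with '?'.
        char = chr(value) if 32 <= value < 127 else '?'
        rows.append((value, char))
    return rows


sample = b'30\xb4"]'  # hypothetical bytes, not the original file
for value, char in dump_bytes(sample):
    print('%3d  %s' % (value, char))
```

Any row marked `?` with a value above 127 is a byte that plain ASCII cannot represent.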

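The problem with opening it as ASCII is the byte with decimal value 180. ASCII only goes up to 127, which made me think this must be some kind of extended ASCII that uses 128-255 for extra symbols. The Wikipedia article on ASCII (https://en.wikipedia.org/wiki/ASCII) mentions a popular ASCII extension called windows-1252. In windows-1252, decimal value 180 maps to the acute accent (´). I then searched for the string in your file to see what it actually refers to, and found the "Harvard Cup 30´" tournament: http://www.365chess.com/tournaments/Harvard_Cup_30%C2%B4_1989/21650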
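This reasoning can be confirmed on the single byte in question. The snippet below is a small sketch with illustrative variable names; it checks only that one byte, not the whole file.

```python
import unicodedata

suspect = b'\xb4'  # the single byte with decimal value 180

# ASCII covers only 0-127, so strict decoding rejects this byte.
try:
    suspect.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# windows-1252 maps 180 to U+00B4, the acute accent.
as_cp1252 = suspect.decode('windows-1252')

print(ascii_ok)
print(unicodedata.name(as_cp1252))
```

ISO-8859-1 maps this byte to the same character, so byte 180 alone cannot tell the two encodings apart; windows-1252 remains a plausible guess, not a proven one.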

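So, in summary, the correct encoding is probably windows-1252. Here is my test program:

    import codecs

    # Decode the file as windows-1252 while reading (Python 2 print statement).
    with codecs.open('f.txt', 'r', encoding='windows-1252') as f:
        print f.read()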

Output:

    ... 0-1

    [Event "Harvard Cup 30´"]

    ...
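The same read can also be written with `io.open`, which is available from Python 2.6 onward and is the built-in `open` in Python 3. The sketch below is an alternative to the test program, not the original author's code: the temporary file is a hypothetical stand-in for `f.txt`, and the re-encoding to UTF-8 is added for illustration.

```python
import io
import os
import tempfile

# Hypothetical stand-in for f.txt: one line containing byte 180 (0xB4).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'[Event "Harvard Cup 30\xb4"]\n')

# Decode as windows-1252 while reading.
with io.open(path, 'r', encoding='windows-1252') as f:
    text = f.read()

os.remove(path)

# Re-encode as UTF-8; the single byte 0xB4 becomes the pair 0xC2 0xB4.
utf8_bytes = text.encode('utf-8')
print(repr(utf8_bytes))
```

The pair 0xC2 0xB4 is what appears percent-encoded as `%C2%B4` in the tournament URL above, which is consistent with the character being an acute accent.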
