What is Python's default encoding for? What is the default encoding in Python 2.7.8?

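With no encoding argument, codecs.open basically does no transparent encoding/decoding at all: it just opens the file and returns it.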

Here is the library code:

def open(filename, mode='rb', encoding=None, errors='strict', buffering=1):
    """ Open an encoded file using the given mode and return
        a wrapped version providing transparent encoding/decoding.

        Note: The wrapped version will only accept the object format
        defined by the codecs, i.e. Unicode objects for most builtin
        codecs. Output is also codec dependent and will usually be
        Unicode as well.

        Files are always opened in binary mode, even if no binary mode
        was specified. This is done to avoid data loss due to encodings
        using 8-bit values. The default file mode is 'rb' meaning to
        open the file in binary read mode.

        encoding specifies the encoding which is to be used for the
        file.

        errors may be given to define the error handling. It defaults
        to 'strict' which causes ValueErrors to be raised in case an
        encoding error occurs.

        buffering has the same meaning as for the builtin open() API.
        It defaults to line buffered.

        The returned wrapped file object provides an extra attribute
        .encoding which allows querying the used encoding. This
        attribute is only available if an encoding was specified as
        parameter.
    """
    if encoding is not None:
        if 'U' in mode:
            # No automatic conversion of '\n' is done on reading and writing
            mode = mode.strip().replace('U', '')
            if mode[:1] not in set('rwa'):
                mode = 'r' + mode
        if 'b' not in mode:
            # Force opening of the file in binary mode
            mode = mode + 'b'
    file = __builtin__.open(filename, mode, buffering)
    if encoding is None:
        return file
    info = lookup(encoding)
    srw = StreamReaderWriter(file, info.streamreader, info.streamwriter, errors)
    # Add attributes to simplify introspection
    srw.encoding = encoding
    return srw

As you can see, if encoding is None, it simply returns the opened file.
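The difference between the two code paths can be checked with a small sketch like the one below. It is only an illustration under assumptions: the temporary file and its contents are made up for the example, and the variable names are hypothetical.

```python
import codecs
import os
import tempfile

# Hypothetical sample file containing the byte 0xB4 (decimal 180).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b'Harvard Cup 30\xb4\n')

# No encoding given: codecs.open returns the ordinary file object,
# so read() hands back the raw, undecoded bytes.
with codecs.open(path, 'rb') as plain:
    plain_is_wrapped = isinstance(plain, codecs.StreamReaderWriter)
    raw = plain.read()

# Encoding given: the file is wrapped in a StreamReaderWriter,
# which decodes on read and exposes the .encoding attribute.
with codecs.open(path, 'rb', encoding='windows-1252') as wrapped:
    wrapped_is_wrapped = isinstance(wrapped, codecs.StreamReaderWriter)
    wrapped_encoding = wrapped.encoding
    text = wrapped.read()

os.remove(path)

print(plain_is_wrapped)    # False
print(wrapped_is_wrapped)  # True
print(wrapped_encoding)    # windows-1252
```

The mode is passed as 'rb' explicitly so that the undecoded case returns raw bytes regardless of the Python version in use.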

Here is your file, with each byte shown in decimal alongside its corresponding ASCII character:

^{pr2}$
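A dump of this kind can be produced with a few lines of Python. The following is a rough sketch, not the code used for the listing above: the helper name dump_bytes and the sample bytes are hypothetical, and bytes outside the printable ASCII range are shown as a dot.

```python
def dump_bytes(data):
    """Return one 'decimal character' line per byte of data."""
    lines = []
    # bytearray yields integers on both Python 2 and Python 3.
    for value in bytearray(data):
        # Only printable ASCII (32..126) is shown as a character.
        char = chr(value) if 32 <= value < 127 else '.'
        lines.append('%3d %s' % (value, char))
    return lines


# Hypothetical fragment containing the problematic byte 180.
sample = b'30\xb4"]'
for line in dump_bytes(sample):
    print(line)
```

To inspect a real file, read it in binary mode first, for example with open('f.txt', 'rb'), and pass the result of read() to the helper.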

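The problem you run into when opening it as ASCII is the byte with decimal value 180. ASCII only goes up to 127, so that led me to think this must be some kind of extended ASCII, where 128-255 are used for extra symbols. After reading through the Wikipedia article on ASCII (https://en.wikipedia.org/wiki/ASCII), I saw that it mentions a popular ASCII extension, windows-1252. In windows-1252, decimal value 180 maps to the acute accent (´). I then decided to search for the string in your file to see what it actually refers to. This is what I found, "Harvard Cup 30´": http://www.365chess.com/tournaments/Harvard_Cup_30%C2%B4_1989/21650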
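The reasoning above can be verified directly on the single byte in question. This is a minimal sketch with hypothetical variable names. It also shows that the %C2%B4 in the URL is simply the UTF-8 encoding of the same acute accent character.

```python
data = b'\xb4'  # the single byte with decimal value 180

# ASCII stops at 127, so decoding this byte as ASCII fails.
try:
    data.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# windows-1252 maps byte 180 to U+00B4 (ACUTE ACCENT).
decoded = data.decode('windows-1252')

# The same character encoded as UTF-8 gives the bytes C2 B4,
# which is what appears percent-encoded in the URL.
utf8_bytes = decoded.encode('utf-8')

print(ascii_ok)                # False
print(decoded == u'\u00b4')    # True
print(utf8_bytes == b'\xc2\xb4')  # True
```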

So in summary, the correct encoding is probably windows-1252. Here is my test program:

import codecs

with codecs.open('f.txt', 'r', encoding='windows-1252') as f:
    print f.read()

Output:

... 0-1

[Event "Harvard Cup 30´"]

...
