It basically doesn't do any transparent encoding/decoding; it just opens the file and returns it.
Here is the code from the library:

    def open(filename, mode='rb', encoding=None, errors='strict', buffering=1):

        """ Open an encoded file using the given mode and return
            a wrapped version providing transparent encoding/decoding.

            Note: The wrapped version will only accept the object format
            defined by the codecs, i.e. Unicode objects for most builtin
            codecs. Output is also codec dependent and will usually be
            Unicode as well.

            Files are always opened in binary mode, even if no binary mode
            was specified. This is done to avoid data loss due to encodings
            using 8-bit values. The default file mode is 'rb' meaning to
            open the file in binary read mode.

            encoding specifies the encoding which is to be used for the
            file.

            errors may be given to define the error handling. It defaults
            to 'strict' which causes ValueErrors to be raised in case an
            encoding error occurs.

            buffering has the same meaning as for the builtin open() API.
            It defaults to line buffered.

            The returned wrapped file object provides an extra attribute
            .encoding which allows querying the used encoding. This
            attribute is only available if an encoding was specified as
            parameter.

        """
        if encoding is not None:
            if 'U' in mode:
                # No automatic conversion of '\n' is done on reading and writing
                mode = mode.strip().replace('U', '')
                if mode[:1] not in set('rwa'):
                    mode = 'r' + mode
            if 'b' not in mode:
                # Force opening of the file in binary mode
                mode = mode + 'b'
        file = __builtin__.open(filename, mode, buffering)
        if encoding is None:
            return file
        info = lookup(encoding)
        srw = StreamReaderWriter(file, info.streamreader, info.streamwriter, errors)
        # Add attributes to simplify introspection
        srw.encoding = encoding
        return srw
As you can see, if encoding is None, it simply returns the opened file.
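To see the two code paths side by side, here is a minimal sketch. The temporary file and its one-line payload are made up for illustration, and the mode is passed explicitly as 'rb' so that the call without an encoding reads raw bytes.

```python
import codecs
import os
import tempfile

# Hypothetical sample: one line containing byte 0xB4 (decimal 180).
payload = b'[Event "Harvard Cup 30\xb4"]\n'

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, 'wb') as out:
        out.write(payload)

    # No encoding: codecs.open returns the plain file object,
    # so read() gives back the raw bytes untouched.
    with codecs.open(path, 'rb') as f:
        raw = f.read()

    # With an encoding: the file is wrapped in a StreamReaderWriter
    # that decodes the bytes when reading.
    with codecs.open(path, 'rb', encoding='windows-1252') as f:
        wrapped_type = type(f).__name__
        text = f.read()
finally:
    os.remove(path)

print(raw == payload)                            # raw bytes, no decoding
print(wrapped_type)                              # name of the wrapper class
print(text == payload.decode('windows-1252'))   # decoded text
```

Only the second call does any decoding; the first is equivalent to calling the builtin open() directly.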
Here is your file, with each byte shown in decimal alongside its corresponding ASCII character:
^{pr2}$
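A listing of this kind can be produced with a few lines of Python. This is only a sketch: dump_bytes is a hypothetical helper, and the sample bytes are a short made-up fragment containing the byte 180, not the actual file contents.

```python
def dump_bytes(data):
    """Return one 'decimal  char' line per byte.

    Bytes outside the printable ASCII range (32-126) are shown as '.'.
    """
    lines = []
    for b in bytearray(data):  # bytearray yields ints on Python 2 and 3
        ch = chr(b) if 32 <= b < 127 else '.'
        lines.append('%3d  %s' % (b, ch))
    return lines


# Hypothetical fragment; 0xB4 is decimal 180.
sample = b'30\xb4"]'
for line in dump_bytes(sample):
    print(line)
```

Any byte printed with a value above 127 is one that a strict ASCII decode will reject.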
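The problem you hit when opening it as ASCII is the byte with decimal value 180. ASCII only goes up to 127, so that got me thinking this must be some kind of extended ASCII, where 128-255 are used for extra symbols. After reading through the Wikipedia article on ASCII (https://en.wikipedia.org/wiki/ASCII), I saw that it mentions a popular ASCII extension called windows-1252. In windows-1252, decimal value 180 maps to the acute accent (´). I then searched for the string in your file to see what it actually refers to. This is what I found: "Harvard Cup 30´", http://www.365chess.com/tournaments/Harvard_Cup_30%C2%B4_1989/21650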
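The reasoning above can be checked directly on the bytes. In this sketch, raw is an assumed stand-in for the offending line of the file, built from the event name plus the byte 180.

```python
# Assumed stand-in for the problematic line; \xb4 is decimal 180.
raw = b'[Event "Harvard Cup 30\xb4"]'

# Bytes that fall outside the 7-bit ASCII range (0-127).
non_ascii = [b for b in bytearray(raw) if b > 127]
print(non_ascii)  # [180]

# A strict ASCII decode rejects the byte 180.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    print('ascii failed: %s' % exc)

# windows-1252 maps 180 to U+00B4, the acute accent.
text = raw.decode('windows-1252')
print(text == u'[Event "Harvard Cup 30\u00b4"]')  # True
```

Other single-byte encodings also assign a character to 180, so a successful decode alone does not prove the choice; matching the result against the known tournament name is what makes windows-1252 plausible here.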
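So in summary, the correct encoding is probably windows-1252. Here is my test program:

    import codecs

    # Decode the file as windows-1252 while reading
    with codecs.open('f.txt', 'r', encoding='windows-1252') as f:
        print(f.read())

Output:

    ... 0-1
    [Event "Harvard Cup 30´"]
    ...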