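Many of Python's built-in functions are written in C. You will not find Python source for them in an installed interpreter, but the CPython source tree has their C implementations in Python/bltinmodule.c. One thing worth noticing: a binary installation (which is still CPython, just shipped without the C sources) contains the header bltinmodule.h, for example under Python34/include, but not bltinmodule.c.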
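A quick way to see this from the interpreter, as a small sketch: the inspect module can tell C-implemented built-ins apart, and inspect.getsource() refuses them because there is no Python source to show. The helper name has_python_source is my own, made up for illustration.

```python
import inspect

def has_python_source(obj):
    """Return True if inspect can locate Python source for obj (hypothetical helper)."""
    try:
        inspect.getsource(obj)
        return True
    except (TypeError, OSError):
        # TypeError: obj is a C-implemented built-in.
        # OSError: the object is Python code but its source file is unavailable.
        return False

print(inspect.isbuiltin(len))     # is len a C-implemented built-in?
print(has_python_source(len))     # can its Python source be found?
```

The same check can be pointed at any other object to see whether it is backed by Python source or by C.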
open() in Python 2.7:
open(name[, mode[, buffering]])
name: name is the file name to be opened
mode: mode is a string indicating how the file is to be opened
buffering: The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used.
Its C implementation:
static PyObject *
builtin_open(PyObject *self, PyObject *args, PyObject *kwds)
{
    return PyObject_Call((PyObject*)&PyFile_Type, args, kwds);
}
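I did not trace this call any further, but what it does is clear enough: PyObject_Call on PyFile_Type simply calls the file type, so open() amounts to constructing a file object, and the file type itself is implemented in C (Objects/fileobject.c in the Python 2.7 source tree). In Python 2, then, open() is C all the way down.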
open() in Python 3.4:
open(file, mode="r", buffering=-1, encoding=None,
     errors=None, newline=None, closefd=True, opener=None)
I won't go through every parameter here. The point to note is that this built-in open() is actually io.open().
The io module documentation says as much:
io.open(file, mode="r", buffering=-1, encoding=None, errors=None,
        newline=None, closefd=True, opener=None)
This is an alias for the builtin open() function.
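The alias is easy to confirm from a Python 3 interpreter. A minimal check, using only the standard io and builtins modules:

```python
import builtins
import io

# In Python 3 the built-in name and io.open refer to one and the same object.
same_as_builtin = io.open is open
same_in_builtins_module = builtins.open is io.open

print(same_as_builtin, same_in_builtins_module)
```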
Naturally, builtin_open() is no longer to be found in the Python 3 C source.
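Comparing the two versions, open() in 3.4 has gained a good deal over 2.7. Most notably, encoding is now a parameter of open() itself, so handling UTF-8 no longer requires import codecs every time as it did in Python 2. It is tempting to guess that open() must have become slower because it is now written in Python rather than C. That guess needs a caveat: the io.open that CPython actually uses comes from the C extension module _io, and the Python version in _pyio.py is a reference implementation of the same logic. Any slowdown would therefore come from the extra buffering and text layers rather than from running Python code.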
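To make the encoding point concrete, here is a small sketch that writes and reads UTF-8 text with the built-in open() and then reads the same file the older way through codecs.open(). The file name demo.txt and the sample string are arbitrary choices for the example.

```python
import codecs
import os
import tempfile

text = "你好, open()"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # Python 3 style: the built-in open() takes the encoding directly.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        via_builtin = f.read()

    # Python 2 style, still available in Python 3: codecs.open() does the same job.
    with codecs.open(path, "r", encoding="utf-8") as f:
        via_codecs = f.read()

print(via_builtin == via_codecs == text)
```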
Now let's look at how open() is implemented in Python 3.
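A readable Python version of open() is in Python34/Lib/_pyio.py. Keep in mind that _pyio is the pure-Python implementation of the io module; the io.open that CPython normally runs is the C version in the _io module, which follows the same logic:
def open(file, mode="r", buffering=-1, encoding=None, errors=None,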
         newline=None, closefd=True, opener=None):
    if not isinstance(file, (str, bytes, int)):
        raise TypeError("invalid file: %r" % file)
    if not isinstance(mode, str):
        raise TypeError("invalid mode: %r" % mode)
    if not isinstance(buffering, int):
        raise TypeError("invalid buffering: %r" % buffering)
    if encoding is not None and not isinstance(encoding, str):
        raise TypeError("invalid encoding: %r" % encoding)
    if errors is not None and not isinstance(errors, str):
        raise TypeError("invalid errors: %r" % errors)
    modes = set(mode)
    if modes - set("axrwb+tU") or len(mode) > len(modes):
        raise ValueError("invalid mode: %r" % mode)
    creating = "x" in modes
    reading = "r" in modes
    writing = "w" in modes
    appending = "a" in modes
    updating = "+" in modes
    text = "t" in modes
    binary = "b" in modes
    if "U" in modes:
        if creating or writing or appending:
            raise ValueError("can't use U and writing mode at once")
        import warnings
        warnings.warn("'U' mode is deprecated",
                      DeprecationWarning, 2)
        reading = True
    if text and binary:
        raise ValueError("can't have text and binary mode at once")
    if creating + reading + writing + appending > 1:
        raise ValueError("can't have read/write/append mode at once")
    if not (creating or reading or writing or appending):
        raise ValueError("must have exactly one of read/write/append mode")
    if binary and encoding is not None:
        raise ValueError("binary mode doesn't take an encoding argument")
    if binary and errors is not None:
        raise ValueError("binary mode doesn't take an errors argument")
    if binary and newline is not None:
        raise ValueError("binary mode doesn't take a newline argument")
    raw = FileIO(file,
                 (creating and "x" or "") +
                 (reading and "r" or "") +
                 (writing and "w" or "") +
                 (appending and "a" or "") +
                 (updating and "+" or ""),
                 closefd, opener=opener)
    result = raw
    try:
        line_buffering = False
        if buffering == 1 or buffering < 0 and raw.isatty():
            buffering = -1
            line_buffering = True
        if buffering < 0:
            buffering = DEFAULT_BUFFER_SIZE
            try:
                bs = os.fstat(raw.fileno()).st_blksize
            except (OSError, AttributeError):
                pass
            else:
                if bs > 1:
                    buffering = bs
        if buffering < 0:
            raise ValueError("invalid buffering size")
        if buffering == 0:
            if binary:
                return result
            raise ValueError("can't have unbuffered text I/O")
        if updating:
            buffer = BufferedRandom(raw, buffering)
        elif creating or writing or appending:
            buffer = BufferedWriter(raw, buffering)
        elif reading:
            buffer = BufferedReader(raw, buffering)
        else:
            raise ValueError("unknown mode: %r" % mode)
        result = buffer
        if binary:
            return result
        text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
        result = text
        text.mode = mode
        return result
    except:
        result.close()
        raise
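The branching in this function can be observed from the outside. The sketch below opens one temporary file in several modes and records which class comes back, then tries a few combinations that the validation code is meant to reject. The helpers layer_for and is_rejected are hypothetical names I made up for the example; only the exception type is checked, since error messages may differ between implementations.

```python
import io
import os
import tempfile

def layer_for(path, mode, **kwargs):
    """Open path, return the class of the resulting object, then close it."""
    with open(path, mode, **kwargs) as f:
        return type(f)

def is_rejected(path, mode, **kwargs):
    """Return True if open() raises ValueError for this argument combination."""
    try:
        open(path, mode, **kwargs).close()
    except ValueError:
        return True
    return False

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(b"data")

    layers = {
        "r": layer_for(path, "r"),                           # text mode
        "rb": layer_for(path, "rb"),                         # buffered binary read
        "ab": layer_for(path, "ab"),                         # buffered binary write
        "r+b": layer_for(path, "r+b"),                       # updating
        "rb, buffering=0": layer_for(path, "rb", buffering=0),  # raw, no buffer
    }
    rejected = {
        "rw": is_rejected(path, "rw"),                       # two primary modes
        "rtb": is_rejected(path, "rtb"),                     # text and binary together
        "rb + encoding": is_rejected(path, "rb", encoding="utf-8"),
        "r, buffering=0": is_rejected(path, "r", buffering=0),  # unbuffered text
    }

for name, cls in layers.items():
    print(name, "->", cls.__name__)
print(rejected)
```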
The part to note is this:
    raw = FileIO(file,
                 (creating and "x" or "") +
                 (reading and "r" or "") +
                 (writing and "w" or "") +
                 (appending and "a" or "") +
                 (updating and "+" or ""),
                 closefd, opener=opener)
    result = raw
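I could not find Python source for FileIO, and my feeling was that FileIO() is written in C. As far as I can tell that guess is right: in 3.4, _pyio.py does not define FileIO but takes it from the C module _io (Modules/_io/fileio.c in the CPython source tree). So the raw file access is still C, and most of the new features of open() are layered on top of it by separate classes: the Buffered* classes for buffering and TextIOWrapper for text.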
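The layering is visible on an open file object, since each layer exposes the one beneath it. A minimal sketch, assuming a regular file opened in text mode (the file name demo.txt is arbitrary):

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello\n")

    with open(path, "r", encoding="utf-8") as f:
        text_layer = type(f)                # outermost: handles str and encodings
        buffered_layer = type(f.buffer)     # middle: buffers bytes
        raw_layer = type(f.buffer.raw)      # innermost: talks to the file descriptor
        fd_is_int = isinstance(f.fileno(), int)

print(text_layer.__name__, "->", buffered_layer.__name__, "->", raw_layer.__name__)
```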
Also note this part:
        if binary:
            return result
        text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
Encoding and related handling are delegated to TextIOWrapper, which is a class. Its encoding logic sits in these two methods:
    def _get_encoder(self):
        make_encoder = codecs.getincrementalencoder(self._encoding)
        self._encoder = make_encoder(self._errors)
        return self._encoder

    def _get_decoder(self):
        make_decoder = codecs.getincrementaldecoder(self._encoding)
        decoder = make_decoder(self._errors)
        if self._readuniversal:
            decoder = IncrementalNewlineDecoder(decoder, self._readtranslate)
        self._decoder = decoder
        return decoder
Once you see codecs there, everything falls into place.
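The two codecs calls used by those methods can be tried on their own. The sketch below builds an incremental decoder the same way _get_decoder does and feeds it UTF-8 bytes that are split in the middle of a character, which is the situation a buffered reader runs into all the time. The sample string and the split point are arbitrary.

```python
import codecs

data = "你好".encode("utf-8")        # 6 bytes, 3 per character

# Same two-step construction as in TextIOWrapper._get_decoder.
make_decoder = codecs.getincrementaldecoder("utf-8")
decoder = make_decoder("strict")

# The first chunk ends one byte into the second character.
first = decoder.decode(data[:4])                 # complete character only
second = decoder.decode(data[4:], final=True)    # held-back byte is completed here

# The encoder side mirrors _get_encoder.
encoder = codecs.getincrementalencoder("utf-8")("strict")
encoded = encoder.encode("你好")

print(first, second, encoded == data)
```

The incremental decoder keeps the incomplete byte between calls, which is why text can be read chunk by chunk without corrupting multi-byte characters.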
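Overall: in Python 2, open() is a thin C wrapper around the file type. It is fast but does very little, and anything more, such as encodings, means reaching for the codecs library. Python 3 effectively retires that "low-end" open() and merges the codecs-style version into the built-in one. Binary mode stops at the raw or buffered layer, much like the old behavior. Text mode always goes through TextIOWrapper, and therefore through the codecs machinery, whether or not you pass the new parameters; the only thing saved is the import codecs. In CPython these layers come from the C module _io, so the extra work lies in the layering itself rather than in interpreted Python code.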