Python built-in functions: some details of the built-in open() in Python 3

Latest recommended article published 2024-03-27 09:10:29

weixin_39760650


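Many of Python's built-in functions are written in C. (I was not completely sure of this at first, but their source cannot be found among the .py files of a Python installation, while the C implementations can be found in the CPython source tree.) That source lives in Python/bltinmodule.c in the cpython repository. Note also that an installed Python (as opposed to a CPython source checkout), for example the Python34/include folder, ships the header bltinmodule.h but not bltinmodule.c.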
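
A quick way to see this from the interpreter is to ask inspect for the source of a built-in. The sketch below is only illustrative; has_python_source is a helper name made up for this example, not part of any library.

```python
import inspect
import types


def has_python_source(obj):
    """Return True if inspect can locate Python source text for obj."""
    try:
        inspect.getsource(obj)
    except (TypeError, OSError):
        # TypeError: obj is implemented in C.
        # OSError: a .py file should exist but could not be read.
        return False
    return True


# Built-ins such as len and open are C-level function objects.
print(type(len).__name__)
print(isinstance(open, types.BuiltinFunctionType))

# inspect cannot produce Python source for them.
print(has_python_source(len))
print(has_python_source(open))
```

For a function written in Python, such as most of the standard library, the same helper normally returns True.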

open() in Python 2.7:

    open(name[, mode[, buffering]])

name: name is the file name to be opened

mode: mode is a string indicating how the file is to be opened

buffering: The optional buffering argument specifies the file's desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used.

Its C implementation:

    static PyObject *
    builtin_open(PyObject *self, PyObject *args, PyObject *kwds)
    {
        return PyObject_Call((PyObject*)&PyFile_Type, args, kwds);
    }

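I did not track down exactly where this call ends up, but it is clear that open() is implemented directly in C: the function simply calls PyFile_Type, the C type of Python 2 file objects (defined in Objects/fileobject.c).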

open() in Python 3.4:

    open(file, mode="r", buffering=-1, encoding=None,
         errors=None, newline=None, closefd=True, opener=None)

I will not go through each parameter here. The point to note is that this built-in open() is actually io.open().

The io library documentation says the same:

    io.open(file, mode="r", buffering=-1, encoding=None, errors=None,
            newline=None, closefd=True, opener=None)

This is an alias for the builtin open() function.

Naturally, builtin_open() is no longer to be found in the C source.
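
The alias can be checked directly. The following is a small sketch, assuming CPython 3; it also imports _pyio, the pure-Python counterpart of the io module, to show that its open() is a separate object from the one the interpreter actually uses.

```python
import builtins
import inspect
import io
import _pyio

# The built-in open and io.open are the very same object.
same_object = builtins.open is io.open
print(same_object)

# That object is a C-level built-in, not a Python function.
is_c_builtin = inspect.isbuiltin(io.open)
print(is_c_builtin)

# _pyio provides an equivalent open() written in Python; it is a
# different object from the one bound to the name open.
pyio_is_separate = _pyio.open is not io.open
print(pyio_is_separate)
```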

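Comparing the two versions, open() in 3.4 gained quite a lot of functionality over 2.7. In particular, encoding became a parameter of open() itself, so handling UTF-8 no longer requires import codecs every time as it did in Python 2. One might also guess that an open() implemented in Python rather than C must be slower. That guess would apply to the pure-Python version in _pyio.py; CPython itself uses a C implementation of the io module (_io) and keeps the Python version as an equivalent reference.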
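
As a minimal sketch of the encoding point, the snippet below writes a file with the built-in open() and reads it back with codecs.open(), the approach typically used in Python 2. The file name and sample text are arbitrary choices for illustration, and note that recent Python versions treat codecs.open() as deprecated in favour of open().

```python
import codecs
import os
import tempfile

text = "中文 and ASCII"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # Python 3 style: the encoding is a parameter of open() itself.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Python 2 style: the same job needed codecs.open().
    with codecs.open(path, "r", encoding="utf-8") as f:
        via_codecs = f.read()

    # Binary mode shows the bytes that were actually stored.
    with open(path, "rb") as f:
        raw_bytes = f.read()

print(via_codecs == text)
print(raw_bytes == text.encode("utf-8"))
```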

Now let's look at how open() is implemented in Python 3.

A Python version of the open() source is in Python34/Lib/_pyio.py:

    def open(file, mode="r", buffering=-1, encoding=None, errors=None,
             newline=None, closefd=True, opener=None):
        if not isinstance(file, (str, bytes, int)):
            raise TypeError("invalid file: %r" % file)
        if not isinstance(mode, str):
            raise TypeError("invalid mode: %r" % mode)
        if not isinstance(buffering, int):
            raise TypeError("invalid buffering: %r" % buffering)
        if encoding is not None and not isinstance(encoding, str):
            raise TypeError("invalid encoding: %r" % encoding)
        if errors is not None and not isinstance(errors, str):
            raise TypeError("invalid errors: %r" % errors)
        modes = set(mode)
        if modes - set("axrwb+tU") or len(mode) > len(modes):
            raise ValueError("invalid mode: %r" % mode)
        creating = "x" in modes
        reading = "r" in modes
        writing = "w" in modes
        appending = "a" in modes
        updating = "+" in modes
        text = "t" in modes
        binary = "b" in modes
        if "U" in modes:
            if creating or writing or appending:
                raise ValueError("can't use U and writing mode at once")
            import warnings
            warnings.warn("'U' mode is deprecated",
                          DeprecationWarning, 2)
            reading = True
        if text and binary:
            raise ValueError("can't have text and binary mode at once")
        if creating + reading + writing + appending > 1:
            raise ValueError("can't have read/write/append mode at once")
        if not (creating or reading or writing or appending):
            raise ValueError("must have exactly one of read/write/append mode")
        if binary and encoding is not None:
            raise ValueError("binary mode doesn't take an encoding argument")
        if binary and errors is not None:
            raise ValueError("binary mode doesn't take an errors argument")
        if binary and newline is not None:
            raise ValueError("binary mode doesn't take a newline argument")
        raw = FileIO(file,
                     (creating and "x" or "") +
                     (reading and "r" or "") +
                     (writing and "w" or "") +
                     (appending and "a" or "") +
                     (updating and "+" or ""),
                     closefd, opener=opener)
        result = raw
        try:
            line_buffering = False
            if buffering == 1 or buffering < 0 and raw.isatty():
                buffering = -1
                line_buffering = True
            if buffering < 0:
                buffering = DEFAULT_BUFFER_SIZE
                try:
                    bs = os.fstat(raw.fileno()).st_blksize
                except (OSError, AttributeError):
                    pass
                else:
                    if bs > 1:
                        buffering = bs
            if buffering < 0:
                raise ValueError("invalid buffering size")
            if buffering == 0:
                if binary:
                    return result
                raise ValueError("can't have unbuffered text I/O")
            if updating:
                buffer = BufferedRandom(raw, buffering)
            elif creating or writing or appending:
                buffer = BufferedWriter(raw, buffering)
            elif reading:
                buffer = BufferedReader(raw, buffering)
            else:
                raise ValueError("unknown mode: %r" % mode)
            result = buffer
            if binary:
                return result
            text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
            result = text
            text.mode = mode
            return result
        except:
            result.close()
            raise
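
The branches in this listing can be observed from outside by opening one file in different modes. This is only a sketch against the built-in open(); layer_name and rejected are helper names invented for the example.

```python
import os
import tempfile


def layer_name(path, mode, **kwargs):
    """Open path and return the class name of the object open() gives back."""
    with open(path, mode, **kwargs) as f:
        return type(f).__name__


def rejected(path, mode, **kwargs):
    """Return True if open() refuses this argument combination with ValueError."""
    try:
        open(path, mode, **kwargs).close()
    except ValueError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(b"data\n")

    names = {
        "r": layer_name(path, "r"),            # text mode
        "rb": layer_name(path, "rb"),          # buffered reading
        "ab": layer_name(path, "ab"),          # buffered writing
        "r+b": layer_name(path, "r+b"),        # updating
        "rb, buffering=0": layer_name(path, "rb", buffering=0),  # raw only
    }
    refused = {
        "rw": rejected(path, "rw"),            # two of read/write/append
        "rbt": rejected(path, "rbt"),          # text and binary together
        "rr": rejected(path, "rr"),            # repeated mode character
        "rb + encoding": rejected(path, "rb", encoding="utf-8"),
        "r, buffering=0": rejected(path, "r", buffering=0),
    }

for key, value in names.items():
    print(key, "->", value)
for key, value in refused.items():
    print(key, "rejected:", value)
```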

Note this part:

    raw = FileIO(file,
                 (creating and "x" or "") +
                 (reading and "r" or "") +
                 (writing and "w" or "") +
                 (appending and "a" or "") +
                 (updating and "+" or ""),
                 closefd, opener=opener)
    result = raw

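I did not find the source of FileIO among the Python files, but I suspected that FileIO() was written in C. That is indeed the case in CPython, where it lives in Modules/_io/fileio.c. So the lowest layer is C, while most of open()'s new functionality is handed to separate layers (the buffered classes and TextIOWrapper, which in _pyio.py are Python classes) that work together.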
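
The layering can be inspected on a live file object through its buffer and raw attributes. A minimal sketch, assuming CPython 3 and an arbitrary temporary file name:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "layers.txt")
    with open(path, "w", encoding="utf-8") as f:
        # Text layer -> buffered layer -> raw layer.
        chain = [
            type(f).__name__,
            type(f.buffer).__name__,
            type(f.buffer.raw).__name__,
        ]
        # The raw FileIO class comes from the C extension module _io.
        raw_module = type(f.buffer.raw).__module__

print(" -> ".join(chain))
print(raw_module)
```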

Also note this part:

    if binary:
        return result
    text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)

Encoding and related processing is handed to TextIOWrapper. TextIOWrapper is a class, and for the encoding part it has the following two methods:

    def _get_encoder(self):
        make_encoder = codecs.getincrementalencoder(self._encoding)
        self._encoder = make_encoder(self._errors)
        return self._encoder

    def _get_decoder(self):
        make_decoder = codecs.getincrementaldecoder(self._encoding)
        decoder = make_decoder(self._errors)
        if self._readuniversal:
            decoder = IncrementalNewlineDecoder(decoder, self._readtranslate)
        self._decoder = decoder
        return decoder

Seeing codecs here, everything falls into place.
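
The same codecs calls can be tried by hand. The sketch below mirrors the two methods in spirit only: it builds an incremental decoder and encoder for UTF-8 and wraps a decoder in io.IncrementalNewlineDecoder. The sample bytes are arbitrary, and this is not how TextIOWrapper is meant to be driven by user code.

```python
import codecs
import io

# An incremental decoder keeps state between calls, so a multi-byte
# character split across two reads is still decoded correctly.
make_decoder = codecs.getincrementaldecoder("utf-8")
decoder = make_decoder("strict")
encoded = "中".encode("utf-8")          # three bytes in UTF-8
first = decoder.decode(encoded[:2])      # incomplete character so far
second = decoder.decode(encoded[2:], final=True)

# Wrapping a decoder adds universal-newline translation on top.
newline_decoder = io.IncrementalNewlineDecoder(make_decoder("strict"), True)
translated = newline_decoder.decode(b"a\r\nb", final=True)

# The encoder side works the same way in the other direction.
make_encoder = codecs.getincrementalencoder("utf-8")
encoder = make_encoder("strict")
encoded_again = encoder.encode("中")

print(repr(first), repr(second))
print(repr(translated))
print(encoded_again == encoded)
```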

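To sum up: in Python 2, open() is implemented directly in C, which makes it fast, but it offers few features; for more you turn to the codecs library. Python 3 effectively drops that "low-end" open() and merges the codecs version with the built-in one. Binary mode still stops at the raw and buffered layers, whereas text mode, with or without the new parameters, goes through TextIOWrapper, which does its encoding and decoding with codecs; you just no longer need to import codecs yourself. In CPython these layers also have a C implementation (_io), so the speed cost is likely smaller than the Python listing suggests.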
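
One thing worth checking here is what happens when none of the new parameters are passed. In the small sketch below (the temporary file name is arbitrary), a plain text-mode open() still returns a TextIOWrapper with an encoding already chosen, which depends on the platform and interpreter settings.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "plain.txt")
    # No encoding, errors or newline argument is given.
    with open(path, "w") as f:
        wrapper_name = type(f).__name__
        default_encoding = f.encoding

print(wrapper_name)
print(default_encoding)
```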
