Some Details of the Built-in Function open() in Python 3

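In CPython, many of Python's built-in functions are written in C. You will not find their source among the installed Python files; it lives in the CPython source tree, in Python/bltinmodule.c. Note also that an installed Python (for example the Python34/include folder) contains the header bltinmodule.h but not bltinmodule.c: an installation ships the C headers, not the C sources.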

open() in Python 2.7: open(name[, mode[, buffering]])

name: name is the file name to be opened

mode: mode is a string indicating how the file is to be opened

buffering: The optional buffering argument specifies the file’s desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used.
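For comparison, Python 3's open() keeps this meaning of buffering (with -1 as the default) but adds restrictions: 0 is allowed only in binary mode, and 1 (line buffering) only in text mode. A small sketch, assuming a throwaway file in a temporary directory; buffering_demo() is a hypothetical helper written for this illustration:

```python
import io
import os
import tempfile


def buffering_demo(directory):
    """Hypothetical helper: probe how open() interprets buffering values."""
    path = os.path.join(directory, "demo.txt")

    # buffering=1 selects line buffering (text mode only in Python 3)
    with open(path, "w", buffering=1) as f:
        line_buffered = f.line_buffering

    # Default buffering on a regular file (not a tty): not line buffered
    with open(path, "w") as f:
        default_line_buffered = f.line_buffering

    # buffering=0 is accepted in binary mode and returns the raw FileIO
    with open(path, "rb", buffering=0) as f:
        raw_type = type(f)

    # Unbuffered text I/O is rejected with ValueError
    try:
        f = open(path, "r", buffering=0)
    except ValueError as exc:
        error = str(exc)
    else:
        f.close()
        error = None

    return line_buffered, default_line_buffered, raw_type, error


with tempfile.TemporaryDirectory() as d:
    line_buffered, default_line_buffered, raw_type, error = buffering_demo(d)

print(line_buffered, default_line_buffered, raw_type is io.FileIO, error)
```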

Its C implementation:

```c
static PyObject *
builtin_open(PyObject *self, PyObject *args, PyObject *kwds)
{
    return PyObject_Call((PyObject*)&PyFile_Type, args, kwds);
}
```

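This call simply invokes the Python 2 file type, PyFile_Type (defined in Objects/fileobject.c), with the same arguments, so open(...) behaves like file(...). Either way, open() in Python 2.7 is implemented directly in C.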

open() in Python 3.4: open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
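As one illustration of the parameters added since 2.7, opener (new in Python 3.3) lets the caller perform the low-level open: it receives the path and the os.O_* flags computed by open() and must return an open file descriptor. A minimal sketch; open_with_logging_opener() is a hypothetical helper, and the 0o600 permission bits are an arbitrary choice:

```python
import os
import tempfile


def open_with_logging_opener(path):
    """Hypothetical helper: write through a custom opener and record its calls."""
    calls = []

    def opener(file, flags):
        # open() passes the path and the flags it derived from the mode;
        # the opener must return an OS-level file descriptor.
        calls.append((file, flags))
        return os.open(file, flags, 0o600)

    with open(path, "w", encoding="utf-8", opener=opener) as f:
        f.write("hello")
    with open(path, encoding="utf-8") as f:
        content = f.read()
    return calls, content


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "opener.txt")
    calls, content = open_with_logging_opener(target)

print(len(calls), content)
```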

I won't go through each parameter here. Note that this built-in open() is actually io.open().

The io module documentation says the same:

io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
This is an alias for the builtin open() function.

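Naturally, builtin_open() no longer exists in the C source of the builtins module. In CPython 3, open() lives in the C module _io (Modules/_io/_iomodule.c), and builtins.open is set to it when the interpreter starts up.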

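Comparing the two versions, the 3.4 open() adds a lot of functionality over 2.7. Most notably, encoding is now a parameter of open(), so handling UTF-8 no longer requires import codecs every time, as it did in Python 2. One might guess that an open() written in Python rather than C must be slower. That was true in Python 3.0, whose io library was written in pure Python and proved to be a bottleneck. Since Python 3.1, however, CPython uses a C rewrite of the io library (the _io module), and the pure-Python version survives as Lib/_pyio.py, a reference implementation.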
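The difference for UTF-8 handling can be shown side by side: the Python 2 idiom codecs.open() and the Python 3 open(..., encoding=...) read the same text. A sketch using a temporary file; codecs.open() is kept in Python 3 for compatibility, though newer releases may flag it as deprecated:

```python
import codecs
import os
import tempfile

text = "中文 UTF-8"

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "utf8.txt")

    # Python 3 style: encoding is an argument of the built-in open()
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Python 2 style (still available in Python 3): codecs.open()
    with codecs.open(path, "r", encoding="utf-8") as f:
        via_codecs = f.read()

    with open(path, encoding="utf-8") as f:
        via_open = f.read()

print(via_codecs == via_open == text)
```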

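Let's look at how open() is implemented in Python 3, using the pure-Python version, which is easier to read than the C code.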

The source of this open() is in Python34/Lib/_pyio.py:

```python
# Lib/_pyio.py (Python 3.4), docstring omitted. FileIO, BufferedRandom,
# BufferedWriter, BufferedReader, TextIOWrapper, DEFAULT_BUFFER_SIZE and os
# are module-level names in _pyio.
def open(file, mode="r", buffering=-1, encoding=None, errors=None,
         newline=None, closefd=True, opener=None):
    if not isinstance(file, (str, bytes, int)):
        raise TypeError("invalid file: %r" % file)
    if not isinstance(mode, str):
        raise TypeError("invalid mode: %r" % mode)
    if not isinstance(buffering, int):
        raise TypeError("invalid buffering: %r" % buffering)
    if encoding is not None and not isinstance(encoding, str):
        raise TypeError("invalid encoding: %r" % encoding)
    if errors is not None and not isinstance(errors, str):
        raise TypeError("invalid errors: %r" % errors)
    modes = set(mode)
    if modes - set("axrwb+tU") or len(mode) > len(modes):
        raise ValueError("invalid mode: %r" % mode)
    creating = "x" in modes
    reading = "r" in modes
    writing = "w" in modes
    appending = "a" in modes
    updating = "+" in modes
    text = "t" in modes
    binary = "b" in modes
    if "U" in modes:
        if creating or writing or appending:
            raise ValueError("can't use U and writing mode at once")
        import warnings
        warnings.warn("'U' mode is deprecated",
                      DeprecationWarning, 2)
        reading = True
    if text and binary:
        raise ValueError("can't have text and binary mode at once")
    if creating + reading + writing + appending > 1:
        raise ValueError("can't have read/write/append mode at once")
    if not (creating or reading or writing or appending):
        raise ValueError("must have exactly one of read/write/append mode")
    if binary and encoding is not None:
        raise ValueError("binary mode doesn't take an encoding argument")
    if binary and errors is not None:
        raise ValueError("binary mode doesn't take an errors argument")
    if binary and newline is not None:
        raise ValueError("binary mode doesn't take a newline argument")
    raw = FileIO(file,
                 (creating and "x" or "") +
                 (reading and "r" or "") +
                 (writing and "w" or "") +
                 (appending and "a" or "") +
                 (updating and "+" or ""),
                 closefd, opener=opener)
    result = raw
    try:
        line_buffering = False
        if buffering == 1 or buffering < 0 and raw.isatty():
            buffering = -1
            line_buffering = True
        if buffering < 0:
            buffering = DEFAULT_BUFFER_SIZE
            try:
                bs = os.fstat(raw.fileno()).st_blksize
            except (OSError, AttributeError):
                pass
            else:
                if bs > 1:
                    buffering = bs
        if buffering < 0:
            raise ValueError("invalid buffering size")
        if buffering == 0:
            if binary:
                return result
            raise ValueError("can't have unbuffered text I/O")
        if updating:
            buffer = BufferedRandom(raw, buffering)
        elif creating or writing or appending:
            buffer = BufferedWriter(raw, buffering)
        elif reading:
            buffer = BufferedReader(raw, buffering)
        else:
            raise ValueError("unknown mode: %r" % mode)
        result = buffer
        if binary:
            return result
        text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
        result = text
        text.mode = mode
        return result
    except:
        result.close()
        raise
```
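The mode validation at the top of this function (mirrored by the C version) can be observed directly. A sketch; try_open() is a hypothetical helper that reports which exception open() raises for a given mode:

```python
import os
import tempfile


def try_open(path, mode):
    """Hypothetical helper: return the exception class name raised, or 'ok'."""
    try:
        with open(path, mode):
            pass
    except (ValueError, OSError) as exc:
        return type(exc).__name__
    return "ok"


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "modes.bin")
    results = {
        "x": try_open(path, "x"),        # exclusive creation succeeds
        "x again": try_open(path, "x"),  # the file now exists
        "rw": try_open(path, "rw"),      # more than one of r/w/a/x
        "rtb": try_open(path, "rtb"),    # text and binary at once
        "rr": try_open(path, "rr"),      # repeated mode character
        "q": try_open(path, "q"),        # unknown mode character
    }

print(results)
```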

Pay attention to this part:

```python
raw = FileIO(file,
             (creating and "x" or "") +
             (reading and "r" or "") +
             (writing and "w" or "") +
             (appending and "a" or "") +
             (updating and "+" or ""),
             closefd, opener=opener)
result = raw
```
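To make the and/or chain in this excerpt concrete, here is a hypothetical helper, raw_mode(), which is not part of _pyio or io. It reproduces the chain for modes that have already passed validation. It shows that 't' and 'b' never reach FileIO, which always works in bytes, and that 'U' is folded into reading. Note that `cond and "x" or ""` is the older spelling of `"x" if cond else ""`; it works here because the non-empty string is truthy.

```python
def raw_mode(mode):
    """Hypothetical helper: rebuild the mode string _pyio.open() passes to
    FileIO. Assumes `mode` is already valid; no validation is done here."""
    modes = set(mode)
    creating = "x" in modes
    reading = "r" in modes or "U" in modes  # 'U' implies reading
    writing = "w" in modes
    appending = "a" in modes
    updating = "+" in modes
    # 't' and 'b' are dropped: the raw layer is always binary
    return ((creating and "x" or "") +
            (reading and "r" or "") +
            (writing and "w" or "") +
            (appending and "a" or "") +
            (updating and "+" or ""))


for m in ("rt", "rb", "w+b", "a", "xb", "U"):
    print(m, "->", raw_mode(m))
```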

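I couldn't find the Python source of FileIO, and suspected it was written in C. That guess is right: in 3.4, _pyio.py imports FileIO from the C module _io (its source is Modules/_io/fileio.c). So the raw file layer is C, and most of open()'s new functionality is layered on top of it by the other classes in the io stack: the buffered readers/writers and the text wrapper.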

Now note this part:

```python
if binary:
    return result
text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
```
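Which branch open() takes can be seen from the types of the objects it returns. A sketch with a throwaway file in a temporary directory; the class names are the public io ones, which in CPython are the C classes from _io:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "layers.txt")

    with open(path, "wb") as f:
        write_layer = type(f)          # writing: BufferedWriter over FileIO
    with open(path, "r+b") as f:
        update_layer = type(f)         # '+' selects BufferedRandom
    with open(path, "rb") as f:
        # binary mode returns the buffer itself; .raw is the FileIO
        read_layers = (type(f), type(f.raw))
    with open(path, "r", encoding="utf-8") as f:
        # text mode adds a TextIOWrapper on top of the buffered reader
        text_layers = (type(f), type(f.buffer), type(f.buffer.raw))

print(write_layer.__name__, update_layer.__name__)
print([t.__name__ for t in read_layers])
print([t.__name__ for t in text_layers])
```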

Encoding and related processing is handed to TextIOWrapper. TextIOWrapper is a class, and for encoding it has the following two methods:

```python
# Methods of _pyio.TextIOWrapper (excerpt); codecs and
# IncrementalNewlineDecoder are module-level names in _pyio.
def _get_encoder(self):
    make_encoder = codecs.getincrementalencoder(self._encoding)
    self._encoder = make_encoder(self._errors)
    return self._encoder

def _get_decoder(self):
    make_decoder = codecs.getincrementaldecoder(self._encoding)
    decoder = make_decoder(self._errors)
    if self._readuniversal:
        decoder = IncrementalNewlineDecoder(decoder, self._readtranslate)
    self._decoder = decoder
    return decoder
```
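Why incremental codecs? A file is read in chunks, and a chunk boundary can fall in the middle of a multi-byte character. The sketch below uses only codecs.getincrementaldecoder() and io.IncrementalNewlineDecoder, the C counterpart of the class used in _get_decoder(). It shows the decoders holding back incomplete input until the next call:

```python
import codecs
import io

data = "中文".encode("utf-8")  # 6 bytes: e4 b8 ad e6 96 87

# What _get_decoder() builds: a stateful, incremental UTF-8 decoder
decoder = codecs.getincrementaldecoder("utf-8")("strict")
pieces = [
    decoder.decode(data[:2]),              # incomplete character -> ""
    decoder.decode(data[2:4]),             # completes "中", keeps e6 pending
    decoder.decode(data[4:], final=True),  # completes "文"
]
print(pieces)

# With universal newlines, the decoder is wrapped so that "\r\n" is
# translated even when "\r" and "\n" arrive in different chunks
nl_decoder = io.IncrementalNewlineDecoder(
    codecs.getincrementaldecoder("utf-8")("strict"), True)
first = nl_decoder.decode(b"a\r")                # "\r" is held back
second = nl_decoder.decode(b"\nb", final=True)   # "\r\n" becomes "\n"
print(repr(first + second))
```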

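Seeing codecs, everything falls into place: encoding and decoding are delegated to the incremental encoders and decoders of the codecs registry.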

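In summary, Python 2's open() is implemented directly in C. It is fast but offers few features, and for more you turn to the codecs library. Python 3 effectively drops the "low-end" open() and merges the codecs-style functionality into the built-in one. That does not make it a Python-speed function: CPython's open() and the whole io stack (FileIO, the buffered classes, TextIOWrapper) are implemented in C in the _io module, and _pyio.py only mirrors their logic. Nor do codecs come into play only when you pass the new parameters. Every text-mode file goes through TextIOWrapper with some encoding (a locale-dependent default when encoding is omitted). What the new parameters save you is the explicit import codecs.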
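A quick check of the last point: a text-mode file has an encoding even when none is passed, and that name resolves through the codec registry. The implicit value depends on the platform and locale settings, so the sketch only prints it:

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "default.txt")

    # No encoding argument: text mode still picks a codec
    with open(path, "w") as f:
        implicit = f.encoding
    with open(path, "w", encoding="utf-8") as f:
        explicit = f.encoding

# The codec registry normalizes both names (e.g. "UTF-8" -> "utf-8")
print(implicit, codecs.lookup(implicit).name)
print(explicit, codecs.lookup(explicit).name)
```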
