0. References
The following references were consulted:
- Official Python documentation, Code Objects (C API): https://docs.python.org/3/c-api/code.html
- Official Python documentation, inspect module: https://docs.python.org/3/library/inspect.html
- CPython source on GitHub, Lib/opcode.py: https://github.com/python/cpython/blob/main/Lib/opcode.py
- CPython source on GitHub, Include/opcode.h: https://github.com/python/cpython/blob/main/Include/opcode.h
- Code Objects: https://nanguage.gitbook.io/inside-python-vm-cn/5.-code-objects
- Notes on code objects in Python (CSDN blog, in Chinese): https://blog.csdn.net/jpch89/article/details/86764245
- Description of Python opcodes: https://unpyc.sourceforge.net/Opcodes.html
1. Understanding Code Objects
The official Python documentation describes code objects as follows:
Code objects are a low-level detail of the CPython implementation. Each one represents a chunk of executable code that hasn’t yet been bound into a function.
This description is terse and hard to follow on its own, so the examples below show concretely what a Python code object is.
1.1. A simple example
To introduce the subject of this article, the code object, let us start with a small example:
def func():
    pass

print(type(func))
print(func.__code__)  # the key line: the function's code object
Output:
<class 'function'>
<code object func at 0x7fc6277269d0, file "/home/xd/project/learn_python/test/test.py", line 1>
The code object printed through the __code__ attribute in this example is the object this article is about.
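To make the documentation's phrase "a chunk of executable code that hasn't yet been bound into a function" more concrete, here is a minimal sketch using only the standard library. The names `answer` and `rebound` and the filename "<example>" are made up for illustration. It suggests that a function's `__code__` is a `types.CodeType` instance, that the same code object can be bound into a new function with `types.FunctionType`, and that `compile()` also produces a code object.

```python
import types

def answer():
    return 42

# A function's __code__ attribute is an instance of types.CodeType.
code = answer.__code__
print(isinstance(code, types.CodeType))

# Bind the same code object into a new function object (hypothetical name).
rebound = types.FunctionType(code, globals(), "rebound")
print(rebound())

# compile() also yields a code object, here for module-level source text.
module_code = compile("x = 1 + 2", "<example>", "exec")
namespace = {}
exec(module_code, namespace)
print(namespace["x"])
```

The function object adds the pieces a code object lacks on its own, such as the globals it runs against and its default arguments.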
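1.2. The official documentation on code objects
The official Python documentation covers code objects in the reference for the inspect module:
Official Python inspect documentation: https://docs.python.org/3/library/inspect.html
Its section on types and members lists the attributes of code objects, each followed by an explanation of its meaning. This is the most authoritative description and is worth reading closely when needed.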
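1.3. Checking the attributes of a code object in practice
We can list the attributes of a code object with dir(). The result contains quite a few attribute names starting with "co_"; these are the ones to focus on.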
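A minimal sketch of that check, assuming a trivial function named `func` as in the earlier example. The exact set of names depends on the Python version, so no output is shown here.

```python
def func():
    pass

# All attribute names of the code object, including dunder methods.
all_attrs = dir(func.__code__)
print(all_attrs)

# Keep only the names starting with "co_", which describe the compiled code.
co_attrs = [name for name in all_attrs if name.startswith("co_")]
print(co_attrs)
```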
2. The attributes of a code object
2.1. Listing the attributes of a Python code object
We write a simple example function and print the name and value of every attribute starting with "co_":
def func(a, b=3, *args, **kwargs):
    c = a + b
    mm = 111
    str_mm = "test test"
    print(a + b + mm)
    return c

# Print every attribute of the code object whose name starts with "co_"
for attr in dir(func.__code__):
    if attr.startswith('co_'):
        print(f"{attr}:\t{getattr(func.__code__, attr)}")
Output:
co_argcount: 2
co_cellvars: ()
co_code: b'|\x00|\x01\x17\x00}\x04d\x01}\x05d\x02}\x06t\x00|\x00|\x01\x17\x00|\x05\x17\x00\x83\x01\x01\x00|\x04S\x00'
co_consts: (None, 111, 'test test')
co_filename: /home/xd/project/learn_python/test/test.py
co_firstlineno: 1
co_flags: 79
co_freevars: ()
co_kwonlyargcount: 0
co_lnotab: b'\x00\x01\x08\x01\x04\x01\x04\x01\x10\x01'
co_name: func
co_names: ('print',)
co_nlocals: 7
co_posonlyargcount: 0
co_stacksize: 3
co_varnames: ('a', 'b', 'args', 'kwargs', 'c', 'mm', 'str_mm')
Some of these attributes are easy to understand and others are not; the next example groups them by category.
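As one example of a less obvious attribute, co_flags is a bitmap. The value 79 shown above is 64 + 8 + 4 + 2 + 1. The sketch below decodes such a value with the CO_* constants from the inspect module. The helper `decode_flags` and the list `FLAG_NAMES` are hypothetical names introduced here, and the lookup uses `getattr` with a fallback on the assumption that some constants (for instance CO_NOFREE) may not exist on every Python version.

```python
import inspect

def func(a, b=3, *args, **kwargs):
    return a + b

# Flag names documented for the inspect module; some may be missing
# on a given Python version, so they are looked up defensively.
FLAG_NAMES = [
    "CO_OPTIMIZED", "CO_NEWLOCALS", "CO_VARARGS", "CO_VARKEYWORDS",
    "CO_NESTED", "CO_GENERATOR", "CO_NOFREE", "CO_COROUTINE",
    "CO_ITERABLE_COROUTINE", "CO_ASYNC_GENERATOR",
]

def decode_flags(flags):
    """Return the names of the known CO_* bits set in flags."""
    names = []
    for name in FLAG_NAMES:
        bit = getattr(inspect, name, None)
        if bit is not None and flags & bit:
            names.append(name)
    return names

print(decode_flags(func.__code__.co_flags))
print(decode_flags(79))
```

CO_VARARGS and CO_VARKEYWORDS are set here because the function takes *args and **kwargs.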
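2.2. Verifying the attributes of a Python code object
This part is based on the following video: https://www.bilibili.com/video/BV12i4y1C7MH/
The example code here is almost the same as above; the printed attributes are grouped by category and annotated with the official descriptions: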
def func(a, b=3, *args, **kwargs):
    c = a + b
    mm = 111
    str_mm = "test test"
    print(a + b + mm)
    return c

code = func.__code__
print(f"{code.co_code = }") # string of raw compiled bytecode
print(f"{len(code.co_code) = }")
print(f"{code.co_name = }") # name with which this code object was defined
# co_filename: name of file in which this code object was created
print(f"{code.co_filename = }")
# co_lnotab: encoded mapping of line numbers to bytecode indices
print(f"{code.co_lnotab = }")
# co_flags: bitmap of CO_* flags, read more:
# https://docs.python.org/3/library/inspect.html#inspect-module-co-flags
print(f"{code.co_flags = }")
print(f"{code.co_stacksize = }") # virtual machine stack space required
# number of arguments (not including keyword only arguments, * or ** args)
print(f"{code.co_argcount = }")
# co_posonlyargcount: number of positional only arguments
print(f"{code.co_posonlyargcount = }")
# co_kwonlyargcount: number of keyword only arguments (not including ** arg)
print(f"{code.co_kwonlyargcount = }")
print(f"{code.co_nlocals = }") # number of local variables
# co_varnames: tuple of names of arguments and local variables
print(f"{code.co_varnames = }")
# co_names: tuple of names other than arguments and function locals
print(f"{code.co_names = }")
# co_cellvars: tuple of names of cell variables (referenced by containing scopes)
print(f"{code.co_cellvars = }")
# co_freevars: tuple of names of free variables (referenced via a function’s closure)
print(f"{code.co_freevars = }")
print(f"{code.co_consts = }") # tuple of constants used in the bytecode
Output (the comments were removed here for readability):
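2.3. The bytecode object in the CPython source
So far we have established which attributes the bytecode object has by experiment and from the official documentation. To confirm this at the source, we go into the CPython source code and look at how the object is defined.
Note: all code below is excerpted from the 3.8 branch of the CPython source; the C implementation may differ on other branches.
The struct for the Python bytecode object is defined in the file Include/code.h, as follows: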
/* Bytecode object */
typedef struct {
    PyObject_HEAD
    int co_argcount;            /* #arguments, except *args */
    int co_posonlyargcount;     /* #positional only arguments */
    int co_kwonlyargcount;      /* #keyword only arguments */
    int co_nlocals;             /* #local variables */
    int co_stacksize;           /* #entries needed for evaluation stack */
    int co_flags;               /* CO_..., see below */
    int co_firstlineno;         /* first source line number */
    PyObject *co_code;          /* instruction opcodes */
    PyObject *co_consts;        /* list (constants used) */
    PyObject *co_names;         /* list of strings (names used) */
    PyObject *co_varnames;      /* tuple of strings (local variable names) */
    PyObject *co_freevars;      /* tuple of strings (free variable names) */
    PyObject *co_cellvars;      /* tuple of strings (cell variable names) */
    /* The rest aren't used in either hash or comparisons, except for co_name,
       used in both. This is done to preserve the name and line number
       for tracebacks and debuggers; otherwise, constant de-duplication
       would collapse identical functions/lambdas defined on different lines.
    */
    Py_ssize_t *co_cell2arg;    /* Maps cell vars which are arguments. */
    PyObject *co_filename;      /* unicode (where it was loaded from) */
    PyObject *co_name;          /* unicode (name, for reference) */
    PyObject *co_lnotab;        /* string (encoding addr<->lineno mapping) See
                                   Objects/lnotab_notes.txt for details. */
    void *co_zombieframe;       /* for optimization only (see frameobject.c) */
    PyObject *co_weakreflist;   /* to support weakrefs to code objects */
    /* Scratch space for extra data relating to the code object.
       Type is a void* to keep the format private in codeobject.c to force
       people to go through the proper APIs. */
    void *co_extra;

    /* Per opcodes just-in-time cache
     *
     * To reduce cache size, we use indirect mapping from opcode index to
     * cache object:
     *   cache = co_opcache[co_opcache_map[next_instr - first_instr] - 1]
     */

    // co_opcache_map is indexed by (next_instr - first_instr).
    //  * 0 means there is no cache for this opcode.
    //  * n > 0 means there is cache in co_opcache[n-1].
    unsigned char *co_opcache_map;
    _PyOpcache *co_opcache;
    int co_opcache_flag;  // used to determine when create a cache.
    unsigned char co_opcache_size;  // length of co_opcache.
} PyCodeObject;
The familiar co_argcount, co_posonlyargcount, co_kwonlyargcount, co_code and so on are all defined here; the source code is, after all, the most authoritative reference.
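Seen from Python, several of these fields are related to one another. The sketch below reuses the example function from section 2 and checks two relationships that hold for it: co_nlocals equals the length of co_varnames, and the first co_argcount entries of co_varnames are the positional parameters. It also illustrates that the attributes are read-only from Python. The variable `readonly_error` is a name introduced here for illustration.

```python
def func(a, b=3, *args, **kwargs):
    c = a + b
    mm = 111
    str_mm = "test test"
    print(a + b + mm)
    return c

code = func.__code__

# For this function, co_nlocals counts the names stored in co_varnames.
print(code.co_nlocals == len(code.co_varnames))

# The first co_argcount names are the positional parameters.
print(code.co_varnames[:code.co_argcount])

# The fields are exposed read-only: direct assignment is rejected.
readonly_error = None
try:
    code.co_argcount = 5
except AttributeError as exc:
    readonly_error = exc
    print("read-only:", exc)
```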
3. Comparing a code object's co_code with the disassembly
With a small change to the example above:
import dis

def func(a, b=3, *args, **kwargs):
    c = a + b
    mm = 111
    str_mm = "test test"
    print(a + b + mm)
    return c

code = func.__code__
print(f"{len(code.co_code) = }")
print(f"{'*' * 90}")
print(f"{code.co_code}")
print(f"{'*' * 90}")
# dis.dis() prints the disassembly itself and returns None,
# so call it directly instead of wrapping it in print()
dis.dis(func)
Output:
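As we can see, reading the co_code attribute of a code object directly is very inconvenient. We normally disassemble it with dis.dis() before reading it, and this is also the main use case of dis.dis().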
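To see how the raw bytes and the disassembly line up, the following sketch walks the instructions with dis.get_instructions() and prints, for each one, the two bytes found at its offset in co_code next to the decoded opcode name. It assumes a Python version that uses two-byte instruction units (3.6 and later); opcode names and byte values differ between versions, so no output is shown.

```python
import dis

def func(a, b=3, *args, **kwargs):
    c = a + b
    mm = 111
    str_mm = "test test"
    print(a + b + mm)
    return c

code = func.__code__
instructions = list(dis.get_instructions(func))

for ins in instructions:
    # The two raw bytes stored at this instruction's offset in co_code.
    raw = code.co_code[ins.offset:ins.offset + 2]
    print(f"{ins.offset:>4}  {raw.hex()}  {ins.opname:<20} {ins.argrepr}")
```

Each row pairs an offset into co_code with a readable opcode name and argument, which is the same information dis.dis() prints in its own layout.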
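Note:
The dis module is part of the Python standard library. It provides disassembly so that Python bytecode is easier to read.
For more details, see my separate article on the topic.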
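The dis module can also decode the line-number information that co_lnotab stores in raw form. A minimal sketch, assuming the same example function: dis.findlinestarts() yields pairs of bytecode offset and source line number. Entries without a line number are skipped, on the assumption that some Python versions may report None for them.

```python
import dis

def func(a, b=3, *args, **kwargs):
    c = a + b
    mm = 111
    str_mm = "test test"
    print(a + b + mm)
    return c

code = func.__code__

# Each pair says: the bytecode at this offset begins this source line.
line_starts = [
    (offset, line)
    for offset, line in dis.findlinestarts(code)
    if line is not None
]

for offset, line in line_starts:
    print(f"bytecode offset {offset:>3} starts source line {line}")
```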