Python runtime environment
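The runtime environment is a global concept, while an execution environment is a single stack frame.
Once the runtime environment is ready, the function that starts executing code is PyEval_EvalFrame: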
PyObject *
PyEval_EvalFrame(PyFrameObject *f) {
    /* This is for backward compatibility with extension modules that
       used this API; core interpreter code should call
       PyEval_EvalFrameEx() */
    return PyEval_EvalFrameEx(f, 0);
}

PyObject *
PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
{
    PyThreadState *tstate = PyThreadState_GET();
    return tstate->interp->eval_frame(f, throwflag);
}

PyInterpreterState *
PyInterpreterState_New(void)
{
    //...
    interp->eval_frame = _PyEval_EvalFrameDefault;
    //...
}

PyObject *
_PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
{
    //...
    co = f->f_code;
    names = co->co_names;
    consts = co->co_consts;
    fastlocals = f->f_localsplus;
    freevars = f->f_localsplus + co->co_nlocals;
    first_instr = (_Py_CODEUNIT *) PyBytes_AS_STRING(co->co_code);
    //...
    next_instr = first_instr;
    if (f->f_lasti >= 0) {
        next_instr += f->f_lasti / sizeof(_Py_CODEUNIT) + 1;
    }
    stack_pointer = f->f_stacktop;
    f->f_stacktop = NULL; /* remains NULL unless yield suspends frame */
}
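The call chain above ends in a function pointer stored on the interpreter, so the engine can be replaced without changing any caller. The following is a minimal sketch of that indirection, not CPython code: every Toy* type and toy_* function is a hypothetical stand-in, and "evaluating" a frame is reduced to returning an integer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PyFrameObject, PyInterpreterState and
   PyThreadState; none of these Toy* names exist in CPython. */
typedef struct ToyFrame { int value; } ToyFrame;
typedef int (*ToyEvalFunc)(ToyFrame *f, int throwflag);
typedef struct ToyInterp { ToyEvalFunc eval_frame; } ToyInterp;
typedef struct ToyThread { ToyInterp *interp; } ToyThread;

/* Plays the role of the thread state returned by PyThreadState_GET(). */
ToyThread *toy_current_thread = NULL;
int toy_traced_calls = 0;

/* Default engine: here "evaluating" a frame just returns its value. */
int toy_eval_default(ToyFrame *f, int throwflag)
{
    (void)throwflag;
    return f->value;
}

/* Replacement engine: counts calls, then delegates to the default. */
int toy_eval_traced(ToyFrame *f, int throwflag)
{
    toy_traced_calls++;
    return toy_eval_default(f, throwflag);
}

/* Like PyInterpreterState_New: install the default engine. */
void toy_interp_init(ToyInterp *interp)
{
    interp->eval_frame = toy_eval_default;
}

/* Like PyEval_EvalFrameEx: dispatch through the interpreter's pointer. */
int toy_eval_frame_ex(ToyFrame *f, int throwflag)
{
    ToyThread *tstate = toy_current_thread;
    assert(tstate != NULL && tstate->interp != NULL);
    return tstate->interp->eval_frame(f, throwflag);
}

/* Like PyEval_EvalFrame: the backward-compatible wrapper. */
int toy_eval_frame(ToyFrame *f)
{
    return toy_eval_frame_ex(f, 0);
}
```

Assigning toy_eval_traced to interp.eval_frame changes what every later toy_eval_frame call runs, while the two wrapper functions stay untouched.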
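The co_code field of a PyCodeObject holds the bytecode, that is, the opcodes together with their arguments.
In Python 3, co_code is a bytes object (PyBytesObject; it was a PyStringObject in Python 2), and its
underlying byte array stores the actual instructions, which is why the code above reads it with PyBytes_AS_STRING.
The Python virtual machine executes a frame by repeating the following steps on co_code:
- fetch an instruction
- execute it
- go back to the first step
Here first_instr points to the first instruction, next_instr points to the next instruction to execute, and
f_lasti records the offset of the most recently executed instruction (it is negative if nothing has run yet).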
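The bookkeeping between first_instr, next_instr and f_lasti can be sketched in isolation. The block below is a toy model under stated assumptions: instructions are 16-bit units with the opcode in the low byte and the argument in the high byte, and ToyCursor, TOY_PACK and the toy_* functions are hypothetical names introduced only for illustration. The opcode numbers used when trying it out are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One instruction: opcode in the low byte, argument in the high byte. */
typedef uint16_t toy_codeunit;

#define TOY_OPCODE(word)  ((word) & 255)
#define TOY_OPARG(word)   ((word) >> 8)
#define TOY_PACK(op, arg) ((toy_codeunit)(((arg) << 8) | (op)))

typedef struct ToyCursor {
    const toy_codeunit *first_instr;  /* start of the code array */
    const toy_codeunit *next_instr;   /* instruction to fetch next */
    int f_lasti;                      /* byte offset of the last fetched
                                         instruction, -1 before the start */
} ToyCursor;

/* Position the cursor. A non-negative f_lasti means "resume after the
   instruction at that byte offset", as in the frame setup code. */
void toy_cursor_init(ToyCursor *c, const toy_codeunit *code, int f_lasti)
{
    assert(c != NULL && code != NULL);
    c->first_instr = code;
    c->next_instr = code;
    c->f_lasti = f_lasti;
    if (f_lasti >= 0) {
        c->next_instr += f_lasti / (int)sizeof(toy_codeunit) + 1;
    }
}

/* Fetch step: remember where we are, decode, then advance. */
void toy_fetch(ToyCursor *c, int *opcode, int *oparg)
{
    toy_codeunit word = *c->next_instr;
    c->f_lasti = (int)((size_t)(c->next_instr - c->first_instr)
                       * sizeof(toy_codeunit));
    *opcode = TOY_OPCODE(word);
    *oparg = TOY_OPARG(word);
    c->next_instr++;
}
```

Initializing a second cursor with the f_lasti saved by the first one continues with the instruction that follows, which is the situation a suspended frame is in when it resumes.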
for (;;) {
    if (_Py_atomic_load_relaxed(&eval_breaker)) {
        if (_Py_OPCODE(*next_instr) == SETUP_FINALLY ||
            _Py_OPCODE(*next_instr) == YIELD_FROM) {
            goto fast_next_opcode;
        }
        if (_Py_atomic_load_relaxed(&pendingcalls_to_do)) {
            if (Py_MakePendingCalls() < 0)
                goto error;
        }
#ifdef WITH_THREAD
        if (_Py_atomic_load_relaxed(&gil_drop_request)) {
            /* Give another thread a chance */
            if (PyThreadState_Swap(NULL) != tstate)
                Py_FatalError("ceval: tstate mix-up");
            drop_gil(tstate);
            /* Other threads may run now */
            take_gil(tstate);
            /* Check if we should make a quick exit. */
            if (_Py_Finalizing && _Py_Finalizing != tstate) {
                drop_gil(tstate);
                PyThread_exit_thread();
            }
            if (PyThreadState_Swap(tstate) != NULL)
                Py_FatalError("ceval: orphan tstate");
        }
#endif
        /* Check for asynchronous exceptions. */
        if (tstate->async_exc != NULL) {
            PyObject *exc = tstate->async_exc;
            tstate->async_exc = NULL;
            UNSIGNAL_ASYNC_EXC();
            PyErr_SetNone(exc);
            Py_DECREF(exc);
            goto error;
        }
    }

fast_next_opcode:
    f->f_lasti = INSTR_OFFSET();
    NEXTOPARG();

    switch (opcode) {

    TARGET(NOP)
        FAST_DISPATCH();

    TARGET(LOAD_FAST) {
        PyObject *value = GETLOCAL(oparg);
        if (value == NULL) {
            format_exc_check_arg(PyExc_UnboundLocalError,
                                 UNBOUNDLOCAL_ERROR_MSG,
                                 PyTuple_GetItem(co->co_varnames, oparg));
            goto error;
        }
        Py_INCREF(value);
        PUSH(value);
        FAST_DISPATCH();
    }

    PREDICTED(LOAD_CONST);
    TARGET(LOAD_CONST) {
        PyObject *value = GETITEM(consts, oparg);
        Py_INCREF(value);
        PUSH(value);
        FAST_DISPATCH();
    }
#....
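Going one step further: once Python has fetched an instruction and its argument, it finds the matching case
in the switch, and that case is the implementation of the instruction. After the code in the case has run, the loop continues with the next instruction.
Whether execution succeeds or fails, the reason for leaving the loop is recorded in why (the frame's return value itself is kept in a separate variable, retval).
The whole process is a for loop wrapped around a switch/case, and every instruction is executed inside _PyEval_EvalFrameDefault.
Note: why is the status code that drives Python's exception handling and block unwinding.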
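The for loop plus switch/case structure described above can be reduced to a very small stack machine. This is a sketch under stated assumptions, not the real evaluation loop: the opcode set, the ToyInstr and ToyLocal types and the three-valued toy_why enum are all hypothetical, values are plain long integers instead of objects, and argument indexes are trusted rather than bounds-checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_op  { TOY_NOP, TOY_LOAD_CONST, TOY_LOAD_FAST, TOY_ADD, TOY_RETURN };
enum toy_why { TOY_WHY_NOT, TOY_WHY_EXCEPTION, TOY_WHY_RETURN };

typedef struct ToyInstr { uint8_t opcode; uint8_t oparg; } ToyInstr;
typedef struct ToyLocal { int bound; long value; } ToyLocal;

#define TOY_STACK_MAX 16

/* Runs code until it returns or fails. The reason for leaving the loop is
   the function result; the computed value is stored through retval. */
enum toy_why toy_eval(const ToyInstr *code, size_t ncode,
                      const long *consts, const ToyLocal *fastlocals,
                      long *retval)
{
    long stack[TOY_STACK_MAX];
    size_t sp = 0;   /* number of values on the stack */
    size_t pc = 0;   /* index of the next instruction */
    enum toy_why why = TOY_WHY_NOT;

    assert(code != NULL && retval != NULL);
    for (;;) {
        ToyInstr ins;
        if (pc >= ncode) {            /* ran off the end of the code */
            why = TOY_WHY_EXCEPTION;
            break;
        }
        ins = code[pc++];             /* fetch */
        switch (ins.opcode) {         /* execute */
        case TOY_NOP:
            continue;                 /* back to the top of the loop */
        case TOY_LOAD_CONST:
            if (sp == TOY_STACK_MAX) { why = TOY_WHY_EXCEPTION; break; }
            stack[sp++] = consts[ins.oparg];
            continue;
        case TOY_LOAD_FAST:
            /* An unbound local is an error, like UnboundLocalError. */
            if (!fastlocals[ins.oparg].bound || sp == TOY_STACK_MAX) {
                why = TOY_WHY_EXCEPTION;
                break;
            }
            stack[sp++] = fastlocals[ins.oparg].value;
            continue;
        case TOY_ADD:
            if (sp < 2) { why = TOY_WHY_EXCEPTION; break; }
            stack[sp - 2] += stack[sp - 1];
            sp--;
            continue;
        case TOY_RETURN:
            if (sp < 1) { why = TOY_WHY_EXCEPTION; break; }
            *retval = stack[--sp];
            why = TOY_WHY_RETURN;
            break;
        default:                      /* unknown opcode */
            why = TOY_WHY_EXCEPTION;
            break;
        }
        break;  /* reached only when a case left the switch with why set */
    }
    return why;
}
```

A successful case uses continue to go straight back to the fetch step, while any case that sets why falls out of the switch and ends the loop, so the caller can tell a normal return from a failure by the status alone.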
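Threads and processes
PyFrameObject showed us the stack frame in which a function runs, PyCodeObject
showed us the code itself, and _PyEval_EvalFrameDefault is the entry point that executes that code.
But what lies outside the stack frame? Anyone familiar with how an operating system runs programs will guess the answer: a thread.
The Python virtual machine emulates a physical CPU executing instructions, and threads are the unit of execution on it.
A Python thread is a native operating system thread with a thin wrapper on top.
A thread normally belongs to a process; in Python the counterpart of a process is the PyInterpreterState object.
Usually there is one PyInterpreterState with several PyThreadState objects under it. The threads
share some resources, and the PyThreadState objects take turns using the single bytecode execution engine.
PyThreadState and PyInterpreterState
One interpreter maintains multiple PyThreadState objects.
Threads in Python are synchronized through the GIL (Global Interpreter Lock).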
typedef struct _is {
struct _is *next;
struct _ts *tstate_head; // head of this interpreter's thread-state list
PyObject *modules;
PyObject *modules_by_index;
PyObject *sysdict;
PyObject *builtins;
PyObject *importlib;
PyObject *codec_search_path;
PyObject *codec_search_cache;
PyObject *codec_error_registry;
int codecs_initialized;
int fscodec_initialized;
#ifdef HAVE_DLOPEN
int dlopenflags;
#endif
PyObject *builtins_copy;
PyObject *import_func;
/* Initialized to PyEval_EvalFrameDefault(). */
_PyFrameEvalFunction eval_frame; // the execution engine
} PyInterpreterState;
typedef struct _ts {
/* See Python/ceval.c for comments explaining most fields */
struct _ts *prev; // previous thread state
struct _ts *next; // next thread state
PyInterpreterState *interp; // owning interpreter
struct _frame *frame; // current stack frame
int recursion_depth;
char overflowed; /* The stack has overflowed. Allow 50 more calls
to handle the runtime error. */
char recursion_critical; /* The current calls must not cause
a stack overflow. */
/* 'tracing' keeps track of the execution depth when tracing/profiling.
This is to prevent the actual trace/profile code from being recorded in
the trace/profile. */
int tracing;
int use_tracing;
Py_tracefunc c_profilefunc;
Py_tracefunc c_tracefunc;
PyObject *c_profileobj;
PyObject *c_traceobj;
PyObject *curexc_type;
PyObject *curexc_value;
PyObject *curexc_traceback;
PyObject *exc_type;
PyObject *exc_value;
PyObject *exc_traceback;
PyObject *dict; /* Stores per-thread state */
int gilstate_counter;
PyObject *async_exc; /* Asynchronous exception to raise */
long thread_id; /* Thread id where this tstate was created */
int trash_delete_nesting;
PyObject *trash_delete_later;
/* Called when a thread state is deleted normally, but not when it
* is destroyed after fork().
* Pain: to prevent rare but fatal shutdown errors (issue 18808),
* Thread.join() must wait for the join'ed thread's tstate to be unlinked
* from the tstate chain. That happens at the end of a thread's life,
* in pystate.c.
* The obvious way doesn't quite work: create a lock which the tstate
* unlinking code releases, and have Thread.join() wait to acquire that
* lock. The problem is that we _are_ at the end of the thread's life:
* if the thread holds the last reference to the lock, decref'ing the
* lock will delete the lock, and that may trigger arbitrary Python code
* if there's a weakref, with a callback, to the lock. But by this time
* _PyThreadState_Current is already NULL, so only the simplest of C code
* can be allowed to run (in particular it must not be possible to
* release the GIL).
* So instead of holding the lock directly, the tstate holds a weakref to
* the lock: that's the value of on_delete_data below. Decref'ing a
* weakref is harmless.
* on_delete points to _threadmodule.c's static release_sentinel() function.
* After the tstate is unlinked, release_sentinel is called with the
* weakref-to-lock (on_delete_data) argument, and release_sentinel releases
* the indirectly held lock.
*/
void (*on_delete)(void *);
void *on_delete_data;
PyObject *coroutine_wrapper;
int in_coroutine_wrapper;
/* Now used from PyInterpreterState, kept here for ABI
compatibility with PyThreadState */
Py_ssize_t _preserve_36_ABI_1;
freefunc _preserve_36_ABI_2[MAX_CO_EXTRA_USERS];
PyObject *async_gen_firstiter;
PyObject *async_gen_finalizer;
/* XXX signal handlers should also be here */
} PyThreadState;
If you can draw a diagram of how PyInterpreterState, PyThreadState, and PyFrameObject relate to
one another, you have a high-level picture of Python's execution engine.
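One way to check that picture is to rebuild the links with stripped-down structs. The sketch below keeps only the pointer fields that connect the three objects. All Toy* names are hypothetical, storage is supplied by the caller, and inserting a new thread state at the head of the list is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct ToyInterp;

typedef struct ToyFrame {
    struct ToyFrame *f_back;     /* caller's frame */
} ToyFrame;

typedef struct ToyThread {
    struct ToyThread *prev;      /* previous thread state */
    struct ToyThread *next;      /* next thread state */
    struct ToyInterp *interp;    /* owning interpreter */
    ToyFrame *frame;             /* current stack frame */
} ToyThread;

typedef struct ToyInterp {
    ToyThread *tstate_head;      /* head of the thread-state list */
} ToyInterp;

/* Link a thread state into the interpreter's doubly linked list.
   Assumption: new entries go to the head of the list. */
void toy_thread_attach(ToyInterp *interp, ToyThread *t)
{
    assert(interp != NULL && t != NULL);
    t->interp = interp;
    t->frame = NULL;
    t->prev = NULL;
    t->next = interp->tstate_head;
    if (t->next != NULL) {
        t->next->prev = t;
    }
    interp->tstate_head = t;
}

/* Unlink a thread state, fixing up both neighbours (or the head). */
void toy_thread_detach(ToyThread *t)
{
    assert(t != NULL && t->interp != NULL);
    if (t->prev != NULL) {
        t->prev->next = t->next;
    } else {
        t->interp->tstate_head = t->next;
    }
    if (t->next != NULL) {
        t->next->prev = t->prev;
    }
    t->prev = NULL;
    t->next = NULL;
}

/* Count the thread states currently owned by the interpreter. */
int toy_thread_count(const ToyInterp *interp)
{
    int n = 0;
    const ToyThread *t;
    for (t = interp->tstate_head; t != NULL; t = t->next) {
        n++;
    }
    return n;
}
```

Read top to bottom, the structs are the diagram: an interpreter points at a list of thread states, each thread state points back at its interpreter and at its current frame, and each frame points at its caller.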
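Summary
First, when a stack frame is created in a thread, its back pointer is set to the thread's current frame:
back = tstate->frame
Frames are therefore organized as a linked list, and a new frame reaches the earlier frames through back.
A process (interpreter) contains multiple threads, organized as a doubly linked list, and the threads take turns executing instructions.
When execution starts, the corresponding stack frame is located and eval_frame is called to execute the
instructions in that frame.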
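The frame chain in this summary can be traced with a small model of nested calls. This is a sketch under stated assumptions: ToyFrame, ToyThread and the toy_* functions are hypothetical, frames live on the C stack instead of the heap, and a "call" does nothing except enter a callee frame and leave it again.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyFrame {
    struct ToyFrame *f_back;   /* frame that was current at creation time */
    const char *name;
} ToyFrame;

typedef struct ToyThread {
    ToyFrame *frame;           /* frame currently being executed */
} ToyThread;

/* Creating a frame records the thread's current frame as its caller:
   back = tstate->frame. */
void toy_frame_init(ToyThread *tstate, ToyFrame *f, const char *name)
{
    assert(tstate != NULL && f != NULL);
    f->f_back = tstate->frame;
    f->name = name;
}

/* Number of frames reachable from the thread's current frame. */
int toy_stack_depth(const ToyThread *tstate)
{
    int depth = 0;
    const ToyFrame *f;
    for (f = tstate->frame; f != NULL; f = f->f_back) {
        depth++;
    }
    return depth;
}

/* Enter the frame, make nested_calls further calls, then leave it.
   Returns the deepest stack depth seen while the frame was active. */
int toy_eval_frame(ToyThread *tstate, ToyFrame *f, int nested_calls)
{
    int deepest;
    tstate->frame = f;                   /* f becomes the current frame */
    deepest = toy_stack_depth(tstate);
    if (nested_calls > 0) {
        ToyFrame callee;
        int d;
        toy_frame_init(tstate, &callee, "callee");
        d = toy_eval_frame(tstate, &callee, nested_calls - 1);
        if (d > deepest) {
            deepest = d;
        }
    }
    tstate->frame = f->f_back;           /* the caller is current again */
    return deepest;
}
```

Because every frame remembers the frame that was current when it was created, leaving a frame only needs one assignment to restore the caller, and walking f_back from the current frame yields the whole call stack.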