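0. Background
Earlier articles introduced Python garbage collection, the memory model, and the collection mechanism. This article walks through the concrete implementation of the collection flow that the previous article outlined in its section on the collection mechanism.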
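1. The collection flow at a glance
The input parameter generation tells the function which generation to collect; we call it the "current generation" here. The function returns the number of unreachable objects it found.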
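/* This is the main function. Read this to understand how the
 * collection process works. */
static Py_ssize_t
collect(int generation)
{
    // [1]: declare and initialize the local variables
    // [2]: merge the lists of all generations younger than the current one into the current generation
    // [3]: copy each object's reference count into its PyGC_Head (gc_refs = ob_refcnt)
    // [4]: use the traverse mechanism to subtract the references held inside the list from the copied counts; unreachable objects end up at 0, anything still above 0 is a root object
    // [5]: initialize a list named unreachable and move the unreachable objects (copied count 0) into it
    // [6]: initialize a list named finalizers and move the objects in unreachable that define __del__ into it
    // [7]: move the objects reachable from finalizers into finalizers too and mark them reachable (gc_refs = GC_REACHABLE)
    // [8]: debug output; can print the collectable objects
    // [9]: clear the weak references to objects in unreachable and invoke their callbacks; weakref objects that are still alive afterwards are moved to the old list
    // [10]: delete the objects (garbage) in unreachable, which breaks their reference cycles
    // [11]: debug output; can print the uncollectable objects
    // [12]: append the objects in finalizers that define __del__ to gc.garbage
    // [13]: return the number of unreachable objects
}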
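Seen from Python code, this whole function sits behind gc.collect(generation). The following is a minimal sketch of that entry point, assuming a CPython interpreter; Node and make_cycle are hypothetical names used only for illustration. The exact count returned can vary between interpreter versions (instance dictionaries may or may not be counted separately), so the sketch only relies on it being at least 2.

```python
import gc


class Node:
    """A container object, so its instances are tracked by the cyclic collector."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a    # a and b keep each other alive


gc.disable()                   # keep automatic collections out of the way
gc.collect()                   # start from a clean slate
make_cycle()                   # the pair is garbage as soon as the call returns
found = gc.collect(0)          # collect only the youngest generation
gc.enable()
print(found)                   # number of unreachable objects that were found
```

The value printed is the return value described in step [13].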
a. Step [1]
The initialization step declares the following variables:
// [1]: declare and initialize the local variables
int i;                  /* loop index used in step [2] */
Py_ssize_t m = 0;       /* # objects collected */
Py_ssize_t n = 0;       /* # unreachable objects that couldn't be collected */
PyGC_Head *young;       /* the generation we are examining */
PyGC_Head *old;         /* next older generation */
PyGC_Head unreachable;  /* non-problematic unreachable trash */
PyGC_Head finalizers;   /* objects with, & reachable from, __del__ */
PyGC_Head *gc;
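The sum of m and n is the number of unreachable objects: m of them are collected, while the other n cannot be collected.
young and old point to the heads of existing generation lists, whereas unreachable and finalizers are themselves list heads of type PyGC_Head local to the function. The comments in the code describe each of them.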
b. Step [2]
The second step merges the list of every generation younger than the current one into the current generation's list.
The code is:
// [2]: merge the lists of all generations younger than the current one into the current generation
/* merge younger generations with one we are currently collecting */
for (i = 0; i < generation; i++) {
    gc_list_merge(GEN_HEAD(i), GEN_HEAD(generation));
}
The for loop visits every generation whose index is smaller than generation and uses gc_list_merge(from, to) to append it to the current generation's list. The function is implemented as follows:
/* append list `from` onto list `to`; `from` becomes an empty list */
static void
gc_list_merge(PyGC_Head *from, PyGC_Head *to)
{
    PyGC_Head *tail;
    assert(from != to);
    if (!gc_list_is_empty(from)) {
        tail = to->gc.gc_prev;
        tail->gc.gc_next = from->gc.gc_next;
        tail->gc.gc_next->gc.gc_prev = tail;
        to->gc.gc_prev = from->gc.gc_prev;
        to->gc.gc_prev->gc.gc_next = to;
    }
    gc_list_init(from);
}
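There is not much to add: this is the classic splice of one circular doubly linked list onto another. The figure below illustrates the merge of the collectable-object lists.
Figure 1. Merging two linked lists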
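The splice and the generation loop of step [2] can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions: GCHead, gc_list_append and payloads are hypothetical helpers, and a plain Python object stands in for the PyGC_Head structure.

```python
class GCHead:
    """Node of a circular doubly linked list; a list head is a node without payload."""

    def __init__(self, payload=None):
        self.payload = payload
        self.prev = self
        self.next = self


def gc_list_is_empty(head):
    return head.next is head


def gc_list_append(node, head):
    """Link node in just before head, that is, at the tail of the list."""
    node.next = head
    node.prev = head.prev
    head.prev.next = node
    head.prev = node


def gc_list_merge(src, dst):
    """Append every node of src to dst; src ends up empty."""
    assert src is not dst
    if not gc_list_is_empty(src):
        tail = dst.prev
        tail.next = src.next
        tail.next.prev = tail
        dst.prev = src.prev
        dst.prev.next = dst
    src.prev = src.next = src


def payloads(head):
    out, node = [], head.next
    while node is not head:
        out.append(node.payload)
        node = node.next
    return out


# Three toy generations, youngest first.
generations = [GCHead(), GCHead(), GCHead()]
for gen, names in enumerate((["a", "b"], ["c"], ["d"])):
    for name in names:
        gc_list_append(GCHead(name), generations[gen])

generation = 1                      # the generation being collected
for i in range(generation):         # same loop shape as step [2]
    gc_list_merge(generations[i], generations[generation])

print(payloads(generations[1]))
```

After the loop, generation 0 is empty, generation 1 holds its own object followed by the younger ones, and generation 2 is untouched.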
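c. Step [3]
The third step copies the reference count of every object in the current generation into the object's PyGC_Head:
// [3]: copy each object's reference count into its PyGC_Head
update_refs(young);
The implementation: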
/* Set all gc_refs = ob_refcnt. After this, gc_refs is > 0 for all objects
 * in containers, and is GC_REACHABLE for all tracked gc objects not in
 * containers.
 */
static void
update_refs(PyGC_Head *containers)
{
    PyGC_Head *gc = containers->gc.gc_next;
    for (; gc != containers; gc = gc->gc.gc_next) {
        assert(gc->gc.gc_refs == GC_REACHABLE);
        gc->gc.gc_refs = Py_REFCNT(FROM_GC(gc));
        /* Python's cyclic gc should never see an incoming refcount
         * of 0: if something decref'ed to 0, it should have been
         * deallocated immediately at that time.
         * Possible cause (if the assert triggers): a tp_dealloc
         * routine left a gc-aware object tracked during its teardown
         * phase, and did something-- or allowed something to happen --
         * that called back into Python. gc can trigger then, and may
         * see the still-tracked dying object. Before this assert
         * was added, such mistakes went on to allow gc to try to
         * delete the object again. In a debug build, that caused
         * a mysterious segfault, when _Py_ForgetReference tried
         * to remove the object from the doubly-linked list of all
         * objects a second time. In a release build, an actual
         * double deallocation occurred, which leads to corruption
         * of the allocator's internal bookkeeping pointers. That's
         * so serious that maybe this should be a release-build
         * check instead of an assert?
         */
        assert(gc->gc.gc_refs != 0);
    }
}
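To find the set of root objects, the collector works with an effective reference count: the reference count with the contributions of references from inside the list, and therefore of reference cycles, removed.
Python does not modify the real reference counts; it modifies a copy. No change to the copy can affect how object lifetimes are maintained, because the copy's only purpose is to find the root object set. The copy is the gc.gc_refs field of PyGC_Head. The first thing the collection does is therefore to walk the list of collectable objects and set each object's gc.gc_refs to its ob_refcnt.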
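d. Step [4]
The fourth step uses Python's traverse mechanism to update the copied reference counts; the copies of unreachable objects end up at 0:
// use Python's traverse mechanism to update the copied reference counts;
// the copies of unreachable objects end up at 0
subtract_refs(young);
This step operates only on gc_refs. It subtracts the references that objects in the list hold on each other, which leaves the effective reference count. The source is: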
/* Subtract internal references from gc_refs. After this, gc_refs is >= 0
 * for all objects in containers, and is GC_REACHABLE for all tracked gc
 * objects not in containers. The ones with gc_refs > 0 are directly
 * reachable from outside containers, and so can't be collected.
 */
static void
subtract_refs(PyGC_Head *containers)
{
    traverseproc traverse;
    PyGC_Head *gc = containers->gc.gc_next;
    for (; gc != containers; gc=gc->gc.gc_next) {
        traverse = Py_TYPE(FROM_GC(gc))->tp_traverse;
        (void) traverse(FROM_GC(gc),
                        (visitproc)visit_decref,
                        NULL);
    }
}
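The traverse function is specific to each kind of PyObject and is defined in the object's type object (the tp_traverse slot).
In general, traverse walks over every reference held by the object and applies some action to each one. In subtract_refs that action is visit_decref, which is passed to traverse as a callback.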
Here is the code of visit_decref:
/* A traversal callback for subtract_refs. */
static int
visit_decref(PyObject *op, void *data)
{
    assert(op != NULL);
    if (PyObject_IS_GC(op)) {
        PyGC_Head *gc = AS_GC(op);
        /* We're only interested in gc_refs for objects in the
         * generation being collected, which can be recognized
         * because only they have positive gc_refs.
         */
        assert(gc->gc.gc_refs != 0); /* else refcount was too small */
        if (gc->gc.gc_refs > 0)
            gc->gc.gc_refs--;
    }
    return 0;
}
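After subtract_refs finishes, every reference between objects in the collectable list has been subtracted, so the references that make up cycles no longer count. Some objects still have a nonzero PyGC_Head.gc.gc_refs, which means they are referenced from outside the list. These objects are the root object set for mark-and-sweep.
The figure below shows the root object set obtained after the example has gone through update_refs and subtract_refs.
Figure 2. update_refs and subtract_refs illustrated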
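The two passes can be replayed on a toy object graph. This is a simplified model rather than the interpreter's data structures: Obj, link and the field names refcnt, gc_refs and referents are assumptions chosen to mirror the C fields.

```python
class Obj:
    """Toy container: a real refcount plus the objects it refers to."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0        # stands in for ob_refcnt
        self.gc_refs = None    # stands in for the copy kept in PyGC_Head
        self.referents = []


def link(src, dst):
    src.referents.append(dst)
    dst.refcnt += 1


a, b, c, d = (Obj(n) for n in "abcd")
link(a, b)
link(b, a)                     # a <-> b: a cycle with no outside reference
link(c, d)
link(d, c)                     # c <-> d: another cycle ...
c.refcnt += 1                  # ... but c is also referenced from outside
young = [a, b, c, d]

for obj in young:              # update_refs: copy the real count
    obj.gc_refs = obj.refcnt

for obj in young:              # subtract_refs: cancel references from inside young
    for ref in obj.referents:
        if ref.gc_refs > 0:
            ref.gc_refs -= 1

roots = [obj.name for obj in young if obj.gc_refs > 0]
print(roots)
```

Only c keeps a positive copy, so it is the single root object, and the real counts in refcnt are exactly what they were before the two passes.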
e. Step [5]
The fifth step initializes a list named unreachable and moves the unreachable objects (copied reference count 0) into it:
/* Leave everything reachable from outside young in young, and move
 * everything else (in young) to unreachable.
 * NOTE: This used to move the reachable objects into a reachable
 * set instead. But most things usually turn out to be reachable,
 * so it's more efficient to move the unreachable things.
 */
gc_list_init(&unreachable);
move_unreachable(young, &unreachable);
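Once the root object set is known, we can start from it and follow the references through the whole set (the current generation, young), marking one object after another as not collectable. Root objects themselves cannot be collected, and neither can any object they reference directly or indirectly.
Before the scan starts, an unreachable list is set up. move_unreachable then walks young and moves the unreachable objects into unreachable; afterwards, garbage collection only has to deal with the unreachable list. The source of move_unreachable is: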
/* Move the unreachable objects from young to unreachable. After this,
 * all objects in young have gc_refs = GC_REACHABLE, and all objects in
 * unreachable have gc_refs = GC_TENTATIVELY_UNREACHABLE. All tracked
 * gc objects not in young or unreachable still have gc_refs = GC_REACHABLE.
 * All objects in young after this are directly or indirectly reachable
 * from outside the original young; and all objects in unreachable are
 * not.
 */
static void
move_unreachable(PyGC_Head *young, PyGC_Head *unreachable)
{
    PyGC_Head *gc = young->gc.gc_next;
    /* Invariants: all objects "to the left" of us in young have gc_refs
     * = GC_REACHABLE, and are indeed reachable (directly or indirectly)
     * from outside the young list as it was at entry. All other objects
     * from the original young "to the left" of us are in unreachable now,
     * and have gc_refs = GC_TENTATIVELY_UNREACHABLE. All objects to the
     * left of us in 'young' now have been scanned, and no objects here
     * or to the right have been scanned yet.
     */
    while (gc != young) {
        PyGC_Head *next;
        if (gc->gc.gc_refs) {
            /* gc is definitely reachable from outside the
             * original 'young'. Mark it as such, and traverse
             * its pointers to find any other objects that may
             * be directly reachable from it. Note that the
             * call to tp_traverse may append objects to young,
             * so we have to wait until it returns to determine
             * the next object to visit.
             */
            PyObject *op = FROM_GC(gc);
            traverseproc traverse = Py_TYPE(op)->tp_traverse;
            assert(gc->gc.gc_refs > 0);
            gc->gc.gc_refs = GC_REACHABLE;
            // [1]: visit_reachable starts from a root object and marks the objects it references, directly or indirectly, as reachable
            (void) traverse(op,
                            (visitproc)visit_reachable,
                            (void *)young);
            next = gc->gc.gc_next;
            if (PyTuple_CheckExact(op)) {
                _PyTuple_MaybeUntrack(op);
            }
        }
        else {
            /* This *may* be unreachable. To make progress,
             * assume it is. gc isn't directly reachable from
             * any object we've already traversed, but may be
             * reachable from an object we haven't gotten to yet.
             * visit_reachable will eventually move gc back into
             * young if that's so, and we'll see it again.
             */
            next = gc->gc.gc_next;
            // [2]: move this element A to the unreachable list and set its gc_refs to GC_TENTATIVELY_UNREACHABLE
            gc_list_move(gc, unreachable);
            gc->gc.gc_refs = GC_TENTATIVELY_UNREACHABLE;
        }
        gc = next;
    }
}
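move_unreachable walks forward along the young list and checks each element's gc.gc_refs. It does not start at the root objects and follow references the way a DFS or BFS over a graph would; it simply scans the list from front to back. So when it meets an object A whose gc.gc_refs is 0, it cannot conclude right away that A is garbage (an unreachable object): a root object that references A may still turn up later in the young list. That is why, at [2] in move_unreachable, A is only provisionally marked with gc.gc_refs = GC_TENTATIVELY_UNREACHABLE and moved to the unreachable list. If a root object does reference A directly or indirectly, the visit_reachable callback passed at [1] marks A as reachable again, even though its gc_refs had been set to GC_TENTATIVELY_UNREACHABLE: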
/* A traversal callback for move_unreachable. */
static int
visit_reachable(PyObject *op, PyGC_Head *reachable)
{
    if (PyObject_IS_GC(op)) {
        PyGC_Head *gc = AS_GC(op);
        const Py_ssize_t gc_refs = gc->gc.gc_refs;
        if (gc_refs == 0) {
            /* This is in move_unreachable's 'young' list, but
             * the traversal hasn't yet gotten to it. All
             * we need to do is tell move_unreachable that it's
             * reachable.
             */
            gc->gc.gc_refs = 1;
        }
        // [3]: an object marked GC_TENTATIVELY_UNREACHABLE is moved back out of the unreachable list
        else if (gc_refs == GC_TENTATIVELY_UNREACHABLE) {
            /* This had gc_refs = 0 when move_unreachable got
             * to it, but turns out it's reachable after all.
             * Move it back to move_unreachable's 'young' list,
             * and move_unreachable will eventually get to it
             * again.
             */
            gc_list_move(gc, reachable);
            gc->gc.gc_refs = 1;
        }
        /* Else there's nothing to do.
         * If gc_refs > 0, it must be in move_unreachable's 'young'
         * list, and move_unreachable will eventually get to it.
         * If gc_refs == GC_REACHABLE, it's either in some other
         * generation so we don't care about it, or move_unreachable
         * already dealt with it.
         * If gc_refs == GC_UNTRACKED, it must be ignored.
         */
        else {
            assert(gc_refs > 0
                   || gc_refs == GC_REACHABLE
                   || gc_refs == GC_UNTRACKED);
        }
    }
    return 0;
}
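At [3] in visit_reachable, an object that had been marked GC_TENTATIVELY_UNREACHABLE has turned out to be referenced, directly or indirectly, by a root object. It is therefore taken out of the unreachable list and moved back to the reachable list (which is the young list), where move_unreachable will reach it again.
In the end the objects are split across two lists, young and unreachable. Figure 2 shows the result of the split.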
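The tentative marking is easiest to follow on a small example in which the root object comes late in the scan order. The following is a simplified model, assuming Python lists in place of the linked lists; Obj and the function bodies are illustrative, and the two marker constants are symbolic values whose only job is to differ from every real count.

```python
GC_REACHABLE = -3                    # symbolic markers for the toy model
GC_TENTATIVELY_UNREACHABLE = -4


class Obj:
    def __init__(self, name, gc_refs):
        self.name = name
        self.gc_refs = gc_refs       # value left behind by subtract_refs
        self.referents = []


def visit_reachable(obj, young, unreachable):
    if obj.gc_refs == 0:
        obj.gc_refs = 1              # not scanned yet: just flag it
    elif obj.gc_refs == GC_TENTATIVELY_UNREACHABLE:
        unreachable.remove(obj)      # judged too early: move it back
        young.append(obj)
        obj.gc_refs = 1


def move_unreachable(young):
    unreachable = []
    i = 0
    while i < len(young):            # young may grow while we scan it
        obj = young[i]
        if obj.gc_refs:
            obj.gc_refs = GC_REACHABLE
            for ref in obj.referents:
                visit_reachable(ref, young, unreachable)
            i += 1
        else:
            young.pop(i)             # the next object slides into slot i
            unreachable.append(obj)
            obj.gc_refs = GC_TENTATIVELY_UNREACHABLE
    return unreachable


# Scan order: a, b, root, x, y. Only root has an outside reference.
a, b, x, y = Obj("a", 0), Obj("b", 0), Obj("x", 0), Obj("y", 0)
root = Obj("root", 1)
root.referents = [a]
a.referents = [b]
x.referents, y.referents = [y], [x]  # isolated cycle
young = [a, b, root, x, y]
unreachable = move_unreachable(young)
print([o.name for o in young], [o.name for o in unreachable])
```

Both a and b are first parked in unreachable and then pulled back once root, and later a, are traversed; only the isolated cycle of x and y stays behind.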
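f. Step [6]
When move_unreachable is done, the original list has become two. The unreachable list holds the garbage we found, the target of the collection. But can everything in unreachable really be reclaimed safely?
The answer is no.
When we define a class in Python, we can give it a special method, __del__, which Python calls a finalizer. When an instance that has a finalizer is destroyed, the finalizer is called first: __del__ is the hook Python gives developers for releasing resources when an object is destroyed.
This is where the problem comes from. The unreachable list contains only objects that are kept alive by reference cycles and need to be destroyed. If the __del__ of object A uses object B, then B must be destroyed after A, yet Python cannot guarantee the order in which objects are reclaimed during garbage collection.
So Python takes a conservative approach: instances in the unreachable list that have a finalizer are all moved into a PyListObject named garbage.
Step 6 is the first part of this: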
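// Objects in unreachable that define __del__ cannot be reclaimed safely,
// so they are gathered in the finalizers list. The objects they reference
// cannot be reclaimed either and also have to end up in finalizers.
gc_list_init(&finalizers);
move_finalizers(&unreachable, &finalizers);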
The source of move_finalizers is:
/* Move the objects in unreachable with __del__ methods into `finalizers`.
 * Objects moved into `finalizers` have gc_refs set to GC_REACHABLE; the
 * objects remaining in unreachable are left at GC_TENTATIVELY_UNREACHABLE.
 */
static void
move_finalizers(PyGC_Head *unreachable, PyGC_Head *finalizers)
{
    PyGC_Head *gc;
    PyGC_Head *next;
    /* March over unreachable. Move objects with finalizers into
     * `finalizers`.
     */
    for (gc = unreachable->gc.gc_next; gc != unreachable; gc = next) {
        PyObject *op = FROM_GC(gc);
        assert(IS_TENTATIVELY_UNREACHABLE(op));
        next = gc->gc.gc_next;
        // [4]: if the object defines __del__, move it to the finalizers list
        if (has_finalizer(op)) {
            gc_list_move(gc, finalizers);
            gc->gc.gc_refs = GC_REACHABLE;
        }
    }
}
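At [4], move_finalizers checks whether the object defines __del__ and, if it does, moves that object into the finalizers list.
The objects referenced by those __del__ objects are handled by move_finalizer_reachable, which marks them as not collectable and moves them into finalizers as well; see step [7] and the Python source for details.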
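g. Step [7]
Step 7 handles the objects reachable from finalizers: they are moved into finalizers too and marked reachable again (gc_refs = GC_REACHABLE). The source is: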
/* finalizers contains the unreachable objects with a finalizer;
 * unreachable objects reachable *from* those are also uncollectable,
 * and we move those into the finalizers list too.
 */
move_finalizer_reachable(&finalizers);
The code of move_finalizer_reachable is:
/* Move objects that are reachable from finalizers, from the unreachable set
 * into finalizers set.
 */
static void
move_finalizer_reachable(PyGC_Head *finalizers)
{
    traverseproc traverse;
    PyGC_Head *gc = finalizers->gc.gc_next;
    for (; gc != finalizers; gc = gc->gc.gc_next) {
        /* Note that the finalizers list may grow during this. */
        traverse = Py_TYPE(FROM_GC(gc))->tp_traverse;
        (void) traverse(FROM_GC(gc),
                        (visitproc)visit_move,
                        (void *)finalizers);
    }
}
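The function walks the finalizers list and traverses each object with the visit_move callback. Every object that is referenced from an object in finalizers and is still marked GC_TENTATIVELY_UNREACHABLE is marked GC_REACHABLE and moved from unreachable into finalizers, which is the list passed to the callback. Because the finalizers list can grow during the walk, objects that are reachable only indirectly are picked up as well.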
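Steps [6] and [7] together partition the unreachable set. The sketch below models that partition with Python lists; split_finalizers, Obj and the has_finalizer attribute are hypothetical names, and the real code works on linked lists and gc_refs markers instead.

```python
class Obj:
    def __init__(self, name, has_finalizer=False):
        self.name = name
        self.has_finalizer = has_finalizer
        self.referents = []


def split_finalizers(unreachable):
    """Return (collectable, finalizers) following steps [6] and [7]."""
    # Step [6]: objects that define a finalizer leave the unreachable list.
    finalizers = [o for o in unreachable if o.has_finalizer]
    collectable = [o for o in unreachable if not o.has_finalizer]
    # Step [7]: whatever they can reach follows them; the list grows as we scan.
    i = 0
    while i < len(finalizers):
        for ref in finalizers[i].referents:
            if ref in collectable:
                collectable.remove(ref)
                finalizers.append(ref)
        i += 1
    return collectable, finalizers


f = Obj("f", has_finalizer=True)
g, h, k = Obj("g"), Obj("h"), Obj("k")
f.referents = [g]
g.referents = [h, f]
h.referents = [g]
k.referents = [k]            # self-cycle that no finalizer object reaches
collectable, finalizers = split_finalizers([f, g, h, k])
print([o.name for o in collectable], [o.name for o in finalizers])
```

Only k remains collectable: g is dragged along because f references it, and h because g does.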
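h. Step [8]
Step 8 emits debug information; it can print the collectable objects.
i. Step [9]
Step 9 deals with the weak references to objects in unreachable: the weak references are cleared and their callbacks invoked, and weakref objects that are still alive afterwards are put into the old list. The code is: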
/* Clear weakrefs and invoke callbacks as necessary. */
m += handle_weakrefs(&unreachable, old);
The code of handle_weakrefs is:
static int
handle_weakrefs(PyGC_Head *unreachable, PyGC_Head *old)
{
    PyGC_Head *gc;
    PyObject *op;               /* generally FROM_GC(gc) */
    PyWeakReference *wr;        /* generally a cast of op */
    PyGC_Head wrcb_to_call;     /* weakrefs with callbacks to call */
    PyGC_Head *next;
    int num_freed = 0;
    // [5]
    gc_list_init(&wrcb_to_call);
    for (gc = unreachable->gc.gc_next; gc != unreachable; gc = next) {
        PyWeakReference **wrlist;
        op = FROM_GC(gc);
        assert(IS_TENTATIVELY_UNREACHABLE(op));
        next = gc->gc.gc_next;
        // [6]
        if (! PyType_SUPPORTS_WEAKREFS(Py_TYPE(op)))
            continue;
        /* [7] It supports weakrefs. Does it have any? */
        wrlist = (PyWeakReference **)
                                PyObject_GET_WEAKREFS_LISTPTR(op);
        for (wr = *wrlist; wr != NULL; wr = *wrlist) {
            PyGC_Head *wrasgc;                  /* AS_GC(wr) */
            assert(wr->wr_object == op);
            _PyWeakref_ClearRef(wr);
            assert(wr->wr_object == Py_None);
            // [8]
            if (wr->wr_callback == NULL)
                continue;                       /* no callback */
            if (IS_TENTATIVELY_UNREACHABLE(wr))
                continue;
            assert(IS_REACHABLE(wr));
            /* [9] Create a new reference so that wr can't go away
             * before we can process it again.
             */
            Py_INCREF(wr);
            /* Move wr to wrcb_to_call, for the next pass. */
            wrasgc = AS_GC(wr);
            assert(wrasgc != next); /* wrasgc is reachable, but
                                       next isn't, so they can't
                                       be the same */
            // [10]
            gc_list_move(wrasgc, &wrcb_to_call);
        }
    }
    // [11]
    while (! gc_list_is_empty(&wrcb_to_call)) {
        PyObject *temp;
        PyObject *callback;
        gc = wrcb_to_call.gc.gc_next;
        op = FROM_GC(gc);
        assert(IS_REACHABLE(op));
        assert(PyWeakref_Check(op));
        wr = (PyWeakReference *)op;
        callback = wr->wr_callback;
        assert(callback != NULL);
        /* [12] copy-paste of weakrefobject.c's handle_callback() */
        temp = PyObject_CallFunctionObjArgs(callback, wr, NULL);
        if (temp == NULL)
            PyErr_WriteUnraisable(callback);
        else
            Py_DECREF(temp);
        // [13]
        Py_DECREF(op);
        if (wrcb_to_call.gc.gc_next == gc) {
            /* [14] object is still alive -- move it */
            gc_list_move(gc, old);
        }
        else
            ++num_freed;
    }
    return num_freed;
}
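The function first declares and initializes an empty list, wrcb_to_call ([5]). It then walks the unreachable list. Objects whose type does not support weak references are skipped ([6]). For the others, it fetches the object's list of weak references ([7]) and clears each one. A weak reference that has no callback, or that is itself in unreachable, needs nothing more ([8]). Otherwise the reference count of the weakref object is incremented ([9]), so that it stays alive during the processing that follows instead of being destroyed when its count drops to 0, and the weakref is moved into wrcb_to_call ([10]).
Next the function drains wrcb_to_call ([11]) and calls each weakref's callback ([12]). The call has two possible outcomes: on failure the error is reported, on success the result temp is decref'ed. The weakref object is then decref'ed ([13]), which balances [9]. Finally the function checks whether the weakref is still at the head of wrcb_to_call; if it is, the weakref is still alive and is moved to the old list ([14]), otherwise it has been freed and is counted in num_freed.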
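A weakref can register a callback, so its behavior somewhat resembles that of an instance with __del__. The two are fundamentally different, though. A weakref can be cleaned up correctly, at the price of the extra bookkeeping hidden in handle_weakrefs. An instance with __del__ cannot be cleaned up automatically in the source shown here; it ends up in the gc.garbage list.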
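The effect of this step can be observed from Python with the standard weakref module. This is a small sketch, assuming CPython; Node and the calls list are hypothetical names. The weakref itself stays reachable through the variable ref, which corresponds to the case in which its callback is queued and invoked.

```python
import gc
import weakref


class Node:
    pass


calls = []                               # records each callback invocation

a, b = Node(), Node()
a.other, b.other = b, a                  # reference cycle
ref = weakref.ref(a, calls.append)       # the callback receives the weakref

del a, b                                 # the cycle is now unreachable
gc.collect()

print(ref() is None, len(calls))
```

After the collection the weak reference has been cleared, and the callback has run once with the weakref object as its argument.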
j. Step [10]
Step 10 deletes the objects (garbage) in the unreachable list, which breaks their reference cycles. The source is:
/* Call tp_clear on objects in the unreachable set. This will cause
 * the reference cycles to be broken. It may also cause some objects
 * in finalizers to be freed.
 */
delete_garbage(&unreachable, old);
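The function takes the unreachable list and the old list. The unreachable list holds unreachable objects that are safe to delete, so delete_garbage walks it and calls the tp_clear function of each object's type. tp_clear makes the object drop the references it holds, which normally brings the reference counts in the cycle down to 0 so that the objects are freed. An object that survives tp_clear is moved to the old list.
The source of delete_garbage is: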
/* Break reference cycles by clearing the containers involved. This is
 * tricky business as the lists can be changing and we don't know which
 * objects may be freed. It is possible I screwed something up here.
 */
static void
delete_garbage(PyGC_Head *collectable, PyGC_Head *old)
{
    inquiry clear;
    while (!gc_list_is_empty(collectable)) {
        PyGC_Head *gc = collectable->gc.gc_next;
        PyObject *op = FROM_GC(gc);
        assert(IS_TENTATIVELY_UNREACHABLE(op));
        // [15]
        if (debug & DEBUG_SAVEALL) {
            PyList_Append(garbage, op);
        }
        else {
            // [16]
            if ((clear = Py_TYPE(op)->tp_clear) != NULL) {
                // [17]
                Py_INCREF(op);
                // [18]
                clear(op);
                // [19]
                Py_DECREF(op);
            }
        }
        if (collectable->gc.gc_next == gc) {
            /* [20] object is still alive, move it, it may die later */
            gc_list_move(gc, old);
            gc->gc.gc_refs = GC_REACHABLE;
        }
    }
}
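At [15], if the DEBUG_SAVEALL debug flag is set, the object is appended to garbage and kept instead of being cleared. Otherwise the function looks up tp_clear in the object's type ([16]). tp_clear is not the destructor (that role belongs to tp_dealloc); it drops the references the object holds so that the cycle is broken. At [17] the object's own reference count is incremented first, which guarantees that the count cannot reach 0 inside tp_clear, so op cannot be deallocated by the reference-counting mechanism in the middle of tp_clear. tp_clear is then called ([18]), and afterwards the reference count of op is decremented again ([19]) to keep the count balanced.
Normally op is freed after tp_clear and Py_DECREF(op) and thereby drops out of the unreachable list. If op has not been freed and is still in the list, the condition collectable->gc.gc_next == gc holds; op is then moved to the old list and its gc_refs is set to GC_REACHABLE ([20]).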
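The DEBUG_SAVEALL branch at [15] can be exercised from Python through gc.set_debug, gc.DEBUG_SAVEALL and gc.garbage, all of which belong to the standard gc module. The sketch below assumes CPython; Node and the name attribute are hypothetical and exist only so that the saved objects can be recognized.

```python
import gc


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None


gc.collect()                          # drop unrelated garbage first
gc.set_debug(gc.DEBUG_SAVEALL)        # the flag tested at [15]

a, b = Node("a"), Node("b")
a.other, b.other = b, a               # reference cycle
del a, b
gc.collect()                          # found, but saved instead of cleared

saved = sorted(o.name for o in gc.garbage if isinstance(o, Node))

gc.set_debug(0)                       # back to normal behaviour
del gc.garbage[:]                     # let go of the saved objects ...
gc.collect()                          # ... so the cycle is really freed now
print(saved)
```

With the flag set, both nodes show up in gc.garbage still intact; once the flag is cleared and the list emptied, an ordinary collection frees them.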
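k. Step [11]
Step 11 emits debug information; it can print the uncollectable objects.
l. Step [12]
Step 10 took care of the objects in unreachable, but the objects in finalizers still have to be handled. Step 12 puts the objects in the finalizers list into gc.garbage. The source is: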
/* Append instances in the uncollectable set to a Python
 * reachable list of garbage. The programmer has to deal with
 * this if they insist on creating this type of structure.
 */
(void)handle_finalizers(&finalizers, old);
The source of handle_finalizers is:
static int
handle_finalizers(PyGC_Head *finalizers, PyGC_Head *old)
{
    PyGC_Head *gc = finalizers->gc.gc_next;
    if (garbage == NULL) {
        garbage = PyList_New(0);
        if (garbage == NULL)
            Py_FatalError("gc couldn't create gc.garbage list");
    }
    for (; gc != finalizers; gc = gc->gc.gc_next) {
        PyObject *op = FROM_GC(gc);
        // [21]
        if ((debug & DEBUG_SAVEALL) || has_finalizer(op)) {
            if (PyList_Append(garbage, op) < 0)
                return -1;
        }
    }
    // [22]
    gc_list_merge(finalizers, old);
    return 0;
}
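handle_finalizers checks whether DEBUG_SAVEALL is set or the object defines __del__. If either holds, op is appended to the garbage list ([21]): the object is considered garbage, but its memory is not freed. Because these objects are uncollectable, all objects in finalizers are finally merged into the old list ([22]).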
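The difference between what goes into garbage and what goes into old can be seen in a toy version of the function. This is a sketch with assumed names: Obj, has_finalizer and the save_all parameter (standing in for DEBUG_SAVEALL) are illustrative, and Python lists replace the linked lists.

```python
class Obj:
    def __init__(self, name, has_finalizer=False):
        self.name = name
        self.has_finalizer = has_finalizer


def handle_finalizers(finalizers, old, garbage, save_all=False):
    """Toy version of step [12]; save_all plays the role of DEBUG_SAVEALL."""
    for obj in finalizers:
        if save_all or obj.has_finalizer:      # the test at [21]
            garbage.append(obj)
    old.extend(finalizers)                     # the merge at [22]
    del finalizers[:]                          # the source list is left empty


garbage, old = [], []
finalizers = [Obj("f", has_finalizer=True), Obj("g"), Obj("h")]
handle_finalizers(finalizers, old, garbage)
print([o.name for o in garbage], [o.name for o in old])
```

Only f, which defines the finalizer, is recorded in garbage, while all three objects move on to the older generation.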
m. Step [13]
Step 13 returns the number of unreachable objects.
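2. Summary
Although Python uses classic reference counting as its automatic memory management scheme, it compensates for the shortcomings of reference counting in several ways: heavy use of memory pools, mark-and-sweep garbage collection, and generational collection together round out Python's memory management considerably.
Memory management and garbage collection are intricate and detail-heavy. This walkthrough cannot cover every subtlety of the real mechanism; consult the Python source when you need more.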