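0. Background
Earlier articles gave an overview of Python garbage collection, its memory model, and its collection mechanism. This article walks through the concrete implementation of the collection process outlined in the previous article.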
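1. The collection process at a glance
The input parameter generation names the generation to collect, which we call the "current generation" here. The function returns the number of unreachable objects.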
/* This is the main function. Read this to understand how the
* collection process works. */
static Py_ssize_t
collect(int generation)
{
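//[1]: declare and initialize the local variables
//[2]: merge the lists of all generations younger than the current one into the current generation
//[3]: copy the reference count of every object in the current generation into its PyGC_Head (gc_refs = ob_refcnt)
//[4]: adjust the copied reference counts through the traverse mechanism; unreachable objects end up with a count of 0, and any object whose count is not 0 is a root object
//[5]: initialize a list named unreachable and move the unreachable objects (those whose count is 0) into it
//[6]: initialize another list named finalizers and move the objects in unreachable that define __del__ into it
//[7]: mark the objects in finalizers as reachable again (gc_refs = GC_REACHABLE)
//[8]: debug output; can print the collectable objects
//[9]: notify every weak reference that points to an object in unreachable; a weak reference object that is still alive is put back
//[10]: delete the objects (the garbage) in the unreachable list, which breaks their reference cycles
//[11]: debug output; can print the uncollectable objects
//[12]: move every object in the finalizers list into gc.garbage
//[13]: return the number of unreachable objects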
}
a. Step [1]
The initialization step declares the following variables:
//[1]: declare and initialize the local variables
Py_ssize_t m = 0; /* # objects collected */
Py_ssize_t n = 0; /* # unreachable objects that couldn't be collected */
PyGC_Head *young; /* the generation we are examining */
PyGC_Head *old; /* next older generation */
PyGC_Head unreachable; /* non-problematic unreachable trash */
PyGC_Head finalizers; /* objects with, & reachable from, __del__ */
PyGC_Head *gc;
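The sum of m and n is the number of unreachable objects: m of them will be collected, while the other n cannot be.
young, old, unreachable, and finalizers are four linked lists of PyGC_Head nodes. young and old are pointers to the heads of existing generation lists, whereas unreachable and finalizers are list heads declared locally inside collect. See the comments in the code for what each list holds.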
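b. Step [2]
The second step merges the lists of all generations younger than the current one into the current generation.
The code is as follows:
//[2]: merge the lists of all generations younger than the current one into the current generation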
/* merge younger generations with one we are currently collecting */
for (i = 0; i < generation; i++) {
gc_list_merge(GEN_HEAD(i), GEN_HEAD(generation));
}
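This for loop walks through every generation younger than generation and uses
static void
gc_list_merge(PyGC_Head *from, PyGC_Head *to)
to splice each of them onto a single list. The function itself looks like this: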
/* append list `from` onto list `to`; `from` becomes an empty list */
static void
gc_list_merge(PyGC_Head *from, PyGC_Head *to)
{
PyGC_Head *tail;
assert(from != to);
if (!gc_list_is_empty(from)) {
tail = to->gc.gc_prev;
tail->gc.gc_next = from->gc.gc_next;
tail->gc.gc_next->gc.gc_prev = tail;
to->gc.gc_prev = from->gc.gc_prev;
to->gc.gc_prev->gc.gc_next = to;
}
gc_list_init(from);
}
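There is not much to explain here: this is the classic code for merging two circular doubly linked lists. Figure 1 illustrates the merge applied to the lists of collectable objects.
Figure 1. Merging two linked lists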
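The splice can also be tried outside the interpreter. What follows is a minimal sketch, assuming a simplified Node type in place of PyGC_Head; the names Node, list_init, list_is_empty, list_append, list_merge, and list_length are illustrative and not part of CPython. It performs the same four pointer updates as gc_list_merge.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for PyGC_Head (an assumption for illustration):
 * a node in a circular doubly linked list. The list head is a sentinel
 * node that carries no payload of its own. */
typedef struct Node {
    struct Node *next;
    struct Node *prev;
    int value;
} Node;

/* An empty list is a head whose next and prev point back at itself. */
static void list_init(Node *head) {
    head->next = head;
    head->prev = head;
}

static int list_is_empty(const Node *head) {
    return head->next == head;
}

/* Link `node` in just before the head, i.e. at the tail of the list. */
static void list_append(Node *node, Node *head) {
    node->next = head;
    node->prev = head->prev;
    head->prev->next = node;
    head->prev = node;
}

/* Append list `from` onto list `to`; `from` becomes an empty list. */
static void list_merge(Node *from, Node *to) {
    assert(from != to);
    if (!list_is_empty(from)) {
        Node *tail = to->prev;
        tail->next = from->next;   /* old tail of `to` -> first node of `from` */
        tail->next->prev = tail;   /* first node of `from` -> old tail of `to` */
        to->prev = from->prev;     /* head of `to` -> last node of `from` */
        to->prev->next = to;       /* last node of `from` -> head of `to` */
    }
    list_init(from);
}

/* Count the nodes between the head and itself. */
static size_t list_length(const Node *head) {
    size_t n = 0;
    for (const Node *p = head->next; p != head; p = p->next) {
        n++;
    }
    return n;
}
```

With three such lists standing in for generations 0, 1, and 2, calling list_merge(&gen0, &gen2) and then list_merge(&gen1, &gen2) mirrors the for loop in step [2]: both younger lists end up empty and their nodes sit at the tail of gen2 in their original order.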
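c. Step [3]
The third step copies the reference count of every object in the current generation into the object's PyGC_Head:
//[3]: copy the reference count of every object in the current generation into its PyGC_Head
update_refs(young);
The function itself: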
/* Set all gc_refs = ob_refcnt. After this, gc_refs is > 0 for all objects
* in containers, and is GC_REACHABLE for all tracked gc objects not in
* containers.
*/
static void
update_refs(PyGC_Head *containers)
{
PyGC_Head *gc = containers->gc.gc_next;
for (; gc != containers; gc = gc->gc.gc_next) {
assert(gc->gc.gc_refs == GC_REACHABLE);
gc->gc.gc_refs = Py_REFCNT(FROM_GC(gc));
/* Python's cyclic gc should never see an incoming refcount
* of 0: if something decref'ed to 0, it should have been
* deallocated immediately at that time.
* Possible cause (if the assert triggers): a tp_dealloc
* routine left a gc-aware object tracked during its teardown
* phase, and did something-- or allowed something to happen --
* that called back into Python. gc can trigger then, and may
* see the still-tracked dying object. Before this assert
* was added, such mistakes went on to allow gc to try to
* delete the object again. In a debug build, that caused
* a mysterious segfault, when _Py_ForgetReference tried
* to remove the object from the doubly-linked list of all
* objects a second time. In a release build, an actual
* double deallocation occurred, which leads to corruption
* of the allocator's internal bookkeeping pointers. That's
* so serious that maybe this should be a release-build
* check instead of an assert?
*/
assert(gc->gc.gc_refs != 0);
}
}
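To find the set of root objects, we introduce the notion of an effective reference count: the reference count with the references contributed by circular references removed. Working with this count is what takes circular references out of the picture.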
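The idea of an effective reference count can be sketched on a toy object graph. The code below is a simplified illustration, not CPython's implementation: ToyObj, toy_add_ref, toy_update_refs, and toy_subtract_refs are hypothetical names, each object stores its referents in a small fixed array instead of exposing them through the traverse mechanism, and every referent is assumed to belong to the set being examined.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REFS 4

/* Toy container object (hypothetical, for illustration only). */
typedef struct ToyObj {
    long refcnt;                    /* plays the role of ob_refcnt */
    long gc_refs;                   /* working copy used during collection */
    struct ToyObj *refs[MAX_REFS];  /* objects this object refers to */
    size_t nrefs;
} ToyObj;

/* Record that `from` holds a reference to `to`. */
static void toy_add_ref(ToyObj *from, ToyObj *to) {
    assert(from->nrefs < MAX_REFS);
    from->refs[from->nrefs++] = to;
    to->refcnt++;
}

/* Step [3]: copy every reference count into gc_refs. */
static void toy_update_refs(ToyObj **objs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(objs[i]->refcnt != 0);
        objs[i]->gc_refs = objs[i]->refcnt;
    }
}

/* Step [4]: for every reference held by an object in the set, subtract
 * one from the target's gc_refs. What remains is the number of
 * references that come from outside the set.
 * Assumption: every referent is itself a member of the set. */
static void toy_subtract_refs(ToyObj **objs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < objs[i]->nrefs; j++) {
            objs[i]->refs[j]->gc_refs--;
        }
    }
}
```

Take two cycles, a <-> b and c <-> d, where only a is also referenced once from outside the set. After both passes a is left with gc_refs 1, which makes it a root object, while b, c, and d are left with 0. Note that b is still reachable through a, so a count of 0 at this point only marks an object as a candidate; the references of the root objects still have to be followed before anything is treated as garbage. The refcnt fields themselves are never modified, only the gc_refs copies.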