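Python's Garbage Collection Mechanism (Part 4): A Source-Level Walkthrough of the Collection Process

0. Background

Earlier articles in this series covered the introduction, the memory model, and the collection mechanism of Python's garbage collector. This article walks through the concrete implementation of the collection process described in the previous article on the collection mechanism.

1. The collection process at a glance

The input parameter generation names the generation to collect, which we call the "current generation" here. The function returns the number of unreachable objects.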

/* This is the main function.  Read this to understand how the
 * collection process works. */
static Py_ssize_t
collect(int generation)
{
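    //[1]: declare and initialize local variables

    //[2]: merge the lists of all generations younger than the current one into the current generation

    //[3]: copy each object's reference count into its PyGC_Head (gc_refs = ob_refcnt)

    //[4]: use the traverse mechanism to update the copied counts; every unreachable object ends up with gc_refs == 0, and any object with gc_refs > 0 is a root object

    //[5]: initialize a list named unreachable and move the unreachable objects into it

    //[6]: initialize another list named finalizers and move the objects in unreachable that define __del__ into it

    //[7]: move the objects reachable from finalizers into finalizers as well, marking them reachable (gc_refs = GC_REACHABLE)

    //[8]: debug output: can print the collectable objects

    //[9]: handle the weak references to objects in unreachable; a weakref object that is still alive afterwards is moved to old

    //[10]: delete the objects (garbage) in the unreachable list, which breaks their reference cycles

    //[11]: debug output: can print the uncollectable objects

    //[12]: append the objects in the finalizers list to gc.garbage

    //[13]: return the number of unreachable objects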
}
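The return value of collect() is also visible from Python code through gc.collect(). The following is a minimal sketch, assuming a current CPython 3 interpreter, whose collector differs in detail from the 2.x source quoted in this article. The Node class and the collect_one_cycle helper are names made up for this illustration.

```python
import gc


class Node:
    """A container object that can take part in a reference cycle."""

    def __init__(self):
        self.other = None


def collect_one_cycle():
    """Create an unreachable two-object cycle and return what gc.collect() reports."""
    was_enabled = gc.isenabled()
    gc.disable()                  # keep automatic collections out of the way
    try:
        gc.collect()              # start from a clean state
        a, b = Node(), Node()
        a.other, b.other = b, a   # reference cycle a <-> b
        del a, b                  # the cycle is now unreachable
        return gc.collect()       # number of unreachable objects found
    finally:
        if was_enabled:
            gc.enable()


found = collect_one_cycle()
print(found)
```

The exact number depends on the interpreter version, because instance dictionaries may or may not be counted as separate objects, but it covers at least the two Node instances.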

a. Step [1]

The following variables are declared and initialized:

    //[1]: declare and initialize local variables
    Py_ssize_t m = 0; /* # objects collected */
    Py_ssize_t n = 0; /* # unreachable objects that couldn't be collected */
    PyGC_Head *young; /* the generation we are examining */
    PyGC_Head *old; /* next older generation */
    PyGC_Head unreachable; /* non-problematic unreachable trash */
    PyGC_Head finalizers;  /* objects with, & reachable from, __del__ */
    PyGC_Head *gc;

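The sum of m and n is the number of unreachable objects: m of them are collected, while the other n cannot be collected.

young and old are pointers to the list heads of two generations, while unreachable and finalizers are list heads (PyGC_Head) local to the function. The comments in the code explain what each one holds.

b. Step [2]

The second step merges the lists of all generations younger than the current one into the current generation. The code is as follows:

    //[2]: merge the lists of all generations younger than the current one into the current generation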
    /* merge younger generations with one we are currently collecting */
    for (i = 0; i < generation; i++) {
        gc_list_merge(GEN_HEAD(i), GEN_HEAD(generation));
    }

This for loop visits every generation younger than generation and uses

static void
gc_list_merge(PyGC_Head *from, PyGC_Head *to)

to merge it into a single list. The full code is:

/* append list `from` onto list `to`; `from` becomes an empty list */
static void
gc_list_merge(PyGC_Head *from, PyGC_Head *to)
{
    PyGC_Head *tail;
    assert(from != to);
    if (!gc_list_is_empty(from)) {
        tail = to->gc.gc_prev;
        tail->gc.gc_next = from->gc.gc_next;
        tail->gc.gc_next->gc.gc_prev = tail;
        to->gc.gc_prev = from->gc.gc_prev;
        to->gc.gc_prev->gc.gc_next = to;
    }
    gc_list_init(from);
}

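We won't go into detail here: this is the standard way to splice one circular doubly linked list onto another. The figure below illustrates the merge for lists of collectable objects.

Figure 1. Merging two linked lists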
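The pointer updates in gc_list_merge can be replayed in a few lines of Python. This is only a sketch: Head is a hypothetical stand-in for PyGC_Head, and list_append, list_merge and names are helpers invented for the illustration.

```python
class Head:
    """Hypothetical stand-in for PyGC_Head: a node of a circular doubly linked list."""

    def __init__(self, name=None):
        self.name = name
        self.next = self   # an empty list is a head that points at itself
        self.prev = self


def list_append(node, head):
    """Link node in just before head, i.e. at the tail of the list."""
    node.prev = head.prev
    node.next = head
    head.prev.next = node
    head.prev = node


def list_merge(src, dst):
    """Append list src onto list dst; src becomes empty (mirrors gc_list_merge)."""
    assert src is not dst
    if src.next is not src:            # src is not empty
        tail = dst.prev
        tail.next = src.next
        tail.next.prev = tail
        dst.prev = src.prev
        dst.prev.next = dst
    src.next = src.prev = src          # re-initialize src as an empty list


def names(head):
    """Collect the node names in list order, skipping the head itself."""
    out, node = [], head.next
    while node is not head:
        out.append(node.name)
        node = node.next
    return out


gen0, gen1 = Head(), Head()
for n in ("a", "b"):
    list_append(Head(n), gen0)
for n in ("x", "y"):
    list_append(Head(n), gen1)

list_merge(gen0, gen1)
print(names(gen1))  # ['x', 'y', 'a', 'b']
print(names(gen0))  # []
```

Because the list is circular and the head is itself a node, the merge touches only four pointers and never needs a special case for the first or last element.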

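c. Step [3]

The third step copies the reference count of every object in the current generation into the object's PyGC_Head:

//[3]: copy each object's reference count into its PyGC_Head
update_refs(young);

The implementation: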

/* Set all gc_refs = ob_refcnt.  After this, gc_refs is > 0 for all objects
 * in containers, and is GC_REACHABLE for all tracked gc objects not in
 * containers.
 */
static void
update_refs(PyGC_Head *containers)
{
    PyGC_Head *gc = containers->gc.gc_next;
    for (; gc != containers; gc = gc->gc.gc_next) {
        assert(gc->gc.gc_refs == GC_REACHABLE);
        gc->gc.gc_refs = Py_REFCNT(FROM_GC(gc));
        /* Python's cyclic gc should never see an incoming refcount
         * of 0:  if something decref'ed to 0, it should have been
         * deallocated immediately at that time.
         * Possible cause (if the assert triggers):  a tp_dealloc
         * routine left a gc-aware object tracked during its teardown
         * phase, and did something-- or allowed something to happen --
         * that called back into Python.  gc can trigger then, and may
         * see the still-tracked dying object.  Before this assert
         * was added, such mistakes went on to allow gc to try to
         * delete the object again.  In a debug build, that caused
         * a mysterious segfault, when _Py_ForgetReference tried
         * to remove the object from the doubly-linked list of all
         * objects a second time.  In a release build, an actual
         * double deallocation occurred, which leads to corruption
         * of the allocator's internal bookkeeping pointers.  That's
         * so serious that maybe this should be a release-build
         * check instead of an assert?
         */
        assert(gc->gc.gc_refs != 0);
    }
}

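To find the set of root objects, the collector introduces the notion of an effective reference count: the reference count with the references that come from inside the collected set (including those forming cycles) removed. This is what gets around the circular-reference problem.

Python does not modify the real reference count; it modifies a copy. Changes to the copy never affect how the object's lifetime is managed, because the copy's only purpose is to find the set of root objects. The copy is the gc.gc_refs field of PyGC_Head. The collector starts by walking the list of collectable objects and setting each object's gc.gc_refs to its ob_refcnt.

d. Step [4]

The fourth step uses Python's traverse mechanism to update the copied counts, so that every unreachable object ends up with a count of 0:

// use Python's traverse mechanism to update the copied reference counts;
// every unreachable object ends up with gc_refs == 0
subtract_refs(young);

This step operates only on gc_refs: it removes the references held between objects in the list, leaving the effective reference count. The source is: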

/* Subtract internal references from gc_refs.  After this, gc_refs is >= 0
 * for all objects in containers, and is GC_REACHABLE for all tracked gc
 * objects not in containers.  The ones with gc_refs > 0 are directly
 * reachable from outside containers, and so can't be collected.
 */
static void
subtract_refs(PyGC_Head *containers)
{
    traverseproc traverse;
    PyGC_Head *gc = containers->gc.gc_next;
    for (; gc != containers; gc=gc->gc.gc_next) {
        traverse = Py_TYPE(FROM_GC(gc))->tp_traverse;
        (void) traverse(FROM_GC(gc),
                       (visitproc)visit_decref,
                       NULL);
    }
}

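Here traverse is specific to the type of each PyObject and is defined in the object's type object (tp_traverse).

In general, a traverse function visits every reference held by the object and applies some action to it. In subtract_refs that action is visit_decref, passed into the traverse call as a callback.

Let's look at the code of visit_decref: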

/* A traversal callback for subtract_refs. */
static int
visit_decref(PyObject *op, void *data)
{
    assert(op != NULL);
    if (PyObject_IS_GC(op)) {
        PyGC_Head *gc = AS_GC(op);
        /* We're only interested in gc_refs for objects in the
         * generation being collected, which can be recognized
         * because only they have positive gc_refs.
         */
        assert(gc->gc.gc_refs != 0); /* else refcount was too small */
        if (gc->gc.gc_refs > 0)
            gc->gc.gc_refs--;
    }
    return 0;
}

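Once subtract_refs has finished, all references between objects in the collectable list, cycles included, have been subtracted. Objects whose PyGC_Head.gc.gc_refs is still nonzero are referenced from outside the list, and they form the root object set for the mark-and-sweep phase.

The figure below shows the root object set that the example yields after the two steps update_refs and subtract_refs.

Figure 2. update_refs and subtract_refs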
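The two passes can be modelled in Python to see how the root set falls out. This is a sketch under simplifying assumptions: Obj is a hypothetical model of a tracked object, refs stands in for what tp_traverse would visit, and the value of GC_REACHABLE is a sentinel chosen for the model.

```python
GC_REACHABLE = -3   # sentinel for the model; any negative value works here


class Obj:
    """Hypothetical model of a tracked container object."""

    def __init__(self, name):
        self.name = name
        self.refs = []                # what tp_traverse would visit
        self.refcnt = 0               # models ob_refcnt
        self.gc_refs = GC_REACHABLE   # models PyGC_Head.gc.gc_refs


def link(src, dst):
    """Make src reference dst, bumping dst's reference count."""
    src.refs.append(dst)
    dst.refcnt += 1


def update_refs(containers):
    for obj in containers:
        obj.gc_refs = obj.refcnt      # work on a copy of the real count


def subtract_refs(containers):
    for obj in containers:
        for target in obj.refs:       # plays the role of visit_decref
            if target.gc_refs > 0:    # only objects in the collected list are positive
                target.gc_refs -= 1


a, b, c, d = (Obj(n) for n in "abcd")
link(a, b)
link(b, a)               # cycle a <-> b
link(c, d)
link(d, c)               # cycle c <-> d
a.refcnt += 1            # one reference from outside the list, e.g. a local variable

young = [a, b, c, d]
update_refs(young)
subtract_refs(young)
roots = [o.name for o in young if o.gc_refs > 0]
print({o.name: o.gc_refs for o in young})  # {'a': 1, 'b': 0, 'c': 0, 'd': 0}
print(roots)                               # ['a']
```

Note that b also ends at 0 although it is reachable through a. Telling b apart from c and d is the job of the next step.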

e. Step [5]

The fifth step initializes a list named unreachable and moves the unreachable objects into it:

    /* Leave everything reachable from outside young in young, and move
     * everything else (in young) to unreachable.
     * NOTE:  This used to move the reachable objects into a reachable
     * set instead.  But most things usually turn out to be reachable,
     * so it's more efficient to move the unreachable things.
     */
    gc_list_init(&unreachable);
    move_unreachable(young, &unreachable);

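Once the root object set has been found, we can start from it and walk the whole set (the list of the current generation, young), marking one by one the objects that must not be reclaimed. Root objects cannot be reclaimed, and neither can any object that a root object references directly or indirectly.

Before walking the list we set up an unreachable list. move_unreachable then walks young and moves the unreachable objects into unreachable, after which only the unreachable list needs to be garbage collected. The source of move_unreachable is: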

/* Move the unreachable objects from young to unreachable.  After this,
 * all objects in young have gc_refs = GC_REACHABLE, and all objects in
 * unreachable have gc_refs = GC_TENTATIVELY_UNREACHABLE.  All tracked
 * gc objects not in young or unreachable still have gc_refs = GC_REACHABLE.
 * All objects in young after this are directly or indirectly reachable
 * from outside the original young; and all objects in unreachable are
 * not.
 */
static void
move_unreachable(PyGC_Head *young, PyGC_Head *unreachable)
{
    PyGC_Head *gc = young->gc.gc_next;

    /* Invariants:  all objects "to the left" of us in young have gc_refs
     * = GC_REACHABLE, and are indeed reachable (directly or indirectly)
     * from outside the young list as it was at entry.  All other objects
     * from the original young "to the left" of us are in unreachable now,
     * and have gc_refs = GC_TENTATIVELY_UNREACHABLE.  All objects to the
     * left of us in 'young' now have been scanned, and no objects here
     * or to the right have been scanned yet.
     */

    while (gc != young) {
        PyGC_Head *next;

        if (gc->gc.gc_refs) {
            /* gc is definitely reachable from outside the
             * original 'young'.  Mark it as such, and traverse
             * its pointers to find any other objects that may
             * be directly reachable from it.  Note that the
             * call to tp_traverse may append objects to young,
             * so we have to wait until it returns to determine
             * the next object to visit.
             */
            PyObject *op = FROM_GC(gc);
            traverseproc traverse = Py_TYPE(op)->tp_traverse;
            assert(gc->gc.gc_refs > 0);
            gc->gc.gc_refs = GC_REACHABLE;
            // [1]: visit_reachable: starting from a root object, mark the objects it references directly or indirectly as reachable
            (void) traverse(op,
                            (visitproc)visit_reachable,
                            (void *)young);
            next = gc->gc.gc_next;
            if (PyTuple_CheckExact(op)) {
                _PyTuple_MaybeUntrack(op);
            }
        }
        else {
            /* This *may* be unreachable.  To make progress,
             * assume it is.  gc isn't directly reachable from
             * any object we've already traversed, but may be
             * reachable from an object we haven't gotten to yet.
             * visit_reachable will eventually move gc back into
             * young if that's so, and we'll see it again.
             */
            next = gc->gc.gc_next;
            // [2]: set gc_refs to GC_TENTATIVELY_UNREACHABLE and move this object into the unreachable list
            gc_list_move(gc, unreachable);
            gc->gc.gc_refs = GC_TENTATIVELY_UNREACHABLE;
        }
        gc = next;
    }
}

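move_unreachable walks forward along the young list and checks each element's gc_refs in its PyGC_Head. It does not start from the root objects and traverse the way a graph DFS or BFS would; it simply walks the list from front to back. So when it meets an object A whose gc_refs is 0, it cannot conclude right away that A is garbage (unreachable): a root object further along the young list may well reference A.

At code [2], move_unreachable therefore tentatively sets A's gc_refs to GC_TENTATIVELY_UNREACHABLE and moves A into the unreachable list. If a root object that references A directly or indirectly is traversed later, visit_reachable (called at code [1]) marks A as reachable again, even though its gc_refs had been set to GC_TENTATIVELY_UNREACHABLE: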

/* A traversal callback for move_unreachable. */
static int
visit_reachable(PyObject *op, PyGC_Head *reachable)
{
    if (PyObject_IS_GC(op)) {
        PyGC_Head *gc = AS_GC(op);
        const Py_ssize_t gc_refs = gc->gc.gc_refs;

        if (gc_refs == 0) {
            /* This is in move_unreachable's 'young' list, but
             * the traversal hasn't yet gotten to it.  All
             * we need to do is tell move_unreachable that it's
             * reachable.
             */
            gc->gc.gc_refs = 1;
        }
        //[3]: an object marked GC_TENTATIVELY_UNREACHABLE is moved out of the `unreachable` list
        else if (gc_refs == GC_TENTATIVELY_UNREACHABLE) {
            /* This had gc_refs = 0 when move_unreachable got
             * to it, but turns out it's reachable after all.
             * Move it back to move_unreachable's 'young' list,
             * and move_unreachable will eventually get to it
             * again.
             */
            gc_list_move(gc, reachable);
            gc->gc.gc_refs = 1;
        }
        /* Else there's nothing to do.
         * If gc_refs > 0, it must be in move_unreachable's 'young'
         * list, and move_unreachable will eventually get to it.
         * If gc_refs == GC_REACHABLE, it's either in some other
         * generation so we don't care about it, or move_unreachable
         * already dealt with it.
         * If gc_refs == GC_UNTRACKED, it must be ignored.
         */
         else {
            assert(gc_refs > 0
                   || gc_refs == GC_REACHABLE
                   || gc_refs == GC_UNTRACKED);
         }
    }
    return 0;
}

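At code [3] in visit_reachable, an object that was previously marked GC_TENTATIVELY_UNREACHABLE has turned out to be reachable, directly or indirectly, from a root object. It is therefore taken out of the unreachable list and moved back to the reachable list (which is the young list).

The objects are now split across two lists, the young list and the unreachable list. Figure 2 shows the result of the split.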
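The tentative marking is easiest to follow on a small model. The sketch below replaces the linked lists with Python lists and uses a hypothetical Obj class. The sentinel values are chosen for the model, and gc_refs starts from the values that subtract_refs would have left behind.

```python
GC_REACHABLE = -3                 # sentinel values for the model only
GC_TENTATIVELY_UNREACHABLE = -4


class Obj:
    def __init__(self, name, gc_refs):
        self.name = name
        self.gc_refs = gc_refs    # value left behind by subtract_refs
        self.refs = []            # what tp_traverse would visit


def visit_reachable(obj, young, unreachable):
    if obj.gc_refs == 0:
        obj.gc_refs = 1                      # not scanned yet; just flag it
    elif obj.gc_refs == GC_TENTATIVELY_UNREACHABLE:
        unreachable.remove(obj)              # scanned too early: hand it back
        young.append(obj)                    # to young for a second look
        obj.gc_refs = 1


def move_unreachable(young):
    """Split young in place and return the list of unreachable objects."""
    unreachable = []
    i = 0
    while i < len(young):
        obj = young[i]
        if obj.gc_refs:
            # Reachable from outside: mark it and visit what it references.
            obj.gc_refs = GC_REACHABLE
            for target in obj.refs:
                visit_reachable(target, young, unreachable)
            i += 1
        else:
            # Possibly unreachable: park it in unreachable for now.
            young.pop(i)
            unreachable.append(obj)
            obj.gc_refs = GC_TENTATIVELY_UNREACHABLE
    return unreachable


a = Obj("a", 1)      # root: still referenced from outside after subtract_refs
b = Obj("b", 0)
c = Obj("c", 0)
d = Obj("d", 0)
a.refs, b.refs = [b], [a]    # a <-> b, kept alive through a
c.refs, d.refs = [d], [c]    # c <-> d, a pure garbage cycle

young = [b, a, c, d]         # b is met before the root that references it
unreachable = move_unreachable(young)
print([o.name for o in young])        # ['a', 'b']
print([o.name for o in unreachable])  # ['c', 'd']
```

Here b is first parked in unreachable, then pulled back when the walk reaches a, which is exactly the case the GC_TENTATIVELY_UNREACHABLE state exists for.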

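f. Step [6]

After move_unreachable finishes, the original list has become two. The unreachable list holds the garbage objects we found, the targets of collection. But can everything in unreachable really be reclaimed safely?

The answer is no.

The reason is that a Python class may define a special method, __del__, known in Python as a finalizer. When an instance that has a finalizer is destroyed, the finalizer is called first: __del__ is the hook Python gives developers for releasing resources when an object is destroyed.

This is where the problem arises. The unreachable list contains only objects that are kept alive by reference cycles, all of which need to be destroyed. If object A's __del__ uses object B, then B must be destroyed after A, yet Python cannot guarantee the order in which objects are reclaimed during garbage collection.

So Python takes a conservative approach: every instance in the unreachable list that has a finalizer is eventually placed in a PyListObject named garbage.

The sixth step begins this process:

// Objects in the unreachable list that define __del__ cannot be reclaimed safely,
// so they are gathered into the finalizers list. The objects they reference
// cannot be reclaimed either; step [7] moves those into finalizers as well.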
    gc_list_init(&finalizers);
    move_finalizers(&unreachable, &finalizers);

The source of move_finalizers is:

/* Move the objects in unreachable with __del__ methods into `finalizers`.
 * Objects moved into `finalizers` have gc_refs set to GC_REACHABLE; the
 * objects remaining in unreachable are left at GC_TENTATIVELY_UNREACHABLE.
 */
static void
move_finalizers(PyGC_Head *unreachable, PyGC_Head *finalizers)
{
    PyGC_Head *gc;
    PyGC_Head *next;

    /* March over unreachable.  Move objects with finalizers into
     * `finalizers`.
     */
    for (gc = unreachable->gc.gc_next; gc != unreachable; gc = next) {
        PyObject *op = FROM_GC(gc);

        assert(IS_TENTATIVELY_UNREACHABLE(op));
        next = gc->gc.gc_next;

        // [4] if the object defines __del__, move it into the finalizers list
        if (has_finalizer(op)) {
            gc_list_move(gc, finalizers);
            gc->gc.gc_refs = GC_REACHABLE;
        }
    }
}

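At code [4], move_finalizers checks whether the object defines __del__ and, if so, moves the object itself into the finalizers list.

move_finalizer_reachable then marks the objects referenced by those __del__ objects as uncollectable too and moves them into the finalizers list as well; see the Python source for details.

g. Step [7]

The seventh step moves the objects reachable from finalizers into the finalizers list and marks them reachable (gc_refs = GC_REACHABLE). The source is: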

    /* finalizers contains the unreachable objects with a finalizer;
     * unreachable objects reachable *from* those are also uncollectable,
     * and we move those into the finalizers list too.
     */
    move_finalizer_reachable(&finalizers);

The code of move_finalizer_reachable is:

/* Move objects that are reachable from finalizers, from the unreachable set
 * into finalizers set.
 */
static void
move_finalizer_reachable(PyGC_Head *finalizers)
{
    traverseproc traverse;
    PyGC_Head *gc = finalizers->gc.gc_next;
    for (; gc != finalizers; gc = gc->gc.gc_next) {
        /* Note that the finalizers list may grow during this. */
        traverse = Py_TYPE(FROM_GC(gc))->tp_traverse;
        (void) traverse(FROM_GC(gc),
                        (visitproc)visit_move,
                        (void *)finalizers);
    }
}

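It simply walks the finalizers list and, through the visit_move callback, marks every tentatively unreachable object referenced from finalizers as GC_REACHABLE and moves it out of unreachable into the finalizers list (the list passed as the callback argument). The finalizers list may grow during the walk, so objects that are only indirectly reachable are picked up too.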
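Steps [6] and [7] together can be sketched as two small functions over Python lists. The Obj class, its has_del flag (standing in for has_finalizer) and the list-based signatures are assumptions made for this model, not the C interfaces.

```python
class Obj:
    def __init__(self, name, has_del=False):
        self.name = name
        self.has_del = has_del   # models has_finalizer(op)
        self.refs = []           # what tp_traverse would visit


def move_finalizers(unreachable):
    """Step [6]: move the objects that define __del__ out of unreachable."""
    finalizers = [o for o in unreachable if o.has_del]
    unreachable[:] = [o for o in unreachable if not o.has_del]
    return finalizers


def move_finalizer_reachable(finalizers, unreachable):
    """Step [7]: also move everything reachable from finalizers."""
    i = 0
    while i < len(finalizers):          # the list may grow while we walk it
        for target in finalizers[i].refs:
            if target in unreachable:   # still tentatively unreachable
                unreachable.remove(target)
                finalizers.append(target)
        i += 1


a = Obj("a", has_del=True)
b, c, d, e = (Obj(n) for n in "bcde")
a.refs = [b]
b.refs = [a, c]          # a <-> b, and b also references c
d.refs, e.refs = [e], [d]

unreachable = [a, b, c, d, e]
finalizers = move_finalizers(unreachable)
move_finalizer_reachable(finalizers, unreachable)
print([o.name for o in finalizers])   # ['a', 'b', 'c']
print([o.name for o in unreachable])  # ['d', 'e']
```

Only a defines __del__, yet b and c are held back with it because a's finalizer could still reach them. The independent cycle d <-> e stays collectable.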

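h. Step [8]

Step eight is debug output: it can print the collectable objects.

i. Step [9]

Step nine handles the weak references to objects in unreachable; a weakref object that is still alive afterwards is moved to the old list. The code is:

    /* Clear weakrefs and invoke callbacks as necessary. */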
    m += handle_weakrefs(&unreachable, old);

The code of handle_weakrefs is:

static int
handle_weakrefs(PyGC_Head *unreachable, PyGC_Head *old)
{
    PyGC_Head *gc;
    PyObject *op;               /* generally FROM_GC(gc) */
    PyWeakReference *wr;        /* generally a cast of op */
    PyGC_Head wrcb_to_call;     /* weakrefs with callbacks to call */
    PyGC_Head *next;
    int num_freed = 0;

    // [5]
    gc_list_init(&wrcb_to_call);

    for (gc = unreachable->gc.gc_next; gc != unreachable; gc = next) {
        PyWeakReference **wrlist;

        op = FROM_GC(gc);
        assert(IS_TENTATIVELY_UNREACHABLE(op));
        next = gc->gc.gc_next;

        // [6]
        if (! PyType_SUPPORTS_WEAKREFS(Py_TYPE(op)))
            continue; 

        /* [7] It supports weakrefs.  Does it have any? */
        wrlist = (PyWeakReference **)
                                PyObject_GET_WEAKREFS_LISTPTR(op);


        for (wr = *wrlist; wr != NULL; wr = *wrlist) {
            PyGC_Head *wrasgc;                  /* AS_GC(wr) */

            assert(wr->wr_object == op);
            _PyWeakref_ClearRef(wr);
            assert(wr->wr_object == Py_None);

            // [8]
            if (wr->wr_callback == NULL)
                continue;                       /* no callback */


            if (IS_TENTATIVELY_UNREACHABLE(wr))
                continue;
            assert(IS_REACHABLE(wr));

            /* [9] Create a new reference so that wr can't go away
             * before we can process it again.
             */
            Py_INCREF(wr);

            /* Move wr to wrcb_to_call, for the next pass. */
            wrasgc = AS_GC(wr);
            assert(wrasgc != next); /* wrasgc is reachable, but
                                       next isn't, so they can't
                                       be the same */
            // [10]
            gc_list_move(wrasgc, &wrcb_to_call);
        }
    }

    // [11]
    while (! gc_list_is_empty(&wrcb_to_call)) {
        PyObject *temp;
        PyObject *callback;

        gc = wrcb_to_call.gc.gc_next;
        op = FROM_GC(gc);
        assert(IS_REACHABLE(op));
        assert(PyWeakref_Check(op));
        wr = (PyWeakReference *)op;
        callback = wr->wr_callback;
        assert(callback != NULL);

        /*[12] copy-paste of weakrefobject.c's handle_callback() */
        temp = PyObject_CallFunctionObjArgs(callback, wr, NULL);
        if (temp == NULL)
            PyErr_WriteUnraisable(callback);
        else
            Py_DECREF(temp);

        //[13]
        Py_DECREF(op);
        if (wrcb_to_call.gc.gc_next == gc) {
            /*[14] object is still alive -- move it */
            gc_list_move(gc, old);
        }
        else
            ++num_freed;
    }

    return num_freed;
}

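The function first declares and initializes an empty list, wrcb_to_call (code 5). It then walks the unreachable list and skips any element whose type does not support weak references (code 6). For an element that does, it fetches the list of weak references to it (code 7) and clears each one. For a weak reference wr that has a callback (code 8) and is itself still reachable, it increments wr's reference count (code 9), so that wr cannot be destroyed by its count dropping to 0 while it is still being processed, and then moves wr into wrcb_to_call (code 10).

Next the function drains wrcb_to_call (code 11), invoking the callback of each weak reference (code 12). The call has two outcomes: on failure the error is reported, and on success the returned temp object is decref'ed. The weakref object itself is then decref'ed (code 13) to balance code 9. Finally the function checks whether the object is still in wrcb_to_call and, if so, moves it into the old list (code 14).

Weak references can register callbacks, which handle_weakrefs invokes, so their behavior somewhat resembles that of instances with __del__. There is an essential difference, though. Weakrefs can be cleaned up correctly, at the cost of the extra bookkeeping hidden inside handle_weakrefs, whereas instances with __del__ cannot be reclaimed automatically and end up in the garbage list.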
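The effect of handle_weakrefs can be observed from Python with the weakref module. This is a small sketch, assuming CPython 3; Node and run are illustrative names. The weak reference lives outside the cycle, so it stays reachable and its callback is invoked when the cycle is collected.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def run():
    events = []
    a, b = Node(), Node()
    a.other, b.other = b, a          # reference cycle a <-> b
    # The callback receives the weak reference itself; by then the
    # reference has already been cleared, so ref() returns None.
    wr = weakref.ref(a, lambda ref: events.append(ref() is None))
    del a, b                         # only the cycle keeps the objects alive
    gc.collect()                     # the collector clears wr and calls the callback
    return wr, events


wr, events = run()
print(wr() is None)   # True
print(events)         # [True]
```

The callback sees an already cleared reference, which matches the order in the C code: _PyWeakref_ClearRef runs in the first loop, and the callbacks run only in the second.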

j. Step [10]

The tenth step deletes the objects (garbage) in the unreachable list, which breaks their reference cycles. The source is:

    /* Call tp_clear on objects in the unreachable set.  This will cause
     * the reference cycles to be broken.  It may also cause some objects
     * in finalizers to be freed.
     */
    delete_garbage(&unreachable, old);

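The function takes the unreachable list and the old list. The unreachable list holds the unreachable objects that can be deleted safely, so delete_garbage walks it and calls the tp_clear function of each object's type to reclaim the instance. An object that survives tp_clear is moved to the old list.

The source of delete_garbage is: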

/* Break reference cycles by clearing the containers involved.  This is
 * tricky business as the lists can be changing and we don't know which
 * objects may be freed.  It is possible I screwed something up here.
 */
static void
delete_garbage(PyGC_Head *collectable, PyGC_Head *old)
{
    inquiry clear;

    while (!gc_list_is_empty(collectable)) {
        PyGC_Head *gc = collectable->gc.gc_next;
        PyObject *op = FROM_GC(gc);

        assert(IS_TENTATIVELY_UNREACHABLE(op));
        // [15]
        if (debug & DEBUG_SAVEALL) {
            PyList_Append(garbage, op);
        }
        else {
            // [16]
            if ((clear = Py_TYPE(op)->tp_clear) != NULL) {
                // [17]
                Py_INCREF(op);
                // [18]
                clear(op);
                // [19]
                Py_DECREF(op);
            }
        }
        if (collectable->gc.gc_next == gc) {
            /* [20] object is still alive, move it, it may die later */
            gc_list_move(gc, old);
            gc->gc.gc_refs = GC_REACHABLE;
        }
    }
}

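At code 15, if debugging is enabled with DEBUG_SAVEALL, the object is saved to garbage and no deletion is performed. Otherwise the function fetches tp_clear from the object's type (code 16); tp_clear drops the references that the object holds, which is what breaks the cycle. Code 17 first increments op's own reference count so that it cannot fall to 0 inside tp_clear, that is, so that op is not destroyed by the reference-counting mechanism while tp_clear is running. tp_clear is then called (code 18), and afterwards op's reference count is decremented (code 19) to keep the count balanced.

Normally, after tp_clear and Py_DECREF(op), op has been destroyed and has left the unreachable list. If op was not destroyed and is still in the list, the condition collectable->gc.gc_next == gc holds; op is then moved into the old list and its gc_refs is set to GC_REACHABLE (code 20).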
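Why clearing one object is enough to let reference counting finish the job can be tried out in Python. The sketch below is a rough analogue only: it assumes CPython's immediate reference counting, and clearing the instance dictionary merely imitates what tp_clear does for the object. Node and clear_breaks_cycle are illustrative names.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def clear_breaks_cycle():
    was_enabled = gc.isenabled()
    gc.disable()                      # make sure only reference counting is at work
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a       # reference cycle a <-> b
        probe_a, probe_b = weakref.ref(a), weakref.ref(b)
        a.__dict__.clear()            # rough analogue of tp_clear: a drops its references
        del b                         # b's count reaches 0, so b is freed at once
        b_gone = probe_b() is None
        del a                         # freeing b released its reference to a
        a_gone = probe_a() is None
        return b_gone, a_gone
    finally:
        if was_enabled:
            gc.enable()


result = clear_breaks_cycle()
print(result)   # (True, True)
```

No call to gc.collect() is needed after the clear: once one edge of the cycle is gone, ordinary decrefs free both objects, which is what delete_garbage relies on.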

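k. Step [11]

Step eleven is debug output: it can print the uncollectable objects.

l. Step [12]

Step ten dealt with the objects in the unreachable list, but the objects in the finalizers list still need handling. Step twelve therefore appends the objects in the finalizers list to gc.garbage. The source is:

    /* Append instances in the uncollectable set to a Python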
     * reachable list of garbage.  The programmer has to deal with
     * this if they insist on creating this type of structure.
     */
    (void)handle_finalizers(&finalizers, old);

The source of handle_finalizers is:

static int
handle_finalizers(PyGC_Head *finalizers, PyGC_Head *old)
{
    PyGC_Head *gc = finalizers->gc.gc_next;

    if (garbage == NULL) {
        garbage = PyList_New(0);
        if (garbage == NULL)
            Py_FatalError("gc couldn't create gc.garbage list");
    }
    for (; gc != finalizers; gc = gc->gc.gc_next) {
        PyObject *op = FROM_GC(gc);
        // [21]
        if ((debug & DEBUG_SAVEALL) || has_finalizer(op)) {
            if (PyList_Append(garbage, op) < 0)
                return -1;
        }
    }
    // [22]
    gc_list_merge(finalizers, old);
    return 0;
}

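handle_finalizers checks whether DEBUG_SAVEALL is set or the object defines __del__. If either holds, the object op is appended to the garbage list (code 21): such objects are considered garbage, but their memory is not freed. Because the objects in finalizers are uncollectable, all of them are finally merged into the old list (code 22).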
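The garbage list is exposed to Python code as gc.garbage. Since Python 3.4 (PEP 442), objects with __del__ no longer end up there, so the sketch below uses the DEBUG_SAVEALL flag, the other branch of the condition at code 21, to see the list being filled. Node and saved_garbage are illustrative names.

```python
import gc


class Node:
    def __init__(self):
        self.other = None


def saved_garbage():
    """Collect one cycle under DEBUG_SAVEALL and return the saved Node objects."""
    old_flags = gc.get_debug()
    gc.collect()                        # flush unrelated garbage first
    gc.set_debug(gc.DEBUG_SAVEALL)      # save unreachable objects instead of freeing them
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a         # reference cycle a <-> b
        del a, b
        gc.collect()
        return [o for o in gc.garbage if isinstance(o, Node)]
    finally:
        gc.set_debug(old_flags)         # restore the previous debug flags
        gc.garbage.clear()              # drop the saved references again


nodes = saved_garbage()
print(len(nodes))   # 2
```

The saved objects are still intact, cycle included, because with DEBUG_SAVEALL the collector appends them to the list and skips tp_clear.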

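m. Step [13]

The thirteenth step returns the number of unreachable objects, n + m.

2. Summary

Python uses classic reference counting as its scheme for automatic memory management, and it makes up for the weaknesses of reference counting in several ways: heavy use of memory pools, mark-and-sweep garbage collection, and generational collection all go a long way toward completing Python's memory management.

Memory management and garbage collection are intricate and painstaking subjects, and this analysis cannot cover every fine point of the actual mechanism. Readers who need more can consult the Python source.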
