Python - Garbage Collection in Python 3.6

分享一个大牛的人工智能教程。零基础!通俗易懂!风趣幽默!希望你也加入到人工智能的队伍中来!请点击人工智能教程

This article describes garbage collection (GC) in Python 3.6.

Usually, you do not need to worry about memory management when the objects are no longer needed Python automatically reclaims the memory from them. However, understanding how GC works can help you write better Python programs.

Memory management

Unlike many other languages, Python does not necessarily release the memory back to the Operating System. Instead, it has a specialized object allocator for small objects (smaller or equal to 512 bytes), which keeps some chunks of already allocated memory for further use in future. The amount of memory that Python holds depending on the usage patterns, in some cases all allocated memory is never released.

Therefore, if a long-running Python process takes more memory over time, it does not necessarily mean that you have memory leaks.

Garbage collection algorithms

Standard CPython's garbage collector has two components, the reference counting collector and the generational garbage collector, known as gc module.

The reference counting algorithm is incredibly efficient and straightforward, but it cannot detect reference cycles. That is why Python has a supplemental algorithm called generational cyclic GC, that deals with reference cycles.

The reference counting is fundamental to Python and can't be disabled, whereas the cyclic GC is optional and can be used manually.

Reference counting

Reference counting is a simple technique in which objects are deallocated when there is no reference to them in a program.

Every variable in Python is a reference (a pointer) to an object and not the actual value itself. For example, the assignment statement just adds a new reference to the right-hand side.

To keep track of references every object (even integer) has an extra field called reference count that is increased or decreased when a pointer to the object is copied or deleted. See Objects, Types and Reference Counts section, for a detailed explanation.

Examples, where the reference count increases:

  • assignment operator
  • argument passing
  • appending an object to a list (object's reference count will be increased)

If reference counting field reaches zero, CPython automatically calls the object-specific deallocation function. If an object contains references to other objects, then their reference count is decremented too. Thus other objects may be deallocated in turn. For example, when a list is deleted the reference count of all its items is decreased.

Variables, which declared outside of functions, classes, and blocks are called globals. Usually, such variables live until the end of the Python's process. Thus, the reference count of objects, which are referred by globals, never drops to 0.

Variables, which are defined inside blocks (e.g., in a function or class) have a local scope (i.e., they are local to its block). If Python interpreter exits from the block, it destroys all references created inside the block.

You can always check the number of current references using sys.getrefcount function.

Here is a simple example:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author: Chimomo


import sys

# 2 references: 1 from the foo var and 1 from getrefcount
foo = []
print(sys.getrefcount(foo))


# 4 references: from the foo var, function argument, getrefcount and Python's function stack
def bar(a):
    print(sys.getrefcount(a))


bar(foo)

# 2 references, the function scope is destroyed
print(sys.getrefcount(foo))

''' ------ Running Results ------
2
4
2

'''

The main reason why CPython uses reference counting is historical. There are a lot of debates nowadays about weaknesses of such technique. Some people claim that modern garbage collection algorithms can be more efficient without reference counting at all. The reference counting algorithm has a lot of issues, such as circular references, thread locking and memory and performance overhead.

The main advantage of such approach is that the objects can be immediately destroyed after they are no longer needed.

Generational garbage collector

Why do we need additional garbage collector when we have reference counting?

Unfortunately, classical reference counting has a fundamental problem — it cannot detect reference cycles. A reference cycle occurs when one or more objects are referencing each other.

Here are two examples:

As we can see, the 'lst' object is pointing to itself, moreover, object 1 and object 2 are pointing to each other. The reference count for such objects is always at least 1.

To get a better idea you can play with a simple Python example:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author: Chimomo


import ctypes
import gc


# We are using ctypes to access our unreachable objects by memory address
class PyObject(ctypes.Structure):
    _fields_ = [("refcnt", ctypes.c_long)]


# Disable generational gc
gc.disable()

lst = []
lst.append(lst)

# Store address of the list
lst_address = id(lst)

# Destroy the lst reference
del lst

object_1 = dict()
object_2 = dict()
object_1['obj2'] = object_2
object_2['obj1'] = object_1

obj_address = id(object_1)

# Destroy references
del object_1, object_2

# Uncomment if you want to manually run garbage collection process
# gc.collect()

# Check the reference count
print(PyObject.from_address(obj_address).refcnt)
print(PyObject.from_address(lst_address).refcnt)

''' ------ Running Results ------
1
1

'''

In the example above, the del statement removes the references to our objects (i.e., decreases reference count by 1). After Python executes the del statement, our objects are no longer accessible from Python code. However, such objects are still sitting in the memory, that's because they are still referencing each other and the reference count of each object is 1. You can visually explore such relations using objgraph module.

To resolve this issue, the additional cycle-detecting algorithm was introduced in Python 1.5. The gc module is responsible for this and exists only for dealing with such problem.

Reference cycles can only occur in container objects (i.e., in objects which can contain other objects), such as lists, dictionaries, classes, tuples. The GC does not track all immutable types except for a tuple. Tuples and dictionaries containing only immutable objects can also be untracked depending on certain conditions. Thus, the reference counting technique handles all non-circular references.

When does the generational GC trigger

Unlike the reference counting, the cyclic GC does not work in real-time and runs periodically. To reduce the frequency of GC calls and pauses CPython uses various heuristics.

The GC classifies container objects into three generations. Every new object starts in the first generation. If an object survives a garbage collection round, it moves to the older (higher) generation. Lower generations are collected more often than higher. Because most of the newly created objects die young, it improves GC performance and reduces the GC pause time.

In order to decide when to run, each generation has an individual counter and threshold. The counter stores the number of object allocations minus deallocations since the last collection. Every time you allocate a new container object, CPython checks whenever the counter of the first generation exceeds the threshold value. If so Python initiates the сollection process.

If we have two or more generations that currently exceed the threshold, GC chooses the oldest one. That is because oldest generations are also collecting all previous (younger) generations. To reduce performance degradation for long-living objects the third generation has additional requirements in order to be chosen.

The standard threshold values are set to (700, 10, 10) respectively, but you can always check them using the gc.get_threshold function.

How to find reference cycles

It is hard to explain the reference cycle detection algorithm in a few paragraphs. But basically, the GC iterates over each container object and temporarily removes all references to container objects it references. After full iteration, all objects which reference count lower than two are unreachable from Python's code and thus can be collected.

To fully understand the cycle-finding algorithm I recommend you to read collect function from CPython's source code.

Performance tips

Cycles can easily happen in real life. Typically you encounter them in graphs, linked lists or in structures, in which you need to keep track of relations between objects. If your program has intensive workload and requires low latency, you should avoid reference cycles as possible.

To avoid circular references in your code, you need to use weak references, which are implemented in the weakref module. Unlike the usual references, the weakref.ref doesn't increase the reference count and returns None if an object was destroyed.

In some cases, it is useful to disable GC and use it manually. The automatic collection can be disabled by calling gc.disable(). To manually run collection process you need to use gc.collect().

How to find and debug reference cycles

Debugging reference cycles can be very frustrating especially when you use a lot of third-party libraries.

The standard gc module provides a lot of useful helpers that can help in debugging. If you set debugging flags to DEBUG_SAVEALL, all unreachable objects found will be appended to gc.garbage list.

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Author: Chimomo


import gc

gc.set_debug(gc.DEBUG_SAVEALL)

print(gc.get_count())
lst = []
lst.append(lst)
list_id = id(lst)
del lst
gc.collect()
print("There are %s item(s) in gc.garbage" % len(gc.garbage))
for item in gc.garbage:
    if list_id == id(item):
        print("Found ref circle, its id is: %s" % list_id)

''' ------ Running Results ------
(604, 9, 0)
There are 1 item(s) in gc.garbage
Found ref circle, its id is: 4531080

'''

Once you have identified a problematic spot in your code you can visually explore object's relations using objgraph.

Conclusion

Most of the garbage collection is done by reference counting algorithm, which we cannot tune at all. So, be aware of implementation specifics, but don't worry about potential GC problems prematurely.

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值