Memory footprint of Python objects
A memory problem may arise when a large number of objects are active in RAM during the execution of a program, especially if there are restrictions on the total amount of available memory.
Below is an overview of some methods of reducing the size of objects, which can significantly reduce the amount of RAM needed for programs in pure Python.
Note: This is the English version of my original post (in Russian).
For simplicity, we will consider structures in Python that represent points with the coordinates x, y, z, with access to the coordinate values by name.
Dict
In small programs, especially in scripts, it is quite simple and convenient to use the built-in dict to represent structural information:
>>> ob = {'x':1, 'y':2, 'z':3}
>>> x = ob['x']
>>> ob['y'] = y
With the advent of a more compact implementation in Python 3.6 with an ordered set of keys, dict has become even more attractive. However, let's look at the size of its footprint in RAM:
>>> import sys
>>> print(sys.getsizeof(ob))
240
It takes a lot of memory, especially if you suddenly need to create a large number of instances:
Number of instances | Size of objects |
---|---|
1 000 000 | 240 MB |
10 000 000 | 2.40 GB |
100 000 000 | 24 GB |
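The totals in the table are simply the per-object size multiplied by the number of instances. A minimal sketch of that arithmetic follows; the helper name estimate_total_bytes is made up for this illustration, and the exact number returned by sys.getsizeof depends on the Python version and platform. Note that sys.getsizeof is shallow: it does not count the key and value objects the dict refers to.

```python
import sys

def estimate_total_bytes(sample, count):
    """Shallow size of one object multiplied by the number of instances."""
    return sys.getsizeof(sample) * count

ob = {'x': 1, 'y': 2, 'z': 3}
for count in (1_000_000, 10_000_000, 100_000_000):
    total_mb = estimate_total_bytes(ob, count) / 1_000_000
    print(f"{count:>11,} instances: about {total_mb:,.0f} MB")
```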
Class instance
For those who like to wrap everything in classes, it is preferable to define structures as a class with access by attribute name:
class Point:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

>>> ob = Point(1,2,3)
>>> x = ob.x
>>> ob.y = y
The structure of the class instance is interesting:
Field | Size (bytes) |
---|---|
PyGC_Head | 24 |
PyObject_HEAD | 16 |
__weakref__ | 8 |
__dict__ | 8 |
TOTAL: | 56 |
Here __weakref__ is a reference to the list of so-called weak references to this object, and the field __dict__ is a reference to the class instance dictionary, which contains the values of the instance attributes (note that references occupy 8 bytes on a 64-bit platform). Starting with Python 3.3, a shared space is used to store the dictionary keys for all instances of the class. This reduces the size of the instance footprint in RAM:
>>> print(sys.getsizeof(ob), sys.getsizeof(ob.__dict__))
56 112
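Since sys.getsizeof is shallow, the instance and its __dict__ have to be measured separately and added together. The sketch below does that with a hypothetical helper, instance_footprint, introduced only for this illustration; the numbers it prints vary between CPython versions and will not necessarily match the ones above.

```python
import sys

class Point:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

def instance_footprint(obj):
    """Shallow size of an instance plus its attribute dictionary, if it has one."""
    size = sys.getsizeof(obj)
    if hasattr(obj, '__dict__'):
        size += sys.getsizeof(obj.__dict__)
    return size

ob = Point(1, 2, 3)
print(sys.getsizeof(ob), sys.getsizeof(ob.__dict__), instance_footprint(ob))
```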
As a result, a large number of class instances have a smaller footprint in memory than a regular dictionary (dict):
Number of instances | Size |
---|---|
1 000 000 | 168 MB |
10 000 000 | 1.68 GB |
100 000 000 | 16.8 GB |
It is easy to see that the size of the instance in RAM is still large due to the size of the dictionary of the instance.
Instance of class with __slots__
A significant reduction in the size of a class instance in RAM is achieved by eliminating __dict__ and __weakref__. This is possible with the help of a "trick" with __slots__:
class Point:
    __slots__ = 'x', 'y', 'z'

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

>>> ob = Point(1,2,3)
>>> print(sys.getsizeof(ob))
64
The object size in RAM has become significantly smaller:
Field | Size (bytes) |
---|---|
PyGC_Head | 24 |
PyObject_HEAD | 16 |
x | 8 |
y | 8 |
z | 8 |
TOTAL: | 64 |
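The layout has no __dict__ or __weakref__ entries, and that has a practical consequence: instances cannot receive attributes that are not listed in __slots__. The following sketch shows this trade-off; it is only an illustration of standard Python behaviour, not part of the measurements above.

```python
class Point:
    __slots__ = 'x', 'y', 'z'

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

ob = Point(1, 2, 3)
print(hasattr(ob, '__dict__'))   # no per-instance dictionary is allocated

try:
    ob.w = 4                     # 'w' is not listed in __slots__
except AttributeError as exc:
    print('AttributeError:', exc)
```

If weak references to such instances are needed, '__weakref__' can be added to __slots__ explicitly, at the cost of one more pointer per instance.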
Using __slots__ in the class definition significantly reduces the memory footprint of a large number of instances:
Number of instances | Size |
---|---|
1 000 000 | 64 MB |
10 000 000 | 640 MB |
100 000 000 | 6.4 GB |
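Rather than multiplying sys.getsizeof by the instance count, the saving can also be measured directly with the standard tracemalloc module. This is a rough sketch: the class names PointDict and PointSlots and the helper allocated_bytes are invented for the illustration, the result includes the list that holds the instances, and absolute numbers depend on the CPython version.

```python
import tracemalloc

class PointDict:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

class PointSlots:
    __slots__ = 'x', 'y', 'z'

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

def allocated_bytes(cls, count):
    """Bytes allocated while building a list of `count` instances of `cls`."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    objects = [cls(1, 2, 3) for _ in range(count)]   # kept alive until measured
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

regular_bytes = allocated_bytes(PointDict, 100_000)
slots_bytes = allocated_bytes(PointSlots, 100_000)
print(f"regular: {regular_bytes / 1e6:.1f} MB, slots: {slots_bytes / 1e6:.1f} MB")
```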
Currently, this is the main method of substantially reducing the memory footprint of an instance of a class in RAM.
This reduction is achieved because the object references (the attribute values) are stored in memory directly after the object header, and access to them is carried out using special descriptors that live in the class dictionary:
>>> pprint(Point.__dict__)
mappingproxy(
....................................
'x': <member 'x' of 'Point' objects>,
'y': <member 'y' of 'Point' objects>,
'z': <member 'z' of 'Point' objects>})
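To make the role of these descriptors concrete, the sketch below fetches one from the class dictionary and calls it by hand. It relies only on the standard descriptor protocol (__get__ and __set__); the type name member_descriptor is what CPython reports and may differ in other implementations.

```python
class Point:
    __slots__ = 'x', 'y', 'z'

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

ob = Point(1, 2, 3)
descriptor = Point.__dict__['x']       # the slot descriptor stored on the class
print(type(descriptor).__name__)       # member_descriptor in CPython
print(descriptor.__get__(ob, Point))   # same value as ob.x
descriptor.__set__(ob, 10)             # same effect as ob.x = 10
print(ob.x)
```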
To automate the process of creating a class with __slots__, there is a library [namedlist](https://pypi.org/project/namedlist). The namedlist.namedlist function creates a class with __slots__:
>>> Point = namedlist('Point', ('x', 'y', 'z'))
Another package, [attrs](https://pypi.org/project/attrs), allows you to automate the process of creating classes both with and without __slots__.
Tuple
Python also has a built-in type tuple for representing immutable data structures. A tuple is a fixed structure or record, but without field names. Fields are accessed by index. The tuple fields are bound to their value objects once and for all when the tuple instance is created:
>>> ob = (1,2,3)
>>> x = ob[0]
>>> ob[1] = y # ERROR
Instances of tuple are quite compact:
>>> print(sys.getsizeof(ob))
72
They occupy 8 bytes more in memory than instances of classes with __slots__, since the tuple footprint in memory also stores the number of fields (ob_size):
Field | Size (bytes) |
---|---|
PyGC_Head | 24 |
PyObject_HEAD | 16 |
ob_size | 8 |
[0] | 8 |
[1] | 8 |
[2] | 8 |
TOTAL: | 72 |
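The layout implies that every additional field costs exactly one pointer. A short check of that arithmetic is sketched below, using struct.calcsize('P') for the pointer size; the absolute sizes printed depend on the CPython version, but the per-field increment should match.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize('P')   # 8 on a typical 64-bit build

empty = sys.getsizeof(())
for n in range(1, 5):
    ob = tuple(range(n))
    # measured size next to the size predicted from the empty tuple
    print(n, sys.getsizeof(ob), empty + n * POINTER_SIZE)
```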
Namedtuple
Since the tuple is used very widely, a request eventually arose to be able to access the fields by name as well. The answer to this request was collections.namedtuple.
The namedtuple function is designed to automate the process of generating such classes:
>>> Point = namedtuple('Point', ('x', 'y', 'z'))
It creates a subclass of tuple, in which descriptors are defined for accessing fields by name. For our example, it would look something like this:
class Point(tuple):
    __slots__ = ()   # no per-instance __dict__, as in the real namedtuple
    #
    @property
    def x(self):
        return self[0]
    @property
    def y(self):
        return self[1]
    @property
    def z(self):
        return self[2]
    #
    def __new__(cls, x, y, z):
        return tuple.__new__(cls, (x, y, z))
All instances of such classes have a memory footprint identical to that of a tuple. A large number of instances therefore leaves a slightly larger memory footprint than instances of a class with __slots__:
Number of instances | Size |
---|---|
1 000 000 | 72 MB |
10 000 000 | 720 MB |
100 000 000 | 7.2 GB |
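The statement that namedtuple instances have the same footprint as plain tuples can be checked directly. The sketch below compares shallow sizes with sys.getsizeof and also shows both access styles; it assumes CPython, where the generated class defines an empty __slots__.

```python
import sys
from collections import namedtuple

Point = namedtuple('Point', ('x', 'y', 'z'))

ob = Point(1, 2, 3)
print(ob.x, ob[0])                                     # access by name or by index
print(sys.getsizeof(ob) == sys.getsizeof((1, 2, 3)))   # same shallow size as a plain tuple
print(Point.__slots__)                                 # empty, so no per-instance __dict__
```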
Recordclass: mutable namedtuple without cyclic GC
Since tuple and, accordingly, namedtuple classes generate immutable objects, in the sense that the attribute ob.x can no longer be associated with another value object, a request for a mutable namedtuple variant has arisen. Since Python has no built-in type that is identical to tuple but supports assignment, many alternatives have been created. We will focus on [recordclass](https://pypi.org/project/recordclass), which received a rating on [stackoverflow](https://stackoverflow.com/questions/29290359/existence-of-mutable-named-tuple-in-python/29419745). In addition, it can be used to reduce the size of objects in RAM compared to the size of tuple-like objects.
The recordclass package introduces the type recordclass.mutabletuple, which is almost identical to tuple but also supports assignment. On its basis, subclasses are created that are almost completely identical to namedtuples, but also support assigning new values to fields (without creating new instances). The recordclass function, like the namedtuple function, allows you to automate the creation of these classes:
>>> Point = recordclass('Point', ('x', 'y', 'z'))
>>> ob = Point(1, 2, 3)
Class instances have the same structure as tuple, but without PyGC_Head:
Field | Size (bytes) |
---|---|
PyObject_HEAD | 16 |
ob_size | 8 |
x | 8 |
y | 8 |
z | 8 |
TOTAL: | 48 |
By default, the recordclass function creates a class that does not participate in the cyclic garbage collection mechanism. Typically, namedtuple and recordclass are used to generate classes representing records or simple (non-recursive) data structures. Using them correctly in Python does not generate circular references. For this reason, the memory layout of instances of classes generated by recordclass excludes, by default, the PyGC_Head fragment, which is necessary for classes supporting the cyclic garbage collection mechanism (more precisely: in the PyTypeObject structure corresponding to the created class, the flag Py_TPFLAGS_HAVE_GC is by default not set in the flags field).
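For ordinary classes, the presence of this flag can be inspected from pure Python. The sketch below is CPython-specific and rests on two assumptions stated here: that the type attribute __flags__ exposes the flags field of PyTypeObject, and that Py_TPFLAGS_HAVE_GC has the value 1 << 14, as defined in CPython's object.h. The helper supports_cyclic_gc is a name made up for this illustration, and the example uses a regular class with __slots__, not recordclass itself.

```python
import gc

PY_TPFLAGS_HAVE_GC = 1 << 14   # value of Py_TPFLAGS_HAVE_GC in CPython's object.h

def supports_cyclic_gc(cls):
    """True if the type's flags include Py_TPFLAGS_HAVE_GC (CPython-specific)."""
    return bool(cls.__flags__ & PY_TPFLAGS_HAVE_GC)

class Point:
    __slots__ = 'x', 'y', 'z'

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

ob = Point(1, 2, 3)
print(supports_cyclic_gc(Point), gc.is_tracked(ob))   # class defined with a class statement
print(supports_cyclic_gc(int), gc.is_tracked(1))      # atomic built-in type
```

A class defined with a plain class statement and __slots__ still carries the flag, which is why its instances pay for PyGC_Head.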
The memory footprint of a large number of instances is smaller than that of instances of a class with __slots__:
Number of instances | Size |
---|---|
1 000 000 | 48 MB |
10 000 000 | 480 MB |
100 000 000 | 4.8 GB |
Dataobject
Another solution proposed in the recordclass library is based on this idea: use the same storage structure in memory as in class instances with __slots__, but do not participate in the cyclic garbage collection mechanism. Such classes are generated using the recordclass.make_dataclass function:
>>> Point = make_dataclass('Point', ('x', 'y', 'z'))
The class created in this way, by default, creates mutable instances.
Another way is to use a class declaration that inherits from recordclass.dataobject:
class Point(dataobject):
    x:int
    y:int
    z:int
Classes created in this way will create instances that do not participate in the cyclic garbage collection mechanism. The structure of the instance in memory is the same as in the case with __slots__, but without the PyGC_Head:
Field | Size (bytes) |
---|---|
PyObject_HEAD | 16 |
x | 8 |
y | 8 |
z | 8 |
TOTAL: | 40 |
>>> ob = Point(1,2,3)
>>> print(sys.getsizeof(ob))
40
Fields are again accessed through special descriptors, located in the class dictionary, that reach each field by its offset from the beginning of the object:
mappingproxy({'__new__': <staticmethod at 0x7f203c4e6be0>,
.......................................
'x': <recordclass.dataobject.dataslotgetset at 0x7f203c55c690>,
'y': <recordclass.dataobject.dataslotgetset at 0x7f203c55c670>,
'z': <recordclass.dataobject.dataslotgetset at 0x7f203c55c410>})
The size of the memory footprint of a large number of instances is the minimum possible for CPython:
Number of instances | Size |
---|---|
1 000 000 | 40 MB |
10 000 000 | 400 MB |
100 000 000 | 4.0 GB |
Cython
There is one approach based on the use of [Cython](https://cython.org). Its advantage is that the fields can take on the values of the C language atomic types. Descriptors for accessing fields from pure Python are created automatically. For example:
cdef class Point:
    cdef public int x, y, z

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
In this case, the instances have an even smaller memory size:
>>> ob = Point(1,2,3)
>>> print(sys.getsizeof(ob))
32
The instance footprint in memory has the following structure:
Field | Size (bytes) |
---|---|
PyObject_HEAD | 16 |
x | 4 |
y | 4 |
z | 4 |
empty | 4 |
TOTAL: | 32 |
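The 4 empty bytes in this layout are padding. The sketch below reproduces the arithmetic in plain Python with ctypes; it does not use Cython and is not how Cython computes the layout. It assumes a 16-byte PyObject_HEAD, as in the table, and an 8-byte alignment imposed by the pointers in the header. The names CPoint, HEADER_SIZE, ALIGNMENT and round_up are invented for the illustration.

```python
import ctypes

class CPoint(ctypes.Structure):
    """Three 32-bit C integers, mirroring the fields of the cdef class."""
    _fields_ = [('x', ctypes.c_int32),
                ('y', ctypes.c_int32),
                ('z', ctypes.c_int32)]

HEADER_SIZE = 16   # PyObject_HEAD on a 64-bit build, taken from the table
ALIGNMENT = 8      # assumed alignment of a struct that contains pointers

def round_up(size, alignment):
    """Round `size` up to the next multiple of `alignment`."""
    return (size + alignment - 1) // alignment * alignment

fields_size = ctypes.sizeof(CPoint)   # 3 fields of 4 bytes each
print(fields_size, round_up(HEADER_SIZE + fields_size, ALIGNMENT))
```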
The footprint of a large number of instances is smaller:
Number of instances | Size |
---|---|
1 000 000 | 32 MB |
10 000 000 | 320 MB |
100 000 000 | 3.2 GB |
However, it should be remembered that when accessing the fields from Python code, a conversion from a C int to a Python object and vice versa will be performed every time.
Numpy
Using multidimensional arrays or arrays of records for a large amount of data gives a gain in memory. However, for efficient processing in pure Python, you should use processing methods that rely on functions from the numpy package.
>>> Point = numpy.dtype([('x', numpy.int32), ('y', numpy.int32), ('z', numpy.int32)])
An array of N elements, initialized with zeros, is created using the function:
>>> points = numpy.zeros(N, dtype=Point)
The size of the array in memory is the minimum possible:
Number of objects | Size |
---|---|
1 000 000 | 12 MB |
10 000 000 | 120 MB |
100 000 000 | 1.20 GB |
Normal access to array elements and rows will require conversion from a Python object to a C int value and vice versa. Extracting a single row results in the creation of an array containing a single element. Its footprint will not be so compact anymore:
>>> sys.getsizeof(points[0])
68
Therefore, as noted above, in Python code, it is necessary to process arrays using functions from the numpy package.
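As an illustration of what processing with numpy functions can look like, the sketch below fills and reduces a whole column of the record array at once instead of touching individual rows. It assumes numpy is installed; the array length of 1000 is an arbitrary choice for the example.

```python
import numpy

Point = numpy.dtype([('x', numpy.int32), ('y', numpy.int32), ('z', numpy.int32)])

N = 1000
points = numpy.zeros(N, dtype=Point)
points['x'] = numpy.arange(N)          # fill a whole column at once

print(Point.itemsize, points.nbytes)   # bytes per record, bytes in the data buffer
print(int(points['x'].sum()))          # column-wise reduction without per-row Python objects
```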
Conclusion
Using a clear and simple example, we have verified that the community of Python (CPython) developers and users has real options for significantly reducing the amount of memory used by objects.