Goal:
Understand the CPython source code for list and how it allocates memory.
1 Entry points
https://github.com/python/cpython/blob/master/Include/listobject.h
https://github.com/python/cpython/blob/master/Objects/listobject.c
The struct is defined as follows:
#ifndef Py_LIMITED_API
typedef struct {
    PyObject_VAR_HEAD
    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     * list.sort() temporarily sets allocated to -1 to detect mutations.
     *
     * Items must normally not be NULL, except during construction when
     * the list is not yet visible outside the function that builds it.
     */
    Py_ssize_t allocated;
} PyListObject;
#endif
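Analysis:
1.1) The list struct tracks two sizes, allocated and ob_size.
allocated: the number of element slots (pointers) that ob_item currently has room for
ob_size: the number of slots actually in use, i.e. len(list)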
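The difference between the two fields can be observed from Python. The sketch below is CPython-specific and rests on an assumption: sys.getsizeof for a list reflects the size of the reserved slot array (allocated), while len reports ob_size. The helper name observe_appends is made up for this example, and the exact byte counts depend on the interpreter version and platform, so none are shown.

```python
import sys


def observe_appends(n):
    """Append n items one by one, recording (len, getsizeof) after each step."""
    items = []
    samples = [(len(items), sys.getsizeof(items))]
    for i in range(n):
        items.append(i)
        samples.append((len(items), sys.getsizeof(items)))
    return samples


samples = observe_appends(64)
lengths = [length for length, _ in samples]
sizes = [size for _, size in samples]

# len() grows on every append; the reported size only jumps occasionally,
# which is the over-allocation showing through.
for length, size in samples[:10]:
    print(length, size)
```

The length changes on every append, but the reported size stays flat between reallocations, because spare slots are already reserved.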
1.2) Creating a list
Lists are created with PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);
The caller passes the initial number of elements.
The code is as follows:
PyObject *
PyList_New(Py_ssize_t size)
{
    PyListObject *op;
#ifdef SHOW_ALLOC_COUNT
    static int initialized = 0;
    if (!initialized) {
        Py_AtExit(show_alloc);
        initialized = 1;
    }
#endif

    if (size < 0) {
        PyErr_BadInternalCall();
        return NULL;
    }
    if (numfree) {
        numfree--;
        op = free_list[numfree];
        _Py_NewReference((PyObject *)op);
#ifdef SHOW_ALLOC_COUNT
        count_reuse++;
#endif
    } else {
        op = PyObject_GC_New(PyListObject, &PyList_Type);
        if (op == NULL)
            return NULL;
#ifdef SHOW_ALLOC_COUNT
        count_alloc++;
#endif
    }
    if (size <= 0)
        op->ob_item = NULL;
    else {
        op->ob_item = (PyObject **) PyMem_Calloc(size, sizeof(PyObject *));
        if (op->ob_item == NULL) {
            Py_DECREF(op);
            return PyErr_NoMemory();
        }
    }
    Py_SIZE(op) = size;
    op->allocated = size;
    _PyObject_GC_TRACK(op);
    return (PyObject *) op;
}
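Analysis:
1.2.1) Creating a new list works as follows:
Reject a negative size with PyErr_BadInternalCall.
Reuse a list object from free_list if one is available; otherwise allocate a new one with PyObject_GC_New.
If size > 0, allocate a zero-filled array of size pointers with PyMem_Calloc (analogous to calloc in C); on failure, release the object and return PyErr_NoMemory().
Set both ob_size and allocated to the requested size, so a freshly created list has no spare capacity.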
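The steps of PyList_New can be mirrored in a few lines of Python. This is only a toy model for following the control flow: the class ListModel is hypothetical, it ignores free_list and garbage-collector tracking, and a Python list of None stands in for the zero-filled pointer array.

```python
class ListModel:
    """Toy stand-in for PyListObject; not the real C struct."""

    def __init__(self, size):
        if size < 0:
            # PyErr_BadInternalCall() sets a SystemError in C.
            raise SystemError("bad argument to internal function")
        # PyMem_Calloc zero-fills the array; None plays the role of NULL here.
        self.ob_item = None if size == 0 else [None] * size
        self.ob_size = size
        self.allocated = size  # no spare capacity right after creation


empty = ListModel(0)
three = ListModel(3)
print(three.ob_size, three.allocated, three.ob_item)
```

Note that ob_size and allocated are equal after creation; over-allocation only begins when the list is resized later.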
1.3 Appending an element to a list
This is done by PyList_Append(PyObject *op, PyObject *newitem).
The code is as follows:
int
PyList_Append(PyObject *op, PyObject *newitem)
{
    if (PyList_Check(op) && (newitem != NULL))
        return app1((PyListObject *)op, newitem);
    PyErr_BadInternalCall();
    return -1;
}
Analysis:
1.3.1) After checking that op is a list and newitem is not NULL, it calls app1(PyListObject *self, PyObject *v).
The code is as follows:
static int
app1(PyListObject *self, PyObject *v)
{
    Py_ssize_t n = PyList_GET_SIZE(self);

    assert (v != NULL);
    if (n == PY_SSIZE_T_MAX) {
        PyErr_SetString(PyExc_OverflowError,
            "cannot add more objects to list");
        return -1;
    }

    if (list_resize(self, n+1) < 0)
        return -1;

    Py_INCREF(v);
    PyList_SET_ITEM(self, n, v);
    return 0;
}
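Analysis:
app1 first refuses to grow a list whose length is already PY_SSIZE_T_MAX, then calls list_resize to make room for one more element.
A return value below 0 from list_resize means the allocation failed, and app1 returns -1;
otherwise the allocation succeeded, and the new element is stored at index n after its reference count is incremented.
1.3.2) The list_resize function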
static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
    PyObject **items;
    size_t new_allocated, num_allocated_bytes;
    Py_ssize_t allocated = self->allocated;

    /* Bypass realloc() when a previous overallocation is large enough
       to accommodate the newsize.  If the newsize falls lower than half
       the allocated size, then proceed with the realloc() to shrink the list.
    */
    if (allocated >= newsize && newsize >= (allocated >> 1)) {
        assert(self->ob_item != NULL || newsize == 0);
        Py_SIZE(self) = newsize;
        return 0;
    }

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     * Note: new_allocated won't overflow because the largest possible value
     *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
     */
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
    if (new_allocated > (size_t)PY_SSIZE_T_MAX / sizeof(PyObject *)) {
        PyErr_NoMemory();
        return -1;
    }

    if (newsize == 0)
        new_allocated = 0;
    num_allocated_bytes = new_allocated * sizeof(PyObject *);
    items = (PyObject **)PyMem_Realloc(self->ob_item, num_allocated_bytes);
    if (items == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    self->ob_item = items;
    Py_SIZE(self) = newsize;
    self->allocated = new_allocated;
    return 0;
}
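Analysis:
1.3.2.1)
The logic is as follows:
list_resize decides whether the element array has to be reallocated.
Note: newsize is the length after the operation, and allocated is the number of slots currently reserved for the list.
If allocated/2 <= newsize <= allocated (allocated >> 1 is a right shift by one bit, i.e. halving, not a square root), the existing buffer is large enough without being wastefully large, so only ob_size is updated and 0 is returned.
Otherwise the new capacity is:
new_allocated = newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
that is, the requested size plus roughly one eighth extra and a small constant. The purpose of this over-allocation is to keep a long run of appends linear in total time, not to limit memory use.
A separate check then fails with PyErr_NoMemory if new_allocated exceeds PY_SSIZE_T_MAX / sizeof(PyObject *), and if newsize is 0, new_allocated is reset to 0.
Then
num_allocated_bytes = new_allocated * sizeof(PyObject *);
gives the number of bytes to request, and PyMem_Realloc performs the actual allocation.
Finally, ob_size is set to newsize and allocated is set to new_allocated.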
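To check the resize rule against the growth pattern quoted in the source comment, the arithmetic can be replayed in Python. This is a sketch of the formula as it appears in the listing above, not of the interpreter currently running (newer CPython releases may use a different rule); the function names are made up for this example.

```python
def new_allocated(newsize):
    """Capacity chosen by list_resize when it has to reallocate."""
    if newsize == 0:
        return 0
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)


def resize(allocated, newsize):
    """Return the capacity after list_resize(self, newsize)."""
    # Fast path: the buffer is big enough and at least half full.
    if allocated >= newsize and newsize >= (allocated >> 1):
        return allocated
    return new_allocated(newsize)


def growth_pattern(n_appends):
    """Capacities seen while appending n_appends items to an empty list."""
    allocated = 0
    pattern = [allocated]
    for newsize in range(1, n_appends + 1):
        updated = resize(allocated, newsize)
        if updated != allocated:
            pattern.append(updated)
            allocated = updated
    return pattern


print(growth_pattern(73))
# → [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]
```

The same function also shows the shrinking case: with 88 slots reserved, resize(88, 44) keeps the buffer because 44 is exactly half, while resize(88, 43) falls below half and reallocates down to 54 slots.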
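2 Summary of list
In CPython a list is a struct whose members include ob_size and allocated, alongside the pointer array ob_item.
ob_size is the number of elements actually stored, and allocated is the number of element slots reserved.
list is a dynamic array. On every append it checks whether the reserved slots can hold the new element; if they cannot, it reallocates with the following capacity:
new_allocated = size + (size >> 3) + (size < 9 ? 3 : 6)
and then stores the new element at the next free index.
The reason for this rule:
Growing by about one eighth each time keeps reallocations infrequent, so a long sequence of appends takes linear time in total.
The source comment also notes that the largest possible value,
PY_SSIZE_T_MAX * (9 / 8) + 6
always fits in a size_t, so the computation itself cannot overflow.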
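This differs from the strategy typically used by C++ std::vector, which multiplies the capacity by a constant factor when it is full (2 in some standard library implementations, 1.5 in others; the standard does not fix the factor).
The principle is the same, though: capacity grows in proportion to the current size, so reallocations become rarer as the container grows.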
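To make the comparison concrete, the sketch below counts how many reallocations each strategy needs for the same number of appends. Both rules are simplified models: cpython_rule is the formula from list_resize quoted in these notes, and doubling_rule is a hypothetical "double when full" policy, not the code of any particular C++ standard library.

```python
def cpython_rule(newsize):
    """Capacity from the list_resize formula quoted in these notes."""
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)


def doubling_rule(newsize):
    """Hypothetical policy: double the old capacity (newsize - 1) when full."""
    return max(1, 2 * (newsize - 1))


def count_reallocs(n_appends, grow):
    """Count reallocations while appending n_appends items to an empty array."""
    allocated = 0
    reallocs = 0
    for newsize in range(1, n_appends + 1):
        if newsize > allocated:  # buffer is full, must grow
            allocated = grow(newsize)
            reallocs += 1
    return reallocs


n = 10_000
cpython_count = count_reallocs(n, cpython_rule)
doubling_count = count_reallocs(n, doubling_rule)
print(cpython_count, doubling_count)
```

Doubling reallocates less often but can leave up to half of the buffer unused; the milder rule reallocates more often and wastes less memory. In both cases the number of reallocations is far smaller than the number of appends.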
Note that because the array is resized by repeatedly requesting new memory, the element buffer lives on the heap, not on the stack.
References:
https://blog.csdn.net/lucky404/article/details/79596319
https://github.com/python/cpython/blob/master/Include/listobject.h
https://github.com/python/cpython/blob/master/Objects/listobject.c