Python 3.8.2 / Built-in data structures / list (similar to vector in the STL)

Ruo_Xiao

Article link: https://blog.csdn.net/itworld123/article/details/104966308

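I. Features

(1) Unlike a tuple, a list is mutable: elements can be added, removed, or replaced in place after the list is created. Mutability is a property of the list itself, not of the objects it holds; a list can contain immutable objects such as strings.

(2) Elements can be accessed by index.

(3) A list can hold any Python object, for example list, tuple, dict, set, strings, and integers, and these can be mixed freely.

(4) In a literal, the elements are enclosed in square brackets "[ ]".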
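
The points above can be checked with a few lines. This is only a minimal sketch; the variable names (mixed, frozen, and so on) are illustrative and do not come from the article.

```python
# A list may mix arbitrary objects, including other containers.
mixed = [[1, 2], (3, 4), {'k': 'v'}, {5, 6}, 'text', 7]

# Elements are read by index; negative indexes count from the end.
first = mixed[0]
last = mixed[-1]

# A list is mutable: a slot can be rebound in place.
mixed[4] = 'changed'

# A tuple is not: item assignment raises TypeError.
frozen = (1, 2, 3)
try:
    frozen[0] = 99
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False
```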

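II. Operations

1. Add

a. append() adds an element to the end of the list.

b. insert() inserts an element at the given index.

2. Delete

a. pop() removes and returns the last element, the counterpart of append(). Given an index, as in pop(i), it removes and returns the element at that index instead.

b. remove() deletes the first element equal to the given value and raises ValueError if there is none.

c. del deletes the element at the given index, or a whole slice.

3. Update

Assign to the indexed element with "=".

4. Look up

Index into the list as you would an array.

Example: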

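if __name__ == '__main__':
    testlist = ['one', 'two', 'six']

    # Add
    # append adds an element to the end of the list.
    testlist.append('three')      # ['one', 'two', 'six', 'three']

    # insert inserts an element at the given index.
    testlist.insert(1, 'four')    # ['one', 'four', 'two', 'six', 'three']

    # Delete
    # pop removes and returns the last element.
    t = testlist.pop()            # t == 'three'

    # remove deletes the first element equal to the given value.
    testlist.remove('two')        # ['one', 'four', 'six']

    # del deletes the elements in the given slice.
    del testlist[0:2]             # ['six']

    # Update
    testlist[0] = 'hello world!'  # ['hello world!']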
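
A few edge cases of the deletion operations are easy to trip over, so here is a short supplementary sketch. The list contents and variable names are made up for illustration.

```python
letters = ['a', 'b', 'c', 'b']

# pop(i) removes and returns the element at index i instead of the last one.
popped = letters.pop(0)          # letters is now ['b', 'c', 'b']

# remove() deletes only the first matching element.
letters.remove('b')              # letters is now ['c', 'b']

# remove() raises ValueError when the value is absent.
try:
    letters.remove('z')
    missing_raises = False
except ValueError:
    missing_raises = True

# del with a single index deletes exactly one element.
del letters[0]                   # letters is now ['b']
```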

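III. Implementation

1. Under the hood, a list is a dynamic array of pointers to PyObject. The ob_item field has type PyObject **, that is, a pointer to an array of pointers, not a two-dimensional array. In Python everything is an object: in CPython's C code, the structs for int, str, list, and so on all begin with the PyObject header. A list therefore stores only pointers to its elements, which is cheaper than storing or copying the objects themselves.

2. The struct that describes a list occupies 40 bytes on a 64-bit build. (sys.getsizeof([]) reports 56 bytes on 64-bit CPython 3.8, because it also counts the 16-byte garbage-collector header that precedes the struct.) The struct is defined as follows: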

typedef struct {
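    PyObject_VAR_HEAD      // Common header: reference count, type pointer, and ob_size (the number of slots in use).
    PyObject **ob_item;    // Pointer to the array of object pointers.
    Py_ssize_t allocated;  // Total number of slots allocated in advance, i.e. the capacity of the ob_item array.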
} PyListObject;
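
That a list stores pointers rather than the objects themselves can be observed from Python. The sketch below relies on sys.getsizeof, whose exact numbers are specific to the CPython build, so it only compares sizes and prints no absolute values. The names small, big, and holder are illustrative.

```python
import sys

small = 0
big = 'x' * 10_000

# Both lists hold exactly one pointer, so they occupy the same space,
# even though the objects they point to differ greatly in size.
size_with_small = sys.getsizeof([small])
size_with_big = sys.getsizeof([big])

# The list stores a reference to the object, not a copy of it.
holder = [big]
same_object = holder[0] is big
```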

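3. When append is called on a list that has no spare capacity (an empty list, for example), the list allocates more slots than it immediately needs. Most later appends then find a free slot and need no reallocation, which makes append amortized O(1). The new capacity is computed as:

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

The overall effect is that the number of extra slots added per reallocation grows with the size of the list, by roughly one eighth of the new size.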
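
To see what the formula produces, it can be transcribed into Python and driven by a simulated run of appends. This is a sketch of the 3.8 formula quoted above, not CPython code; the function names are hypothetical.

```python
def new_capacity(newsize):
    """Python transcription of the over-allocation formula quoted above."""
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)


def capacities_while_appending(count):
    """Capacities seen while appending `count` items to an empty list."""
    allocated = 0
    seen = [allocated]
    for newsize in range(1, count + 1):
        if newsize > allocated:          # no free slot left: grow
            allocated = new_capacity(newsize)
            seen.append(allocated)
    return seen


print(capacities_while_appending(80))
# [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]
```

The printed sequence matches the growth pattern given in the comment inside list_resize(): 80 appends trigger only 9 reallocations.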

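4. How append() is implemented

(1) Outline

a. Check whether the list's slot array (ob_item) needs to be reallocated. If it does, obtain a larger slot array, copy the pointers from the old array into it, and free the old array. (realloc may also be able to extend the array in place.)
b. Store the pointer to the new object at the end of ob_item.

(2) append() is a thin wrapper around app1().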

/**
 * Append an object to the end of the list.
*/
static PyObject *list_append(PyListObject *self, PyObject *object)
{
    if (app1(self, object) == 0)
        Py_RETURN_NONE;
    return NULL;
}

(3) app1() does the following:

static int app1(PyListObject *self, PyObject *v)
{
    /**
     * Number of objects currently in the list.
    */
    Py_ssize_t n = PyList_GET_SIZE(self);

    assert(v != NULL);
    if (n == PY_SSIZE_T_MAX)
    {
        PyErr_SetString(PyExc_OverflowError,
                        "cannot add more objects to list");
        return -1;
    }
    /**
     * Make room for one more element.
     * list_resize() reallocates the slot array only when there is no spare slot.
    */
    if (list_resize(self, n + 1) < 0)
        return -1;
    /**
     * Increment the reference count of v.
    */
    Py_INCREF(v);

    /**
     * Store the object pointer in slot n of the list's ob_item array.
    */
    PyList_SET_ITEM(self, n, v);
    return 0;
}
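
The Py_INCREF(v) call in app1() is visible from Python through sys.getrefcount. The sketch below assumes CPython, where reference counts are exposed this way; the function name is hypothetical.

```python
import sys


def refcount_delta_after_append():
    """Return how much an object's reference count changes when it is appended."""
    item = object()
    container = []
    before = sys.getrefcount(item)
    container.append(item)      # app1() stores the pointer and calls Py_INCREF
    after = sys.getrefcount(item)
    return after - before


print(refcount_delta_after_append())
```

The list holds one new reference to the object, so the count rises by one. The object itself is not copied.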

(4) The heart of the matter is list_resize(), which performs the reallocation.

static int list_resize(PyListObject *self, Py_ssize_t newsize)
{
    PyObject **items;
    size_t new_allocated, num_allocated_bytes;
    Py_ssize_t allocated = self->allocated;

    /* Bypass realloc() when a previous overallocation is large enough
       to accommodate the newsize.  If the newsize falls lower than half
       the allocated size, then proceed with the realloc() to shrink the list.
    */
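    /**
     * (1) If newsize <  half of the allocated slots, reallocate the slot array (shrink).
     * (2) If newsize >= half of the allocated slots and newsize <= the allocated slots, only ob_size is updated.
     * (3) If newsize >  the allocated slots, reallocate the slot array (grow).
     * (Note: after a reallocation the ob_item array may start at a new address; the list object itself does not move.)
    */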
    if (allocated >= newsize && newsize >= (allocated >> 1))
    {
        assert(self->ob_item != NULL || newsize == 0);
        Py_SIZE(self) = newsize;
        return 0;
    }

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     * Note: new_allocated won't overflow because the largest possible value
     *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
     */
    /**
     * Over-allocation policy: compute the new number of slots.
    */
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
    if (new_allocated > (size_t)PY_SSIZE_T_MAX / sizeof(PyObject *))
    {
        PyErr_NoMemory();
        return -1;
    }

    if (newsize == 0)
        new_allocated = 0;
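    /**
     * Total size in bytes of the new slot array.
    */
    num_allocated_bytes = new_allocated * sizeof(PyObject *);
    /**
     * PyMem_Realloc() resizes the slot array. When it cannot resize in place, it
     * (1) allocates a new slot array;
     * (2) copies the contents of the old slot array into the new one;
     * (3) frees the old slot array.
     * See PyMem_Realloc() in obmalloc.c.
    */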
    items = (PyObject **)PyMem_Realloc(self->ob_item, num_allocated_bytes);
    if (items == NULL)
    {
        PyErr_NoMemory();
        return -1;
    }
    /**
     * Record the new state in the list object.
    */
    self->ob_item = items;
    Py_SIZE(self) = newsize;
    self->allocated = new_allocated;
    return 0;
}
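
The three cases handled by list_resize() can be condensed into a small Python model. This is only a sketch of the branch logic: resize_plan is a hypothetical helper, and it leaves out the overflow check and the actual memory management.

```python
def resize_plan(allocated, newsize):
    """Return the capacity list_resize() would request, or None when no
    reallocation is needed. Models only the branch logic."""
    if allocated >= newsize and newsize >= (allocated >> 1):
        return None                      # enough room: only the size changes
    if newsize == 0:
        return 0                         # emptied list: release every slot
    return newsize + (newsize >> 3) + (3 if newsize < 9 else 6)


print(resize_plan(8, 6))    # None: 6 fits in the 8 existing slots
print(resize_plan(8, 9))    # 16: grow
print(resize_plan(16, 7))   # 10: shrink, since 7 is less than half of 16
```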

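5. How insert() is implemented

(1) Outline

a. Check whether the list's slot array (ob_item) needs to be reallocated. If it does, obtain a larger slot array, copy the pointers from the old array into it, and free the old array.
b. Shift the element at index where and every element after it back by one slot.
c. Store the new item at index where.

(2) insert() ends up in ins1(). PyList_Insert(), the C API entry point shown below, is a thin wrapper around it.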

int PyList_Insert(PyObject *op, Py_ssize_t where, PyObject *newitem)
{
    if (!PyList_Check(op))
    {
        PyErr_BadInternalCall();
        return -1;
    }
    return ins1((PyListObject *)op, where, newitem);
}

(3) ins1() source

static int ins1(PyListObject *self, Py_ssize_t where, PyObject *v)
{
    /**
     * Number of slots in use, i.e. the number of elements already in the list.
    */
    Py_ssize_t i, n = Py_SIZE(self);
    PyObject **items;
    if (v == NULL)
    {
        PyErr_BadInternalCall();
        return -1;
    }
    if (n == PY_SSIZE_T_MAX)
    {
        PyErr_SetString(PyExc_OverflowError,
                        "cannot add more objects to list");
        return -1;
    }
    /**
     * Resize the slot array if needed.
    */
    if (list_resize(self, n + 1) < 0)
        return -1;

    if (where < 0)
    {
        where += n;
        if (where < 0)
            where = 0;
    }
    if (where > n)
        where = n;
    items = self->ob_item;
    /**
     * Shift the elements at index where and after back by one slot.
    */
    for (i = n; --i >= where;)
        items[i + 1] = items[i];
    Py_INCREF(v);
    items[where] = v;
    return 0;
}
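
The index clamping and the shifting loop in ins1() can be mirrored step by step in Python. This is a sketch for illustration: ins1_model is a hypothetical name, and items.append(None) merely stands in for list_resize(self, n + 1).

```python
def ins1_model(items, where, value):
    """Insert value into items in place, following the steps of ins1()."""
    n = len(items)
    items.append(None)          # stand-in for list_resize(self, n + 1)
    if where < 0:
        where += n              # negative indexes count from the end
        if where < 0:
            where = 0
    if where > n:
        where = n               # past the end behaves like append
    i = n - 1
    while i >= where:           # shift the tail back by one slot
        items[i + 1] = items[i]
        i -= 1
    items[where] = value


data = ['a', 'b', 'c']
ins1_model(data, 1, 'X')
print(data)
# ['a', 'X', 'b', 'c']
```

Because the loop moves every element from index where onward, inserting near the front of a long list costs O(n), whereas append is amortized O(1).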