Data Structures and Algorithms (4): Sequential Storage in C#, Arrays and List

JTWEI

Last modified 2023-08-09 15:03:48

Column: Data Structures and Algorithms (C#). Tags: algorithms, data structures, c#

First published 2023-08-09 15:01:28

Article link: https://blog.csdn.net/qq_37816847/article/details/131871466

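Sequential storage is a physical storage scheme in which data elements are stored one after another, in their logical order, in a single contiguous block of memory. Common examples are arrays and sequential lists (linear lists stored in contiguous memory).

Arrays

An array is the most common sequential structure. It stores elements of the same type contiguously in memory, and any element can be located and accessed directly by its index. Inserting or deleting an element, however, may require moving the elements after it, so these operations are comparatively slow.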
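To make the cost of insertion concrete, here is a minimal sketch that inserts into a partially filled array by shifting the tail one slot to the right with Array.Copy. The helper `ArrayInsertDemo.InsertAt` and its `count` parameter are hypothetical, chosen only for illustration; it returns how many elements had to move.

```csharp
using System;

public static class ArrayInsertDemo
{
    // Inserts `value` at `index` into an array whose first `count` slots are in use.
    // The array must have at least one spare slot. Returns the number of moved elements.
    public static int InsertAt(int[] items, int count, int index, int value)
    {
        if (count >= items.Length)
            throw new InvalidOperationException("Array is full; a larger array is needed.");
        if ((uint)index > (uint)count)
            throw new ArgumentOutOfRangeException(nameof(index));

        int moved = count - index;
        // Shift items[index .. count-1] one slot to the right.
        // Array.Copy handles overlapping source and destination ranges correctly.
        Array.Copy(items, index, items, index + 1, moved);
        items[index] = value;
        return moved;
    }
}
```

Inserting at index 0 of an array holding n elements moves all n of them, while inserting right after the last element moves none, which is where the O(n) worst case comes from.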

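Arrays, as a common data structure, have clear advantages and disadvantages.

Advantages:

Fast random access: because elements are stored contiguously, the address of any element can be computed from its index, so access by index takes O(1) time.
High storage density: an array stores only the elements themselves, with no extra pointers or links. It therefore uses memory efficiently, which matters especially when there are many elements.
Multidimensional support: arrays can have more than one dimension. Multidimensional arrays can represent matrices, images, and similar structures, and their elements are easy to index and manipulate.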
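As a small illustration of multidimensional arrays, the sketch below builds a rectangular `int[,]` and flattens it. C# `foreach` visits a multidimensional array with the rightmost index changing fastest (row-major order), so element `[i, j]` lands at position `i * cols + j`; that same formula is how O(1) indexing extends to more than one dimension. The class and method names are illustrative only.

```csharp
using System.Collections.Generic;

public static class MultiDimDemo
{
    // Builds a rows x cols matrix where m[i, j] = i * 10 + j.
    public static int[,] Build(int rows, int cols)
    {
        var m = new int[rows, cols];
        for (int i = 0; i < m.GetLength(0); i++)      // dimension 0: rows
            for (int j = 0; j < m.GetLength(1); j++)  // dimension 1: columns
                m[i, j] = i * 10 + j;
        return m;
    }

    // Copies a rectangular array into a 1D list in foreach order,
    // i.e. row-major: m[i, j] ends up at index i * cols + j.
    public static List<int> Flatten(int[,] m)
    {
        var flat = new List<int>(m.Length); // m.Length is the total element count
        foreach (int x in m)
            flat.Add(x);
        return flat;
    }
}
```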

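Disadvantages:

Fixed size: an array's size is fixed when it is created and cannot change afterwards. If more elements must be stored than were allocated, a larger array has to be allocated and the elements copied into it, which costs extra time and memory.
Slow insertion and deletion: because elements are contiguous, inserting or deleting may require moving other elements. In the worst case, inserting or deleting at the beginning moves all the other elements, which is O(n), where n is the number of elements.
Wasted memory: because a fixed-size contiguous block is allocated up front, any capacity beyond the number of elements actually stored is wasted.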
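The "fixed size" point can be seen with Array.Resize: it does not enlarge the existing array in place but allocates a new one and copies the elements. This minimal sketch (the helper name `AppendWithResize` is hypothetical) grows an array by one slot; the caller's original array keeps its old length, and growing one slot at a time would cost an O(n) copy on every append.

```csharp
using System;

public static class ArrayGrowDemo
{
    // Appends a value to a full array by allocating a larger array and copying.
    public static int[] AppendWithResize(int[] items, int value)
    {
        int oldLength = items.Length;
        // Array.Resize creates a new array, copies the elements, and updates
        // only this local reference; the caller's array object is untouched.
        Array.Resize(ref items, oldLength + 1);
        items[oldLength] = value;
        return items;
    }
}
```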

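Dynamic array: List

In real-world development we often don't know in advance how many elements we will need, and the count changes over time. In that case we use the framework's List instead of an array: it grows automatically, so we don't have to size an array by hand. Some people even use it as if it were a linked list. How does it actually work? Let's examine how List<T> is implemented in C#.

List<T> source: https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs

List<T> API: https://learn.microsoft.com/zh-cn/dotnet/api/system.collections.generic.list-1?view=net-6.0

List overview

Looking at the source, List stores its elements in an internal array, _items. The default capacity, _defaultCapacity, is 4: this is the array size allocated on the first Add when no capacity was specified.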

        private const int _defaultCapacity = 4;

        private T[] _items;
        [ContractPublicPropertyName("Count")]
        private int _size;
        private int _version;
        [NonSerialized]
        private Object _syncRoot;
        
        static readonly T[]  _emptyArray = new T[0];

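List constructors

The List<T> class has several constructors that create a List object and initialize its fields. Let's look at each of them.

1. Default constructor

Constructs an empty List with a capacity of 0. When the first element is added, the capacity becomes _defaultCapacity (4) and then doubles as needed. Note that the source comment below still says 16; that figure is outdated, since the code uses _defaultCapacity = 4.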

// Constructs a List. The list is initially empty and has a capacity
// of zero. Upon adding the first element to the list the capacity is
// increased to 16, and then increased in multiples of two as required.

public List() {
    _items = _emptyArray;
}

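2. Capacity constructor

Constructs a List with the given initial capacity. The list starts empty but reserves room for that many elements before any reallocation is needed, which avoids repeated array reallocations.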

// Constructs a List with a given initial capacity. The list is
// initially empty, but will have room for the given number of elements
// before any reallocations are required.
public List(int capacity) {
    if (capacity < 0) ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.capacity, ExceptionResource.ArgumentOutOfRange_NeedNonNegNum);
    Contract.EndContractBlock();

    if (capacity == 0)
        _items = _emptyArray;
    else
        _items = new T[capacity];
}
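One way to see what the capacity constructor saves is to count how often the backing array is replaced while adding elements. This sketch detects a reallocation as a change of the public List<T>.Capacity property; the class and method names are illustrative assumptions.

```csharp
using System.Collections.Generic;

public static class CapacityDemo
{
    // Adds n items and counts how many times the backing array was replaced,
    // detected as a change of List<T>.Capacity.
    public static int CountReallocations(List<int> list, int n)
    {
        int reallocations = 0;
        int capacity = list.Capacity;
        for (int i = 0; i < n; i++)
        {
            list.Add(i);
            if (list.Capacity != capacity)
            {
                reallocations++;
                capacity = list.Capacity;
            }
        }
        return reallocations;
    }
}
```

With a default-constructed list, adding 1000 items passes through capacities 4, 8, ..., 1024 (9 reallocations under the doubling rule), whereas `new List<int>(1000)` needs none.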

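3. Collection constructor

The collection constructor accepts any object implementing IEnumerable<T>, such as an array, a List, or a HashSet, and copies its contents. If the collection also implements ICollection<T>, the array is allocated once and filled with CopyTo, so the new list's size and capacity both equal the collection's Count. Otherwise the elements are added one by one with Add, and the capacity grows as usual, starting from _defaultCapacity.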

        // Constructs a List, copying the contents of the given collection. The
        // size and capacity of the new list will both be equal to the size of the
        // given collection.
        public List(IEnumerable<T> collection) {
            if (collection==null)
                ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
            Contract.EndContractBlock();

            ICollection<T> c = collection as ICollection<T>;
            if( c != null) {
                int count = c.Count;
                if (count == 0)
                {
                    _items = _emptyArray;
                }
                else {
                    _items = new T[count];
                    c.CopyTo(_items, 0);
                    _size = count;
                }
            }    
            else {                
                _size = 0;
                _items = _emptyArray;
                // This enumerable could be empty.  Let Add allocate a new array, if needed.
                // Note it will also go to _defaultCapacity first, not 1, then 2, etc.
                
                using(IEnumerator<T> en = collection.GetEnumerator()) {
                    while(en.MoveNext()) {
                        Add(en.Current);                                    
                    }
                }
            }
        }

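The constructors make it clear that List is implemented with an array, not a linked list, and that when no capacity is given the initial capacity is 0. As elements are added, the capacity grows automatically by reallocating the internal array.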
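The two branches of the collection constructor can be observed through Capacity. In this sketch (names are illustrative), an `int[]` implements ICollection<int>, so the list is sized exactly; an iterator method built with `yield return` implements only IEnumerable<int>, so the constructor falls back to Add and the capacity follows the growth rule (typically 8 for 5 elements).

```csharp
using System.Collections.Generic;

public static class CollectionCtorDemo
{
    // A plain iterator: implements IEnumerable<int> but not ICollection<int>,
    // so the List constructor cannot ask it for a Count up front.
    public static IEnumerable<int> Numbers(int n)
    {
        for (int i = 0; i < n; i++)
            yield return i;
    }

    public static (int fromArray, int fromIterator) Capacities()
    {
        // int[] implements ICollection<int>: one allocation of exactly Count slots.
        var a = new List<int>(new[] { 1, 2, 3, 4, 5 });
        // Iterator: elements are added one by one, so the array grows 4 -> 8.
        var b = new List<int>(Numbers(5));
        return (a.Capacity, b.Capacity);
    }
}
```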

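Reading from a List

1. Indexer

List<T> implements an indexer, so elements can be read and written by index just like an array. Indexes start at 0 and give an element's position in the List; for example, list[0] reads the first element. Both the getter and the setter check the index against _size (the element count, not the array length), and the setter also increments _version.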

public T this[int index] {
    get {
        // Following trick can reduce the range check by one
        if ((uint) index >= (uint)_size) {
            ThrowHelper.ThrowArgumentOutOfRangeException();
        }
        Contract.EndContractBlock();
        return _items[index]; 
    }

    set {
        if ((uint) index >= (uint)_size) {
            ThrowHelper.ThrowArgumentOutOfRangeException();
        }
        Contract.EndContractBlock();
        _items[index] = value;
        _version++;
    }
}
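The comment "Following trick can reduce the range check by one" refers to the unsigned cast: a negative int reinterpreted as uint becomes a value of at least 2^31, which is larger than any valid size, so one unsigned comparison replaces `index < 0 || index >= size`. This sketch (illustrative names) puts both forms side by side.

```csharp
public static class RangeCheckDemo
{
    // Two-comparison bounds check.
    public static bool OutOfRangeSigned(int index, int size) =>
        index < 0 || index >= size;

    // Single-comparison check as used by List<T>'s indexer. unchecked()
    // makes the int -> uint reinterpretation explicit even if the project
    // enables overflow checking.
    public static bool OutOfRangeUnsigned(int index, int size) =>
        unchecked((uint)index >= (uint)size);
}
```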

2. Enumerator

List<T> implements IEnumerable<T>; its GetEnumerator method returns an enumerator used to iterate over the elements of the List.

// Returns an enumerator for this list with the given
// permission for removal of elements. If modifications made to the list 
// while an enumeration is in progress, the MoveNext and 
// GetObject methods of the enumerator will throw an exception.
public Enumerator GetEnumerator() {
    return new Enumerator(this);
}


[Serializable]
public struct Enumerator : IEnumerator<T>, System.Collections.IEnumerator
{
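    // Private fields
    private List<T> list;   // the List being enumerated
    private int index;      // position of the next element to read
    private int version;    // the List's version when enumeration started
    private T current;      // the current element

    // Constructor
    internal Enumerator(List<T> list) {
        this.list = list;
        index = 0;
        version = list._version;   // snapshot the List's version number
        current = default(T);
    }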

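    // Releases resources (IDisposable); there is nothing to release here
    public void Dispose() {
    }

    // Advances to the next element (IEnumerator)
    public bool MoveNext() {
        List<T> localList = list;

        // Fast path: the list is unmodified and elements remain
        if (version == localList._version && ((uint)index < (uint)localList._size))
        {
            current = localList._items[index];   // read the current element
            index++;   // advance to the next index
            return true;   // an element is available
        }
        return MoveNextRare();   // list modified or end reached: handled by MoveNextRare
    }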

    // Handles the slow cases: modification during enumeration, or end of list
    private bool MoveNextRare()
    {
        if (version != list._version) {
            ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
        }

        index = list._size + 1;   // mark the enumerator as past the end
        current = default(T);    // reset the current element
        return false;   // no more elements
    }

    // Gets the current element (generic IEnumerator<T>)
    public T Current {
        get {
            return current;
        }
    }

    // Gets the current element (non-generic IEnumerator)
    Object System.Collections.IEnumerator.Current {
        get {
            if( index == 0 || index == list._size + 1) {
                 ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumOpCantHappen);
            }
            return Current;
        }
    }

    // Resets the enumerator (non-generic IEnumerator)
    void System.Collections.IEnumerator.Reset() {
        if (version != list._version) {
            ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
        }

        index = 0;   // back to the starting position
        current = default(T);   // reset the current element
    }
}

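The code above defines List's enumerator, which is used to iterate over the elements of a List. Its logic is as follows:

1. Private fields hold the enumerator's state: the List being enumerated, the current index, the List's version number, and the current element.
2. The constructor initializes the enumerator from the List passed in and records the List's version at that moment.
3. Dispose is implemented to release resources; here it is empty.
4. MoveNext advances to the next element. If the List's version is unchanged and the index is still in range, it reads the element, advances the index, and returns true; otherwise it delegates to MoveNextRare.
5. MoveNextRare handles the remaining cases. If the List's version has changed, meaning the List was modified during enumeration, it throws an InvalidOperationException. Otherwise the end has been reached: it sets the index past the end, resets the current element to its default value, and returns false.
6. Current returns the current element.
7. The non-generic IEnumerator.Current also returns the current element (via Current), but throws if enumeration has not started or has already finished.
8. The non-generic Reset checks the version, then moves the index back to the start and resets the current element.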
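The version check in points 4 and 5 is easy to trigger. This sketch (illustrative names) adds to a List inside its own foreach and reports whether InvalidOperationException was thrown; it also shows one common safe alternative, looping by index from the end, which does not use the enumerator at all.

```csharp
using System;
using System.Collections.Generic;

public static class VersionCheckDemo
{
    // Returns true if modifying the list during foreach makes MoveNext throw.
    public static bool ModifyDuringForeachThrows()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int x in list)
            {
                if (x == 1)
                    list.Add(99); // increments _version; the next MoveNext() detects it
            }
        }
        catch (InvalidOperationException)
        {
            return true;
        }
        return false;
    }

    // Index-based loop: no enumerator, so no version check. Iterating backwards
    // means a removal never shifts an element we have not visited yet.
    public static List<int> RemoveEvensByIndex(List<int> list)
    {
        for (int i = list.Count - 1; i >= 0; i--)
            if (list[i] % 2 == 0)
                list.RemoveAt(i);
        return list;
    }
}
```

For bulk removal, the built-in `list.RemoveAll(x => x % 2 == 0)` does the same job in a single pass.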

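Key point:

Pay attention to the Enumerator type: it is a struct, not a class. When foreach iterates over a variable whose static type is List<T>, the compiler binds to the public GetEnumerator(), which returns this struct by value, so no enumerator object is allocated on the heap. Garbage is produced only when the enumerator is boxed, for example when the list is iterated through an IEnumerable<T> reference, such as inside a method whose parameter type is IEnumerable<T>. The frequently repeated advice to avoid foreach on a List comes largely from older toolchains (notably earlier Unity versions) whose compiler boxed the enumerator; with the standard C# compiler, foreach over a List<T> variable itself does not create a new heap object per loop.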
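The difference between the struct path and the interface path can be checked directly. This sketch (illustrative names) confirms that List<int>.Enumerator is a value type and that, for a non-empty list, going through IEnumerable<int> yields that same struct type in boxed form. It does not measure allocations, since exact numbers depend on the runtime and JIT optimizations.

```csharp
using System.Collections.Generic;

public static class EnumeratorBoxingDemo
{
    // List<T>.GetEnumerator() returns the nested struct List<T>.Enumerator.
    public static bool EnumeratorIsValueType() =>
        typeof(List<int>.Enumerator).IsValueType;

    // foreach over a List<int>-typed variable uses the struct enumerator directly.
    public static int SumAsList(List<int> list)
    {
        int sum = 0;
        foreach (int x in list) sum += x;
        return sum;
    }

    // foreach over IEnumerable<int> calls the interface method, which returns
    // IEnumerator<int>; for a List the struct is boxed to satisfy that type.
    public static int SumAsInterface(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int x in source) sum += x;
        return sum;
    }

    // For a non-empty list, the interface enumerator is the boxed struct.
    // (Newer runtimes may hand out a shared empty enumerator for empty lists.)
    public static bool InterfaceEnumeratorIsBoxedStruct(List<int> list)
    {
        IEnumerable<int> asInterface = list;
        using IEnumerator<int> e = asInterface.GetEnumerator();
        return e.GetType() == typeof(List<int>.Enumerator);
    }
}
```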

List's Add method

// Adds the given object to the end of this list. The size of the list is
// increased by one. If required, the capacity of the list is doubled
// before adding the new element.
public void Add(T item) {
    if (_size == _items.Length) EnsureCapacity(_size + 1);
    _items[_size++] = item;
    _version++;
}

// Ensures that the capacity of this list is at least the given minimum
// value. If the current capacity of the list is less than min, the
// capacity is increased to twice the current capacity or to min,
// whichever is larger.
private void EnsureCapacity(int min) {
    if (_items.Length < min) {
        int newCapacity = _items.Length == 0? _defaultCapacity : _items.Length * 2;
        // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
        // Note that this check works even when _items.Length overflowed thanks to the (uint) cast
        if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
        if (newCapacity < min) newCapacity = min;
        Capacity = newCapacity;
    }
}

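Whenever the capacity runs out, the array's capacity is doubled, starting from _defaultCapacity, which is 4. The capacity therefore grows 4, 8, 16, 32, 64, 128, 256, 512, 1024, and so on. If the requested minimum is larger than double the current capacity, the code jumps straight to that minimum (the `if (newCapacity < min)` line).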
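The growth path can be observed from outside through the Capacity property. This sketch (illustrative names) records each distinct capacity while adding items one at a time; with 33 items it should pass through 0, 4, 8, 16, 32, 64, matching the doubling rule in EnsureCapacity.

```csharp
using System.Collections.Generic;

public static class GrowthDemo
{
    // Records each distinct Capacity value observed while adding n items.
    public static List<int> CapacitySteps(int n)
    {
        var list = new List<int>();
        var steps = new List<int> { list.Capacity }; // 0 before the first Add
        for (int i = 0; i < n; i++)
        {
            list.Add(i);
            if (list.Capacity != steps[steps.Count - 1])
                steps.Add(list.Capacity);
        }
        return steps;
    }
}
```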

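Using an array as the underlying structure makes index access fast, but growth is expensive: each expansion allocates a new array, copies every existing element into it (O(n)), and leaves the old array as garbage for the GC to collect.

Doubling keeps this cost under control. Because the capacity doubles each time, n calls to Add copy fewer than 2n elements in total, so Add is O(1) amortized, and the number of discarded arrays grows only logarithmically with n. Still, when code calls Add heavily on many lists, arrays are replaced often enough to put noticeable pressure on the GC. Doubling can also waste memory when the final size is a poor fit: with 520 elements the list grows to 1024 slots, and if the remaining 504 slots are never used, roughly half of the array is wasted.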
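For the 520-element case, two standard remedies are sizing the list up front with `new List<int>(520)`, or calling List<T>.TrimExcess once no more elements will be added. The sketch below (illustrative names) shows the second option; TrimExcess only shrinks when Count is below roughly 90% of Capacity, which 520 of 1024 clearly is. Trimming itself allocates one more, smaller array and copies into it.

```csharp
using System.Collections.Generic;

public static class TrimDemo
{
    // Fills a default-constructed list with n items, then trims unused capacity.
    public static (int before, int after) TrimAfterFilling(int n)
    {
        var list = new List<int>();
        for (int i = 0; i < n; i++)
            list.Add(i);
        int before = list.Capacity; // 1024 for n = 520 under doubling from 4
        list.TrimExcess();          // reallocates to exactly Count if Count < ~90% of Capacity
        return (before, list.Capacity);
    }
}
```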

List's Remove method

// Removes the first occurrence of the given item. Returns true if the
// item was found and removed, false otherwise.
public bool Remove(T item) {
    int index = IndexOf(item);
    if (index >= 0) {
        RemoveAt(index);
        return true;
    }

    return false;
}

void System.Collections.IList.Remove(Object item)
{
    if(IsCompatibleObject(item)) {            
        Remove((T) item);
    }
}
// Returns the index of the first occurrence of a given value in a range of
// this list. The list is searched forwards from beginning to end.
// The elements of the list are compared to the given value using the
// Object.Equals method.
// This method uses the Array.IndexOf method to perform the
// search.
public int IndexOf(T item) {
    Contract.Ensures(Contract.Result<int>() >= -1);
    Contract.Ensures(Contract.Result<int>() < Count);
    return Array.IndexOf(_items, item, 0, _size);
}

int System.Collections.IList.IndexOf(Object item)
{
    if(IsCompatibleObject(item)) {            
        return IndexOf((T)item);
    }
    return -1;
}


// Removes the element at the given index. The size of the list is
// decreased by one.
public void RemoveAt(int index) {
    if ((uint)index >= (uint)_size) {
        ThrowHelper.ThrowArgumentOutOfRangeException();
    }
    Contract.EndContractBlock();
    _size--;
    if (index < _size) {
        Array.Copy(_items, index + 1, _items, index, _size - index);
    }
    _items[_size] = default(T);
    _version++;
}

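Remove is built from IndexOf and RemoveAt: IndexOf finds the element's index, and RemoveAt deletes the element at that index.

As the source shows, deletion works by using Array.Copy to shift all following elements one slot to the left, overwriting the removed element; the now-unused last slot is then set to default(T) so the GC can reclaim any reference it held. IndexOf delegates to Array.IndexOf, which compares each position in order from 0 to _size - 1, so it is O(n). Remove(item) is therefore O(n) overall: the search visits the positions up to the match, and the shift moves every element after it.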
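When element order does not matter, the O(n) shift can be avoided with a "swap and pop" removal: overwrite the target with the last element, then remove the last slot, which moves nothing. `RemoveAtSwapBack` below is a hypothetical helper, not part of List<T>, and it deliberately does not preserve order.

```csharp
using System.Collections.Generic;

public static class SwapRemoveDemo
{
    // Removes the element at `index` in O(1) by moving the last element into
    // its place. Element order is NOT preserved.
    public static void RemoveAtSwapBack<T>(List<T> list, int index)
    {
        int last = list.Count - 1;
        list[index] = list[last]; // the indexer throws if index (or last) is out of range
        list.RemoveAt(last);      // removing the final element shifts nothing
    }
}
```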

List's Insert method

// Inserts an element into this list at a given index. The size of the list
// is increased by one. If required, the capacity of the list is doubled
// before inserting the new element.
public void Insert(int index, T item) {
    // Note that insertions at the end are legal.
    if ((uint) index > (uint)_size) {
        ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.index, ExceptionResource.ArgumentOutOfRange_ListInsert);
    }
    Contract.EndContractBlock();
    if (_size == _items.Length) EnsureCapacity(_size + 1);
    if (index < _size) {
        Array.Copy(_items, index, _items, index + 1, _size - index);
    }
    _items[index] = item;
    _size++;            
    _version++;
}

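Like Add, Insert first checks whether the capacity is sufficient and grows the array if not. It then uses Array.Copy to shift every element from index onward one position to the right, and stores the new item in the freed slot. Inserting at the end (index == _size) is allowed and shifts nothing.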
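Because every Insert(0, x) shifts the whole list, building a list by repeatedly inserting at the front costs O(n^2) in total. When the goal is simply reversed order, appending and then calling List<T>.Reverse once gives the same result in O(n). The class and method names below are illustrative.

```csharp
using System.Collections.Generic;

public static class InsertFrontDemo
{
    // Each Insert(0, i) shifts every existing element: O(n) per call, O(n^2) overall.
    public static List<int> BuildByInsertFront(int n)
    {
        var list = new List<int>();
        for (int i = 0; i < n; i++)
            list.Insert(0, i);
        return list;
    }

    // Same final order with O(n) total work: append, then reverse in place once.
    public static List<int> BuildByAddThenReverse(int n)
    {
        var list = new List<int>(n);
        for (int i = 0; i < n; i++)
            list.Add(i);
        list.Reverse();
        return list;
    }
}
```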

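At this point it is clear that Insert, IndexOf, and Remove (and Contains, shown below) rely on plain linear shifting or scanning with no further optimization, so each is O(n). Add is O(1) amortized, but each time it outgrows the array it allocates a new one and discards the old. Used very frequently, these operations reduce performance and create extra memory churn, putting more pressure on the garbage collector (GC).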

List's Clear method

// Clears the contents of List.
public void Clear() {
    if (_size > 0)
    {
        //Don't need to doc this but we clear the elements 
        //so that the gc can reclaim the references.
        Array.Clear(_items, 0, _size); 
        _size = 0;
    }
    _version++;
}
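Clear only wipes the used slots (so the GC can reclaim the references) and resets _size; the backing array itself is kept, so Capacity does not change. This sketch (illustrative names) checks that, and shows that calling TrimExcess on the emptied list releases the array, dropping Capacity to 0.

```csharp
using System.Collections.Generic;

public static class ClearDemo
{
    public static (int count, int capacityBefore, int capacityAfterClear, int capacityAfterTrim) Run()
    {
        var list = new List<string>();
        for (int i = 0; i < 100; i++)
            list.Add("item" + i);
        int before = list.Capacity;      // 128 under the doubling rule
        list.Clear();                    // slots set to null, array kept
        int afterClear = list.Capacity;
        list.TrimExcess();               // Count 0 is far below 90% of Capacity: array released
        return (list.Count, before, afterClear, list.Capacity);
    }
}
```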

List's Contains method

// Contains returns true if the specified element is in the List.
// It does a linear, O(n) search.  Equality is determined by calling
// item.Equals().
//
public bool Contains(T item) {
    if ((Object) item == null) {
        for(int i=0; i<_size; i++)
            if ((Object) _items[i] == null)
                return true;
        return false;
    }
    else {
        EqualityComparer<T> c = EqualityComparer<T>.Default;
        for(int i=0; i<_size; i++) {
            if (c.Equals(_items[i], item)) return true;
        }
        return false;
    }
}

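As the source shows, Contains performs a linear search: it walks the array and compares each element with the argument, using a null check when the argument is null and EqualityComparer<T>.Default otherwise. It returns true at the first match; if the whole array is scanned without a match, it returns false. The cost is therefore O(n).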
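If many membership tests run against the same data, the repeated linear scans add up to O(n*m). A common alternative is to build a HashSet<T> once, which gives O(1) average lookups; it uses the same default equality comparer, so results agree for types with consistent Equals and GetHashCode. The sketch below (illustrative names) computes the same hit count both ways.

```csharp
using System.Collections.Generic;

public static class MembershipDemo
{
    // Counts how many queries occur in data: first with List<T>.Contains
    // (linear scan per query), then with a HashSet<T> built once.
    public static (int viaList, int viaSet) CountHits(List<int> data, int[] queries)
    {
        int viaList = 0;
        foreach (int q in queries)
            if (data.Contains(q)) viaList++;   // O(n) per query

        var set = new HashSet<int>(data);      // O(n) to build
        int viaSet = 0;
        foreach (int q in queries)
            if (set.Contains(q)) viaSet++;     // O(1) average per query

        return (viaList, viaSet);
    }
}
```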

List's ToArray method

// ToArray returns a new array containing the contents of the List.
// This requires copying the List, which is an O(n) operation.
public T[] ToArray() {
    Contract.Ensures(Contract.Result<T[]>() != null);
    Contract.Ensures(Contract.Result<T[]>().Length == Count);

    T[] array = new T[_size];
    Array.Copy(_items, 0, array, 0, _size);
    return array;
}

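ToArray allocates a new array of exactly _size elements, copies the list's contents into it with Array.Copy, and returns it. The result is independent of the list, and its length equals Count, not Capacity.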
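A quick check of those two properties: in this sketch (illustrative names) the array returned by ToArray has Count elements even though the list's capacity is larger, and writing to the copy leaves the list unchanged.

```csharp
using System.Collections.Generic;

public static class ToArrayDemo
{
    public static (int length, int capacity, int listFirst) Run()
    {
        var list = new List<int> { 10, 20, 30, 40, 50 }; // five Adds: capacity grows to 8
        int[] copy = list.ToArray();  // new array of exactly Count elements
        copy[0] = -1;                 // modifies only the copy
        return (copy.Length, list.Capacity, list[0]);
    }
}
```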