Proper List Support in C#

cunhan4654

Original article: https://www.codeproject.com/Articles/5251439/Proper-List-Support-in-Csharp

Introduction

This article is about coding collection and list classes. It will walk you through the particulars of implementing IEnumerable<T>, IEnumerator<T>, ICollection<T>, and IList<T> over custom data structures.

This article exists because the topic isn't well covered, and there are several pitfalls in working with these interfaces that Microsoft doesn't clearly document. For that reason, we will break the development of such a class into manageable steps.

Background

List classes in .NET provide foreach enumeration, indexed addressing, and general storage for an arbitrary, variable count of items. They work much like an array in .NET except they are intrinsically resizable. In .NET, arrays themselves implement IList<T>.

The primary .NET class for using lists is List<T>. It provides a full featured list over an internal array, which is resized as needed.

Often, this is efficient and appropriate, but in some cases, it may not be. Microsoft provides an alternative in its LinkedList<T> class which stores values as a linked list instead of an array.

We'll be implementing our own linked list. While using Microsoft's highly optimized class is preferable, this class will serve us well as an illustration. In doing so, we'll cover all of the major components and pitfalls of implementing a list so that your real world lists can work well and efficiently in practice.

Coding this Mess

To keep things clear, I've split the class into several partial class files, one for each major interface.

Let's start with our data structure itself in LinkedList.cs.

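// each partial class file in this article needs these two using directives
using System;
using System.Collections.Generic;

// the basic LinkedList<T> core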
public partial class LinkedList<T>
{
    // a node class holds the actual entries.
    private class _Node
    {
        // holds the value of the current entry
        public T Value = default(T);
        // holds the next instance in the list
        public _Node Next = null;
    }
    // we must always point to the root, or null
    // if empty
    _Node _root=null;
}

By itself, this isn't a list at all! It implements none of the relevant interfaces, but it does contain a nested class which holds our node structure. This node is a single node in the linked list. It holds a value of type T and the next node as fields. The outer class also contains a field that holds the first node in our list, or null if the list is empty.
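
To make the node layout concrete, here is a minimal standalone sketch of the same idea outside the generic class. The names `Node` and `NodeDemo` are hypothetical and exist only for this illustration; they are not part of the article's `LinkedList<T>`. It builds a three-node chain by hand and walks it from the root until it reaches null, which is the traversal pattern the rest of the class relies on.

```csharp
using System;

// Hypothetical stand-in for the article's private _Node class.
class Node
{
    public int Value;
    public Node Next;
}

static class NodeDemo
{
    // Walks the chain from the root and sums the values, stopping
    // when the Next reference is null (the end of the list).
    public static int Sum(Node root)
    {
        var total = 0;
        for (var current = root; current != null; current = current.Next)
            total += current.Value;
        return total;
    }

    static void Main()
    {
        // Build the chain 1 -> 2 -> 3 by hand.
        var root = new Node
        {
            Value = 1,
            Next = new Node { Value = 2, Next = new Node { Value = 3 } }
        };
        Console.WriteLine(Sum(root)); // prints 6
        Console.WriteLine(Sum(null)); // prints 0 (an empty list has a null root)
    }
}
```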

I like to start coding the list interfaces by implementing IEnumerable<T>/IEnumerator<T>, which enables foreach. It's such a fundamental operation, and it (usually) has no code dependencies other than the data structure itself. Let's visit LinkedList.Enumerable.cs:

partial class LinkedList<T> : IEnumerable<T>
{
    // versioning is used so that if the collection is changed
    // during a foreach/enumeration, the enumerator will know to
    // throw an exception. Every time we add, remove, insert,
    // set, or clear, we increment the version. This is used in 
    // turn by the enumeration class, which checks it before 
    // performing any significant operation.
    int _version = 0;

    // all this does is return a new instance of our Enumerator
    // struct
    public IEnumerator<T> GetEnumerator()
    {
        return new Enumerator(this);
    }
    // legacy collection support (required)
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        => GetEnumerator();

    // this is the meat of our enumeration capabilities
    struct Enumerator : IEnumerator<T>
    {
        // we need a link to our outer class for finding the
        // root node and for versioning.
        // see the _version field comments in LinkedList<T>
        LinkedList<T> _outer;
        // hold the value of _outer._version we got when the
        // enumerator was created. This is compared to the 
        // _outer._version field to see if it has changed.
        int _version;
        // this is our state so we know where we are while
        // enumerating. -1 = Initial, -2 = Past end,
        // -3 = Disposed, and 0 means enumerating
        int _state;
        // the current node we're on.
        _Node _current;
        
        public Enumerator(LinkedList<T> outer)
        {
            _outer = outer;
            _version = outer._version;
            _current = null;
            _state = -1;
        }
        // reset the enumeration to the initial state
        // if not disposed.
        public void Reset()
        {
            _CheckDisposed();
            _current = null;
            _state = -1;
        }
        // just set the state to inform the class
        void IDisposable.Dispose()
        {
            _state = -3;
        }
        // performs the meat of our _Node
        // traversal
        public bool MoveNext()
        {
            // can't enum if disposed
            _CheckDisposed();
            switch (_state)
            {
                case -1: // initial
                    _CheckVersion();
                    // just set the current to the root
                    _current = _outer._root;
                    if (null == _current)
                    {
                        // our enumeration is empty
                        _state = -2;
                        return false;
                    }
                    // we're enumerating
                    _state = 0;
                    break;
                case -2: // past the end
                    return false;
                case 0: // enumerating
                    _CheckVersion();
                    // just move to the next
                    // _Node instance
                    // if it's null, stop enumerating
                    _current = _current.Next;
                    if (null == _current)
                    {
                        _state = -2;
                        return false;
                    }
                    break;
            }
            return true;
        }
        public T Current {
            get {
                switch (_state)
                {
                    // throw the appropriate
                    // error if necessary
                    case -1:
                        throw new InvalidOperationException
                              ("The enumeration is before the start position.");
                    case -2:
                        throw new InvalidOperationException
                              ("The enumeration is past the end position.");
                    case -3:
                        // always throws
                        _CheckDisposed(); 
                        break;
                }
                _CheckVersion();
                return _current.Value;
            }
        }
        // legacy support (required)
        object System.Collections.IEnumerator.Current => Current;
        // throws if disposed
        void _CheckDisposed()
        {
            if (-3 == _state)
                throw new ObjectDisposedException(GetType().FullName);
        }
        // throws if versions don't match
        void _CheckVersion()
        {
            if (_version != _outer._version)
                throw new InvalidOperationException("The enumeration has changed.");
        }
    }
}

This is a lot to chew on, and you might be wondering why we don't simply use C# iterators via the yield syntax. There's a very good reason: versioning and resetting.

To provide a proper implementation of IEnumerator<T> over a collection, one must keep track of the collection to make sure it hasn't changed while we're enumerating items. This is done using the _version fields. Each time the collection is changed, the version is incremented. This field is compared with the enumerator's copy to see if they are the same. If they are not, the enumerator will throw. C# iterators do not provide this functionality, but it is necessary for a complete, proper, and robust implementation of a list.
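
The effect of the version check is easy to observe with the built-in `List<T>`, whose enumerator behaves the same way. The sketch below is illustrative only; `VersionDemo` and `ThrowsOnModification` are hypothetical names. It modifies a list inside a foreach and reports whether the enumerator threw.

```csharp
using System;
using System.Collections.Generic;

static class VersionDemo
{
    // Returns true if modifying the list during enumeration made the
    // enumerator throw InvalidOperationException.
    public static bool ThrowsOnModification()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (var item in list)
            {
                // Changing the collection invalidates the enumerator, so the
                // next MoveNext() call made by foreach detects the change.
                if (item == 1)
                    list.Add(4);
            }
        }
        catch (InvalidOperationException)
        {
            return true;
        }
        return false;
    }

    static void Main()
    {
        Console.WriteLine(ThrowsOnModification()); // prints True
    }
}
```

The enumerator in this article is meant to give callers of our own list the same early failure instead of silently iterating over a changed structure.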

In addition, compiler-generated iterators do not support Reset(); calling it throws NotSupportedException. While foreach never calls Reset(), other consumers can, and they generally expect it to work.
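
The Reset() limitation can be checked directly. In this small sketch, `IteratorResetDemo` and `Numbers` are hypothetical names introduced for the example; it obtains an enumerator from a yield-based method and calls Reset() on it.

```csharp
using System;
using System.Collections.Generic;

static class IteratorResetDemo
{
    // A compiler-generated iterator (hypothetical example method).
    static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
    }

    // Returns true if Reset() on the generated enumerator throws
    // NotSupportedException.
    public static bool ResetIsUnsupported()
    {
        using (IEnumerator<int> e = Numbers().GetEnumerator())
        {
            e.MoveNext();
            try
            {
                e.Reset();
            }
            catch (NotSupportedException)
            {
                return true;
            }
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(ResetIsUnsupported()); // prints True
    }
}
```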

If you really don't want to implement all of this, you can use C# iterator support, but keep in mind that it's neither a canonical nor a complete implementation of a list's enumerator.
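
For comparison, here is roughly what the iterator-based shortcut looks like. This is a sketch on a hypothetical `TinyList<T>` (not the article's class), with a `Push` method that prepends purely to keep the example short. Note what is missing: there is no version check, and Reset() is unsupported.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical minimal list used only to show the iterator shortcut.
class TinyList<T> : IEnumerable<T>
{
    class Node
    {
        public T Value;
        public Node Next;
    }

    Node _root;

    // Prepends for brevity, so enumeration order is the reverse of
    // the order in which items were pushed.
    public void Push(T value)
    {
        _root = new Node { Value = value, Next = _root };
    }

    // The compiler generates the enumerator state machine for us.
    public IEnumerator<T> GetEnumerator()
    {
        for (var current = _root; current != null; current = current.Next)
            yield return current.Value;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class TinyListDemo
{
    static void Main()
    {
        var list = new TinyList<int>();
        list.Push(1);
        list.Push(2);
        list.Push(3);
        Console.WriteLine(string.Join(",", list)); // prints 3,2,1
    }
}
```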

Note the use of the _state field. Its primary purpose is for error handling, so we can distinguish between being at the beginning, the end, or disposed. Note also that we're always checking the version and checking for disposal on virtually every operation. This is important to make the enumerator robust.

As with all enumerator implementations, MoveNext() is the heart of the enumerator and its job is to advance the cursor by one. It returns true if there are more items to read, or false to indicate that there are no more items. In this routine, we're simply iterating through the linked list, and updating the state as appropriate. Meanwhile, Current, Reset(), and Dispose() are all straightforward.
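
It can help to see how a consumer drives these members. The sketch below approximates what foreach does with any `IEnumerable<T>`: get an enumerator, call MoveNext() until it returns false, read Current after each successful move, and dispose at the end. `MoveNextDemo` and `Drain` are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class MoveNextDemo
{
    // Manually enumerates the source the way foreach would.
    public static List<string> Drain(IEnumerable<string> source)
    {
        var seen = new List<string>();
        using (IEnumerator<string> e = source.GetEnumerator())
        {
            // MoveNext() advances the cursor; Current is only valid
            // after it has returned true.
            while (e.MoveNext())
                seen.Add(e.Current);
        }
        return seen;
    }

    static void Main()
    {
        var items = Drain(new[] { "a", "b", "c" });
        Console.WriteLine(string.Join(",", items)); // prints a,b,c
    }
}
```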

Next, we have the ICollection<T> which essentially provides a basic interface for storing and retrieving items in our class.

// linked list collection interface implementation
partial class LinkedList<T> : ICollection<T>
{
    // we keep a cached _count field so that
    // we don't need to enumerate the entire 
    // list to find the count. it is altered
    // whenever items are added, removed, or
    // inserted.
    int _count=0;
    public int Count {
        get {
            return _count;
        }
    }
    // returns true if one of the nodes has 
    // the specified value
    public bool Contains(T value)
    {
        // start at the root
        var current = _root;
        while(null!=current)
        {
            // enumerate and check each value
            if (Equals(value, current.Value))
                return true;
            current = current.Next;
        }
        return false;
    }
    public void CopyTo(T[] array,int index)
    {
        var i = _count;
        // check our parameters for validity
        if (null == array)
            throw new ArgumentNullException(nameof(array));
        if (1 != array.Rank || 0 != array.GetLowerBound(0))
            throw new ArgumentException("The array is not an SZArray", nameof(array));
        if (0 > index)
            throw new ArgumentOutOfRangeException(nameof(index), 
                  "The index cannot be less than zero.");
        if (array.Length < index)
            throw new ArgumentOutOfRangeException(nameof(index), 
                  "The index cannot be greater than the length of the array.");
        // compare against the space remaining from index onward
        if (i > array.Length - index)
            throw new ArgumentException
            ("The array is not big enough to hold the collection entries.", nameof(array));
        i = 0;
        var current = _root;
        while (null != current)
        {
            // enumerate the values and set
            // each array element
            array[i + index] = current.Value;
            ++i;
            current = current.Next;
        }
    }
    // required for ICollection<T> but we don't really need it
    bool ICollection<T>.IsReadOnly => false;

    // adds an item to the linked list
    public void Add(T value)
    {
        _Node prev = null;
        // start at the root
        var current = _root;
        // find the final element
        while (null != current)
        {
            prev = current;
            current = current.Next;
        }
        if (null == prev)
        {
            // is the root
            _root = new _Node();
            _root.Value = value;
        }
        else
        {
            // add to the end
            var n = new _Node();
            n.Value = value;
            prev.Next = n;
        }
        // increment count and version
        ++_count;
        ++_version;
    }

    // simply clears the list
    public void Clear()
    {
        _root = null;
        _count = 0;
        ++_version;
    }
    // removes the first item with the 
    // specified value
    public bool Remove(T value)
    {
        _Node prev = null;
        var current = _root;
        while (null != current)
        {
            // find the value.
            if(Equals(value,current.Value))
            {
                if(null!=prev) // not the root
                    // set the previous next pointer
                    // to the current's next pointer
                    // effectively eliminating 
                    // current
                    prev.Next = current.Next;
                else // set the root
                    // we just want the next value
                    // it will eliminate current
                    _root = current.Next;
                // decrement the count
                --_count;
                // increment the version
                ++_version;
                return true;
            }
            // iterate
            prev = current;
            current = current.Next;
        }
        // couldn't find the value
        return false;
    }
}

Note the addition of the _count field. Collection consumers will expect the Count property to be very fast. However, in order to retrieve the count of items in a linked list, you'd normally have to traverse each item in the list in order to find it. That's not desirable for this scenario, so each time we add and remove items from the list, we update the _count field accordingly. That way, we avoid unnecessary performance degradation. This is a very common pattern when implementing custom collections.

The rest of the members are self-explanatory, either at face value or through the comments above. Note the judicious use of error handling, and the careful bookkeeping of the _count and _version fields whenever the collection is updated. This is critical. The one thing to be aware of is that in CopyTo(), index is the position in array at which copying starts, not a position in the collection. That is, index is the destination index.
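
The destination-index behavior is the same one the built-in `List<T>.CopyTo(T[], int)` exhibits, so it can be illustrated without our class. In this sketch, `CopyToDemo` and `CopyWithOffset` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class CopyToDemo
{
    // Copies three items into a five-element array starting at
    // destination index 2, leaving the first two slots untouched.
    public static int[] CopyWithOffset()
    {
        var source = new List<int> { 7, 8, 9 };
        var target = new int[5];
        source.CopyTo(target, 2);
        return target;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", CopyWithOffset())); // prints 0,0,7,8,9
    }
}
```

A destination index of 3 would leave only two free slots for three items, which is the case the ArgumentException check in CopyTo() guards against.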

Now on to the implementation of IList<T> in LinkedList.List.cs:

// linked list IList<T> interface implementation
partial class LinkedList<T> : IList<T>
{
    // gets or sets the value at the specified index
    public T this[int index] {
        get {
            // check the index for validity
            if (0 > index || index >= _count)
                throw new ArgumentOutOfRangeException(nameof(index));
            // start at the root
            var current = _root;
            // enumerate up to the index
            for (var i = 0;i<index;++i)
                current = current.Next;
            // return the value at the index
            return current.Value;
        }
        set {
            // check the index for validity
            if (0 > index || index >= _count)
                throw new ArgumentOutOfRangeException(nameof(index));
            // start at the root
            var current = _root;
            // enumerate up to the index
            for (var i = 0; i < index; ++i)
                current = current.Next;
            // set the value at the current index
            current.Value=value;
            // increment the version
            ++_version;
        }
    }
    // returns the index of the specified value
    public int IndexOf(T value)
    {
        // track the index
        var i = 0;
        // start at the root
        var current = _root;
        while (null!=current)
        {
            // enumerate checking the value
            if (Equals(current.Value, value))
                return i; // found
            // increment the current index
            ++i;
            // iterate
            current = current.Next;
        }
        // not found
        return -1;
    }
    // inserts an item *before* the specified position
    public void Insert(int index,T value)
    {
        // check index for validity
        // the index can be the count in order to insert
        // as the last item.
        if (0 > index || index > _count)
            throw new ArgumentOutOfRangeException(nameof(index));

        if (0 == index) // insert at the beginning
        {
            _Node n = new _Node();
            n.Next = _root;
            n.Value = value;
            _root = n;
        }
        else if (_count == index) // insert at the end
        {
            Add(value);
            return;
        }
        else // insert in the middle somewhere
        {
            // start at the root
            var current = _root;
            _Node prev = null;
            // iterate up to the index
            for (var i = 0; i < index; ++i)
            {
                prev = current;
                current = current.Next;
            }
            // insert a new node at the position
            var n = new _Node();
            n.Value = value;
            n.Next = current;
            prev.Next = n;
        }
        // update the version and increment the count
        ++_version;
        ++_count;
    }
    // removes the item at the specified index
    public void RemoveAt(int index)
    {
        // check the index for validity
        if (0 > index || index >= _count)
            throw new ArgumentOutOfRangeException(nameof(index));
        // start at the root
        var current = _root;
        _Node prev = null;
        if (1 == _count) // special case for single item
            _root = null;
        else
        {
            // iterate through the nodes up to index
            for (var i = 0; i < index; ++i)
            {
                prev = current;
                current = current.Next;
            }
            // replace the previous next node with
            // current next node, effectively removing
            // current
            if (null == prev)
                _root = _root.Next;
            else
                prev.Next = current.Next;
        }
        // decrement the count
        --_count;
        // increment the version
        ++_version;
    }
}

Much of the code here is self-explanatory. Just be careful to update _version in the this[int index] setter! Insert() is only as complicated as it is because of how linked lists work. The caveats with Insert() are that index is the position the new item will occupy, so the item previously at index ends up right after it, and that index can go as high as _count rather than _count-1. Conceptually, think of the operation as "insert before index."
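
The "insert before index" semantics match those of the built-in `List<T>.Insert`, which makes for a quick illustration. `InsertDemo` and `Build` are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;

static class InsertDemo
{
    public static List<string> Build()
    {
        var list = new List<string> { "a", "c" };
        // "b" takes index 1; "c" shifts from index 1 to index 2.
        list.Insert(1, "b");
        // An index equal to Count is valid and appends to the end.
        list.Insert(list.Count, "d");
        return list;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Build())); // prints a,b,c,d
    }
}
```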

One thing to note about IList<T>: It isn't always appropriate to implement. In fact, because of the way a linked list works, it doesn't really support direct indexing, unlike an array. Sure we've emulated it, but it's not really efficient because we have to iterate. Because of this, it's not necessarily good practice to implement this interface on top of a linked list. We've done so here simply for demonstration purposes. In the real world, you will choose the collection interfaces that best suit your data structure. Here, ICollection<T> might have been appropriate without implementing IList<T>.
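
To put a rough number on the indexing cost, the sketch below counts how many nodes get touched when every element is read through an indexer that walks from the root each time, versus a single foreach pass. It is a back-of-the-envelope model rather than a benchmark, and `IndexCostDemo` with its two methods is hypothetical.

```csharp
using System;

static class IndexCostDemo
{
    // Reading index i by walking from the root touches i + 1 nodes,
    // so reading all indices touches 1 + 2 + ... + count nodes.
    public static long VisitsWithIndexer(int count)
    {
        long visits = 0;
        for (var i = 0; i < count; ++i)
            visits += i + 1;
        return visits;
    }

    // A single enumeration pass touches each node exactly once.
    public static long VisitsWithForeach(int count)
    {
        return count;
    }

    static void Main()
    {
        Console.WriteLine(VisitsWithIndexer(1000)); // prints 500500
        Console.WriteLine(VisitsWithForeach(1000)); // prints 1000
    }
}
```

A `for` loop over the indexer is therefore quadratic on this list, while foreach stays linear, which is why exposing IList<T> on a linked list can mislead callers.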

Finally, onto our constructors which can be found in LinkedList.Constructors.cs:

partial class LinkedList<T>
{
    // optimized constructor for adding a collection
    public LinkedList(IEnumerable<T> collection)
    {
        // check for collection validity
        if (null == collection)
            throw new ArgumentNullException(nameof(collection));
        // track the count            
        int c = 0;
        // track the current node
        _Node current = null;
        // enumerate the collection
        foreach(var item in collection)
        {
            // if this is the first item
            if(null==current)
            {
                // set the root
                _root = new _Node();
                _root.Value = item;
                current = _root;
            } else
            {
                // set the next item
                var n = new _Node();
                n.Value = item;
                current.Next = n;
                current = n;
            }
            ++c;
        }
        // set the count
        _count = c;
    }
    // default constructor
    public LinkedList() {
    }
}

Here, we've provided a default constructor and a constructor that copies from an existing collection or array. These are really the minimum constructors you want to provide. Others might be appropriate depending on your internal data structure. For example, if it was array based, you might have a capacity, while a B-tree might have an order. In this case, the constructor that takes a collection is optimized, but you'll generally want to provide it even if all it does is call Add() on itself in a loop.

And that's it! Now you have a robust list implementation that handles errors properly, is efficient, and avoids the various pitfalls of implementing lists. That being said, do not use this linked list in production. Microsoft's implementation is probably more optimized, and certainly more tested. This should, however, give you a path forward for implementing your own data structures.
