Effective C#, Item 40: Match Your Collection to Your Needs


MaybeHelios


Category: Effective C#. Tags: c#, arrays, collections, initialization, storage, compiler


Item 40: Match Your Collection to Your Needs


To the question "Which collection is best?" the answer is "It depends." Different collections have different performance characteristics and are optimized for different actions. The .NET Framework supports many of the familiar collections: lists, arrays, queues, stacks, and others. C# supports multidimensional arrays, which have performance characteristics that differ from those of both single-dimensional arrays and jagged arrays. The framework also includes many specialized collections; look through those before you build your own. You can find all the collections quickly because all collections implement the ICollection interface. The documentation for ICollection lists all classes that implement that interface. Those 20-odd classes are the collections at your disposal.


To pick the right collection for your proposed use, you need to consider the actions you'll most often perform on that collection. To produce a resilient program, you'll rely on the interfaces that are implemented by the collection classes so that you can substitute a different collection when you find that your assumptions about the usage of the collection were incorrect (see Item 19).


The .NET Framework has three different kinds of collections: arrays, arraylike collections, and hash-based containers. Arrays are the simplest and generally the fastest, so let's start there. This is the collection type you'll use most often.


Your first choice should often be the System.Array class or, more correctly, a type-specific array class. The first and most significant reason for choosing the array class is that arrays are type-safe. All other collection classes store System.Object references until C# 2.0 introduces generics (see Item 49). When you declare any array, the compiler creates a specific System.Array derivative for the type you specify. For example, this declaration creates an array of integers:


private int [] _numbers = new int[100];

The array stores integers, not System.Object. That's significant because you avoid the boxing and unboxing penalty when you add, access, or remove value types from the array (see Item 17). That initialization creates a single-dimensional array with 100 integers stored in it. All the memory occupied by the array has a 0 bit pattern stored in it. Arrays of value types are all 0s, and arrays of reference types are all null. Each item in the array can be accessed by its index:


int j = _numbers[ 50 ];
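
The default-initialization behavior described above can be checked with a small sketch. The class and method names here (ArrayDefaultsDemo, AllZero, AllNull) are made up for illustration and are not part of the original item:

```csharp
using System;

public static class ArrayDefaultsDemo
{
    // Returns true when every element of a freshly allocated int[] is 0.
    public static bool AllZero(int length)
    {
        int[] numbers = new int[length];
        foreach (int n in numbers)
            if (n != 0) return false;
        return true;
    }

    // Returns true when every element of a freshly allocated string[] is null.
    public static bool AllNull(int length)
    {
        string[] names = new string[length];
        foreach (string s in names)
            if (s != null) return false;
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(AllZero(100)); // True
        Console.WriteLine(AllNull(100)); // True
    }
}
```

Because the element type is int rather than object, reading numbers[i] involves no unboxing.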

In addition to the array access, you can iterate the array using foreach or an enumerator (see Item 11):


 
 foreach ( int i in _numbers )
  Console.WriteLine( i.ToString( ) );
// or:
IEnumerator it = _numbers.GetEnumerator( );
while( it.MoveNext( ))
{
  int i = (int) it.Current;
  Console.WriteLine( i.ToString( ) );
}
 

If you are storing a single sequence of like objects, you should store them in an array. But often, your data structures are more complicated collections. It's tempting to quickly fall back on the C-style jagged array, an array that contains other arrays. Sometimes, this is exactly what you need. Each element in the outer collection is an array along the inner direction:


 
 public class MyClass
{
  // Declare a jagged array:
  private int[] [] _jagged;
 
  public MyClass()
  {
    // Create the outer array:
    _jagged = new int[5][];
 
    // Create each inner array:
    _jagged[0] = new int[5];
    _jagged[1] = new int[10];
    _jagged[2] = new int[12];
    _jagged[3] = new int[7];
    _jagged[4] = new int[23];
  }
}
 

Each inner single-dimension array can have its own size, independent of the outer array and of the other inner arrays. Use jagged arrays when you need to create differently sized arrays of arrays. The drawback to jagged arrays is that a column-wise traversal is inefficient. Examining the third column in each row of a jagged array requires two lookups for each access: one to find the inner array and one to find the element in it. There is no relationship between the locations of the element at row 0, column 3 and the element at row 1, column 3. Only multidimensional arrays can perform column-wise traversals more efficiently. Old-time C and C++ programmers made their own two- (or more) dimensional arrays by mapping them onto a single-dimension array. For old-time C and C++ programmers, this notation is clear:


double num = MyArray[ i * rowLength + j ];

The rest of the world would prefer this:


double num = MyArray[ i, j ];

But C and C++ did not support multidimensional arrays. C# does. Use the multidimensional array syntax: It's clearer to both you and the compiler when you mean to create a true multidimensional structure. You create multidimensional arrays using an extension of the familiar single-dimension array notation:


private int[ , ] _multi = new int[ 10, 10 ];

The previous declaration creates a two-dimensional array, a 10x10 array with 100 elements. The length of each dimension in a multidimensional array is always constant. The compiler utilizes this property to generate more efficient initialization code. Initializing the jagged array requires multiple initialization statements. In my simple example earlier, you need five statements. Larger arrays or arrays with more dimensions require more extensive initialization code. You must write code by hand. However, multidimensional arrays with more dimensions merely require more dimension specifiers in the initialization statement. Furthermore, the multidimensional array initializes all array elements more efficiently. Arrays of value types are initialized to contain a value at each valid index in the array. The contents of the value are all 0. Arrays of reference types contain null at each index in the array. Arrays of arrays contain null inner arrays.


Traversing multidimensional arrays is almost always faster than traversing jagged arrays, especially for by-column or diagonal traversals. The compiler can use pointer arithmetic on any dimension of the array. Jagged arrays require finding each correct value for each single-dimension array.
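
To make the difference in access pattern concrete, here is a minimal sketch of a by-column sum over both kinds of array. The names (ColumnTraversalDemo, SumColumnRect, SumColumnJagged) are hypothetical. The sketch only shows the shape of the code and makes no timing claim; relative speed depends on the runtime and is worth measuring for your own workload.

```csharp
using System;

public static class ColumnTraversalDemo
{
    // Rectangular array: every row has the same length, so one indexed
    // access with two subscripts reaches the element.
    public static int SumColumnRect(int[,] grid, int col)
    {
        int sum = 0;
        for (int row = 0; row < grid.GetLength(0); row++)
            sum += grid[row, col];
        return sum;
    }

    // Jagged array: each access first fetches the inner array, then the
    // element, and the inner array may be too short to have that column.
    public static int SumColumnJagged(int[][] grid, int col)
    {
        int sum = 0;
        for (int row = 0; row < grid.Length; row++)
        {
            int[] inner = grid[row];   // first lookup
            if (col < inner.Length)
                sum += inner[col];     // second lookup
        }
        return sum;
    }

    public static void Main()
    {
        int[,] rect = { { 1, 2, 3 }, { 4, 5, 6 } };
        int[][] jagged = { new int[] { 1, 2, 3 }, new int[] { 4 } };
        Console.WriteLine(SumColumnRect(rect, 2));     // 9
        Console.WriteLine(SumColumnJagged(jagged, 2)); // 3
    }
}
```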


Multidimensional arrays can be used like any collection in many ways. Suppose you are building a game based on a checker board. You'd make a board of 64 Squares laid out in a grid:


private Square[ , ] _theBoard = new Square[ 8, 8 ];

This initialization creates the array storage for the Squares. Assuming that Square is a reference type, the Squares themselves are not yet created, and each array element is null. To initialize the elements, you must look at each dimension in the array:


 
 for ( int i = 0; i < _theBoard.GetLength( 0 ); i++ )
  for( int j = 0; j < _theBoard.GetLength( 1 ); j++ )
    _theBoard[ i, j ] = new Square( );
 

But you have more flexibility in traversing the elements in a multidimensional array. You can get an individual element using the array indexes:


Square sq = _theBoard[ 4, 4 ];

If you need to iterate the entire collection, you can use an iterator:


 
 foreach( Square sq in _theBoard )
  sq.PaintSquare( );

Contrast that with what you would write for jagged arrays:


 
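// Here _theBoard is assumed to be declared as a jagged array (Square[][]),
// not as the Square[ , ] shown earlier:
foreach( Square[] row in _theBoard )
  foreach( Square sq in row )
    sq.PaintSquare( );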
 

Every new dimension in a jagged array introduces another foreach statement. However, with a multidimensional array, a single foreach statement generates all the code necessary to check the bounds of each dimension and get each element of the array. The foreach statement generates specific code to iterate the array by using each array dimension. The foreach loop generates the same code as if you had written this:


 
 for ( int i = _theBoard.GetLowerBound( 0 );  i <= _theBoard.GetUpperBound( 0 ); i++ )
  for( int j = _theBoard.GetLowerBound( 1 ); j <= _theBoard.GetUpperBound( 1 ); j++ )
    _theBoard[ i, j ].PaintSquare( );
 

This looks inefficient, considering all those calls to GetLowerBound and GetUpperBound inside the loop statement, but it's actually the most efficient construct. The JIT compiler knows enough about the array class to cache the boundaries and to recognize that internal bounds checking can be omitted (see Item 11).
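
As a quick check of the equivalence described above, the following sketch visits a small rectangular array both ways and compares the order of the elements. The names (ForeachOrderDemo, ViaForeach, ViaBounds) are illustrative only. It compares visiting order, not the generated code.

```csharp
using System;
using System.Text;

public static class ForeachOrderDemo
{
    // Visits every element with a single foreach statement.
    public static string ViaForeach(int[,] grid)
    {
        StringBuilder sb = new StringBuilder();
        foreach (int value in grid)
            sb.Append(value).Append(' ');
        return sb.ToString();
    }

    // Visits every element with explicit bounds on each dimension.
    public static string ViaBounds(int[,] grid)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = grid.GetLowerBound(0); i <= grid.GetUpperBound(0); i++)
            for (int j = grid.GetLowerBound(1); j <= grid.GetUpperBound(1); j++)
                sb.Append(grid[i, j]).Append(' ');
        return sb.ToString();
    }

    public static void Main()
    {
        int[,] grid = { { 1, 2 }, { 3, 4 } };
        Console.WriteLine(ViaForeach(grid) == ViaBounds(grid)); // True
    }
}
```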


Two major disadvantages to the array class will make you examine the other collection classes in the .NET Framework. The first affects resizing the arrays: Arrays cannot be dynamically resized. If you need to modify the size of any dimension of an array, you must create a new array and copy all the existing elements to it. Resizing takes time: A new array must be allocated, and all the elements must be copied from the existing array into the new array. Although this copying and moving is not as expensive on the managed heap as it was in the C and C++ days, it still costs time. More important, it can result in stale data being used. Consider this code fragment:


 
 private string [] _cities = new string[ 100 ];
 
public void SetDataSources( )
{
  myListBox.DataSource = _cities;
}
 
public void AddCity( string CityName )
{
  String[] tmp = new string[ _cities.Length + 1 ];
  _cities.CopyTo( tmp, 0 );
  tmp[ _cities.Length ] = CityName;
 
  _cities = tmp; // swap the storage.
}
 

Even after AddCity is called, the list box uses the old copy of the _cities array for its data source. Your new city never shows up in the list box.
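
The stale-reference effect can be reproduced without a list box. In this sketch a plain local variable stands in for myListBox.DataSource; CityStore and StaleArrayDemo are hypothetical names introduced for the illustration.

```csharp
using System;

public class CityStore
{
    private string[] _cities = new string[] { "Ann Arbor" };

    public string[] Cities { get { return _cities; } }

    public void AddCity(string cityName)
    {
        string[] tmp = new string[_cities.Length + 1];
        _cities.CopyTo(tmp, 0);
        tmp[_cities.Length] = cityName;
        _cities = tmp; // swap the storage; old references are not updated
    }
}

public static class StaleArrayDemo
{
    // Returns { length seen by the consumer, length seen by the owner }.
    public static int[] Run()
    {
        CityStore store = new CityStore();
        string[] dataSource = store.Cities; // stands in for myListBox.DataSource
        store.AddCity("Detroit");
        return new int[] { dataSource.Length, store.Cities.Length };
    }

    public static void Main()
    {
        int[] lengths = Run();
        Console.WriteLine(lengths[0]); // 1
        Console.WriteLine(lengths[1]); // 2
    }
}
```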


The ArrayList class is a higher-level abstraction built on an array. The ArrayList collection mixes the semantics of a single-dimension array with the semantics of a linked list. You can perform inserts on an ArrayList, and you can resize an ArrayList. The ArrayList delegates almost all of its responsibilities to the contained array, which means that the ArrayList class has very similar performance characteristics to the Array class. The main advantage of ArrayList over Array is that ArrayList is easier to use when you don't know exactly how large your collection will be. ArrayList can grow and shrink over time. You still pay the performance penalty of moving and copying items, but the code for those algorithms has already been written and tested. Because the internal storage for the array is encapsulated in the ArrayList object, the problem of stale data does not exist: Clients point to the ArrayList object instead of the internal array. The ArrayList collection is the .NET Framework's version of the C++ Standard Library vector class.
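
A minimal sketch of why the stale-data problem goes away: the consumer holds a reference to the ArrayList object itself, so growth of the internal array is invisible to it. The names are hypothetical, and a local variable again stands in for a data-bound control. Whether a real control repaints after the change is a separate question that this sketch does not address.

```csharp
using System;
using System.Collections;

public static class ArrayListSharingDemo
{
    // Returns the element count the consumer sees after the owner adds items.
    public static int CountSeenByConsumer()
    {
        ArrayList cities = new ArrayList();
        cities.Add("Ann Arbor");

        IList dataSource = cities;   // the consumer keeps this reference

        cities.Add("Detroit");       // may reallocate the internal array
        cities.Insert(0, "Chicago"); // inserts shift the later elements

        return dataSource.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountSeenByConsumer()); // 3
    }
}
```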


The Queue and the Stack classes provide specialized interfaces on top of the System.Array class. The specific interfaces for those classes build custom interfaces for the first-in, first-out queue and the last-in, first-out stack. Always remember that these collections are built using a single-dimension array as their internal storage. The same performance penalty applies when you resize them.
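
The two orderings can be seen side by side in a short sketch using the nongeneric System.Collections.Queue and System.Collections.Stack classes. QueueStackDemo and its methods are illustrative names.

```csharp
using System;
using System.Collections;

public static class QueueStackDemo
{
    // First in, first out.
    public static string DrainQueue()
    {
        Queue queue = new Queue();
        queue.Enqueue("a");
        queue.Enqueue("b");
        queue.Enqueue("c");
        string result = "";
        while (queue.Count > 0)
            result += (string)queue.Dequeue();
        return result;
    }

    // Last in, first out.
    public static string DrainStack()
    {
        Stack stack = new Stack();
        stack.Push("a");
        stack.Push("b");
        stack.Push("c");
        string result = "";
        while (stack.Count > 0)
            result += (string)stack.Pop();
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(DrainQueue()); // abc
        Console.WriteLine(DrainStack()); // cba
    }
}
```

The casts to string are needed because these 1.x collections store Object references.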


The .NET collections don't contain a linked list structure. The efficiency of the garbage collector minimizes the times when a list structure is really the best choice. If you really need linked list behavior, you have two options. If you are using a list because you expect to add and remove items often, you can use the dictionary classes with null values. Simply store the keys. You can use the ListDictionary class, which implements a singly linked list of key/value pairs. Or, you can use the HybridDictionary class, which uses the ListDictionary for small collections and switches to a Hashtable for larger collections. These collections and a host of others are in the System.Collections.Specialized namespace. However, if you want to use a list structure because you need a user-controllable order, you can use the ArrayList collection. The ArrayList can perform inserts at any location, even though it uses an array as its internal storage.
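
Both options can be sketched briefly: a ListDictionary used as a key-only list with null values, and an ArrayList used for user-controlled ordering. ListLikeDemo and its method names are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Specialized;

public static class ListLikeDemo
{
    // Uses ListDictionary as a key-only list: the values are all null.
    public static int KeysAfterEdits()
    {
        ListDictionary items = new ListDictionary();
        items.Add("alpha", null);
        items.Add("beta", null);
        items.Add("gamma", null);
        items.Remove("beta");
        return items.Count;
    }

    // Uses ArrayList.Insert to place an item at a chosen position.
    public static string UserOrdered()
    {
        ArrayList order = new ArrayList();
        order.Add("first");
        order.Add("third");
        order.Insert(1, "second");
        string[] snapshot = (string[])order.ToArray(typeof(string));
        return string.Join(",", snapshot);
    }

    public static void Main()
    {
        Console.WriteLine(KeysAfterEdits()); // 2
        Console.WriteLine(UserOrdered());    // first,second,third
    }
}
```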


Two other classes support dictionary-based collections: SortedList and Hashtable. Both contain key/value pairs. SortedList orders the keys, whereas Hashtable does not. Hashtable performs searches for a given key faster, but SortedList provides an ordered iteration of the elements by key. Hashtable finds keys using the hash value of the key object. The search is a constant-time operation, O(1), provided the key's hash function is efficient and distributes keys well. The sorted list uses a binary search algorithm to find the keys. This is a logarithmic operation: O(log n).
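
A short sketch of the trade-off: SortedList iterates in key order regardless of insertion order, while Hashtable is used for direct lookup by key. The sample keys and the names DictionaryChoiceDemo, SortedKeys, and Lookup are made up for the example.

```csharp
using System;
using System.Collections;

public static class DictionaryChoiceDemo
{
    // Iterating a SortedList yields entries ordered by key.
    public static string SortedKeys()
    {
        SortedList scores = new SortedList();
        scores.Add("pear", 3);
        scores.Add("apple", 1);
        scores.Add("mango", 2);
        string result = "";
        foreach (DictionaryEntry entry in scores)
            result += (string)entry.Key + " ";
        return result.Trim();
    }

    // A Hashtable finds the value through the key's hash code.
    public static int Lookup(string key)
    {
        Hashtable scores = new Hashtable();
        scores.Add("pear", 3);
        scores.Add("apple", 1);
        scores.Add("mango", 2);
        return (int)scores[key];
    }

    public static void Main()
    {
        Console.WriteLine(SortedKeys());    // apple mango pear
        Console.WriteLine(Lookup("mango")); // 2
    }
}
```

Iterating the Hashtable instead would give the entries in an unspecified order, so code should not depend on it.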


Finally, there is the BitArray class. As its name suggests, this holds bit values. The storage for the BitArray is an array of ints. Each int stores 32 binary values. This makes the BitArray class compact, but it can also decrease performance. Each get or set operation in the BitArray performs bit manipulations on the int value that stores the sought value and 31 other bits. BitArray contains methods that apply Boolean operations to many values at once: OR, XOR, AND, and NOT. These methods take a BitArray as a parameter and can be used to quickly mask multiple bits in the BitArray. The BitArray is the optimized container for bit operations; use it when you are storing a collection of bitflags that are often manipulated using masks. Do not use it as a substitute for a general-purpose array of Boolean values.
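
Masking with a BitArray might look like the following sketch. The flag values and the names BitArrayDemo and MaskFlags are arbitrary choices for illustration. Note that And modifies the BitArray it is called on.

```csharp
using System;
using System.Collections;

public static class BitArrayDemo
{
    // Applies a mask to a set of flags and returns the result as bool[].
    public static bool[] MaskFlags()
    {
        BitArray flags = new BitArray(new bool[] { true, true, false, true });
        BitArray mask = new BitArray(new bool[] { true, false, false, true });

        flags.And(mask); // bitwise AND, stored back into flags

        bool[] result = new bool[flags.Length];
        flags.CopyTo(result, 0);
        return result;
    }

    public static void Main()
    {
        foreach (bool bit in MaskFlags())
            Console.Write(bit ? "1" : "0"); // 1001
        Console.WriteLine();
    }
}
```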


With the exception of the Array class, none of the collection classes in the 1.x releases of the .NET Framework is strongly typed. They all store references to Object. C# generics will bring new versions of all these topologies that are built on generics. That will be the best way to create type-safe collections. In the meantime, the current System.Collections namespace contains abstract base classes that you can use to build your own type-safe interfaces on top of the type-unsafe collections: CollectionBase and ReadOnlyCollectionBase provide base classes for a list or vector structure. DictionaryBase provides a base class for key/value pairs. The DictionaryBase class is built using a Hashtable implementation; its performance characteristics are consistent with the Hashtable.


Anytime your classes contain collections, you'll likely want to expose that collection to the users of your class. You do this in two ways: with indexers and the IEnumerable interface. Remember that, early in this item, I showed you that you can directly access items in an array using [] notation, and you can iterate the items in the array using foreach.


You can create indexers for your classes. These are analogous to the overloaded operator [] that you could write in C++. As with arrays in C#, indexers can take more than one dimension:


 
 public int this [ int x, int y ]
{
  get
  {
    return ComputeValue( x, y );
  }
}
 

Adding indexer support usually means that your type contains a collection. That means you should support the IEnumerable interface. IEnumerable provides a standard mechanism for iterating all the elements in your collection:


 
 public interface IEnumerable
{
  IEnumerator GetEnumerator( );
}
 

The GetEnumerator method returns an object that implements the IEnumerator interface. The IEnumerator interface supports traversing a collection:


 
 public interface IEnumerator
{
  object Current
  { get; }
 
  bool MoveNext( );
 
  void Reset( );
} 
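
Putting the two pieces together, a class that wraps a two-dimensional array could expose both an indexer and IEnumerable along these lines. Grid and GridDemo are hypothetical types, and delegating to the array's own enumerator is just one simple way to implement GetEnumerator.

```csharp
using System;
using System.Collections;

public class Grid : IEnumerable
{
    private readonly int[,] _cells;

    public Grid(int rows, int cols)
    {
        _cells = new int[rows, cols];
    }

    // Two-dimensional indexer: callers write grid[x, y].
    public int this[int x, int y]
    {
        get { return _cells[x, y]; }
        set { _cells[x, y] = value; }
    }

    // Delegates iteration to the underlying array.
    public IEnumerator GetEnumerator()
    {
        return _cells.GetEnumerator();
    }
}

public static class GridDemo
{
    public static int Sum()
    {
        Grid grid = new Grid(2, 2);
        grid[0, 1] = 5;
        grid[1, 0] = 7;

        int total = 0;
        foreach (int value in grid) // works because Grid is IEnumerable
            total += value;
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum()); // 12
    }
}
```

Because the nongeneric IEnumerator exposes Current as object, each int is boxed during this iteration; that is the cost of the 1.x interfaces noted earlier.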
 

In addition to the IEnumerable interface, you should consider the IList or ICollection interfaces if your type models an array. If your type models a dictionary, you should consider implementing the IDictionary interface. You could create the implementations for these large interfaces yourself, and I could spend several more pages explaining how. But there is an easier solution: Derive your class from CollectionBase or DictionaryBase when you create your own specialized collections.
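
A minimal sketch of the CollectionBase approach, assuming a collection of city names. CityCollection is a hypothetical class; the protected List property and the OnValidate hook come from CollectionBase.

```csharp
using System;
using System.Collections;

public class CityCollection : CollectionBase
{
    // Type-safe indexer layered over the untyped inner list.
    public string this[int index]
    {
        get { return (string)List[index]; }
        set { List[index] = value; }
    }

    // Type-safe Add; returns the index of the new item.
    public int Add(string city)
    {
        return List.Add(city);
    }

    // Rejects anything that is not a string, even when callers
    // go through the untyped IList interface.
    protected override void OnValidate(object value)
    {
        if (!(value is string))
            throw new ArgumentException("Only strings may be stored.", "value");
    }
}

public static class CityCollectionDemo
{
    public static void Main()
    {
        CityCollection cities = new CityCollection();
        cities.Add("Lansing");
        cities.Add("Flint");
        foreach (string city in cities) // IEnumerable comes from CollectionBase
            Console.WriteLine(city);
        Console.WriteLine(cities.Count); // 2
    }
}
```

CollectionBase already implements IList, ICollection, and IEnumerable, so the derived class only adds the strongly typed members.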


Let's review what we've covered. The best collection depends on the operations you perform and the overall goals of space and speed for your application. In most situations, the Array provides the most efficient container. The addition of multidimensional arrays in C# means that it is easier to model multidimensional structures clearly without sacrificing performance. When your program needs more flexibility in adding and removing items, use one of the more robust collections. Finally, implement indexers and the IEnumerable interface whenever you create a class that models a collection.
