Essential .NET - Understanding C# foreach Internals and Custom Iterators with yield

Latest recommended article published 2023-01-12 17:12:25

dz45693

This article is reposted from https://msdn.microsoft.com/zh-cn/magazine/mt797654 and https://msdn.microsoft.com/zh-cn/magazine/mt809121

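It is easy to code a foreach statement, yet I am surprised how few developers understand how it works internally. For example, are you aware that foreach works differently for arrays than for IEnumerable<T> collections? How familiar are you with the relationship between IEnumerable<T> and IEnumerator<T>? And if you do understand the enumerable interfaces, are you comfortable implementing them using yield?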

Key Elements of Collection Classes

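By definition, a Microsoft .NET Framework collection is a class that, at a minimum, implements IEnumerable<T> (or the nongeneric type IEnumerable). This interface is critical because implementing the methods of IEnumerable<T> is the minimum needed to support iterating over a collection.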

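The foreach statement syntax is simple and spares the developer from having to know how many elements there are. The runtime does not directly support the foreach statement, however. Instead, the C# compiler transforms the code as described in the next sections.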

foreach with Arrays: The following demonstrates a simple foreach loop that iterates over an array of integers and prints each integer to the console:

int[] array = new int[]{1, 2, 3, 4, 5, 6};
foreach (int item in array)
{
  Console.WriteLine(item);
}

From this code, the C# compiler generates CIL equivalent to the following for loop:

int[] tempArray;
int[] array = new int[]{1, 2, 3, 4, 5, 6};
tempArray = array;
for (int counter = 0; (counter < tempArray.Length); counter++)
{
  int item = tempArray[counter];
  Console.WriteLine(item);
}
 
    

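In this example, note that foreach relies on support for the Length property and the index operator ([]). With the Length property, the C# compiler can use the for statement to iterate through each element in the array.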

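foreach with IEnumerable<T> Collections: Although the preceding code works well on arrays, where the length is fixed and the index operator is always supported, not all types of collections have a known number of elements. Furthermore, many of the collection classes, including Stack<T>, Queue<T> and Dictionary<TKey, TValue>, don't support retrieving elements by index. Therefore, a more general approach to iterating over collections of elements is needed. The iterator pattern provides this capability. Assuming you can determine the first, next and last elements, knowing the count and supporting retrieval of elements by index is unnecessary.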

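The System.Collections.Generic.IEnumerator<T> and nongeneric System.Collections.IEnumerator interfaces are designed to enable the iterator pattern for iterating over collections of elements, rather than the length-index pattern shown previously. A class diagram of their relationships appears in Figure 1.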

Figure 1: A Class Diagram of the IEnumerator<T> and IEnumerator Interfaces

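IEnumerator, which IEnumerator<T> derives from, includes three members. The first is bool MoveNext. Using this method, you can move from one element within the collection to the next, while at the same time detecting when you have enumerated through every item. The second member, a read-only property called Current, returns the element currently in process. Current is overloaded in IEnumerator<T>, providing a type-specific implementation of it. With these two members on the collection class, it's possible to iterate over the collection simply using a while loop:

System.Collections.Generic.Stack<int> stack =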
  new System.Collections.Generic.Stack<int>();
int number;
// ...
// This code is conceptual, not the actual code.
while (stack.MoveNext())
{
  number = stack.Current;
  Console.WriteLine(number);
}
 
    

In this code, the MoveNext method returns false when it moves past the end of the collection. This replaces the need to count elements while looping.

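(The third member, the Reset method, usually throws an exception such as NotImplementedException, and enumerators generated by C# iterators throw NotSupportedException, so it should never be called. If you need to restart an enumeration, just create a fresh enumerator.)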
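
To make the point about Reset concrete, here is a minimal sketch. The names ResetDemo, Numbers and ResetThrows are made up for this illustration and are not part of the original listings. It assumes an enumerator produced by a C# iterator, which throws NotSupportedException from Reset; restarting simply means asking for a fresh enumerator.

```csharp
using System;
using System.Collections.Generic;

public static class ResetDemo
{
    // Hypothetical iterator used only for this illustration.
    public static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
    }

    // Returns true if Reset on a compiler-generated enumerator throws.
    public static bool ResetThrows()
    {
        using (IEnumerator<int> enumerator = Numbers().GetEnumerator())
        {
            enumerator.MoveNext();
            try
            {
                enumerator.Reset();
                return false;
            }
            catch (NotSupportedException)
            {
                return true;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(ResetThrows());
        // To start over, request a new enumerator instead of resetting.
        foreach (int n in Numbers())
        {
            Console.WriteLine(n);
        }
    }
}
```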

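The preceding example showed the gist of the C# compiler output, but it doesn't actually compile that way because it omits two important implementation details: interleaving and error handling.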

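State Is Shared: The problem with an implementation such as the one in the preceding example is that if two such loops interleave each other (one foreach inside another, both using the same collection), the collection must maintain a state indicator of the current element so that when MoveNext is called, the next element can be determined. In such a case, one interleaving loop can affect the other. (The same is true of loops executed by multiple threads.)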

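To overcome this problem, the collection classes don't support the IEnumerator<T> and IEnumerator interfaces directly. Rather, they directly support a second interface, IEnumerable<T>, whose only method is GetEnumerator. The purpose of this method is to return an object that supports IEnumerator<T>. Instead of the collection class maintaining the state, a different class, usually a nested class so that it has access to the internals of the collection, supports the IEnumerator<T> interface and keeps the state of the iteration loop. The enumerator is like a "cursor" or a "bookmark" in the sequence. You can have multiple bookmarks, and moving any one of them enumerates over the collection independently of the others. Using this pattern, the C# equivalent of a foreach loop looks like the code shown in Figure 2.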
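
The "bookmark" idea can be sketched in a few lines. The class and method names below (BookmarkDemo, ReadTwoBookmarks) are hypothetical, and List<int> is used only because it is a convenient collection to enumerate. Two enumerators over the same list are advanced by different amounts, and each keeps its own position.

```csharp
using System;
using System.Collections.Generic;

public static class BookmarkDemo
{
    // Advances two enumerators over the same list at different rates
    // and reports where each one currently points.
    public static (int First, int Second) ReadTwoBookmarks()
    {
        IEnumerable<int> list = new List<int> { 10, 20, 30 };
        using (IEnumerator<int> a = list.GetEnumerator())
        using (IEnumerator<int> b = list.GetEnumerator())
        {
            a.MoveNext();   // a is at 10
            a.MoveNext();   // a is at 20
            b.MoveNext();   // b is at 10, unaffected by a
            return (a.Current, b.Current);
        }
    }

    public static void Main()
    {
        var positions = ReadTwoBookmarks();
        Console.WriteLine($"{positions.First} {positions.Second}");
    }
}
```

Because the state lives in the enumerator objects rather than in the list, nested foreach loops over one collection do not disturb each other.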

Figure 2: A Separate Enumerator Maintaining State During an Iteration

System.Collections.Generic.Stack<int> stack =
  new System.Collections.Generic.Stack<int>();
int number;
System.Collections.Generic.Stack<int>.Enumerator
  enumerator;
// ...
// If IEnumerable<T> is implemented explicitly,
// then a cast is required.
// ((IEnumerable<int>)stack).GetEnumerator();
enumerator = stack.GetEnumerator();
while (enumerator.MoveNext())
{
  number = enumerator.Current;
  Console.WriteLine(number);
}
 
    

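Cleaning up Following Iteration: Given that the classes that implement the IEnumerator<T> interface maintain the state, sometimes you need to clean up the state after exiting the loop (because either all iterations have completed or an exception is thrown). To achieve this, the IEnumerator<T> interface derives from IDisposable. Enumerators that implement IEnumerator don't necessarily implement IDisposable, but if they do, Dispose is called as well. This enables the calling of Dispose after the foreach loop exits. The C# equivalent of the final CIL code, therefore, looks like Figure 3.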

Figure 3: Compiled Result of foreach on Collections

System.Collections.Generic.Stack<int> stack =
  new System.Collections.Generic.Stack<int>();
System.Collections.Generic.Stack<int>.Enumerator
  enumerator;
IDisposable disposable;
enumerator = stack.GetEnumerator();
try
{
  int number;
  while (enumerator.MoveNext())
  {
    number = enumerator.Current;
    Console.WriteLine(number);
  }
}
finally
{
  // Explicit cast used for IEnumerator<T>.
  disposable = (IDisposable) enumerator;
  disposable.Dispose();
  // IEnumerator will use the as operator unless IDisposable
  // support is known at compile time.
  // disposable = (enumerator as IDisposable);
  // if (disposable != null)
  // {
  //   disposable.Dispose();
  // }
}
 
    

Notice that because the IDisposable interface is supported by IEnumerator<T>, the using statement can simplify the code in Figure 3 to what is shown in Figure 4.

Figure 4: Error Handling and Resource Cleanup with using

System.Collections.Generic.Stack<int> stack =
  new System.Collections.Generic.Stack<int>();
int number;
using(
  System.Collections.Generic.Stack<int>.Enumerator
    enumerator = stack.GetEnumerator())
{
  while (enumerator.MoveNext())
  {
    number = enumerator.Current;
    Console.WriteLine(number);
  }
}
 
    

However, recall that CIL doesn't directly support the using keyword. Thus, the code in Figure 3 is actually the more accurate C# representation of the foreach CIL code.
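
One way to observe the cleanup step is to write an enumerator by hand and record when Dispose runs. The following is only a sketch under that assumption; CountingEnumerator, Counter and DisposeDemo are illustrative names that do not appear in the original article. The loop is left early with break, and the enumerator still gets disposed.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical enumerator that counts 1..limit and records Dispose.
public sealed class CountingEnumerator : IEnumerator<int>
{
    private readonly int _limit;
    private int _current;
    public bool Disposed { get; private set; }

    public CountingEnumerator(int limit) { _limit = limit; }

    public int Current => _current;
    object IEnumerator.Current => _current;

    public bool MoveNext()
    {
        if (_current >= _limit) return false;
        _current++;
        return true;
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() => Disposed = true;
}

public sealed class Counter : IEnumerable<int>
{
    // Remembers the most recently created enumerator for inspection.
    public CountingEnumerator Last { get; private set; }

    public IEnumerator<int> GetEnumerator()
    {
        Last = new CountingEnumerator(3);
        return Last;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class DisposeDemo
{
    public static bool DisposedAfterBreak()
    {
        var counter = new Counter();
        foreach (int n in counter)
        {
            if (n == 2) break;   // leave the loop before it finishes
        }
        return counter.Last.Disposed;
    }

    public static void Main()
    {
        Console.WriteLine(DisposedAfterBreak());
    }
}
```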

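foreach Without IEnumerable: C# doesn't require that IEnumerable/IEnumerable<T> be implemented to iterate over a data type using foreach. Rather, the compiler uses a concept known as duck typing; it looks for a GetEnumerator method that returns a type with a Current property and a MoveNext method. Duck typing involves searching by name rather than relying on an interface or explicit method call. (The name "duck typing" comes from the whimsical idea that to be treated as a duck, an object merely needs to implement a Quack method; it need not implement an IDuck interface.) If duck typing fails to find a suitable implementation of the enumerable pattern, the compiler checks whether the collection implements the interfaces.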
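
As a rough illustration of the pattern-based lookup, the types below implement no interface at all, yet foreach accepts them because the member names and shapes match. Countdown, CountdownEnumerator and DuckTypingDemo are hypothetical names invented for this sketch.

```csharp
using System;

// Hypothetical type: it does not implement IEnumerable, it only
// exposes a public GetEnumerator method.
public class Countdown
{
    private readonly int _start;
    public Countdown(int start) { _start = start; }

    public CountdownEnumerator GetEnumerator()
    {
        return new CountdownEnumerator(_start);
    }
}

// Hypothetical enumerator: no IEnumerator, just Current and MoveNext.
public class CountdownEnumerator
{
    private int _value;
    public CountdownEnumerator(int start) { _value = start + 1; }

    public int Current => _value;

    public bool MoveNext()
    {
        _value--;
        return _value > 0;
    }
}

public static class DuckTypingDemo
{
    // Sums start, start - 1, ..., 1 using foreach over the pattern type.
    public static int Sum(int start)
    {
        int total = 0;
        foreach (int n in new Countdown(start))
        {
            total += n;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(3));   // 3 + 2 + 1
    }
}
```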

Introduction to Iterators

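Now that you understand the internals of the foreach implementation, it's time to discuss how iterators are used to create custom implementations of the IEnumerator<T>, IEnumerable<T> and corresponding nongeneric interfaces for custom collections. Iterators provide clean syntax for specifying how to iterate over data in collection classes, especially using the foreach loop. The iterator allows end users of a collection to navigate its internal structure without knowledge of that structure.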

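The problem with the enumeration pattern is that it can be cumbersome to implement manually because it must maintain all the state necessary to describe the current position in the collection. This internal state might be simple for a list collection type class; the index of the current position suffices. In contrast, for data structures that require recursive traversal, such as binary trees, the state can be quite complicated. To mitigate the challenges associated with implementing this pattern, C# 2.0 added the yield contextual keyword to make it easier for a class to dictate how the foreach loop iterates over its contents.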

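Defining an Iterator: Iterators are a means to implement methods of a class, and they're syntactic shortcuts for the more complex enumerator pattern. When the C# compiler encounters an iterator, it expands its contents into CIL code that implements the enumerator pattern. As such, there are no runtime dependencies for implementing iterators. Because the C# compiler handles implementation through CIL code generation, there's no real runtime performance benefit to using iterators. However, there is a substantial programmer productivity gain in choosing iterators over manual implementation of the enumerator pattern. To understand this improvement, I'll first consider how an iterator is defined in code.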

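Iterator Syntax: An iterator provides a shorthand implementation of the iterator interfaces, the combination of the IEnumerable<T> and IEnumerator<T> interfaces. Figure 5 declares an iterator for the generic BinaryTree<T> type by creating a GetEnumerator method (albeit with no implementation yet).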

Figure 5: Iterator Interfaces Pattern

using System;
using System.Collections.Generic;
public class BinaryTree<T>:
  IEnumerable<T>
{
  public BinaryTree ( T value)
  {
    Value = value;
  }
  #region IEnumerable<T>
  public IEnumerator<T> GetEnumerator()
  {
    // ...
  }
  #endregion IEnumerable<T>
  public T Value { get; }  // C# 6.0 Getter-only Autoproperty
  public Pair<BinaryTree<T>> SubItems { get; set; }
}
public struct Pair<T>: IEnumerable<T>
{
  public Pair(T first, T second) : this()
  {
    First = first;
    Second = second;
  }
  public T First { get; }
  public T Second { get; }
  #region IEnumerable<T>
  public IEnumerator<T> GetEnumerator()
  {
    yield return First;
    yield return Second;
  }
  #endregion IEnumerable<T>
  #region IEnumerable Members
  System.Collections.IEnumerator
    System.Collections.IEnumerable.GetEnumerator()
  {
    return GetEnumerator();
  }
  #endregion
  // ...
}
 
    

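Yielding Values from an Iterator: The iterator interfaces are like functions, but instead of returning a single value, they yield a sequence of values, one at a time. In the case of BinaryTree<T>, the iterator yields a sequence of values of the type argument provided for T. If the nongeneric version of IEnumerator is used, the yielded values will instead be of type object.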

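To correctly implement the iterator pattern, you need to maintain some internal state to keep track of where you are while enumerating the collection. In the BinaryTree<T> case, you track which elements within the tree have already been enumerated and which are still to come. Iterators are transformed by the compiler into a "state machine" that keeps track of the current position and knows how to "move itself" to the next position.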

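The yield return statement yields a value each time an iterator encounters it; control immediately returns to the caller that requested the item. When the caller requests the next item, the code begins to execute immediately following the previously executed yield return statement. In Figure 6, the C# built-in data type keywords are returned sequentially.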

Figure 6: Yielding Some C# Keywords Sequentially

using System;
using System.Collections.Generic;
public class CSharpBuiltInTypes: IEnumerable<string>
{
  public IEnumerator<string> GetEnumerator()
  {
    yield return "object";
    yield return "byte";
    yield return "uint";
    yield return "ulong";
    yield return "float";
    yield return "char";
    yield return "bool";
    yield return "ushort";
    yield return "decimal";
    yield return "int";
    yield return "sbyte";
    yield return "short";
    yield return "long";
    yield return "void";
    yield return "double";
    yield return "string";
  }
    // The IEnumerable.GetEnumerator method is also required
    // because IEnumerable<T> derives from IEnumerable.
  System.Collections.IEnumerator
    System.Collections.IEnumerable.GetEnumerator()
  {
    // Invoke IEnumerator<string> GetEnumerator() above.
    return GetEnumerator();
  }
}
public class Program
{
  static void Main()
  {
    var keywords = new CSharpBuiltInTypes();
    foreach (string keyword in keywords)
    {
      Console.WriteLine(keyword);
    }
  }
}
 
    

The results of Figure 6 appear in Figure 7, which is a listing of the C# built-in types.

Figure 7: A List of Some C# Keywords Output from the Code in Figure 6

object
byte
uint
ulong
float
char
bool
ushort
decimal
int
sbyte
short
long
void
double
string
 
    

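Clearly, more explanation is required, but I'm out of space for this month, so I'll leave you in suspense until the next column. Suffice it to say, with iterators you can magically create collections as properties, as shown in Figure 8. In this case, the example relies on C# 7.0 tuples just for the fun of it. If you want to dig further, you can check out the source code or take a look at Chapter 16 of my "Essential C#" book.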

Figure 8: Using yield return to Implement an IEnumerable<T> Property

 
    
IEnumerable<(string City, string Country)> CountryCapitals
{
  get
  {
    yield return ("Abu Dhabi","United Arab Emirates");
    yield return ("Abuja", "Nigeria");
    yield return ("Accra", "Ghana");
    yield return ("Adamstown", "Pitcairn");
    yield return ("Addis Ababa", "Ethiopia");
    yield return ("Algiers", "Algeria");
    yield return ("Amman", "Jordan");
    yield return ("Amsterdam", "Netherlands");
    // ...
  }
}
 
      
 
     

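The previous article delved into the details of how the C# foreach statement works and explained how the C# compiler implements the foreach capabilities in Common Intermediate Language (CIL). I also briefly touched on the yield keyword with an example (see Figure 1), but with little to no explanation.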

Figure 1: Yielding Some C# Keywords Sequentially

using System;
using System.Collections.Generic;
public class CSharpBuiltInTypes: IEnumerable<string>
{
  public IEnumerator<string> GetEnumerator()
  {
    yield return "object";
    yield return "byte";
    yield return "uint";
    yield return "ulong";
    yield return "float";
    yield return "char";
    yield return "bool";
    yield return "ushort";
    yield return "decimal";
    yield return "int";
    yield return "sbyte";
    yield return "short";
    yield return "long";
    yield return "void";
    yield return "double";
    yield return "string";
  }
    // The IEnumerable.GetEnumerator method is also required
    // because IEnumerable<T> derives from IEnumerable.
  System.Collections.IEnumerator
    System.Collections.IEnumerable.GetEnumerator()
  {
    // Invoke IEnumerator<string> GetEnumerator() above.
    return GetEnumerator();
  }
}
public class Program
{
  static void Main()
  {
    var keywords = new CSharpBuiltInTypes();
    foreach (string keyword in keywords)
    {
      Console.WriteLine(keyword);
    }
  }
}
 
   

This article builds on the previous one and provides more detail about the yield keyword and how to use it.

Iterators and State

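By placing a breakpoint at the start of the GetEnumerator method in Figure 1, you'll observe that GetEnumerator is called at the start of the foreach statement. At that point, an iterator object is created and its state is initialized to a special "start" state that represents the fact that no code has executed in the iterator and, therefore, no values have been yielded yet. From then on, the iterator maintains its state (location) as long as the foreach statement at the call site continues to execute. Every time the loop requests the next value, control enters the iterator and continues where it left off the previous time around the loop; the state information stored in the iterator object is used to determine where control must resume. When the foreach statement at the call site terminates, the iterator's state is no longer saved. Figure 2 shows a high-level sequence diagram of what takes place. Remember that the MoveNext method appears on the IEnumerator<T> interface.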

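In Figure 2, the foreach statement at the call site initiates a call to GetEnumerator on the CSharpBuiltInTypes instance called keywords. As you can see, it's always safe to call GetEnumerator again; "fresh" enumerator objects will be created when necessary. Given the iterator instance (referenced by iterator), foreach begins each iteration with a call to MoveNext. Within the iterator, you yield a value back to the foreach statement at the call site. After the yield return statement, the GetEnumerator method seemingly pauses until the next MoveNext request. Back at the loop body, the foreach statement displays the yielded value on the screen. It then loops back around and calls MoveNext on the iterator again. Notice that the second time, control picks up at the second yield return statement. Once again, the foreach displays on the screen what CSharpBuiltInTypes yielded and starts the loop again. This process continues until there are no more yield return statements within the iterator. At that point, the foreach loop at the call site terminates because MoveNext returns false.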

Figure 2: Sequence Diagram with yield return
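
The hand-off of control between the loop and the iterator can also be traced in code. This is only a sketch: SequenceDemo, Keywords and the Log list are hypothetical names, and a two-keyword iterator stands in for CSharpBuiltInTypes. Each side appends a message to a shared log so the interleaving becomes visible.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceDemo
{
    public static readonly List<string> Log = new List<string>();

    // Hypothetical two-item iterator that logs as it runs.
    public static IEnumerable<string> Keywords()
    {
        Log.Add("iterator: before first yield");
        yield return "object";
        Log.Add("iterator: before second yield");
        yield return "byte";
        Log.Add("iterator: end");
    }

    public static List<string> Run()
    {
        Log.Clear();
        // No iterator code runs here; only the iterator object is created.
        IEnumerable<string> keywords = Keywords();
        Log.Add("caller: iterator created");
        foreach (string keyword in keywords)
        {
            Log.Add("caller: got " + keyword);
        }
        return new List<string>(Log);
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

The log starts with the caller's message, which shows that the iterator body does not begin executing until the first MoveNext, and the iterator and caller entries then alternate until the iterator runs off its end.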

Another Iterator Example

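Consider a similar example with BinaryTree<T>, which I introduced in the previous article. To implement BinaryTree<T>, I first need Pair<T> to support the IEnumerable<T> interface using an iterator. Figure 3 is an example that yields each element in Pair<T>.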

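In Figure 3, the iteration over the Pair<T> data type loops twice: first through yield return First, and then through yield return Second. Each time the yield return statement within GetEnumerator is encountered, the state is saved and execution appears to "jump" out of the GetEnumerator method context and into the loop body. When the second iteration starts, GetEnumerator begins to execute again with the yield return Second statement.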

Figure 3: Using yield to Implement BinaryTree<T>

public struct Pair<T>: IPair<T>,
  IEnumerable<T>
{
  public Pair(T first, T second) : this()
  {
    First = first;
    Second = second;
  }
  public T First { get; }  // C# 6.0 Getter-only Autoproperty
  public T Second { get; } // C# 6.0 Getter-only Autoproperty
  #region IEnumerable<T>
  public IEnumerator<T> GetEnumerator()
  {
    yield return First;
    yield return Second;
  }
#endregion IEnumerable<T>
  #region IEnumerable Members
  System.Collections.IEnumerator
    System.Collections.IEnumerable.GetEnumerator()
  {
    return GetEnumerator();
  }
  #endregion
}
 
    

Implementing IEnumerable with IEnumerable<T>

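System.Collections.Generic.IEnumerable<T> inherits from System.Collections.IEnumerable. Therefore, when implementing IEnumerable<T>, it's also necessary to implement IEnumerable. In Figure 3, you do so explicitly, and the implementation simply involves a call to the IEnumerable<T> GetEnumerator implementation. This call from IEnumerable.GetEnumerator to IEnumerable<T>.GetEnumerator will always work because of the type compatibility (via inheritance) between IEnumerable<T> and IEnumerable. Because the signatures for both GetEnumerators are identical (the return type doesn't distinguish a signature), one or both implementations must be explicit. Given the additional type safety offered by the IEnumerable<T> version, the IEnumerable implementation should be explicit.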

The following code uses the Pair<T>.GetEnumerator method and displays "Inigo" and "Montoya" on two consecutive lines:

var fullname = new Pair<string>("Inigo", "Montoya");
foreach (string name in fullname)
{
  Console.WriteLine(name);
}

Placing a yield return Within a Loop

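It's not necessary to hardcode each yield return statement, as I did in both CSharpBuiltInTypes and Pair<T>. Using the yield return statement, you can return values from inside a loop construct. Figure 4 uses a foreach loop. Each time the foreach within GetEnumerator executes, it returns the next value.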

Figure 4: Placing yield return Statements Within a Loop

public class BinaryTree<T>: IEnumerable<T>
{
  // ...
  #region IEnumerable<T>
  public IEnumerator<T> GetEnumerator()
  {
    // Return the item at this node.
    yield return Value;
    // Iterate through each of the elements in the pair.
    foreach (BinaryTree<T> tree in SubItems)
    {
      if (tree != null)
      {
        // Because each element in the pair is a tree,
        // traverse the tree and yield each element.
        foreach (T item in tree)
        {
          yield return item;
        }
      }
    }
  }
  #endregion IEnumerable<T>
  #region IEnumerable Members
  System.Collections.IEnumerator
    System.Collections.IEnumerable.GetEnumerator()
  {
    return GetEnumerator();
  }
  #endregion
}
 
    

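In Figure 4, the first iteration returns the root element within the binary tree. During the second iteration, you traverse the pair of subelements. If the subelement pair contains a non-null value, you traverse into that child node and yield its elements. Note that foreach (T item in tree) is a recursive call to a child node.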

As observed with CSharpBuiltInTypes and Pair<T>, you can now iterate over BinaryTree<T> using a foreach loop. Figure 5 demonstrates this process.

Figure 5: Using foreach with BinaryTree<string>

// JFK
var jfkFamilyTree = new BinaryTree<string>(
  "John Fitzgerald Kennedy");
jfkFamilyTree.SubItems = new Pair<BinaryTree<string>>(
  new BinaryTree<string>("Joseph Patrick Kennedy"),
  new BinaryTree<string>("Rose Elizabeth Fitzgerald"));
// Grandparents (Father's side)
jfkFamilyTree.SubItems.First.SubItems =
  new Pair<BinaryTree<string>>(
  new BinaryTree<string>("Patrick Joseph Kennedy"),
  new BinaryTree<string>("Mary Augusta Hickey"));
// Grandparents (Mother's side)
jfkFamilyTree.SubItems.Second.SubItems =
  new Pair<BinaryTree<string>>(
  new BinaryTree<string>("John Francis Fitzgerald"),
  new BinaryTree<string>("Mary Josephine Hannon"));
foreach (string name in jfkFamilyTree)
{
  Console.WriteLine(name);
}
 
    

The results are as follows:

John Fitzgerald Kennedy
Joseph Patrick Kennedy
Patrick Joseph Kennedy
Mary Augusta Hickey
Rose Elizabeth Fitzgerald
John Francis Fitzgerald
Mary Josephine Hannon

The Origin of Iterators

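In 1972, Barbara Liskov and a team of scientists at MIT began researching programming methodologies, focusing on user-defined data abstractions. To prove much of their work, they created a language called CLU that had a concept called "clusters" (CLU being the first three letters of this term). Clusters were predecessors to the primary data abstraction that programmers use today: objects. During their research, the team realized that although they were able to use the CLU language to abstract some data representation away from end users of their types, they consistently found themselves having to reveal the inner structure of their data to allow others to intelligently consume it. The result of their consternation was the creation of a language construct called an iterator. (The CLU language offered many insights into what would eventually be popularized as "object-oriented programming.")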

Canceling Further Iteration: yield break

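Sometimes you might want to cancel further iteration. You can do so by including an if statement so that no further statements within the code are executed. However, you can also use yield break to cause MoveNext to return false and control to return immediately to the caller and end the loop. Here's an example of such a method:

public System.Collections.Generic.IEnumerable<T>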
  GetNotNullEnumerator()
{
  if((First == null) || (Second == null))
  {
    yield break;
  }
  yield return Second;
  yield return First;
}
 
    

This method cancels the iteration if either of the elements in the Pair<T> class is null.

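A yield break statement is similar to placing a return statement at the top of a function when it's determined there's no work to do. It's a way to exit from further iterations without surrounding all remaining code with an if block. As such, it allows multiple exits. Use it with caution, because a casual reading of the code may overlook the early exit.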
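
yield break is not limited to the top of an iterator; it can also end a sequence partway through a loop. The helper below is a hypothetical example (UntilNegative and YieldBreakDemo are names invented here) that stops producing values at the first negative number.

```csharp
using System;
using System.Collections.Generic;

public static class YieldBreakDemo
{
    // Hypothetical helper: yields items until the first negative value.
    public static IEnumerable<int> UntilNegative(IEnumerable<int> source)
    {
        foreach (int value in source)
        {
            if (value < 0)
            {
                // Ends the sequence: the caller's MoveNext returns false.
                yield break;
            }
            yield return value;
        }
    }

    public static void Main()
    {
        foreach (int value in UntilNegative(new[] { 3, 1, -1, 7 }))
        {
            Console.WriteLine(value);   // prints 3, then 1
        }
    }
}
```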

How Iterators Work

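When the C# compiler encounters an iterator, it expands the code into the appropriate CIL for the corresponding enumerator design pattern. In the generated code, the C# compiler first creates a nested private class to implement the IEnumerator<T> interface, along with its Current property and MoveNext method. The Current property returns a type corresponding to the return type of the iterator. As you saw in Figure 3, Pair<T> contains an iterator that returns a T type. The C# compiler examines the code contained within the iterator and creates the necessary code within the MoveNext method and the Current property to mimic its behavior. For the Pair<T> iterator, the C# compiler generates roughly equivalent code (see Figure 6).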

Figure 6: C# Equivalent of Compiler-Generated Code for Iterators

using System;
using System.Collections.Generic;
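public class Pair<T> : IPair<T>, IEnumerable<T>
{
  // ...
  // The iterator is expanded into the following
  // code by the compiler.
  public virtual IEnumerator<T> GetEnumerator()
  {
    __ListEnumerator result = new __ListEnumerator(0);
    result._Pair = this;
    return result;
  }
  // Explicit interface implementations take no modifiers.
  System.Collections.IEnumerator
    System.Collections.IEnumerable.GetEnumerator()
  {
    return GetEnumerator();
  }
  // Nested class: it reuses the T of the enclosing Pair<T>.
  private sealed class __ListEnumerator : IEnumerator<T>
  {
    public __ListEnumerator(int itemCount)
    {
      _ItemCount = itemCount;
    }
    // Internal so that the enclosing Pair<T> can assign it.
    internal Pair<T> _Pair;
    T _Current;
    int _ItemCount;
    public T Current
    {
      get
      {
        return _Current;
      }
    }
    // Nongeneric Current required by IEnumerator.
    object System.Collections.IEnumerator.Current
    {
      get
      {
        return _Current;
      }
    }
    public bool MoveNext()
    {
      switch (_ItemCount)
      {
        case 0:
          _Current = _Pair.First;
          _ItemCount++;
          return true;
        case 1:
          _Current = _Pair.Second;
          _ItemCount++;
          return true;
        default:
          return false;
      }
    }
    public void Reset()
    {
      throw new NotSupportedException();
    }
    public void Dispose()
    {
      // No resources to release in this sketch.
    }
  }
}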
 
    

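Because the compiler takes the yield return statement and generates classes that correspond to what you probably would have written manually, iterators in C# exhibit the same performance characteristics as classes that implement the enumerator design pattern manually. Although there's no performance improvement, the gains in programmer productivity are significant.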

Creating Multiple Iterators in a Single Class

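The previous iterator examples implemented IEnumerable<T>.GetEnumerator, which is the method foreach seeks implicitly. Sometimes you might want different iteration sequences, such as iterating in reverse, filtering the results or iterating over an object projection other than the default. You can declare additional iterators in the class by encapsulating them within properties or methods that return IEnumerable<T> or IEnumerable. If you want to iterate over the elements of Pair<T> in reverse, for example, you could provide a GetReverseEnumerator method, as shown in Figure 7.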

Figure 7: Using yield return in a Method That Returns IEnumerable<T>

public struct Pair<T>: IEnumerable<T>
{
  // ...
  public IEnumerable<T> GetReverseEnumerator()
  {
    yield return Second;
    yield return First;
  }
  // ...
}
public void Main()
{
  var game = new Pair<string>("Redskins", "Eagles");
  foreach (string name in game.GetReverseEnumerator())
  {
    Console.WriteLine(name);
  }
}
 
    

Note that you return IEnumerable<T>, not IEnumerator<T>. This is different from IEnumerable<T>.GetEnumerator, which returns IEnumerator<T>. The code in Main demonstrates how to call GetReverseEnumerator using a foreach loop.

yield Statement Requirements

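You can use the yield return statement only in members that return an IEnumerator<T> or IEnumerable<T> type, or their nongeneric equivalents. Members whose bodies include a yield return statement may not have a simple return. If the member uses the yield return statement, the C# compiler generates the necessary code to maintain the state of the iterator. In contrast, if the member uses the return statement instead of yield return, the programmer is responsible for maintaining a state machine by hand and returning an instance of one of the iterator interfaces. Further, just as all code paths in a method with a return type must contain a return statement accompanied by a value (assuming they don't throw an exception), all code paths in an iterator must contain a yield return statement if they are to return any data.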

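The following additional restrictions on the yield statement result in compiler errors if they're violated:

The yield statement may appear only inside a method, a user-defined operator, or the get accessor of an indexer or property. The member must not take any ref or out parameter.
The yield statement may not appear anywhere inside an anonymous method or lambda expression.
The yield statement may not appear inside the catch and finally clauses of the try statement. Furthermore, a yield statement may appear in a try block only if there is no catch block.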
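
The last restriction still leaves room for try/finally, which is useful because foreach disposes the enumerator when the loop exits. The sketch below uses hypothetical names (TryFinallyDemo, Guarded, FirstOnly) and shows a yield return inside a try block that has a finally but no catch; the finally clause runs even though the caller stops after the first item.

```csharp
using System;
using System.Collections.Generic;

public static class TryFinallyDemo
{
    public static readonly List<string> Log = new List<string>();

    // yield return is legal here: the try has a finally but no catch.
    public static IEnumerable<int> Guarded()
    {
        try
        {
            yield return 1;
            yield return 2;
        }
        finally
        {
            Log.Add("finally");
        }
    }

    // Takes only the first item; leaving the foreach disposes the
    // enumerator, which in turn runs the iterator's finally clause.
    public static int FirstOnly()
    {
        Log.Clear();
        foreach (int n in Guarded())
        {
            return n;
        }
        return -1;
    }

    public static void Main()
    {
        Console.WriteLine(FirstOnly());
        Console.WriteLine(Log.Count);
    }
}
```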

Wrapping Up

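Overwhelmingly, generics was the cool feature launched in C# 2.0, but it wasn't the only collection-related feature introduced at the time. Another significant addition was the iterator. As I outlined in this article, iterators involve a contextual keyword, yield, that C# uses to generate the underlying CIL code that implements the iterator pattern used by the foreach loop. Furthermore, I detailed the yield syntax, explaining how it fulfills the GetEnumerator implementation of IEnumerable<T>, allows for breaking out of a loop with yield break and even supports a C# method that returns an IEnumerable<T>.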
