Effective C#, Item 46: Minimize Interop

MaybeHelios

Item 46: Minimize Interop

One of the smartest moves Microsoft made when designing .NET was to realize that no one would adopt the platform if there wasn't a way to integrate their existing code assets into new .NET development. Microsoft knew that without a way to leverage existing code, adoption would slow down. But that doesn't make interop easy or efficient. Interop works, but that's the only good thing that can be said about it. All the interop strategies are forced to provide some marshalling when the flow of control passes between the native and the managed boundaries. Also, interop strategies force you, the developer, to declare the method parameters by hand. Finally, the CLR cannot perform optimizations across an interop boundary. Nothing would be better for a developer than to ignore all the investment in native code and COM objects. But the world doesn't always work that way. Most of us need to add new features to existing applications, enhance and update existing tools, or otherwise make new managed applications interact with old legacy applications. Using some kind of interop is often the only practical way to slowly replace legacy systems. Therefore, it's important to understand the costs associated with the different interop strategies. These costs are paid in terms of both development schedules and runtime performance. Sometimes, the best choice is to rewrite the legacy code. Other times, you need to pick the correct interop strategy.

Translator's note: marshalling is the process of transforming the memory representation of an object into a data format suitable for storage or transmission. It is typically used when data must be moved between different parts of a computer program or from one program to another.

Before I discuss the interop strategies that are available to you, I need to spend a paragraph discussing the "just throw it out" strategy. Chapter 5, "Working with the Framework," showed you some of the classes and techniques that are already built for you and delivered in the .NET Framework. More often than you would think, you can identify the classes and algorithms that represent your core functionality and rewrite only those in C#. The rest of the existing codebase can be replaced by the functionality delivered in the .NET Framework. It doesn't work everywhere or every time, but it should be seriously considered as a migration strategy. All of Chapter 5 could be taken as a recommendation to follow the "throw it out" strategy. This one item is dedicated to interop. Interop is painful.

For the rest of this item, let's assume that you've determined that the full rewrite isn't practical. Several different strategies will let you access native code from .NET code. You need to understand the cost and inefficiencies inherent in crossing the boundary between managed and unmanaged code. There are three tolls to pay using interop. The first toll is paid by marshalling data back and forth between the managed heap and the native heap. The second toll is the thunking cost of moving between managed code and unmanaged code. You and your users pay these performance tolls. The third toll is yours alone: the amount of work you need to perform to manage this mixed environment. The third toll is the biggest, so your design decisions should minimize that cost.

Let's begin by discussing the performance costs associated with interop and how to minimize them. Marshalling is the single biggest factor. As with web services and remoting, you need to strive for a chunky API rather than a chatty API. You accomplish this differently when you interact with unmanaged code. You create a chunky interop API by modifying the existing unmanaged code to add a new, more interop-friendly API. A common COM practice is to declare many properties that clients can set, changing the internal state or the behavior of the object. Setting each property marshals data back and forth across the boundary. (It also thunks each time as it crosses the interop boundary.) That is very inefficient. Unfortunately, the COM object or unmanaged library might not be under your control. When that happens, you need to work harder. In this case, you can create a very thin unmanaged C++ library that exposes the type's capabilities using the chunkier API that you need. That's going to increase your development time (that third toll again).
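
A rough sketch of the difference, assuming a hypothetical native library named legacyrender with the entry points shown (none of these names come from the original text). The chatty form crosses the boundary once per property; the chunky form packs the same state into one struct and crosses once.

```csharp
using System;
using System.Runtime.InteropServices;

// All fields are primitive numeric types and the layout is sequential,
// so the whole struct can be handed to native code in a single call.
[StructLayout(LayoutKind.Sequential)]
public struct RenderSettings
{
    public int Width;
    public int Height;
    public int ColorDepth;
    public double Gamma;
}

// Hypothetical chatty API: one interop transition per property.
public static class ChattyNative
{
    [DllImport("legacyrender")] public static extern void SetWidth(int width);
    [DllImport("legacyrender")] public static extern void SetHeight(int height);
    [DllImport("legacyrender")] public static extern void SetColorDepth(int depth);
    [DllImport("legacyrender")] public static extern void SetGamma(double gamma);
}

// Hypothetical chunky API: a thin native wrapper accepts everything at once.
public static class ChunkyNative
{
    [DllImport("legacyrender")]
    public static extern void ApplySettings(ref RenderSettings settings);
}
```

The declarations compile without the native library being present; the library is only loaded when one of the methods is actually called.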

When you wrap a COM object, make sure that you modify the data types to provide a better marshalling strategy between the managed and unmanaged sections of your code. Some types can be marshaled much more efficiently than others. Try to limit the data types passed between the managed and unmanaged layers of your code to blittable types. A blittable type is one in which the managed and unmanaged representations of the type are the same. The contents can be copied without regard to the internal structure of the object. In some cases, the unmanaged code can use the managed memory. The blittable types are listed here:

 
 System.Byte
System.SByte
System.Int16
System.UInt16
System.Int32
System.UInt32
System.Int64
System.UInt64
System.IntPtr
System.UIntPtr
System.Single
System.Double
 

In addition, any one-dimensional array of a blittable type is blittable. Finally, any formatted type that contains only blittable types is blittable. A formatted type is a struct that explicitly defines its layout using StructLayoutAttribute:

 
 [ StructLayout( LayoutKind.Sequential ) ]
public struct Point3D
{
  public int X;
  public int Y;
  public int Z;
}
 
 

When you use only blittable types between the unmanaged and managed layers of your code, you minimize how much information must be copied. You also optimize any copy operations that must occur.
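
One common heuristic for checking whether a value is blittable is to try pinning it: GCHandle.Alloc with GCHandleType.Pinned throws ArgumentException when the object contains non-blittable data. The sketch below relies on that behavior; LabeledPoint and BlittableCheck are illustrative names, not part of the original text, and Point3D is repeated only so the block is self-contained.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct Point3D
{
    public int X;
    public int Y;
    public int Z;
}

// The string field is a managed reference, so this struct is not blittable.
[StructLayout(LayoutKind.Sequential)]
public struct LabeledPoint
{
    public int X;
    public string Label;
}

public static class BlittableCheck
{
    // Returns true when the boxed value can be pinned, which is a
    // reasonable proxy for "the marshaller can copy it bit for bit".
    public static bool CanPin(object boxedValue)
    {
        try
        {
            GCHandle handle = GCHandle.Alloc(boxedValue, GCHandleType.Pinned);
            handle.Free();
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }
}
```

With these definitions, BlittableCheck.CanPin(new Point3D()) succeeds and BlittableCheck.CanPin(new LabeledPoint()) does not. Treat the check as a development-time aid rather than a definition: strings, for example, can be pinned even though they are not blittable.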

If you can't restrict your data types to the blittable types, you can use InAttribute and OutAttribute to control when copies are made. Similar to COM, these attributes control which direction the data is copied. In/Out parameters are copied both ways; In parameters and Out parameters are copied only once. Make sure you apply the most restrictive In/Out combination to avoid more copying than necessary.
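
As an illustration, the declarations below apply the three combinations to an array of a non-blittable struct. The library name legacylib, the entry points, and the Reading type are all hypothetical; only InAttribute and OutAttribute themselves come from the text above.

```csharp
using System;
using System.Runtime.InteropServices;

// The bool field makes this struct non-blittable, so arrays of it are
// copied by the marshaller rather than pinned.
[StructLayout(LayoutKind.Sequential)]
public struct Reading
{
    public int Id;
    public bool Valid;
}

public static class LegacyNative
{
    // Managed to native only: the native side reads the data.
    [DllImport("legacylib")]
    public static extern void SubmitReadings([In] Reading[] readings, int count);

    // Native to managed only: the native side fills the buffer.
    [DllImport("legacylib")]
    public static extern void FetchReadings([Out] Reading[] readings, int count);

    // Both directions: use only when the native side reads and modifies.
    [DllImport("legacylib")]
    public static extern void AdjustReadings([In, Out] Reading[] readings, int count);
}
```

The intent is that each declaration asks for the narrowest copy direction the call actually needs, so the [In, Out] form is reserved for the one method that genuinely round-trips data.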

Finally, you can increase performance by declaring how data should be marshaled. This is most common with strings. In COM interop, strings are marshaled as BSTRs by default. That's a safe strategy, but it is the least efficient. You can save extra copying operations by overriding the default marshalling scheme with the MarshalAs attribute. The following declaration marshals the string as an LPWStr, or wchar_t*:

 
 public void SetMsg( [ MarshalAs( UnmanagedType.LPWStr ) ] string msg );
 

That's the short story for handling data between managed and unmanaged layers: Data gets copied and possibly translated between managed and unmanaged types. You can minimize the copy operations in three ways. The first is by limiting the parameters and return values to blittable types. That's the preferred solution. When you can't do that, apply the In and Out attributes to minimize the copy and transfer operations that must occur. As a final optimization, some types can be marshaled in more than one manner, so pick the most efficient one for your use.
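
For the last of those three options, here is a small sketch of an explicit string marshalling choice on a P/Invoke declaration. It uses the Win32 function lstrlenW from kernel32.dll, which takes a wide-character string; the wrapper class name NativeStrings is an assumption made for the example.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeStrings
{
    // The parameter is passed as a null-terminated UTF-16 string, which
    // matches the in-memory form of System.String and avoids a conversion
    // to an ANSI or BSTR representation.
    [DllImport("kernel32.dll", ExactSpelling = true)]
    public static extern int lstrlenW(
        [MarshalAs(UnmanagedType.LPWStr)] string text);
}
```

On Windows, a call such as NativeStrings.lstrlenW("interop") returns the number of characters in the string. The declaration itself compiles on any platform, but the call only works where kernel32.dll exists.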

Now let's move on to how you can transfer program control between managed and unmanaged components. You have three options: COM interop, Platform Invoke (P/Invoke), and managed C++. Each has its own advantages and disadvantages.

COM interop is the easiest way to leverage those COM components you are already using. But COM interop is the least efficient way to access native code in .NET. Unless you already have a significant investment in COM components, don't go down this path. Don't look at this path; don't even think about it. Using COM interop if you don't have COM components means learning COM as well as all the interop rules. This is no time to start understanding IUnknown. Those of us who did are trying to purge it from our memories as quickly as possible. Using COM interop also means that you pay the runtime cost associated with the COM subsystem. You also have to consider the differences between the CLR's object lifetime management and the COM version of object lifetime management. You can defer to the CLR, in which case every COM object you import has a finalizer, which calls Release() on that COM interface. Or, you can explicitly release the COM object yourself by calling Marshal.ReleaseComObject. The first approach introduces runtime inefficiencies in your program (see Item 15). The second introduces headaches for your programmers. Using ReleaseComObject means you are diving down into the management issues already solved by the CLR's COM interop layer. You're taking over, and you think you know best. The CLR begs to differ, and it releases COM objects, unless you tell it correctly that you have done so. This is tricky, at best, because COM expects programmers to call Release() on each interface, and your managed code is dealing with objects. In short, you need to know which interfaces have been AddRef'd on an object and release only those. Let the CLR manage COM lifetimes for you, and pay the performance costs. You're a busy developer. Learning to mix COM resource management in .NET is more than you should take on (that third toll).
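
If you do decide to release COM objects explicitly despite that advice, a defensive helper along these lines keeps the call in one place. This is only a sketch: ComLifetime and ReleaseIfComObject are made-up names, while Marshal.IsComObject and Marshal.ReleaseComObject are the framework methods involved.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ComLifetime
{
    // Returns true only when 'candidate' was a COM wrapper and this call
    // released it. Ordinary managed objects are left to the garbage collector.
    public static bool ReleaseIfComObject(object candidate)
    {
        if (candidate == null || !Marshal.IsComObject(candidate))
        {
            return false;
        }

        // Decrements the reference count of the runtime callable wrapper.
        Marshal.ReleaseComObject(candidate);
        return true;
    }
}
```

After a successful release the wrapper must not be used again, which is exactly the bookkeeping burden described above.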

Your second option is to use P/Invoke. This is the most efficient way to call any of the Win32 APIs because you avoid the overhead associated with COM. The bad news is that you need to hand-code the interface to each method that you call using P/Invoke. The more methods you invoke, the more method declarations you must hand-code. This P/Invoke declaration tells the CLR how to access the native method. This extra work explains why every example of P/Invoke (including the following one) uses MessageBox:

 
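 using System;
using System.Runtime.InteropServices;

public class PInvokeMsgBox
{
   // ANSI entry point. The window handle is pointer-sized, so use IntPtr.
   [ DllImport( "user32.dll", CharSet = CharSet.Ansi ) ]
   public static extern int MessageBoxA( IntPtr h, string m, string c, uint type );

   public static int Main()
   {
      return MessageBoxA( IntPtr.Zero, "P/InvokeTest", "It is using Interop", 0 );
   }
}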
 

The other major drawback to P/Invoke is that it is not designed for object-oriented languages. If you need to import a C++ library, you must specify the decorated names in your import declarations. Suppose that instead of the Win32 MessageBox API, you wanted to access one of the two AfxMessageBox methods in the MFC C++ DLL. You'd need to create a P/Invoke declaration for one of these two methods:

 
 ?AfxMessageBox@@YGHIII@Z
?AfxMessageBox@@YGHPBDII@Z

These two decorated names match these two methods:

 
 int AFXAPI AfxMessageBox( UINT nIDPrompt, UINT nType, UINT nIDHelp );
int AFXAPI AfxMessageBox( LPCTSTR lpszText, UINT nType, UINT nIDHelp );

Even after just a few overloaded methods, you quickly realize that this is not a productive way to provide interoperability. In short, use P/Invoke only to access C-style Win32 methods (more toll in developer time).
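
To make the cost concrete, a declaration for the string overload might look like the sketch below. The DLL name legacymfc.dll is a placeholder, and whether a given DLL actually exports this decorated name depends on how it was built; only the decorated name and the parameter list are taken from the text above.

```csharp
using System;
using System.Runtime.InteropServices;

public static class MfcNative
{
    // EntryPoint carries the decorated C++ name; the managed method can
    // keep a readable name. "YG" in the decoration indicates __stdcall and
    // "PBD" a const char*, hence StdCall and an ANSI string.
    [DllImport("legacymfc.dll",
        EntryPoint = "?AfxMessageBox@@YGHPBDII@Z",
        CharSet = CharSet.Ansi,
        CallingConvention = CallingConvention.StdCall,
        ExactSpelling = true)]
    public static extern int AfxMessageBox(string text, uint type, uint helpId);
}
```

Every overload needs its own declaration with its own decorated name, and the names change if the native signature or compiler settings change.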

Your last option is to mix managed and unmanaged code using the /CLR switch on the Microsoft C++ compiler. If you compile all your native code using /CLR, you create an MSIL-based library that uses the native heap for all data storage. That means this C++ library cannot be called directly from C#. You must build a managed C++ library on top of your legacy code to provide the bridge between the unmanaged and managed types, providing the marshalling support between the managed and unmanaged heaps. This managed C++ library contains managed classes, whose data members are on the managed heap. These classes also contain references to the native objects:

 
 // Declare the managed class:
public __gc class ManagedWrapper : public IDisposable
{
private:
  NativeType* _pMyClass;
 
public:
  ManagedWrapper( ) : _pMyClass( new NativeType( ) )
  {
  }
  // Dispose:
  virtual void Dispose( )
  {
    delete _pMyClass;
    _pMyClass = NULL;
    GC::SuppressFinalize( this );
  }
 
  ~ManagedWrapper( )
  {
    delete _pMyClass;
  }
 
  // example property:
  __property System::String* get_Name( )
  {
    return _pMyClass->Name( );
  }
  __property void set_Name( System::String* value )
  {
    char* tmp  = new char [ value->Length + 1 ];
    // Declare i outside the loop so it is still in scope for the terminator.
    int i = 0;
    for ( ; i < value->Length; i++ )
      tmp[ i ] = ( char )value->Chars[ i ];
    tmp[ i ] = 0;
    _pMyClass->Name( tmp );
    delete [] tmp;
  }
 
  // example method:
  void DoStuff( )
  {
    _pMyClass->DoStuff( );
  }
 
  // other methods elided...
};
 
 

Again, this is not the most productive programming tool we've ever used. This is repetitive code, and the entire purpose is to handle the marshalling and thunking between managed and unmanaged data. The advantage is that you have complete control over how you expose your methods and properties from your native code. The disadvantage is that you have to write all this code with one part of your head writing .NET code and another part writing C++. It's easy to make simple mistakes as you shift between the two. You must remember to delete unmanaged objects. Managed objects are not your responsibility. Constantly checking which is which slows down your development.
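
The same ownership rule can be sketched in C#, with Marshal.AllocHGlobal standing in for the native object that the wrapper above creates with new NativeType(). NativeBufferWrapper is an illustrative name; it is not the managed C++ class from the listing.

```csharp
using System;
using System.Runtime.InteropServices;

public sealed class NativeBufferWrapper : IDisposable
{
    // Unmanaged memory: the wrapper owns it and must free it.
    private IntPtr _buffer;

    public NativeBufferWrapper(int sizeInBytes)
    {
        _buffer = Marshal.AllocHGlobal(sizeInBytes);
    }

    public bool IsDisposed
    {
        get { return _buffer == IntPtr.Zero; }
    }

    public void Dispose()
    {
        Free();
        // The unmanaged memory is gone, so the finalizer has nothing to do.
        GC.SuppressFinalize(this);
    }

    // Safety net for callers that forget to call Dispose.
    ~NativeBufferWrapper()
    {
        Free();
    }

    private void Free()
    {
        if (_buffer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_buffer);
            _buffer = IntPtr.Zero;
        }
    }
}
```

Callers would normally wrap an instance in a using statement so the unmanaged memory is released deterministically, leaving the finalizer as a fallback.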

Using the /CLR switch sounds like magic, but it's not the magic bullet for all interop scenarios. Templates and exception handling are handled quite differently in C++ and C#. Well-written, efficient C++ does not necessarily translate into the best MSIL constructs. More importantly, C++ code compiled with the /CLR switch is not verifiable. As I said earlier, this code uses the native heap: It accesses native memory. The CLR cannot verify this code as safe. Programs that call this code must have been granted security permission to access unsafe code. Even so, the /CLR strategy is the best way to leverage your existing C++ code (not COM objects) in .NET. Your program does not incur the thunking cost because your C++ libraries are now in MSIL, not native CPU instructions.

Interop is painful. Seriously consider rewriting native applications before you use interop. It is often easier and faster. Unfortunately for many developers, interop is necessary. If you have existing COM objects written in any language, use COM interop. If you have existing C++ code, the /CLR switch and managed C++ provide the best strategy to access your existing native codebase from new development created in C#. Pick the strategy that takes the least time. It might be the "just throw it out" strategy.
