A simple IOCP Server/Client Class (completion ports), CSDN blog

 
 Programming an IOCP (I/O completion port) server in C++

 
 Link:

 
  https://www.codeproject.com/Articles/10330/A-simple-IOCP-Server-Client-Class 
 

 
 This source code uses the advanced IOCP technology, which can efficiently serve multiple clients. It also presents some solutions to practical problems that arise with the IOCP programming API, and provides a simple echo client/server with file transfer.

 
 1.1 Requirements 

The article expects the reader to be familiar with C++, TCP/IP, socket programming, MFC, and multithreading.
The source code uses Winsock 2.0 and the IOCP technology, and requires:

Windows NT/2000 or later: Requires Windows NT 3.5 or later.
Windows 95/98/ME: Not supported.
Visual C++ .NET, or a fully updated Visual C++ 6.0.

 
 1.2 Abstract 

 
 When you develop different types of software, sooner or later you will have to deal with client/server development. Writing comprehensive client/server code is a difficult task for a programmer. This documentation presents a simple but powerful client/server source code that can be extended to any type of client/server application. This source code uses the advanced IOCP technology, which can efficiently serve multiple clients. IOCP presents an efficient solution to the "one-thread-per-client" bottleneck problem (among others), using only a few processing threads and asynchronous input/output send/receive. The IOCP technology is widely used in different types of high-performance servers, such as Apache. The source code also provides a set of functions that are frequently used when dealing with communication and client/server software, such as file receiving/transferring functions and logical thread pool handling. This article focuses on practical solutions to the problems that arise with the IOCP programming API, and also presents an overall documentation of the source code. Furthermore, a simple echo client/server that can handle multiple connections and file transfer is also presented here.

 

 
 2.1 Introduction 

 
 This article presents a class which can be used in both the client and server code. The class uses IOCP (Input Output Completion Ports) and asynchronous (non-blocking) function calls, which are explained later. The source code is based on many other source codes and articles: [1, 2, and 3].

 
 With this simple source code, you can: 

Service or connect to multiple clients and servers.
Send or receive files asynchronously.
Create and manage a logical worker thread pool to process heavier client/server requests or computations.

 
 It is difficult to find a comprehensive but simple source code to handle client/server communications. The source codes that are found on the net are either too complex (20+ classes) or don't provide sufficient efficiency. This source code is designed to be as simple and well documented as possible. In this article, we will briefly present the IOCP technology provided by Winsock API 2.0, and also explain the thorny problems that arise while coding and the solution to each one of them.

 

 
 2.2 Introduction to asynchronous Input/Output Completion Ports (IOCP)

 
 A server application is fairly meaningless if it cannot service multiple clients at the same time; usually, asynchronous I/O calls and multithreading are used for this purpose. By definition, an asynchronous I/O call returns immediately, leaving the I/O call pending. At some point in time, the result of the asynchronous I/O call must be synchronized with the main thread. This can be done in different ways. The synchronization can be performed by:

Using events - A signal is set as soon as the asynchronous call is finished. The disadvantage of this approach is that the thread has to check or wait for the event to be set.
Using the GetOverlappedResult function - This approach has the same disadvantage as the approach above.
Using Asynchronous Procedure Calls (or APC) - There are several disadvantages associated with this approach. First, the APC is always called in the context of the calling thread, and second, in order to be able to execute the APCs, the calling thread has to be suspended in the so-called alertable wait state.
Using IOCP - The disadvantage of this approach is that there are many practical thorny programming problems that must be solved. Coding IOCP can be a bit of a hassle.

 

 
 2.2.1 Why use IOCP?

 
 By using IOCP, we can overcome the "one-thread-per-client" problem. It is commonly known that the performance of the one-thread-per-client model decreases heavily if the software does not run on a true multiprocessor machine. Threads are system resources that are neither unlimited nor cheap.

 
 IOCP provides a way to have a few (I/O worker) threads handle multiple clients' input/output "fairly". The threads are suspended and don't use CPU cycles until there is something to do.

 

 
 2.3 What is IOCP?

 
 We have already stated that IOCP is nothing but a thread synchronization object, similar to a semaphore; therefore, IOCP is not a sophisticated concept. An IOCP object is associated with several I/O objects that support pending asynchronous I/O calls. A thread that has access to an IOCP can be suspended until a pending asynchronous I/O call is finished.

 

 
 Note: a thread is only activated to handle the work after the I/O operation has completed.

 
 3 How does IOCP work?

 
 To get more information on this part, I referred to other articles. [1, 2, 3, see References.] 

 
 While working with IOCP, you have to deal with three things: associating a socket with the completion port, making the asynchronous I/O call, and synchronizing with the thread. To get the result from the asynchronous I/O call and to know, for example, which client has made the call, you have to pass two parameters: the CompletionKey parameter and the OVERLAPPED structure.

 

 
 3.1 The completion key parameter 

 
 The first parameter, the CompletionKey, is just a variable of type DWORD. Current Windows SDK headers declare it as ULONG_PTR, so that it is wide enough to hold a pointer on 64-bit Windows. You can pass whatever unique value you want, and it will always be associated with the object. Normally, a pointer to a structure or a class that contains client-specific objects is passed in this parameter. In the source code, a pointer to a ClientContext structure is passed as the CompletionKey parameter.
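Because the key carries a pointer, it must be pointer-sized. The portable sketch below illustrates this. It uses std::uintptr_t in place of Win32's ULONG_PTR and a hypothetical one-field ClientContext, which is not the article's structure. It only shows the pointer-to-key round trip that the source code relies on when it passes a ClientContext pointer as the CompletionKey.

```cpp
#include <cstdint>

// Hypothetical per-client state; the article's real ClientContext holds much more.
struct ClientContext {
    int m_nId;
};

// std::uintptr_t plays the role of ULONG_PTR: an unsigned integer wide enough
// to hold any object pointer on the target platform (a 32-bit DWORD is not on x64).
using CompletionKey = std::uintptr_t;
static_assert(sizeof(CompletionKey) >= sizeof(ClientContext*),
              "completion key must be able to hold a pointer");

// Pack a context pointer into a key (what is passed when associating the
// socket or posting a completion status).
inline CompletionKey ToKey(ClientContext* pContext) {
    return reinterpret_cast<CompletionKey>(pContext);
}

// Recover the context pointer from the key a worker thread received.
inline ClientContext* FromKey(CompletionKey key) {
    return reinterpret_cast<ClientContext*>(key);
}
```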

 

 
 3.2 The OVERLAPPED parameter 

 
 This parameter is commonly used to pass the memory buffer that is used by the asynchronous I/O call. In the source code, the OVERLAPPED structure is the m_ol member of the CIOCPBuffer object that owns the buffer. It is important to note that this buffer will be locked and cannot be paged out of physical memory. We will discuss this later.
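The usual way to get from the OVERLAPPED pointer returned by the completion port back to the buffer that owns it is to embed the OVERLAPPED in the buffer object and subtract the member's offset. The Win32 CONTAINING_RECORD macro does exactly this. The following is a portable sketch of the idea. Overlapped and IoBuffer are hypothetical stand-ins, not the Windows OVERLAPPED type or the article's CIOCPBuffer.

```cpp
#include <cstddef>

// Stand-in for the Win32 OVERLAPPED structure (fields are illustrative only).
struct Overlapped {
    unsigned long internal;
    unsigned long internalHigh;
    void* hEvent;
};

// Hypothetical per-I/O buffer object with the Overlapped embedded as a member,
// so the pointer given to the asynchronous call also identifies the buffer.
struct IoBuffer {
    int m_nUsed;            // number of valid bytes in m_Buffer
    Overlapped m_ol;        // the address of this member is passed to the I/O call
    char m_Buffer[4096];    // memory used by the I/O operation
};

// Same arithmetic as Win32's CONTAINING_RECORD: step back from the member's
// address by the member's offset to reach the enclosing object.
inline IoBuffer* BufferFromOverlapped(Overlapped* pOl) {
    char* base = reinterpret_cast<char*>(pOl) - offsetof(IoBuffer, m_ol);
    return reinterpret_cast<IoBuffer*>(base);
}
```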

 

 
 3.3 Associating a socket with the completion port 

 
 Once a completion port is created, a socket can be associated with it by calling the function CreateIoCompletionPort in the following way:

 

 
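BOOL IOCPS::AssociateSocketWithCompletionPort(SOCKET socket, HANDLE hCompletionPort, ULONG_PTR dwCompletionKey)
{
    // ULONG_PTR (not DWORD) so a pointer key is not truncated on 64-bit Windows.
    // When hCompletionPort is an existing port, the last argument (number of
    // concurrent threads) is ignored, and success returns that same port handle.
    HANDLE h = CreateIoCompletionPort((HANDLE) socket, hCompletionPort, dwCompletionKey, m_nIOWorkers);
    return h == hCompletionPort;
}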

 
 3.4 Making the asynchronous I/O call 

 
 To make the actual asynchronous call, the functions WSASend and WSARecv are called. They also need a WSABUF parameter that contains a pointer to the buffer that is going to be used. A rule of thumb is that when the server/client wants to perform an I/O operation, the call is not made directly. Instead, it is posted to the completion port and performed by the I/O worker threads. The reason for this is that we want the CPU cycles to be partitioned fairly. The I/O calls are made by posting a status to the completion port, see below:

 

 
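 // The completion key is a ULONG_PTR; casting the pointer to DWORD would truncate it on 64-bit Windows.
 BOOL bSuccess = PostQueuedCompletionStatus(m_hCompletionPort, pOverlapBuff->GetUsed(), (ULONG_PTR) pContext, &pOverlapBuff->m_ol);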

 
 3.5 Synchronization with the thread 

 
 Synchronization with the I/O worker threads is done by calling the GetQueuedCompletionStatus function (see below). The function also provides the CompletionKey parameter and the OVERLAPPED parameter:

 

 
BOOL GetQueuedCompletionStatus(
    HANDLE       CompletionPort,   // handle to completion port
    LPDWORD      lpNumberOfBytes,  // bytes transferred
    PULONG_PTR   lpCompletionKey,  // file completion key
    LPOVERLAPPED *lpOverlapped,    // buffer
    DWORD        dwMilliseconds    // optional timeout value
);
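A minimal portable analog of a completion queue, built on std::mutex and std::condition_variable, shows the Post/Get semantics without depending on Windows headers. It is a simplified sketch. Post plays the role of PostQueuedCompletionStatus, and Get plays the role of GetQueuedCompletionStatus with a timeout. The packet layout is an assumption. The real port's limit on concurrently running threads and its LIFO wake-up order for waiting threads are not modeled.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>

// One completion packet: the three values GetQueuedCompletionStatus hands back.
struct CompletionPacket {
    unsigned long bytesTransferred;
    std::uintptr_t completionKey;   // ULONG_PTR analog, e.g. a ClientContext*
    void* pOverlapped;              // OVERLAPPED* analog, identifies the buffer
};

// Simplified, hypothetical analog of an I/O completion port queue.
class CompletionQueue {
public:
    // PostQueuedCompletionStatus analog: enqueue a packet and wake one waiting thread.
    void Post(const CompletionPacket& packet) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_packets.push_back(packet);
        }
        m_cv.notify_one();
    }

    // GetQueuedCompletionStatus analog: the calling thread sleeps (no CPU use)
    // until a packet arrives or the timeout expires. Returns false on timeout.
    bool Get(CompletionPacket& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_mutex);
        if (!m_cv.wait_for(lock, timeout, [this] { return !m_packets.empty(); }))
            return false;
        out = m_packets.front();
        m_packets.pop_front();
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<CompletionPacket> m_packets;
};
```

An I/O worker thread in this model would loop on Get(..), turn completionKey back into its client context, and dispatch on the buffer that pOverlapped identifies.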

 
 3.6 Four thorny IOCP coding hassles and their solutions

 
 There are some problems that arise while using IOCP, and some of them are not intuitive. In a multithreaded scenario using IOCPs, the control flow of a thread function is not straightforward, because there is no relationship between threads and communications. In this section, we will present four different problems that can occur while developing client/server applications using IOCPs. They are:

The WSAENOBUFS error problem.
The package reordering problem.
The asynchronous pending reads and byte chunk package processing problem.
The access violation problem.

 

 
 3.6.1 The WSAENOBUFS error problem 

 
 This problem is non-intuitive and difficult to detect, because at first sight it seems to be a normal deadlock or a memory leak "bug". Assume that you have developed your server and everything runs fine. When you stress test the server, it suddenly hangs. If you are lucky, you can find out that it has something to do with the WSAENOBUFS error.

 
 With every overlapped send or receive operation, it is possible that the data buffer submitted will be locked. When memory is locked, it cannot be paged out of physical memory. The operating system imposes a limit on the amount of memory that can be locked. When this limit is reached, the overlapped operations will fail with the WSAENOBUFS error. 

 
 If a server posts many overlapped receives on each connection, this limit will be reached as the number of connections grows. If a server anticipates handling a very high number of concurrent clients, it can post a single zero-byte receive on each connection. Because there is no buffer associated with the receive operation, no memory needs to be locked. With this approach, the per-socket receive buffer should be left intact. Once the zero-byte receive operation has completed, the server can simply perform a non-blocking receive to retrieve all the data buffered in the socket's receive buffer. There is no more data pending when the non-blocking receive fails with WSAEWOULDBLOCK. This design suits servers that require the maximum possible number of concurrent connections at the expense of data throughput on each connection. Of course, the more you know about how the clients interact with the server, the better. In the previous example, a non-blocking receive was performed once the zero-byte receive completed, to retrieve the buffered data. If the server knows that clients send data in bursts, then once the zero-byte receive has completed, it may post one or more overlapped receives in case the client sends a substantial amount of data (more than the per-socket receive buffer, which is 8 KB by default).

 
 A simple practical solution to the WSAENOBUFS error problem is included in the source code provided. We perform an asynchronous WSARecv(..) (see OnZeroByteRead(..)) with a zero-byte buffer. When this call completes, we know that there is data in the TCP/IP stack, and we read it by performing several asynchronous WSARecv(..) calls with a buffer of MAXIMUMPACKAGESIZE. This solution locks physical memory only when data arrives, and solves the WSAENOBUFS problem. However, it decreases the throughput of the server (see Q6 and A6 in section 9, F.A.Q.).
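The non-blocking drain from the zero-byte-receive design quoted above (read until the socket reports WSAEWOULDBLOCK) can be sketched portably. FakeSocket and NonBlockingRecv are hypothetical stand-ins for a socket and a non-blocking receive. Note that the article's own source code posts asynchronous WSARecv calls at this point instead, so this models the alternative design rather than OnZeroByteRead(..). The point it illustrates is that a real buffer is only needed after data has arrived, and that one buffer is reused for every chunk.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a socket and the bytes its kernel receive buffer holds.
struct FakeSocket {
    std::string receiveBuffer;
};

// Models a non-blocking recv: copies up to maxLen bytes, or returns -1 to
// signal "would block" (the WSAEWOULDBLOCK case) when nothing is buffered.
int NonBlockingRecv(FakeSocket& s, char* out, std::size_t maxLen) {
    if (s.receiveBuffer.empty())
        return -1;
    std::size_t n = std::min(maxLen, s.receiveBuffer.size());
    s.receiveBuffer.copy(out, n);
    s.receiveBuffer.erase(0, n);
    return static_cast<int>(n);
}

// Called once the zero-byte receive has completed. Only now is a buffer of
// chunkSize bytes used, and it is reused for each read until the socket is empty.
std::vector<std::string> DrainAfterZeroByteRead(FakeSocket& s, std::size_t chunkSize) {
    std::vector<std::string> chunks;
    if (chunkSize == 0)
        return chunks;  // a zero-sized buffer could never make progress
    std::vector<char> buffer(chunkSize);
    for (;;) {
        int n = NonBlockingRecv(s, buffer.data(), buffer.size());
        if (n < 0)
            break;  // would block: everything buffered has been read
        chunks.emplace_back(buffer.data(), static_cast<std::size_t>(n));
    }
    return chunks;
}
```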

 
 3.6.2 The package reordering problem 

 
 This problem is also discussed in [3]. Although operations committed to the I/O completion port will always be completed in the order in which they were submitted, thread scheduling issues may mean that the actual work associated with each completion is processed in an undefined order. For example, if you have two I/O worker threads and you should receive "byte chunk 1, byte chunk 2, byte chunk 3", you may process the byte chunks in the wrong order, namely, "byte chunk 2, byte chunk 1, byte chunk 3". This also means that when you send data by posting a send request on the I/O completion port, the data can actually be sent in a reordered way.

 
 This can be solved by using only one worker thread, committing only one I/O call at a time, and waiting for it to finish. But if we do this, we lose all the benefits of IOCP.

 
 A simple practical solution to this problem is to add a sequence number to our buffer class, and to process the data in a buffer only if its sequence number is the next one expected. This means that buffers that arrive out of order have to be saved for later use. For performance reasons, we save them in a hash map object (e.g., m_SendBufferMap and m_ReadBufferMap).

 
 To get more information about this solution, please go through the source code and take a look at the following functions in the IOCPS class:

GetNextSendBuffer(..) and GetNextReadBuffer(..), to get the ordered send or receive buffer.
IncreaseReadSequenceNumber(..) and IncreaseSendSequenceNumber(..), to increase the sequence numbers.
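The sequence-number scheme can be sketched as a small reorder buffer. This is a minimal portable illustration, not the article's GetNextReadBuffer(..) implementation. ReorderBuffer and its members are hypothetical, std::unordered_map stands in for m_ReadBufferMap, and sequence-number wrap-around is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical reorder buffer: completions may arrive in any order, but data
// is released strictly in sequence-number order.
class ReorderBuffer {
public:
    // Accepts a completed buffer tagged with its sequence number and returns all
    // buffers that can now be processed in order (possibly none).
    std::vector<std::string> Push(std::uint32_t seq, std::string data) {
        m_pending.emplace(seq, std::move(data));
        std::vector<std::string> ready;
        // Release consecutive buffers starting at the next expected number.
        for (auto it = m_pending.find(m_nextSeq); it != m_pending.end();
             it = m_pending.find(m_nextSeq)) {
            ready.push_back(std::move(it->second));
            m_pending.erase(it);
            ++m_nextSeq;  // analog of IncreaseReadSequenceNumber(..)
        }
        return ready;
    }

    // Number of out-of-order buffers currently held back.
    std::size_t Waiting() const { return m_pending.size(); }

private:
    std::uint32_t m_nextSeq = 0;                              // next number to process
    std::unordered_map<std::uint32_t, std::string> m_pending; // early arrivals
};
```

For the example in the text, pushing chunk 2 first returns nothing. Pushing chunk 1 then returns chunks 1 and 2 together, in order.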

 
 3.6.3 Asynchronous pending reads and byte chunk package processing problem 

 
 The most common server protocol is a packet-based protocol in which the first X bytes represent a header, and the header contains the length of the complete packet. The server can read the header, work out how much more data is required, and keep reading until it has a complete packet. This works fine when the server is making one asynchronous read call at a time. But to use the IOCP server's full potential, we should have several pending asynchronous reads waiting for data to arrive. This means that several asynchronous reads may complete out of order (as discussed in section 3.6.2), and the byte chunk streams returned by the pending reads will not be processed in order. Furthermore, a byte chunk stream can contain one or several packages as well as partial packages, as shown in figure 1.

 
 Figure 1. The figure shows how partial packages (green) and complete packages (yellow) can arrive asynchronously in different byte chunk streams (marked 1, 2, 3). 

 
 This means that we have to process the byte chunks in their original order to successfully read a complete package. Furthermore, we have to handle partial packages (marked in green in figure 1). This makes byte chunk package processing more difficult. The full solution to this problem can be found in the ProcessPackage(..) function in the IOCPS class.
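A minimal sketch of the reassembly step follows. It assumes the chunks are already in order, for example after the sequence-number step. It also assumes a hypothetical wire format of a 4-byte little-endian payload length followed by the payload; this document does not specify the article's actual header layout. Partial packages stay buffered until the rest arrives, which is the job ProcessPackage(..) performs in the source code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical length-prefixed packet assembler (assumed format:
// 4-byte little-endian payload length, then the payload).
class PacketAssembler {
public:
    // Appends one in-order byte chunk and returns every packet it completes.
    std::vector<std::string> Feed(const std::string& chunk) {
        m_data += chunk;
        std::vector<std::string> packets;
        for (;;) {
            if (m_data.size() < kHeaderSize)
                break;  // the header itself is still partial
            std::uint32_t len = 0;
            for (std::size_t i = 0; i < kHeaderSize; ++i)
                len |= static_cast<std::uint32_t>(static_cast<unsigned char>(m_data[i])) << (8 * i);
            if (m_data.size() - kHeaderSize < len)
                break;  // the payload is still partial
            packets.push_back(m_data.substr(kHeaderSize, len));
            m_data.erase(0, kHeaderSize + len);  // keep any trailing partial packet
        }
        return packets;
    }

private:
    static constexpr std::size_t kHeaderSize = 4;
    std::string m_data;  // bytes received but not yet consumed
};

// Helper that builds one packet in the assumed format.
std::string MakePacket(const std::string& payload) {
    std::string out(4, '\0');
    std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<char>((len >> (8 * i)) & 0xFF);
    return out + payload;
}
```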

 
 3.6.4 The access violation problem 

 
 This is a minor problem, and it results from the design of the code rather than being IOCP-specific. Suppose that a client connection is lost and an I/O call returns with an error flag; then we know that the client is gone. In the CompletionKey parameter, we pass a pointer to a ClientContext structure that contains client-specific data. Suppose we free the memory occupied by this ClientContext structure, and then some other I/O call performed by the same client returns with an error code. If we cast the DWORD CompletionKey variable to a pointer to ClientContext and try to access or delete it, an access violation occurs!

 
 The solution to this problem is to add a counter to the structure that holds the number of pending I/O calls (m_nNumberOfPendlingIO), and to delete the structure only when we know that there are no more pending I/O calls. This is done by the EnterIoLoop(..) and ReleaseClientContext(..) functions.
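The same idea can be expressed as plain reference counting with std::atomic, as sketched below. The names are hypothetical, and the design is an assumption rather than a copy of EnterIoLoop(..) and ReleaseClientContext(..). The live connection holds one reference, and every pending I/O call holds another. Whichever thread drops the last reference deletes the context, so a late error completion never touches freed memory.

```cpp
#include <atomic>

// Hypothetical context with a reference count instead of a bare pending-I/O counter.
struct ClientContext {
    std::atomic<int> m_nRefs{1};               // 1 for the live connection + 1 per pending I/O
    std::atomic<bool> m_bDisconnected{false};
};

// Call before issuing each asynchronous read or send on the context.
inline void OnIoIssued(ClientContext* p) {
    p->m_nRefs.fetch_add(1, std::memory_order_relaxed);
}

// Drops one reference; returns true if it was the last one (caller deletes p).
inline bool ReleaseRef(ClientContext* p) {
    return p->m_nRefs.fetch_sub(1, std::memory_order_acq_rel) == 1;
}

// Call for every dequeued completion on this context, successful or failed.
inline bool OnIoCompleted(ClientContext* p) { return ReleaseRef(p); }

// Call when a thread detects that the client is gone. The exchange ensures the
// connection's own reference is dropped only once, even if several failing
// completions report the same disconnect.
inline bool OnDisconnect(ClientContext* p) {
    if (p->m_bDisconnected.exchange(true))
        return false;
    return ReleaseRef(p);
}
```

In use, any of the three release paths may return true, and only that caller deletes the context.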

 
 3.7 The overview of the source code 

 
 The goal of the source code is to provide a set of simple classes that handle all the troublesome code that has to do with IOCP. The source code also provides a set of functions which are frequently used when dealing with communication and client/server software, such as file receiving/transferring functions and logical thread pool handling.

 

 
 Figure 2. The figure above illustrates an overview of the IOCP class source code functionality.

 
 We have several I/O worker threads that handle asynchronous I/O calls through the completion port (IOCP). These workers call some virtual functions, which can put requests that need a large amount of computation in a work queue. The logical workers take the job from the queue, process it, and send back the result by using some of the functions provided by the class. The Graphical User Interface (GUI) usually communicates with the main class using Windows messages (because MFC is not thread safe), by calling functions, or by using shared variables.
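The split between I/O workers and logical workers can be sketched with the standard library as a job queue drained by its own thread pool. This is a portable illustration only. JobQueue and its use of std::function are assumptions standing in for the article's JobItem, AddJob(..), GetJob() and ProcessJob(..). The point is that AddJob returns immediately, so the I/O worker that calls it can go straight back to servicing the completion port.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical logical-worker pool fed by I/O worker threads.
class JobQueue {
public:
    using Job = std::function<void()>;

    explicit JobQueue(int nWorkers) {
        for (int i = 0; i < nWorkers; ++i)
            m_workers.emplace_back([this] { WorkerLoop(); });
    }

    // Stops the workers after all queued jobs have been processed.
    ~JobQueue() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
        for (auto& t : m_workers)
            t.join();
    }

    // Called from an I/O worker thread; only enqueues, never runs the job.
    void AddJob(Job job) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_jobs.push(std::move(job));
        }
        m_cv.notify_one();
    }

private:
    void WorkerLoop() {
        for (;;) {
            Job job;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this] { return m_stop || !m_jobs.empty(); });
                if (m_jobs.empty())
                    return;  // stop requested and nothing left to do
                job = std::move(m_jobs.front());
                m_jobs.pop();
            }
            job();  // the heavy computation runs outside the lock
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<Job> m_jobs;
    bool m_stop = false;
    std::vector<std::thread> m_workers;
};
```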

 

 
 Figure 3. The figure above shows the class overview. 

 
 The classes that can be observed in figure 3 are: 

CIOCPBuffer: A class used to manage the buffers used by the asynchronous I/O calls.
IOCPS: The main class that handles all the communication.
JobItem: A structure which contains the job to be performed by the logical worker threads.
ClientContext: A structure that holds client specific information (status, data, etc.).

 

 
 3.7.1 The buffer design – The CIOCPBuffer class

 
 When using asynchronous I/O calls, we have to provide a private buffer to be used with the I/O operation. There are some considerations to take into account when we allocate these buffers:

 

Allocating and freeing memory is expensive, so we should reuse buffers (memory) that have already been allocated. We therefore keep buffers in the linked-list structures given below:

 
 // Free buffer list
 CCriticalSection m_FreeBufferListLock;
 CPtrList m_FreeBufferList;

 // Occupied buffer list (buffers that are currently in use)
 CCriticalSection m_BufferListLock;
 CPtrList m_BufferList;

 // Now we use the function AllocateBuffer(..)
 // to allocate memory or reuse a buffer.

Sometimes, when an asynchronous I/O call has completed, we may have partial packages in the buffer, so the buffer has to be split to get a complete message. This is done by the SplitBuffer function in the IOCPS class. Also, sometimes we need to copy information between buffers, and this is done by the AddAndFlush(..) function in the IOCPS class.
As we know, we also need to add a sequence number and a state (the IOType variable, e.g. IOZeroReadCompleted) to our buffer.
We also need methods to convert data to a byte stream and a byte stream back to data; some of these functions are also provided in the CIOCPBuffer class.

 
 All the solutions to the problems we have discussed above exist in the CIOCPBuffer class.
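The buffer reuse described in the first point can be sketched as a mutex-guarded free list. The limit semantics follow the iMaxNumberOfFreeBuffer parameter of Start(..): -1 keeps no free buffers, 0 keeps an unlimited number, and n keeps at most n. The Buffer and BufferPool types, and the use of std::mutex instead of CCriticalSection, are assumptions made for this portable sketch.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical I/O buffer; the article's CIOCPBuffer carries much more state.
struct Buffer {
    std::vector<char> data;
    int nSequence = 0;  // per-buffer sequence number (see section 3.6.2)
};

class BufferPool {
public:
    BufferPool(std::size_t bufferSize, int maxFree)
        : m_bufferSize(bufferSize), m_maxFree(maxFree) {}

    // Reuses a free buffer if one is available, otherwise allocates a new one.
    std::unique_ptr<Buffer> Allocate() {
        {
            std::lock_guard<std::mutex> lock(m_lock);
            if (!m_free.empty()) {
                std::unique_ptr<Buffer> b = std::move(m_free.back());
                m_free.pop_back();
                return b;
            }
        }
        auto b = std::make_unique<Buffer>();
        b->data.resize(m_bufferSize);
        return b;
    }

    // Returns a buffer to the free list; if it is not kept, it is freed when
    // Release returns.
    void Release(std::unique_ptr<Buffer> b) {
        if (!b)
            return;
        b->nSequence = 0;  // reset per-use state before reuse
        std::lock_guard<std::mutex> lock(m_lock);
        if (m_maxFree == -1)
            return;  // policy: keep no free buffers
        if (m_maxFree > 0 && m_free.size() >= static_cast<std::size_t>(m_maxFree))
            return;  // free list is full
        m_free.push_back(std::move(b));
    }

    std::size_t FreeCount() {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_free.size();
    }

private:
    std::size_t m_bufferSize;
    int m_maxFree;
    std::mutex m_lock;  // plays the role of m_FreeBufferListLock
    std::vector<std::unique_ptr<Buffer>> m_free;
};
```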


 
 3.8 How to use the source code? 

 
 By inheriting your own class from IOCPS (shown in figure 3) and using the virtual functions and the functionality provided by the IOCPS class (e.g., the thread pool), it is possible to implement any type of server or client that can efficiently manage a huge number of connections using only a few threads.

 

 
 3.8.1 Starting and closing the server/client 

 
 To start the server, call the function: 

 
 BOOL Start(int nPort = 999,
            int iMaxNumConnections = 1201,
            int iMaxIOWorkers = 1,
            int nOfWorkers = 1,
            int iMaxNumberOfFreeBuffer = 0,
            int iMaxNumberOfFreeContext = 0,
            BOOL bOrderedSend = TRUE,
            BOOL bOrderedRead = TRUE,
            int iNumberOfPendlingReads = 4);

nPort

 
 The port number that the server will listen on (use -1 for client mode).

iMaxNumConnections

 
 Maximum number of connections allowed. (Use a big prime number.) 

iMaxIOWorkers

 
 Number of Input/Output worker threads. 

nOfWorkers

 
 Number of logical workers. (Can be changed at runtime.) 

iMaxNumberOfFreeBuffer

 
 Maximum number of buffers that we save for reuse (-1 = none, 0 = unlimited).

 

iMaxNumberOfFreeContext

 
 Maximum number of client information objects that are saved for reuse (-1 = none, 0 = unlimited).

bOrderedRead

 
 Make sequential reads (discussed in section 3.6.2).

 

bOrderedSend

 
 Make sequential writes (discussed in section 3.6.2).

iNumberOfPendlingReads

 
 Number of pending asynchronous read loops that are waiting for data. 

 

 
 To connect to a remote server (client mode, nPort = -1), call the function:

 
 Connect(const CString &strIPAddr, int nPort)

strIPAddr

 
 The IP address of the remote server. 

nPort

 
 The port. 

 
 To close the server or client, call the function ShutDown().

 
 For example: 

 
 MyIOCP m_iocp;
 if (!m_iocp.Start(-1, 1210, 2, 1, 0, 0))
     AfxMessageBox("Error could not start the Client");
 // ….
 m_iocp.ShutDown();

 
 4.1 Source code description 

 
 For more details about the source code, please check the comments in the source code. 

 
 4.1.1 Virtual functions 

NotifyNewConnection

 
 Called when a new connection has been established. 

NotifyNewClientContext

 
 Called when an empty  
 ClientContext 
  structure is allocated. 

NotifyDisconnectedClient

 
 Called when a client disconnects. 

ProcessJob

 
 Called when logical workers want to process a job. 

NotifyReceivedPackage

 
 Notifies that a new package has arrived. 

NotifyFileCompleted

 
 Notifies that a file transfer has finished. 

 
 4.1.2 Important variables 

 

 
 Note that every shared variable must be exclusively locked by the function that uses it; this is important to avoid access violations and overlapping writes. Every variable named XXX that needs to be locked has a corresponding XXXLock variable.
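The XXX/XXXLock pairing can be sketched portably using the context map as the example. Every access to m_ContextMap happens under m_ContextMapLock. A context found in the map is only used while that lock is held, so no other thread can remove and delete it in the meantime. ContextRegistry and WithClient are hypothetical names, and std::mutex and std::map stand in for the article's lock and ContextMap types.

```cpp
#include <functional>
#include <map>
#include <mutex>
#include <string>

// Hypothetical, simplified client context.
struct ClientContext {
    std::string m_sID;
    int m_iSomeData = 0;
};

class ContextRegistry {
public:
    void Add(const std::string& id) {
        std::lock_guard<std::mutex> lock(m_ContextMapLock);
        m_ContextMap[id].m_sID = id;
    }

    void Remove(const std::string& id) {
        std::lock_guard<std::mutex> lock(m_ContextMapLock);
        m_ContextMap.erase(id);
    }

    // Runs fn on the context while the map lock is held; returns false if the
    // client is no longer in the map. Unlike handing out a raw pointer, the
    // context cannot be deleted while fn runs. fn must not call back into the
    // registry, because std::mutex is not recursive.
    bool WithClient(const std::string& id,
                    const std::function<void(ClientContext&)>& fn) {
        std::lock_guard<std::mutex> lock(m_ContextMapLock);
        auto it = m_ContextMap.find(id);
        if (it == m_ContextMap.end())
            return false;
        fn(it->second);
        return true;
    }

private:
    std::mutex m_ContextMapLock;                        // the "XXXLock" for...
    std::map<std::string, ClientContext> m_ContextMap;  // ...the shared "XXX"
};
```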

 

m_ContextMapLock; ContextMap m_ContextMap;

 Holds all the client data (socket, client data, etc.).

m_NumberOfActiveConnections

 Holds the number of active connections.

 
 4.1.3 Important functions 

GetNumberOfConnections()

 
 Returns the number of connections. 

CString GetHostAdress(ClientContext* p)

 
 Returns the host address, given a client context. 

BOOL ASendToAll(CIOCPBuffer *pBuff);

 
 Sends the content of the buffer to all the connected clients. 

DisconnectClient(CString sID)

 
 Disconnects a client, given the unique identification number. 

CString GetHostIP()

 
 Returns the local IP address.

JobItem* GetJob()

 
 Removes a JobItem from the queue; returns NULL if there are no jobs.

BOOL AddJob(JobItem *pJob)

 
 Adds a Job to the queue. 

BOOL SetWorkers(int nThreads)

 
 Sets the number of logical workers; this function can be called at any time.

DisconnectAll();

 
 Disconnects all the clients.

ARead(…)

 
 Makes an asynchronous read. 

ASend(…)

 
 Makes an asynchronous send. Sends data to a client. 

ClientContext* FindClient(CString strClient)

 
 Finds a client given a string ID. Note: not thread safe!

DisconnectClient(ClientContext* pContext, BOOL bGraceful=FALSE);

 
 Disconnects a client. 

DisconnectAll()

 
 Disconnects all the connected clients. 

StartSendFile(ClientContext *pContext)

 
 Sends a file specified in the ClientContext structure, using the optimized TransmitFile(..) function.

PrepareReceiveFile(..)

 
 Prepares the connection for receiving a file. When you call this function, all incoming byte streams are written to a file. 

PrepareSendFile(..)

 
 Opens a file and sends a package containing information about the file to the remote connection. The function also disables the ASend(..) function until the file is transmitted or aborted.

DisableSendFile(..)

 
 Disables send file mode. 

DisableRecevideFile(..)

 
 Disables receive file mode. 

 
 5.1 File transfer 

 
 File transfer is done by using the Winsock 2.0 TransmitFile function. The TransmitFile function transmits file data over a connected socket handle. This function uses the operating system's cache manager to retrieve file data, and provides high-performance file data transfer over sockets. These are some important aspects of asynchronous file transferring:

Until the TransmitFile call has completed, no other sends or writes to the socket should be performed, because they would corrupt the file. Therefore, all calls to ASend are disabled after the PrepareSendFile(..) function is called.
Since the operating system reads the file data sequentially, you can improve caching performance by opening the file handle with FILE_FLAG_SEQUENTIAL_SCAN.
We are using kernel asynchronous procedure calls while sending the file (TF_USE_KERNEL_APC). Use of TF_USE_KERNEL_APC can deliver significant performance benefits. It is possible (though unlikely), however, that the thread in whose context TransmitFile is initiated is being used for heavy computations; this situation may prevent APCs from launching.

 
 The file transfer is made in this order. The server initializes the file transfer by calling the PrepareSendFile(..) function. When the client receives the information about the file, it prepares for it by calling PrepareReceiveFile(..), and sends a package to the server to start the file transfer. When the package arrives at the server side, the server calls the StartSendFile(..) function, which uses the high-performance TransmitFile function to transmit the specified file.
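The ordering rules above amount to a small state machine on the sending side, modeled portably below. Sender, SendFileState and the method bodies are hypothetical. The points where the real code would send the file-information package or call TransmitFile are marked with comments only.

```cpp
#include <string>

// Hypothetical send-side states of the file-transfer handshake.
enum class SendFileState { Idle, WaitingForStart, Transmitting };

class Sender {
public:
    // PrepareSendFile analog: announce the file and lock out ordinary sends.
    bool PrepareSendFile(const std::string& path) {
        if (m_state != SendFileState::Idle)
            return false;  // a transfer is already in progress
        m_path = path;
        m_state = SendFileState::WaitingForStart;
        return true;       // the file-info package would be sent to the peer here
    }

    // Called when the peer's "start file transfer" package arrives.
    bool StartSendFile() {
        if (m_state != SendFileState::WaitingForStart)
            return false;
        m_state = SendFileState::Transmitting;  // TransmitFile would be issued here
        return true;
    }

    // DisableSendFile analog: the transfer finished or was aborted.
    void DisableSendFile() {
        m_state = SendFileState::Idle;
        m_path.clear();
    }

    // ASend analog: ordinary sends are refused while a file transfer is pending,
    // because interleaved data would corrupt the file stream.
    bool ASend(const std::string& /*data*/) const {
        return m_state == SendFileState::Idle;
    }

    SendFileState State() const { return m_state; }

private:
    SendFileState m_state = SendFileState::Idle;
    std::string m_path;
};
```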

 

 
 6 The source code example 

 
 The provided source code example is an echo client/server that also supports file transmission (figure 4). In the source code, a class MyIOCP inherited from IOCPS handles the communication between the client and the server, by using the virtual functions mentioned in section 4.1.1.

 
 The most important part of the client or server code is the virtual function NotifyReceivedPackage, as described below:

 

 
 void MyIOCP::NotifyReceivedPackage(CIOCPBuffer *pOverlapBuff, int nSize, ClientContext *pContext)
 {
     // Dispatch on the package type sent by the remote side.
     BYTE PackageType = pOverlapBuff->GetPackageType();
     switch (PackageType)
     {
     case Job_SendText2Client:
         Packagetext(pOverlapBuff, nSize, pContext);
         break;
     case Job_SendFileInfo:
         PackageFileTransfer(pOverlapBuff, nSize, pContext);
         break;
     case Job_StartFileTransfer:
         PackageStartFileTransfer(pOverlapBuff, nSize, pContext);
         break;
     case Job_AbortFileTransfer:
         DisableSendFile(pContext);
         break;
     }
 }

 
 The function handles an incoming message and performs the request sent by the remote connection. In this case, it is only a matter of a simple echo or file transfer. The source code is divided into two projects, IOCP and IOCPClient, which are the server and the client side of the connection.

 

 
 6.1 Compiler issues 

 
 When compiling with VC++ 6.0 or .NET, you may get some strange errors dealing with the CFile class, such as:

 
 if (pContext->m_File.m_hFile != INVALID_HANDLE_VALUE) // <- error C2446: '!=' : no conversion from 'void *' to 'unsigned int'

 
 These problems can be solved by updating the header files (*.h) or your VC++ 6.0 installation, or simply by fixing the offending type conversion. After some modifications, the server/client source code can also be used without MFC.

 
 7 Special considerations & rule of thumbs 

 

 
 When you use this code in other types of applications, there are some programming traps that can be avoided, related both to this source code and to multithreaded programming in general. Nondeterministic errors are errors that occur randomly, and it is hard to reproduce them by performing the same sequence of tasks that created the error. They are the worst type of error, and they usually occur because of mistakes in the core design or implementation of the source code. When the server runs multiple I/O worker threads serving connected clients, nondeterministic errors such as access violations can occur if the programmer has not taken the multithreaded environment into account.

 

 
 Rule of thumb #1: 

 
 Never read from or write to the client context (e.g., ClientContext) without locking it with the context lock, as in the example below. The notification functions (e.g., Notify*(ClientContext *pContext)) are already thread safe, and inside them you can access the members of ClientContext without locking and unlocking the context.

 

// Do not do it this way:
// ...
if (pContext->m_bSomeData)
    pContext->m_iSomeData = 0;
// ...

// Do it this way:
// ...
pContext->m_ContextLock.Lock();
if (pContext->m_bSomeData)
    pContext->m_iSomeData = 0;
pContext->m_ContextLock.Unlock();
// ...

 
 Also, be aware that while you hold a context lock, other threads (or the GUI) that need the same context will wait for it, so keep the locked section short.
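The same locking discipline can be written with a scoped lock, so the context is unlocked on every exit path. This sketch swaps MFC's lock object for the standard std::mutex and std::lock_guard; the trimmed-down ClientContext and the function name ResetSomeData are hypothetical stand-ins mirroring the example above, not the original class.

```cpp
#include <mutex>

// Hypothetical, trimmed-down client context: only the members used in the
// example above, with std::mutex standing in for m_ContextLock.
struct ClientContext {
    std::mutex m_ContextLock;
    bool m_bSomeData = false;
    int m_iSomeData = 0;
};

// Reads and writes the shared members only while the context lock is held.
// std::lock_guard unlocks automatically when the function returns or throws.
void ResetSomeData(ClientContext* pContext) {
    std::lock_guard<std::mutex> guard(pContext->m_ContextLock);
    if (pContext->m_bSomeData)
        pContext->m_iSomeData = 0;
}
```

A manual Lock()/Unlock() pair leaves the context locked if the code between the two calls returns early or throws; the scoped guard removes that risk without changing what is protected.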

 
 Rule of thumb #2: 

 
 Avoid (or use with special care) code that takes other context locks, or other types of locks, while it already holds a context lock, because this may lead to a deadlock (e.g., A waits for B, which waits for C, which waits for A).

 

 

 
 pContext->m_ContextLock.Lock();
 // ... code ...
 pContext2->m_ContextLock.Lock();
 // ... code ...
 pContext2->m_ContextLock.Unlock();
 // ... code ...
 pContext->m_ContextLock.Unlock();

 
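 The code above may cause a deadlock: if another thread locks pContext2 first and then waits for pContext, each thread ends up waiting for the lock the other one holds.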
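When two contexts really must be locked together, one common remedy is to acquire both locks in a single deadlock-avoiding call instead of nesting them. This sketch uses the standard std::lock with std::adopt_lock guards, which is not part of the original MFC code; the ClientContext stand-in and MoveSomeData are hypothetical.

```cpp
#include <mutex>

// Hypothetical context with a std::mutex in place of m_ContextLock.
struct ClientContext {
    std::mutex m_ContextLock;
    int m_iSomeData = 0;
};

// Moves m_iSomeData from one context to another while holding both locks.
// std::lock acquires the two mutexes with a deadlock-avoidance algorithm, so
// MoveSomeData(a, b) on one thread and MoveSomeData(b, a) on another cannot
// deadlock the way the nested Lock() calls above can.
void MoveSomeData(ClientContext* pFrom, ClientContext* pTo) {
    if (pFrom == pTo)
        return;  // locking the same std::mutex twice is undefined behavior
    std::lock(pFrom->m_ContextLock, pTo->m_ContextLock);
    // Adopt the already-acquired locks so they are released on scope exit.
    std::lock_guard<std::mutex> lockFrom(pFrom->m_ContextLock, std::adopt_lock);
    std::lock_guard<std::mutex> lockTo(pTo->m_ContextLock, std::adopt_lock);
    pTo->m_iSomeData += pFrom->m_iSomeData;
    pFrom->m_iSomeData = 0;
}
```

An alternative with the same effect is to always lock contexts in one global order (for example by client ID), which works with plain Lock()/Unlock() calls as well.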

 
 Rule of thumb #3: 

 
 Never access a client context outside the notification functions (e.g., Notify*(ClientContext *pContext)). If you must, enclose the access in m_ContextMapLock.Lock(); … m_ContextMapLock.Unlock();, as shown in the source code below.

 

 

 
 ClientContext* pContext = NULL;
 m_ContextMapLock.Lock();
 pContext = FindClient(ClientID);
 // Safe to access pContext here if it is not NULL
 // and it is locked (rule of thumb #1).
 // ... code ...
 m_ContextMapLock.Unlock();
 // Here pContext can suddenly disappear because of a disconnect.
 // Do not access pContext members here.
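One way to make this rule hard to break is to never let the raw pointer leave the locked region: a helper looks the client up and runs the caller's code while the map lock is still held. This is a minimal sketch assuming a std::map keyed by client ID and std::mutex for both locks; ContextMap, WithClient, Add and Remove are hypothetical names, not part of the original source.

```cpp
#include <map>
#include <mutex>

// Hypothetical client context; per rule of thumb #1 it has its own lock.
struct ClientContext {
    std::mutex m_ContextLock;
    int m_iSomeData = 0;
};

class ContextMap {
public:
    // Adds a client if it is not already present; the map owns the contexts.
    void Add(unsigned int clientId) {
        std::lock_guard<std::mutex> mapGuard(m_ContextMapLock);
        (void)m_Clients[clientId];  // default-constructs the context in place
    }

    // Removes a client, as a disconnect would.
    void Remove(unsigned int clientId) {
        std::lock_guard<std::mutex> mapGuard(m_ContextMapLock);
        m_Clients.erase(clientId);
    }

    // Looks up the client and runs fn on it while the map lock and then the
    // context lock are held (always in that order). The pointer never escapes,
    // so a concurrent disconnect cannot free the context while fn uses it.
    template <typename Fn>
    bool WithClient(unsigned int clientId, Fn fn) {
        std::lock_guard<std::mutex> mapGuard(m_ContextMapLock);
        auto it = m_Clients.find(clientId);
        if (it == m_Clients.end())
            return false;  // client already disconnected
        std::lock_guard<std::mutex> ctxGuard(it->second.m_ContextLock);
        fn(it->second);
        return true;
    }

private:
    std::mutex m_ContextMapLock;
    std::map<unsigned int, ClientContext> m_Clients;
};
```

The trade-off is that the map lock is held for the whole callback, so fn should be short; otherwise every other thread that needs the map waits for it.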

 
 8 Future work 

 
 In the future, the source code will be updated with the following features, in this order:

An implementation based on the AcceptEx(..) function for accepting new connections will be added, to handle bursts of short-lived connections and DoS attacks.
The source code will be made portable to environments other than MFC, such as plain Win32, STL, and WTL.

 
 9 FAQ

 
 Q1: The amount of memory used by the server program rises steadily as client connections increase, as seen in the Windows Task Manager. Even if clients disconnect, the amount of memory used does not decrease. What's the problem?

 
 A1: The code reuses allocated buffers instead of releasing and reallocating them. You can change this behavior by altering the parameters iMaxNumberOfFreeBuffer and iMaxNumberOfFreeContext. Please review section 3.8.1.
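The reuse strategy can be pictured as a free list with an upper bound: released buffers are kept for reuse until the list holds iMaxNumberOfFreeBuffer entries, and only the excess is actually freed, which is why memory does not drop when clients disconnect. The BufferPool class below is a hypothetical, single-threaded sketch of that idea, not the original implementation; only the parameter name comes from the text, and a real server would guard the list with a lock.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fixed-size buffer; the real code uses its own buffer class.
struct Buffer {
    char data[4096];
};

class BufferPool {
public:
    explicit BufferPool(std::size_t iMaxNumberOfFreeBuffer)
        : m_maxFree(iMaxNumberOfFreeBuffer) {}

    BufferPool(const BufferPool&) = delete;             // owns raw pointers
    BufferPool& operator=(const BufferPool&) = delete;

    ~BufferPool() {
        for (Buffer* b : m_free)
            delete b;
    }

    // Reuses a cached buffer if one is available, otherwise allocates.
    Buffer* Allocate() {
        if (!m_free.empty()) {
            Buffer* b = m_free.back();
            m_free.pop_back();
            return b;
        }
        return new Buffer;
    }

    // Caches the buffer for reuse unless the free list is already full,
    // in which case the memory is returned to the heap.
    void Release(Buffer* b) {
        if (m_free.size() < m_maxFree)
            m_free.push_back(b);
        else
            delete b;
    }

    std::size_t FreeCount() const { return m_free.size(); }

private:
    std::size_t m_maxFree;
    std::vector<Buffer*> m_free;
};
```

In this sketch, a limit of 0 releases every buffer immediately (no reuse), while a large limit keeps peak memory allocated; iMaxNumberOfFreeContext would play the same role for client contexts.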

 

 
 Q2: I get compilation errors under .NET, such as "error C2446: '!=' : no conversion from 'unsigned int' to 'HANDLE'". What is the problem?

 
 A2: This is caused by differences between header versions of the SDK. Just change the conversion to HANDLE so that the compiler accepts it. Alternatively, remove the line #define TRANSFERFILEFUNCTIONALITY (this leaves out the file transfer functionality) and compile again.

 

 
 Q3: Can the source code be used without MFC, as pure Win32 code and in a service?

 
 A3: The code was developed to run with a GUI for short periods (not days or years); I developed this client/server solution for use with GUIs in an MFC environment. Of course, you can use it for normal server solutions, and many people have. Just remove the MFC-specific parts, such as CString and CPtrList, and replace them with non-MFC equivalents. I don't like MFC either, so please send me a copy when you change the code. Thanks.

 

 
 Q4: Excellent work! Thank you for this. When will you implement AcceptEx(..) instead of the connection listener thread?

 
 A4: As soon as the code is stable. It is quite stable right now, but I know that combining several I/O workers with several pending reads may cause some problems. I'm glad you like my code. Please vote!

 
 Q5: Why start several I/O workers? Is this necessary if you don't have a true multiprocessor computer?

 
 A5: No, it is not necessary to have several I/O workers; a single thread can handle all the connections. On common home computers, one I/O worker gives the best performance, and you also do not need to worry about possible access violations. But as computers get more powerful every day (e.g., hyperthreading, dual-core), why not have the option of running several threads? :=)

 
 Q6: Why use several pending reads? What is it good for?

 
 A6: That depends on the server development strategy adopted by the developer, namely "many concurrent connections" vs. "high throughput server". Having multiple pending reads increases the throughput of the server, because the TCP/IP packets are written directly into the posted buffers instead of being buffered in the TCP/IP stack (no double buffering). If the server knows that clients send data in bursts, pending reads increase performance (high throughput). However, every pending receive operation (with WSARecv()) forces the kernel to lock the receive buffers into the non-paged pool. This may lead to a WSAENOBUFS error when these kernel resources run out (many concurrent connections). Pending reads/writes have to be used carefully, taking into consideration aspects such as the page size on the architecture and the amount of non-paged pool (1/4 of the physical memory). Furthermore, if you have more than one I/O worker, the order of the packets is broken (because of the IOCP structure: completions dequeued by different workers may be processed in any order), and the extra work needed to maintain the order makes multiple pending reads not worthwhile. In this design, multiple pending reads are turned off when the number of I/O workers is greater than one, because the implementation cannot handle the reordering. (A sequence number would have to be carried in the payload instead.)
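If several I/O workers and several pending reads were combined, the receiver would have to restore the order itself. A common approach, sketched here under the assumption that each packet carries a sequence number in its payload as suggested above, is a reorder buffer that holds early packets until the missing ones arrive. Packet and ReorderBuffer are hypothetical; this is not part of the original design, which simply disables multiple pending reads in that case.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical packet: a sequence number carried in the payload plus data.
struct Packet {
    std::uint32_t seq;
    std::string data;
};

// Delivers packets strictly in sequence order, buffering any that arrive
// early (e.g. because another I/O worker processed its completion first).
class ReorderBuffer {
public:
    // Accepts one packet and returns every packet that is now deliverable.
    std::vector<Packet> Push(Packet p) {
        std::vector<Packet> ready;
        if (p.seq < m_next)
            return ready;  // duplicate or stale packet: drop it
        const std::uint32_t seq = p.seq;
        m_pending.emplace(seq, std::move(p));
        // Drain the run of consecutive packets starting at m_next.
        auto it = m_pending.find(m_next);
        while (it != m_pending.end()) {
            ready.push_back(std::move(it->second));
            m_pending.erase(it);
            ++m_next;
            it = m_pending.find(m_next);
        }
        return ready;
    }

    std::size_t Pending() const { return m_pending.size(); }

private:
    std::uint32_t m_next = 0;  // next sequence number to deliver
    std::map<std::uint32_t, Packet> m_pending;
};
```

The buffered packets stay in memory until the gap is filled, which adds to the memory pressure already caused by many pending reads; this is part of the extra work the answer refers to.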

 
 Q7: In the previous article, you stated that we have to implement memory management using the VirtualAlloc function instead of new. Why have you not implemented it?

 
 A7: When you allocate memory with new, the memory is allocated in virtual memory, which may or may not be backed by physical memory at a given moment. You do not know where the memory is allocated, and an allocation can straddle two pages. This means that we may load too much memory into physical memory when we access a certain piece of data (if we use new). Furthermore, you do not know whether the allocated memory is in physical memory or paged out, and you cannot tell the system that writing it back to disk is unnecessary (when we no longer care about the data in memory). But be aware! Any new allocation made with VirtualAlloc* is always rounded up to the 64 KB allocation granularity: if you allocate a new VAS region backed by physical memory, the OS consumes physical memory rounded up to the page size, and consumes the VAS of the process rounded up to a 64 KB boundary. Using VirtualAlloc can be difficult. new and malloc use the heap manager, which obtains its memory with VirtualAlloc internally, but every time you allocate memory with new/delete, a lot of other computation is done, and you do not have the control to put related data nicely inside the same page (without overlapping two pages). However, heaps are best for managing large numbers of small objects, and I shall change the source code so that it only uses new/delete, for the sake of code cleanliness. I have found that the performance gain is too small compared to the added complexity of the source code.
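The rounding described above can be made concrete with a little arithmetic. The sketch assumes the common x86/x64 values of a 4 KB page size and a 64 KB allocation granularity; on a real system both should be read from GetSystemInfo (the dwPageSize and dwAllocationGranularity fields of SYSTEM_INFO) rather than hard-coded. RoundUp, CommittedBytes and ReservedAddressSpace are illustrative names.

```cpp
#include <cstddef>

// Assumed values; query GetSystemInfo() on a real Windows system.
constexpr std::size_t kPageSize = 4 * 1024;                // dwPageSize
constexpr std::size_t kAllocationGranularity = 64 * 1024;  // dwAllocationGranularity

// Rounds n up to the next multiple of unit (unit must be a power of two).
constexpr std::size_t RoundUp(std::size_t n, std::size_t unit) {
    return (n + unit - 1) & ~(unit - 1);
}

// Memory committed for a VirtualAlloc request of `bytes`: whole pages.
constexpr std::size_t CommittedBytes(std::size_t bytes) {
    return RoundUp(bytes, kPageSize);
}

// Address space effectively consumed, since each new region starts on an
// allocation-granularity boundary.
constexpr std::size_t ReservedAddressSpace(std::size_t bytes) {
    return RoundUp(bytes, kAllocationGranularity);
}

// Example: a single 100-byte buffer allocated with its own VirtualAlloc call.
static_assert(CommittedBytes(100) == 4096, "one page committed");
static_assert(ReservedAddressSpace(100) == 65536, "64 KB of address space used");
```

For small buffers the waste is large (a 100-byte object costs a full page of memory and 64 KB of address space), which matches the conclusion that a heap is the better fit for many small objects.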