A simple IOCP Server/Client Class

By spinoza.

This source code uses the advanced IOCP technology which can efficiently serve multiple clients. It also presents some solutions to practical problems that arise with the IOCP programming API, and provides a simple echo client/server with file transfer.



1.1 Requirements

  • The article expects the reader to be familiar with C++, TCP/IP, socket programming, MFC, and multithreading.
  • The source code uses Winsock 2.0 and the IOCP technology, and requires:
    • Windows NT/2000 or later: Requires Windows NT 3.5 or later.
    • Windows 95/98/ME: Not supported.
    • Visual C++ .NET, or a fully updated Visual C++ 6.0.

1.2 Abstract

When you develop different types of software, sooner or later you will have to deal with client/server development. Writing comprehensive client/server code is a difficult task for a programmer. This document presents simple but powerful client/server source code that can be extended to any type of client/server application. The source code uses the advanced IOCP technology, which can serve multiple clients efficiently. IOCP offers an efficient solution to the "one-thread-per-client" bottleneck (among other problems) by using only a few processing threads and asynchronous send/receive operations. IOCP is widely used in different types of high-performance servers, such as Apache. The source code also provides a set of functions that are frequently needed in communication and client/server software, such as file receiving/transfer and logical thread pool handling. This article focuses on practical solutions to the problems that arise with the IOCP programming API, and it also documents the source code as a whole. Finally, a simple echo client/server that can handle multiple connections and file transfer is presented.


2.1 Introduction

This article presents a class that can be used in both client and server code. The class uses IOCP (Input/Output Completion Ports) and asynchronous (non-blocking) function calls, which are explained later. The source code is based on several other source codes and articles [1, 2, 3].


With this simple source code, you can:

  • Service or connect to multiple clients and servers.
  • Send or receive files asynchronously.
  • Create and manage a logical worker thread pool to process heavier client/server requests or computations.


It is difficult to find comprehensive but simple source code for client/server communication. The code found on the net is either too complex (20+ classes) or not efficient enough. This source code is designed to be as simple and as well documented as possible. In this article, we briefly present the IOCP technology provided by the Winsock 2.0 API. We also explain the thorny problems that arise while coding with it and give a solution to each of them.




2.2 Introduction to asynchronous Input/Output Completion Ports (IOCP)

A server application is fairly meaningless if it cannot serve multiple clients at the same time. Asynchronous I/O calls and multithreading are usually used for this purpose. By definition, an asynchronous I/O call returns immediately, leaving the I/O operation pending. At some point, the result of the asynchronous I/O call must be synchronized with the main thread. This can be done in different ways. The synchronization can be performed by:



  • Using events - A signal is set as soon as the asynchronous call is finished. The disadvantage of this approach is that the thread has to check or wait for the event to be set.
  • Using the GetOverlappedResult function - This approach has the same disadvantage as the approach above.
  • Using Asynchronous Procedure Calls (APCs): this approach has several disadvantages. First, an APC is always called in the context of the calling thread. Second, the calling thread can execute APCs only when it is suspended in the so-called alertable wait state.
  • Using IOCP - The disadvantage of this approach is that there are many practical thorny programming problems that must be solved. Coding IOCP can be a bit of a hassle.




2.2.1 Why use IOCP?

By using IOCP, we can overcome the "one-thread-per-client" problem. It is commonly known that a one-thread-per-client design performs very poorly unless it runs on a true multiprocessor machine. Threads are system resources that are neither unlimited nor cheap.

IOCP provides a way to have a few (I/O worker) threads handle multiple clients' input/output "fairly". The threads are suspended, and don't use the CPU cycles until there is something to do.





2.3 What is IOCP?

IOCP is nothing but a thread synchronization object, similar to a semaphore, so it is not a sophisticated concept. An IOCP object is associated with several I/O objects that support pending asynchronous I/O calls. A thread that has access to an IOCP can be suspended until a pending asynchronous I/O call is finished.
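A completion port behaves like a semaphore guarding a queue of finished I/O requests, so a small portable model can make the idea concrete. The sketch below is not the Win32 API. `CompletionQueue`, `Post`, `Get` and `Completion` are hypothetical stand-ins for the port, `PostQueuedCompletionStatus` and `GetQueuedCompletionStatus`. The only assumption is that each completion carries a byte count, a completion key and a per-I/O pointer.

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical record mirroring what GetQueuedCompletionStatus hands back:
// bytes transferred, the completion key and the per-I/O (OVERLAPPED) pointer.
struct Completion {
    std::uint32_t bytes;
    std::uintptr_t key;
    void* overlapped;
};

// Portable model of a completion port: a thread-safe FIFO of completions.
// Threads blocked in Get() sleep without using CPU until work is posted.
class CompletionQueue {
public:
    void Post(const Completion& c) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(c);
        }
        ready_.notify_one();  // wake one waiting worker, if any
    }

    // Blocks until a completion is available, then removes and returns it.
    Completion Get() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        Completion c = queue_.front();
        queue_.pop_front();
        return c;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Completion> queue_;
};
```

A real completion port does more than this model. It caps how many threads run concurrently, and it releases waiting threads in LIFO order, so the most recently active thread gets the next completion.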




3 How does IOCP work?

This part draws on other articles; for more information, see [1, 2, 3] in the References.

When working with IOCP, you have to deal with three things: associating a socket with the completion port, making the asynchronous I/O call, and synchronizing with the thread. You need the result of the asynchronous I/O call, and you need to know, for example, which client made the call. To get both, you pass two parameters: the CompletionKey parameter and the OVERLAPPED structure.




3.1 The completion key parameter

The first parameter, the CompletionKey, is just a pointer-sized integer. It is declared as ULONG_PTR in the current Windows headers. The code shown here uses DWORD, which can hold a pointer only in 32-bit builds. You can pass any unique value you want, and it will always be associated with the object. Normally, this parameter carries a pointer to a structure or class that holds client-specific objects. In the source code, a pointer to a ClientContext structure is passed as the CompletionKey parameter.
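The round trip from context pointer to completion key and back can be sketched as follows. It assumes a simplified `ClientContext` with hypothetical fields. `std::uintptr_t` stands in for the Win32 `ULONG_PTR` type so that the snippet compiles anywhere.

```cpp
#include <cstdint>
#include <string>

// Hypothetical per-client state, standing in for the article's ClientContext.
struct ClientContext {
    int id;
    std::string peer;
};

// Encode the context pointer as a completion key. std::uintptr_t plays the
// role of ULONG_PTR: it is always wide enough to hold a pointer, whereas a
// 32-bit DWORD would truncate it in a 64-bit build.
inline std::uintptr_t ToCompletionKey(ClientContext* ctx) {
    return reinterpret_cast<std::uintptr_t>(ctx);
}

// Decode the key delivered with a completion back into the context pointer.
inline ClientContext* FromCompletionKey(std::uintptr_t key) {
    return reinterpret_cast<ClientContext*>(key);
}
```

The pointer is only valid while the context is alive, which is why section 3.6.4 needs a pending-I/O counter.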




3.2 The OVERLAPPED parameter

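This parameter is commonly used to pass the memory buffer used by the asynchronous I/O call. The OVERLAPPED structure is embedded in a larger per-I/O buffer object, so the OVERLAPPED pointer returned on completion also leads back to that buffer. It is important to note that this buffer may be locked, which means it cannot be paged out of physical memory. We discuss this in section 3.6.1.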
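The sketch below shows how a worker can get from the OVERLAPPED pointer back to the buffer object. It assumes the OVERLAPPED is the first member of a standard-layout struct. `FakeOverlapped` and `IoBuffer` are hypothetical simplifications of the Win32 structure and of the article's CIOCPBuffer. Windows code often uses the CONTAINING_RECORD macro instead, which also works when the OVERLAPPED is not the first member.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-in for the Win32 OVERLAPPED structure (fields simplified).
struct FakeOverlapped {
    std::size_t internal = 0;
    std::size_t internalHigh = 0;
    void* hEvent = nullptr;
};

// Hypothetical per-I/O buffer, loosely modelled on CIOCPBuffer.
// The OVERLAPPED member comes first, so a pointer to it can be converted
// back to a pointer to the whole object.
struct IoBuffer {
    FakeOverlapped ol;
    int operation = 0;       // e.g. read or write, set by whoever posts the I/O
    std::size_t used = 0;    // bytes in use in data[]
    char data[4096] = {};
};
static_assert(std::is_standard_layout<IoBuffer>::value,
              "the first-member cast requires a standard-layout type");

// What a worker does with the LPOVERLAPPED it receives on completion.
inline IoBuffer* BufferFromOverlapped(FakeOverlapped* ol) {
    return reinterpret_cast<IoBuffer*>(ol);
}
```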




3.3 Associating a socket with the completion port

Once a completion port is created, the association of a socket with the completion port can be done by calling the function CreateIoCompletionPort in the following way:


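BOOL IOCPS::AssociateSocketWithCompletionPort(SOCKET socket,
    HANDLE hCompletionPort, DWORD dwCompletionKey)
{
    // Bind the socket to an existing completion port; every completion
    // on this socket will be reported with dwCompletionKey.
    // (DWORD truncates pointers in 64-bit builds; ULONG_PTR is the safe type.)
    // The last argument (concurrency) is ignored when an existing port
    // handle is passed.
    HANDLE h = CreateIoCompletionPort((HANDLE) socket,
        hCompletionPort, dwCompletionKey, m_nIOWorkers);
    // On success, the existing port handle is returned.
    return h == hCompletionPort;
}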

3.4 Making the asynchronous I/O call

To make the actual asynchronous call, the functions WSASend and WSARecv are called. They also need a WSABUF parameter, which contains a pointer to the buffer to be used. As a rule of thumb, when the server or client wants to perform an I/O operation, it does not make the call directly. Instead, the request is posted to the completion port and performed by the I/O worker threads. The reason is that we want CPU cycles to be partitioned fairly. The I/O calls are made by posting a status to the completion port, as shown below:


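// PostQueuedCompletionStatus takes four arguments: the port, a byte count,
// the completion key and the OVERLAPPED pointer. The last three are handed
// unchanged to the worker that dequeues this packet.
BOOL bSuccess = PostQueuedCompletionStatus(m_hCompletionPort,
    0,                      // dwNumberOfBytesTransferred (0 is assumed here)
    (ULONG_PTR) pContext,   // completion key: the client's ClientContext
    &pOverlapBuff->m_ol);   // OVERLAPPED embedded in the buffer object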

3.5 Synchronization with the thread

Synchronization with the I/O worker threads is done by calling the GetQueuedCompletionStatus function. This function also returns the CompletionKey parameter and the OVERLAPPED parameter:


BOOL GetQueuedCompletionStatus(
    HANDLE       CompletionPort,   // handle to completion port
    LPDWORD      lpNumberOfBytes,  // bytes transferred
    PULONG_PTR   lpCompletionKey,  // file completion key
    LPOVERLAPPED *lpOverlapped,    // receives the OVERLAPPED pointer (buffer)
    DWORD        dwMilliseconds    // timeout in milliseconds (INFINITE waits forever)
);

3.6 Four thorny IOCP coding hassles and their solutions

Some problems arise while using IOCP, and some of them are not intuitive. In a multithreaded scenario using IOCP, the control flow of a thread function is not straightforward, because there is no fixed relationship between threads and connections. In this section, we present four different problems that can occur while developing client/server applications with IOCP. They are:


  • The WSAENOBUFS error problem.
  • The package reordering problem.
  • The asynchronous pending reads and byte chunk package processing problem.
  • The access violation problem.

3.6.1 The WSAENOBUFS error problem

This problem is non-intuitive and difficult to detect, because at first sight it looks like an ordinary deadlock or memory leak bug. Assume that you have developed your server and everything runs fine. When you stress test the server, it suddenly hangs. If you are lucky, you will find out that it has something to do with the WSAENOBUFS error.


With every overlapped send or receive operation, it is possible that the data buffer submitted will be locked. When memory is locked, it cannot be paged out of physical memory. The operating system imposes a limit on the amount of memory that can be locked. When this limit is reached, the overlapped operations will fail with the WSAENOBUFS error.


If a server posts many overlapped receives on each connection, this limit will be reached as the number of connections grows. If a server anticipates handling a very high number of concurrent clients, it can post a single zero-byte receive on each connection. Because there is no buffer associated with the receive operation, no memory needs to be locked. With this approach, the socket's own receive buffer should be left intact (not disabled). The reason is that once the zero-byte receive completes, the server can simply perform non-blocking receives to retrieve all the data buffered in the socket's receive buffer. When a non-blocking receive fails with WSAEWOULDBLOCK, no more data is pending. This design suits servers that need the maximum possible number of concurrent connections and can sacrifice per-connection throughput. Of course, the more you know about how the clients interact with the server, the better. In the previous example, once the zero-byte receive completed, a non-blocking receive was performed to retrieve the buffered data. If the server knows that clients send data in bursts, it may instead post one or more overlapped receives once the zero-byte receive completes. This covers the case where the client sends a substantial amount of data, more than the per-socket receive buffer (8 KB by default).
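The drain step of the zero-byte approach described above can be sketched portably. The sketch assumes a hypothetical `SimulatedSocket` whose `Recv` mimics a non-blocking `recv` and reports "would block" as `kWouldBlock`. In real Winsock code, the check would be `recv(...) == SOCKET_ERROR && WSAGetLastError() == WSAEWOULDBLOCK`. `DrainAfterZeroByteRead` is also a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a socket's kernel receive buffer. Recv() copies
// what is available, like a non-blocking recv(), or returns kWouldBlock
// (playing the role of WSAEWOULDBLOCK) when nothing is buffered.
class SimulatedSocket {
public:
    static constexpr int kWouldBlock = -1;

    explicit SimulatedSocket(std::string pending) : pending_(std::move(pending)) {}

    int Recv(char* out, std::size_t len) {
        if (pending_.empty()) return kWouldBlock;
        std::size_t n = pending_.copy(out, len);  // copies min(len, size())
        pending_.erase(0, n);
        return static_cast<int>(n);
    }

private:
    std::string pending_;
};

// Called when the zero-byte receive completes: data is now waiting in the
// socket buffer, so drain it with non-blocking reads until "would block".
// Only `chunk` bytes of user memory are involved, and only briefly.
// The caller would then re-post the zero-byte receive.
inline std::string DrainAfterZeroByteRead(SimulatedSocket& s,
                                          std::size_t chunk,
                                          int* readCalls) {
    std::string received;
    std::vector<char> buf(chunk);
    *readCalls = 0;
    for (;;) {
        int n = s.Recv(buf.data(), buf.size());
        if (n == SimulatedSocket::kWouldBlock) break;  // nothing left
        if (n == 0) break;  // 0 would mean a closed connection (or chunk == 0)
        ++*readCalls;
        received.append(buf.data(), static_cast<std::size_t>(n));
    }
    return received;
}
```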


A simple, practical solution to the WSAENOBUFS problem is included in the source code. We perform an asynchronous WSARecv(..) with a zero-byte buffer (see OnZeroByteRead(..)). When this call completes, we know that there is data in the TCP/IP stack. We then read it by performing several asynchronous WSARecv(..) calls with a buffer of MAXIMUMPACKAGESIZE. This solution locks physical memory only when data arrives, and it solves the WSAENOBUFS problem. However, it decreases the throughput of the server (see Q6 and A6 in section 9, F.A.Q).


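In short: post a receive with an empty buffer, and when it completes, read the actual data with a non-blocking recv. Each connection on the completion port therefore needs a loop that keeps re-posting the empty-buffer receive.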

3.6.2 The package reordering problem

This problem is also discussed in [3]. Operations committed to an I/O completion port always complete in the order in which they were submitted. However, thread scheduling may cause the actual work associated with each completion to be processed in an undefined order. For example, suppose you have two I/O worker threads and should receive "byte chunk 1, byte chunk 2, byte chunk 3". You may process the byte chunks in the wrong order, such as "byte chunk 2, byte chunk 1, byte chunk 3". This also means that when you send data by posting a send request to the I/O completion port, the data can actually be sent out of order.


This can be solved by using only one worker thread and committing only one I/O call at a time, waiting for it to finish. If we do this, however, we lose all the benefits of IOCP.


A simple, practical solution to this problem is to add a sequence number to our buffer class. We then process the data in a buffer only if its sequence number is the next one expected. Buffers that arrive out of order have to be saved for later use. For performance reasons, we save them in hash map objects (e.g., m_SendBufferMap and m_ReadBufferMap).


To get more information about this solution, please go through the source code, and take a look at the following functions in the IOCPS class:


  • GetNextSendBuffer(..) and GetNextReadBuffer(..), to get the ordered send or receive buffer.
  • IncreaseReadSequenceNumber(..) and IncreaseSendSequenceNumber(..), to increase the sequence numbers.
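The sequence-number idea described above can be sketched in a few lines. This is not the IOCPS code. `ReorderBuffer` and its methods are hypothetical, and the sketch assumes each buffer is stamped with a sequence number starting at 0 when the read is posted. Early arrivals are parked in a hash map, as the article does. The class is not thread-safe, so a real server would protect it with a per-client lock.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical reorder stage in the spirit of GetNextReadBuffer(..) and
// IncreaseReadSequenceNumber(..). Not thread-safe: callers must hold the
// client's lock while calling Push().
class ReorderBuffer {
public:
    // Accepts one completed buffer and returns every buffer that is now
    // deliverable in sequence order (possibly none). A duplicate sequence
    // number is ignored by emplace().
    std::vector<std::string> Push(std::uint32_t seq, std::string payload) {
        parked_.emplace(seq, std::move(payload));
        std::vector<std::string> ready;
        for (auto it = parked_.find(next_); it != parked_.end();
             it = parked_.find(next_)) {
            ready.push_back(std::move(it->second));
            parked_.erase(it);
            ++next_;  // the "increase sequence number" step
        }
        return ready;
    }

    std::size_t Parked() const { return parked_.size(); }

private:
    std::uint32_t next_ = 0;  // next sequence number we may process
    std::unordered_map<std::uint32_t, std::string> parked_;
};
```

In the example from section 3.6.2, chunk 2 arrives first and is parked. When chunk 1 arrives, both are released in order.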

3.6.3 Asynchronous pending reads and byte chunk package processing problem

The most common server protocol is a packet-based protocol in which the first X bytes form a header, and the header contains the length of the complete packet. The server can read the header, work out how much more data is required, and keep reading until it has a complete packet. This works fine when the server makes one asynchronous read call at a time. But to use the IOCP server's full potential, we should have several pending asynchronous reads waiting for data to arrive. The completed reads may then be processed out of order (as discussed in section 3.6.2), so the byte chunk streams returned by the pending reads will not be processed in order. Furthermore, a byte chunk stream can contain one or several packages as well as partial packages, as shown in figure 1.


Figure 1. The figure shows how partial packages (green) and complete packages (yellow) can arrive asynchronously in different byte chunk streams (marked 1, 2, 3).


This means that we have to process the byte chunk streams in sequence to read a complete package successfully. Furthermore, we have to handle partial packages (marked in green in figure 1), which makes byte chunk package processing more difficult. The full solution to this problem can be found in the ProcessPackage(..) function in the IOCPS class.
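The header-then-payload reassembly described above can be illustrated with a small sketch. It assumes the chunks have already been put back in order (section 3.6.2). It also assumes a hypothetical framing of a 4-byte little-endian payload length followed by the payload. The actual header layout used by ProcessPackage(..) is not shown here and may differ. `PackageAssembler` and `Frame` are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical framing: a 4-byte little-endian payload length, then the
// payload. A production server should also reject absurdly large lengths.
class PackageAssembler {
public:
    // Feed the next in-order byte chunk; returns the complete packages it
    // finishes. A trailing partial package is kept until more data arrives.
    std::vector<std::string> Feed(const std::string& chunk) {
        pending_ += chunk;
        std::vector<std::string> packages;
        std::size_t pos = 0;
        while (pending_.size() - pos >= kHeaderSize) {
            std::uint32_t len = 0;
            for (std::size_t i = 0; i < kHeaderSize; ++i) {
                len |= static_cast<std::uint32_t>(
                           static_cast<unsigned char>(pending_[pos + i]))
                       << (8 * i);
            }
            if (pending_.size() - pos - kHeaderSize < len) break;  // partial
            packages.push_back(pending_.substr(pos + kHeaderSize, len));
            pos += kHeaderSize + len;
        }
        pending_.erase(0, pos);  // keep only the unfinished tail
        return packages;
    }

private:
    static constexpr std::size_t kHeaderSize = 4;
    std::string pending_;
};

// Builds one framed package (what a sender would write to the socket).
inline std::string Frame(const std::string& payload) {
    const auto n = static_cast<std::uint32_t>(payload.size());
    std::string header(4, '\0');
    for (std::size_t i = 0; i < 4; ++i)
        header[i] = static_cast<char>((n >> (8 * i)) & 0xFFu);
    return header + payload;
}
```

Chunk boundaries may fall inside a header or a payload. The assembler only emits a package once all of its bytes are present.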



3.6.4 The access violation problem

This is a minor problem, and it results from the design of the code rather than from IOCP itself. Suppose that a client connection is lost and an I/O call returns with an error flag; then we know that the client is gone. In the CompletionKey parameter, we pass a pointer to a ClientContext structure that contains client-specific data. Now suppose we free the memory occupied by this ClientContext structure, and then another I/O call made for the same client returns with an error code. We cast the DWORD CompletionKey variable back to a pointer to ClientContext and try to access or delete it. An access violation occurs!



The solution to this problem is to add a counter of pending I/O calls (m_nNumberOfPendlingIO) to the structure, and to delete the structure only when we know that there are no more pending I/O calls. This is done by the EnterIoLoop(..) and ReleaseClientContext(..) functions.
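The article's EnterIoLoop(..) and ReleaseClientContext(..) are not shown here. The sketch below expresses the same counting idea with std::atomic. `RefCountedContext`, `EnterIo`, `ExitIo` and `Close` are hypothetical names. The design choice shown is that the connection itself holds one reference, so a disconnect only drops that reference. The object is deleted by whichever call releases the last reference, either the close or the final completion. `EnterIo` must only be called while the caller still holds a reference.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-client context with a pending-I/O counter.
class RefCountedContext {
public:
    static RefCountedContext* Create() { return new RefCountedContext(); }

    // Call before posting any overlapped operation for this client.
    void EnterIo() { pending_.fetch_add(1, std::memory_order_relaxed); }

    // Call when an operation completes, successfully or with an error.
    // Returns true if this released the last reference and freed the object.
    bool ExitIo() {
        const int before = pending_.fetch_sub(1, std::memory_order_acq_rel);
        assert(before > 0 && "ExitIo called more often than EnterIo");
        if (before == 1) {
            delete this;  // no pending I/O and no connection reference left
            return true;
        }
        return false;
    }

    // Called once on disconnect: drops the connection's own reference
    // instead of deleting the object directly.
    bool Close() { return ExitIo(); }

    static int alive;  // instance counter, only to demonstrate lifetime

private:
    RefCountedContext() { ++alive; }
    ~RefCountedContext() { --alive; }  // private: only ExitIo() may delete

    std::atomic<int> pending_{1};  // starts at 1: the connection reference
};

int RefCountedContext::alive = 0;
```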



3.7 Overview of the source code

The goal of the source code is to provide a set of simple classes that handle all the tedious code IOCP requires. The source code also provides a set of functions that are frequently used in communication and client/server software, such as file receiving/transfer functions and logical thread pool handling.






Figure 2. The figure above illustrates an overview of the IOCP class source code functionality.


We have several I/O worker threads that handle asynchronous I/O calls through the completion port (IOCP). These workers call virtual functions that can put requests needing a large amount of computation into a work queue. The logical workers take jobs from the queue, process them, and send back the results using functions provided by the class. The graphical user interface (GUI) usually communicates with the main class through Windows messages (because MFC is not thread-safe), by calling functions, or by using shared variables.







Figure 3. The figure above shows the class overview.

The classes that can be observed in figure 3 are:

  • CIOCPBuffer: A class used to manage the buffers used by the asynchronous I/O calls.
  • IOCPS: The main class that handles all the communication.
  • JobItem: A structure which contains the job to be performed by the logical worker threads.
  • ClientContext: A structure that holds client specific information (status, data, etc.).


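Structure layers and their relationships:
(1) Worker threads: respond to the completions of I/O posted for each connection, post read requests, and pass the I/O results to the processing threads. The number of worker threads is configurable.
(2) Processing threads: call the callback functions to pass information to the application layer or protocol stack. The number of processing threads is configurable.
(3) Watchdog thread: responds to Accept events by calling AcceptEx, detects connection and heartbeat timeouts, and posts information to the worker threads. The module has only one watchdog thread.

1. Technical requirements
(1) Thread synchronization: LOCK-prefixed (interlocked) instructions and critical sections.
(2) Main socket APIs: WSASend, WSARecv, AcceptEx, DisconnectEx.
(3) Memory management: connection pool (handle reuse) and memory pool.
(4) Zero-copy data: because processing threads are built in, the upper-layer application can avoid building its own thread pool and copying data. A GBuf memory allocation feature is also provided: once the application layer has obtained an allocated address and filled in the data, it can post the buffer directly to the kernel/driver layer.
(5) Data ordering: only one processing thread at a time responds to the I/O events of a given connection.
(6) I/O request posting: a single outstanding read and multiple outstanding writes.
(7) Zero-buffer read posting: can be enabled through conditional compilation to support very large numbers of connections.
(8) Timeout mechanism: an idle-connection timeout (a connection that sends no data) can be set to prevent DoS attacks. A heartbeat timeout can also be set, so that connections broken by network faults do not linger as dead connections that exhaust system resources.
(9) Interfaces: API, callback functions, and client handles (client connection handles).
(10) Active and passive sending: safe, reliable, and efficient client connection handles are provided without using hashes, maps, or lists, so the server can send data both actively and passively.
(11) PerHandleData is reclaimed safely without relying on an I/O-posting counter or linked list. Unrelated work is avoided during high-frequency reads and writes to improve read/write efficiency.
(12) Processing threads and worker threads have a clear division of labor. Heavy work goes to the processing threads, and the worker threads' workload is kept to a minimum: responding to completions and posting reads.
(13) AWE support: the module detects automatically whether AWE is enabled (it has to be enabled manually); otherwise it uses ordinary virtual memory.

2. Functional requirements
(1) Listening on multiple IPs and ports; each listener can have its own callback function, so data is processed efficiently and separately.
(2) A maximum rate of new connections per second and an idle-connection timeout can be set to prevent service outages caused by DoS attacks. Heartbeat handling guards against dead connections caused by network faults.
(3) Transparent transmission with no added protocol, suitable for a wide range of network environments.
(4) Data can be sent both actively and passively, without adding work that lowers efficiency just to accommodate active sending.
(5) Built-in processing threads: the upper-layer application does not need its own thread pool. All I/O events invoke the callback functions in order, and data can be processed directly inside the callback without worrying about received data being reordered by multithreading.
(6) Efficient data association: after the initial connection, each connection's owner is set based on the login data. From then on, subsequently received data immediately yields the connection's owner without further lookups. Internally, the module associates data by pointer, which is efficient for long-lived connections and for server systems that send actively.
(7) IPv6 compatible.

3. Notes
Hardware and application environments differ, so an unsuitable configuration can cause efficiency and performance problems. In the following cases, please contact the author to obtain a better parameter configuration:
(1) More than 1000 connections. Beyond this, the runtime parameters should be set according to the specific hardware configuration, network bandwidth, and other factors.
(2) Bandwidth utilization above 20%. The numbers of worker threads and processing threads should be set with both data throughput and data-processing load in mind, because too many threads waste time in scheduling. Thread priorities should also be considered, and the two thread counts need not be equal.
(3) The server needs to send data actively, or short connections (including disconnections caused by network faults) occur frequently.

Stress-test tool:
1. Uses the G-TcpClient module.
2. Can open large numbers of long-lived and short-lived connections at configurable intervals.
3. Can send dense packets, both immediately and on a timer. On a 1 Mbps fiber link it can exceed 100 KB/s (one direction), and on a 100 Mbps LAN it can exceed 10 MB/s (one direction).
4. Data is sent by a single dedicated thread; each click on Connect creates a thread that opens connections using the current parameters.
5. Test precondition: the server echoes the client's data back to the client immediately after receiving it.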




