A simple IOCP Server/Client Class

1.1 Requirements

  • The article expects the reader to be familiar with C++, TCP/IP, socket programming, MFC, and multithreading.
  • The source code uses Winsock 2.0 and the IOCP technology, and requires:
    • Windows NT/2000 or later: Requires Windows NT 3.5 or later.
    • Windows 95/98/ME: Not supported.
    • Visual C++ .NET, or a fully updated Visual C++ 6.0.

1.2 Abstract

Sooner or later, most software developers have to deal with client/server development, and writing comprehensive client/server code is a difficult task. This documentation presents simple but powerful client/server source code that can be extended to any type of client/server application. The source code uses the advanced IOCP technology, which can serve multiple clients efficiently. IOCP offers an efficient solution to the "one-thread-per-client" bottleneck (among other problems) by using only a few processing threads together with asynchronous send/receive calls. IOCP is widely used in high-performance servers such as Apache. The source code also provides a set of functions that are frequently needed in communication and client/server software, such as file sending/receiving and logical thread pool handling. This article focuses on the practical problems that arise with the IOCP programming API and on their solutions, and also documents the source code as a whole. Finally, a simple echo client/server that can handle multiple connections and file transfer is presented.

2.1 Introduction

This article presents a class which can be used in both the client and server code. The class uses IOCP (Input Output Completion Ports) and asynchronous (non-blocking) function calls which are explained later. The source code is based on many other source codes and articles: [1, 2, and 3].

With this simple source code, you can:

  • Service or connect to multiple clients and servers.
  • Send or receive files asynchronously.
  • Create and manage a logical worker thread pool to process heavier client/server requests or computations.

It is difficult to find comprehensive but simple source code for handling client/server communications. The source code found on the net is either too complex (20+ classes) or not efficient enough. This source code is designed to be as simple and well documented as possible. In this article, we briefly present the IOCP technology provided by the Winsock 2.0 API, and explain the thorny problems that arise while coding, together with a solution to each of them.

2.2 Introduction to asynchronous Input/Output Completion Ports (IOCP)

A server application is fairly meaningless if it cannot service multiple clients at the same time. Usually, asynchronous I/O calls and multithreading are used for this purpose. By definition, an asynchronous I/O call returns immediately, leaving the I/O call pending. At some point, the result of the asynchronous I/O call must be synchronized with the main thread. This can be done in different ways. The synchronization can be performed by:

  • Using events - A signal is set as soon as the asynchronous call is finished. The disadvantage of this approach is that the thread has to check or wait for the event to be set.
  • Using the GetOverlappedResult function - This approach has the same disadvantage as the approach above.
  • Using Asynchronous Procedure Calls (APCs) - There are several disadvantages associated with this approach. First, the APC is always called in the context of the calling thread, and second, in order to execute the APCs, the calling thread has to be suspended in the so-called alertable wait state.
  • Using IOCP - The disadvantage of this approach is that there are many practical thorny programming problems that must be solved. Coding IOCP can be a bit of a hassle.

2.2.1 Why use IOCP?

By using IOCP, we can overcome the "one-thread-per-client" problem. It is commonly known that the performance of a one-thread-per-client design decreases heavily if the software does not run on a true multiprocessor machine. Threads are system resources that are neither unlimited nor cheap.

IOCP provides a way to have a few (I/O worker) threads handle multiple clients' input/output "fairly". The threads are suspended, and don't use the CPU cycles until there is something to do.

2.3 What is IOCP?

As the previous section suggests, an IOCP is essentially a thread synchronization object, similar to a semaphore, so it is not a sophisticated concept. An IOCP object is associated with several I/O objects that support pending asynchronous I/O calls. A thread that has access to an IOCP can be suspended until a pending asynchronous I/O call is finished.

3 How does IOCP work?

To get more information on this part, I referred to other articles. [1, 2, 3, see References.]

While working with IOCP, you have to deal with three things: associating a socket with the completion port, making the asynchronous I/O call, and synchronizing with the thread. To get the result of the asynchronous I/O call and to know, for example, which client made the call, you have to pass two parameters: the CompletionKey parameter and the OVERLAPPED structure.

3.1 The completion key parameter

The first parameter, the CompletionKey, is just a variable of type DWORD (current SDK headers declare it as ULONG_PTR, which is pointer-sized on 64-bit Windows). You can pass whatever unique value you want, and it will always be associated with the object. Normally, a pointer to a structure or a class that contains client-specific objects is passed with this parameter. In the source code, a pointer to a ClientContext structure is passed as the CompletionKey parameter.

3.2 The OVERLAPPED parameter

This parameter is commonly used to pass the memory buffer that is used by the asynchronous I/O call. It is important to note that this data will be locked and is not paged out of the physical memory. We will discuss this later.

3.3 Associating a socket with the completion port

Once a completion port is created, the association of a socket with the completion port can be done by calling the function CreateIoCompletionPort in the following way:

BOOL IOCPS::AssociateSocketWithCompletionPort(SOCKET socket,
               HANDLE hCompletionPort, DWORD dwCompletionKey)
{
    // The last argument (number of concurrent threads) is ignored when an
    // existing completion port is passed in.
    HANDLE h = CreateIoCompletionPort((HANDLE) socket,
          hCompletionPort, dwCompletionKey, m_nIOWorkers);
    return h == hCompletionPort;
}

3.4 Making the asynchronous I/O call

To make the actual asynchronous call, the functions WSASend and WSARecv are called. They need a WSABUF parameter that contains a pointer to the buffer to be used. As a rule of thumb, when the server/client wants to perform an I/O operation, the call is not made directly; instead, a request is posted to the completion port and the operation is performed by the I/O worker threads. The reason for this is that we want the CPU cycles to be partitioned fairly. The I/O requests are made by posting a status to the completion port, see below:

BOOL bSuccess = PostQueuedCompletionStatus(m_hCompletionPort, 
                       pOverlapBuff->GetUsed(), 
                       (DWORD) pContext, &pOverlapBuff->m_ol);

3.5 Synchronization with the thread

Synchronization with the I/O worker threads is done by calling the GetQueuedCompletionStatus function, which also returns the CompletionKey parameter and the OVERLAPPED parameter (see below):

BOOL GetQueuedCompletionStatus(
   HANDLE CompletionPort,       // handle to completion port
   LPDWORD lpNumberOfBytes,     // bytes transferred
   PULONG_PTR lpCompletionKey,  // file completion key
   LPOVERLAPPED *lpOverlapped,  // buffer
   DWORD dwMilliseconds         // optional timeout value
   );
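
The post/dequeue pattern of sections 3.4 and 3.5 can be modelled in portable C++. The sketch below is not the Win32 API: CompletionQueueModel, CompletionPacket, DemoContext, DemoBuffer and contextFromKey are hypothetical names, and a mutex-protected queue stands in for the completion port. It only illustrates how a completion key carries a context pointer and how a worker blocks until a packet is available.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical stand-ins for the article's ClientContext and buffer types.
struct DemoContext { int clientId; };
struct DemoBuffer  { char data[64]; };

// One completion packet: the values a worker receives when it dequeues.
struct CompletionPacket {
    std::size_t    bytes;       // number of bytes transferred
    std::uintptr_t key;         // completion key (carries a context pointer)
    DemoBuffer*    overlapped;  // per-operation data (the OVERLAPPED analogue)
};

// Portable model of a completion port: a thread-safe FIFO of packets.
class CompletionQueueModel {
public:
    // Plays the role of PostQueuedCompletionStatus.
    void post(std::size_t bytes, DemoContext* ctx, DemoBuffer* buf) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            packets_.push_back(
                {bytes, reinterpret_cast<std::uintptr_t>(ctx), buf});
        }
        ready_.notify_one();
    }

    // Plays the role of GetQueuedCompletionStatus with an infinite timeout:
    // the calling worker sleeps until a packet is available.
    CompletionPacket get() {
        std::unique_lock<std::mutex> guard(mutex_);
        ready_.wait(guard, [this] { return !packets_.empty(); });
        CompletionPacket packet = packets_.front();
        packets_.pop_front();
        return packet;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<CompletionPacket> packets_;
};

// Recovers the context pointer from the key, as an I/O worker would.
inline DemoContext* contextFromKey(std::uintptr_t key) {
    return reinterpret_cast<DemoContext*>(key);
}
```

A worker thread would call get() in a loop and dispatch on the packet contents. A real completion port additionally limits how many workers run concurrently, which this model does not attempt to reproduce.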

3.6 Four thorny IOCP coding hassles and their solutions

There are some problems that arise while using IOCP, and some of them are not intuitive. In a multithreaded scenario using IOCPs, the control flow of a thread function is not straightforward, because there is no relationship between threads and communications. In this section, we present four different problems that can occur while developing client/server applications using IOCPs. They are:

  • The WSAENOBUFS error problem.
  • The package reordering problem.
  • The asynchronous pending reads and byte chunk package processing problem.
  • The access violation problem.

3.6.1 The WSAENOBUFS error problem

This problem is non-intuitive and difficult to detect, because at first sight it looks like a normal deadlock or a memory leak "bug". Assume that you have developed your server and everything runs fine. When you stress test the server, it suddenly hangs. If you are lucky, you can find out that it has something to do with the WSAENOBUFS error.

With every overlapped send or receive operation, it is possible that the data buffer submitted will be locked. When memory is locked, it cannot be paged out of physical memory. The operating system imposes a limit on the amount of memory that can be locked. When this limit is reached, the overlapped operations will fail with the WSAENOBUFS error.

If a server posts many overlapped receives on each connection, this limit will be reached as the number of connections grows. If a server anticipates handling a very high number of concurrent clients, it can post a single zero-byte receive on each connection. Because there is no buffer associated with the receive operation, no memory needs to be locked. With this approach, the per-socket receive buffer should be left intact, because once the zero-byte receive operation completes, the server can simply perform a non-blocking receive to retrieve all the data buffered in the socket's receive buffer. There is no more data pending when the non-blocking receive fails with WSAEWOULDBLOCK. This design suits a server that requires the maximum possible number of concurrent connections, at the cost of data throughput on each connection. Of course, the more you know about how the clients interact with the server, the better. In the previous example, a non-blocking receive is performed once the zero-byte receive completes, to retrieve the buffered data. If the server knows that clients send data in bursts, then once the zero-byte receive completes, it may post one or more overlapped receives in case the client sends a substantial amount of data (greater than the per-socket receive buffer, which is 8 KB by default).

A simple practical solution to the WSAENOBUFS error problem is provided in the source code. We perform an asynchronous WSARecv(..) (see OnZeroByteRead(..)) with a zero-byte buffer. When this call completes, we know that there is data in the TCP/IP stack, and we read it by performing several asynchronous WSARecv(..) calls with a buffer of MAXIMUMPACKAGESIZE. This solution locks physical memory only when data arrives, and solves the WSAENOBUFS problem. However, it decreases the throughput of the server (see Q6 and A6 in section 9, F.A.Q).

3.6.2 The package reordering problem

This problem is also discussed in [3]. Although operations committed through the I/O completion port are always completed in the order in which they were submitted, thread scheduling issues may mean that the actual work associated with the completions is processed in an undefined order. For example, if you have two I/O worker threads and you should receive "byte chunk 1, byte chunk 2, byte chunk 3", you may process the byte chunks in the wrong order, namely "byte chunk 2, byte chunk 1, byte chunk 3". This also means that when you send data by posting a send request on the I/O completion port, the data can actually be sent in a reordered way.

This can be solved by only using one worker thread, and committing only one I/O call and waiting for it to finish, but if we do this, we lose all the benefits of IOCP.

A simple practical solution to this problem is to add a sequence number to our buffer class, and to process the data in a buffer only if its sequence number is the next one in order. This means that buffers that arrive out of order have to be saved for later use; for performance reasons, we save them in a hash map object (e.g., m_SendBufferMap and m_ReadBufferMap).

To get more information about this solution, please go through the source code, and take a look at the following functions in the IOCPS class:

  • GetNextSendBuffer(..) and GetNextReadBuffer(..), to get the ordered send or receive buffer.
  • IncreaseReadSequenceNumber(..) and IncreaseSendSequenceNumber(..), to increase the sequence numbers.
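
The sequence-number idea can be sketched independently of IOCP. The class below is a minimal illustration, not the IOCPS implementation: SequenceReorderer and its members are hypothetical names, chunks are plain strings, and sequence-number wraparound is ignored. Early arrivals are parked in a hash map until the chunk that precedes them has been delivered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Releases chunks strictly in sequence-number order, parking early arrivals.
class SequenceReorderer {
public:
    // Accepts a chunk tagged with its sequence number and returns every
    // chunk that is now ready to be processed, in order.
    std::vector<std::string> push(std::uint32_t sequence, std::string chunk) {
        std::vector<std::string> ready;
        if (sequence < next_) return ready;  // stale or duplicate: ignore
        parked_.emplace(sequence, std::move(chunk));
        auto it = parked_.find(next_);
        while (it != parked_.end()) {
            ready.push_back(std::move(it->second));
            parked_.erase(it);
            ++next_;
            it = parked_.find(next_);
        }
        return ready;
    }

    std::size_t parkedCount() const { return parked_.size(); }

private:
    std::uint32_t next_ = 0;  // next sequence number expected
    std::unordered_map<std::uint32_t, std::string> parked_;  // early chunks
};
```

In a multithreaded server, one such object per connection and per direction would have to be protected by that connection's lock.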

3.6.3 Asynchronous pending reads and byte chunk package processing problem

The most common server protocol is a packet based protocol where the first X bytes represent a header and the header contains details of the length of the complete packet. The server can read the header, work out how much more data is required, and keep reading until it has a complete packet. This works fine when the server is making one asynchronous read call at a time. But if we want to use the IOCP server's full potential, we should have several pending asynchronous reads waiting for data to arrive. This means that several asynchronous reads complete out of order (as discussed before in section 3.6.2), and byte chunk streams returned by the pending reads will not be processed in order. Furthermore, a byte chunk stream can contain one or several packages and also partial packages, as shown in figure 1.

Figure 1. The figure shows how partial packages (green) and complete packages (yellow) can arrive asynchronously in different byte chunk streams (marked 1, 2, 3).

This means that we have to process the byte stream chunks in order to successfully read a complete package. Furthermore, we have to handle partial packages (marked with green in figure 1). This makes the byte chunk package processing more difficult. The full solution to this problem can be found in the ProcessPackage(..) function in the IOCPS class.
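
As an illustration of the package processing step, here is a minimal sketch of a reassembler for length-prefixed packages. It is not the ProcessPackage(..) implementation: the wire format (a 4-byte big-endian payload length followed by the payload) is an assumption made for the example, PackageAssembler and frame are hypothetical names, and the chunks are assumed to be fed in the correct order already.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed header size: 4 bytes holding the payload length (big-endian).
constexpr std::size_t kHeaderSize = 4;

// Reassembles length-prefixed packages from an ordered stream of chunks.
class PackageAssembler {
public:
    // Appends one chunk and returns every package completed by it.
    std::vector<std::string> feed(const std::string& chunk) {
        pending_ += chunk;
        std::vector<std::string> packages;
        while (pending_.size() >= kHeaderSize) {
            const std::size_t length = payloadLength();
            if (pending_.size() < kHeaderSize + length) break;  // partial
            packages.push_back(pending_.substr(kHeaderSize, length));
            pending_.erase(0, kHeaderSize + length);
        }
        return packages;
    }

    std::size_t bufferedBytes() const { return pending_.size(); }

private:
    // Decodes the length stored in the first kHeaderSize bytes.
    std::size_t payloadLength() const {
        std::uint32_t length = 0;
        for (std::size_t i = 0; i < kHeaderSize; ++i)
            length = (length << 8) | static_cast<unsigned char>(pending_[i]);
        return length;
    }

    std::string pending_;  // bytes received but not yet consumed
};

// Demo helper: prefixes a payload with the assumed header.
inline std::string frame(const std::string& payload) {
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((n >> shift) & 0xFF));
    return out + payload;
}
```

A production version would also reject lengths above a maximum package size, so that a corrupt or hostile header cannot make the pending buffer grow without bound.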

3.6.4 The access violation problem

This is a minor problem, and is a result of the design of the code rather than an IOCP-specific problem. Suppose that a client connection is lost and an I/O call returns with an error flag; then we know that the client is gone. In the CompletionKey parameter, we pass a pointer to a ClientContext structure that contains client-specific data. What happens if we free the memory occupied by this ClientContext structure, and then another I/O call made for the same client returns with an error code, and we cast the CompletionKey (a DWORD) to a pointer to ClientContext and try to access or delete it? An access violation occurs!

The solution to this problem is to add a counter to the structure that holds the number of pending I/O calls (m_nNumberOfPendlingIO), and to delete the structure only when we know that there are no more pending I/O calls. This is done by the EnterIoLoop(..) and ReleaseClientContext(..) functions.
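
The counting scheme can be sketched as follows. This is a simplified model under stated assumptions, not the code of EnterIoLoop(..) or ReleaseClientContext(..): PendingIoTracker and its member functions are hypothetical names, and the caller is responsible for actually freeing the context when told to.

```cpp
#include <cassert>
#include <mutex>

// Tracks outstanding I/O calls for one client so that the context is only
// released after the last completion has been handled.
class PendingIoTracker {
public:
    // Call before issuing an asynchronous I/O call. Returns false if the
    // connection is already closed, in which case no I/O must be issued.
    bool enterIo() {
        std::lock_guard<std::mutex> guard(lock_);
        if (closed_) return false;
        ++pending_;
        return true;
    }

    // Call when a completion (successful or failed) has been handled.
    // Returns true at most once: when the caller may free the context.
    bool exitIo() {
        std::lock_guard<std::mutex> guard(lock_);
        --pending_;
        return readyToRelease();
    }

    // Call when the connection is detected as lost. Returns true if
    // nothing is pending and the caller may free the context right away.
    bool close() {
        std::lock_guard<std::mutex> guard(lock_);
        closed_ = true;
        return readyToRelease();
    }

private:
    // Must be called with lock_ held.
    bool readyToRelease() {
        if (!closed_ || pending_ != 0 || released_) return false;
        released_ = true;
        return true;
    }

    std::mutex lock_;
    int  pending_  = 0;      // number of I/O calls still in flight
    bool closed_   = false;  // connection lost or shut down
    bool released_ = false;  // release has already been signalled
};
```

The point of the design is that whichever thread observes the last completion after the close is the only one allowed to delete the context, so a late completion can never dereference freed memory.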

3.7 The overview of the source code

The goal of the source code is to provide a set of simple classes that handle all the troublesome code that has to do with IOCP. The source code also provides a set of functions that are frequently used in communication and client/server software, such as file receiving/transferring functions, logical thread pool handling, etc.

Figure 2. The figure above illustrates an overview of the IOCP class source code functionality.

We have several IO worker threads that handle asynchronous I/O calls through the completion port (IOCP), and these workers call some virtual functions which can put requests that need a large amount of computation in a work queue. The logical workers take the job from the queue, and process it and send back the result by using some of the functions provided by the class. The Graphical User Interface (GUI) usually communicates with the main class using Windows messages (because MFC is not thread safe) and by calling functions or by using the shared variables.

Figure 3. The figure above shows the class overview.

The classes that can be observed in figure 3 are:

  • CIOCPBuffer: A class used to manage the buffers used by the asynchronous I/O calls.
  • IOCPS: The main class that handles all the communication.
  • JobItem: A structure which contains the job to be performed by the logical worker threads.
  • ClientContext: A structure that holds client specific information (status, data, etc.).

3.7.1 The buffer design – The CIOCPBuffer class

When using asynchronous I/O calls, we have to provide a private buffer to be used with the I/O operation. There are some considerations that are to be taken into account when we allocate buffers to use:

  • Allocating and freeing memory is expensive, so we should reuse buffers (memory) that have already been allocated. We therefore keep buffers in the linked list structures given below:
    // Free buffer list.
    CCriticalSection m_FreeBufferListLock;
    CPtrList m_FreeBufferList;
    // Occupied buffer list (buffers that are currently in use).
    CCriticalSection m_BufferListLock;
    CPtrList m_BufferList;
    // Now we use the function AllocateBuffer(..)
    // to allocate memory or reuse a buffer.
  • Sometimes, when an asynchronous I/O call completes, we may have partial packages in the buffer, so we need to split the buffer to get a complete message. This is done by the SplitBuffer function in the IOCPS class. Sometimes we also need to copy information between buffers, and this is done by the AddAndFlush(..) function in the IOCPS class.
  • As we know, we also need to add a sequence number and a state (IOType variable, IOZeroReadCompleted, etc.) to our buffer.
  • We also need methods to convert data to byte stream and byte stream to data, some of these functions are also provided in the CIOCPBuffer class.

All the solutions to the problems we have discussed above exist in the CIOCPBuffer class.
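
The buffer reuse idea from the first point can be sketched in standard C++. This is an illustration only, not the CIOCPBuffer or AllocateBuffer(..) code: BufferPool and PooledBuffer are hypothetical names, the 4096-byte buffer size is an arbitrary assumption, and the limit follows the convention the article gives for iMaxNumberOfFreeBuffer (-1 keeps nothing, 0 keeps an unlimited number).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// A hypothetical fixed-size I/O buffer.
struct PooledBuffer {
    std::vector<char> bytes = std::vector<char>(4096);
    std::size_t used = 0;
};

// Recycles buffers instead of freeing them.
// maxFree: -1 keeps nothing, 0 keeps an unlimited number of free buffers.
class BufferPool {
public:
    explicit BufferPool(int maxFree) : maxFree_(maxFree) {}

    // Reuses a free buffer if one exists, otherwise allocates a new one.
    std::unique_ptr<PooledBuffer> allocate() {
        std::lock_guard<std::mutex> guard(lock_);
        if (free_.empty())
            return std::unique_ptr<PooledBuffer>(new PooledBuffer());
        std::unique_ptr<PooledBuffer> buffer = std::move(free_.back());
        free_.pop_back();
        buffer->used = 0;  // reset state before reuse
        return buffer;
    }

    // Returns a buffer to the pool, or frees it if the pool is full.
    void release(std::unique_ptr<PooledBuffer> buffer) {
        std::lock_guard<std::mutex> guard(lock_);
        const bool keep = maxFree_ == 0 ||
            (maxFree_ > 0 &&
             free_.size() < static_cast<std::size_t>(maxFree_));
        if (keep) free_.push_back(std::move(buffer));
        // Otherwise the unique_ptr frees the buffer on scope exit.
    }

    std::size_t freeCount() const {
        std::lock_guard<std::mutex> guard(lock_);
        return free_.size();
    }

private:
    const int maxFree_;
    mutable std::mutex lock_;
    std::vector<std::unique_ptr<PooledBuffer>> free_;
};
```

An unlimited free list never returns memory to the system, which trades a larger steady-state footprint for fewer allocations under load.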

3.8 How to use the source code?

By inheriting your own class from IOCPS (shown in figure 3) and using the virtual functions and the functionality provided by the IOCPS class (e.g., the thread pool), it is possible to implement any type of server or client that can efficiently manage a huge number of connections using only a small number of threads.

3.8.1 Starting and closing the server/client

To start the server, call the function:

BOOL Start(int nPort=999,int iMaxNumConnections=1201,
   int iMaxIOWorkers=1,int nOfWorkers=1,
   int iMaxNumberOfFreeBuffer=0,
   int iMaxNumberOfFreeContext=0,
   BOOL bOrderedSend=TRUE, 
   BOOL bOrderedRead=TRUE,
   int iNumberOfPendlingReads=4);
  • nPort

    Is the port number that the server will listen on. (Let it be -1 for client mode.)

  • iMaxNumConnections

    Maximum number of connections allowed. (Use a big prime number.)

  • iMaxIOWorkers

    Number of Input/Output worker threads.

  • nOfWorkers

    Number of logical workers. (Can be changed at runtime.)

  • iMaxNumberOfFreeBuffer

    Maximum number of buffers that we save for reuse. (-1 for none, 0= Infinite number)

  • iMaxNumberOfFreeContext

    Maximum number of client information objects that are saved for reuse. (-1 for none, 0= Infinite number)

  • bOrderedRead

    Make sequential reads. (We have discussed this before in section 3.6.2.)

  • bOrderedSend

    Make sequential writes. (We have discussed this before in section 3.6.2.)

  • iNumberOfPendlingReads

    Number of pending asynchronous read loops that are waiting for data.

To connect to a remote server (client mode, nPort = -1), call the function:

Connect(const CString &strIPAddr, int nPort)
  • strIPAddr

    The IP address of the remote server.

  • nPort

    The port.

To shut down the server or client, call the function ShutDown().

For example:

MyIOCP m_iocp;
if (!m_iocp.Start(-1, 1210, 2, 1, 0, 0))
    AfxMessageBox("Error could not start the Client");
// ….
m_iocp.ShutDown();

4.1 Source code description

For more details about the source code, please check the comments in the source code.

4.1.1 Virtual functions

  • NotifyNewConnection

    Called when a new connection has been established.

  • NotifyNewClientContext

    Called when an empty ClientContext structure is allocated.

  • NotifyDisconnectedClient

    Called when a client disconnects.

  • ProcessJob

    Called when logical workers want to process a job.

  • NotifyReceivedPackage

    Notifies that a new package has arrived.

  • NotifyFileCompleted

    Notifies that a file transfer has finished.

4.1.2 Important variables

Notice that every shared variable has to be exclusively locked by any function that uses it; this is important to avoid access violations and overlapping writes. Every variable named XXX that needs to be locked has a corresponding XXXLock variable.

  • ContextMap m_ContextMap (protected by m_ContextMapLock)

    Holds all the client data (socket, client data, etc.).

  • m_NumberOfActiveConnections

    Holds the number of active connections.

4.1.3 Important functions

  • GetNumberOfConnections()

    Returns the number of connections.

  • CString GetHostAdress(ClientContext* p)

    Returns the host address, given a client context.

  • BOOL ASendToAll(CIOCPBuffer *pBuff);

    Sends the content of the buffer to all the connected clients.

  • DisconnectClient(CString sID)

    Disconnects a client, given the unique identification number.

  • CString GetHostIP()

    Returns the local IP address.

  • JobItem* GetJob()

    Removes a JobItem from the queue, returns NULL if there are no Jobs.

  • BOOL AddJob(JobItem *pJob)

    Adds a Job to the queue.

  • BOOL SetWorkers(int nThreads)

    Sets the number of logical workers. Can be called at any time.

  • DisconnectAll();

    Disconnect all the clients.

  • ARead(…)

    Makes an asynchronous read.

  • ASend(…)

    Makes an asynchronous send. Sends data to a client.

  • ClientContext* FindClient(CString strClient)

    Finds a client given a string ID. Note: not thread safe!

  • DisconnectClient(ClientContext* pContext, BOOL bGraceful=FALSE);

    Disconnects a client.

  • StartSendFile(ClientContext *pContext)

    Sends a file specified in the ClientContext structure, using the optimized TransmitFile(..) function.

  • PrepareReceiveFile(..)

    Prepares the connection for receiving a file. When you call this function, all incoming byte streams are written to a file.

  • PrepareSendFile(..)

    Opens a file and sends a package containing information about the file to the remote connection. The function also disables the ASend(..) function until the file is transmitted or aborted.

  • DisableSendFile(..)

    Disables send file mode.

  • DisableRecevideFile(..)

    Disables receive file mode.

5.1 File transfer

File transfer is done by using the Winsock 2.0 TransmitFile function. The TransmitFile function transmits file data over a connected socket handle. This function uses the operating system's cache manager to retrieve file data, and provides high-performance file data transfer over sockets. These are some important aspects of asynchronous file transferring:

  • Until the TransmitFile call has completed, no other sends or writes to the socket should be performed, because this would corrupt the file. Therefore, all calls to ASend are disabled after the PrepareSendFile(..) function has been called.
  • Since the operating system reads the file data sequentially, you can improve caching performance by opening the file handle with FILE_FLAG_SEQUENTIAL_SCAN.
  • We use kernel asynchronous procedure calls while sending the file (TF_USE_KERNEL_APC). Use of TF_USE_KERNEL_APC can deliver significant performance benefits. It is possible (though unlikely), however, that the thread in whose context TransmitFile is initiated is being used for heavy computations; this situation may prevent APCs from launching.

The file transfer is made in this order: the server initiates the file transfer by calling the PrepareSendFile(..) function. When the client receives the information about the file, it prepares for it by calling PrepareReceiveFile(..), and sends a package to the server to start the file transfer. When the package arrives at the server side, the server calls the StartSendFile(..) function, which uses the high-performance TransmitFile function to transmit the specified file.

6 The source code example

The provided source code example is an echo client/server that also supports file transmission (figure 4). In the source code, a class MyIOCP inherited from IOCPS handles the communication between the client and the server by using the virtual functions mentioned in section 4.1.1.

The most important part of the client or server code is the virtual function NotifyReceivedPackage, as described below:

void MyIOCP::NotifyReceivedPackage(CIOCPBuffer *pOverlapBuff,
                           int nSize, ClientContext *pContext)
{
    BYTE PackageType = pOverlapBuff->GetPackageType();
    switch (PackageType)
    {
    case Job_SendText2Client:
        Packagetext(pOverlapBuff, nSize, pContext);
        break;
    case Job_SendFileInfo:
        PackageFileTransfer(pOverlapBuff, nSize, pContext);
        break;
    case Job_StartFileTransfer:
        PackageStartFileTransfer(pOverlapBuff, nSize, pContext);
        break;
    case Job_AbortFileTransfer:
        DisableSendFile(pContext);
        break;
    }
}

The function handles an incoming message and performs the request sent by the remote connection. In this case, it is only a matter of a simple echo or file transfer. The source code is divided into two projects, IOCP and IOCPClient, which are the server and the client side of the connection.

6.1 Compiler issues

When compiling with VC++ 6.0 or .NET, you may get some strange errors related to the CFile class, such as:

if (pContext->m_File.m_hFile != INVALID_HANDLE_VALUE)
   <- error C2446: '!=' : no conversion from 'void *' to 'unsigned int'

These problems can be solved by updating the header files (*.h) or your VC++ 6.0 version, or simply by fixing the type conversion with a cast. After some modifications, the server/client source code can be used without MFC.

7 Special considerations & rules of thumb

When you use this code in other types of applications, there are some programming traps related to this source code and to multithreaded programming that can be avoided. Nondeterministic errors are errors that occur stochastically ("randomly"), and they are hard to reproduce by performing the same sequence of tasks that triggered the error. These are the worst types of errors, and they usually occur because of errors in the core design or implementation of the source code. When the server is running multiple I/O worker threads serving connected clients, nondeterministic errors such as access violations can occur if the programmer has not considered the multithreaded environment of the source code.

Rule of thumb #1:

Never read from or write to the client context (e.g., ClientContext) without locking it using the context lock, as in the example below. The notification functions (e.g., Notify*(ClientContext *pContext)) are already "thread safe", so inside them you can access the members of ClientContext without locking and unlocking the context.

// Do not do it in this way.
if (pContext->m_bSomeData)
    pContext->m_iSomeData = 0;

// Do it in this way.
// ….
pContext->m_ContextLock.Lock();
if (pContext->m_bSomeData)
    pContext->m_iSomeData = 0;
pContext->m_ContextLock.Unlock();

Also, be aware that while you hold the lock on a context, other threads or the GUI may be waiting for it.

Rule of thumb #2:

Avoid or "use with special care" code that has complicated "context locks" or other types of locks inside a “context lock”, because this may lead to a “deadlock” (e.g., A waiting for B that is waiting for C that is waiting for A => deadlock).

pContext->m_ContextLock.Lock();
// … code code ..
pContext2->m_ContextLock.Lock();
// code code..
pContext2->m_ContextLock.Unlock();
// code code..
pContext->m_ContextLock.Unlock();

The code above may cause a deadlock, for example if another thread locks pContext2 first and then waits for pContext.
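
When two context locks really must be held at the same time, one common way to avoid this kind of deadlock is to acquire both through a deadlock-avoiding primitive instead of nesting them by hand. The sketch below uses std::lock from the C++ standard library rather than the MFC CCriticalSection used in the source code; LockedContext and moveValue are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for ClientContext with its own lock.
struct LockedContext {
    std::mutex lock;
    int value = 0;
};

// Moves `amount` from one context to another while holding both locks.
// std::lock acquires the two mutexes with a deadlock-avoidance algorithm,
// so it does not matter in which order different threads name them.
inline void moveValue(LockedContext& from, LockedContext& to, int amount) {
    if (&from == &to) return;  // never lock the same mutex twice
    std::lock(from.lock, to.lock);
    // Adopt the already-held locks so they are released on scope exit.
    std::lock_guard<std::mutex> fromGuard(from.lock, std::adopt_lock);
    std::lock_guard<std::mutex> toGuard(to.lock, std::adopt_lock);
    from.value -= amount;
    to.value += amount;
}
```

An alternative with plain critical sections is to always lock contexts in one fixed global order, for example by ascending address.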

Rule of thumb #3:

Never access a client context outside the notification functions (e.g., Notify*(ClientContext *pContext)). If you do, you have to enclose it with m_ContextMapLock.Lock();m_ContextMapLock.Unlock();. See the source code below.

ClientContext* pContext = NULL;
m_ContextMapLock.Lock();
pContext = FindClient(ClientID);
// Safe to access pContext here, if it is not NULL
// and it is locked (rule of thumb #1).
// code .. code..
m_ContextMapLock.Unlock();
// Here pContext can suddenly disappear because of a disconnect.
// Do not access pContext members here.
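
The same rule can be packaged so that a context pointer never escapes the locked region. The following sketch uses standard C++ types in place of the MFC ones; ClientRegistry, DemoClient and withClient are hypothetical names and not part of the IOCPS class.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for ClientContext.
struct DemoClient {
    std::mutex lock;
    int someData = 0;
};

class ClientRegistry {
public:
    void add(const std::string& id) {
        std::lock_guard<std::mutex> mapGuard(mapLock_);
        clients_[id].reset(new DemoClient());
    }

    void remove(const std::string& id) {
        std::lock_guard<std::mutex> mapGuard(mapLock_);
        clients_.erase(id);
    }

    // Runs `action` on the client while the map lock (rule of thumb #3)
    // and the client's own lock (rule of thumb #1) are held. Returns false
    // if the client is not, or is no longer, connected.
    template <typename Action>
    bool withClient(const std::string& id, Action action) {
        std::lock_guard<std::mutex> mapGuard(mapLock_);
        auto it = clients_.find(id);
        if (it == clients_.end()) return false;
        std::lock_guard<std::mutex> clientGuard(it->second->lock);
        action(*it->second);
        return true;
    }

private:
    std::mutex mapLock_;
    std::unordered_map<std::string, std::unique_ptr<DemoClient>> clients_;
};
```

The action must be short and must not call back into the registry, because the map lock is held for its whole duration.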

8 Future work

In future, the source code will be updated to have the following features in chronological order:

  1. An implementation of the AcceptEx(..) function for accepting new connections will be added to the source code, to handle bursts of short-lived connections and DoS attacks.
  2. The source code will be made portable to other platforms such as Win32, STL, and WTL.

9 F.A.Q

Q1: The amount of memory used by the server program rises steadily as the number of client connections increases, as seen using the 'Windows Task Manager'. Even if clients disconnect, the amount of memory used does not decrease. What's the problem?

A1: The code tries to reuse the allocated buffers instead of releasing and reallocating them. You can change this by altering the parameters iMaxNumberOfFreeBuffer and iMaxNumberOfFreeContext. Please review section 3.8.1.

Q2: I get compilation errors under .NET: "error C2446: '!=' : no conversion from 'unsigned int' to 'HANDLE'" etc.. What is the problem?

A2: This is because of the different header versions of the SDK. Just cast the value to HANDLE to satisfy the compiler. You can also just remove the line #define TRANSFERFILEFUNCTIONALITY and try to compile.

Q3: Can the source code be used without MFC? Pure Win32 and in a service?

A3: The code was developed to be used with a GUI for a short time (not days or years). I developed this client/server solution for use with GUIs in an MFC environment. Of course, you can use it for normal server solutions. Many people have. Just remove the MFC-specific stuff such as CString, CPtrList, etc., and replace them with Win32 classes. I don’t like MFC either, so send me a copy when you change the code. Thanks.

Q4: Excellent work! Thank you for this. When will you implement AcceptEx(..) instead of the connection listener thread?

A4: As soon as the code is stable. It is quite stable right now, but I know that the combination of several I/O workers and several pending reads may cause some problems. I am glad that you like my code. Please vote!

Q5: Why start several I/O workers? Is this necessary if you don’t have a true multiprocessor computer?

A5: No, it is not necessary to have several I/O workers. Just one thread can handle all the connections. On common home computers, one I/O worker gives the best performance. You do not need to worry about possible access violation threats either. But as computers are getting more powerful each day (e.g., hyperthreading, dual-core, etc.), why not have the possibility to have several threads? :=)

Q6: Why use several pending reads? What is it good for?

A6: That depends on the server development strategy adopted by the developer, namely “many concurrent connections” vs. “high throughput server”. Having multiple pending reads increases the throughput of the server because the TCP/IP packages are written directly into the passed buffer instead of into the TCP/IP stack (no double-buffering). If the server knows that clients send data in bursts, pending reads increase the performance (high throughput). However, every pending receive operation (with WSARecv()) forces the kernel to lock the receive buffers into the non-paged pool. This may lead to a WSAENOBUFS error when the physical memory is full (many concurrent connections). Pending reads/writes have to be used carefully, and aspects such as “page size on the architecture” and “the amount of non-paged pool (1/4 of the physical memory)” have to be taken into consideration. Furthermore, if you have more than one I/O worker, the order of packages is broken (because of the IOCP structure), and the extra work needed to maintain the order makes it unnecessary to have multiple pending reads. In this design, multiple pending reads are turned off when the number of I/O workers is greater than one, because the implementation cannot handle the reordering. (The sequence number must exist in the payload instead.)

Q7: In the previous article, you stated that we have to implement memory management using the VirtualAlloc function instead of new, why have you not implemented it?

A7: When you allocate memory with new, the memory is allocated in virtual memory or in physical memory. Where the memory is allocated is unknown, and an allocation can straddle two pages. This means that we load more memory than necessary into physical memory when we access a certain piece of data (if we use new). Furthermore, you do not know whether the allocated memory is in physical or in virtual memory, and you cannot tell the system when "writing back" to the hard disk is unnecessary (if we no longer care about the data in memory). But be aware! Any new allocation using VirtualAlloc* is rounded up to a 64 KB boundary (the allocation granularity), so if you allocate a new VAS region bound to physical memory, the OS consumes an amount of physical memory rounded up to the page size, and consumes the VAS of the process rounded up to a 64 KB boundary. Using VirtualAlloc can be difficult: new and malloc use VirtualAlloc internally, but every time you allocate memory with new/delete, a lot of other computation is done, and you do not have the control to place related data nicely inside the same page (without straddling two pages). However, heaps are best for managing large numbers of small objects, and I shall change the source code so that it only uses new/delete, for the sake of code cleanness. I have found that the performance gain is too small compared to the added complexity of the source code.
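The rounding mentioned in A7 can be made concrete with a little arithmetic. The sketch below assumes a 4 KB page size and a 64 KB allocation granularity (common values on x86 Windows; a real program should query them with GetSystemInfo). RoundUp, Footprint, and EstimateFootprint are hypothetical helper names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed values for the illustration; query the real ones at run time.
const std::size_t kPageSize = 4 * 1024;
const std::size_t kAllocationGranularity = 64 * 1024;

// Rounds n up to the next multiple of unit.
inline std::size_t RoundUp(std::size_t n, std::size_t unit) {
    assert(unit != 0);
    return (n + unit - 1) / unit * unit;
}

struct Footprint {
    std::size_t committedBytes;  // physical memory, rounded up to the page size
    std::size_t reservedBytes;   // address space, rounded up to the granularity
};

// Estimates what one separate VirtualAlloc-style allocation would consume.
inline Footprint EstimateFootprint(std::size_t requestedBytes) {
    Footprint f;
    f.committedBytes = RoundUp(requestedBytes, kPageSize);
    f.reservedBytes = RoundUp(requestedBytes, kAllocationGranularity);
    return f;
}
```

With these assumed values, a 100-byte request made as its own allocation would commit 4 KB and reserve 64 KB of address space, which is why a heap is the better fit for many small objects.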

10 References

11 Revision History

  • Version 1.0 - 2005-05-10
    • Initial public release.
  • Version 1.1 - 2005-06-13
    • Fixed some memory leakage (e.g., ~CIOCPBuffer()).
    • TransmitFile is now optional in the source code (by using #define TRANSFERFILEFUNCTIONALITY).
    • Some extra functions are added (by using #define SIMPLESECURITY).
  • Version 1.11 - 2005-06-18
    • Changes in IOCPS::ProcessPackage(…) to avoid access violation.
    • Error in CIOCPBuffer::Flush(..) fixed.
    • Changes in IOCPS::Connect(..) to release socket when an error occurs.
  • Version 1.12 - 2005-11-29
    • Changes in IOCPS::OnWrite(….) to avoid entering an infinite loop.
    • Changes in OnRead(…) and OnZeroByteRead (…) to avoid access violation if memory is full and AllocateBuffer fails.
    • Changes in OnReadCompleted(…) to avoid access violation.
    • Changes in AcceptIncomingClient(..) to better handle a new connection when the maximum number of connections is reached.
  • Version 1.13 - 2005-12-29
    • ReleaseBuffer(…) added to ARead(..), ASend(..), AZeroByteRead(..) to avoid memory leakage.
    • Changes in DisconnectClient(…) and ReleaseClientContext(…) to avoid “duplicate key” error when clients rapidly connect and disconnect.
    • Changes in IOWorkerThreadProc(…), OnWrite(ClientContext *pContext,…), etc. to avoid buffer leakage.
    • Changes in DisconnectClient( unsigned int iID) to avoid access violation.
    • Added EnterIOLoop(..)/ExitIPLoop(..) to StartSendFile(..) and OnTransmitFileCompleted(..) to avoid access violation.
    • Some unessential error messages removed from the release mode, and additional debug information (e.g., TRACE(..) ) added to the source code in debug mode.
    • The function AcceptIncomingClients(..) changed and replaced with AssociateIncomingClientWithContext(..).
    • The Connect(..) function now uses AssociateIncomingClientWithContext(..).
    • Transfer file functions are now completely optional by making #define TRANSFERFILEFUNCTIONALITY.
    • Changes in DisableSendFile(..) and other file transfer functions, to avoid access violation.
    • Some unnecessary functions and comments removed from the source code. Appropriate functions are now made private, protected, or public.
    • Several functions are now “inlined” to avoid the overhead of calling a function and for gaining performance.
    • Removed and replaced EnterIOLoop(..) and other code in OnWrite(ClientContext *pContext,…) to avoid access violations. Information is in the source code.
    • Added "random disconnect" to demo server, and "auto reconnect" to demo client, plus additional cleanup in the demo project, and I now follow my own advice. :=)
  • Version 1.14 - 2006-02-18
    • Changes in IOWorkerThreadProc(LPVOID pParam) to avoid memory leakage (e.g., new ClientContext) on shutdown ("bug" detected by Maxim Y. Mluhov).
    • Small changes in OnReadCompleted(..).
    • Small change in IOCPS::DisconnectIfIPExist(..), to gain performance (fix by spring).
    • Small change in CIOCPBuffer::Flush(...) (fix by spring).
    • When using multiple pending reads (e.g., m_iNumberOfPendlingReads>1) with multiple I/O workers (e.g., m_iMaxIOWorkers>1), the order of the packages is broken. Temporary fix added to IOCPS::startup() (e.g., if(m_iMaxIOWorkers>1) m_iNumberOfPendlingReads=1;).
    • Updated section 8 and 6.3.2 in the article.
  • Version 1.15 - 2006-06-19
    • Changes in the CIOCPBuffer class and AllocateBuffer(..). Now, all the memory allocation/deallocation is done on the heap using new/delete, and VirtualAlloc(..) is not used (read question 7 for more information).
    • Changes in IOCPS::OnInitialize(..) to avoid WSAENOBUFS, exchanged the order of the ARead(..), AZeroByteRead(..).
    • Multiple pending read removed when multiple I/O workers are used. (Temporary fix is now permanent fix, read A6 and Q6.)
    • The #define SIMPLESECURITY functions are now used inside ConnectAcceptCondition(..) with the SO_CONDITIONAL_ACCEPT option and WSAAccept(..), increasing security. We can refuse connections at a lower level in the kernel (no ACK is sent, so the attacker thinks that the server is down).
    • IsAlreadyConnected(..) and IsInBannedList(..) replace DisconnectIfIPExist(..) and DisconnectIfBanned(..) as an optimization; the IP comparison now uses sockaddr_in instead of a string compare.
  • Version 1.16 - 2008-12-08
    • FIX: Changes in IOCPS::GetNextReadBuffer(ClientContext *pContext, CIOCPBuffer *pBuff) to avoid memory leak with CMap.
    • FIX: AllocateBuffer() can return NULL pointer in the following code in method IOCPS::ARead()
    • FIX: Socket leakage in IOCPS::AssociateIncomingClientWithContext(SOCKET clientSocket) on shutdown.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Amin Gholiha.
Education:
- Master of Science in Information Technology.
- Degree of Master of Education.
Knowledge/interest: programming (.NET, Visual, C#/C++), neural network, mathematical modeling, signal processing, sequence analysis, pattern recognition, robot technology, system design, security and business management systems. For business proposals, email Gholiha@rocketmail.com; all other emails will be ignored.
Current Work:
Project Manager / Developer

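------------------------------------------------------------------ Translation ------------------------------------------------------------------

Quoted from: http://www.vckbase.com/vckbase/default.aspx

The source code uses the advanced I/O completion port (IOCP) technology, which can serve multiple clients efficiently. This article presents solutions to practical problems that arise in IOCP programming, and provides a simple echo client/server program that can also transfer files.

Program screenshot: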

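1.1 Requirements
The reader is expected to be familiar with C++, TCP/IP, socket programming, MFC, and multithreading.
The source code uses Winsock 2.0 and the IOCP technology, and requires:
  • Windows NT/2000 or later: requires Windows NT 3.5 or later.
  • Windows 95/98/ME: not supported.
  • Visual C++ .NET, or a fully updated Visual C++ 6.0.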

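1.2 Abstract
When you develop different types of software, sooner or later you will have to deal with client/server development. Writing comprehensive client/server code is a difficult task for a programmer. This document presents simple but powerful client/server source code that can be extended to any type of client/server application. The source code uses the advanced IOCP technology, which can serve multiple clients efficiently. IOCP offers a solution to the "one-thread-per-client" bottleneck, using only a few processing threads and asynchronous input/output for sending and receiving. IOCP is widely used in many kinds of high-performance servers, such as Apache. The source code also provides a set of functions that are frequently needed in communication and client/server software, such as file receiving/sending and logical thread pool management. This article focuses on practical solutions to problems that arise with the IOCP programming API, and also gives an overall documentation of the source code. In addition, a simple echo client/server that can handle multiple connections and file transfer is presented.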

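2.1 Introduction
This article presents a class that can be used in both client and server code. The class uses IOCP (Input Output Completion Ports) and asynchronous (non-blocking) function calls. …
With this simple source code, you can:
  • Service or connect to multiple clients and servers.
  • Send or receive files asynchronously.
  • Create and manage a logical worker thread pool to process heavy client/server requests or computations.

It is difficult to find comprehensive but simple source code for client/server communication. The source code found on the net is either too complex (20+ classes) or not efficient enough. This source code is designed to be as simple as possible and is well documented. In this article, we briefly present the IOCP technology provided by the Winsock API 2.0, explain the thorny problems that arise while coding, and give a solution to each of them.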

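2.2 Introduction to asynchronous completion ports

A server application is fairly meaningless if it cannot serve multiple clients at the same time; for this purpose, asynchronous I/O requests and multithreading are usually used. By definition, an asynchronous I/O request returns immediately, leaving the I/O request pending. At some point, the result of the asynchronous I/O request must be synchronized with the main thread. This can be done in several different ways:

> Using events: a signal is set as soon as the asynchronous request finishes. The disadvantage of this approach is that the thread has to check or wait for the event to be set.
> Using the GetOverlappedResult function: this approach has the same disadvantage as the one above.
> Using Asynchronous Procedure Calls (APC): this approach has several disadvantages. First, the APC is always called in the context of the calling thread; second, in order to execute APCs, the calling thread has to be suspended in an alertable wait state.
> Using IOCP: the disadvantage of this approach is that many practical and thorny programming problems have to be solved. Coding IOCP can be a bit of a hassle.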

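2.2.1 Why use IOCP?
By using IOCP, we can overcome the "one-thread-per-client" problem. It is commonly known that with that model, performance decreases heavily if the software does not run on a true multiprocessor machine. Threads are system resources, and they are neither unlimited nor cheap.

IOCP provides a way to have a few threads process the input/output of multiple clients "fairly". The threads are suspended and use no CPU cycles until there is something to do.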

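2.3 What is IOCP?
As we have seen, IOCP is simply a thread synchronization object, similar to a semaphore, so it is not a complicated concept. An IOCP object is associated with several I/O objects that support pending asynchronous I/O requests. A thread that has access to an IOCP can be suspended until a pending asynchronous I/O request completes.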

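3 How does IOCP work?
To use IOCP, you have to deal with three things: associating a socket with the completion port, making the asynchronous I/O requests, and synchronizing with the threads. To obtain the result of an asynchronous I/O request, such as which client made the request, you have to pass two parameters: the CompletionKey parameter and the OVERLAPPED structure.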

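3.1 The completion key parameter
The first parameter, CompletionKey, is a variable of type DWORD. You can pass any unique value you like, and that value will always be associated with the object. Normally, a pointer to a structure or class containing client-specific data is passed. In the source code, a pointer to a ClientContext structure is passed.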

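3.2 The OVERLAPPED parameter

This parameter is commonly used to pass the memory buffer used by the asynchronous I/O request. It is important to note that this data will be locked and cannot be paged out of physical memory.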

3.3 Associating a socket with the completion port
Once a completion port has been created, a socket can be associated with it by calling the CreateIoCompletionPort function.

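3.4 Making the asynchronous I/O request
To make the actual asynchronous request, the functions WSASend and WSARecv are called. They also take a WSABUF parameter, which contains a pointer to the buffer to be used. A rule of thumb is that when the server/client wants to perform an I/O operation, it normally does not do so directly; instead, the request is posted to the completion port and carried out by the I/O worker threads. The reason is that we want the CPU cycles to be partitioned fairly. The I/O requests are issued by posting a status to the completion port.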

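3.5 Synchronization with the threads

Synchronization with the I/O worker threads is done by calling the GetQueuedCompletionStatus function, which also returns the CompletionKey parameter and the OVERLAPPED parameter.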

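3.6 Four thorny IOCP coding problems and their solutions

Some problems arise when using IOCP, and some of them are not intuitive. In a multithreaded IOCP program, the control flow of a thread function is not straightforward, because there is no relationship between threads and communication. In this section, we describe four different problems that can occur while developing client/server applications with IOCP:

The WSAENOBUFS error problem.
The package reordering problem.
The asynchronous pending reads and byte chunk package processing problem.
The access violation problem.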



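3.6.1 The WSAENOBUFS error problem

This problem is usually hard to spot intuitively, because at first sight it looks like a memory leak. Suppose you have finished your completion port server and everything runs well, but when you stress-test it, the server suddenly hangs and stops processing requests. If you are lucky, you will soon find out that the WSAENOBUFS error is behind all this.

Every time we post an overlapped send or receive operation, the send or receive buffer specified in it is locked. Once a memory buffer is locked, it cannot be paged out of physical memory. The operating system imposes a limit on the amount of memory that may be locked, and when that limit is exceeded, the WSAENOBUFS error occurs.

If a server posts many overlapped receives on each connection, the limit is reached sooner as the number of connections grows. If a server can anticipate a very large number of concurrent connections, it can post a single zero-byte receive on each connection. Because no buffer is associated with the operation, no memory is locked. With this approach, when the zero-byte receive completes, the data is still intact in the socket's own receive buffer and has not been copied anywhere. The server can then simply call a non-blocking recv to read all the data buffered in the socket, until recv returns WSAEWOULDBLOCK. This design suits servers that need the largest possible number of concurrent connections and can sacrifice data throughput for it. Of course, the more you know about how the clients interact with the server, the better. In the example above, a non-blocking recv is used to read the data in the socket buffer once the zero-byte receive completes. If the server expects the data to arrive in bursts, it can instead post one or more overlapped receives at that point (this copes better with large bursts than relying on the default 8 KB per-socket receive buffer alone).

The source code provides a simple, practical solution to the WSAENOBUFS error. We post an asynchronous WSARecv(...) with a zero-byte buffer (see OnZeroByteRead(..)). When this request completes, we know that there is data in the TCP/IP stack, and we read it by posting several asynchronous WSARecv(...) calls with buffers of MAXIMUMPACKAGESIZE. This resolves the WSAENOBUFS problem, but it reduces the throughput of the server.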

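Summary:

Solution 1:

Post a receive with an empty (zero-byte) buffer; when it completes, read the actual data with a non-blocking recv. Each connection on the completion port therefore needs a loop that keeps posting zero-byte receives.

Solution 2:

After posting a few ordinary receives with buffers, immediately start posting zero-byte receives in a loop. Because the operations complete in the order in which they were posted, the locked memory is always unlocked again.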

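3.6.2 The package reordering problem
... Although operations posted to an I/O completion port always complete in the order in which they were submitted, thread scheduling may cause the actual work associated with the completions to be processed in an undefined order. For example, with two I/O worker threads you might receive "byte chunk 2, byte chunk 1, byte chunk 3". This also means that when you send data by posting send requests to the I/O completion port, the data may actually be sent in a reordered sequence.

This can be solved by using only one worker thread and posting only one I/O request at a time, waiting for it to complete. But if we do this, we lose the benefits of IOCP.

A simple, practical solution is to add a sequence number to our buffer class and to process the data in a buffer only if its sequence number is the expected one. Buffers with out-of-order numbers are saved for later use, and for performance reasons they are stored in a hash map object (e.g., m_SendBufferMap and m_ReadBufferMap).

For more information about this solution, please go through the source code and take a look at the following functions in the IOCPS class:

GetNextSendBuffer(..) and GetNextReadBuffer(..), to get the ordered send or receive buffer.
IncreaseReadSequenceNumber(..) and IncreaseSendSequenceNumber(..), to increase the sequence numbers.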
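The sequence-number idea can be sketched in portable C++ as follows. This is only an illustration of the technique, not the code of the IOCPS class: SequencedBuffer and ReorderQueue are hypothetical names, and std::map is used where the article's code uses a hash map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffer that carries a sequence number.
struct SequencedBuffer {
    std::uint32_t sequence;
    std::string data;
};

// Hands buffers out strictly in sequence order and parks early arrivals.
class ReorderQueue {
public:
    // Accepts a buffer in any order and returns every buffer that is now
    // ready to be processed, in sequence order (possibly none).
    std::vector<SequencedBuffer> Push(SequencedBuffer buffer) {
        const std::uint32_t sequence = buffer.sequence;
        pending_.emplace(sequence, std::move(buffer));
        std::vector<SequencedBuffer> ready;
        // Drain the run of consecutive sequence numbers starting at next_.
        for (auto it = pending_.find(next_); it != pending_.end();
             it = pending_.find(next_)) {
            ready.push_back(std::move(it->second));
            pending_.erase(it);
            ++next_;
        }
        assert(pending_.find(next_) == pending_.end());
        return ready;
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    std::uint32_t next_ = 0;                            // next expected number
    std::map<std::uint32_t, SequencedBuffer> pending_;  // out-of-order buffers
};
```

If the chunks arrive as 1, 2, 0, the first two calls return nothing and the third returns all three chunks in order. A real server would also need one such queue per connection and per direction, protected by a lock.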

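3.6.3 Asynchronous pending reads and the byte chunk package processing problem

The most common server protocol is package based: the first X bytes form a header, and the header contains the length of the complete package. The server can read the header, work out how much more data is needed, and keep reading until it has a complete package. This works fine when the server handles only one asynchronous read at a time. But if we want to use the full potential of an IOCP server, we should have several asynchronous reads pending, waiting for data to arrive. This means that the asynchronous reads complete out of order, and the byte chunk streams returned by the pending reads are not processed in order. Furthermore, a byte chunk stream can contain one or several packages, and also partial packages, as shown in the figure below:

The figure shows how partial packages (green) and complete packages (yellow) arrive asynchronously in different byte chunk streams.
This means that we have to process the byte stream in order to read a complete package successfully, and we have to handle partial packages (the green parts in the figure). This makes processing the byte stream more difficult. The complete solution to this problem can be found in the ProcessPackage(…) function of the IOCPS class.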
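As a minimal sketch of this kind of package assembly (and not a copy of ProcessPackage(…)), the class below collects byte chunks that are already in the right order and cuts complete packages out of them. The wire format is an assumption made for the example: a 4-byte little-endian header holding the payload length. PackageAssembler and kHeaderSize are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed header size: 4 bytes, little-endian payload length.
const std::size_t kHeaderSize = 4;

// Accumulates ordered byte chunks and extracts complete packages.
class PackageAssembler {
public:
    // Appends one received chunk and returns all packages completed by it.
    std::vector<std::string> Feed(const std::string& chunk) {
        stream_ += chunk;
        std::vector<std::string> packages;
        std::size_t offset = 0;
        while (stream_.size() - offset >= kHeaderSize) {
            const std::size_t length = ReadLength(offset);
            if (stream_.size() - offset - kHeaderSize < length) {
                break;  // partial package: wait for more data
            }
            packages.push_back(stream_.substr(offset + kHeaderSize, length));
            offset += kHeaderSize + length;
        }
        assert(offset <= stream_.size());
        stream_.erase(0, offset);  // keep only the unfinished tail
        return packages;
    }

private:
    // Decodes the little-endian length stored at the given offset.
    std::size_t ReadLength(std::size_t offset) const {
        std::uint32_t length = 0;
        for (std::size_t i = 0; i < kHeaderSize; ++i) {
            const unsigned char byte =
                static_cast<unsigned char>(stream_[offset + i]);
            length |= static_cast<std::uint32_t>(byte) << (8 * i);
        }
        return length;
    }

    std::string stream_;  // bytes received but not yet consumed
};
```

A production server would additionally reject lengths above a maximum package size, so that a malformed header cannot make the assembler buffer data without bound.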

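3.6.4 The access violation problem
This is a minor problem, and it is a consequence of the design of the code rather than an IOCP-specific issue. Suppose that a client connection has been lost and an I/O request returns with an error flag; we then know that the client is gone. In the CompletionKey parameter, we pass a pointer to a ClientContext structure that contains client-specific data. What happens if we free the memory occupied by this ClientContext structure, and another I/O request made by the same client then returns with an error code, so that we cast the CompletionKey parameter to a pointer to ClientContext and try to access or delete it? An access violation occurs!

The solution to this problem is to add a number to the structure that holds the count of pending I/O requests (m_nNumberOfPendingIO), and to delete the structure only when we know that no I/O requests are pending. This is done by the functions EnterIOLoop(…) and ReleaseClientContext(…).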
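The counting idea can be illustrated with a small, portable sketch. It is a variation on the article's m_nNumberOfPendingIO rather than the actual implementation: here the open connection itself holds one reference, so the context is released exactly once, by whoever drops the last reference. Context, EnterIO, and Release are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified stand-in for the article's ClientContext.
struct Context {
    // Starts at 1: the open connection itself counts as one reference.
    std::atomic<int> references{1};
};

// Call before posting an asynchronous request that uses the context.
inline void EnterIO(Context& context) {
    context.references.fetch_add(1);
}

// Call once when a request completes (successfully or with an error) and
// once when the connection is closed. Returns true for exactly one caller:
// the one that dropped the last reference and may now free the context.
inline bool Release(Context& context) {
    const int remaining = context.references.fetch_sub(1) - 1;
    assert(remaining >= 0);
    return remaining == 0;
}
```

With two requests in flight, closing the connection does not free the context; only the completion of the second request does. The sketch assumes that no new request is posted after the connection has been closed.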

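3.7 Overview of the source code
The goal of the source code is to provide a set of simple classes that handle all the coding problems related to IOCP. The source code also provides a set of functions that are frequently used in communication and client/server software, such as file receiving/sending functions, logical thread pool handling, and so on.

The figure above gives a functional overview of the IOCP class source code.

Several I/O worker threads handle asynchronous I/O requests through the completion port. These workers call virtual functions, which can put requests that need a lot of computation into a work queue. The logical workers take jobs from the queue, process them, and send back the results using functions provided by the class. The GUI usually communicates with the main class through Windows messages (because MFC is not thread safe), by calling functions, or by using shared variables.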
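The work queue between the I/O workers and the logical workers might look roughly like the sketch below. It only mirrors the roles of AddJob and GetJob; the Job structure, the use of std::unique_ptr, and the JobQueue class are assumptions made for this example and are not taken from the article's source code.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical job description handed from an I/O worker to a logical worker.
struct Job {
    int command;
    std::string payload;
};

// Thread-safe FIFO queue of jobs.
class JobQueue {
public:
    // Called by I/O workers to queue a heavy request.
    void AddJob(std::unique_ptr<Job> job) {
        assert(job != nullptr);
        std::lock_guard<std::mutex> guard(mutex_);
        jobs_.push(std::move(job));
    }

    // Called by logical workers; returns nullptr when the queue is empty.
    std::unique_ptr<Job> GetJob() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (jobs_.empty()) {
            return nullptr;
        }
        std::unique_ptr<Job> job = std::move(jobs_.front());
        jobs_.pop();
        return job;
    }

private:
    std::mutex mutex_;
    std::queue<std::unique_ptr<Job>> jobs_;
};
```

A complete thread pool would also let idle logical workers sleep until a job arrives, for example with a condition variable, instead of polling GetJob.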

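The figure above shows an overview of the class structure.

The classes in Figure 3 are:

> CIOCPBuffer: a class used to manage the buffers used by the asynchronous I/O requests.
> IOCPS: the main class that handles all the communication.
> JobItem: a structure that holds a job to be performed by the logical worker threads.
> ClientContext: a structure that holds client-specific information (status, data, and so on).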

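3.7.1 The buffer design: the CIOCPBuffer class
When using asynchronous I/O calls, we have to provide a private buffer for the I/O operation to use.
There are several considerations to take into account when allocating buffers for use:

. Allocating and freeing memory is expensive, so we should reuse buffers (memory) that have already been allocated. We therefore keep the buffers in list structures.

. Sometimes, when an asynchronous I/O call completes, the buffer may not hold exactly one complete package, so we need to split the buffer to obtain the complete message. This is done by the SplitBuffer function in the IOCPS class. Likewise, we sometimes need to copy information between buffers; the IOCPS class provides the AddAndFlush function for this.

. As we know, we also need to add a sequence number and a state (the IOType variable, IOZeroReadCompleted, and so on) to our buffers.

. We also need methods to convert data to a byte stream and a byte stream back to data; CIOCPBuffer provides these functions too.


All of the issues above are handled in the CIOCPBuffer class.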
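The buffer reuse mentioned in the first consideration can be sketched as a small free list. This is an illustration under simplifying assumptions, not the CIOCPBuffer implementation: Buffer and BufferPool are hypothetical names, and the cap on free buffers is a plain maximum rather than the -1/0 convention of iMaxNumberOfFreeBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical I/O buffer.
struct Buffer {
    std::vector<char> bytes;
};

// Keeps released buffers in a free list so that they can be reused.
class BufferPool {
public:
    BufferPool(std::size_t bufferSize, std::size_t maxFree)
        : bufferSize_(bufferSize), maxFree_(maxFree) {}

    // Returns a recycled buffer if one is available, else a new one.
    std::unique_ptr<Buffer> Allocate() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (!free_.empty()) {
            std::unique_ptr<Buffer> buffer = std::move(free_.back());
            free_.pop_back();
            return buffer;
        }
        std::unique_ptr<Buffer> buffer(new Buffer);
        buffer->bytes.resize(bufferSize_);
        return buffer;
    }

    // Puts the buffer back on the free list, or frees it if the list is full.
    void Release(std::unique_ptr<Buffer> buffer) {
        assert(buffer != nullptr);
        std::lock_guard<std::mutex> guard(mutex_);
        if (free_.size() < maxFree_) {
            free_.push_back(std::move(buffer));
        }
    }

    std::size_t FreeCount() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return free_.size();
    }

private:
    const std::size_t bufferSize_;
    const std::size_t maxFree_;
    mutable std::mutex mutex_;
    std::vector<std::unique_ptr<Buffer>> free_;
};
```

The cap keeps the pool from holding on to memory forever after a traffic peak, at the cost of some extra allocations during the next peak.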

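3.8 How to use the source code
By deriving your own class from IOCPS (as shown in Figure 3) and using the virtual functions and the functionality provided by the IOCPS class (e.g., the thread pool), you can implement any type of server or client that efficiently manages a huge number of connections with only a few threads.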

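3.8.1 Starting and closing the server/client

To start the server, call the Start function with the following parameters:

nPort
The port that the server listens on. (-1 for client mode.)

iMaxNumConnections
Maximum number of connections allowed. (Use a large number.)

iMaxIOWorkers
Number of I/O worker threads.

nOfWorkers
Number of logical workers. (Can be changed at runtime.)

iMaxNumberOfFreeBuffer
Maximum number of buffers kept for reuse. (-1 for none, 0 for unlimited.)

iMaxNumberOfFreeContext
Maximum number of client information objects kept for reuse. (-1 for none, 0 for unlimited.)

bOrderedRead
Make sequential reads. (Discussed in section 3.6.2.)

bOrderedSend
Make sequential writes. (Discussed in section 3.6.2.)

iNumberOfPendlingReads
Number of asynchronous read loops pending while waiting for data.

To connect to a remote server (client mode, nPort = -1), call the function:
Connect(const CString &strIPAddr, int nPort)

strIPAddr
The IP address of the remote server.

nPort
The port.

To close the connections, call ShutDown().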

For example:
if (!m_iocp.Start(-1, 1210, 2, 1, 0, 0))  // nPort = -1: client mode
    AfxMessageBox("Error could not start the Client");
….
m_iocp.ShutDown();

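4.1 Source code description
For more details about the source code, please refer to the comments in the code.

4.1.1 Virtual functions
NotifyNewConnection
Called when a new connection has been accepted.

NotifyNewClientContext
Called when an empty ClientContext structure is allocated.

NotifyDisconnectedClient
Called when a client disconnects.

ProcessJob
Called when a logical worker needs to process a job.

NotifyReceivedPackage
Called when a new package has arrived.

NotifyFileCompleted
Called when a file transfer has finished.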

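4.1.2 Important variables
All variables that are shared must be locked before use to avoid access violations. For every variable XXX that needs a lock, the lock variable is named XXXLock.

m_ContextMapLock;
ContextMap m_ContextMap;
Holds all the client data (socket, client data, and so on); m_ContextMapLock is its lock.

m_NumberOfActiveConnections
Holds the number of active connections.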

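4.1.3 Important functions
GetNumberOfConnections()
Returns the number of connections.

CString GetHostAdress(ClientContext* p)
Returns the host address for the given client context.

BOOL ASendToAll(CIOCPBuffer *pBuff);
Sends the contents of the buffer to all connected clients.

DisconnectClient(CString sID)
Disconnects the client with the given unique identification number.

CString GetHostIP()
Returns the local IP address.

JobItem* GetJob()
Removes a JobItem from the queue; returns NULL if there are no jobs.

BOOL AddJob(JobItem *pJob)
Adds a job to the queue.

BOOL SetWorkers(int nThreads)
Sets the number of logical workers; can be called at any time.

DisconnectAll();
Disconnects all clients.

ARead(…)
Makes an asynchronous read.

ASend(…)
Makes an asynchronous send; sends data to a client.

ClientContext* FindClient(CString strClient)
Finds a client by its string ID (not thread safe).

DisconnectClient(ClientContext* pContext, BOOL bGraceful=FALSE);
Disconnects a client.

StartSendFile(ClientContext *pContext)
Sends the file specified in the ClientContext structure (using the optimized TransmitFile(..) function).

PrepareReceiveFile(..)
Prepares the connection for receiving a file. After this function is called, all incoming byte streams are written to the file.

PrepareSendFile(..)
Opens a file and sends a package containing information about the file. The function also disables ASend(..) until the file transfer is finished or aborted.

DisableSendFile(..)
Disables send-file mode.

DisableRecevideFile(..)
Disables receive-file mode.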

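5.1 File transfer
File transfer is done using the Winsock 2.0 TransmitFile function. TransmitFile transmits file data over a connected socket handle. It uses the operating system's cache manager to retrieve the file data, and provides high-performance file data transfer over sockets. Some important points about asynchronous file transfer:
Until TransmitFile has returned, no other sends or writes should be made on the socket, because they would corrupt the file data.
Therefore, all calls to ASend are disabled after PrepareSendFile() has been called.
Because the operating system reads the file data sequentially, you can improve caching performance by opening the file with FILE_FLAG_SEQUENTIAL_SCAN.
We use kernel asynchronous procedure calls (TF_USE_KERNEL_APC) when sending the file. Using TF_USE_KERNEL_APC can deliver a significant performance gain. It is possible, however, that the thread in whose context TransmitFile was initiated is being used for heavy computation; this situation may prevent the APCs from launching.

File transfer is performed in the following order: the server initializes the file transfer by calling PrepareSendFile(..). When the client receives the file information, it prepares for the transfer by calling PrepareReceiveFile(..) and sends a package to the server to tell it to start the file transfer. When that package arrives at the server, the server calls StartSendFile(..), which uses the high-performance TransmitFile function to send the specified file.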

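6 The source code example
The provided example is an echo client/server that also supports file transfer (Figure 4). In the code, the class MyIOCP is derived from IOCPS and handles the communication between the client and the server using the virtual functions described in section 4.1.1.
The most important part of the client or server code is the virtual function NotifyReceivedPackage.

The function receives incoming messages and performs the requests sent by the remote connection. In this case, it is just a matter of a simple echo or a file transfer. The source code is divided into two projects, IOCP and IOCPClient, one for the server and one for the client.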

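6.1 Compiler issues
When compiling with VC++ 6.0 or VC++ .NET, you may get some strange errors related to the CFile class, such as:

if (pContext->m_File.m_hFile != INVALID_HANDLE_VALUE)   <- error C2446: '!=' : no conversion from 'void *' to 'unsigned int'

This problem can be solved by updating the header files (*.h) or your version of VC++ 6.0, or by fixing the type conversion error. After some modifications, the server/client source code can be used without MFC.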

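7 Special considerations and rules of thumb
When you use this code in other types of applications, there are some programming traps and multithreading problems that can be avoided. Nondeterministic errors are errors that occur randomly and that are hard to reproduce by running the same sequence of tasks. These are the worst kind of errors, and they are usually caused by the design of the core source code. When a server runs several I/O worker threads that serve the connected clients, nondeterministic errors such as access violations can occur if the programmer has not taken the multithreaded environment into account.

Rule of thumb #1:
Never read or write the client context (e.g., ClientContext) without locking it.
The notification functions (e.g., Notify*(ClientContext *pContext)) are already thread safe, so inside them you can access the members of ClientContext without locking and unlocking the context.

Also be aware that while you hold the lock on a context, all other threads and the GUI that need it will be waiting.

Rule of thumb #2:
Avoid complicated "context locks" and other types of locks inside a "context lock", because they can easily lead to a deadlock (e.g., A waits for B, B waits for C, and C waits for A, so all of them are deadlocked).

Code that holds several such locks at the same time can end in a deadlock.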
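If two context locks really have to be held at the same time, one common way to avoid the circular wait is to let the standard library acquire both together. The sketch below uses std::mutex and std::lock instead of the MFC lock classes used by the article's code; Context, its members, and Transfer are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical per-client context with its own lock.
struct Context {
    std::mutex lock;
    int value = 0;
};

// Moves amount from one context to another while holding both locks.
inline void Transfer(Context& from, Context& to, int amount) {
    assert(amount >= 0);
    if (&from == &to) {
        return;  // locking the same mutex twice would deadlock
    }
    // std::lock acquires both mutexes with a deadlock-avoidance algorithm,
    // so it does not matter in which order two threads name the contexts.
    std::lock(from.lock, to.lock);
    std::lock_guard<std::mutex> guardFrom(from.lock, std::adopt_lock);
    std::lock_guard<std::mutex> guardTo(to.lock, std::adopt_lock);
    from.value -= amount;
    to.value += amount;
}
```

Two threads calling Transfer(a, b, ...) and Transfer(b, a, ...) at the same time cannot block each other forever here, whereas taking the two locks one after the other by hand could. Holding only one context lock at a time remains the simpler option whenever it is possible.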

Rule of thumb #3:
Never access a client context outside the notification functions (e.g., Notify*(ClientContext *pContext)). If you have to, enclose the access in
m_ContextMapLock.Lock();
…
m_ContextMapLock.Unlock();

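8 Future work
In the future, the source code will be updated with the following features:
Support for AcceptEx(..) for accepting new connections, to handle short-lived connections and DoS attacks.
Portability of the source code to other environments such as Win32, STL, and WTL.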

Posted on 2009-06-24 09:23:00

Copyright © yuyunliuhen
Powered by CSDN Blog