Ant Ren's Column

Keep sharing, keep challenging yourself...

Windows & Unix IPC Introduction (Part.2)

3. Sockets

A socket is a software abstraction that is used to create a channel-like interface between processes. The channel allows bi-directional communication between processes, but does little to structure the data being transmitted. In general, the serving machine in a system that uses socket communication follows several steps to initiate communication:

ü  Create a socket

ü  Map the socket to a local address

ü  Listen for client requests


When the client process wishes to begin a transaction with the server, it follows similar steps:

ü  Create a socket

ü  Determine the location (system name and port number) of the server

ü  Begin sending and/or receiving data
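
The server and client steps above can be sketched with the Berkeley calls in C. This is only a minimal sketch under stated assumptions: the helper names make_listener() and connect_to() are hypothetical, the loopback address and the kernel-chosen port are choices made for the demonstration, and the connect() and accept() calls are the usual way the "begin a transaction" step is carried out for stream sockets, which the lists above leave implicit.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server side: create a socket, map it to a local address, and listen.
 * Binding to port 0 asks the kernel to pick a free port (an assumption
 * made for this demo); the chosen port is returned through *port. */
int make_listener(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 4) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Client side: create a socket and connect it to the server's location
 * (here the loopback address plus the given port). */
int connect_to(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would typically follow make_listener() with accept() on the returned descriptor, after which both ends can use send() and recv() on their connected sockets.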


3.1 Types of Sockets

Communication between processes via sockets can be viewed as streams or datagrams. Stream-based communication views transmitted data as a sequence of bytes, whereas datagram-based communication deals with small discrete packets of information. These packets typically contain header and trailer information in addition to the data being transmitted. Choosing a socket type is important, because each has different performance and reliability characteristics, and two communicating sockets must be of the same type.

ü  Stream. Stream sockets are reliable; that is, data is guaranteed to be delivered in the same order it was sent. An underlying mechanism that the programmer need not be concerned with handles duplicate data, performs error checking, and controls flow. When using stream sockets, a connection is established. This means that two processes must logically agree to communicate before information can be transferred between the two of them. This connection is maintained by both processes throughout the communication session.

ü  Datagram. Datagram sockets provide unreliable communication between the two processes, meaning that data could be lost or received out of order. Datagram sockets provide no notion of a connection, so each datagram is sent and processed independently. This has a host of implications, one of the most significant of which is that individual datagrams can take different paths to their ultimate destination. Another important implication is that there is no flow control. Datagram packets are typically quite small and usually are fixed in size. No error correction is mandated, and when optional error correction is used, it is weak.
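
One visible difference between the two socket types is how message boundaries are treated. The sketch below is an illustration under assumptions, not a definitive test: it uses a local AF_UNIX socket pair rather than a network connection (so nothing is lost or reordered here), and the helper name first_recv_len() is hypothetical. It sends two three-byte messages and reports how many bytes the first recv() call returns.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send "abc" and then "def" over a local socket pair of the given type
 * (SOCK_STREAM or SOCK_DGRAM) and return how many bytes the first
 * recv() call delivers, or -1 on error. */
ssize_t first_recv_len(int type)
{
    int sv[2];
    char buf[16];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;
    if (send(sv[0], "abc", 3, 0) == 3 && send(sv[0], "def", 3, 0) == 3)
        n = recv(sv[1], buf, sizeof buf, 0);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With SOCK_DGRAM each recv() returns exactly one datagram, so the first call yields three bytes. With SOCK_STREAM the data is just a byte sequence, so the first call may return anything from one byte up to all six.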


3.2 Berkeley Sockets

In the early 1980s, the Berkeley socket interface was introduced and it has formed the basis for UNIX sockets programming ever since. Berkeley sockets allow general file system type access routines to be used to send and receive information over a network connection. Berkeley sockets are completely protocol independent, so they need not use TCP/IP for communication although this is most common.


Berkeley sockets support asynchronous I/O, meaning that the process can request that the kernel alert it when a specified socket descriptor is ready for I/O. In most UNIX implementations, the notification from the kernel to the user process occurs via the SIGIO signal. Because a given process can have only a single signal handler for each signal, to use asynchronous I/O with more than one socket, another technique is used. Otherwise, the process would not know which socket is ready for I/O. Such multiplexing is implemented via the select() system call.


Multiplexed I/O addresses the problem of data on other sockets being delayed while the process is blocked waiting for an I/O transaction on one socket. There are several potential solutions to this problem in UNIX:

ü  Polling. If all sockets are set to be nonblocking, the process can execute a loop that checks each socket to see if there is something to be read. If there is, it handles it; otherwise, it goes to sleep until the next time that the socket is scheduled to be checked. Polling is inefficient because most of the time there is no work to be done, yet CPU time is wasted.

ü  Use multiple processes. The parent process could use fork() to establish a child process to handle each socket. With this technique, each process can read from the port and block since the system will handle scheduling the processes most efficiently. However, the child process must then return the read data to the parent process via some other form of IPC.

ü  Asynchronous I/O. This is not a good solution in UNIX since signals are very expensive to catch. In addition, if more than one socket is using asynchronous I/O, there is a need to determine which socket the signal corresponds to.

ü  Use select(). Using select() allows a user process to request that the kernel wait until one of several events occurs and wake the process up at that time. The select() call is quite flexible: asking it to return immediately is effectively a poll request, and timeouts can optionally be specified. The use of select() is the preferred technique when using Berkeley sockets because it is fairly efficient and simple to work with.


3.3 Windows Sockets: WinSock v2.2

The Microsoft implementation of Windows sockets is derived from Berkeley sockets as included in BSD v4.3. Like Berkeley sockets, Windows sockets are protocol independent and support asynchronous operation. The most recent release of Windows sockets, which is embodied by the WinSock v2.2 API, includes native support for the following protocols:

ü  IPX/SPX: sockets serve as an alternative to Novell’s Event Control Block (ECB) architecture, which forces a programmer to learn the details of the protocol.

ü  NetBEUI: replaces the NetBIOS interface

ü  AppleTalk: to allow communication with Macintosh computers

ü  ISO TP/4: widely used in Europe, so is important for internationalization efforts


The availability of all of these protocols via sockets simplifies application programming since using a different protocol for communication does not necessitate learning its idiosyncrasies or learning a new API. Although Windows sockets give a programmer protocol independence, protocol transparency is not provided since setting up a socket connection requires specifying the protocol to be used, among other parameters. Later an alternative technique for providing protocol transparency using Win32 will be discussed.


3.4 Comparing Windows and Berkeley Sockets

3.4.1 Porting Code from Berkeley Sockets to Windows Sockets

Windows sockets were designed to facilitate easy porting of Berkeley socket code to Windows sockets. Windows sockets support the Berkeley API (with minor exceptions) as well as functionality specific to Win32 (such nonportable functions have the prefix “WSA” in their names). Code written using Berkeley sockets can usually be ported to Windows sockets with only small changes. One such change is the need to call WSAStartup(), which accepts as an argument the version of the sockets DLL an application requires; this provides simple versioning, which is essential since the WinSock specification changes frequently.


As with Berkeley sockets, socket() opens a new socket. In Windows, socket() returns an unsigned int, which behaves like a HANDLE. It is important to note that the int is unsigned; Berkeley sockets are represented by plain ints. This means that it is not correct to check for success by checking for non-negativity of the socket returned; the return value should be compared against INVALID_SOCKET instead. Alternatively, WSASocket() can be used; this allows the programmer additional flexibility, such as being able to select explicitly between overlapped and non-overlapped sockets (according to Microsoft's documentation, sockets created with socket() have the overlapped attribute by default). The bind() call works basically the same under Berkeley and Windows sockets, as does name resolution.


Windows NT provides %SystemRoot%\system32\drivers\etc\hosts, which is equivalent to /etc/hosts in UNIX (the lmhosts file in the same directory maps NetBIOS names instead). Other calls that function nearly identically include listen(), accept(), and connect().


After a socket connection is established, data needs to be transferred. Using the standard send() and recv() calls from Berkeley sockets is the most straightforward way to accomplish this communication using Windows sockets. The programmer needs to be aware that TCP delivers a byte stream rather than discrete messages, so data may arrive in pieces on Windows 95 and Windows NT: if 1024 bytes of data are to be received, one cannot just call recv() once and ask for 1024 bytes; instead the data should be “accumulated” over repeated calls. According to the WinSock documentation, send() also has this restriction, although this behavior rarely, if ever, manifests itself when using send().
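
The accumulation described above is usually written as a loop around recv(). The following is a minimal sketch using the Berkeley form of the call; the helper name recv_all() is hypothetical, and under WinSock the descriptor type and error handling would need the usual adjustments (SOCKET, WSAGetLastError()).

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling recv() until `want` bytes have arrived, the peer closes
 * the connection, or an error occurs. Returns the number of bytes
 * actually stored in buf (which may be less than `want` if the peer
 * closed early), or -1 on error. */
ssize_t recv_all(int fd, void *buf, size_t want)
{
    char *p = buf;
    size_t got = 0;

    while (got < want) {
        ssize_t n = recv(fd, p + got, want - got, 0);
        if (n == 0)              /* orderly shutdown by the peer */
            break;
        if (n < 0) {
            if (errno == EINTR)  /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

A caller expecting a 1024-byte record would call recv_all(fd, buf, 1024) and treat a shorter return value as a connection that ended early.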


Some Berkeley socket applications use the _read() and _write() calls since UNIX treats sockets as files. Windows sockets support these functions, but some additional work on the part of the programmer is required. The socket must be opened as non-overlapped and must be converted to a file handle by calling _open_osfhandle(). This will return a file handle (i.e. an int) which can then be used with _read() and _write().


When the program is through with the socket, it should call the nonportable function closesocket() on it, as opposed to the Berkeley socket standard close(). When the entire transaction is completed, WSACleanup() should be called to shut down the DLL. WinSock keeps a reference count for each socket being used and frees the socket when the count drops to zero, so the programmer need not worry about this detail.
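
One common way to contain these porting differences is a small compatibility layer. The sketch below is an illustration only: the names sock_t, SOCK_INVALID, net_init(), net_cleanup(), sock_close(), and sock_ok() are hypothetical, while WSAStartup(), WSACleanup(), closesocket(), and INVALID_SOCKET are the WinSock names discussed above.

```c
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET sock_t;                 /* unsigned, HANDLE-like */
#define SOCK_INVALID INVALID_SOCKET

/* Request version 2.2 of the sockets DLL before any other call. */
int net_init(void) { WSADATA d; return WSAStartup(MAKEWORD(2, 2), &d); }
void net_cleanup(void) { WSACleanup(); }
int sock_close(sock_t s) { return closesocket(s); }
#else
#include <sys/socket.h>
#include <unistd.h>
typedef int sock_t;                    /* plain int descriptor */
#define SOCK_INVALID (-1)

/* Berkeley sockets need no start-up or shutdown call. */
int net_init(void) { return 0; }
void net_cleanup(void) { }
int sock_close(sock_t s) { return close(s); }
#endif

/* Compare against the platform's invalid value rather than testing
 * "s < 0", which can never be true for an unsigned SOCKET. */
int sock_ok(sock_t s) { return s != SOCK_INVALID; }
```

Application code can then call socket() as usual, test the result with sock_ok(), and finish with sock_close() and net_cleanup() on either platform.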


So far, the discussion of sockets has focused on applications that rely on blocking sockets. Sockets can also be put into a nonblocking mode using ioctl() on UNIX or WSAIoctl() under WinSock v2.2. WSAIoctl() supports the use of overlapped execution for more time-intensive requests.


When using Berkeley sockets, traditionally, one uses select() to detect the occurrence of an event. Since select() blocks until an event occurs, it is useful for determining when to call a function to act on received data. Although select() works using Windows sockets, it is not recommended because it is cumbersome to use, due largely to the fact that it deals with an array of sockets and not a bit mask of descriptors like the Berkeley implementation does. More importantly, it is very inefficient, since it triggers context switches. Therefore, several alternatives are available to programmers using WinSock v2.2:

ü  Use blocking overlapped sockets, and specify a timeout.

ü  Set send() and recv() timeouts using setsockopt() on top of overlapped sockets. If this option is used, send() and recv() will block only for the user-specified period of time, not indefinitely.

ü  WSAAsyncSelect() can be used to put a socket into a nonblocking mode and request notification of events that occur on it.

ü  Request asynchronous notification by using WSAEventSelect(), which also makes the socket nonblocking. This differs from using WSAAsyncSelect() since it does not post a message to a window; it instead causes a Win32 object to signal.


3.4.2 Features Specific to Windows Sockets

There are two levels of extensions to Berkeley sockets that are included in Windows sockets. The first are the WinSock v1 extensions, which include several WSA functions that offer message posting and asynchronous execution. The second level of extensions is the WinSock v2 extensions, which add numerous enhancements, many of which address overlapped I/O.

Overlapped I/O

Windows sockets natively support overlapped I/O to provide superior data transfer capabilities. In the WinSock v1.1 standard, overlapped I/O was optional and limited to use under Windows NT. WinSock v2.2 supports overlapped I/O as the default under both Windows NT and Windows 95.


Since sockets are opened as Win32 overlapped I/O handles, you can pass sockets to ReadFile() or ReadFileEx() as well as the corresponding write functions without any modifications. If overlapped I/O is used, the programmer must use event handles, I/O completion routines, or I/O completion ports. However, this overhead is a small price to pay for the superior performance that can be achieved, especially on the server side of a connection. Tests indicate that performance using overlapped I/O consistently is approximately 100K/sec better than with other techniques.


WSAAsyncSelect() is used to put a socket into nonblocking mode and to post messages to a window when events occur. This is an important feature because it offers a great deal more functionality and automation than select(). There are many events that can be checked for; some of the most common are shown below.


Event Value    Meaning

FD_READ        Data is available for reading

FD_WRITE       The socket can be written to

FD_OOB         Urgent data needs to be read

FD_ACCEPT      A client request for a connection has arrived

FD_CONNECT     A client’s connect attempt has completed

FD_CLOSE       The partner station has closed the socket

FD_QOS         The socket’s quality of service has been changed


WSAAsyncSelect() does not need to be reactivated after each event is handled by the appropriate handling function. Detailed error messages are returned via the standard Win32 interface. Only one call to WSAAsyncSelect() can be active at any given time. Subsequent calls override previous calls, so if multiple events are to be handled their values can be ORed together. A typical sequence of events in a system using WSAAsyncSelect() is shown below, for both client and server sides of the connection.

Server Side

ü  Create a socket and bind your address to it

ü  Call WSAAsyncSelect() and request FD_ACCEPT notification

ü  Call listen() and go on to other tasks

ü  When the window is notified that a message came in, accept it using accept() or WSAAccept()

ü  Request FD_READ/OOB/CLOSE notification on the socket so you know when the client sends data or closes the connection

ü  When you receive FD_READ/OOB, call ReadFile(), read(), recv(), recvfrom(), WSARecv(), or WSARecvFrom() to get the data

ü  Respond to FD_CLOSE by calling closesocket()

Client Side

ü  Create a socket

ü  Call WSAAsyncSelect() and request FD_CONNECT notification

ü  Call connect() or WSAConnect(); they will return right away so you can go on to doing something else

ü  When you receive the FD_CONNECT notification, ...

ü  When FD_WRITE is reported, the socket is available for the client to send requests to the server

ü  When the data comes from the server, the window receives FD_READ/OOB, and you can respond by calling ReadFile(), read(), recv(), recvfrom(), WSARecv(), or WSARecvFrom()

ü  The client should normally close the connection, but should be ready to handle an unexpected FD_CLOSE from the server

(The implementation of overlapped I/O in Windows 95 does not support I/O completion ports.)


WSAAsyncSelect() is a very powerful function that adds a great deal of functionality to a program and allows the programmer to work at a higher level of abstraction. Windows servers using WSAAsyncSelect() do not need to be multithreaded for good performance, since using the function gives implicit multithreading. Certainly being able to write virtually multithreaded code without worrying about deadlock, synchronization, and the like is a welcome benefit. When using WSAAsyncSelect() it is also not necessary to allocate buffer space before an asynchronous receive request is posted since the system buffers the data. This also prevents excessive allocation of memory since when memory does need to be allocated the precise quantity is known.


An addition to WinSock v2 is the WSAEventSelect() call. It is quite similar to WSAAsyncSelect() but it uses signalling instead of message posting. WSAEventSelect() accepts a handle to an event as a parameter. The call immediately returns, and then the program should wait on the event handle using a standard Win32 mechanism such as WaitForSingleObject(). WinSock v2 also provides a special function, WSAWaitForMultipleEvents(), which allows a timeout to be specified and lets the programmer choose between waiting until all events have signalled or just for any one event to signal. A programmer who wishes to approximate the Berkeley sockets select() function, but in a more efficient way than using the Windows sockets select() call, could use WSAEnumNetworkEvents(), which fills an array with the FD_* codes of completed operations. It is very important to note that WSAAsyncSelect() and WSAEventSelect() are mutually exclusive; that is, using one cancels the other. As with WSAAsyncSelect(), multiple events are requested by ORing their FD_* codes together.

Protocol Transparency

Since a single binary program image does not work with all protocols but protocol transparency is a desirable feature, Microsoft included the Service Registration API starting with Windows NT v3.5. Using the Service Registration API, servers using Windows sockets can register their presence on the network with a centralized information brokerage. In addition, they can ask what parameters the underlying support layers should be passed when using socket functions, allowing code to be written with no assumptions about the networking environment in which the program will ultimately run. WinSock v2.2 now provides some of this functionality to the programmer transparently.


Using the Service Registration API, server class information can be registered. A server name can be associated with the registered class identifier. When the server is started, it can use this information to determine what sockets should be opened and to what ports and machine addresses they should be bound. The API also allows the server to advertise its presence on the network to facilitate discovery and subsequent connection by clients. Unfortunately, using this useful facility requires complex variable-sized data structures to be populated, which is not an easy chore. The specifics of how the Service Registration API operates are worth a closer look. The basic operations it supports are:

ü  Registration of services and classes. A class or service is registered by associating it with a globally unique identifier (GUID); the GUIDs are the same ones used to identify COM components and RPC interfaces. When this registration is performed, the communication protocols to be supported can be specified.

ü  Helping the server application read the registration information. When the server first comes up, it needs to read the stored registration information. With this information it knows what to pass to the socket() or WSASocket() calls. After setting up its sockets and entering the listening state, the server can tell the registration database that it is online.

ü  Helping the client application read the registration information. When the client starts up, it too needs to retrieve registration information using the Service Registration API. With this information it can find a target server to connect to, and the client knows what information to pass to the relevant function calls.

Socket Abstractions

Microsoft Foundation Classes (MFC) are widely used for Windows software development today. It is no surprise that there are classes designed specifically to accommodate the use of sockets. There are two classes offered by MFC for sockets:

ü  CAsyncSocket. This class is a thin wrapper around the WinSock v2.2 API. It provides asynchronous communication that relies on WSAAsyncSelect() heavily. All socket event notifications are handled internally by a CSocketWnd instance so the programmer does not need to deal with them. The events to be monitored are indicated using AsyncSelect().

ü  CSocket. The CSocket class is very similar to CAsyncSocket but it provides the illusion of synchronous communication. In other words, the functions block and do not return until the operation completes, although in reality asynchronous communication is still used.


These classes are significant to MFC programmers because in addition to encapsulating much of the busywork of using sockets into the socket classes, they work with MFC features such as object serialization quite transparently. For example, to send a serializable object over a socket connection one can use the << operator on a CArchive attached to the socket.

Quality of Service

WinSock v2.2 also offers quality of service (QOS) capabilities. The basic QOS mechanism descends from an RFC flow specification. Each flow specification describes a set of characteristics about a proposed unidirectional flow through the network. An application may associate a pair of flow specifications with a socket (one for each direction) at the time a connection request is made using WSAConnect(), or at other times using WSAIoctl(). Flow specifications indicate what level of service is required for the application at hand and provide a feedback mechanism for applications to use in adapting to network conditions.


An application may establish its QOS requirements at any time via WSAIoctl() or when the connection is started using WSAConnect(). For connection-oriented sockets, it is often most convenient for an application to use WSAConnect(); QOS values supplied at connect time supersede those that may have been supplied earlier via WSAIoctl(). If the call to WSAConnect() completes successfully, the application knows that its QOS request has been honored by the network, and the application is then free to use the socket for data exchange. If the connect operation fails because of limited resources, an appropriate error is returned, and the application may either scale down its service request and try again or determine that network conditions are not acceptable and exit.


After every connection attempt, transport providers update the associated flow specification structures to indicate the most recent network conditions. This update from the service provider about current network conditions is especially useful when the application’s QOS request consisted of the default values, which any service provider should be able to agree to (meaning that application had no real information about the network). Applications expect to be able to use this information to guide their use of the network, including any subsequent QOS requests. Information provided by the transport in the updated flow specification structure may be little more than a rough estimate that only applies to the first hop as opposed to the complete end-to-end connection, so the application should take appropriate precautions when interpreting this information.


Connectionless sockets may use WSAConnect() to establish a specified QOS level to a single designated peer. Otherwise connectionless sockets make use of WSAIoctl() to stipulate the initial QOS request, and any subsequent QOS renegotiations. The flow specifications for WinSock v2.2 divide QOS characteristics into the following areas:

ü  Source Traffic Description. The manner in which the application's traffic will be injected into the network. This includes specifications for the token rate, the token bucket size, and the peak bandwidth. Even though the bandwidth requirement is expressed in terms of a token rate, this does not mean that the service provider must actually implement token buckets; any traffic management scheme that yields equivalent behavior is permitted.

ü  Latency. Upper limits on the amount of delay and delay variation that are acceptable.

ü  Level of service guarantee. Whether or not an absolute guarantee is required as opposed to best effort. Providers which have no feasible way to provide the level of service requested are expected to fail the connection attempt.

ü  Cost. This is for the future when a meaningful cost metric can be determined.

ü  Provider-specific parameters. The flow specification can be extended in ways that are particular to specific providers.

Other Useful Extensions

Another useful extension to Berkeley sockets is the TransmitFile() call. It can be used to easily send an open file over a socket connection. The call allows all or part of a file to be sent over the network, maintains state using the standard notion of a seek pointer, and allows optimization of the transmission by allowing the programmer to override the default transmission packet size. A very useful feature of this call is that it takes as a parameter a struct that can hold header and trailer information. This is useful if, for example, the name that the file should assume on the receiving end of the connection should be transmitted, or it would be helpful to indicate that this file was the last in a series of transfers.


There are numerous WSAAsyncXXX() functions provided which offer superior performance and integration in an environment in which asynchronous communication is being used. These functions take advantage of capabilities provided via Win32 that allow communication and noncommunication functions alike to operate more efficiently in multithreaded environments than their standard counterparts as implemented in UNIX do. Asynchronous socket programming is considerably more difficult and error prone than synchronous programming is, but large performance gains can be reaped, and often the benefits outweigh the added development effort.

Problems

Despite all of these advantages that Windows sockets have over Berkeley sockets, they are not perfect. Although CAsyncSocket is fairly efficient and imposes only a marginal performance penalty, using all of its asynchronous capabilities requires some challenging programming. On the other hand, CSocket is very simple to use, but this simplicity comes at the cost of high performance losses. A significant problem with these classes is that they do not support overlapped I/O, meaning that they cannot be used with I/O completion routines or I/O completion ports. Finally, these classes still rely on WinSock v1.1; they do not take advantage of the new features or enhanced efficiency of WinSock v2.2 yet.
