I/O Multiplexing

What we need is the capability to tell the kernel that we want to be notified if one or more I/O conditions are ready (i.e., input is ready to be read, or the descriptor is capable of taking more output). This capability is called I/O multiplexing and is provided by the select and poll functions. We will also cover a newer POSIX variation of the former, called pselect.

I/O multiplexing is typically used in networking applications in the following scenarios:

  1. When a client is handling multiple descriptors (normally interactive input and a network socket).
  2. When a client handles multiple sockets at the same time (possible, but rare).
  3. When a TCP server handles both a listening socket and its connected sockets.
  4. When a server handles both TCP and UDP.
  5. When a server handles multiple services and perhaps multiple protocols (e.g., the inetd daemon).

I/O Models

  1. blocking I/O
  2. nonblocking I/O
  3. I/O multiplexing (select and poll)
  4. signal driven I/O (SIGIO)
  5. asynchronous I/O (the POSIX aio_ functions)

There are normally two distinct phases for an input operation:

  1. Waiting for the data to be ready
  2. Copying the data from the kernel to the process

Blocking I/O model

Nonblocking I/O Model

I/O Multiplexing Model

Signal-Driven I/O Model

Asynchronous I/O Model

The main difference between this model and the signal-driven I/O model in the previous section is that with signal-driven I/O, the kernel tells us when an I/O operation can be initiated, but with asynchronous I/O, the kernel tells us when an I/O operation is complete.

#include <sys/select.h>
#include <sys/time.h>
int select(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, const struct timeval *timeout);
Returns: positive count of ready descriptors, 0 on timeout, -1 on error

This function allows the process to instruct the kernel to wait for any one of multiple events to occur and to wake up the process only when one or more of these events occurs or when a specified amount of time has passed.

As an example, we can call select and tell the kernel to return only when:

  1. Any of the descriptors in the set {1, 4, 5} are ready for reading
  2. Any of the descriptors in the set {2, 7} are ready for writing
  3. Any of the descriptors in the set {1, 4} have an exception condition pending
  4. 10.2 seconds have elapsed
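The four conditions above could be set up roughly as follows. This is a minimal sketch: build_example_args is a hypothetical helper name, not part of any API, and the FD_ZERO and FD_SET macros it uses are described later in this section.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: fill the three descriptor sets and the timeout for
 * the example above. Returns the value to pass as maxfdp1. */
int build_example_args(fd_set *rset, fd_set *wset, fd_set *eset,
                       struct timeval *tv)
{
    FD_ZERO(rset);                 /* readable: {1, 4, 5} */
    FD_SET(1, rset);
    FD_SET(4, rset);
    FD_SET(5, rset);

    FD_ZERO(wset);                 /* writable: {2, 7} */
    FD_SET(2, wset);
    FD_SET(7, wset);

    FD_ZERO(eset);                 /* exception: {1, 4} */
    FD_SET(1, eset);
    FD_SET(4, eset);

    tv->tv_sec = 10;               /* 10.2 seconds = 10 s + 200000 us */
    tv->tv_usec = 200000;

    return 7 + 1;                  /* largest descriptor tested, plus one */
}
```

A caller would then pass the returned value and the four objects straight to select, for example select(maxfdp1, &rset, &wset, &eset, &tv).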

A timeval structure specifies the number of seconds and microseconds.

struct timeval {
    long tv_sec;  /* seconds */
    long tv_usec; /* microseconds */
};

The timeout argument allows three possibilities:

  1. Wait forever: Return only when one of the specified descriptors is ready for I/O. For this, we specify the timeout argument as a null pointer.
  2. Wait up to a fixed amount of time: Return when one of the specified descriptors is ready for I/O, but do not wait beyond the number of seconds and microseconds
    specified in the timeval structure pointed to by the timeout argument.
  3. Do not wait at all: Return immediately after checking the descriptors. This is called polling. To specify this, the timeout argument must point to a timeval structure and
    the timer value (the number of seconds and microseconds specified by the structure) must be 0.
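The three timeout modes can be sketched in one small function. The name wait_readable and its millisecond parameter are assumptions made for this illustration only; the select call itself is used as declared above.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: wait for fd to become readable.
 *   wait_ms <  0 : wait forever (null timeout pointer)
 *   wait_ms == 0 : do not wait at all (polling)
 *   wait_ms >  0 : wait up to wait_ms milliseconds
 * Returns select's result: 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, long wait_ms)
{
    fd_set rset;
    struct timeval tv;
    struct timeval *tvp = NULL;            /* null pointer: wait forever */

    if (wait_ms >= 0) {
        tv.tv_sec = wait_ms / 1000;
        tv.tv_usec = (wait_ms % 1000) * 1000;
        tvp = &tv;                         /* {0, 0} when wait_ms == 0 */
    }

    FD_ZERO(&rset);
    FD_SET(fd, &rset);
    return select(fd + 1, &rset, NULL, NULL, tvp);
}
```

For instance, wait_readable(fd, 0) polls, wait_readable(fd, 250) waits at most a quarter of a second, and wait_readable(fd, -1) blocks until data arrives.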

The wait in the first two scenarios is normally interrupted if the process catches a signal and returns from the signal handler.

For portability, we must be prepared for select to return an error of EINTR if we are catching signals.
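One common way to cope with EINTR is to restart the call. The wrapper below is a sketch under stated assumptions: select_retry is a hypothetical name, and it restarts with the full original timeout after each interruption, which may not be what every program wants. It relies on POSIX leaving the descriptor sets unmodified when select fails.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical wrapper: call select again if a caught signal interrupts it.
 * Assumption: restarting with the full timeout is acceptable to the caller. */
int select_retry(int maxfdp1, fd_set *rset, fd_set *wset, fd_set *eset,
                 const struct timeval *timeout)
{
    for (;;) {
        struct timeval tv;
        struct timeval *tvp = NULL;
        int n;

        if (timeout != NULL) {
            tv = *timeout;        /* fresh copy: some systems modify it */
            tvp = &tv;
        }
        n = select(maxfdp1, rset, wset, eset, tvp);
        if (n >= 0 || errno != EINTR)
            return n;             /* success, timeout, or a real error */
        /* EINTR: the sets were not modified, so simply try again */
    }
}
```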

There are only two exception conditions currently supported:

  1. The arrival of out-of-band data for a socket.
  2. The presence of control status information to be read from the master side of a pseudo-terminal that has been put into packet mode.

void FD_ZERO(fd_set *fdset); /* clear all bits in fdset */

void FD_SET(int fd, fd_set *fdset); /* turn on the bit for fd in fdset */

void FD_CLR(int fd, fd_set *fdset); /* turn off the bit for fd in fdset */

int FD_ISSET(int fd, fd_set *fdset); /* is the bit for fd on in fdset ? */

For example, to define a variable of type fd_set and turn on the bits for descriptors 1, 4, and 5:

fd_set rset;
FD_ZERO(&rset); /* initialize the set: all bits off */
FD_SET(1, &rset); /* turn on bit for fd 1 */
FD_SET(4, &rset); /* turn on bit for fd 4 */
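FD_SET(5, &rset); /* turn on bit for fd 5 */

Any of the middle three arguments to select (readset, writeset, or exceptset) can be specified as a null pointer if we are not interested in that condition. Indeed, if all three pointers are null, then we have a higher precision timer than the normal Unix sleep function (which sleeps for multiples of a second).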

The maxfdp1 argument specifies the number of descriptors to be tested. Its value is the maximum descriptor to be tested plus one.

select modifies the descriptor sets pointed to by the readset, writeset, and exceptset pointers. These three arguments are value-result arguments: when we call the function, we specify the descriptors we are interested in, and on return, the result indicates which descriptors are ready.

The two most common programming errors when using select are to forget to add one to the largest descriptor number and to forget that the descriptor sets are value-result
arguments.
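Both pitfalls can be avoided by keeping a master set and handing select a fresh copy on every iteration. The function below is only a sketch: drain_two is a hypothetical name, it assumes two distinct descriptors, and it does not retry on EINTR.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: read from two distinct descriptors until both reach
 * EOF. Returns the total number of bytes read, or -1 on error. */
long drain_two(int fd_a, int fd_b)
{
    fd_set allset, rset;
    int fds[2];
    int maxfd = fd_a > fd_b ? fd_a : fd_b;
    int open_count = 2;
    long total = 0;
    char buf[512];

    fds[0] = fd_a;
    fds[1] = fd_b;
    FD_ZERO(&allset);
    FD_SET(fd_a, &allset);
    FD_SET(fd_b, &allset);

    while (open_count > 0) {
        rset = allset;             /* value-result: select overwrites rset */
        if (select(maxfd + 1, &rset, NULL, NULL, NULL) < 0)  /* note the +1 */
            return -1;
        for (int i = 0; i < 2; i++) {
            ssize_t n;
            if (!FD_ISSET(fds[i], &rset))
                continue;          /* this descriptor is not ready */
            n = read(fds[i], buf, sizeof buf);
            if (n < 0)
                return -1;
            if (n == 0) {          /* EOF: stop watching this descriptor */
                FD_CLR(fds[i], &allset);
                open_count--;
            } else {
                total += n;
            }
        }
    }
    return total;
}
```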

The return value from this function indicates the total number of bits that are ready across all the descriptor sets. If the timer value expires before any of the descriptors are ready, a value of 0 is returned. A return value of -1 indicates an error (which can happen, for example, if the function is interrupted by a caught signal).

  1. A socket is ready for reading if any of the following four conditions is true:
    1. The number of bytes of data in the socket receive buffer is greater than or equal to the current size of the low-water mark for the socket receive buffer.
      A read operation on the socket will not block and will return a value greater than 0 (i.e., the data that is ready to be read). We can set this low-water
      mark using the SO_RCVLOWAT socket option. It defaults to 1 for TCP and UDP sockets.
    2. The read half of the connection is closed (i.e., a TCP connection that has received a FIN). A read operation on the socket will not block and will return
      0 (i.e., EOF).
    3. The socket is a listening socket and the number of completed connections is nonzero. An accept on the listening socket will normally not block.
    4. A socket error is pending. A read operation on the socket will not block and will return an error (-1) with errno set to the specific error condition. These
      pending errors can also be fetched and cleared by calling getsockopt and specifying the SO_ERROR socket option.
  2. A socket is ready for writing if any of the following four conditions is true:
    1. The number of bytes of available space in the socket send buffer is greater than or equal to the current size of the low-water mark for the socket send
      buffer and either:
      (i) the socket is connected, or
      (ii) the socket does not require a connection (e.g., UDP).
      This means that if we set the socket to nonblocking, a write operation will not block and will return a positive value (e.g., the number of bytes accepted by the transport layer). We can set this low-water mark using the SO_SNDLOWAT socket option. This low-water mark normally defaults to 2048 for TCP and UDP sockets.
    2. The write half of the connection is closed. A write operation on the socket will generate SIGPIPE.
    3. A socket using a non-blocking connect has completed the connection, or the connect has failed.
    4. A socket error is pending. A write operation on the socket will not block and will return an error (-1) with errno set to the specific error condition. These
      pending errors can also be fetched and cleared by calling getsockopt with the SO_ERROR socket option.
  3. A socket has an exception condition pending if there is out-of-band data for the socket or the socket is still at the out-of-band mark.

Notice that when an error occurs on a socket, it is marked as both readable and writable by select.
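When select reports a socket as readable or writable because of a pending error, the error can be retrieved without issuing a read or write. A minimal sketch, with pending_socket_error as a hypothetical helper name:

```c
#include <sys/socket.h>

/* Hypothetical helper: fetch and clear the pending error on a socket.
 * Returns 0 if no error is pending, the pending errno value otherwise,
 * or -1 if getsockopt itself fails (errno then says why). */
int pending_socket_error(int sockfd)
{
    int err = 0;
    socklen_t len = sizeof err;

    if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```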

The function fileno converts a standard I/O file pointer into its corresponding descriptor.

batch mode

What we need is a way to close one-half of the TCP connection. That is, we want to send a FIN to the server, telling it we have finished sending data, but leave the socket descriptor open for reading. This is done with the shutdown function.

#include <sys/socket.h>
int shutdown(int sockfd, int howto);
Returns: 0 if OK, -1 on error

There are two limitations with close that can be avoided with shutdown:

  1. close decrements the descriptor's reference count and closes the socket only if the count reaches 0. With shutdown, we can initiate TCP's normal connection termination sequence (the four segments beginning with a FIN), regardless of the reference count.
  2. close terminates both directions of data transfer, reading and writing. Since a TCP connection is full-duplex, there are times when we want to tell the other end that
    we have finished sending, even though that end might have more data to send us.

The action of the function depends on the value of the howto argument.

  1. SHUT_RD The read half of the connection is closed
    No more data can be received on the socket and any data currently in the socket receive buffer is discarded. The process can no longer issue any of the read functions on the socket. Any data received after this call for a TCP socket is acknowledged and then silently discarded.
    By default, everything written to a routing socket loops back as possible input to all routing sockets on the host. Some programs call shutdown with a second argument of SHUT_RD to prevent the loopback copy. An alternative way to prevent this loopback copy is to clear the SO_USELOOPBACK socket option.
  2. SHUT_WR The write half of the connection is closed
    In the case of TCP, this is called a half-close. Any data currently in the socket send buffer will be sent, followed by TCP's normal connection termination sequence. As we mentioned earlier, this closing of the write half is done regardless of whether or not the socket descriptor's reference count is currently greater than 0. The process can no longer issue any of the write functions on the socket.
  3. SHUT_RDWR The read half and the write half of the connection are both closed
    This is equivalent to calling shutdown twice: first with SHUT_RD and then with SHUT_WR.
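The half-close behavior of SHUT_WR can be observed in a single process. The sketch below is an illustration, not a TCP client: half_close_demo is a hypothetical name, and it assumes a Unix-domain stream socket pair from socketpair as a stand-in for a TCP connection, since stream sockets deliver EOF to the peer in the same way after a write-side shutdown.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: one end sends a request and half-closes; the other end
 * sees the data followed by EOF and can still send a reply back.
 * Returns 0 if every step behaved as described, -1 otherwise. */
int half_close_demo(void)
{
    int sv[2];
    char buf[8];
    int ok = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write(sv[0], "ping", 4) == 4 &&
        shutdown(sv[0], SHUT_WR) == 0 &&        /* finished sending */
        read(sv[1], buf, sizeof buf) == 4 &&    /* peer receives the data */
        read(sv[1], buf, sizeof buf) == 0 &&    /* then EOF */
        write(sv[1], "pong", 4) == 4 &&         /* peer can still reply */
        read(sv[0], buf, sizeof buf) == 4 &&    /* read half is still open */
        memcmp(buf, "pong", 4) == 0)
        ok = 0;

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```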

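When a server is handling multiple clients, it can never block in a function call related to a single client. Doing so can hang the server and deny service to all other clients. This is called a denial-of-service attack. Possible solutions are to:

  1. use nonblocking I/O
  2. have each client serviced by a separate thread of control (e.g., either spawn a process or a thread to service each client)
  3. place a timeout on the I/O operations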

#include <sys/select.h>
#include <signal.h>
#include <time.h>
int pselect (int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, const struct timespec *timeout, const sigset_t *sigmask);
Returns: count of ready descriptors, 0 on timeout, -1 on error

  1. pselect uses the timespec structure
    struct timespec {
    time_t tv_sec; /* seconds */
    long tv_nsec; /* nanoseconds */
    };

    The tv_nsec member of the newer structure specifies nanoseconds, whereas the tv_usec member of the older structure specifies microseconds.
  2. pselect adds a sixth argument: a pointer to a signal mask. This allows the program to disable the delivery of certain signals, test some global variables that are set by
    the handlers for these now-disabled signals, and then call pselect, telling it to reset the signal mask.

sigset_t newmask, oldmask, zeromask;

sigemptyset(&zeromask);
sigemptyset(&newmask);
sigaddset(&newmask, SIGINT);

sigprocmask(SIG_BLOCK, &newmask, &oldmask); /* block SIGINT */
if (intr_flag)
    handle_intr(); /* handle the signal */
if ( (nready = pselect ( ... , &zeromask)) < 0) {
    if (errno == EINTR) {
        if (intr_flag)
            handle_intr ();
    }
    ...
}

Before testing the intr_flag variable, we block SIGINT. When pselect is called, it replaces the signal mask of the process with an empty set (i.e., zeromask) and then checks the
descriptors, possibly going to sleep. But when pselect returns, the signal mask of the process is reset to its value before pselect was called (i.e., SIGINT is blocked).

#include <poll.h>
int poll (struct pollfd *fdarray, unsigned long nfds, int timeout);
Returns: count of ready descriptors, 0 on timeout, -1 on error

struct pollfd {
    int fd;        /* descriptor to check */
    short events;  /* events of interest on fd */
    short revents; /* events that occurred on fd */
};

The conditions to be tested are specified by the events member, and the function returns the status for that descriptor in the corresponding revents member.

For TCP and UDP sockets, poll classifies conditions as follows:

  1. All regular TCP data and all UDP data is considered normal.
  2. TCP's out-of-band data is considered priority band.
  3. When the read half of a TCP connection is closed (e.g., a FIN is received), this is also considered normal data and a subsequent read operation will return 0.
  4. The presence of an error for a TCP connection can be considered either normal data or an error (POLLERR). In either case, a subsequent read will return -1 with errno
    set to the appropriate value. This handles conditions such as the receipt of an RST or a timeout.
  5. The availability of a new connection on a listening socket can be considered either normal data or priority data. Most implementations consider this normal data.
  6. The completion of a nonblocking connect is considered to make a socket writable.

The number of elements in the array of structures is specified by the nfds argument.

The return value from poll is -1 if an error occurred, 0 if no descriptors are ready before the timer expires, otherwise it is the number of descriptors that have a nonzero revents
member.

If we are no longer interested in a particular descriptor, we just set the fd member of the pollfd structure to a negative value. Then the events member is ignored and the revents
member is set to 0 on return.
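A small sketch of poll on two descriptors follows. The name poll_two and its bitmask return value are hypothetical. POLLIN is the standard event flag for readable data, which the text above does not list explicitly. Passing a negative descriptor for either argument shows the "ignored entry" behavior just described.

```c
#include <poll.h>

/* Hypothetical helper: test two descriptors for readable data.
 * Returns a bitmask (bit 0 for fd_a, bit 1 for fd_b), 0 if neither is
 * readable before timeout_ms milliseconds elapse, or -1 on error.
 * A negative descriptor is ignored by poll and its revents comes back 0. */
int poll_two(int fd_a, int fd_b, int timeout_ms)
{
    struct pollfd fds[2];
    int mask = 0;

    fds[0].fd = fd_a;
    fds[0].events = POLLIN;        /* condition of interest */
    fds[0].revents = 0;
    fds[1].fd = fd_b;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    if (poll(fds, 2, timeout_ms) < 0)
        return -1;

    if (fds[0].revents & POLLIN)   /* status reported back per descriptor */
        mask |= 1;
    if (fds[1].revents & POLLIN)
        mask |= 2;
    return mask;
}
```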

