Reading Notes on The Linux Programming Interface (English Edition): Alternative I/O Models, 63.4

63.4.4 A Closer Look at epoll Semantics

When we create an epoll instance using epoll_create(), the kernel creates a new in-memory i-node and open file description, and allocates a new file descriptor in the calling process that refers to the open file description. The interest list for an epoll instance is associated with the open file description, not with the epoll file descriptor. This has the following consequences:

  • If we duplicate an epoll file descriptor using dup() (or similar), then the duplicated descriptor refers to the same epoll interest and ready lists as the original descriptor. We may modify the interest list by specifying either file descriptor as the epfd argument in a call to epoll_ctl(). Similarly, we can retrieve items from the ready list by specifying either file descriptor as the epfd argument in a call to epoll_wait().
  • The preceding point also applies after a call to fork(). The child inherits a duplicate of the parent’s epoll file descriptor, and this duplicate descriptor refers to the same epoll data structures.
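
The first consequence can be checked with a small experiment. The following is a minimal sketch, not code from the book: the function name shared_interest_list_demo is hypothetical, a pipe stands in for the monitored file, and epoll_create1(0) is used in place of epoll_create(). It registers a descriptor through the duplicate and then waits on the original.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Adds the read end of a pipe to the interest list through a dup()ed epoll
   descriptor, then calls epoll_wait() on the original descriptor.
   Returns the number of ready descriptors reported, or -1 on error.
   (Hypothetical helper written for illustration only.) */
int shared_interest_list_demo(void)
{
    int pfd[2];
    int ready = -1;

    if (pipe(pfd) == -1)
        return -1;

    int epfd = epoll_create1(0);
    int epfd_dup = (epfd == -1) ? -1 : dup(epfd);

    if (epfd_dup != -1) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = pfd[0] };
        struct epoll_event out;

        /* Modify the interest list via the duplicate ... */
        if (epoll_ctl(epfd_dup, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
            write(pfd[1], "x", 1) == 1) {
            /* ... and fetch from the ready list via the original. */
            ready = epoll_wait(epfd, &out, 1, 0);
        }
        close(epfd_dup);
    }
    if (epfd != -1)
        close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return ready;
}
```

If both descriptors refer to the same interest and ready lists, as described above, the call on the original descriptor should report the pipe as ready even though it was registered through the duplicate.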

When we perform an epoll_ctl() EPOLL_CTL_ADD operation, the kernel adds an item to the epoll interest list that records both the number of the monitored file descriptor and a reference to the corresponding open file description. For the purpose of epoll_wait() calls, the kernel monitors the open file description. This means that we must refine our earlier statement that when a file descriptor is closed, it is automatically removed from any epoll interest lists of which it is a member. The refinement is this: an open file description is removed from the epoll interest list once all file descriptors that refer to it have been closed. This means that if we create duplicate descriptors referring to an open file—using dup() (or similar) or fork()—then the open file will be removed only after the original descriptor and all of the duplicates have been closed.
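
The refinement can be observed directly. The sketch below is an illustration under assumptions (the function name dup_monitoring_demo and the use of a pipe are choices made here, and error paths skip cleanup for brevity). It registers the read end of a pipe, duplicates it, and then closes the two descriptors one at a time, calling epoll_wait() with a zero timeout after each close.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Stores in *after_first_close and *after_last_close the number of ready
   descriptors that epoll_wait() reports after closing the registered
   descriptor and after closing its duplicate, respectively.
   Returns 0 on success, -1 on error (error paths leak descriptors).
   Hypothetical helper written for illustration only. */
int dup_monitoring_demo(int *after_first_close, int *after_last_close)
{
    int pfd[2];
    struct epoll_event ev, out;

    if (pipe(pfd) == -1)
        return -1;

    int epfd = epoll_create1(0);
    int dupfd = dup(pfd[0]);          /* second descriptor, same description */
    if (epfd == -1 || dupfd == -1)
        return -1;

    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        return -1;
    if (write(pfd[1], "x", 1) != 1)
        return -1;

    /* Close the descriptor that was registered; dupfd keeps the open file
       description alive, so the interest-list entry remains. */
    close(pfd[0]);
    *after_first_close = epoll_wait(epfd, &out, 1, 0);

    /* Close the last descriptor referring to the description; the entry
       is now removed from the interest list. */
    close(dupfd);
    *after_last_close = epoll_wait(epfd, &out, 1, 0);

    close(pfd[1]);
    close(epfd);
    return 0;
}
```

Note that after the first close, the event returned by epoll_wait() still carries the number of the descriptor that was registered, even though that descriptor is no longer open. This is one reason to deregister explicitly with EPOLL_CTL_DEL before closing a descriptor that has duplicates.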

63.4.5 Performance of epoll Versus I/O Multiplexing

We now look at the reasons why epoll performs better than select() and poll():

  • On each call to select() or poll(), the kernel must check all of the file descriptors specified in the call. By contrast, when we mark a descriptor to be monitored with epoll_ctl(), the kernel records this fact in a list associated with the underlying open file description, and whenever an I/O operation that makes the file descriptor ready is performed, the kernel adds an item to the ready list for the epoll descriptor. (An I/O event on a single open file description may cause multiple file descriptors associated with that description to become ready.) Subsequent epoll_wait() calls simply fetch items from the ready list.
  • Each time we call select() or poll(), we pass a data structure to the kernel that identifies all of the file descriptors that are to be monitored, and, on return, the kernel passes back a data structure describing the readiness of all of these descriptors. By contrast, with epoll, we use epoll_ctl() to build up a data structure in kernel space that lists the set of file descriptors to be monitored. Once this data structure has been built, each later call to epoll_wait() doesn’t need to pass any information about file descriptors to the kernel, and the call returns information about only those descriptors that are ready.

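Roughly speaking, the cost of select() and poll() grows linearly with the number of file descriptors being monitored, regardless of how many of them are active. By contrast, epoll scales (linearly) according to the number of I/O events that occur. The epoll API is thus particularly efficient in a scenario that is common in servers that handle many simultaneous clients: of the many file descriptors being monitored, most are idle; only a few descriptors are ready.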
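
The difference in calling convention is visible from user space. The following sketch is an illustration, not a benchmark: the name compare_poll_and_epoll and the value of NPIPES are assumptions made here. It monitors NPIPES pipes of which only one has data, and records how many array entries the caller has to examine after poll() versus how many events epoll_wait() hands back. It shows only the user-visible side; the kernel-side scanning described above is not measured.

```c
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 16   /* arbitrary; all but one descriptor stay idle */

/* Returns 0 on success, -1 on error. On success, *poll_scanned is the number
   of pollfd entries inspected to find the ready ones, and *epoll_returned is
   the number of events returned by epoll_wait().
   Hypothetical helper written for illustration only. */
int compare_poll_and_epoll(int *poll_scanned, int *epoll_returned)
{
    int rfd[NPIPES], wfd[NPIPES];
    struct pollfd pfds[NPIPES];
    struct epoll_event evs[NPIPES];
    int made = 0, found = 0, rc = -1;

    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    /* Build both monitoring sets. For epoll this is done once, in the kernel. */
    for (; made < NPIPES; made++) {
        int p[2];
        if (pipe(p) == -1)
            goto out;
        rfd[made] = p[0];
        wfd[made] = p[1];
        pfds[made].fd = p[0];
        pfds[made].events = POLLIN;
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1) {
            made++;                    /* make sure this pipe is closed too */
            goto out;
        }
    }

    if (write(wfd[0], "x", 1) != 1)    /* exactly one descriptor is ready */
        goto out;

    /* poll(): the full array is passed in on every call, and the caller
       walks every entry on return to find the ready ones. */
    if (poll(pfds, NPIPES, 0) == -1)
        goto out;
    *poll_scanned = 0;
    for (int i = 0; i < NPIPES; i++) {
        (*poll_scanned)++;
        if (pfds[i].revents & POLLIN)
            found++;
    }

    /* epoll_wait(): no descriptor list is passed in, and only the ready
       descriptors come back. */
    *epoll_returned = epoll_wait(epfd, evs, NPIPES, 0);
    if (*epoll_returned == found)      /* both APIs should agree */
        rc = 0;

out:
    for (int i = 0; i < made; i++) {
        close(rfd[i]);
        close(wfd[i]);
    }
    close(epfd);
    return rc;
}
```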

63.4.6 Edge-Triggered Notification

By default, the epoll mechanism provides level-triggered notification. By this, we mean that epoll tells us whether an I/O operation can be performed on a file descriptor without blocking.

The epoll API also allows for edge-triggered notification—that is, a call to epoll_wait() tells us if there has been I/O activity on a file descriptor since the previous call to epoll_wait() (or since the descriptor was opened, if there was no previous call). Using epoll with edge-triggered notification is semantically similar to signal-driven I/O, except that if multiple I/O events occur, epoll coalesces them into a single notification returned via epoll_wait(); with signal-driven I/O, multiple signals may be generated.

We illustrate the difference between level-triggered and edge-triggered epoll notification using an example. Suppose that we are using epoll to monitor a socket for input (EPOLLIN), and the following steps occur:
1. Input arrives on the socket.
2. We perform an epoll_wait(). This call will tell us that the socket is ready, regardless of whether we are employing level-triggered or edge-triggered notification.
3. We perform a second call to epoll_wait().

If we are employing level-triggered notification, then the second epoll_wait() call will inform us that the socket is ready. If we are employing edge-triggered notification, then the second call to epoll_wait() will block, because no new input has arrived since the previous call to epoll_wait().
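
The three steps can be reproduced with a short sketch. Two things differ from the scenario above and are assumptions made here: a pipe stands in for the socket, and the second epoll_wait() call uses a timeout of 0, so that in the edge-triggered case it returns 0 instead of blocking. The function name second_wait_result is hypothetical.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Pass 0 for level-triggered or EPOLLET for edge-triggered notification.
   Returns the result of the second epoll_wait() call, or -1 on error.
   Hypothetical helper written for illustration only. */
int second_wait_result(uint32_t trigger_flag)
{
    int pfd[2];
    int result = -1;
    struct epoll_event out;

    if (pipe(pfd) == -1)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd != -1) {
        struct epoll_event ev = { .events = EPOLLIN | trigger_flag,
                                  .data.fd = pfd[0] };

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
            write(pfd[1], "x", 1) == 1 &&               /* 1. input arrives  */
            epoll_wait(epfd, &out, 1, 0) == 1) {        /* 2. first wait     */
            /* The input is deliberately left unread. */
            result = epoll_wait(epfd, &out, 1, 0);      /* 3. second wait    */
        }
        close(epfd);
    }
    close(pfd[0]);
    close(pfd[1]);
    return result;
}
```

With level-triggered notification the second call reports the descriptor again, because unread input is still available. With EPOLLET it reports nothing, because no new input has arrived since the first call.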

The general framework for using edge-triggered epoll notification is as follows:
1. Make all file descriptors that are to be monitored nonblocking.
2. Build the epoll interest list using epoll_ctl().
3. Handle I/O events using the following loop:
   a) Retrieve a list of ready descriptors using epoll_wait().
   b) For each file descriptor that is ready, process I/O until the relevant system call (e.g., read(), write(), recv(), send(), or accept()) returns with the error EAGAIN or EWOULDBLOCK.
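
The framework above can be sketched as follows. This is a minimal illustration under assumptions, not a complete server loop: the names make_nonblocking, drain, and edge_triggered_demo are hypothetical, a pipe preloaded with 10000 bytes stands in for a real input source, and the event loop runs only once.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Step 1: put a descriptor into nonblocking mode. */
static int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Step 3b: read until read() fails with EAGAIN/EWOULDBLOCK (or EOF).
   Returns the number of bytes consumed, or -1 on any other error. */
static long drain(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            total += n;
        else if (n == 0)
            return total;                       /* end of file */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;                       /* nothing left for now */
        else if (errno != EINTR)
            return -1;
    }
}

/* Runs one iteration of the loop against a pipe holding 10000 bytes.
   Returns the number of bytes consumed, or -1 on error.
   Hypothetical driver written for illustration only. */
long edge_triggered_demo(void)
{
    int pfd[2];
    char data[10000];
    long total = -1;

    if (pipe(pfd) == -1)
        return -1;
    memset(data, 'x', sizeof(data));

    int epfd = epoll_create1(0);
    if (epfd != -1 && make_nonblocking(pfd[0]) == 0) {
        /* Step 2: build the interest list. */
        struct epoll_event ev = { .events = EPOLLIN | EPOLLET,
                                  .data.fd = pfd[0] };
        struct epoll_event evs[4];

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
            write(pfd[1], data, sizeof(data)) == (ssize_t)sizeof(data)) {
            /* Step 3a: retrieve the ready descriptors. */
            int n = epoll_wait(epfd, evs, 4, 0);
            total = (n >= 0) ? 0 : -1;
            for (int i = 0; i < n && total >= 0; i++) {
                long got = drain(evs[i].data.fd);
                total = (got >= 0) ? total + got : -1;
            }
        }
    }
    if (epfd != -1)
        close(epfd);
    close(pfd[0]);
    close(pfd[1]);
    return total;
}
```

The descriptor must be nonblocking because the final read() in drain() is issued when no data remains. On a blocking descriptor that call would stall the whole loop instead of failing with EAGAIN.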

Preventing file-descriptor starvation when using edge-triggered notification
Suppose that we are monitoring multiple file descriptors using edge-triggered notification, and that a ready file descriptor has a large amount (perhaps an endless stream) of input available. If, after detecting that this file descriptor is ready, we attempt to consume all of the input using nonblocking reads, then we risk starving the other file descriptors of attention (i.e., it may be a long time before we again check them for readiness and perform I/O on them). One solution to this problem is for the application to maintain a list of file descriptors that have been notified as being ready, and execute a loop that continuously performs the following actions:
1. Monitor the file descriptors using epoll_wait() and add ready descriptors to the application list. If any file descriptors are already registered as being ready in the application list, then the timeout for this monitoring step should be small or 0, so that if no new file descriptors are ready, the application can quickly proceed to the next step and service any file descriptors that are already known to be ready.
2. Perform a limited amount of I/O on those file descriptors registered as being ready in the application list (perhaps cycling through them in round-robin fashion, rather than always starting from the beginning of the list after each call to epoll_wait()). A file descriptor can be removed from the application list when the relevant nonblocking I/O system call fails with the EAGAIN or EWOULDBLOCK error.
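
The two-step loop above can be sketched as follows. Everything specific here is an assumption made for illustration: the name fair_service_demo, the per-turn budget CHUNK, the use of two pipes preloaded with 8192 and 1024 bytes, and a fixed-size flag array serving as the application's ready list. Error paths return without cleanup for brevity.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

#define NFDS  2
#define CHUNK 1024   /* per-turn I/O budget; the value is arbitrary */

/* Services two edge-triggered pipes in round-robin fashion, reading at most
   CHUNK bytes from each per pass. Stores the bytes read from each pipe in
   totals[] and the index of the pipe that was drained first in
   *first_drained. Returns 0 on success, -1 on error.
   Hypothetical helper written for illustration only. */
int fair_service_demo(long totals[NFDS], int *first_drained)
{
    static const size_t amounts[NFDS] = { 8192, 1024 };
    int rfd[NFDS], wfd[NFDS];
    int ready[NFDS] = { 0 };    /* application-level ready list */
    int nready = 0;
    char buf[8192];

    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;
    memset(buf, 'x', sizeof(buf));
    *first_drained = -1;

    for (int i = 0; i < NFDS; i++) {
        int p[2];
        if (pipe(p) == -1)
            return -1;
        rfd[i] = p[0];
        wfd[i] = p[1];
        totals[i] = 0;
        if (fcntl(rfd[i], F_SETFL, fcntl(rfd[i], F_GETFL) | O_NONBLOCK) == -1)
            return -1;
        struct epoll_event ev = { .events = EPOLLIN | EPOLLET,
                                  .data.u32 = (uint32_t)i };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, rfd[i], &ev) == -1 ||
            write(wfd[i], buf, amounts[i]) != (ssize_t)amounts[i])
            return -1;
    }

    do {
        /* Step 1: collect newly ready descriptors. Use a zero timeout when
           the application list already holds work, otherwise block. */
        struct epoll_event evs[NFDS];
        int n = epoll_wait(epfd, evs, NFDS, nready > 0 ? 0 : -1);
        if (n == -1)
            return -1;
        for (int i = 0; i < n; i++) {
            uint32_t slot = evs[i].data.u32;
            if (!ready[slot]) {
                ready[slot] = 1;
                nready++;
            }
        }

        /* Step 2: a limited amount of I/O on each listed descriptor. */
        for (int i = 0; i < NFDS; i++) {
            if (!ready[i])
                continue;
            ssize_t r = read(rfd[i], buf, CHUNK);
            if (r > 0) {
                totals[i] += r;
            } else if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
                ready[i] = 0;           /* drained: drop from the list */
                nready--;
                if (*first_drained == -1)
                    *first_drained = i;
            } else {
                return -1;              /* EOF or error: not expected here */
            }
        }
    } while (nready > 0);

    for (int i = 0; i < NFDS; i++) {
        close(rfd[i]);
        close(wfd[i]);
    }
    close(epfd);
    return 0;
}
```

Because each pass reads at most CHUNK bytes per descriptor, the pipe holding only 1024 bytes is fully serviced on the second pass, long before the larger pipe is drained. A loop that emptied the first descriptor before moving on would have made the second one wait.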

Although it requires extra programming work, this approach offers other benefits in addition to preventing file-descriptor starvation. For example, we can include other steps in the above loop, such as handling timers and accepting signals with sigwaitinfo() (or similar).
Starvation considerations can also apply when using signal-driven I/O, since it also presents an edge-triggered notification mechanism. By contrast, starvation considerations don’t necessarily apply in applications employing a level-triggered notification mechanism. This is because we can employ blocking file descriptors with level-triggered notification and use a loop that continuously checks descriptors for readiness, and then performs some I/O on the ready descriptors before once more checking for ready file descriptors.
