I/O Multiplexing Study Notes (3): epoll

mobai7

Last modified: 2023-11-30 16:14:37

Category: I/O multiplexing. Tags: C language, networking, learning

First published: 2023-09-18 14:26:27

Article link: https://blog.csdn.net/mobai7/article/details/132976817

This article covers how epoll works and how to use it.

I. The epoll interface

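The epoll API consists of three calls: epoll_create, epoll_ctl, and epoll_wait. All three have to be used together to do I/O multiplexing with epoll, which is where it differs from select and poll, each of which is a single call.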

1. epoll_create

int epoll_create(int size);
int epoll_create1(int flags);

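epoll_create creates an epoll instance. Since Linux 2.6.8 the size argument is ignored, but it must still be greater than 0. Originally size was a hint for the number of file descriptors the epoll instance would manage; the kernel now sizes its data structures dynamically.
epoll_create returns a file descriptor referring to the new epoll instance. All subsequent epoll_ctl and epoll_wait calls use this file descriptor. When the epoll instance is no longer needed, the file descriptor returned by epoll_create must be closed with close().
On error, -1 is returned and errno is set.
If flags is 0, epoll_create1 behaves the same as epoll_create, apart from the dropped size argument.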
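
As a minimal sketch of creating and releasing an epoll instance: the helper names open_epoll and has_cloexec are made up for illustration, and EPOLL_CLOEXEC is a flag accepted by epoll_create1 that the text above does not mention (it sets close-on-exec on the new descriptor).

```c
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance with close-on-exec set.
 * Returns the epoll file descriptor, or -1 with errno set. */
int open_epoll(void)
{
    return epoll_create1(EPOLL_CLOEXEC);
}

/* Hypothetical helper: returns 1 if FD_CLOEXEC is set on fd,
 * 0 if it is not, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}
```

The caller still owns the descriptor and should close() it once the epoll instance is no longer needed.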

2. epoll_ctl

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

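This system call performs control operations on the epoll instance referred to by epfd.
Parameters:

epfd: the epoll instance on which the operation (add, modify, or delete) is performed.
fd: the target file descriptor.

op: the operation to perform. There are three operations:

Operation	Meaning
EPOLL_CTL_ADD	Register the target file descriptor fd on the epoll instance referred to by epfd and associate event with fd.
EPOLL_CTL_MOD	Change the event associated with the target file descriptor fd.
EPOLL_CTL_DEL	Remove the target file descriptor fd from the epoll instance referred to by epfd. event is ignored and may be NULL.

event: the events to monitor, associated with fd.
struct epoll_event is defined as: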

           typedef union epoll_data {
               void        *ptr;
               int          fd;
               uint32_t     u32;
               uint64_t     u64;
           } epoll_data_t;

           struct epoll_event {
               uint32_t     events;      /* Epoll events */
               epoll_data_t data;        /* User data variable */
           };

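events is a bit mask composed by ORing together the following event types:

Event type	Meaning
EPOLLIN	Read event: the associated fd is available for read().
EPOLLOUT	Write event: the associated fd is available for write().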
EPOLLRDHUP(since Linux 2.6.17)	Stream socket peer closed connection, or shut down writing half of connection. (This flag is especially useful for writing simple code to detect peer shutdown when using Edge Triggered monitoring.)
EPOLLPRI	There is an exceptional condition on the file descriptor (for example, out-of-band data on a TCP socket).
EPOLLERR	An error condition happened on the associated file descriptor. This event is also reported for the write end of a pipe when the read end has been closed.
EPOLLHUP	Hang-up happened on the associated file descriptor. epoll_wait(2) will always wait for this event; it is not necessary to set it in events. Note that when reading from a channel such as a pipe or a stream socket, this event merely indicates that the peer closed its end of the channel. Subsequent reads from the channel will return 0 (end of file) only after all outstanding data in the channel has been consumed.
EPOLLET	Requests edge-triggered notification for the associated file descriptor. The default behavior of epoll is level-triggered.
EPOLLONESHOT (since Linux 2.6.2)	Sets the one-shot behavior for the associated file descriptor. This means that after an event is pulled out with epoll_wait(2) the associated file descriptor is internally disabled and no other events will be reported by the epoll interface. The user must call epoll_ctl() with EPOLL_CTL_MOD to rearm the file descriptor with a new event mask.
EPOLLWAKEUP (since Linux 3.5)	If EPOLLONESHOT and EPOLLET are clear and the process has the CAP_BLOCK_SUSPEND capability, ensure that the system does not enter “suspend” or “hibernate” while this event is pending or being processed. The event is considered as being “processed” from the time when it is returned by a call to epoll_wait(2) until the next call to epoll_wait(2) on the same epoll(7) file descriptor, the closure of that file descriptor, the removal of the event file descriptor with EPOLL_CTL_DEL, or the clearing of EPOLLWAKEUP for the event file descriptor with EPOLL_CTL_MOD.
EPOLLEXCLUSIVE (since Linux 4.5)	Sets an exclusive wakeup mode for the epoll file descriptor that is being attached to the target file descriptor, fd. When a wakeup event occurs and multiple epoll file descriptors are attached to the same target file using EPOLLEXCLUSIVE, one or more of the epoll file descriptors will receive an event with epoll_wait(2). The default in this scenario (when EPOLLEXCLUSIVE is not set) is for all epoll file descriptors to receive an event. EPOLLEXCLUSIVE is thus useful for avoiding thundering herd problems in certain scenarios. If the same file descriptor is in multiple epoll instances, some with the EPOLLEXCLUSIVE flag, and others without, then events will be provided to all epoll instances that did not specify EPOLLEXCLUSIVE, and at least one of the epoll instances that did specify EPOLLEXCLUSIVE. The following values may be specified in conjunction with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and EPOLLET. EPOLLHUP and EPOLLERR can also be specified, but this is not required: as usual, these events are always reported if they occur, regardless of whether they are specified in events. Attempts to specify other values in events yield an error. EPOLLEXCLUSIVE may be used only in an EPOLL_CTL_ADD operation; attempts to employ it with EPOLL_CTL_MOD yield an error. If EPOLLEXCLUSIVE has been set using epoll_ctl(), then a subsequent EPOLL_CTL_MOD on the same epfd, fd pair yields an error. A call to epoll_ctl() that specifies EPOLLEXCLUSIVE in events and specifies the target file descriptor fd as an epoll instance will likewise fail. The error in all of these cases is EINVAL. (In short: exclusive wakeup.)
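
The EPOLLONESHOT row is easier to follow with a small experiment. The sketch below assumes a pipe as the monitored channel; poll_once and oneshot_demo are hypothetical names, and a timeout of 0 is used so that nothing blocks.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Number of ready events on epfd without blocking (timeout 0). */
static int poll_once(int epfd)
{
    struct epoll_event out;
    return epoll_wait(epfd, &out, 1, 0);
}

/* Fills results[] with the epoll_wait return values
 *   [0] right after one byte is written to the pipe,
 *   [1] on the next call, without re-arming,
 *   [2] after re-arming with EPOLL_CTL_MOD.
 * Returns 0 on success, -1 if any setup step failed. */
int oneshot_demo(int results[3])
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = p[0] };
    int rc = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        results[0] = poll_once(epfd);  /* event delivered; fd is now disabled */
        results[1] = poll_once(epfd);  /* byte still unread, nothing reported */
        if (epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) == 0) {  /* re-arm */
            results[2] = poll_once(epfd);
            rc = 0;
        }
    }

    close(epfd);
    close(p[0]);
    close(p[1]);
    return rc;
}
```

The byte is never read, so the only thing that changes between the second and third call is the EPOLL_CTL_MOD re-arm.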

Return value: 0 on success; -1 on failure, with errno set.
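
The three operations can be wrapped as follows. This is only a sketch: watch_read, change_mask, and unwatch are illustrative names, and storing fd in data.fd is one common convention rather than a requirement.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>

/* EPOLL_CTL_ADD: register fd for read events; fd itself is kept as user data. */
int watch_read(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* EPOLL_CTL_MOD: replace the event mask of an already registered fd. */
int change_mask(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* EPOLL_CTL_DEL: stop monitoring fd; the event argument may be NULL. */
int unwatch(int epfd, int fd)
{
    return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
}
```

Each wrapper returns 0 or -1 exactly like epoll_ctl, so the caller can inspect errno (for example EEXIST when adding an fd twice, or ENOENT when deleting one that is not registered).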

3. epoll_wait

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

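This call waits for events on the epoll instance epfd. The kernel returns the ready events in events, an array of struct epoll_event supplied by the caller. epoll_wait returns at most maxevents events (normally the size of that array); maxevents must be greater than 0.
timeout is the number of milliseconds that epoll_wait will block.
The call blocks until one of the following happens:

A file descriptor delivers an event.
The call is interrupted by a signal handler.
The timeout expires.

A timeout of -1 makes epoll_wait block indefinitely. A timeout of 0 makes epoll_wait return immediately, even if no events are available.
In each returned struct epoll_event, the data field holds the same data that was specified in the most recent epoll_ctl call (EPOLL_CTL_ADD or EPOLL_CTL_MOD) for that file descriptor, so it can be used to recover the corresponding file descriptor. The events field contains the returned event bits.
Return value:
On success, epoll_wait returns the number of file descriptors ready for the requested I/O, or 0 if no file descriptor became ready within timeout milliseconds.
On failure, epoll_wait returns -1 and sets errno appropriately (EINTR if the call was interrupted by a signal handler).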
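
To make the timeout and return-value rules concrete, here is a small sketch. wait_readable is a hypothetical helper that creates a throwaway epoll instance for a single fd; a real program would keep one long-lived instance instead.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds (-1 = forever, 0 = do not block)
 * for fd to report EPOLLIN. Returns 1 if an event was reported,
 * 0 on timeout, -1 on error with errno set.
 * Retries when a signal handler interrupts the wait; note that the
 * retry starts the full timeout again. */
int wait_readable(int fd, int timeout_ms)
{
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    struct epoll_event out;
    int n = -1;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0) {
        do {
            n = epoll_wait(epfd, &out, 1, timeout_ms);
        } while (n == -1 && errno == EINTR);
    }

    int saved = errno;  /* close() must not clobber the error we report */
    close(epfd);
    errno = saved;
    return n;
}
```

With an empty pipe, a timeout of 0 or 50 yields 0; once data has been written, even a timeout of -1 returns immediately with 1.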

II. Points to note when using epoll

1. epoll working modes

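epoll has two working modes: edge-triggered (ET) and level-triggered (LT).
The epoll event distribution interface can behave either as edge-triggered or as level-triggered.
The following scenario shows how the two mechanisms differ:
1. The file descriptor for the read side, rfd, is registered on the epoll instance.
2. The writer writes 2 KB of data into the data channel (a pipe or socket).
3. A call to epoll_wait returns rfd as a ready file descriptor.
4. The reader reads 1 KB of data from rfd.
5. epoll_wait is called again.
If rfd was registered with EPOLLET (edge-triggered), the epoll_wait call in step 5 may hang even though data is still available in the read buffer; meanwhile the remote peer may be waiting for a response to the data it already sent. The reason is that edge-triggered mode delivers an event only when the state of the monitored file descriptor changes. So in step 5 the caller may end up waiting for data that is already sitting in the input buffer. In the example, the write in step 2 generates an event on rfd, and that event is consumed in step 3. Because the read in step 4 does not drain the whole buffer, the epoll_wait call in step 5 may block indefinitely.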
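
The five steps can be reproduced on a pipe. The sketch below is an illustration under stated assumptions: second_wait_result is a hypothetical name, and step 5 uses a timeout of 0 so the edge-triggered case returns 0 instead of blocking forever.

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Runs steps 1-5 on a pipe. Pass 0 for level-triggered or EPOLLET for
 * edge-triggered. Returns what the second epoll_wait (step 5) reports:
 * 1 if rfd is reported again, 0 if it is not, -1 if a step failed. */
int second_wait_result(uint32_t extra_flags)
{
    int p[2];
    char buf[2048];

    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN | extra_flags, .data.fd = p[0] };
    struct epoll_event out;
    int result = -1;

    memset(buf, 'a', sizeof buf);
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&          /* step 1 */
        write(p[1], buf, sizeof buf) == (ssize_t)sizeof buf &&     /* step 2: 2 KB */
        epoll_wait(epfd, &out, 1, 0) == 1 &&                       /* step 3 */
        read(p[0], buf, 1024) == 1024)                             /* step 4: 1 KB */
        result = epoll_wait(epfd, &out, 1, 0);                     /* step 5 */

    close(epfd);
    close(p[0]);
    close(p[1]);
    return result;
}
```

In level-triggered mode the remaining 1 KB keeps rfd ready, so step 5 reports it again; in edge-triggered mode no new state change has happened, so nothing is reported.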
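A program that uses EPOLLET should use nonblocking file descriptors, so that a blocking read or write does not starve the other file descriptors being multiplexed. The suggested way to use epoll in edge-triggered mode (EPOLLET) is:

Use nonblocking file descriptors.
Wait for the next event only after read or write returns EAGAIN, which means everything available has been read or written.
Read all available data in one go each time an event arrives.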
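
These three recommendations might look like this in code. It is a sketch only: set_nonblocking and drain_fd are illustrative helpers, and the data read is simply discarded where a real handler would process it.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into nonblocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read everything currently available on a nonblocking fd.
 * Returns the number of bytes read once read() reports EAGAIN or
 * end of file, or -1 on any other error. *eof is set to 1 if end of
 * file was reached, otherwise 0. */
long drain_fd(int fd, int *eof)
{
    char buf[4096];
    long total = 0;

    *eof = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;          /* a real handler would process buf here */
            continue;
        }
        if (n == 0) {            /* peer closed: nothing more will arrive */
            *eof = 1;
            return total;
        }
        if (errno == EINTR)
            continue;            /* interrupted by a signal, try again */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;        /* drained: safe to wait for the next event */
        return -1;
    }
}
```

Only after drain_fd returns is it safe to go back to epoll_wait in edge-triggered mode, because no further event arrives for data that was already in the buffer.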

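By contrast, when used as a level-triggered interface (the default, when EPOLLET is not specified), epoll is simply a faster poll and can be used wherever poll is used, since it shares the same semantics.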

An example of the difference between edge-triggered and level-triggered behavior is available at: (link missing)

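2. Usage example

Level-triggered usage is the same as poll; edge-triggered usage needs more care to avoid stalls in the application event loop. In this example the listener is a nonblocking socket. listen_sock is level-triggered and conn_sock is edge-triggered. The function do_use_fd() uses the ready file descriptor to read or write data, and must consume everything available, continuing until read or write returns EAGAIN.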

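           #include <stdio.h>
           #include <stdlib.h>
           #include <sys/epoll.h>
           #include <sys/socket.h>

           #define MAX_EVENTS 10
           struct epoll_event ev, events[MAX_EVENTS];
           int listen_sock, conn_sock, nfds, epollfd, n;
           struct sockaddr_storage addr;   /* peer address filled in by accept() */
           socklen_t addrlen;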

           /* Code to set up listening socket, 'listen_sock',
              (socket(), bind(), listen()) omitted */

           epollfd = epoll_create1(0);
           if (epollfd == -1) {
               perror("epoll_create1");
               exit(EXIT_FAILURE);
           }

           ev.events = EPOLLIN;
           ev.data.fd = listen_sock;
           if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == -1) {
               perror("epoll_ctl: listen_sock");
               exit(EXIT_FAILURE);
           }

           for (;;) {
               nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
               if (nfds == -1) {
                   perror("epoll_wait");
                   exit(EXIT_FAILURE);
               }

               for (n = 0; n < nfds; ++n) {
                   if (events[n].data.fd == listen_sock) {
                       addrlen = sizeof(addr);  /* value-result argument: reset before accept() */
                       conn_sock = accept(listen_sock,
                                          (struct sockaddr *) &addr, &addrlen);
                       if (conn_sock == -1) {
                           perror("accept");
                           exit(EXIT_FAILURE);
                       }
                       setnonblocking(conn_sock);
                       ev.events = EPOLLIN | EPOLLET;
                       ev.data.fd = conn_sock;
                       if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock,
                                   &ev) == -1) {
                           perror("epoll_ctl: conn_sock");
                           exit(EXIT_FAILURE);
                       }
                   } else {
                       do_use_fd(events[n].data.fd);
                   }
               }
           }

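3. Detecting that the peer has closed

When the peer closes the connection (close() in the program, or kill or Ctrl+C from the shell), epoll reports EPOLLIN together with EPOLLRDHUP. EPOLLRDHUP is reported only if it was requested in the event mask. There are therefore two ways to detect that the peer has closed:

Handle the EPOLLIN event and check whether read returns 0.
Check for the EPOLLRDHUP event.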
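
Both checks can be tried on a local socket pair. In this sketch, peer_closed and mask_after_peer_close are hypothetical names, and a UNIX-domain stream socketpair stands in for a real network connection.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Method 2: nonzero if the mask carries EPOLLRDHUP. */
int peer_closed(uint32_t events)
{
    return (events & EPOLLRDHUP) != 0;
}

/* Closes one end of a socketpair and returns the event mask that epoll
 * reports for the other end. Returns 0 if a step failed or if read()
 * did not return 0 (method 1) on the surviving end. */
uint32_t mask_after_peer_close(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;

    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(sv[0]);
        close(sv[1]);
        return 0;
    }

    /* EPOLLRDHUP has to be requested explicitly to be reported. */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP, .data.fd = sv[0] };
    struct epoll_event out;
    uint32_t mask = 0;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0) {
        close(sv[1]);                       /* the "peer" goes away */
        sv[1] = -1;
        if (epoll_wait(epfd, &out, 1, 0) == 1) {
            char c;
            mask = out.events;
            if (read(sv[0], &c, 1) != 0)    /* method 1: expect end of file */
                mask = 0;
        }
    }

    if (sv[1] != -1)
        close(sv[1]);
    close(sv[0]);
    close(epfd);
    return mask;
}
```

The returned mask may contain further bits such as EPOLLHUP; the sketch only relies on EPOLLIN and EPOLLRDHUP being present.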

4. Q&A

Q0:What is the key used to distinguish the file descriptors registered in an epoll set?
A0:The key is the combination of the file descriptor number and the open file description (also known as an “open file handle”, the kernel’s internal representation of an open file).

Q1:What happens if you register the same file descriptor on an epoll instance twice?
A1:You will probably get EEXIST. However, it is possible to add a duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD) file descriptor to the same epoll instance. This can be a useful technique for filtering events, if the duplicate file descriptors are registered with different events masks.

Q2:Can two epoll instances wait for the same file descriptor? If so, are events reported to both epoll file descriptors?
A2:Yes, and events would be reported to both. However, careful programming may be needed to do this correctly.

Q3:Is the epoll file descriptor itself poll/epoll/selectable?
A3:Yes. If an epoll file descriptor has events waiting, then it will indicate as being readable.

Q4:What happens if one attempts to put an epoll file descriptor into its own file descriptor set?
A4:The epoll_ctl(2) call fails (EINVAL). However, you can add an epoll file descriptor inside another epoll file descriptor set.
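
A3 and A4 can be checked with two epoll instances. The following is a sketch with hypothetical helper names (add_for_read, ready_count, check_nesting); a pipe provides the underlying event.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd on epfd for EPOLLIN.
 * Returns 0 on success, otherwise the errno value from epoll_ctl. */
int add_for_read(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0 ? 0 : errno;
}

/* Number of ready events on epfd without blocking (0 or 1), -1 on error. */
int ready_count(int epfd)
{
    struct epoll_event out;
    return epoll_wait(epfd, &out, 1, 0);
}

/* Returns 0 if the behavior described in A3 and A4 is observed, -1 otherwise. */
int check_nesting(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int inner = epoll_create1(0);
    int outer = epoll_create1(0);
    int ok = inner != -1 && outer != -1
        && add_for_read(inner, inner) == EINVAL  /* A4: cannot watch itself */
        && add_for_read(outer, inner) == 0       /* A4: nesting is allowed */
        && add_for_read(inner, p[0]) == 0
        && ready_count(outer) == 0               /* nothing pending yet */
        && write(p[1], "x", 1) == 1
        && ready_count(outer) == 1;              /* A3: inner became readable */

    if (inner != -1)
        close(inner);
    if (outer != -1)
        close(outer);
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```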

Q5:Can I send an epoll file descriptor over a UNIX domain socket to another process?
A5:Yes, but it does not make sense to do this, since the receiving process would not have copies of the file descriptors in the epoll set.

Q6:Will closing a file descriptor cause it to be removed from all epoll sets automatically?
A6:Yes, but be aware of the following point. A file descriptor is a reference to an open file description (see open(2)). Whenever a file descriptor is duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD, or fork(2), a new file descriptor referring to the same open file description is created. An open file description continues to exist until all file descriptors referring to it have been closed. A file descriptor is removed from an epoll set only after all the file descriptors referring to the underlying open file description have been closed (or before if the file descriptor is explicitly removed using epoll_ctl(2) EPOLL_CTL_DEL). This means that even after a file descriptor that is part of an epoll set has been closed, events may be reported for that file descriptor if other file descriptors referring to the same underlying file description remain open.
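
The caveat in A6 is easy to miss, so here is a sketch that shows it on a pipe. events_after_close is a hypothetical name; the byte is written before the close so that the pipe always has a reader while writing.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read end of a pipe, writes one byte, closes the
 * registered descriptor, then asks epoll (timeout 0) what is ready.
 * If keep_duplicate is nonzero, a dup() of the read end is kept open
 * across the close. Returns the epoll_wait result, or -1 on failure. */
int events_after_close(int keep_duplicate)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
    struct epoll_event out;
    int copy = -1, n = -1;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1 &&
        (!keep_duplicate || (copy = dup(p[0])) != -1)) {
        close(p[0]);   /* with a duplicate, the open file description survives */
        p[0] = -1;
        n = epoll_wait(epfd, &out, 1, 0);
    }

    if (p[0] != -1)
        close(p[0]);
    if (copy != -1)
        close(copy);
    close(p[1]);
    close(epfd);
    return n;
}
```

When a duplicate is kept, the event is still reported, and its data.fd holds the number of the descriptor that has already been closed. A defensive program can call EPOLL_CTL_DEL before close() to avoid this.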

Q7: If more than one event occurs between epoll_wait(2) calls, are they combined or reported separately?
A7:They will be combined.

Q8:Does an operation on a file descriptor affect the already collected but not yet reported events?
A8:You can do two operations on an existing file descriptor. Remove would be meaningless for this case. Modify will reread available I/O.

Q9:Do I need to continuously read/write a file descriptor until EAGAIN when using the EPOLLET flag (edge-triggered behavior)?
A9:Receiving an event from epoll_wait(2) should suggest to you that such file descriptor is ready for the requested I/O operation. You must consider it ready until the next (nonblocking) read/write yields EAGAIN. When and how you will use the file descriptor is entirely up to you.
For packet/token-oriented files (e.g., datagram socket, terminal in canonical mode), the only way to detect the end of the read/write I/O space is to continue to read/write until EAGAIN.
For stream-oriented files (e.g., pipe, FIFO, stream socket), the condition that the read/write I/O space is exhausted can also be detected by checking the amount of data read from / written to the target file descriptor. For example, if you call read(2) by asking to read a certain amount of data and read(2) returns a lower number of bytes, you can be sure of having exhausted the read I/O space for the file descriptor. The same is true when writing using write(2). (Avoid this latter technique if you cannot guarantee that the monitored file descriptor always refers to a stream-oriented file.)