Revisiting epoll

select, poll, and epoll are three different I/O event notification mechanisms. select is the simplest, poll comes next, and epoll is somewhat trickier: its edge-triggered and level-triggered modes hide a few pitfalls.

Generally speaking, network programming is full of traps, so avoid writing network applications directly against the socket API; it is safer to use a networking library such as ACE, Boost, etc.

Still, when something goes wrong, or when you need to tune performance, you have to know what is actually going on underneath. Of the three, epoll is the most involved, so it is worth studying its system documentation carefully.

Modern Linux provides three system calls for epoll:

An  epoll  set is connected to a file descriptor created by  epoll_create(2) . Interest for certain file descriptors is then registered via  epoll_ctl(2) . Finally, the actual wait is started by  epoll_wait(2) .
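The three calls above can be sketched as a minimal lifecycle on a single descriptor. This is an illustrative helper, not part of the original text; the name wait_readable is made up for the example:

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Minimal lifecycle sketch: epoll_create(2) builds the set,
 * epoll_ctl(2) registers interest, epoll_wait(2) blocks for events.
 * Returns 1 if fd becomes readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct epoll_event ev, out;
    int epfd = epoll_create(1);      /* size hint, ignored since Linux 2.6.8 */
    if (epfd < 0)
        return -1;
    ev.events = EPOLLIN;             /* level-triggered read interest */
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    int n = epoll_wait(epfd, &out, 1, timeout_ms);
    close(epfd);
    return n;
}
```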

The  epoll  event distribution interface is able to behave both as Edge Triggered ( ET ) and Level Triggered ( LT ). The difference between the ET and LT event distribution mechanisms can be described as follows. Suppose that this scenario happens:
  1. The file descriptor that represents the read side of a pipe ( RFD ) is added inside the epoll device.
  2. Pipe writer writes 2Kb of data on the write side of the pipe.
  3. A call to epoll_wait(2) is done that will return RFD as ready file descriptor.
  4. The pipe reader reads 1Kb of data from RFD.
  5. A call to epoll_wait(2) is done.
If the  RFD  file descriptor has been added to the  epoll  interface using the  EPOLLET  flag, the call to  epoll_wait(2)  done in step  5  will probably hang because of the available data still present in the file input buffers and the remote peer might be expecting a response based on the data it already sent. 

In other words, under edge triggering, step 5 may hang even though data remains in the input buffer, while the remote peer may be waiting for a response to the data it has already sent.

The reason for this is that Edge Triggered event distribution delivers events only when events happens on the monitored file. So, in step  5  the caller might end up waiting for some data that is already present inside the input buffer. 

In the above example, an event on  RFD  will be generated because of the write done in  2  , and the event is consumed in  3 . Since the read operation done in  4  does not consume the whole buffer data, the call to  epoll_wait(2)  done in step  5  might lock indefinitely. 

The  epoll  interface, when used with the  EPOLLET  flag ( Edge Triggered ) should use non-blocking file descriptors to avoid having a blocking read or write starve the task that is handling multiple file descriptors. 

Edge-triggered mode must be used with non-blocking descriptors. Edge triggering is the recommended mode: it does not fire repeatedly for the same unchanged state, and notifies only when the state changes.

The suggested way to use  epoll  as an Edge Triggered (  EPOLLET  ) interface is below, and possible pitfalls to avoid follow.

Things to watch out for:

1) use non-blocking file descriptors

2) go to wait for an event only after read(2) or write(2) return EAGAIN
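Point 1 above means switching the descriptor to non-blocking mode before registering it with EPOLLET. As a sketch (the helper name setnonblocking is illustrative, matching the name used in the example further below):

```c
#include <fcntl.h>

/* Illustrative helper: put fd into non-blocking mode, as required
 * before registering it with EPOLLET. Returns 0 on success, -1 on error. */
static int setnonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);   /* read current file status flags */
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```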

On the contrary, when used as a Level Triggered interface,  epoll  is by all means a faster  poll(2) , and can be used wherever the latter is used since it shares the same semantics. 

Since even with Edge Triggered  epoll  multiple events can be generated upon receipt of multiple chunks of data, the caller has the option to specify the  EPOLLONESHOT  flag, to tell  epoll  to disable the associated file descriptor after the receipt of an event with  epoll_wait(2) .

When the  EPOLLONESHOT  flag is specified, it is the caller's responsibility to rearm the file descriptor using  epoll_ctl(2)  with  EPOLL_CTL_MOD .
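A sketch of that rearming step (the helper name rearm_oneshot is made up for illustration): after a one-shot event is delivered, the descriptor stays in the set but is disabled until EPOLL_CTL_MOD re-enables it.

```c
#include <sys/epoll.h>

/* Sketch: after handling an EPOLLONESHOT event, the fd remains in the
 * epoll set but is disabled; rearm it with EPOLL_CTL_MOD before the
 * next epoll_wait. Returns 0 on success, -1 on error. */
static int rearm_oneshot(int epfd, int fd)
{
    struct epoll_event ev;
    ev.events = EPOLLIN | EPOLLET | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```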

The usual flow is:
1) Add the fd to the epoll watch set

int fdEpoll = epoll_create(MAX_FD_SIZE); // the size argument is ignored since Linux 2.6.8, but must be greater than 0
if (fdEpoll < 0)
    return -1;

struct epoll_event evt;
int sock; // a listening or connected socket obtained elsewhere
memset(&evt, 0, sizeof(evt));
evt.events = EPOLLIN;
evt.data.fd = sock;

int nRet = epoll_ctl(fdEpoll, EPOLL_CTL_ADD, sock, &evt);
if (nRet < 0)
    return -2;



2) Wait for the events of interest to trigger
struct epoll_event ev, events[MAX_EVENTS];
int nfds, n;
for (;;) {
    nfds = epoll_wait(fdEpoll, events, MAX_EVENTS, -1);
    for (n = 0; n < nfds; ++n) {
        if (events[n].data.fd == listener) {
            client = accept(listener, (struct sockaddr *) &local,
                            &addrlen);
            if (client < 0) {
                perror("accept");
                continue;
            }
            setnonblocking(client); // mandatory before registering with EPOLLET
            ev.events = EPOLLIN | EPOLLET;
            ev.data.fd = client;
            if (epoll_ctl(fdEpoll, EPOLL_CTL_ADD, client, &ev) < 0) {
                fprintf(stderr, "epoll set insertion error: fd=%d\n",
                        client);
                return -1;
            }
        }
        else
            do_use_fd(events[n].data.fd);
    }
}



FAQ
----------------------
Q. Do I need to continuously read/write an fd until EAGAIN when using the EPOLLET flag (Edge Triggered behaviour)?

A. No, you don't. Receiving an event from epoll_wait(2) suggests that the file descriptor is ready for the requested I/O operation. You simply have to consider it ready until you receive the next EAGAIN. When and how you use the file descriptor is entirely up to you. The condition that the read/write I/O space is exhausted can also be detected by checking the amount of data read from, or written to, the target file descriptor. For example, if you call read(2) asking for a certain amount of data and read(2) returns a lower number of bytes, you can be sure you have exhausted the read I/O space for that file descriptor. The same is valid when writing with write(2).


Notes
--------------------------

In edge-triggered mode, the safe approach is to go all the way: keep reading until read returns -1 with errno == EAGAIN (the buffer is drained) or read returns 0 (the connection is closed).
Alternatively, if you ask read for a certain amount of data and it returns fewer bytes, you know the read buffer has been drained; the number of bytes pending in the buffer can also be queried with the FIONREAD ioctl.
Since Linux 2.6.17, EPOLLRDHUP can be used to detect an orderly shutdown of a socket; note that this is not EPOLLHUP (which signals an abnormal hang-up).