The differences between select, poll, and epoll

memiracle

Article link: https://blog.csdn.net/memiracle/article/details/29855239

select:

The prototype of select is:

int select (int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

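select monitors three classes of file descriptors: readfds, writefds, and exceptfds. The call blocks until at least one descriptor is ready (readable, writable, or in an exceptional condition) or the timeout expires. The timeout argument sets the maximum wait: a NULL pointer blocks indefinitely, while a timeval of zero makes select return immediately. After select returns, the caller finds the ready descriptors by iterating over the fd_set.

select is supported on almost every platform, and this portability is one of its strengths. Its main weakness is a hard cap on the number of file descriptors a single process can monitor, typically 1024 on Linux (FD_SETSIZE). The cap can be raised by changing the macro definition, or even by recompiling the kernel, but doing so also reduces efficiency.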
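
As a rough illustration of the calling pattern described above, the following minimal sketch waits for a single descriptor to become readable. The helper name wait_readable_select is hypothetical and not part of any library; the sketch assumes a POSIX system and a descriptor number below FD_SETSIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable_select(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    /* select cannot represent descriptors at or above FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    /* The sets are overwritten by the kernel, so rebuild them on every call. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* The first argument is the highest descriptor number plus one. */
    int n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;

    /* On return the set holds only the ready descriptors. */
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

Called with a timeout of 0 on the read end of an empty pipe, this helper should report a timeout; once a byte has been written to the other end, it should report the descriptor as readable.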

poll:

int poll (struct pollfd *fds, unsigned int nfds, int timeout);

Unlike select, which uses three bitmaps to represent the three fd_sets, poll takes a single array of pollfd structures, passed as a pointer.

struct pollfd {
    int fd;         /* file descriptor */
    short events;   /* requested events to watch */
    short revents;  /* returned events witnessed */
};

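The pollfd structure keeps the requested events and the returned events in separate fields, so it no longer relies on select's value-result arguments, which the kernel overwrites on every call. poll also has no built-in cap on the number of descriptors (although performance still degrades as the count grows). As with select, the caller must scan the pollfd array after poll returns to find the ready descriptors.

As shown above, both select and poll require the caller to walk the full descriptor set after the call returns in order to find the ready sockets. In practice, of a large number of simultaneously connected clients only a few may be ready at any given moment, so efficiency falls off linearly as the number of monitored descriptors grows.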
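
The scan-after-return pattern can be sketched as follows. This is a minimal example under stated assumptions: the helper poll_readable and the MAX_FDS limit are hypothetical names chosen for illustration, and only POLLIN is checked.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_FDS 64  /* arbitrary limit for this sketch, not a poll limit */

/* Hypothetical helper: poll nfds descriptors for readability.
 * Writes the indices of the readable ones into ready_idx and
 * returns how many there are (0 on timeout, -1 on error). */
int poll_readable(const int *fds, int nfds, int timeout_ms, int *ready_idx)
{
    struct pollfd pfds[MAX_FDS];

    assert(nfds > 0 && nfds <= MAX_FDS);

    for (int i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;  /* what we ask for */
        pfds[i].revents = 0;      /* what the kernel reports back */
    }

    int n = poll(pfds, (nfds_t)nfds, timeout_ms);
    if (n <= 0)
        return n;

    /* Every entry has to be inspected, ready or not: this is the O(N) scan. */
    int count = 0;
    for (int i = 0; i < nfds; i++) {
        if (pfds[i].revents & POLLIN)
            ready_idx[count++] = i;
    }
    return count;
}
```

Note that the loop after poll visits every entry even when only one descriptor is ready, which is the linear cost the text describes.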

epoll:

The epoll interface is:

int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events;      /* Epoll events */
    epoll_data_t data;      /* User data variable */
};

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

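The API consists of three functions: epoll_create, epoll_ctl, and epoll_wait. epoll_create creates an epoll file descriptor. Its size argument does not limit how many descriptors the instance can monitor; it is only a hint to the kernel for the initial allocation of internal data structures. It returns the epoll descriptor, or -1 on failure. epoll_ctl performs the operation op on the target descriptor fd, and event describes the events to monitor on fd. There are three operations, EPOLL_CTL_ADD, EPOLL_CTL_DEL, and EPOLL_CTL_MOD, which add, delete, and modify the monitored events for fd. epoll_wait waits for I/O events on epfd and returns at most maxevents events.

With select/poll, the kernel scans all monitored descriptors only when the process makes the call. With epoll, each descriptor is registered in advance through epoll_ctl(); as soon as a registered descriptor becomes ready, the kernel uses a callback mechanism to mark it as ready, and the process is notified when it calls epoll_wait().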
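
The register-once, wait-many pattern might look like the sketch below. It assumes Linux; the helpers watch_readable and collect_ready and the MAX_EVENTS limit are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16  /* arbitrary batch size for this sketch */

/* Hypothetical helper: register fd for read readiness (level-triggered). */
int watch_readable(int epfd, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN;
    ev.data.fd = fd;  /* user data handed back by epoll_wait */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: wait for events and copy the ready descriptors
 * into ready_fds. Returns the number of ready descriptors
 * (0 on timeout, -1 on error). */
int collect_ready(int epfd, int *ready_fds, int cap, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];

    assert(cap > 0 && cap <= MAX_EVENTS);

    int n = epoll_wait(epfd, events, cap, timeout_ms);

    /* Only ready descriptors are returned, so there is nothing to skip. */
    for (int i = 0; i < n; i++)
        ready_fds[i] = events[i].data.fd;
    return n;
}
```

A caller would create the instance once with epoll_create, call watch_readable once per descriptor, and then call collect_ready in its event loop; unlike the select and poll loops, the descriptor set is not rebuilt on every iteration.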

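The main advantages of epoll are:

1. No cap on the number of monitored descriptors. The upper bound is the maximum number of files that can be opened, which is generally far greater than 2048; on a machine with 1 GB of memory it is around 100,000, for example. The exact figure can be checked with cat /proc/sys/fs/file-max and depends heavily on system memory. The biggest weakness of select is the limit on the number of fds a process can monitor, which is far too low for a server handling many connections. A multi-process design is one workaround (Apache takes this approach), but although creating a process on Linux is relatively cheap, the cost is still not negligible, and synchronizing data between processes is much less efficient than between threads, so it is not a perfect solution either.

2. I/O efficiency does not fall as the number of monitored fds grows. Unlike select and poll, epoll does not poll; it relies on a callback attached to each fd, and only fds that become ready run their callback.

3. Support for both level-triggered and edge-triggered modes. In edge-triggered mode, epoll reports only the descriptors that have just become ready, and reports them once; if the program takes no action, it is not told again. In theory edge-triggered mode performs somewhat better, but the code is considerably more complex to write.

4. Cheap delivery of events to user space. A common claim is that epoll speeds this up by having the kernel and user space mmap the same memory, avoiding needless copies. Mainline Linux does not do this: epoll_wait copies the ready events into the caller's buffer. Because only ready events are copied, the cost scales with the number of ready descriptors rather than the number monitored.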
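
The difference between the two trigger modes in point 3 can be observed with a small experiment. The sketch below assumes Linux; the function reports_without_read is a hypothetical name. It writes one byte to a pipe, never reads it, and counts how many of two consecutive epoll_wait calls report the read end.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical experiment: pass 0 for level-triggered or EPOLLET for
 * edge-triggered. Returns how many of two back-to-back epoll_wait
 * calls report the unread data, or -1 on setup failure. */
int reports_without_read(uint32_t trigger_flags)
{
    int p[2];
    struct epoll_event ev, out;
    int reports = 0;

    assert(trigger_flags == 0 || trigger_flags == EPOLLET);

    if (pipe(p) != 0)
        return -1;
    int epfd = epoll_create(1);
    if (epfd < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    ev.events = EPOLLIN | trigger_flags;
    ev.data.fd = p[0];

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        /* The byte is deliberately left unread between the two waits. */
        for (int i = 0; i < 2; i++) {
            if (epoll_wait(epfd, &out, 1, 0) == 1)
                reports++;
        }
    } else {
        reports = -1;
    }

    close(epfd);
    close(p[0]);
    close(p[1]);
    return reports;
}
```

In level-triggered mode both waits report the descriptor because the data is still there; in edge-triggered mode only the first does. This is why edge-triggered code normally uses non-blocking descriptors and reads until EAGAIN after each notification.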

How epoll is implemented

1. Overview

Unlike select/poll, epoll is a group of system calls:

int epoll_create(int size);

int epoll_ctl(int epfd,int op,int fd,struct epoll_event* event);

int epoll_wait(int epfd,struct epoll_event* events, int maxevents,int timeout);

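The epoll system calls were introduced in Linux 2.5.44. They were designed to address the shortcomings of the traditional select/poll system calls, which are:

1. Every call copies the arguments in from user address space again.

2. Every call rescans all the file descriptors.

3. At the start of every call, the current process is added to the wait queue of each file descriptor, and at the end of the call it is removed from all of them again.

In real applications select/poll may be monitoring a very large number of file descriptors; if each call returns only a small fraction of them, select/poll is inefficient. epoll's design splits the single select/poll operation into one epoll_create, many epoll_ctl calls, and one epoll_wait. In addition, the kernel adds a file system, "eventpollfs", for epoll. Each set of one or more monitored descriptors has a corresponding inode in eventpollfs, whose main information is kept in an eventpoll structure, while the important information about each monitored file is kept in an epitem structure. The relationship between eventpoll and epitem is therefore one-to-many.

Because epoll_create and epoll_ctl have already stored the user-space information in kernel space, repeated calls to epoll_wait avoid the three shortcomings above.

Now for the implementation details.

2. Key structures: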

/* Wrapper struct used by poll queueing */
struct ep_pqueue
{
    poll_table pt;
    struct epitem *epi;
};

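This structure plays a role similar to struct poll_wqueues in select/poll. Because epoll has to keep much more information in the kernel, a lone callback function is not enough, so struct epitem is introduced alongside it.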

/*
 * Each file descriptor added to the eventpoll interface will
 * have an entry of this type linked to the hash.
 */
struct epitem
{
    /* RB-Tree node used to link this struct to the eventpoll rb-tree */
    struct rb_node rbn;

    /* List header used to link this struct to the eventpoll ready list */
    struct list_head rdllink;

    /* The file descriptor information this item refers to */
    struct epoll_filefd ffd;

    /* Number of active wait queues attached to poll operations */
    int nwait;

    /* List containing poll wait queues */
    struct list_head pwqlist;

    /* The "container" of this item */
    struct eventpoll *ep;

    /* The structure that describes the interested events and the source fd */
    struct epoll_event event;

    atomic_t usecnt;

    /* List header used to link this item to the "struct file" items list */
    struct list_head fllink;

    /* List header used to link this item to the transfer list */
    struct list_head txlink;

    /*
     * This is used during the collection/transfer of events to userspace
     * to pin items with an empty event set.
     * File descriptor state: the event set pinned during collection and transfer.
     */
    unsigned int revents;
};

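This structure records the file descriptors associated with an epoll node; the entries are kept in a lookup table implemented as a red-black tree. Why they need to be stored is explained in detail below. There is exactly one epitem for each monitored file descriptor.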

struct eventpoll
{
    /* Protect access to this struct */
    rwlock_t lock;

    /*
     * This semaphore is used to ensure that files are not removed
     * while epoll is using them. This is read-held during the event
     * collection loop and it is write-held during the file cleanup
     * path, the epoll file exit code and the ctl operations.
     */
    struct rw_semaphore sem;

    /* Wait queue used by file->poll() */
    wait_queue_head_t poll_wait;

    /* List of ready file descriptors */
    struct list_head rdllist;

    /* RB-Tree root used to store monitored fd structs */
    struct rb_root rbr;
};

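This structure holds the extended information for an epoll file descriptor. It is stored in the private_data field of the file structure and corresponds one-to-one with the epoll file node. An epoll file node usually monitors many file descriptors, so one eventpoll structure corresponds to many epitem structures.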

/* Wait struct used by poll hooks */
struct eppoll_entry
{
    /* List header used to link this struct to the "struct epitem" */
    struct list_head llink;

    /* The "base" pointer is set to the container "struct epitem" */
    void *base;

    /* Wait queue item that will be linked to the target file wait queue head */
    wait_queue_t wait;

    /* The wait queue head that linked the "wait" wait queue item */
    wait_queue_head_t *head;
};

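This structure represents a wait entry in epoll. Since an epitem corresponds to one monitored file, base gives easy access to the monitored file's information. And since more than one event may occur on a file, llink is used to link these entries together.

3. Implementation of epoll_create

epoll_create creates an inode in the eventpollfs file system. The work is done by ep_getfd(), which first calls ep_eventpoll_inode() to create the inode, then calls d_alloc() to allocate a dentry for it, and finally ties the file, dentry, and inode together. After ep_getfd(), epoll_create calls ep_file_init(), which allocates the eventpoll structure and stores its pointer in the file structure, linking the eventpoll to the file.

Note that size is only a hint: as long as it is greater than 0, it does not limit the number of file descriptors associated with this epoll inode.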

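4. Implementation of epoll_ctl

epoll_ctl performs a set of operations, such as associating a file with an inode in the eventpollfs file system. This is where the eventpoll structure comes in: it is stored in file->private_data and records the key information for the eventpollfs inode. Its rbr member holds all the file descriptors monitored by the epoll file node, organized as a red-black tree, a structure that makes lookups very efficient.

epoll_ctl first calls ep_find() to look up the epitem in the eventpoll's red-black tree, then chooses an action based on op. If op is EPOLL_CTL_ADD, the epitem should normally not be found in the tree, so ep_insert() is called to create one and insert it. ep_insert() allocates an epitem object, initializes it, and places it in the tree. It also has one more job: adding the current process to the wait queue of the target file. That step is done by the following code:

init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);
...
revents = tfile->f_op->poll(tfile, &epq.pt);

The function first calls init_poll_funcptr to register the callback ep_ptable_queue_proc, which is executed when f_op->poll is called. ep_ptable_queue_proc allocates an epoll wait-queue entry, an eppoll_entry, and links it both into the file's wait queue and into the epitem's list. It also registers a wait-queue callback, ep_poll_callback. When the file operation completes, and before the current process is woken, ep_poll_callback() is called; it puts the epitem on the eventpoll's ready list and wakes the waiting process. If, after f_op->poll runs, the monitored file turns out to be ready already, the epitem is put on the ready list immediately and the processes waiting on it are woken right away.

5. Implementation of epoll_wait

epoll_wait waits for file operations to complete and returns them.

Its core is ep_poll(), which loops checking whether the eventpoll's ready list contains any completed events. If so, it returns the results; if not, it calls schedule_timeout() to sleep until the process is woken again or the timeout expires.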
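
The interplay between the callback and the ready list can be modeled in user space. The following is a toy sketch, not kernel code: the names toy_item, toy_eventpoll, toy_callback, and toy_wait are all hypothetical, and locking, sleeping, and wakeups are left out. It only illustrates why collecting events costs time proportional to the number of ready items rather than the number monitored.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an epitem: one per monitored descriptor. */
struct toy_item {
    int fd;
    int queued;             /* already on the ready list? */
    struct toy_item *next;  /* link in the ready list */
};

/* Toy stand-in for an eventpoll: owns the ready list. */
struct toy_eventpoll {
    struct toy_item *ready_head;
    struct toy_item *ready_tail;
};

/* Plays the role of the wait-queue callback: runs when an item's file
 * becomes ready and appends the item to the ready list at most once. */
void toy_callback(struct toy_eventpoll *ep, struct toy_item *it)
{
    if (it->queued)
        return;
    it->queued = 1;
    it->next = NULL;
    if (ep->ready_tail)
        ep->ready_tail->next = it;
    else
        ep->ready_head = it;
    ep->ready_tail = it;
}

/* Plays the role of the collection step: drains up to maxevents ready
 * items. Items that are not ready are never visited. */
int toy_wait(struct toy_eventpoll *ep, int *fds, int maxevents)
{
    int n = 0;

    assert(maxevents > 0);

    while (ep->ready_head && n < maxevents) {
        struct toy_item *it = ep->ready_head;

        ep->ready_head = it->next;
        if (!ep->ready_head)
            ep->ready_tail = NULL;
        it->queued = 0;
        it->next = NULL;
        fds[n++] = it->fd;
    }
    return n;
}
```

With a thousand items registered and two callbacks fired, toy_wait touches only those two items; a select-style scan would have to examine all thousand.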
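
6. Performance analysis

epoll was designed to address the shortcomings of select/poll. Through the newly introduced eventpollfs file system, epoll copies the arguments into kernel space once, so they are not copied again on every wait. Splitting the operation into epoll_create, epoll_ctl, and epoll_wait avoids repeatedly traversing all the monitored file descriptors. Moreover, once the calling process is woken, it only has to take the completed events directly off the eventpoll's ready list, so the cost of finding completed events drops from O(N) to O(1). These gains hold only under certain conditions, though: the number of monitored descriptors must be large, and the number that complete on each call must be small. Whether epoll improves efficiency noticeably therefore depends on the actual workload, and further testing is needed.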