Understanding Linux I/O Multiplexing

zhouguoqionghai

Article link: https://blog.csdn.net/zhouguoqionghai/article/details/82531523

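1. "Multiplexing" means a process/thread is no longer tied to a single I/O channel: one process/thread handles many I/O channels. What is being multiplexed (shared) is the process/thread.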

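2. Kernel code cannot dereference user-space pointers directly; the data has to be copied across the boundary explicitly.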

select and poll

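select takes fd_set* pointers, but each fd_set still has to be copied from user space into the kernel (as noted above, the kernel cannot dereference user-space pointers directly). poll takes a pollfd* pointer, and likewise every pollfd structure has to be copied from user space into the kernel (via copy_from_user).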

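fd_set is just an array wrapped in a struct, nothing more than a 1024-bit bitmap (FD_SETSIZE is 1024). On input it marks the file descriptors to monitor, and on return the same bitmap marks the descriptors that have events, so it must be re-initialized before every call. Because fd_set has a fixed size, select can only handle a limited number of file descriptors (those below FD_SETSIZE). poll takes what amounts to a dynamic array (pointer + element count), so it imposes no fixed limit of its own on the number of descriptors.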
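
The in/out nature of fd_set can be observed directly. The following is a minimal sketch, not taken from the original post: the helper name select_clears_set and the use of a pipe as the monitored descriptor are assumptions made for illustration. It watches an idle pipe, lets select time out, and then checks that the bit set on input is gone on return.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if select() cleared the bit for an idle
   descriptor on timeout, 0 if the bit survived, -1 on setup failure. */
int select_clears_set(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    assert(p[0] < FD_SETSIZE);          /* FD_SET is only valid below FD_SETSIZE */

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(p[0], &rfds);                /* input: "watch the read end" */

    /* Nothing has been written, so this call times out after 50 ms. */
    struct timeval tv = { .tv_sec = 0, .tv_usec = 50 * 1000 };
    int n = select(p[0] + 1, &rfds, NULL, NULL, &tv);

    /* output: the same bitmap now means "no event", so the watch bit is gone */
    int cleared = (n == 0 && !FD_ISSET(p[0], &rfds));

    close(p[0]);
    close(p[1]);
    return cleared;
}
```

Calling select again with the same rfds would watch nothing, which is why a select loop rebuilds the set with FD_ZERO / FD_SET on every iteration.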

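pollfd keeps the file descriptor and the events in separate fields of one structure. On input, events holds the events to monitor; on output, revents holds the events that occurred. So, unlike select, nothing has to be re-initialized before each call.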

struct pollfd
  {
    int fd;			/* File descriptor to poll.  */
    short int events;		/* Types of events poller cares about.  */
    short int revents;		/* Types of events that actually occurred.  */
  };
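
As a small illustration of the events / revents split, here is a sketch under stated assumptions: the function name poll_reuse_demo and the pipe used as the event source are hypothetical. The same struct pollfd is passed to poll twice without being touched in between.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical demo: returns the revents seen by the second poll() call,
   or -1 if anything did not go as expected. */
int poll_reuse_demo(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    struct pollfd pfd = { .fd = p[0], .events = POLLIN, .revents = 0 };

    int first = poll(&pfd, 1, 50);       /* idle pipe: expect a timeout (0) */
    ssize_t w = write(p[1], "x", 1);     /* make the read end readable */
    int second = poll(&pfd, 1, 50);      /* same struct, not re-initialized */

    /* events was never overwritten; only revents carries the result. */
    int result = (first == 0 && w == 1 && second == 1) ? pfd.revents : -1;

    close(p[0]);
    close(p[1]);
    return result;
}
```

The returned value has POLLIN set, while pfd.events still holds exactly what was registered before the first call.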

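The problem with select and poll is that the set of descriptors of interest is always kept in user space, whereas epoll hands it over to the kernel. With select and poll, the kernel has to walk every descriptor passed in and check each one in turn for the monitored events. Before checking a descriptor, it adds the current process to that fd's wait queue.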

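1. If an event has already occurred before select or poll is called (the NIC notifies the kernel of incoming and outgoing data through interrupts and interrupt coalescing), the kernel detects it during this scan, copies the fd_set / pollfd results back to user space, and the call returns. User space then walks all monitored descriptors, checks the results, and handles each one accordingly.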

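2. If no event has occurred before select or poll is called, the call blocks and the process sleeps until it times out, is interrupted by a signal, or a new event arrives. The kernel handles events per file descriptor: when one arrives, the waiters on that fd's wait queue are woken in turn. In the usual implementation select/poll does not know which fd woke it, so it has to scan all the passed-in fds again, and then returns to user space to be handled as in case 1.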

epoll

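epoll hands all monitored file descriptors to the kernel to manage in one place, so they do not have to be copied on every call. The work is split into finer steps across three calls.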

typedef union epoll_data
{
  void *ptr;
  int fd;
  uint32_t u32;
  uint64_t u64;
} epoll_data_t;

struct epoll_event
{
  uint32_t events;	/* Epoll events */
  epoll_data_t data;	/* User data variable */
} __EPOLL_PACKED;


/* Creates an epoll instance.  Returns an fd for the new instance.
   The "size" parameter is a hint specifying the number of file
   descriptors to be associated with the new instance.  The fd
   returned by epoll_create() should be closed with close().  */
extern int epoll_create (int __size) __THROW;

/* Manipulate an epoll instance "epfd". Returns 0 in case of success,
   -1 in case of error ( the "errno" variable will contain the
   specific error code ) The "op" parameter is one of the EPOLL_CTL_*
   constants defined above. The "fd" parameter is the target of the
   operation. The "event" parameter describes which events the caller
   is interested in and any associated user data.  */
extern int epoll_ctl (int __epfd, int __op, int __fd,
		      struct epoll_event *__event) __THROW;


/* Wait for events on an epoll instance "epfd". Returns the number of
   triggered events returned in "events" buffer. Or -1 in case of
   error with the "errno" variable set to the specific error code. The
   "events" parameter is a buffer that will contain triggered
   events. The "maxevents" is the maximum number of events to be
   returned ( usually size of "events" ). The "timeout" parameter
   specifies the maximum wait time in milliseconds (-1 == infinite).

   This function is a cancellation point and therefore not marked with
   __THROW.  */
extern int epoll_wait (int __epfd, struct epoll_event *__events,
		       int __maxevents, int __timeout);

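Like poll, epoll_wait is given a dynamic array (pointer + element count) for the results, so there is no fixed limit on the number of descriptors either. The kernel keeps the monitored descriptors in a red-black tree so they can be added and removed quickly. epoll is also event-driven: when a descriptor has an event, a kernel callback adds that fd to a ready list maintained by the kernel. So when epoll_wait is called, the kernel only has to check the ready list and copy the results to user space. If the ready list is empty when epoll_wait is called, the process sleeps. By the time the sleeping process is woken, the callback hooked into the fd's wait queue has already put the event on the ready list, so again the kernel simply returns the contents of the ready list to user space. In other words, the results returned to user space contain only the descriptors that actually have events.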
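
The three calls fit together roughly as follows. This is a minimal sketch rather than code from the original post: the function name epoll_reports_only_ready and the two pipes are assumptions, and epoll_create1(0) is used in place of the older epoll_create shown in the header excerpt. Two descriptors are registered, only one becomes readable, and epoll_wait returns just that one.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: registers two pipes, writes to only one of them, and
   returns 1 if epoll_wait() reports exactly that one descriptor
   (0 if not, -1 on setup failure). */
int epoll_reports_only_ready(void)
{
    int idle[2], busy[2], ok = -1;
    if (pipe(idle) != 0)
        return -1;
    if (pipe(busy) != 0) {
        close(idle[0]);
        close(idle[1]);
        return -1;
    }

    int epfd = epoll_create1(0);                  /* 1) create the instance */
    if (epfd >= 0) {
        struct epoll_event ev = { .events = EPOLLIN };

        ev.data.fd = idle[0];                     /* 2) register interest once */
        int r1 = epoll_ctl(epfd, EPOLL_CTL_ADD, idle[0], &ev);
        ev.data.fd = busy[0];
        int r2 = epoll_ctl(epfd, EPOLL_CTL_ADD, busy[0], &ev);

        if (r1 == 0 && r2 == 0 && write(busy[1], "x", 1) == 1) {
            struct epoll_event out[8];
            int n = epoll_wait(epfd, out, 8, 100);  /* 3) collect ready events */
            ok = (n == 1 && out[0].data.fd == busy[0]);
        }
        close(epfd);
    }

    close(idle[0]); close(idle[1]);
    close(busy[0]); close(busy[1]);
    return ok;
}
```

The idle pipe never appears in the output array, so the caller does not have to scan descriptors that have nothing to report.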

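Finally, select and poll are in effect level-triggered, while epoll supports level-triggered mode (the default) and can also be set to edge-triggered mode (EPOLLET).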
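
The difference between the two modes shows up when data is left unread. The sketch below is illustrative only (the name count_reports and the pipe are assumptions): it writes one byte, never reads it, and counts how many of two consecutive epoll_wait calls report the descriptor.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: one byte is written and never read. Returns how many of
   two consecutive epoll_wait() calls report the descriptor, or -1 on failure.
   Pass 0 for level-triggered, EPOLLET for edge-triggered. */
int count_reports(uint32_t extra_flags)
{
    int p[2], reports = -1;
    if (pipe(p) != 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd >= 0) {
        struct epoll_event ev = { .events = EPOLLIN | extra_flags }, out;
        ev.data.fd = p[0];

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
            write(p[1], "x", 1) == 1) {
            int first = epoll_wait(epfd, &out, 1, 50);
            int second = epoll_wait(epfd, &out, 1, 50);  /* data still unread */
            if (first >= 0 && second >= 0)
                reports = first + second;
        }
        close(epfd);
    }

    close(p[0]);
    close(p[1]);
    return reports;
}
```

In level-triggered mode both calls report the descriptor because the data is still there; with EPOLLET only the first call does, since no new data arrived in between.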

Practical experience

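Whether the peer closes normally or abnormally, level-triggered mode keeps reporting EPOLLIN until the local end calls close. TCP allows closing only the sending or only the receiving direction, which is done with shutdown. close shuts down both directions; what close really does is invalidate the file descriptor. The peer may send FIN followed by RST, RST alone, or FIN alone.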

1. The peer calls close and sends FIN; if the local end keeps sending, the peer replies with RST.

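2. If the receive buffer still holds unread data and linger is at its default of 0 (off), close sends RST to the peer directly. If the switch l_onoff is non-zero and the timeout l_linger is 0, calling close while the send buffer still holds data also sends RST to the peer directly.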

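1. If the peer closes with a bare RST, the first event on the local end is EPOLLIN + EPOLLERR + EPOLLHUP (0x19); read then returns -1 with the error Connection reset by peer (ECONNRESET). After that the event is EPOLLIN + EPOLLHUP (0x11) and read returns 0.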

2. If the peer sends FIN and then RST, or closes normally, the local event is always EPOLLIN and read returns 0.
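
The bare-RST case can be reproduced on a single machine. The following is a sketch under assumptions, not the author's test program: loopback_pair and peer_reset_demo are hypothetical names, the connection runs over 127.0.0.1, and the RST is forced by closing the client with l_onoff = 1 and l_linger = 0 as described above.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: builds a connected TCP pair over 127.0.0.1. */
static int loopback_pair(int *client, int *server)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks one */
    *client = *server = -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(lfd, 1) == 0 &&
        getsockname(lfd, (struct sockaddr *)&addr, &len) == 0 &&
        (*client = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
        connect(*client, (struct sockaddr *)&addr, sizeof addr) == 0)
        *server = accept(lfd, NULL, NULL);

    close(lfd);
    if (*server < 0) {
        if (*client >= 0)
            close(*client);
        return -1;
    }
    return 0;
}

/* Hypothetical demo: the peer aborts the connection. Stores the events from
   the first epoll_wait() in *events_out and returns the errno of the read()
   that follows (0 if read did not fail, -1 on setup failure). */
int peer_reset_demo(uint32_t *events_out)
{
    int c, s, err = -1;
    if (loopback_pair(&c, &s) != 0)
        return -1;

    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    struct epoll_event ev = { .events = EPOLLIN }, out;
    ev.data.fd = s;
    int epfd = epoll_create1(0);

    if (epfd >= 0 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, s, &ev) == 0 &&
        setsockopt(c, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) == 0) {
        close(c);                       /* linger on, timeout 0: RST, not FIN */
        c = -1;
        if (epoll_wait(epfd, &out, 1, 1000) == 1) {
            char buf[16];
            *events_out = out.events;
            err = (read(s, buf, sizeof buf) == -1) ? errno : 0;
        }
    }

    if (c >= 0)
        close(c);
    if (epfd >= 0)
        close(epfd);
    close(s);
    return err;
}
```

Only EPOLLIN is registered here; EPOLLERR and EPOLLHUP are reported by epoll regardless of what was asked for, which is how they end up in the first event.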

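After close, the descriptor is removed automatically from the kernel's epoll bookkeeping and produces no further events. With poll, the interest list is maintained by the caller, so the descriptor has to be removed from the pollfd array by hand, and with select it has to be removed from the fd_set. Otherwise, the next poll returns immediately (without blocking) with POLLNVAL in that pollfd's revents, while the next select returns -1 immediately (without blocking) with errno EBADF (bad file descriptor) and no indication of which descriptor is at fault. So the biggest drawback of select compared with poll is that events are not bound to file descriptors.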
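
Both behaviours are easy to check. The sketch below is illustrative, with hypothetical function names poll_closed_fd and select_closed_fd; each one deliberately passes an already closed descriptor.

```c
#include <errno.h>
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical demo: poll() on a closed descriptor.
   Returns the revents reported for it, or -1 if poll() did not report it. */
int poll_closed_fd(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    close(p[0]);
    close(p[1]);

    struct pollfd pfd = { .fd = p[0], .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, 0);           /* returns at once, per-fd result */
    return (n == 1) ? pfd.revents : -1;
}

/* Hypothetical demo: select() on a closed descriptor.
   Returns the errno of the failed call, 0 if it did not fail,
   -1 on setup failure. */
int select_closed_fd(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    close(p[0]);
    close(p[1]);
    if (p[0] >= FD_SETSIZE)
        return -1;

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(p[0], &rfds);

    struct timeval tv = { 0, 0 };
    int n = select(p[0] + 1, &rfds, NULL, NULL, &tv);  /* whole call fails */
    return (n == -1) ? errno : 0;
}
```

poll pins the problem to one array element, whereas select only says that some descriptor in the set was bad.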

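After a normal close by the peer, and until the local end calls close, read keeps returning 0, so a read return value of 0 is the way to tell that the peer has closed. EPOLLRDHUP has to be registered in the input events before it can show up in the output.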
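
A normal close by the peer can be observed in the same way. This is a minimal sketch assuming a TCP connection over 127.0.0.1; connected_pair and peer_close_demo are hypothetical names, and EPOLLRDHUP is registered explicitly so that it can be reported.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: builds a connected TCP pair over 127.0.0.1. */
static int connected_pair(int *client, int *server)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks one */
    *client = *server = -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(lfd, 1) == 0 &&
        getsockname(lfd, (struct sockaddr *)&addr, &len) == 0 &&
        (*client = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
        connect(*client, (struct sockaddr *)&addr, sizeof addr) == 0)
        *server = accept(lfd, NULL, NULL);

    close(lfd);
    if (*server < 0) {
        if (*client >= 0)
            close(*client);
        return -1;
    }
    return 0;
}

/* Hypothetical demo: the peer closes normally (FIN). Stores the reported
   events in *events_out and returns the result of read() on the local end,
   or -2 on setup failure or timeout. */
long peer_close_demo(uint32_t *events_out)
{
    int c, s;
    long result = -2;
    if (connected_pair(&c, &s) != 0)
        return -2;

    struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP }, out;
    ev.data.fd = s;
    int epfd = epoll_create1(0);

    close(c);                                   /* peer side: orderly close */
    if (epfd >= 0 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, s, &ev) == 0 &&
        epoll_wait(epfd, &out, 1, 1000) == 1) {
        char buf[16];
        *events_out = out.events;
        result = (long)read(s, buf, sizeof buf);  /* 0 means end of stream */
    }

    if (epfd >= 0)
        close(epfd);
    close(s);
    return result;
}
```

Dropping EPOLLRDHUP from the registered events leaves EPOLLIN plus a zero-length read as the only signs of the close.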

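After the local end calls close, read and write fail with Bad file descriptor (EBADF). When the peer closes, the local end cannot tell whether it closed only one direction or both. Usually the peer has closed both: the first local write succeeds, but the peer answers with RST, and a subsequent write raises SIGPIPE. If the signal is not handled, the default action terminates the process. If the signal is caught or ignored, then once control returns from the signal context (a signal is comparable to a software interrupt) to the process context (of the three contexts: interrupt, process, and kernel), write returns immediately with the error Broken pipe (EPIPE).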
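
Ignoring SIGPIPE and checking for EPIPE might look like the sketch below. It is an approximation rather than a TCP reproduction: write_after_peer_close is a hypothetical name, and an AF_UNIX socketpair stands in for the TCP connection, so the very first write already fails instead of the second one.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: writes to a socket whose peer has been closed, with
   SIGPIPE ignored. Returns the errno of the failed write(), 0 if the write
   did not fail, -1 on setup failure. */
int write_after_peer_close(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Process-wide setting: without it the default action for SIGPIPE
       would terminate the process inside write(). */
    signal(SIGPIPE, SIG_IGN);

    close(sv[1]);                        /* the "peer" goes away */
    ssize_t r = write(sv[0], "x", 1);
    int err = (r == -1) ? errno : 0;

    close(sv[0]);
    return err;
}
```

With the signal ignored, the failure arrives as an ordinary error return that the caller can handle by closing the descriptor.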