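The inter-process (and inter-thread) communication mechanisms we commonly use include pipes, signals, message queues, semaphores, shared memory, sockets, and so on. Of these, pipe and socketpair are the ones mainly used for notify/wait between processes (threads). Threads additionally have condition variables.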
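Today let's look at a relatively new Linux system call, eventfd. It has been available since Linux 2.6.22 (the flags argument only became usable in 2.6.27), and it is mainly used for communication between processes or threads, for example to implement a notify/wait mechanism.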
First, the function prototype:

    #include <sys/eventfd.h>

    int eventfd(unsigned int initval, int flags);
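Below is the description from its man page, which I translated myself (I only scored 434 on CET-4, so if anything below looks doubtful, go straight to man eventfd ^^):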
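eventfd() creates an "eventfd object" that user-space applications can use as an event wait/notify mechanism, and that the kernel can use to notify user-space applications of events. The object contains an unsigned 64-bit integer (uint64_t) counter maintained by the kernel. The counter is initialized to the value given in initval, which is usually 0.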
The flags argument can be the bitwise OR of the following values:
EFD_CLOEXEC, EFD_NONBLOCK, EFD_SEMAPHORE.
In Linux up to and including version 2.6.26, the flags argument is unused and must be specified as 0.
It returns a file descriptor that refers to the eventfd object. The descriptor supports the following operations:
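read: if the counter is nonzero, the read succeeds and returns the counter value as an 8-byte integer, and the counter is reset to 0 (with EFD_SEMAPHORE, the read returns 1 and the counter is decremented by 1 instead). If the counter is 0, a nonblocking descriptor fails immediately with errno set to EAGAIN, while a blocking descriptor blocks until the counter becomes nonzero. EINVAL is what you get if the supplied buffer is smaller than 8 bytes.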
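write: adds the 8-byte integer supplied in the buffer to the counter. The maximum value the counter can hold is 0xfffffffffffffffe. If the addition would push the counter past that maximum, the write blocks until a read is performed on the descriptor, or fails with EAGAIN if the descriptor is nonblocking, the same as for read above.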
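close: needs no explanation. Once every descriptor referring to the eventfd object has been closed, the kernel frees the object.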
The important part is that it also supports these:
poll(2), select(2) (and similar)
The returned file descriptor supports poll(2) (and analogously epoll(7)) and select(2), as follows:
* The file descriptor is readable (the select(2) readfds argument; the poll(2) POLLIN flag) if the counter has a value greater than 0.
* The file descriptor is writable (the select(2) writefds argument; the poll(2) POLLOUT flag) if it is possible to write a value of at least "1" without blocking.
* If an overflow of the counter value was detected, then select(2) indicates the file descriptor as being both readable and writable, and poll(2) returns a POLLERR event. As noted above, write(2) can never overflow the counter. However an overflow can occur if 2^64 eventfd "signal posts" were performed by the KAIO subsystem (theoretically possible, but practically unlikely). If an overflow has occurred, then read(2) will return that maximum uint64_t value (i.e., 0xffffffffffffffff).
The eventfd file descriptor also supports the other file-descriptor multiplexing APIs: pselect(2) and ppoll(2).
On the kernel side, the function used to signal an eventfd is implemented like this:
    int eventfd_signal(struct eventfd_ctx *ctx, int n)
    {
        unsigned long flags;

        if (n < 0)
            return -EINVAL;
        spin_lock_irqsave(&ctx->wqh.lock, flags);
        if (ULLONG_MAX - ctx->count < n)
            n = (int) (ULLONG_MAX - ctx->count);
        ctx->count += n;
        if (waitqueue_active(&ctx->wqh))
            wake_up_locked_poll(&ctx->wqh, POLLIN);
        spin_unlock_irqrestore(&ctx->wqh.lock, flags);
        return n;
    }
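In essence it adds n to the counter and wakes up any waiters, without going through read or write. The difference from eventfd_write is that it never blocks: if adding n would overflow the counter, it clamps n rather than waiting.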
With all that said, let's look at an example to see what it means in practice:
    #include <sys/eventfd.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <stdint.h>             /* Definition of uint64_t */

    #define handle_error(msg) \
        do { perror(msg); exit(EXIT_FAILURE); } while (0)

    int
    main(int argc, char *argv[])
    {
        int efd, j;
        uint64_t u;
        ssize_t s;

        if (argc < 2) {
            fprintf(stderr, "Usage: %s <num>...\n", argv[0]);
            exit(EXIT_FAILURE);
        }

        efd = eventfd(0, 0);
        if (efd == -1)
            handle_error("eventfd");

        switch (fork()) {
        case 0:
            for (j = 1; j < argc; j++) {
                printf("Child writing %s to efd\n", argv[j]);
                u = strtoull(argv[j], NULL, 0);
                        /* strtoull() allows various bases */
                s = write(efd, &u, sizeof(uint64_t));
                if (s != sizeof(uint64_t))
                    handle_error("write");
            }
            printf("Child completed write loop\n");

            exit(EXIT_SUCCESS);

        default:
            sleep(2);

            printf("Parent about to read\n");
            s = read(efd, &u, sizeof(uint64_t));
            if (s != sizeof(uint64_t))
                handle_error("read");
            printf("Parent read %llu (0x%llx) from efd\n",
                    (unsigned long long) u, (unsigned long long) u);
            exit(EXIT_SUCCESS);

        case -1:
            handle_error("fork");
        }
    }
$ ./a.out 1 2 4 7 14
Child writing 1 to efd
Child writing 2 to efd
Child writing 4 to efd
Child writing 7 to efd
Child writing 14 to efd
Child completed write loop
Parent about to read
Parent read 28 (0x1c) from efd
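Note: sleep(2) is used here to make sure the child has finished its write loop before the parent reads, so the value read is the sum, 28. Without sleep(2) to enforce that ordering, the parent's read returns as soon as the counter is nonzero, so it may pick up only a partial sum (for example just the first value the child wrote), depending on how the two processes are scheduled.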