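The epoll man page does not actually give a clear, explicit definition of edge triggering. It only walks through one example to illustrate how edge triggering behaves, so in practice you still have to test the behavior against the specific implementation.
The man page describes the following scenario: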
Level-triggered and edge-triggered
The epoll event distribution interface is able to behave both as edge-triggered (ET) and as level-triggered (LT). The difference between the two mechanisms can be described as follows. Suppose that this scenario happens:
1. The file descriptor that represents the read side of a pipe (rfd) is registered on the epoll instance.
2. A pipe writer writes 2 kB of data on the write side of the pipe.
3. A call to epoll_wait(2) is done that will return rfd as a ready file descriptor.
4. The pipe reader reads 1 kB of data from rfd.
5. A call to epoll_wait(2) is done.
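The man page says that if rfd was registered with the EPOLLET flag, the reader's epoll_wait() call in step 5 will probably block, even though 1 KB of data is still unread. Now suppose there is a step 6, in which the writer writes another 1 KB to the pipe: will the reader be woken up? Answering that requires understanding what edge triggering really means, although the most reliable approach is still to write a program and test it. Unfortunately, different kernel versions give different answers, and that is what this post is about.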
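One way to run that test is sketched below in Python, whose `select.epoll` wraps the Linux epoll system calls. This is a minimal sketch, not the program used for the results in this post: the function name `run_scenario` and the timeout value are made up for illustration, and a short timeout stands in for "blocks forever".

```python
import os
import select


def run_scenario(timeout=0.2):
    """Replay steps 1-6 on a pipe registered with EPOLLIN | EPOLLET.

    Returns one boolean per epoll wait: True if rfd was reported ready.
    (Hypothetical helper; Linux only, because it relies on select.epoll.)
    """
    rfd, wfd = os.pipe()
    ep = select.epoll()
    try:
        # Step 1: register the read end in edge-triggered mode.
        ep.register(rfd, select.EPOLLIN | select.EPOLLET)
        # Step 2: the writer puts 2 KB into the pipe.
        os.write(wfd, b"a" * 2048)
        # Step 3: the first wait should report rfd as ready.
        step3 = bool(ep.poll(timeout))
        # Step 4: the reader consumes only 1 KB, leaving 1 KB unread.
        os.read(rfd, 1024)
        # Step 5: the second wait; an empty result means it would have blocked.
        step5 = bool(ep.poll(timeout))
        # Step 6: the writer adds 1 KB while the pipe is still non-empty.
        os.write(wfd, b"b" * 1024)
        step6 = bool(ep.poll(timeout))
        return {"step3": step3, "step5": step5, "step6": step6}
    finally:
        ep.close()
        os.close(rfd)
        os.close(wfd)


if __name__ == "__main__":
    # Print the kernel release next to the result so runs can be compared.
    print(os.uname().release, run_scenario())
```

The value of `step6` is the one that depends on the kernel version, so it is worth recording the kernel release alongside it.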
Three ranges of kernel versions are relevant:
- Versions before Linux 5.5-rc1 (version one)
- Versions from Linux 5.5-rc1 up to Linux 5.14-rc4 (version two)
- Versions after Linux 5.14-rc4 (version three)
The experiment shows that the pipe write in step 6 behaves as follows:
- Version one: wakes the reader
- Version two: does not wake the reader
- Version three: wakes the reader
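Why do the versions differ? And which one implements true edge triggering?
Two kernel patches answer both questions:
- Version one's behavior exists for historical reasons, which this post does not dig into
- Version two's behavior comes from one patch, 1b6b26ae7053e4914181eedf70f2d92c12abda8a, committed on 2019-12-07 by Linus
- Version three's behavior comes from another patch, 3a34b13a88caeb2800ab44a4918f230041b37dd9, committed on 2021-07-30 by Linus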
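In version two, Linus changed the wake-up logic of pipe writes. Before the change, every pipe write woke the reader. After the change, a pipe write wakes the reader only if the pipe was empty before the write. The rationale: if the pipe was not empty before the write, no reader can be blocked on it, so waking the reader is redundant.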
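The difference between the two wake-up policies can be illustrated with a toy model. This is not kernel code: `ToyPipe`, `wake_only_if_empty`, and `replay` are hypothetical names invented for this sketch, and the model only counts how many times the write side would wake the reader when the six-step scenario is replayed.

```python
class ToyPipe:
    """Toy model of the write-side wake-up decision (illustrative only)."""

    def __init__(self, wake_only_if_empty):
        self.wake_only_if_empty = wake_only_if_empty
        self.buffered = 0  # bytes currently unread
        self.wakeups = 0   # number of times the reader side was woken

    def write(self, nbytes):
        was_empty = self.buffered == 0
        self.buffered += nbytes
        # Old policy: wake on every write.
        # New policy: wake only on the "empty -> non-empty" transition.
        if was_empty or not self.wake_only_if_empty:
            self.wakeups += 1

    def read(self, nbytes):
        taken = min(nbytes, self.buffered)
        self.buffered -= taken
        return taken


def replay(wake_only_if_empty):
    """Replay the writes and the read from the scenario; return the wake-up count."""
    pipe = ToyPipe(wake_only_if_empty)
    pipe.write(2048)  # step 2: pipe was empty, so both policies wake the reader
    pipe.read(1024)   # step 4: 1 KB remains unread
    pipe.write(1024)  # step 6: pipe was not empty, so only the old policy wakes
    return pipe.wakeups


print(replay(wake_only_if_empty=False))  # policy before the patch
print(replay(wake_only_if_empty=True))   # policy after the patch
```

Under these assumptions the first call counts two wake-ups and the second counts one; the missing wake-up is exactly the one at step 6.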
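Ten months after version two was released, someone raised an issue (see: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/), saying that the change in version two had altered edge-triggered behavior. Linus responded: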
if the pipe was readable from before, then a writer adding new data to it doesn't make it "more readable". Similarly, if a pipe was writable before, and a reader made even more room in it, the pipe didn't get "more writable". So that commit removes the pointless extra wakeup calls that don't actually make any sense (and that gave incorrect edges to the some EPOLL case that saw an edge that didn't actually exist).
Linus also said that if this was merely a buggy test, and no real application was affected, it would not be fixed for now:
if this is more than just a buggy test - and it actually breaks some actual application and real behavior - we'll need to fix it. A regression is a regression, and we'll need to be bug-for-bug compatible for people who depended on bugs. But if it's only a test, and no actual workflow that got broken, then it's just a buggy test.
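But just over half a month ago, someone reported an Android bug caused by that change (see: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/): the widely used "realm-core" library broke on Linux 5.10 because it depends on the earlier wake-up logic, that is, the pipe write wake-up logic of version one. Linus responded: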
I dislike the pointless wakeups, and as long as the only case I knew of was only a test of broken behavior, it was fine. But now that you've reported actual application breakage, this is in the "real regression" category, and so I'll fix it one way or the other.
...
This is literally an epoll() confusion about what an "edge" is. An edge is not "somebody wrote more data". An edge is "there was no data, now there is data". And a level triggered event is *also* not "somebody wrote more data". A level-triggered signal is simply "there is data". Notice how neither edge nor level are about "more data". One is about the edge of "no data" -> "some data", and the other is just a "data is available". Sadly, it seems that our old "we'll wake things up whether needed or not" implementation ended up being something that people thought was edge-triggered semantics. But we have the policy that regressions aren't about documentation or even sane behavior. Regressions are about whether a user application broke in a noticeable way.
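The problem was therefore fixed in version three, and the end result is that, for edge triggering, every pipe write wakes the reader.
Finally, as the quoted material shows, in Linus's view edge triggering should mean a state transition from "no data" to "data available". In other words, a pipe write should wake the reader only when the pipe was empty before the write. But for historical reasons and because of legacy baggage, the implementation of edge triggering departs from what the term actually means.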
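For code that should not depend on which of these behaviors a kernel implements, the epoll man page suggests using edge-triggered mode with non-blocking file descriptors and waiting for a new event only after read(2) or write(2) returns EAGAIN. A minimal sketch of the read side follows; the helper name `drain` and the chunk size are assumptions made for illustration.

```python
import os


def drain(fd, chunk_size=4096):
    """Read from a non-blocking fd until it would block or reaches EOF.

    Hypothetical helper: the caller is assumed to have made fd non-blocking.
    """
    chunks = []
    while True:
        try:
            data = os.read(fd, chunk_size)
        except BlockingIOError:
            # EAGAIN: the pipe is empty, so the next write is a real
            # "no data" -> "data" edge and it is safe to wait again.
            break
        if not data:
            # EOF: every writer has closed its end of the pipe.
            break
        chunks.append(data)
    return b"".join(chunks)
```

A reader would call `os.set_blocking(rfd, False)` once, then call `drain(rfd)` each time epoll reports `rfd` as ready. Because the pipe is always emptied before the next wait, the next write starts from an empty pipe, which wakes the reader under both of the wake-up policies discussed above.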