Edge-Triggered epoll as Seen Through Kernel Patches

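The epoll man page never gives a crisp, explicit definition of edge-triggered mode. Its author only illustrates the behavior with an example, so in practice you still have to test how a specific implementation behaves.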

The man page describes the following scenario:

Level-triggered and edge-triggered
       The epoll event distribution interface is able to behave both 
as edge-triggered (ET) and as level-triggered (LT).  The difference 
between the two mechanisms can be described as follows.  Suppose 
that this scenario happens:

1. The file descriptor that represents the read side of a pipe
          (rfd) is registered on the epoll instance.

2. A pipe writer writes 2 kB of data on the write side of the
          pipe.

3. A call to epoll_wait(2) is done that will return rfd as a
          ready file descriptor.

4. The pipe reader reads 1 kB of data from rfd.

5. A call to epoll_wait(2) is done.

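According to the man page, if rfd was registered with EPOLLET, the reader's epoll_wait() call in step 5 will probably block, even though 1 kB of data is still sitting in the pipe. Now suppose there is a step 6: the writer writes another 1 kB to the pipe. Is the reader woken up? Answering that requires understanding what edge-triggered really means, although the most reliable approach is still to write a program and test it. Unfortunately, different kernel versions give different answers, and that is the subject of this post.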
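The test program mentioned above can be sketched in a few lines. This is a minimal sketch, not the program used for the results in this post: it uses Python's `select.epoll` wrapper around the same system calls, the function name `run_scenario` and the 0.2 s timeout are assumptions made for illustration, and a timed-out `poll()` stands in for "epoll_wait() would block".

```python
import os
import select


def run_scenario(timeout=0.2):
    """Replay steps 1-5 from the man page, plus the extra step 6, on a fresh pipe.

    Returns a dict saying whether epoll reported rfd as ready at steps 3, 5 and 6.
    """
    rfd, wfd = os.pipe()
    ep = select.epoll()
    try:
        # Step 1: register the read end of the pipe, edge-triggered.
        ep.register(rfd, select.EPOLLIN | select.EPOLLET)
        # Step 2: the writer puts 2 kB into the pipe.
        os.write(wfd, b"a" * 2048)
        # Step 3: epoll_wait reports rfd as ready.
        step3 = bool(ep.poll(timeout))
        # Step 4: the reader consumes only 1 kB, so 1 kB stays in the pipe.
        os.read(rfd, 1024)
        # Step 5: epoll_wait again; an empty result means the call timed out.
        step5 = bool(ep.poll(timeout))
        # Step 6: the writer adds 1 kB to the pipe, which is not empty.
        os.write(wfd, b"b" * 1024)
        step6 = bool(ep.poll(timeout))
        return {"step3": step3, "step5": step5, "step6": step6}
    finally:
        ep.close()
        os.close(rfd)
        os.close(wfd)


if __name__ == "__main__":
    # Print the kernel release next to the result, since step 6 depends on it.
    print(os.uname().release, run_scenario())
```

Only the `step6` value is expected to vary between kernels. `step3` should be true and `step5` false wherever the man page's description holds.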

Consider three ranges of kernel versions:

  • Before Linux 5.5-rc1 (version one)
  • From Linux 5.5-rc1 up to, but not including, Linux 5.14-rc4 (version two)
  • Linux 5.14-rc4 and later (version three)

In the experiment, the pipe write in step 6 behaves as follows:

  • Version one: wakes up the reader
  • Version two: does not wake up the reader
  • Version three: wakes up the reader

Why the difference? And which version implements true edge triggering?

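Two kernel patches answer both questions:

  • Version one behaves the way it does for historical reasons, which are not explored here
  • Version two's behavior comes from one patch, 1b6b26ae7053e4914181eedf70f2d92c12abda8a, committed on 2019-12-07 and authored by Linus
  • Version three's behavior comes from another patch, 3a34b13a88caeb2800ab44a4918f230041b37dd9, committed on 2021-07-30 and also authored by Linus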

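In version two, Linus changed the wake-up logic of pipe write. Before the change, every pipe write woke up the reader. After it, a pipe write wakes up the reader only if the pipe was empty before the write. The rationale: if the pipe was not empty before the write, no reader can be blocked on it, so waking the reader at that point is redundant.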
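The two wake-up policies can be contrasted with a toy model. This is an illustrative sketch only, not kernel code: it tracks nothing but the number of buffered bytes, ignores pipe capacity and blocking, and the function name `count_reader_wakeups` and the policy labels `"always"` and `"if_was_empty"` are invented for this example.

```python
def count_reader_wakeups(ops, policy):
    """Count the reader wake-ups a sequence of pipe operations would trigger.

    ops    -- list of ("write", nbytes) or ("read", nbytes) tuples
    policy -- "always":       every write wakes the reader (before the patch)
              "if_was_empty": a write wakes the reader only if the pipe
                              was empty beforehand (after the patch)
    """
    if policy not in ("always", "if_was_empty"):
        raise ValueError(f"unknown policy: {policy}")
    buffered = 0
    wakeups = 0
    for kind, nbytes in ops:
        if kind == "write":
            was_empty = buffered == 0
            buffered += nbytes
            if policy == "always" or was_empty:
                wakeups += 1
        elif kind == "read":
            # A read can take at most what is currently buffered.
            buffered -= min(nbytes, buffered)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return wakeups


# Steps 2, 4 and 6 of the scenario: write 2 kB, read 1 kB, write 1 kB.
scenario = [("write", 2048), ("read", 1024), ("write", 1024)]

if __name__ == "__main__":
    print(count_reader_wakeups(scenario, "always"))        # 2
    print(count_reader_wakeups(scenario, "if_was_empty"))  # 1
```

Under the "always" policy the second write produces a second wake-up. Under "if_was_empty" it does not, because 1 kB was still buffered when that write arrived.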

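About 10 months after version two was released, someone raised the issue (see: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/), pointing out that the change in version two had altered edge-triggered behavior. Linus responded: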

if the pipe was readable from before, then a writer adding new data to it
doesn't make it "more readable". Similarly, if a pipe was writable
before, and a reader made even more room in it, the pipe didn't get
"more writable".

So that commit removes the pointless extra wakeup calls that don't
actually make any sense (and that gave incorrect edges to the some
EPOLL case that saw an edge that didn't actually exist).

Linus also said that if this was only a buggy test and no real application was affected, it would not be fixed for the time being:

if this is more than just a buggy test - and it actually breaks
some actual application and real behavior - we'll need to fix it. A
regression is a regression, and we'll need to be bug-for-bug
compatible for people who depended on bugs.

But if it's only a test, and no actual workflow that got broken, then
it's just a buggy test.

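But in late July 2021, a little over half a month before this post was written, someone reported an Android bug caused by this change (see: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/): the widely used "realm-core" library broke on Linux 5.10 because it relied on the earlier wake-up logic, namely the pipe write wake-up logic of version one. Linus responded: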

I dislike the pointless wakeups, and as long as the only case I
knew of was only a test of broken behavior, it was fine. But now that
you've reported actual application breakage, this is in the "real
regression" category, and so I'll fix it one way or the other.

  ... 

This is literally an epoll() confusion about what an "edge" is.

An edge is not "somebody wrote more data". An edge is "there was no
data, now there is data".

And a level triggered event is *also* not "somebody wrote more data".
A level-triggered signal is simply "there is data".

Notice how neither edge nor level are about "more data". One is about
the edge of "no data" -> "some data", and the other is just a "data is
available".

Sadly, it seems that our old "we'll wake things up whether needed or
not" implementation ended up being something that people thought was
edge-triggered semantics.

But we have the policy that regressions aren't about documentation or
even sane behavior.

Regressions are about whether a user application broke in a noticeable way.

The problem was therefore fixed in version three. The end result: with edge triggering, every pipe write wakes up the reader.

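Finally, the quoted material shows that, in Linus's view, edge triggering should mean a state transition from "no data" to "data available". In other words, a pipe write should wake up the reader only if the pipe was empty before the write. For historical reasons and backward compatibility, however, the actual implementation of edge triggering diverges from that meaning.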
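A reader that does not want to depend on which of these behaviors the kernel implements can follow the pattern the epoll man page itself suggests for edge-triggered use: make the descriptor non-blocking and keep reading until the read would block. Once the pipe has been drained, the next write is a genuine "no data" to "data available" edge under either interpretation. The following is a minimal sketch under those assumptions; the helper names `drain` and `demo` are hypothetical.

```python
import os
import select


def drain(fd, chunk=4096):
    """Read from a non-blocking fd until it would block (EAGAIN) or hits EOF."""
    data = bytearray()
    while True:
        try:
            piece = os.read(fd, chunk)
        except BlockingIOError:
            break  # nothing left to read right now: the pipe is empty
        if not piece:
            break  # EOF: every writer has closed its end
        data += piece
    return bytes(data)


def demo(timeout=0.2):
    """Write twice, draining after each notification; return bytes read each time."""
    rfd, wfd = os.pipe()
    os.set_blocking(rfd, False)
    ep = select.epoll()
    try:
        ep.register(rfd, select.EPOLLIN | select.EPOLLET)
        os.write(wfd, b"a" * 2048)
        first = drain(rfd) if ep.poll(timeout) else b""
        # The pipe is empty again, so this write is an empty -> non-empty edge.
        os.write(wfd, b"b" * 1024)
        second = drain(rfd) if ep.poll(timeout) else b""
        return len(first), len(second)
    finally:
        ep.close()
        os.close(rfd)
        os.close(wfd)


if __name__ == "__main__":
    print(demo())
```

Because every notification is followed by a full drain, this reader never sits in epoll_wait() while data is still buffered, which is the situation where the three versions disagree.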
