io_uring, SCM_RIGHTS, and reference-count cycles

The io_uring mechanism that was described here in January has been through a number of revisions since then; those changes have generally been fixing implementation issues rather than changing the user-space API. In particular, this patch set seems to have received more than the usual amount of security-related review, which can only be a good thing. Security concerns became a bit of an obstacle for io_uring, though, when virtual filesystem (VFS) maintainer Al Viro threatened to veto the merging of the whole thing. It turns out that there were some reference-counting issues that required his unique experience to straighten out.

The VFS layer is a complicated beast; it must manage the complexities of the filesystem namespace in a way that provides the highest possible performance while maintaining security and correctness. Achieving that requires making use of almost all of the locking and concurrency-management mechanisms that the kernel offers, plus a couple more implemented internally. It is fair to say that the number of kernel developers who thoroughly understand how it works is extremely small; indeed, sometimes it seems like Viro is the only one with the full picture.

In keeping with time-honored kernel tradition, little of this complexity is documented, so when Viro gets a moment to write down how some of it works, it's worth paying attention. In a long "brain dump", Viro described how file reference counts are managed, how reference-count cycles can come about, and what the kernel does to break them. For those with the time to beat their brains against it for a while, Viro's explanation (along with a few corrections) is well worth reading. For the rest of us, a lighter version follows.

Reference counts for file structures

The Linux kernel uses the file structure to represent an open file. Every open file descriptor in user space is represented by a file structure in the kernel; in essence, a file descriptor is an index into a table in struct files_struct, where a pointer to the file structure can be found. There is a fair amount of information kept in the file structure, including the current position within the file, the access mode, the file_operations structure, a private_data pointer for use by lower-level code, and more.
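
One consequence of this arrangement is visible from user space: since the current position lives in the file structure rather than in the descriptor table, two descriptors that point to the same file structure share a single file position, while a separate open() of the same path gets a file structure (and position) of its own. What follows is a minimal sketch of that behavior using only standard POSIX calls; the names probe_offsets() and struct offsets, along with the scratch path under /tmp, are assumptions made for the illustration and do not appear in the kernel or in the patch set.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper type for this illustration only. */
struct offsets {
    long via_dup;     /* position seen through a dup()ed descriptor */
    long via_reopen;  /* position seen through a second open() of the path */
};

/*
 * Write five bytes through one descriptor, then ask two other
 * descriptors for their current position.
 * Returns 0 on success, -1 on any error.
 */
int probe_offsets(struct offsets *out)
{
    char path[] = "/tmp/file-struct-demo-XXXXXX";  /* assumed scratch location */
    int fd, copy = -1, again = -1, ret = -1;

    fd = mkstemp(path);              /* first file structure */
    if (fd < 0)
        return -1;

    copy = dup(fd);                  /* new table slot, same file structure */
    again = open(path, O_RDONLY);    /* same file, separate file structure */
    unlink(path);                    /* name is no longer needed */
    if (copy < 0 || again < 0)
        goto done;

    if (write(fd, "hello", 5) != 5)  /* advances the position behind fd */
        goto done;

    out->via_dup = (long)lseek(copy, 0, SEEK_CUR);
    out->via_reopen = (long)lseek(again, 0, SEEK_CUR);
    ret = 0;
done:
    if (again >= 0)
        close(again);
    if (copy >= 0)
        close(copy);
    close(fd);
    return ret;
}
```

After a successful call, via_dup should be 5 (the write through the original descriptor moved the shared position) and via_reopen should be 0 (the second open() started with its own position at the beginning of the file).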
