io_uring: Non-blocking path lookup while opening a file

Contents

Non-blocking path lookup while opening a file

Usage strategy

Background

Why not use O_NONBLOCK

Merged patches

Introduction of openat2

Bug introduced by the merged code

References



Non-blocking path lookup while opening a file

Introduced in kernel 5.12:

https://lwn.net/Articles/843163/

This release supports path name lookups that will not block under any circumstance. This means that the kernel will try to resolve the path with the cached data, but if it needs to do I/O, it will return an error. This is needed for io_uring(), but support in openat2() with the RESOLVE_CACHED flag has been added, too.

Recommended LWN article: Avoiding blocking file-name lookups

Usage strategy

Axboe's patch set creates a new internal flag called LOOKUP_CACHED, which is then made available to callers of openat2() as RESOLVE_CACHED. This flag requests the kernel to only carry the open to completion if that can be done using only data that is cached in memory — without performing I/O, in other words. If it becomes clear during the attempt that I/O would be required, the openat2() call will fail with an EAGAIN error. The caller can then retry the operation without RESOLVE_CACHED — in a setting where blocking is tolerable — to successfully open the file.

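In short: the kernel first tries to look up and open the file using only cached data. If that is not possible, the call fails with EAGAIN, and the caller retries without the flag so that the lookup is allowed to perform I/O.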
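The try-cached-then-retry pattern described above can be sketched from user space roughly as follows. This is a minimal sketch, not code from the patch set: open_cached_first() and openat2_compat() are hypothetical helper names, struct open_how_compat is a local mirror of struct open_how, and the fallback values for SYS_openat2 and RESOLVE_CACHED are assumptions taken from recent kernel headers so that the file also builds against older ones. The sketch is only meant for non-creating opens (no O_CREAT).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: local mirror of struct open_how from <linux/openat2.h>. */
struct open_how_compat {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

#ifndef SYS_openat2
#define SYS_openat2 437        /* assumption: value from 5.6+ headers */
#endif
#ifndef RESOLVE_CACHED
#define RESOLVE_CACHED 0x20    /* assumption: value from 5.12+ headers */
#endif

/* Hypothetical thin wrapper around the raw openat2() system call. */
static int openat2_compat(int dirfd, const char *path,
                          uint64_t flags, uint64_t resolve)
{
    struct open_how_compat how;

    memset(&how, 0, sizeof(how));
    how.flags = flags;
    how.resolve = resolve;
    return (int)syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
}

/*
 * Hypothetical helper: try a cache-only open first, then fall back.
 * *used_cache is set to 1 only if the cache-only attempt succeeded.
 */
static int open_cached_first(int dirfd, const char *path, int flags,
                             int *used_cache)
{
    assert(path != NULL && used_cache != NULL);

    int fd = openat2_compat(dirfd, path, (uint64_t)flags, RESOLVE_CACHED);
    if (fd >= 0) {
        *used_cache = 1;
        return fd;
    }

    *used_cache = 0;
    if (errno == EAGAIN)        /* lookup would need I/O: retry, may block */
        return openat2_compat(dirfd, path, (uint64_t)flags, 0);
    if (errno == ENOSYS || errno == EINVAL)
        return openat(dirfd, path, flags);  /* kernel lacks openat2/RESOLVE_CACHED */
    return -1;                  /* genuine error, errno is left as set */
}
```

A caller would use it like `fd = open_cached_first(AT_FDCWD, "/etc/hostname", O_RDONLY, &used_cache);`. In a real program the blocking retry would normally run in a context where blocking is tolerable, for example a worker thread, rather than immediately in the latency-sensitive path.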

Background

    Many of those other system calls were never designed with asynchronous use in mind, so they will happily block if need be; that is something that io_uring cannot allow, since it would block the handling of other operations as well. So io_uring creates a separate kernel thread to run system calls that might block at inopportune times. That effectively makes those calls asynchronous, but at a cost: moving ring operations into a separate thread can slow execution considerably. For an operation that can be carried out using only cached data, the overhead of shifting to another thread becomes a dominant performance factor.

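Many system calls were never designed to run asynchronously, whereas io_uring is built around asynchronous execution.

io_uring therefore creates a separate kernel thread to run system calls that might block at an inopportune moment, which effectively makes those calls asynchronous.

This comes at a cost: moving ring operations into a separate thread can slow their execution considerably.

The cost is most visible when an operation could have been completed using only cached data. Handing such an operation over to another thread adds significant overhead for no benefit.

As the figure above shows, a plain system call can access the cached data directly. With io_uring, the request is first handed over to a worker thread; even though only in-memory operations are involved, the operation takes longer.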

The solution is to use this new LOOKUP_CACHED flag. Whenever an open operation is called for in io_uring, an attempt will be made to execute it directly with LOOKUP_CACHED. If that works, all is well and the operation completes successfully; otherwise, it will be pushed off to a thread and retried without LOOKUP_CACHED as before. According to Axboe, an open-heavy benchmark will run nearly three times faster if all of the necessary data is already cached.

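For an open operation, io_uring first tries to complete it from cached data. If that is not possible, the operation is pushed off to a worker thread and retried without LOOKUP_CACHED.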

What about other operations? Why is only open handled this way?

Why not use O_NONBLOCK

Another question that might come to mind is: why was the existing O_NONBLOCK flag not used for this purpose? There may be a number of reasons, but one that jumps out is that O_NONBLOCK applies to the resulting file descriptor for its entire life; all operations performed on that descriptor will (potentially, at least) be non-blocking. The RESOLVE_CACHED flag, instead, applies only to the opening of the file.

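O_NONBLOCK stays with the resulting file descriptor for its whole lifetime, so every later operation on that descriptor is (potentially) non-blocking. The new flag, by contrast, only affects the open operation itself.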
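The "sticks to the descriptor" half of this argument can be observed from user space with fcntl(F_GETFL). The following is a minimal sketch; fd_is_nonblocking() is a hypothetical helper name, and /dev/null in the usage note is only an assumption about a path that can be opened on a typical Linux system.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if O_NONBLOCK is set in the file status
 * flags of fd, 0 if it is not, and -1 if fcntl() fails.
 */
static int fd_is_nonblocking(int fd)
{
    assert(fd >= 0);

    int status_flags = fcntl(fd, F_GETFL);
    if (status_flags < 0)
        return -1;
    return (status_flags & O_NONBLOCK) ? 1 : 0;
}
```

Opening a file with `open("/dev/null", O_RDONLY | O_NONBLOCK)` and passing the result to fd_is_nonblocking() reports the flag as set long after open() has returned. RESOLVE_CACHED is passed in the resolve field of openat2() rather than in the open flags, so it leaves no such state behind on the descriptor.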

Merged patches

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/fs?qt=author&q=Jens+Axboe 

These appear to be the relevant commits:

2021-01-04  fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED  Jens Axboe  1 file, -0/+6
2021-01-04  fs: add support for LOOKUP_CACHED  Jens Axboe  1 file, -0/+9
2021-01-04  fs: make unlazy_walk() error handling consistent

On the mailing list, however ( https://lwn.net/ml/linux-fsdevel/CAHk-=wjxQOBVZiX-OD9YC1ZkA-N4tG7sjtkWApY8Rtz4gb_k6Q@mail.gmail.com/ ), the series appears to consist of 5 patches.

1. The base patch is

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/?id=6c6ec2b0a3e0381d886d531bd1471dfdb1509237  that is, "fs: add support for LOOKUP_CACHED", which adds the basic functionality. It is only a few lines long; which code path does each of those lines correspond to?

2. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs?id=99668f618062816ca7ba639b007eb145b9d3d41e   fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED   This exposes the feature to user space through the openat2() system call, but how can the difference actually be observed?

3. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs?id=3a81fd02045c329f25e5900fa61f613c9b317644  

io_uring: enable LOOKUP_CACHED path resolution for filename lookups

In summary, 1 is the foundation, and 2 and 3 are two users built on top of 1.

Introduction of openat2

https://github.com/torvalds/linux/commit/fddb5d430ad9fa91b49b1d34d0202ffe2fa0e179

https://lwn.net/Articles/796868/

https://man7.org/linux/man-pages/man2/openat2.2.html

Looking up a file given a path name seems like a straightforward task, but it turns out to be one of the more complex things the kernel does.

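Below is test code written as a first step towards testing the cached lookup. My current kernel is version 5.4, so the openat2() interface, which was introduced in kernel 5.6, is not available; the program therefore uses openat() instead.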

#include <fcntl.h>          /* open(), openat(), O_* constants */
#include <stdio.h>          /* perror() */
#include <stdlib.h>         /* exit(), EXIT_FAILURE */
#include <sys/stat.h>       /* mode_t */
#include <unistd.h>         /* write(), close() */

int main(void)
{
    int dir_fd;
    int fd;
    int flags;
    mode_t mode;

    /* Open the directory that the relative path name is resolved against. */
    dir_fd = open("/home/test", O_RDONLY | O_DIRECTORY);
    if (dir_fd < 0)
    {
        perror("open");
        exit(EXIT_FAILURE);
    }

    flags = O_CREAT | O_TRUNC | O_RDWR;
    mode = 0640;  /* -rw-r----- */
    fd = openat(dir_fd, "uring.txt", flags, mode);
    if (fd < 0)
    {
        perror("openat");
        close(dir_fd);
        exit(EXIT_FAILURE);
    }

    /* Report a failed or short write instead of ignoring it. */
    if (write(fd, "HELLO", 5) != 5)
        perror("write");

    close(fd);
    close(dir_fd);
    return 0;
}
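Because openat2() only exists from kernel 5.6 on and RESOLVE_CACHED from 5.12 on, a test program may want to check the running kernel before choosing which call to use. The following is a minimal sketch of such a check based on uname(); release_at_least() and running_kernel_at_least() are hypothetical helper names. Comparing version strings is only a heuristic, since distribution kernels may backport features, so treating the result as a hint is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: parses the leading "major.minor" of a release string
 * such as "5.4.0-42-generic". Returns 1 if it is at least major.minor,
 * 0 if it is older, and -1 if the string cannot be parsed.
 */
static int release_at_least(const char *release, int major, int minor)
{
    int rel_major = 0;
    int rel_minor = 0;

    assert(release != NULL);
    if (sscanf(release, "%d.%d", &rel_major, &rel_minor) != 2)
        return -1;
    return (rel_major > major) ||
           (rel_major == major && rel_minor >= minor);
}

/* Hypothetical helper: applies the same check to the running kernel. */
static int running_kernel_at_least(int major, int minor)
{
    struct utsname info;

    if (uname(&info) != 0)
        return -1;
    return release_at_least(info.release, major, minor);
}
```

With these helpers, `running_kernel_at_least(5, 6)` hints at whether openat2() can be expected, and `running_kernel_at_least(5, 12)` whether RESOLVE_CACHED can. Handling ENOSYS or EINVAL from the system call itself remains the more reliable test.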

Bug introduced by the merged code

https://patchwork.kernel.org/project/linux-fsdevel/patch/8b114189-e943-a7e6-3d31-16aa8a148da6@kernel.dk/

References

https://lwn.net/Articles/649729/   RCU-walk: faster pathname lookup in Linux. This feature also looks quite interesting.
