Submitting a kernel patch: add plugging support for punt bio

The patch adds plugging in the scenario where multiple bios are submitted back to back, which can improve IO performance.

https://lkml.org/lkml/2020/9/10/79 (the patch has been accepted).

The test and the explanation of the patch are below.

Before testing, I added some debug code to blkg_async_bio_workfn():
	int count = 0;
	if (bios.head && bios.head->bi_next) {
		need_plug = true;
		blk_start_plug(&plug);
	}
	while ((bio = bio_list_pop(&bios))) {
		/*
		 * io_punt is a sysctl user interface to control the print
		 * (see the sysctl sketch after this block); the printk reports
		 * the start sector and the size in sectors of each bio.
		 */
		if (io_punt) {
			printk("[%s:%d] bio start,size:%llu,%d count=%d plug?%d\n",
				current->comm, current->pid, bio->bi_iter.bi_sector,
				(bio->bi_iter.bi_size)>>9, count++, need_plug);
		}
		submit_bio(bio);
	}
	if (need_plug)
		blk_finish_plug(&plug);

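The io_punt toggle referenced in the debug code above is only a debug switch;
its sysctl wiring is not part of the patch. Below is a minimal, hypothetical
sketch of how such a toggle could be exposed (the names io_punt_table and
io_punt_sysctl_init are made up for illustration; the standard kernel sysctl
API is assumed):
	#include <linux/init.h>
	#include <linux/sysctl.h>

	static int io_punt;

	static struct ctl_table io_punt_table[] = {
		{
			.procname	= "io_punt",
			.data		= &io_punt,
			.maxlen		= sizeof(int),
			.mode		= 0644,
			.proc_handler	= proc_dointvec,
		},
		{ }
	};

	static int __init io_punt_sysctl_init(void)
	{
		/* exposes /proc/sys/kernel/io_punt as a 0/1 debug switch */
		if (!register_sysctl("kernel", io_punt_table))
			return -ENOMEM;
		return 0;
	}
	late_initcall(io_punt_sysctl_init);
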
Setup steps needed to trigger *PUNT* io before testing:
	mount -t btrfs -o compress=lzo /dev/sda6 /btrfs
	mount -t cgroup2 nodev /cgroup2
	mkdir /cgroup2/cg3
	echo "+io" > /cgroup2/cgroup.subtree_control
	echo "8:0 wbps=1048576000" > /cgroup2/cg3/io.max #1000M/s
	echo $$ > /cgroup2/cg3/cgroup.procs

Then use dd in the current shell to test btrfs PUNT io:
	dd if=/dev/zero of=/btrfs/file bs=64K count=100000

The test hardware environment is as follows:
	[root@localhost btrfs]# lscpu
	Architecture:          x86_64
	CPU op-mode(s):        32-bit, 64-bit
	Byte Order:            Little Endian
	CPU(s):                32
	On-line CPU(s) list:   0-31
	Thread(s) per core:    2
	Core(s) per socket:    8
	Socket(s):             2
	NUMA node(s):          2
	Vendor ID:             GenuineIntel

With the above debug code, test command, and test environment, I ran the
tests under 3 different system loads, generated by running stress:
1, Run 64 threads with "stress -c 64 &"
	[53615.975974] [kworker/u66:18:1490] bio start,size:45583056,8 count=0 plug?1
	[53615.975980] [kworker/u66:18:1490] bio start,size:45583064,8 count=1 plug?1
	[53615.975984] [kworker/u66:18:1490] bio start,size:45583072,8 count=2 plug?1
	[53615.975987] [kworker/u66:18:1490] bio start,size:45583080,8 count=3 plug?1
	[53615.975990] [kworker/u66:18:1490] bio start,size:45583088,8 count=4 plug?1
	[53615.975993] [kworker/u66:18:1490] bio start,size:45583096,8 count=5 plug?1
	... ...
	[53615.977041] [kworker/u66:18:1490] bio start,size:45585480,8 count=303 plug?1
	[53615.977044] [kworker/u66:18:1490] bio start,size:45585488,8 count=304 plug?1
	[53615.977047] [kworker/u66:18:1490] bio start,size:45585496,8 count=305 plug?1
	[53615.977050] [kworker/u66:18:1490] bio start,size:45585504,8 count=306 plug?1
	[53615.977053] [kworker/u66:18:1490] bio start,size:45585512,8 count=307 plug?1
	[53615.977056] [kworker/u66:18:1490] bio start,size:45585520,8 count=308 plug?1
	[53615.977058] [kworker/u66:18:1490] bio start,size:45585528,8 count=309 plug?1

2, Run 32 threads with "stress -c 32 &"
	[50586.290521] [kworker/u66:6:32351] bio start,size:45806496,8 count=0 plug?1
	[50586.290526] [kworker/u66:6:32351] bio start,size:45806504,8 count=1 plug?1
	[50586.290529] [kworker/u66:6:32351] bio start,size:45806512,8 count=2 plug?1
	[50586.290531] [kworker/u66:6:32351] bio start,size:45806520,8 count=3 plug?1
	[50586.290533] [kworker/u66:6:32351] bio start,size:45806528,8 count=4 plug?1
	[50586.290535] [kworker/u66:6:32351] bio start,size:45806536,8 count=5 plug?1
	... ...
	[50586.299640] [kworker/u66:5:32350] bio start,size:45808576,8 count=252 plug?1
	[50586.299643] [kworker/u66:5:32350] bio start,size:45808584,8 count=253 plug?1
	[50586.299646] [kworker/u66:5:32350] bio start,size:45808592,8 count=254 plug?1
	[50586.299649] [kworker/u66:5:32350] bio start,size:45808600,8 count=255 plug?1
	[50586.299652] [kworker/u66:5:32350] bio start,size:45808608,8 count=256 plug?1
	[50586.299663] [kworker/u66:5:32350] bio start,size:45808616,8 count=257 plug?1
	[50586.299665] [kworker/u66:5:32350] bio start,size:45808624,8 count=258 plug?1
	[50586.299668] [kworker/u66:5:32350] bio start,size:45808632,8 count=259 plug?1

3, Run no stress threads (no extra load)
	[50861.355246] [kworker/u66:19:32376] bio start,size:13544504,8 count=0 plug?0
	[50861.355288] [kworker/u66:19:32376] bio start,size:13544512,8 count=0 plug?0
	[50861.355322] [kworker/u66:19:32376] bio start,size:13544520,8 count=0 plug?0
	[50861.355353] [kworker/u66:19:32376] bio start,size:13544528,8 count=0 plug?0
	[50861.355392] [kworker/u66:19:32376] bio start,size:13544536,8 count=0 plug?0
	[50861.355431] [kworker/u66:19:32376] bio start,size:13544544,8 count=0 plug?0
	[50861.355468] [kworker/u66:19:32376] bio start,size:13544552,8 count=0 plug?0
	[50861.355499] [kworker/u66:19:32376] bio start,size:13544560,8 count=0 plug?0
	[50861.355532] [kworker/u66:19:32376] bio start,size:13544568,8 count=0 plug?0
	[50861.355575] [kworker/u66:19:32376] bio start,size:13544576,8 count=0 plug?0
	[50861.355618] [kworker/u66:19:32376] bio start,size:13544584,8 count=0 plug?0
	[50861.355659] [kworker/u66:19:32376] bio start,size:13544592,8 count=0 plug?0
	[50861.355740] [kworker/u66:0:32346] bio start,size:13544600,8 count=0 plug?1
	[50861.355748] [kworker/u66:0:32346] bio start,size:13544608,8 count=1 plug?1
	[50861.355962] [kworker/u66:2:32347] bio start,size:13544616,8 count=0 plug?0
	[50861.356272] [kworker/u66:7:31962] bio start,size:13544624,8 count=0 plug?0
	[50861.356446] [kworker/u66:7:31962] bio start,size:13544632,8 count=0 plug?0
	[50861.356567] [kworker/u66:7:31962] bio start,size:13544640,8 count=0 plug?0
	[50861.356707] [kworker/u66:19:32376] bio start,size:13544648,8 count=0 plug?0
	[50861.356748] [kworker/u66:15:32355] bio start,size:13544656,8 count=0 plug?0
	[50861.356825] [kworker/u66:17:31970] bio start,size:13544664,8 count=0 plug?0

Analysis of the above 3 test results under different system loads:
From the above tests, we can see that more consecutive bios get plugged as
the system load increases. With "stress -c 64 &", 310 consecutive bios are
plugged; with "stress -c 32 &", 260 consecutive bios are plugged; without
stress, at most 2 consecutive bios are plugged, and in most cases bio_list
contains only a single bio.

How to explain the above phenomenon:
We know that in submit_bio(), if the bio is a REQ_CGROUP_PUNT io, it is
added to the per-blkg bio_list and a work item is queued on the workqueue
blkcg_punt_bio_wq (a rough sketch of this punt path is shown after this
paragraph). When that work actually runs depends on the system load. When
the system load is low, the work is scheduled quickly and the bios in
bio_list are quickly processed in blkg_async_bio_workfn(), so there is
little chance for the same io submit thread to add multiple consecutive
bios to bio_list before the work runs. This analysis matches test "3"
above.
When the system load is high, there is some delay before the work can be
scheduled to run, and the higher the system load the greater the delay. So
there is more chance for the same io submit thread to add multiple
consecutive bios to bio_list. When the work is finally scheduled to run,
bio_list contains more consecutive bios, which are then processed in
blkg_async_bio_workfn(). This analysis matches tests "1" and "2" above.
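
For reference, here is a rough sketch of that punt/producer path, paraphrased
from the 5.x block/blk-cgroup.c code as I understand it (exact function and
field names may differ between kernel versions); it is the counterpart of the
blkg_async_bio_workfn() consumer shown in the diff below:
	/*
	 * Sketch of the punt side (paraphrased from 5.x block/blk-cgroup.c;
	 * not part of the patch). submit_bio() ends up here when
	 * REQ_CGROUP_PUNT is set: the bio is queued on the per-blkg list and
	 * the worker is kicked, which later runs blkg_async_bio_workfn().
	 */
	bool __blkcg_punt_bio_submit(struct bio *bio)
	{
		struct blkcg_gq *blkg = bio->bi_blkg;

		/* consume the flag so the worker's resubmission is not punted again */
		bio->bi_opf &= ~REQ_CGROUP_PUNT;

		/* bios for the root cgroup are never punted */
		if (!blkg->parent)
			return false;

		spin_lock_bh(&blkg->async_bio_lock);
		bio_list_add(&blkg->async_bios, bio);
		spin_unlock_bh(&blkg->async_bio_lock);

		queue_work(blkcg_punt_bio_wq, &blkg->async_bio_work);
		return true;
	}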

According to the tests, io performance is improved with this patch,
especially when the system load is higher. As a further optimization, the
plug is only used when bio_list contains at least 2 bios.

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
---
 block/blk-cgroup.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index c195365c9..f35a205d5 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -119,6 +119,8 @@ static void blkg_async_bio_workfn(struct work_struct *work)
 					     async_bio_work);
 	struct bio_list bios = BIO_EMPTY_LIST;
 	struct bio *bio;
+	struct blk_plug plug;
+	bool need_plug = false;
 
 	/* as long as there are pending bios, @blkg can't go away */
 	spin_lock_bh(&blkg->async_bio_lock);
@@ -126,8 +128,15 @@ static void blkg_async_bio_workfn(struct work_struct *work)
 	bio_list_init(&blkg->async_bios);
 	spin_unlock_bh(&blkg->async_bio_lock);
 
+	/* start plug only when bio_list contains at least 2 bios */
+	if (bios.head && bios.head->bi_next) {
+		need_plug = true;
+		blk_start_plug(&plug);
+	}
 	while ((bio = bio_list_pop(&bios)))
 		submit_bio(bio);
+	if (need_plug)
+		blk_finish_plug(&plug);
 }
 
 /**
-- 
