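1. Background
A recap of this series so far:
- Linux Block Driver - 1 introduced the implementation of Sampleblk, a block driver with only 200 lines of source code.
- Linux Block Driver - 2 created an Ext4 file system on the Sampleblk driver and ran a `fio` sequential write test. During the test we used various Linux tracing tools to profile the performance of that `fio` run.
- Linux Block Driver - 3 used Linux tracing tools and Flamegraph to build a high-level picture of how file IO is implemented at the file system layer.
- Linux Block Driver - 4 analyzed, under the same `fio` sequential write test, the IO performance characteristics of the Sampleblk block device: IO size, latency, statistical distribution, IOPS, and throughput.
This article continues the earlier experiments and uses the same simple `fio` test to explore how Linux block device drivers operate. Unless stated otherwise, all Linux kernel source references in this article are based on 4.6.0. Other kernel versions may differ significantly.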
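2. Preparation
Before reading this article, you may need to complete the following preparation:
- Following Linux Block Driver - 1, load the driver, format the device, and mount the Ext4 file system.
- Following the steps in Linux Block Driver - 2, run the `fio` test.
This article uses `blktrace` to analyze the test further at the block device level, under exactly the same `fio` workload as the earlier articles.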
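3. Using blktrace
blktrace(8) is a very convenient tool for tracing block device IO. We can use it to analyze the block device IO during the `fio` test from the previous articles.
First, while `fio` is running, run `blktrace` to record the IO operations on the specified block device: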
$ sudo blktrace /dev/sampleblk1
[sudo] password for yango:
^C=== sampleblk1 ===
CPU 0: 1168040 events, 54752 KiB data
Total: 1168040 events (dropped 0), 54752 KiB data
After tracing exits, the IO operations have all been recorded in log files. The blkparse(1) command can parse and display these records.
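The text output of `blkparse` can also be post-processed by a script. The following is a minimal sketch, assuming `blkparse` is run with its default output format (device `major,minor`, CPU, sequence number, timestamp, PID, action, RWBS, then an action-specific payload). The names `BlkEvent` and `parse_event` are hypothetical helpers introduced here for illustration, and the sample line is an illustrative line in that layout, not output from the run above.

```python
import re
from typing import NamedTuple, Optional

# Assumption: blkparse default output format, i.e.
# "major,minor cpu sequence timestamp pid action rwbs payload".
_EVENT_RE = re.compile(
    r"^\s*(\d+),(\d+)\s+(\d+)\s+(\d+)\s+(\d+\.\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s*(.*)$"
)


class BlkEvent(NamedTuple):
    """One per-IO event line (hypothetical helper type)."""
    major: int
    minor: int
    cpu: int
    seq: int
    timestamp: float
    pid: int
    action: str   # trace action character, e.g. Q, G, I, D, C
    rwbs: str     # read/write/barrier/sync flags as printed by blkparse
    payload: str  # action-specific remainder, kept as raw text


def parse_event(line: str) -> Optional[BlkEvent]:
    """Return a BlkEvent, or None for lines that are not per-IO events
    (for example the per-CPU summary printed at the end)."""
    m = _EVENT_RE.match(line)
    if m is None:
        return None
    major, minor, cpu, seq, ts, pid, action, rwbs, payload = m.groups()
    return BlkEvent(int(major), int(minor), int(cpu), int(seq),
                    float(ts), int(pid), action, rwbs, payload.strip())


if __name__ == "__main__":
    # Illustrative line only; not taken from the sampleblk1 trace.
    sample = "  8,0    3        1     0.000000000   697  G   W 223490 + 8 [kjournald]"
    print(parse_event(sample))
```

The payload is kept as raw text on purpose, because its layout differs between trace actions. In practice the lines would come from something like `blkparse -i sampleblk1`, which reads the per-CPU files that `blktrace` wrote to the current directory.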
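Although the blkparse(1) man page gives the meaning of each trace action character for an IO operation, the table below goes further and includes the following information:
- The chronological order of the trace actions
- The name of the Linux block tracepoint and the corresponding kernel trace function for each `blkparse` trace action
- Whether the trace action has a positive or negative effect on block device performance
- Additional notes on each trace action, which are closer to the Linux implementation than the descriptions in the blkparse(1) man page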
Order | Action | Linux block tracepoints | Kernel trace function | Perf impact | Description |
---|---|---|---|---|---|
1 | Q | block:block_bio_queue | trace_block_bio_queue | Neutral | Intent to queue a bio on a given request_queue. No real request exists yet. |
2 | B | block:block_bio_bounce | trace_block_bio_bounce | Negative | Pages in the bio have been copied to a bounce buffer to avoid hardware (DMA) limits. |
3 | X | block:block_split | trace_block_split | Negative | Split a bio into smaller pieces due to the underlying block device's limits. |
4 | M | block:block_bio_backmerge | trace_block_bio_backmerge | Positive | A previously inserted request exists that ends on the boundary of where this bio begins, so the IO scheduler merges them. |
5 | F | block:block_bio_frontmerge | trace_block_bio_frontmerge | Positive | Same as the back merge, except this IO ends where a previously inserted request starts. |
6 | S | block:block_sleeprq | trace_block_sleeprq | Negative | No request structures were available (e.g. under memory pressure), so the issuer has to wait for one to be freed. |
7 | G | block:block_getrq | trace_block_getrq | Neutral | Allocated a free request struct successfully. |
8 | P | block:block_plug | trace_block_plug | Positive | IO isn't immediately dispatched to the request_queue; instead it is held back on the current process's IO plug list. |
9 | I | block:block_rq_insert | trace_block_rq_insert | Neutral | A request is sent to the IO scheduler's internal queue and later serviced by the driver. |
10 | U | block:block_unplug | trace_block_unplug | Positive | Flush queued IO requests to the device request_queue; can be triggered by a timeout or by an explicit function call. |
11 | A | block:block_rq_remap | trace_block_rq_remap | Neutral | Only used by stackable devices, for example DM (Device Mapper) and RAID drivers. |
12 | D | block:block_rq_issue | trace_block_rq_issue | Neutral | Device driver code is picking up the request. |
13 | C | block:block_rq_complete | trace_block_rq_complete | Neutral | A previously issued request has been completed. The output will detail the sector and size of that request. |
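
The table above can also be kept as a small lookup structure when post-processing a trace. The following is a minimal sketch that encodes only the Action, tracepoint, and Perf impact columns of the table. The names `ACTIONS` and `impact_summary` are hypothetical, and the action sequence in the usage example is made up for illustration, not taken from the `fio` run.

```python
from collections import Counter
from typing import Iterable

# Action character -> (tracepoint, perf impact), copied from the table above.
ACTIONS = {
    "Q": ("block:block_bio_queue", "Neutral"),
    "B": ("block:block_bio_bounce", "Negative"),
    "X": ("block:block_split", "Negative"),
    "M": ("block:block_bio_backmerge", "Positive"),
    "F": ("block:block_bio_frontmerge", "Positive"),
    "S": ("block:block_sleeprq", "Negative"),
    "G": ("block:block_getrq", "Neutral"),
    "P": ("block:block_plug", "Positive"),
    "I": ("block:block_rq_insert", "Neutral"),
    "U": ("block:block_unplug", "Positive"),
    "A": ("block:block_rq_remap", "Neutral"),
    "D": ("block:block_rq_issue", "Neutral"),
    "C": ("block:block_rq_complete", "Neutral"),
}


def impact_summary(actions: Iterable[str]) -> Counter:
    """Count how many events fall into each perf-impact class.

    Action characters that are not in the table are counted as "Unknown".
    """
    summary = Counter()
    for action in actions:
        _tracepoint, impact = ACTIONS.get(action, (None, "Unknown"))
        summary[impact] += 1
    return summary


if __name__ == "__main__":
    # Hypothetical action sequence for a single IO, for illustration only.
    print(impact_summary("QGPIUDC"))
```

A large share of Negative actions (B, X, S) in a real trace would be a hint to look at bounce buffering, bio splitting, or request starvation first.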