This article summarizes Western Digital's introduction to zoned storage and uses an emulator to build a first emulation of a ZNS SSD. Original site: https://zonedstorage.io/
System requirements
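For the Linux kernel to support the ZBD (zoned block device) interface, two conditions must be met:
- The kernel version is 4.10.0 or later.
- The kernel configuration option CONFIG_BLK_DEV_ZONED is enabled.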
Kernel version
# uname -r
This prints the version of the running kernel.
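The version requirement can also be checked from a script. A minimal sketch, assuming the release string format printed by uname -r (major.minor followed by anything); kernel_supports_zbd is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel release string (default: the
# running kernel, as printed by `uname -r`) is 4.10 or later.
# Only major.minor is compared; suffixes such as "-35-generic" are ignored.
kernel_supports_zbd() {
    local release major rest minor
    release=${1:-$(uname -r)}
    major=${release%%.*}        # "5.13.0-35-generic" -> "5"
    rest=${release#*.}          # -> "13.0-35-generic"
    minor=${rest%%[!0-9]*}      # -> "13"
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "${minor:-0}" -ge 10 ]; }
}
```

For example, `kernel_supports_zbd && echo "zoned block devices supported"` checks the running kernel.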
Zoned block device support
# cat /boot/config-`uname -r` | grep CONFIG_BLK_DEV_ZONED
# cat /lib/modules/`uname -r`/config | grep CONFIG_BLK_DEV_ZONED
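Either of the two commands above tells whether the kernel supports zoned block devices. The output CONFIG_BLK_DEV_ZONED=y means it does; if the option is disabled, the configuration file contains the line # CONFIG_BLK_DEV_ZONED is not set instead.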
- If your kernel exports its configuration through the proc file system, use one of the following command sets to get the state of CONFIG_BLK_DEV_ZONED:
# modprobe configs
# cat /proc/config.gz | gunzip | grep CONFIG_BLK_DEV_ZONED
# modprobe configs
# zcat /proc/config.gz | grep CONFIG_BLK_DEV_ZONED
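When scripting this check, the configuration text can be parsed directly. A minimal sketch, assuming the standard Kconfig output format (CONFIG_X=y for an enabled option, # CONFIG_X is not set for a disabled one); zoned_config_state is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the state of CONFIG_BLK_DEV_ZONED found in kernel
# config text read from stdin: the value after "=" when the option is set,
# "not set" when it is disabled, "missing" when it does not appear at all.
zoned_config_state() {
    local line
    while IFS= read -r line; do
        case $line in
            CONFIG_BLK_DEV_ZONED=*)
                echo "${line#*=}"; return 0 ;;
            "# CONFIG_BLK_DEV_ZONED is not set")
                echo "not set"; return 0 ;;
        esac
    done
    echo "missing"
}
```

Usage: `zoned_config_state < "/boot/config-$(uname -r)"` or `zcat /proc/config.gz | zoned_config_state`.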
Write ordering control
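To guarantee that the Linux kernel issues write commands in the sequential order that zones require, every kernel that supports zoned block devices implements a zone write locking mechanism, which serializes write operations to sequential zones.
For kernel versions 4.10 to 4.15, no special configuration is needed. Starting with kernel 4.16, zone write locking is implemented in the deadline and mq-deadline block I/O schedulers, so one of these schedulers must be used with zoned block devices.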
# cat /sys/block/sda/queue/scheduler
[none] mq-deadline kyber bfq
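displays the block I/O scheduler of the zoned disk; the scheduler in use is shown in brackets.
If, as in the example above, the disk is not using mq-deadline, switch schedulers with the following commands: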
# echo mq-deadline > /sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
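The active scheduler is the name in brackets, so a script can extract it and switch only when needed. A minimal sketch with hypothetical helper names active_scheduler and ensure_mq_deadline; writing the sysfs file requires root:

```shell
#!/bin/sh
# Hypothetical helpers around /sys/block/<dev>/queue/scheduler.

# Print the active scheduler, i.e. the name between brackets:
# "[none] mq-deadline kyber bfq" -> "none"
active_scheduler() {
    local s="$1"
    s=${s#*\[}          # drop everything up to and including '['
    echo "${s%%]*}"     # drop everything from ']' onwards
}

# Switch <dev> to mq-deadline unless it is already active (requires root).
ensure_mq_deadline() {
    local f="/sys/block/$1/queue/scheduler"
    if [ "$(active_scheduler "$(cat "$f")")" != "mq-deadline" ]; then
        echo mq-deadline > "$f"
    fi
}
```

Usage: `ensure_mq_deadline sda`.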
libzbd user library
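libzbd is a user library that provides functions for manipulating zoned block devices. It only gives access to zoned block devices supported by the running kernel: physical devices (such as hard disks implementing the ZBC and ZAC standards) as well as all logical block devices implemented by various device drivers (such as the null_blk and device mapper drivers). The project is hosted on GitHub.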
Build requirements
- Building libzbd requires the following packages:
# apt-get install autoconf
# apt-get install autoconf-archive
# apt-get install automake
# apt-get install libtool
# apt-get install m4
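Building the gzbd and gzbd-viewer graphical applications additionally requires the GTK3 and GTK3 development header packages.
- The build must also be done on a system where the kernel header blkzoned.h is installed, under /usr/include/linux.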
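The prerequisites can be checked before running autogen.sh. A minimal sketch with a hypothetical helper name; it assumes the libtool package provides the libtoolize command (as on Debian and Ubuntu) and does not check autoconf-archive or GTK3, which install macros and libraries rather than commands:

```shell
#!/bin/sh
# Hypothetical helper: report missing libzbd build prerequisites.
# autoconf-archive only installs m4 macros, so it cannot be detected with
# `command -v` and is not checked here.
check_libzbd_build_deps() {
    local header="${1:-/usr/include/linux/blkzoned.h}"
    local missing=0 tool
    for tool in autoconf automake libtoolize m4; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing tool: $tool"
            missing=1
        fi
    done
    if [ ! -f "$header" ]; then
        echo "missing header: $header"
        missing=1
    fi
    return "$missing"
}
```

Usage: `check_libzbd_build_deps && sh ./autogen.sh`.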
Build
In the downloaded source directory, run:
$ sh ./autogen.sh
$ ./configure
$ make
Install
sudo make install
By default, the library is installed under /usr/lib (or /usr/lib64) and the header files under /usr/include/libzbd.
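Library functions
All libzbd functions use byte units for zone information. Read accesses must be aligned to the device logical block size; on host-managed zoned block devices, writes to sequential zones must be aligned to the device physical block size. The library functions are declared in /usr/include/libzbd/zbd.h.
libzbd has no mutual exclusion for multi-threaded use: if an application does not itself serialize the writes to a sequential zone, write errors will occur.
Run man zbd to read the manual.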
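Because all offsets and lengths are in bytes, it is easy to pass a value that is not block aligned. A minimal sketch of the alignment check, assuming the block size comes from the sysfs files logical_block_size or physical_block_size under /sys/block/<dev>/queue/; block_size and is_aligned are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers for byte offsets passed to libzbd or zbd.

# block_size <dev> <logical|physical>: read the block size from sysfs
block_size() {
    cat "/sys/block/$1/queue/${2}_block_size"
}

# is_aligned <offset (B)> <length (B)> <block size (B)>:
# succeed if both offset and length are multiples of the block size
is_aligned() {
    [ "$3" -gt 0 ] || return 1
    [ $(( $1 % $3 )) -eq 0 ] && [ $(( $2 % $3 )) -eq 0 ]
}
```

Usage: `is_aligned 134217728 33554432 "$(block_size nullb0 physical)"`.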
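Emulating zoned block devices with null_blk
The null_blk driver is a powerful tool for emulating many types of block devices.
Creating a zoned null block device: the simplest example
The simplest way to create a zoned block device emulated by null_blk is to pass the zoned=1 parameter to modprobe null_blk, as in the following example: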
# modprobe null_blk nr_devices=1 zoned=1
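This creates a host-managed zoned block device with the default settings: a zone size of 256 MiB and a total capacity of 250 GiB (268.435 GB), i.e. 1000 zones. This simple command does not create any conventional zones.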
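These numbers follow from the null_blk module defaults, a 250 GiB device (parameter gb) split into 256 MiB zones (parameter zone_size). The arithmetic, as a short sketch:

```shell
#!/bin/sh
# Default null_blk geometry: gb=250 and zone_size=256 are the module defaults
size_gib=250
zone_mib=256

bytes=$(( size_gib * 1024 * 1024 * 1024 ))
sectors=$(( bytes / 512 ))
nr_zones=$(( size_gib * 1024 / zone_mib ))

echo "capacity: $bytes bytes ($sectors 512-byte sectors)"
echo "zones:    $nr_zones zones of $zone_mib MiB"
```

This explains why lsblk shows 250G (binary units) while zbd reports 268.435 GB (268435456000 bytes in decimal units).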
lsblk gives a brief overview of the device:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
└─sda1 8:1 0 40G 0 part /
sr0 11:0 1 1024M 0 rom
nullb0 252:0 0 250G 0 disk
The zbd tool from libzbd, installed in the previous section, shows the device in detail:
$ zbd report -i /dev/nullb0
Device /dev/nullb0:
Vendor ID: Unknown
Zone model: host-managed
Capacity: 268.435 GB (524288000 512-bytes sectors)
Logical blocks: 524288000 blocks of 512 B
Physical blocks: 524288000 blocks of 512 B
Zones: 1000 zones of 256.0 MB
Maximum number of open zones: no limit
Maximum number of active zones: no limit
Zone 00000: swr, ofst 00000000000000, len 00000268435456, cap 00000268435456, wp 00000000000000, em, non_seq 0, reset 0
Zone 00001: swr, ofst 00000268435456, len 00000268435456, cap 00000268435456, wp 00000268435456, em, non_seq 0, reset 0
Zone 00002: swr, ofst 00000536870912, len 00000268435456, cap 00000268435456, wp 00000536870912, em, non_seq 0, reset 0
Zone 00003: swr, ofst 00000805306368, len 00000268435456, cap 00000268435456, wp 00000805306368, em, non_seq 0, reset 0
Zone 00004: swr, ofst 00001073741824, len 00000268435456, cap 00000268435456, wp 00001073741824, em, non_seq 0, reset 0
...
Zone 00998: swr, ofst 00267898585088, len 00000268435456, cap 00000268435456, wp 00267898585088, em, non_seq 0, reset 0
Zone 00999: swr, ofst 00268167020544, len 00000268435456, cap 00000268435456, wp 00268167020544, em, non_seq 0, reset 0
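Each sequential zone line reports the zone start (ofst), size (len), usable capacity (cap) and write pointer (wp), all in bytes. A sketch computing how many bytes can still be written to a zone, cap - (wp - ofst); zone_field and zone_remaining_bytes are hypothetical helpers that parse the text format shown above:

```shell
#!/bin/sh
# Hypothetical helpers for the text lines printed by `zbd report`.

# zone_field <line> <name>: print the number after "<name> ", with leading
# zeros removed so that shell arithmetic does not read it as octal.
zone_field() {
    printf '%s\n' "$1" | sed -n "s/.* $2 0*\([0-9][0-9]*\).*/\1/p"
}

# zone_remaining_bytes <line>: bytes still writable, cap - (wp - ofst)
zone_remaining_bytes() {
    local ofst cap wp
    ofst=$(zone_field "$1" ofst)
    cap=$(zone_field "$1" cap)
    wp=$(zone_field "$1" wp)
    if [ -z "$wp" ]; then
        echo "no write pointer (conventional zone)"
        return 1
    fi
    echo $(( cap - (wp - ofst) ))
}
```

Usage: `zbd report /dev/nullb0 | grep 'Zone 00004' | while IFS= read -r l; do zone_remaining_bytes "$l"; done`.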
An emulated device created with modprobe (and not through configfs) is removed by unloading the module:
# rmmod null_blk
Creating a zoned null block device: a more advanced example
More parameters can be passed to modprobe when the device is created, for example:
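# modprobe null_blk nr_devices=1 \
    zoned=1 \
    zone_nr_conv=4 \
    zone_size=64
- nr_devices=1 creates a single device;
- zoned=1 makes every created device a zoned block device;
- zone_nr_conv=4 sets the number of conventional zones to 4;
- zone_size=64 sets the size of each zone to 64 MB.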
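The command above does not set the device size, so, assuming the default gb=250, the 250 GiB device is split into 64 MiB zones. A small sketch of the resulting layout; nullb_layout is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper:
# nullb_layout <device size (GiB)> <zone size (MiB)> <nr conventional zones>
nullb_layout() {
    local nr_zones=$(( $1 * 1024 / $2 ))
    echo "$nr_zones zones: $3 conventional, $(( nr_zones - $3 )) sequential"
}

# modprobe example above: default gb=250, zone_size=64, zone_nr_conv=4
nullb_layout 250 64 4
```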
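The configfs interface provides a more powerful way to create emulated zoned block devices. The parameters that can be set through configfs are listed in its features file: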
# cat /sys/kernel/config/nullb/features
memory_backed,discard,bandwidth,cache,badblocks,zoned,zone_size,zone_capacity,zone_nr_conv,zone_max_open,zone_max_active,blocksize,max_sectors,virt_boundary
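- Note that the set of available parameters differs between kernel versions; the output above is from a 5.13 kernel. The table below lists the kernel version in which each zone-related attribute first appeared (these are the zoned block device attribute names under /sys/block/<dev>/queue/, not the null_blk configfs names listed above):
Kernel version | Attribute |
---|---|
4.10.0 | zoned |
4.10.0 | chunk_sectors |
4.20.0 | nr_zones |
5.8.0 | zone_append_max_bytes |
5.9.0 | max_open_zones |
5.9.0 | max_active_zones |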
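For scripts that must run on several kernels, the table can be turned into a lookup. A sketch, assuming release strings in the uname -r format and comparing only major.minor; attr_since, ver_num and attr_available are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helpers encoding the table above.

# attr_since <name>: kernel version (major.minor) in which <name> first appears
attr_since() {
    case $1 in
        zoned|chunk_sectors)             echo 4.10 ;;
        nr_zones)                        echo 4.20 ;;
        zone_append_max_bytes)           echo 5.8 ;;
        max_open_zones|max_active_zones) echo 5.9 ;;
        *) return 1 ;;
    esac
}

# ver_num <release>: "5.13.0-35-generic" -> 5013 (major * 1000 + minor)
ver_num() {
    local major=${1%%.*} rest=${1#*.}
    local minor=${rest%%[!0-9]*}
    echo $(( major * 1000 + ${minor:-0} ))
}

# attr_available <name> <release>: succeed if <name> exists on that kernel
attr_available() {
    local since
    since=$(attr_since "$1") || return 1
    [ "$(ver_num "$2")" -ge "$(ver_num "$since")" ]
}
```

Usage: `attr_available max_open_zones "$(uname -r)" && cat /sys/block/nullb0/queue/max_open_zones`.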
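Listing null_blk module parameters
modinfo lists the zone-related parameters of the null_blk module; the same settings can also be changed through configfs once the module is loaded.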
# modinfo null_blk
filename: /lib/modules/5.13.0-35-generic/kernel/drivers/block/null_blk/null_blk.ko
...
parm: zoned:Make device as a host-managed zoned block device. Default: false (bool)
parm: zone_size:Zone size in MB when block device is zoned. Must be power-of-two: Default: 256 (ulong)
parm: zone_capacity:Zone capacity in MB when block device is zoned. Can be less than or equal to zone size. Default: Zone size (ulong)
parm: zone_nr_conv:Number of conventional zones when block device is zoned. Default: 0 (uint)
parm: zone_max_open:Maximum number of open zones when block device is zoned. Default: 0 (no limit) (uint)
parm: zone_max_active:Maximum number of active zones when block device is zoned. Default: 0 (no limit) (uint)
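The constraints in these descriptions (zone_size a power of two, zone_capacity no larger than zone_size) can be checked before writing values to configfs. A sketch with a hypothetical validate_zone_params helper; the rule that at least one sequential zone must remain is an assumption of this sketch, not a documented null_blk limit:

```shell
#!/bin/sh
# Hypothetical helper: validate_zone_params <zone_size MB>
#     <zone_capacity MB, 0 = same as zone_size> <device size MB> <zone_nr_conv>
validate_zone_params() {
    local zs=$1 zcap=$2 size=$3 nr_conv=$4
    if [ "$zcap" -eq 0 ]; then
        zcap=$zs
    fi
    # a power of two has exactly one bit set, so zs & (zs - 1) is 0
    if [ "$zs" -le 0 ] || [ $(( zs & (zs - 1) )) -ne 0 ]; then
        echo "zone_size must be a power of two"; return 1
    fi
    if [ "$zcap" -gt "$zs" ]; then
        echo "zone_capacity must not exceed zone_size"; return 1
    fi
    # assumption of this sketch: keep at least one sequential zone
    if [ "$nr_conv" -ge $(( size / zs )) ]; then
        echo "zone_nr_conv leaves no sequential zone"; return 1
    fi
    echo ok
}
```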
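The configfs interface makes it possible to script the creation of emulated zoned block devices with different zone configurations. Create a script with the following content (saved as nullblk-zoned.sh in the example run below):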
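#!/bin/bash
# Create an emulated zoned block device through the null_blk configfs interface.
# Must be run as root.
if [ $# != 4 ]; then
    echo "Usage: $0 <sect size (B)> <zone size (MB)> <nr conv zones> <nr seq zones>"
    exit 1
fi
scriptdir=$(cd "$(dirname "$0")" && pwd)
# Load null_blk without creating any default device
modprobe null_blk nr_devices=0 || exit $?
function create_zoned_nullb()
{
    local nid=0
    local bs=$1
    local zs=$2
    local nr_conv=$3
    local nr_seq=$4
    # Device size in MB: one zone size per conventional and sequential zone
    cap=$(( zs * (nr_conv + nr_seq) ))
    # Find the first unused /dev/nullbN index
    while [ 1 ]; do
        if [ ! -b "/dev/nullb$nid" ]; then
            break
        fi
        nid=$(( nid + 1 ))
    done
    dev="/sys/kernel/config/nullb/nullb$nid"
    mkdir "$dev"
    echo $bs > "$dev"/blocksize          # logical block (sector) size in bytes
    echo 0 > "$dev"/completion_nsec
    echo 0 > "$dev"/irqmode              # complete I/Os without a simulated IRQ
    echo 2 > "$dev"/queue_mode           # multi-queue (blk-mq)
    echo 1024 > "$dev"/hw_queue_depth
    echo 1 > "$dev"/memory_backed        # keep written data in memory
    echo 1 > "$dev"/zoned
    echo $cap > "$dev"/size              # device size in MB
    echo $zs > "$dev"/zone_size          # zone size in MB
    echo $nr_conv > "$dev"/zone_nr_conv
    echo 1 > "$dev"/power                # instantiate /dev/nullbN
    # Zone write locking needs the mq-deadline scheduler
    echo mq-deadline > /sys/block/nullb$nid/queue/scheduler
    echo "$nid"
}
nulldev=$(create_zoned_nullb "$1" "$2" "$3" "$4")
echo "Created /dev/nullb$nulldev"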
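Run the script with four arguments:
- the sector size of the emulated device (bytes)
- the zone size of the emulated device (MiB)
- the number of conventional zones
- the number of sequential write required zones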
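The device geometry follows directly from the four arguments, which makes it easy to predict what zbd report should show. A sketch with a hypothetical nullb_expected_layout helper, assuming sizes in MiB as in the script:

```shell
#!/bin/sh
# Hypothetical helper: nullb_expected_layout <sector size (B)> <zone size (MiB)>
#                                            <nr conv zones> <nr seq zones>
nullb_expected_layout() {
    local nr_zones=$(( $3 + $4 ))
    local bytes=$(( $2 * nr_zones * 1024 * 1024 ))
    echo "zones=$nr_zones bytes=$bytes sectors512=$(( bytes / 512 )) blocks=$(( bytes / $1 ))"
}

nullb_expected_layout 4096 64 4 8
```

For the arguments 4096 64 4 8, this gives 12 zones and 805306368 bytes, i.e. 1572864 512-byte sectors or 196608 blocks of 4096 B.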
Example run:
# ./nullblk-zoned.sh 4096 64 4 8
Created /dev/nullb0
# zbd report -i /dev/nullb0
Device /dev/nullb0:
Vendor ID: Unknown
Zone model: host-managed
Capacity: 0.805 GB (1572864 512-bytes sectors)
Logical blocks: 196608 blocks of 4096 B
Physical blocks: 196608 blocks of 4096 B
Zones: 12 zones of 64.0 MB
Maximum number of open zones: no limit
Maximum number of active zones: no limit
Zone 00000: cnv, ofst 00000000000000, len 00000067108864, cap 00000067108864
Zone 00001: cnv, ofst 00000067108864, len 00000067108864, cap 00000067108864
Zone 00002: cnv, ofst 00000134217728, len 00000067108864, cap 00000067108864
Zone 00003: cnv, ofst 00000201326592, len 00000067108864, cap 00000067108864
Zone 00004: swr, ofst 00000268435456, len 00000067108864, cap 00000067108864, wp 00000268435456, em, non_seq 0, reset 0
Zone 00005: swr, ofst 00000335544320, len 00000067108864, cap 00000067108864, wp 00000335544320, em, non_seq 0, reset 0
Zone 00006: swr, ofst 00000402653184, len 00000067108864, cap 00000067108864, wp 00000402653184, em, non_seq 0, reset 0
Zone 00007: swr, ofst 00000469762048, len 00000067108864, cap 00000067108864, wp 00000469762048, em, non_seq 0, reset 0
Zone 00008: swr, ofst 00000536870912, len 00000067108864, cap 00000067108864, wp 00000536870912, em, non_seq 0, reset 0
Zone 00009: swr, ofst 00000603979776, len 00000067108864, cap 00000067108864, wp 00000603979776, em, non_seq 0, reset 0
Zone 00010: swr, ofst 00000671088640, len 00000067108864, cap 00000067108864, wp 00000671088640, em, non_seq 0, reset 0
Zone 00011: swr, ofst 00000738197504, len 00000067108864, cap 00000067108864, wp 00000738197504, em, non_seq 0, reset 0
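A device created through configfs must also be removed through configfs, for example with the following script (nullblk-del.sh):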
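#!/bin/bash
# Destroy an emulated zoned block device created through configfs.
# Must be run as root.
if [ $# != 1 ]; then
    echo "Usage: $0 <nullb ID>"
    exit 1
fi
nid=$1
if [ ! -b "/dev/nullb$nid" ]; then
    echo "/dev/nullb$nid: No such device"
    exit 1
fi
# Power the device off, then remove its configfs directory
echo 0 > /sys/kernel/config/nullb/nullb$nid/power
rmdir /sys/kernel/config/nullb/nullb$nid
echo "Destroyed /dev/nullb$nid"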
Example run:
# ./nullblk-del.sh 0
Destroyed /dev/nullb0
Zone operations
Once a zoned block device exists, its zones can be operated on. First, create a zoned block device:
# ./nullblk-zoned.sh 4096 32 4 12
Created /dev/nullb0
Then check its zones:
# zbd report -csv /dev/nullb0
zone num, type, ofst, len, cap, wp, cond, non_seq, reset
00000, 1, 00000000000000, 00000033554432, 00000033554432, 00000033554432, 0x0, 0, 0
00001, 1, 00000033554432, 00000033554432, 00000033554432, 00000067108864, 0x0, 0, 0
00002, 1, 00000067108864, 00000033554432, 00000033554432, 00000100663296, 0x0, 0, 0
00003, 1, 00000100663296, 00000033554432, 00000033554432, 00000134217728, 0x0, 0, 0
00004, 2, 00000134217728, 00000033554432, 00000033554432, 00000134217728, 0x1, 0, 0
00005, 2, 00000167772160, 00000033554432, 00000033554432, 00000167772160, 0x1, 0, 0
00006, 2, 00000201326592, 00000033554432, 00000033554432, 00000201326592, 0x1, 0, 0
00007, 2, 00000234881024, 00000033554432, 00000033554432, 00000234881024, 0x1, 0, 0
00008, 2, 00000268435456, 00000033554432, 00000033554432, 00000268435456, 0x1, 0, 0
00009, 2, 00000301989888, 00000033554432, 00000033554432, 00000301989888, 0x1, 0, 0
00010, 2, 00000335544320, 00000033554432, 00000033554432, 00000335544320, 0x1, 0, 0
00011, 2, 00000369098752, 00000033554432, 00000033554432, 00000369098752, 0x1, 0, 0
00012, 2, 00000402653184, 00000033554432, 00000033554432, 00000402653184, 0x1, 0, 0
00013, 2, 00000436207616, 00000033554432, 00000033554432, 00000436207616, 0x1, 0, 0
00014, 2, 00000469762048, 00000033554432, 00000033554432, 00000469762048, 0x1, 0, 0
00015, 2, 00000503316480, 00000033554432, 00000033554432, 00000503316480, 0x1, 0, 0
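In the CSV output, the type column uses the kernel zone type codes (1 conventional, 2 sequential write required, 3 sequential write preferred) and the cond column the zone condition codes from linux/blkzoned.h (0x0 not write pointer, 0x1 empty, 0x2 implicitly open, 0x3 explicitly open, 0x4 closed, 0xd read-only, 0xe full, 0xf offline). A sketch that counts zones per type and condition; summarize_zone_csv is a hypothetical helper that assumes the column layout shown above:

```shell
#!/bin/sh
# Hypothetical helper: read `zbd report -csv` output on stdin and print
# one "<type> / <condition>: <count>" line per combination.
summarize_zone_csv() {
    awk -F', *' '
        BEGIN {
            cond["0x0"] = "not write pointer"; cond["0x1"] = "empty"
            cond["0x2"] = "implicitly open";   cond["0x3"] = "explicitly open"
            cond["0x4"] = "closed";            cond["0xd"] = "read-only"
            cond["0xe"] = "full";              cond["0xf"] = "offline"
            tname["1"] = "conventional"
            tname["2"] = "seq-write-required"
            tname["3"] = "seq-write-preferred"
        }
        NR == 1 { next }    # skip the CSV header line
        {
            c = tolower($7)
            t = ($2 in tname) ? tname[$2] : "type " $2
            n[t " / " ((c in cond) ? cond[c] : c)]++
        }
        END { for (k in n) print k ": " n[k] }
    ' | sort
}
```

Usage: `zbd report -csv /dev/nullb0 | summarize_zone_csv`.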
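The zbd tool can perform zone management operations on a range of zones. In the following example, we explicitly open the first two sequential write zones (zones 4 and 5):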
# zbd open -ofst 134217728 -len 67108864 /dev/nullb0
# zbd report /dev/nullb0
Device /dev/nullb0:
Vendor ID: Unknown
Zone model: host-managed
Capacity: 0.537 GB (1048576 512-bytes sectors)
Logical blocks: 131072 blocks of 4096 B
Physical blocks: 131072 blocks of 4096 B
Zones: 16 zones of 32.0 MB
Maximum number of open zones: no limit
Maximum number of active zones: no limit
Zone 00000: cnv, ofst 00000000000000, len 00000033554432, cap 00000033554432
Zone 00001: cnv, ofst 00000033554432, len 00000033554432, cap 00000033554432
Zone 00002: cnv, ofst 00000067108864, len 00000033554432, cap 00000033554432
Zone 00003: cnv, ofst 00000100663296, len 00000033554432, cap 00000033554432
Zone 00004: swr, ofst 00000134217728, len 00000033554432, cap 00000033554432, wp 00000134217728, oe, non_seq 0, reset 0
Zone 00005: swr, ofst 00000167772160, len 00000033554432, cap 00000033554432, wp 00000167772160, oe, non_seq 0, reset 0
Zone 00006: swr, ofst 00000201326592, len 00000033554432, cap 00000033554432, wp 00000201326592, em, non_seq 0, reset 0
Zone 00007: swr, ofst 00000234881024, len 00000033554432, cap 00000033554432, wp 00000234881024, em, non_seq 0, reset 0
Zone 00008: swr, ofst 00000268435456, len 00000033554432, cap 00000033554432, wp 00000268435456, em, non_seq 0, reset 0
Zone 00009: swr, ofst 00000301989888, len 00000033554432, cap 00000033554432, wp 00000301989888, em, non_seq 0, reset 0
Zone 00010: swr, ofst 00000335544320, len 00000033554432, cap 00000033554432, wp 00000335544320, em, non_seq 0, reset 0
Zone 00011: swr, ofst 00000369098752, len 00000033554432, cap 00000033554432, wp 00000369098752, em, non_seq 0, reset 0
Zone 00012: swr, ofst 00000402653184, len 00000033554432, cap 00000033554432, wp 00000402653184, em, non_seq 0, reset 0
Zone 00013: swr, ofst 00000436207616, len 00000033554432, cap 00000033554432, wp 00000436207616, em, non_seq 0, reset 0
Zone 00014: swr, ofst 00000469762048, len 00000033554432, cap 00000033554432, wp 00000469762048, em, non_seq 0, reset 0
Zone 00015: swr, ofst 00000503316480, len 00000033554432, cap 00000033554432, wp 00000503316480, em, non_seq 0, reset 0
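The zone range is given in bytes: with 32 MiB (33554432-byte) zones, opening zones 4 and 5 means -ofst 4 × 33554432 = 134217728 and -len 2 × 33554432 = 67108864. A sketch of that computation with a hypothetical zone_range_args helper:

```shell
#!/bin/sh
# Hypothetical helper:
# zone_range_args <zone size (B)> <first zone index> <number of zones>
zone_range_args() {
    printf '%s\n' "-ofst $(( $2 * $1 )) -len $(( $3 * $1 ))"
}
```

Usage (word splitting of the result is intended): `zbd open $(zone_range_args 33554432 4 2) /dev/nullb0`.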
Use dd to write 32 MB to the first sequential zone so that it becomes full:
# dd if=/dev/zero of=/dev/nullb0 oflag=direct bs=1M count=32 seek=128
32+0 records in
32+0 records out
33554432 bytes (34 MB, 32 MiB) copied, 0.0266736 s, 1.3 GB/s
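With bs=1M, dd's seek and count are in MiB, so filling zone 4 of a device with 32 MiB zones needs seek = 4 × 32 = 128 and count = 32. A sketch with a hypothetical dd_fill_zone_args helper; it assumes the zone size is a whole number of MiB and that the zone is empty, since writes to a sequential zone must start at its write pointer:

```shell
#!/bin/sh
# Hypothetical helper: dd_fill_zone_args <zone size (MiB)> <zone index>
# With bs=1M, dd's count and seek are expressed in MiB.
dd_fill_zone_args() {
    echo "bs=1M count=$1 seek=$(( $2 * $1 ))"
}
```

Usage: `dd if=/dev/zero of=/dev/nullb0 oflag=direct $(dd_fill_zone_args 32 4)`.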
# zbd report -i /dev/nullb0
Device /dev/nullb0:
Vendor ID: Unknown
Zone model: host-managed
Capacity: 0.537 GB (1048576 512-bytes sectors)
Logical blocks: 131072 blocks of 4096 B
Physical blocks: 131072 blocks of 4096 B
Zones: 16 zones of 32.0 MB
Maximum number of open zones: no limit
Maximum number of active zones: no limit
Zone 00000: cnv, ofst 00000000000000, len 00000033554432, cap 00000033554432
Zone 00001: cnv, ofst 00000033554432, len 00000033554432, cap 00000033554432
Zone 00002: cnv, ofst 00000067108864, len 00000033554432, cap 00000033554432
Zone 00003: cnv, ofst 00000100663296, len 00000033554432, cap 00000033554432
Zone 00004: swr, ofst 00000134217728, len 00000033554432, cap 00000033554432, wp 00000167772160, fu, non_seq 0, reset 0
Zone 00005: swr, ofst 00000167772160, len 00000033554432, cap 00000033554432, wp 00000167772160, em, non_seq 0, reset 0
Zone 00006: swr, ofst 00000201326592, len 00000033554432, cap 00000033554432, wp 00000201326592, em, non_seq 0, reset 0
Zone 00007: swr, ofst 00000234881024, len 00000033554432, cap 00000033554432, wp 00000234881024, em, non_seq 0, reset 0
Zone 00008: swr, ofst 00000268435456, len 00000033554432, cap 00000033554432, wp 00000268435456, em, non_seq 0, reset 0
Zone 00009: swr, ofst 00000301989888, len 00000033554432, cap 00000033554432, wp 00000301989888, em, non_seq 0, reset 0
Zone 00010: swr, ofst 00000335544320, len 00000033554432, cap 00000033554432, wp 00000335544320, em, non_seq 0, reset 0
Zone 00011: swr, ofst 00000369098752, len 00000033554432, cap 00000033554432, wp 00000369098752, em, non_seq 0, reset 0
Zone 00012: swr, ofst 00000402653184, len 00000033554432, cap 00000033554432, wp 00000402653184, em, non_seq 0, reset 0
Zone 00013: swr, ofst 00000436207616, len 00000033554432, cap 00000033554432, wp 00000436207616, em, non_seq 0, reset 0
Zone 00014: swr, ofst 00000469762048, len 00000033554432, cap 00000033554432, wp 00000469762048, em, non_seq 0, reset 0
Zone 00015: swr, ofst 00000503316480, len 00000033554432, cap 00000033554432, wp 00000503316480, em, non_seq 0, reset 0
The first sequential zone (zone 00004) is now full: its condition is fu and its write pointer is at the end of the zone.
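Graphical interface
The gzbd graphical application can also change zone states, and gzbd-viewer can display them. After explicitly opening the first two sequential zones, the two tools display the zones as follows.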