Emulating a Zoned Block Device with null_blk

This article summarizes Western Digital's introduction to zoned storage and uses an emulator for a first-pass emulation of a ZNS SSD. Original site: https://zonedstorage.io/

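System requirements

  • For the system's Linux kernel to support the ZBD interface, two conditions must be met:
  1. The kernel version is 4.10.0 or later
  2. The kernel configuration option CONFIG_BLK_DEV_ZONED is enabled

Kernel version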
# uname -r

This command prints the version of the running kernel.

Zoned block device support
# cat /boot/config-`uname -r` | grep CONFIG_BLK_DEV_ZONED
# cat /lib/modules/`uname -r`/config | grep CONFIG_BLK_DEV_ZONED

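Either of the two commands above can be used to test whether the kernel supports zoned block devices. If the output is CONFIG_BLK_DEV_ZONED=y, support is enabled. Any other output (typically the line "# CONFIG_BLK_DEV_ZONED is not set") means support is disabled.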

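  • If your kernel exports its configuration through the proc file system, use either of the following command sequences to get the state of CONFIG_BLK_DEV_ZONED: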
# modprobe configs
# cat /proc/config.gz | gunzip | grep CONFIG_BLK_DEV_ZONED
# modprobe configs
# zcat /proc/config.gz | grep CONFIG_BLK_DEV_ZONED
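
The checks above can be folded into one small helper. This is only a sketch: the function names zoned_support and find_kernel_config are made up for illustration, and the helper assumes the config text uses the usual "=y" / "is not set" forms.

```shell
#!/bin/sh
# Classify the CONFIG_BLK_DEV_ZONED entry of a kernel config read from stdin.
# Prints "enabled", "disabled", or "unknown" (option not mentioned at all).
zoned_support() {
    awk '
        /^CONFIG_BLK_DEV_ZONED=y$/            { state = "enabled" }
        /^CONFIG_BLK_DEV_ZONED=n$/            { state = "disabled" }
        /^# CONFIG_BLK_DEV_ZONED is not set$/ { state = "disabled" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Print the kernel config from the first readable source listed in the text.
find_kernel_config() {
    for f in "/boot/config-$(uname -r)" "/lib/modules/$(uname -r)/config"; do
        if [ -r "$f" ]; then
            cat "$f"
            return 0
        fi
    done
    if [ -r /proc/config.gz ]; then
        gunzip -c /proc/config.gz
        return 0
    fi
    return 1
}

# Usage (hypothetical): find_kernel_config | zoned_support
```

Reading the config from stdin keeps the classification step independent of where the config file happens to live on a given distribution.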
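
Write command ordering

To guarantee that the order in which the Linux kernel issues write commands matches the sequential write order a zone requires, every kernel that supports zoned block devices implements a zone write lock mechanism, which serializes write operations to sequential zones.
For kernel versions 4.10 through 4.15, no special configuration is needed. In kernel 4.16, the zone write locking implementation was moved into the deadline and mq-deadline block I/O schedulers, so one of these schedulers must be used with zoned block devices.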
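
As a rough illustration of the version rule just described, the sketch below maps a kernel release string to what the text says is required. The function name write_ordering_requirement is hypothetical, and the logic encodes only the version ranges given here; newer kernels may behave differently.

```shell
#!/bin/sh
# write_ordering_requirement <kernel release, e.g. "5.13.0-35-generic">
# Prints:
#   unsupported  - older than 4.10, no zoned block device support
#   none-needed  - 4.10 through 4.15, no special configuration
#   mq-deadline  - 4.16 or later, the deadline/mq-deadline scheduler is required
write_ordering_requirement() {
    major=${1%%.*}              # text before the first dot
    rest=${1#*.}                # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    if [ "$major" -lt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -lt 10 ]; }; then
        echo "unsupported"
    elif [ "$major" -eq 4 ] && [ "$minor" -lt 16 ]; then
        echo "none-needed"
    else
        echo "mq-deadline"
    fi
}

# Usage (hypothetical): write_ordering_requirement "$(uname -r)"
```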

# cat /sys/block/sda/queue/scheduler
[none] mq-deadline kyber bfq

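This shows the block I/O scheduler of the zoned disk; the active scheduler is the one in square brackets.
If, as in the example above, the block I/O scheduler is not mq-deadline, switch to it with the following commands: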

# echo mq-deadline > /sys/block/sda/queue/scheduler
# cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
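
To make the bracket convention concrete, here is a minimal sketch that extracts the active scheduler from such a line. The function name active_scheduler is an assumption made for this example.

```shell
#!/bin/sh
# active_scheduler <contents of /sys/block/<dev>/queue/scheduler>
# Prints the scheduler name enclosed in square brackets.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage (hypothetical):
#   if [ "$(active_scheduler "$(cat /sys/block/sda/queue/scheduler)")" != "mq-deadline" ]; then
#       echo mq-deadline > /sys/block/sda/queue/scheduler
#   fi
```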

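libzbd user library

libzbd is a user-space library that provides functions for manipulating zoned block devices. It can only access zoned block devices supported by the running kernel, which includes physical devices (such as hard disks supporting the ZBC and ZAC standards) and all logical block devices implemented by various device drivers (such as the null_blk and device mapper drivers). The project is hosted on GitHub.

Requirements
  • Building libzbd requires the following packages: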
# apt-get install autoconf
# apt-get install autoconf-archive
# apt-get install automake
# apt-get install libtool
# apt-get install m4

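Building the gzbd and gzbd-viewer graphical applications additionally requires the GTK3 and GTK3 development header packages.

  • The build must also be done on a system where blkzoned.h is installed; this header file is located under /usr/include/linux.

Build

In the downloaded source directory, run the following commands: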

$ sh ./autogen.sh
$ ./configure
$ make

Install

$ sudo make install

By default, the library files are installed under /usr/lib (or /usr/lib64) and the header files under /usr/include/libzbd.

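Library functions

All libzbd functions express zone-related information in bytes. Read accesses must be aligned to the device logical block size. On host-managed zoned block devices, write operations to sequential zones must be aligned to the device physical block size. The library functions are declared in /usr/include/libzbd/zbd.h.
libzbd implements no mutual exclusion for multi-threaded use; if the application does not serialize its sequential writes itself, write errors will result.
The man zbd command shows the manual page.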
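
The alignment rule can be checked with plain shell arithmetic before issuing an I/O. The helpers below are an illustrative sketch and not part of libzbd; the names is_aligned and align_down are made up for this example.

```shell
#!/bin/sh
# is_aligned <offset (bytes)> <length (bytes)> <block size (bytes)>
# Succeeds only if both the offset and the length are multiples of the block size.
is_aligned() {
    [ $(( $1 % $3 )) -eq 0 ] && [ $(( $2 % $3 )) -eq 0 ]
}

# align_down <value (bytes)> <block size (bytes)>
# Prints the largest multiple of the block size that is not above the value.
align_down() {
    echo $(( $1 - $1 % $2 ))
}

# Usage (hypothetical): a 4096-byte write at a zone start on a 4096 B block device
#   is_aligned 134217728 4096 4096 && echo "aligned"
```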

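Emulating zoned block devices with null_blk

The null_blk driver is a powerful tool that can emulate many types of block devices.

Creating a zoned null block device: the simplest case

As shown below, the simplest way to create a zoned block device emulated by null_blk is to pass zoned=1 as a parameter after the modprobe null_blk command.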

# modprobe null_blk nr_devices=1 zoned=1

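This creates the simplest possible host-managed zoned block device, with a zone size of 256 MiB and a total capacity of 250 GiB (1000 zones). This simple command does not create any conventional zones.
lsblk shows brief information about the zoned block device: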

$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0    40G  0 disk 
└─sda1   8:1    0    40G  0 part /
sr0     11:0    1  1024M  0 rom  
nullb0 252:0    0   250G  0 disk 

The zbd tool from libzbd, installed in the previous section, shows the device in detail:

$ zbd report -i /dev/nullb0
Device /dev/nullb0:
    Vendor ID: Unknown
    Zone model: host-managed
    Capacity: 268.435 GB (524288000 512-bytes sectors)
    Logical blocks: 524288000 blocks of 512 B
    Physical blocks: 524288000 blocks of 512 B
    Zones: 1000 zones of 256.0 MB
    Maximum number of open zones: no limit
    Maximum number of active zones: no limit
Zone 00000: swr, ofst 00000000000000, len 00000268435456, cap 00000268435456, wp 00000000000000, em, non_seq 0, reset 0
Zone 00001: swr, ofst 00000268435456, len 00000268435456, cap 00000268435456, wp 00000268435456, em, non_seq 0, reset 0
Zone 00002: swr, ofst 00000536870912, len 00000268435456, cap 00000268435456, wp 00000536870912, em, non_seq 0, reset 0
Zone 00003: swr, ofst 00000805306368, len 00000268435456, cap 00000268435456, wp 00000805306368, em, non_seq 0, reset 0
Zone 00004: swr, ofst 00001073741824, len 00000268435456, cap 00000268435456, wp 00001073741824, em, non_seq 0, reset 0
...
Zone 00998: swr, ofst 00267898585088, len 00000268435456, cap 00000268435456, wp 00267898585088, em, non_seq 0, reset 0
Zone 00999: swr, ofst 00268167020544, len 00000268435456, cap 00000268435456, wp 00268167020544, em, non_seq 0, reset 0
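
The figures reported by lsblk (250G) and zbd (268.435 GB, 524288000 sectors) describe the same capacity in different units. A small sketch of the arithmetic, using a hypothetical helper named capacity_bytes:

```shell
#!/bin/sh
# capacity_bytes <zone size (MiB)> <number of zones>
capacity_bytes() {
    echo $(( $1 * 1024 * 1024 * $2 ))
}

bytes=$(capacity_bytes 256 1000)
echo "bytes:            $bytes"
echo "512-byte sectors: $(( bytes / 512 ))"
echo "GiB (2^30 bytes): $(( bytes / 1073741824 ))"
echo "MB  (10^6 bytes): $(( bytes / 1000000 ))"
```

For the default device this gives 268435456000 bytes, that is 524288000 sectors, 250 GiB, or 268435 MB (the 268.435 GB shown by zbd).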

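An emulated device created with modprobe (and not through configfs) can be removed with:

# rmmod null_blk

Creating a zoned null block device: a more advanced case

To pass more parameters through modprobe when creating the zoned block device, use a command such as: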

# modprobe null_blk nr_devices=1 \
    zoned=1 \
    zone_nr_conv=4 \
    zone_size=64

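nr_devices=1 creates a single device; zoned=1 makes every created device a zoned device; zone_nr_conv=4 sets the number of conventional zones to 4; zone_size=64 makes each zone 64 MB.
The configfs interface offers a more powerful way to create emulated zoned block devices. The parameters that can be set through configfs are listed as follows: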

# cat /sys/kernel/config/nullb/features
memory_backed,discard,bandwidth,cache,badblocks,zoned,zone_size,zone_capacity,zone_nr_conv,zone_max_open,zone_max_active,blocksize,max_sectors,virt_boundary
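  • Note that the parameters available differ between kernel versions (I am using a 5.13 kernel here):

    Kernel    Feature
    4.10.0    zoned
    4.10.0    chunk_sectors
    4.20.0    nr_zones
    5.8.0     zone_append_max_bytes
    5.9.0     max_open_zones
    5.9.0     max_active_zones

Listing null_blk parameters

The modinfo command lists the zone-related parameters. These can be changed through configfs after the null_blk module has been loaded.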

# modinfo null_blk
filename:       /lib/modules/5.13.0-35-generic/kernel/drivers/block/null_blk/null_blk.ko
...
parm:           zoned:Make device as a host-managed zoned block device. Default: false (bool)
parm:           zone_size:Zone size in MB when block device is zoned. Must be power-of-two: Default: 256 (ulong)
parm:           zone_capacity:Zone capacity in MB when block device is zoned. Can be less than or equal to zone size. Default: Zone size (ulong)
parm:           zone_nr_conv:Number of conventional zones when block device is zoned. Default: 0 (uint)
parm:           zone_max_open:Maximum number of open zones when block device is zoned. Default: 0 (no limit) (uint)
parm:           zone_max_active:Maximum number of active zones when block device is zoned. Default: 0 (no limit) (uint)
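
The modinfo text states two constraints: zone_size must be a power of two, and zone_capacity may not exceed zone_size. A minimal sketch of a pre-check for those two rules follows; the function names is_pow2 and check_zone_params are illustrative and not part of any tool.

```shell
#!/bin/sh
# is_pow2 <n>: succeeds if n is a positive power of two.
is_pow2() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# check_zone_params <zone_size (MB)> <zone_capacity (MB)>
# Prints "ok", or a short reason and returns 1.
check_zone_params() {
    is_pow2 "$1" || { echo "zone_size $1 is not a power of two"; return 1; }
    [ "$2" -le "$1" ] || { echo "zone_capacity $2 exceeds zone_size $1"; return 1; }
    echo "ok"
}

# Usage (hypothetical): check_zone_params 64 48
```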

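The configfs interface makes it possible to script the creation of emulated zoned block devices with different zone configurations.
Create a script with the following contents: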

#!/bin/bash

if [ $# != 4 ]; then
        echo "Usage: $0 <sect size (B)> <zone size (MB)> <nr conv zones> <nr seq zones>"
        exit 1
fi

scriptdir=$(cd $(dirname "$0") && pwd)

modprobe null_blk nr_devices=0 || exit $?

function create_zoned_nullb()
{
        local nid=0
        local bs=$1
        local zs=$2
        local nr_conv=$3
        local nr_seq=$4

        cap=$(( zs * (nr_conv + nr_seq) ))

        while [ 1 ]; do
                if [ ! -b "/dev/nullb$nid" ]; then
                        break
                fi
                nid=$(( nid + 1 ))
        done

        dev="/sys/kernel/config/nullb/nullb$nid"
        mkdir "$dev"

        echo $bs > "$dev"/blocksize
        echo 0 > "$dev"/completion_nsec
        echo 0 > "$dev"/irqmode
        echo 2 > "$dev"/queue_mode
        echo 1024 > "$dev"/hw_queue_depth
        echo 1 > "$dev"/memory_backed
        echo 1 > "$dev"/zoned

        echo $cap > "$dev"/size
        echo $zs > "$dev"/zone_size
        echo $nr_conv > "$dev"/zone_nr_conv

        echo 1 > "$dev"/power

        echo mq-deadline > /sys/block/nullb$nid/queue/scheduler

        echo "$nid"
}

nulldev=$(create_zoned_nullb $1 $2 $3 $4)
echo "Created /dev/nullb$nulldev"

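Run the script. Its four arguments are:

  1. The sector size of the emulated device (bytes)
  2. The zone size of the emulated device (MiB)
  3. The number of conventional zones
  4. The number of sequential-write-required zones

The result is as follows: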
# ./nullblk-zoned.sh 4096 64 4 8 
Created /dev/nullb0
# zbd report -i /dev/nullb0
Device /dev/nullb0:
    Vendor ID: Unknown
    Zone model: host-managed
    Capacity: 0.805 GB (1572864 512-bytes sectors)
    Logical blocks: 196608 blocks of 4096 B
    Physical blocks: 196608 blocks of 4096 B
    Zones: 12 zones of 64.0 MB
    Maximum number of open zones: no limit
    Maximum number of active zones: no limit
Zone 00000: cnv, ofst 00000000000000, len 00000067108864, cap 00000067108864
Zone 00001: cnv, ofst 00000067108864, len 00000067108864, cap 00000067108864
Zone 00002: cnv, ofst 00000134217728, len 00000067108864, cap 00000067108864
Zone 00003: cnv, ofst 00000201326592, len 00000067108864, cap 00000067108864
Zone 00004: swr, ofst 00000268435456, len 00000067108864, cap 00000067108864, wp 00000268435456, em, non_seq 0, reset 0
Zone 00005: swr, ofst 00000335544320, len 00000067108864, cap 00000067108864, wp 00000335544320, em, non_seq 0, reset 0
Zone 00006: swr, ofst 00000402653184, len 00000067108864, cap 00000067108864, wp 00000402653184, em, non_seq 0, reset 0
Zone 00007: swr, ofst 00000469762048, len 00000067108864, cap 00000067108864, wp 00000469762048, em, non_seq 0, reset 0
Zone 00008: swr, ofst 00000536870912, len 00000067108864, cap 00000067108864, wp 00000536870912, em, non_seq 0, reset 0
Zone 00009: swr, ofst 00000603979776, len 00000067108864, cap 00000067108864, wp 00000603979776, em, non_seq 0, reset 0
Zone 00010: swr, ofst 00000671088640, len 00000067108864, cap 00000067108864, wp 00000671088640, em, non_seq 0, reset 0
Zone 00011: swr, ofst 00000738197504, len 00000067108864, cap 00000067108864, wp 00000738197504, em, non_seq 0, reset 0
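
The zone types and offsets in this report follow directly from the script arguments: conventional zones come first, and zone i starts at i times the zone size. A short sketch that predicts the layout, with a hypothetical function name zone_layout:

```shell
#!/bin/sh
# zone_layout <zone size (MiB)> <nr conv zones> <nr seq zones>
# Prints one line per zone: "<index> <type> <start offset in bytes>".
zone_layout() {
    zs_bytes=$(( $1 * 1024 * 1024 ))
    total=$(( $2 + $3 ))
    i=0
    while [ "$i" -lt "$total" ]; do
        # Conventional zones occupy the lowest indexes.
        if [ "$i" -lt "$2" ]; then ztype=cnv; else ztype=swr; fi
        echo "$i $ztype $(( i * zs_bytes ))"
        i=$(( i + 1 ))
    done
}

# Same geometry as the device created above: 64 MiB zones, 4 cnv, 8 swr.
zone_layout 64 4 8
```

The printed offsets can be compared with the ofst column of zbd report to confirm the device was created as intended.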

A zoned block device created by the script must also be removed with a script:

#!/bin/bash

if [ $# != 1 ]; then
    echo "Usage: $0 <nullb ID>"
    exit 1
fi

nid=$1

if [ ! -b "/dev/nullb$nid" ]; then
    echo "/dev/nullb$nid: No such device"
    exit 1
fi

echo 0 > /sys/kernel/config/nullb/nullb$nid/power
rmdir /sys/kernel/config/nullb/nullb$nid

echo "Destroyed /dev/nullb$nid"

The result is as follows:

# ./nullblk-del.sh 0
Destroyed /dev/nullb0

Zone operations

Once a zoned block device exists, we can operate on its zones. First, create a zoned block device:

# ./nullblk-zoned.sh 4096 32 4 12
Created /dev/nullb0

Inspect its attributes:

# zbd report -csv /dev/nullb0
zone num, type, ofst, len, cap, wp, cond, non_seq, reset
00000, 1, 00000000000000, 00000033554432, 00000033554432, 00000033554432, 0x0, 0, 0
00001, 1, 00000033554432, 00000033554432, 00000033554432, 00000067108864, 0x0, 0, 0
00002, 1, 00000067108864, 00000033554432, 00000033554432, 00000100663296, 0x0, 0, 0
00003, 1, 00000100663296, 00000033554432, 00000033554432, 00000134217728, 0x0, 0, 0
00004, 2, 00000134217728, 00000033554432, 00000033554432, 00000134217728, 0x1, 0, 0
00005, 2, 00000167772160, 00000033554432, 00000033554432, 00000167772160, 0x1, 0, 0
00006, 2, 00000201326592, 00000033554432, 00000033554432, 00000201326592, 0x1, 0, 0
00007, 2, 00000234881024, 00000033554432, 00000033554432, 00000234881024, 0x1, 0, 0
00008, 2, 00000268435456, 00000033554432, 00000033554432, 00000268435456, 0x1, 0, 0
00009, 2, 00000301989888, 00000033554432, 00000033554432, 00000301989888, 0x1, 0, 0
00010, 2, 00000335544320, 00000033554432, 00000033554432, 00000335544320, 0x1, 0, 0
00011, 2, 00000369098752, 00000033554432, 00000033554432, 00000369098752, 0x1, 0, 0
00012, 2, 00000402653184, 00000033554432, 00000033554432, 00000402653184, 0x1, 0, 0
00013, 2, 00000436207616, 00000033554432, 00000033554432, 00000436207616, 0x1, 0, 0
00014, 2, 00000469762048, 00000033554432, 00000033554432, 00000469762048, 0x1, 0, 0
00015, 2, 00000503316480, 00000033554432, 00000033554432, 00000503316480, 0x1, 0, 0
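
The CSV form is convenient for scripting. As a sketch, the function below counts zones per type from a report piped on stdin. It assumes the column order shown above and that type 1 is conventional and type 2 is sequential-write-required, which matches this report; the function name count_zone_types is made up for this example.

```shell
#!/bin/sh
# count_zone_types: read a "zbd report -csv" listing on stdin (header included)
# and print how many zones are conventional (type 1) and sequential (type 2).
count_zone_types() {
    awk -F', *' '
        NR > 1 { n[$2]++ }    # skip the header line, tally the type column
        END { printf "conventional=%d sequential=%d\n", n[1], n[2] }
    '
}

# Usage (hypothetical): zbd report -csv /dev/nullb0 | count_zone_types
```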

The zbd tool can run zone management operations over a range of zones. In the example below, we explicitly open the first two sequential-write zones:

# zbd open -ofst 134217728 -len 67108864 /dev/nullb0 
# zbd report /dev/nullb0
Device /dev/nullb0:
    Vendor ID: Unknown
    Zone model: host-managed
    Capacity: 0.537 GB (1048576 512-bytes sectors)
    Logical blocks: 131072 blocks of 4096 B
    Physical blocks: 131072 blocks of 4096 B
    Zones: 16 zones of 32.0 MB
    Maximum number of open zones: no limit
    Maximum number of active zones: no limit
Zone 00000: cnv, ofst 00000000000000, len 00000033554432, cap 00000033554432
Zone 00001: cnv, ofst 00000033554432, len 00000033554432, cap 00000033554432
Zone 00002: cnv, ofst 00000067108864, len 00000033554432, cap 00000033554432
Zone 00003: cnv, ofst 00000100663296, len 00000033554432, cap 00000033554432
Zone 00004: swr, ofst 00000134217728, len 00000033554432, cap 00000033554432, wp 00000134217728, oe, non_seq 0, reset 0
Zone 00005: swr, ofst 00000167772160, len 00000033554432, cap 00000033554432, wp 00000167772160, oe, non_seq 0, reset 0
Zone 00006: swr, ofst 00000201326592, len 00000033554432, cap 00000033554432, wp 00000201326592, em, non_seq 0, reset 0
Zone 00007: swr, ofst 00000234881024, len 00000033554432, cap 00000033554432, wp 00000234881024, em, non_seq 0, reset 0
Zone 00008: swr, ofst 00000268435456, len 00000033554432, cap 00000033554432, wp 00000268435456, em, non_seq 0, reset 0
Zone 00009: swr, ofst 00000301989888, len 00000033554432, cap 00000033554432, wp 00000301989888, em, non_seq 0, reset 0
Zone 00010: swr, ofst 00000335544320, len 00000033554432, cap 00000033554432, wp 00000335544320, em, non_seq 0, reset 0
Zone 00011: swr, ofst 00000369098752, len 00000033554432, cap 00000033554432, wp 00000369098752, em, non_seq 0, reset 0
Zone 00012: swr, ofst 00000402653184, len 00000033554432, cap 00000033554432, wp 00000402653184, em, non_seq 0, reset 0
Zone 00013: swr, ofst 00000436207616, len 00000033554432, cap 00000033554432, wp 00000436207616, em, non_seq 0, reset 0
Zone 00014: swr, ofst 00000469762048, len 00000033554432, cap 00000033554432, wp 00000469762048, em, non_seq 0, reset 0
Zone 00015: swr, ofst 00000503316480, len 00000033554432, cap 00000033554432, wp 00000503316480, em, non_seq 0, reset 0
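
The -ofst and -len values passed to zbd open are byte counts derived from zone indexes. A minimal sketch of that conversion, using a hypothetical helper named zone_range_args:

```shell
#!/bin/sh
# zone_range_args <first zone index> <number of zones> <zone size (MiB)>
# Prints the -ofst/-len options covering that range, in bytes.
zone_range_args() {
    zs_bytes=$(( $3 * 1024 * 1024 ))
    echo "-ofst $(( $1 * zs_bytes )) -len $(( $2 * zs_bytes ))"
}

# Zones 4 and 5 of a device with 32 MiB zones, as in the example above.
zone_range_args 4 2 32
```

With 4 conventional zones in front, the first sequential zone has index 4, which gives the offset 134217728 and the length 67108864 used in the command above.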

Use dd to write 32 MiB to the first sequential-write zone so that it enters the "full" state:

# dd if=/dev/zero of=/dev/nullb0 oflag=direct bs=1M count=32 seek=128
32+0 records in
32+0 records out
33554432 bytes (34 MB, 32 MiB) copied, 0.0266736 s, 1.3 GB/s

# zbd report -i /dev/nullb0
Device /dev/nullb0:
    Vendor ID: Unknown
    Zone model: host-managed
    Capacity: 0.537 GB (1048576 512-bytes sectors)
    Logical blocks: 131072 blocks of 4096 B
    Physical blocks: 131072 blocks of 4096 B
    Zones: 16 zones of 32.0 MB
    Maximum number of open zones: no limit
    Maximum number of active zones: no limit
Zone 00000: cnv, ofst 00000000000000, len 00000033554432, cap 00000033554432
Zone 00001: cnv, ofst 00000033554432, len 00000033554432, cap 00000033554432
Zone 00002: cnv, ofst 00000067108864, len 00000033554432, cap 00000033554432
Zone 00003: cnv, ofst 00000100663296, len 00000033554432, cap 00000033554432
Zone 00004: swr, ofst 00000134217728, len 00000033554432, cap 00000033554432, wp 00000167772160, fu, non_seq 0, reset 0
Zone 00005: swr, ofst 00000167772160, len 00000033554432, cap 00000033554432, wp 00000167772160, em, non_seq 0, reset 0
Zone 00006: swr, ofst 00000201326592, len 00000033554432, cap 00000033554432, wp 00000201326592, em, non_seq 0, reset 0
Zone 00007: swr, ofst 00000234881024, len 00000033554432, cap 00000033554432, wp 00000234881024, em, non_seq 0, reset 0
Zone 00008: swr, ofst 00000268435456, len 00000033554432, cap 00000033554432, wp 00000268435456, em, non_seq 0, reset 0
Zone 00009: swr, ofst 00000301989888, len 00000033554432, cap 00000033554432, wp 00000301989888, em, non_seq 0, reset 0
Zone 00010: swr, ofst 00000335544320, len 00000033554432, cap 00000033554432, wp 00000335544320, em, non_seq 0, reset 0
Zone 00011: swr, ofst 00000369098752, len 00000033554432, cap 00000033554432, wp 00000369098752, em, non_seq 0, reset 0
Zone 00012: swr, ofst 00000402653184, len 00000033554432, cap 00000033554432, wp 00000402653184, em, non_seq 0, reset 0
Zone 00013: swr, ofst 00000436207616, len 00000033554432, cap 00000033554432, wp 00000436207616, em, non_seq 0, reset 0
Zone 00014: swr, ofst 00000469762048, len 00000033554432, cap 00000033554432, wp 00000469762048, em, non_seq 0, reset 0
Zone 00015: swr, ofst 00000503316480, len 00000033554432, cap 00000033554432, wp 00000503316480, em, non_seq 0, reset 0

The first sequential-write zone is now in the "full" state.
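
The seek=128 in the dd command is the zone start expressed in dd blocks: zone 4 times 32 MiB per zone, divided by the 1 MiB block size. A sketch of that calculation, with a hypothetical helper named dd_zone_args and the assumption that the zone size is a multiple of the dd block size:

```shell
#!/bin/sh
# dd_zone_args <zone index> <zone size (MiB)> <dd block size (MiB)>
# Prints the seek= and count= operands that fill exactly that zone.
dd_zone_args() {
    echo "seek=$(( $1 * $2 / $3 )) count=$(( $2 / $3 ))"
}

# Zone 4 of a device with 32 MiB zones, written in 1 MiB blocks.
dd_zone_args 4 32 1
```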

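Graphical interface

The gzbd graphical interface can also be used to manipulate zone state, and gzbd-viewer can be used to view it. After explicitly opening the first two sequential-write zones, the two tools display the following.
[Screenshots: gzbd and gzbd-viewer]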
