Xen Block Device Architecture (6)

blktap (continued)


blktap_device

The blktap_device structure is very simple:

struct blktap_device {
    spinlock_t                     lock;
    struct gendisk                *gd;
};

Here struct gendisk is the generic disk structure that the kernel's block device structure, block_device, refers to.


blktap_device_open

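The function obtains the disk structure, struct gendisk, from the kernel's generic block_device -> bd_disk, and then obtains the blktap_device from gendisk->private_data.

The block devices under /dev/xen/blktap-2/tapdevXXX are the ones backed by the blktap_device structure.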
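
The pointer chain followed by the open path can be sketched in userspace. This is only a minimal model: the `*_model` types and `tapdev_from_bdev` are hypothetical stand-ins that keep just the fields named above, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the fields used here. */
struct blktap_device_model {
    int placeholder;                /* the real struct holds lock and gd */
};

struct gendisk_model {
    void *private_data;             /* set by the driver when the disk is created */
};

struct block_device_model {
    struct gendisk_model *bd_disk;  /* generic disk behind this block device */
};

/* Follow block_device -> bd_disk -> private_data, as the open path does. */
struct blktap_device_model *tapdev_from_bdev(struct block_device_model *bdev)
{
    assert(bdev != NULL);
    if (bdev->bd_disk == NULL)
        return NULL;                /* no disk attached */
    return bdev->bd_disk->private_data;
}
```

The same chain works in the other direction at creation time, when the driver stores its blktap_device pointer in gd->private_data.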


blktap_device_release

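Starting from the gendisk passed in, the function obtains the blktap_device, block_device, and blktap structures. At the end of the release path it sets the BLKTAP_DEVICE_CLOSED bit in the blktap structure's dev_inuse and calls blktap_ring_kick_user, which wakes up the poll_wait wait queue of the blktap->ring device.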


blktap_device_getgeo

Returns a struct hd_geometry, which holds the block device's head, cylinder, and sector information.


blktap_device_create

On the blktap ring device blktapXXX, an ioctl whose cmd is BLKTAP2_IOCTL_CREATE_DEVICE calls blktap_device_create to create the tapdevXXX device.


if (test_bit(BLKTAP_DEVICE, &tap->dev_inuse))
    return -EEXIST;

if (blktap_device_validate_params(tap, params))
    return -EINVAL;

gd = alloc_disk(1);
if (!gd) {
    err = -ENOMEM;
    goto fail;
}

if (minor < 26) {
    sprintf(gd->disk_name, "td%c", 'a' + minor % 26);
} else if (minor < (26 + 1) * 26) {
    sprintf(gd->disk_name, "td%c%c",
            'a' + minor / 26 - 1,'a' + minor % 26);
} else {
    const unsigned int m1 = (minor / 26 - 1) / 26 - 1;
    const unsigned int m2 = (minor / 26 - 1) % 26;
    const unsigned int m3 =  minor % 26;
    sprintf(gd->disk_name, "td%c%c%c",
            'a' + m1, 'a' + m2, 'a' + m3);
}

gd->major = blktap_device_major;
gd->first_minor = minor;
gd->fops = &blktap_device_file_operations;
gd->private_data = tapdev;

spin_lock_init(&tapdev->lock);
rq = blk_init_queue(blktap_device_do_request, &tapdev->lock);
if (!rq) {
    err = -ENOMEM;
    goto fail;
}
elevator_init(rq, "noop");

gd->queue     = rq;
rq->queuedata = tapdev;
tapdev->gd    = gd;

blktap_device_configure(tap, params);
add_disk(gd);

if (params->name[0])
    strncpy(tap->name, params->name, sizeof(tap->name)-1);

set_bit(BLKTAP_DEVICE, &tap->dev_inuse);

dev_info(disk_to_dev(gd), "sector-size: %u capacity: %llu\n",
         queue_logical_block_size(rq),
         (unsigned long long)get_capacity(gd));

return 0;
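
The naming branch in the excerpt above maps a minor number to "td" plus a one-, two-, or three-letter suffix. The userspace sketch below reuses the same arithmetic so the mapping can be tried out directly; the function name `tap_disk_name` and the use of snprintf into a caller-supplied buffer are choices made for this illustration, not part of the driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "td" + letter suffix for a minor number, mirroring the excerpt. */
void tap_disk_name(unsigned int minor, char *buf, size_t len)
{
    assert(buf != NULL && len >= 6);    /* "td" + 3 letters + NUL */

    if (minor < 26) {
        /* minors 0..25 -> tda..tdz */
        snprintf(buf, len, "td%c", (char)('a' + minor % 26));
    } else if (minor < (26 + 1) * 26) {
        /* minors 26..701 -> tdaa..tdzz */
        snprintf(buf, len, "td%c%c",
                 (char)('a' + minor / 26 - 1), (char)('a' + minor % 26));
    } else {
        /* minors from 702 upward -> tdaaa, tdaab, ... */
        const unsigned int m1 = (minor / 26 - 1) / 26 - 1;
        const unsigned int m2 = (minor / 26 - 1) % 26;
        const unsigned int m3 = minor % 26;
        snprintf(buf, len, "td%c%c%c",
                 (char)('a' + m1), (char)('a' + m2), (char)('a' + m3));
    }
}
```

With this mapping, minor 0 becomes tda, minor 26 becomes tdaa, and minor 702 becomes tdaaa, the same scheme used for sd-style disk names.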


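test_bit checks whether the tap device is already in use; if it is, the function returns -EEXIST. blktap_device_validate_params checks the blktap_params arguments, for example that the sector size is neither below 512 nor above 4096 and that the disk capacity does not exceed the maximum. alloc_disk then allocates a gendisk structure, which is initialized as follows: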

gd->major = blktap_device_major;
gd->first_minor = minor;
gd->fops = &blktap_device_file_operations;
gd->private_data = tapdev;

blk_init_queue is then called to initialize the request queue. The kernel describes blk_init_queue as follows:

 * Description:
 *    If a block device wishes to use the standard request handling procedures,
 *    which sorts requests and coalesces adjacent requests, then it must
 *    call blk_init_queue().  The function @rfn will be called when there
 *    are requests on the queue that need to be processed.

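elevator_init initializes the I/O scheduler of the request_queue rq (here "noop").

add_disk(gendisk *) registers the struct gendisk with the kernel.

blktap_device_configure, which the code above calls just before add_disk, configures the tapdevXXX device. Its blktap_params argument is obtained from user space with copy_from_user:

    set_capacity: sets the gendisk disk size to the capacity passed in

    blk_queue_logical_block_size: sets the logical block size to the sector_size passed in

    blk_queue_max_sectors: max_sectors is at least 8 and at most 1024 sectors. Note that a sector here is the fixed 512-byte unit assumed by the block layer

    blk_queue_segment_boundary / blk_queue_max_segment_size: the size of each segment is 4 KB

    blk_queue_max_phys_segments / blk_queue_max_hw_segments: each request on the request_queue has at most 11 segments; each segment is 4 KB, which equals 8 sectors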
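
The arithmetic behind these limits can be checked with a few lines of C. The constants below are taken from the list above; the `MODEL_*` names and the helper functions are hypothetical, and the clamp only models the stated bounds of 8 and 1024 sectors rather than the kernel's own implementation.

```c
#include <assert.h>

enum {
    MODEL_SECTOR_SIZE  = 512,   /* fixed block-layer sector, in bytes */
    MODEL_SEGMENT_SIZE = 4096,  /* one segment is at most 4 KB */
    MODEL_MAX_SEGMENTS = 11,    /* segments per request */
    MODEL_MIN_SECTORS  = 8,     /* lower bound for max_sectors */
    MODEL_MAX_SECTORS  = 1024   /* upper bound for max_sectors */
};

/* Clamp a requested max_sectors value into the range [8, 1024]. */
unsigned int model_clamp_max_sectors(unsigned int requested)
{
    if (requested < MODEL_MIN_SECTORS)
        return MODEL_MIN_SECTORS;
    if (requested > MODEL_MAX_SECTORS)
        return MODEL_MAX_SECTORS;
    return requested;
}

/* Number of 512-byte sectors that fit in one 4 KB segment. */
unsigned int model_sectors_per_segment(void)
{
    return MODEL_SEGMENT_SIZE / MODEL_SECTOR_SIZE;
}

/* Largest payload, in bytes, that a single request can carry. */
unsigned int model_max_request_bytes(void)
{
    assert(MODEL_SEGMENT_SIZE % MODEL_SECTOR_SIZE == 0);
    return MODEL_MAX_SEGMENTS * MODEL_SEGMENT_SIZE;
}
```

Under these numbers a single request carries at most 11 x 4096 = 45056 bytes, that is 88 sectors, which is well below the 1024-sector ceiling on max_sectors.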

    

blktap_device_destroy

When an ioctl with command BLKTAP2_IOCTL_REMOVE_DEVICE is issued on the blktapXXX device, blktap_device_destroy is executed.

blktap_device_destroy calls blk_cleanup_queue, which is a generic kernel function:

void blk_cleanup_queue(struct request_queue *q)
{
    /*
     * We know we have process context here, so we can be a little
     * cautious and ensure that pending block actions on this device
     * are done before moving on. Going into this function, we should
     * not have processes doing IO to this device.
     */
    blk_sync_queue(q);

    mutex_lock(&q->sysfs_lock);
    queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
    mutex_unlock(&q->sysfs_lock);

    if (q->elevator)
        elevator_exit(q->elevator);

    blk_put_queue(q);
}

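I/O requests on a request_queue are handled asynchronously, so when a tapdevXXX device is torn down, any asynchronous activity still pending on the queue has to be cleaned up first. This is done by blk_sync_queue.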

/**
 * blk_sync_queue - cancel any pending callbacks on a queue
 * @q: the queue
 *
 * Description:
 *     The block layer may perform asynchronous callback activity
 *     on a queue, such as calling the unplug function after a timeout.
 *     A block device may call blk_sync_queue to ensure that any
 *     such activity is cancelled, thus allowing it to release resources
 *     that the callbacks might use. The caller must already have made sure
 *     that its ->make_request_fn will not re-add plugging prior to calling
 *     this function.
 *
 */
void blk_sync_queue(struct request_queue *q)
{
    del_timer_sync(&q->unplug_timer);
    del_timer_sync(&q->timeout);
    cancel_work_sync(&q->unplug_work);
}


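Judging from the code, blk_sync_queue deletes the unplug and timeout timers and cancels the pending unplug work, so that no deferred block-layer callback can run against the queue afterwards. What it cancels is this pending callback activity; it does not by itself complete the I/O requests that have not yet returned.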


blktap_device_fail_queue

This function calls __blktap_next_queued_rq to walk the request_queue and calls __blktap_end_queued_rq(rq, -EIO) on every request.
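
The drain loop can be modelled in userspace with a singly linked list. This is a minimal sketch: `request_model`, `queue_model`, and the `model_*` helpers are hypothetical stand-ins for the kernel request queue and for the two __blktap_* helpers named above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct request_model {
    struct request_model *next;
    int done;                   /* set once the request has been ended */
    int error;                  /* completion status */
};

struct queue_model {
    struct request_model *head; /* oldest queued request, or NULL */
};

/* Dequeue the next request, or return NULL when the queue is empty. */
struct request_model *model_next_queued_rq(struct queue_model *q)
{
    struct request_model *rq = q->head;

    if (rq != NULL) {
        q->head = rq->next;
        rq->next = NULL;
    }
    return rq;
}

/* Complete one request with the given status. */
void model_end_queued_rq(struct request_model *rq, int err)
{
    rq->done = 1;
    rq->error = err;
}

/* Fail everything still queued with -EIO; returns the number failed. */
int model_fail_queue(struct queue_model *q)
{
    struct request_model *rq;
    int failed = 0;

    assert(q != NULL);
    while ((rq = model_next_queued_rq(q)) != NULL) {
        model_end_queued_rq(rq, -EIO);
        failed++;
    }
    return failed;
}
```

Ending each request with an error, instead of silently dropping it, lets whoever submitted the I/O see a failure rather than wait forever.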


Recall that the blktapXXX device provides the following operations:

static struct file_operations blktap_ring_file_operations = {
    .owner    = THIS_MODULE,
    .open     = blktap_ring_open,
    .release  = blktap_ring_release,
    .ioctl    = blktap_ring_ioctl,
    .mmap     = blktap_ring_mmap,
    .poll     = blktap_ring_poll,
};


blktap_ring_poll

blktap_ring_poll calls blktap_device_run_queue, which in turn loops over every request in the request_queue and calls blktap_device_make_request on it.

blktap_device_make_request first calls blktap_ring_make_request to build a blktap_request structure, then calls blktap_request_get_pages to allocate page frames for the blktap_request, and finally calls blktap_ring_submit_request.

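blktap_device_do_request is the function pointer passed to blk_init_queue when the tapdevXXX block device is initialized; for what this request function does in general, see the kernel block layer documentation. blktap_device_do_request calls blktap_ring_kick_user, which wakes up the blktap_ring->poll_wait wait queue. Recall blktap_ring_poll from earlier: it calls poll_wait(filp, &ring->poll_wait, wait), which registers the caller on the poll_wait wait queue, so that a process sleeping in poll() or select() on the ring device is woken when that queue is signalled. In effect, blktap_ring_kick_user wakes up the poller so that blktap_ring_poll runs again and submits the requests in the request_queue.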
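
The kick-and-poll handshake can be illustrated from user space with an ordinary pipe. This is only an analogy and not the blktap ring device: writing a byte plays the role of blktap_ring_kick_user, and poll() on the read end plays the role of the process waiting on the ring. The names `model_ring_ready` and `model_kick` are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Return 1 if fd is readable right now, 0 if not, -1 on error. */
int model_ring_ready(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n;

    assert(fd >= 0);
    n = poll(&pfd, 1, 0);       /* timeout 0: sample readiness, do not sleep */
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* "Kick": make the read end readable, the way a wake-up would. */
int model_kick(int wfd)
{
    const char token = 1;

    assert(wfd >= 0);
    return write(wfd, &token, 1) == 1 ? 0 : -1;
}
```

A typical use is to create the descriptors with pipe(), observe that model_ring_ready() on the read end reports 0, call model_kick() on the write end, and then see model_ring_ready() report 1. With a non-zero timeout, poll() would instead sleep until the kick arrives.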

blktap_ring_submit_request places the request on the I/O ring; the next step is presumably for tapdisk2 to process these I/O requests.
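
Placing a request on a ring and having a consumer pick it up can be sketched with a fixed-size producer/consumer ring. This is a single-threaded toy model with hypothetical names (`ring_model`, `model_ring_submit`, `model_ring_consume`); a real shared ring also needs memory barriers and a notification step, both of which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_RING_SIZE 4u      /* must be a power of two */

struct ring_req_model {
    unsigned long id;           /* identifies the request */
};

struct ring_model {
    unsigned int prod;          /* count of requests produced so far */
    unsigned int cons;          /* count of requests consumed so far */
    struct ring_req_model slots[MODEL_RING_SIZE];
};

/* Producer side: queue one request; returns -1 if the ring is full. */
int model_ring_submit(struct ring_model *r, unsigned long id)
{
    assert(r != NULL);
    if (r->prod - r->cons == MODEL_RING_SIZE)
        return -1;
    r->slots[r->prod & (MODEL_RING_SIZE - 1)].id = id;
    r->prod++;
    return 0;
}

/* Consumer side: take the oldest request; returns -1 if the ring is empty. */
int model_ring_consume(struct ring_model *r, unsigned long *id)
{
    assert(r != NULL && id != NULL);
    if (r->prod == r->cons)
        return -1;
    *id = r->slots[r->cons & (MODEL_RING_SIZE - 1)].id;
    r->cons++;
    return 0;
}
```

Free-running counters masked by the ring size keep the full and empty states distinguishable without sacrificing a slot.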

 



