How virtio-blk works

virtio-blk is an efficient paravirtualized block driver in the Linux kernel that exchanges data with the host through a virtio_ring. Data requests take one of two paths: the request path or the bio path; the bio path skips the I/O scheduler to improve performance. Requests are handed through the request_queue to qemu-kvm, and completions are handled by virtblk_done; the functions involved are virtblk_request and virtblk_make_request. The virtio-blk layer has well-defined entry and exit points, and the bio structure describes which regions of the block device to read or write.


How virtio-blk works:
1. Data requests take one of two paths
1) The request path: virtblk_request
The request_queue inside the gendisk of the virtio_blk structure receives bio requests from the block layer. On the default path, a bio is converted into a request in the I/O scheduler, placed on the request_queue, and finally virtblk_request turns the request into a vbr (virtblk_req) structure.

2) The bio path: virtblk_make_request
The default path is bypassed and the bio is converted directly into a vbr.

2. The vbr is sent to qemu-kvm through the request queue that virtio-blk obtained from virtio_ring at initialization time.

3. After qemu-kvm has processed the vbr, it puts it back on the virtio_ring request queue and raises an interrupt for the queue; the queue's interrupt handler vring_interrupt invokes the queue's callback, virtblk_done.

4. virtblk_done calls virtblk_bio_done if the request came in on the bio path, otherwise virtblk_request_done.


The virtio-blk layer has two entry points:
1. The request path: virtblk_request
2. The bio path: virtblk_make_request
The advantage of the bio path is that it skips the I/O scheduler (whose main job is merging multiple bios into a request), which improves performance; for slow devices, however, performance actually drops.

The virtio-blk layer has only one return point: the queue callback virtblk_done.
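Steps 2 through 4 above can be sketched in userspace C. This is a toy model under invented names (`toy_vring`, `toy_virtqueue_add`, etc.), not the real virtio ring ABI; it only shows the round trip: the guest posts a vbr, the "host" consumes it, and the interrupt handler forwards completion to the queue callback, which dispatches by entry path.

```c
#include <stddef.h>

#define RING_SIZE 8

struct vbr {                          /* toy stand-in for virtblk_req */
    int done;
    int from_bio_path;                /* set by the entry path that built it */
};

struct toy_vring {                    /* toy stand-in for the virtio_ring queue */
    struct vbr *slot[RING_SIZE];
    int avail;                        /* requests the guest has posted */
    int used;                         /* requests the host has finished */
    void (*callback)(struct toy_vring *q);   /* virtblk_done analogue */
};

static int bio_done, request_done;

/* step 2: place a vbr on the queue for the host to pick up */
static void toy_virtqueue_add(struct toy_vring *q, struct vbr *r)
{
    q->slot[q->avail++ % RING_SIZE] = r;
}

/* host side: "qemu-kvm" consumes posted requests and marks them done */
static void toy_host_process(struct toy_vring *q)
{
    while (q->used < q->avail)
        q->slot[q->used++ % RING_SIZE]->done = 1;
}

/* step 3: the interrupt handler only forwards to the queue callback */
static void toy_vring_interrupt(struct toy_vring *q)
{
    q->callback(q);
}

/* step 4: virtblk_done dispatches on how the request was built */
static void toy_virtblk_done(struct toy_vring *q)
{
    for (int i = 0; i < q->used; i++) {
        if (q->slot[i % RING_SIZE]->from_bio_path)
            bio_done++;               /* would call virtblk_bio_done */
        else
            request_done++;           /* would call virtblk_request_done */
    }
}
```

The point of the shape is that the driver never polls: completion flows back only through the interrupt, and the single callback is the layer's single return point.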

 

1. The job of a bio is to name which block-device addresses to read or write and for what length, so its most important member is the bio_vec array: each bio_vec corresponds to one address and length (i.e. one region), and the array as a whole describes the set of device regions to be read or written.
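A minimal userspace model of that layout, with invented `toy_` names (the real `struct bio_vec` holds a page pointer plus offset rather than a raw address):

```c
#include <stddef.h>

#define BIO_MAX_VECS 4

struct toy_bio_vec {
    void   *bv_addr;   /* start of one memory segment */
    size_t  bv_len;    /* bytes in this segment */
};

struct toy_bio {
    unsigned long bi_sector;                  /* target sector on the device */
    int           bi_vcnt;                    /* segments in use */
    struct toy_bio_vec bi_io_vec[BIO_MAX_VECS];
};

/* total bytes this bio will read or write: the sum over its segments */
static size_t toy_bio_bytes(const struct toy_bio *bio)
{
    size_t n = 0;
    for (int i = 0; i < bio->bi_vcnt; i++)
        n += bio->bi_io_vec[i].bv_len;
    return n;
}
```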

2. When a bio is handed to the I/O scheduler it is converted into a request; one request may contain several bios whose target addresses are adjacent, which improves read/write performance.
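The adjacency test behind that merging can be sketched like this. This is a simplified model with invented types, not the elevator's actual code: a bio joins an existing request only when its sector range is contiguous with the request's range (a back merge or a front merge).

```c
struct toy_rq {
    unsigned long start, end;  /* covered range [start, end), in sectors */
    int nbios;                 /* bios merged into this request */
};

/* returns 1 if the bio [sector, sector + nsect) was merged into rq */
static int toy_try_merge(struct toy_rq *rq, unsigned long sector,
                         unsigned long nsect)
{
    if (rq->nbios == 0) {                     /* empty request: just take it */
        rq->start = sector;
        rq->end = sector + nsect;
    } else if (sector == rq->end) {           /* back merge */
        rq->end += nsect;
    } else if (sector + nsect == rq->start) { /* front merge */
        rq->start = sector;
    } else {
        return 0;                             /* gap: a new request is needed */
    }
    rq->nbios++;
    return 1;
}
```

One sequential request to the device then covers what would otherwise be several scattered submissions.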

3. The gendisk structure embedded in a block device contains a request_queue; this queue receives the requests sent down by the I/O scheduler.

4. The request_queue in the gendisk carries a set of callbacks that drive a request through its whole life cycle:
The queue's callbacks:
 /* request process function - handles a request */
 request_fn_proc  *request_fn;
 /* make request function - turns a bio into a request */
 make_request_fn  *make_request_fn;
 /* prepare request function - run when a request is created */
 prep_rq_fn  *prep_rq_fn;
 /* unprepare request function */
 unprep_rq_fn  *unprep_rq_fn;
 /* merge bio_vec function - merges a bio into a request */
 merge_bvec_fn  *merge_bvec_fn;
 /* softirq handler, called back when a request completes */
 softirq_done_fn  *softirq_done_fn;
 /* timeout handler */
 rq_timed_out_fn  *rq_timed_out_fn;
 dma_drain_needed_fn *dma_drain_needed;
 lld_busy_fn  *lld_busy_fn;
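The design is a table of hooks: the block core only calls through the pointers, and a driver customizes behavior by filling them in. A toy sketch with invented names (this is how virtblk_make_request ends up on the bio path, not the kernel's actual registration code):

```c
struct toy_bio { unsigned long sector; };

struct toy_request_queue {
    /* virtblk_request would plug in here on the request path */
    void (*request_fn)(struct toy_request_queue *q);
    /* virtblk_make_request plugs in here on the bio path */
    void (*make_request_fn)(struct toy_request_queue *q, struct toy_bio *bio);
    void *queuedata;               /* driver-private data */
};

static int bio_path_hits;

/* driver hook: would build a vbr directly from the bio */
static void toy_virtblk_make_request(struct toy_request_queue *q,
                                     struct toy_bio *bio)
{
    (void)q;
    (void)bio;
    bio_path_hits++;
}

/* the block core only sees the hook, never the driver by name */
static void toy_submit(struct toy_request_queue *q, struct toy_bio *bio)
{
    q->make_request_fn(q, bio);
}
```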

The entry point into the block layer: generic_make_request
/**
 * generic_make_request - hand a buffer to its device driver for I/O
 * @bio:  The bio describing the location in memory and on the device.
 *
 * generic_make_request() is used to make I/O requests of block
 * devices. It is passed a &struct bio, which describes the I/O that needs
 * to be done.
 *
 * generic_make_request() does not return any status.  The
 * success/failure status of the request, along with notification of
 * completion, is delivered asynchronously through the bio->bi_end_io
 * function described (one day) else where.
 *
 * The caller of generic_make_request must make sure that bi_io_vec
 * are set to describe the memory buffer, and that bi_dev and bi_sector are
 * set to describe the device address, and the
 * bi_end_io and optionally bi_private are set to describe how
 * completion notification should be signaled.
 *
 * generic_make_request and the drivers it calls may use bi_next if this
 * bio happens to be merged with someone else, and may resubmit the bio to
 * a lower device by calling into generic_make_request recursively, which
 * means the bio should NOT be touched after the call to ->make_request_fn.
 */
void generic_make_request(struct bio *bio)
{
        struct bio_list bio_list_on_stack;

        if (!generic_make_request_checks(bio))
                return;

        /*
         * We only want one ->make_request_fn to be active at a time, else
         * stack usage with stacked devices could be a problem.  So use
         * current->bio_list to keep a list of requests submited by a
         * make_request_fn function.  current->bio_list is also used as a
         * flag to say if generic_make_request is currently active in this
         * task or not.  If it is NULL, then no make_request is active.  If
         * it is non-NULL, then a make_request is active, and new requests
         * should be added at the tail
         */
        if (current->bio_list) {
                bio_list_add(current->bio_list, bio);
                return;
        }

        /* following loop may be a bit non-obvious, and so deserves some
         * explanation.
         * Before entering the loop, bio->bi_next is NULL (as all callers
         * ensure that) so we have a list with a single bio.
         * We pretend that we have just taken it off a longer list, so
         * we assign bio_list to a pointer to the bio_list_on_stack,
         * thus initialising the bio_list of new bios to be
         * added.  ->make_request() may indeed add some more bios
         * through a recursive call to generic_make_request.  If it
         * did, we find a non-NULL value in bio_list and re-enter the loop
         * from the top.  In this case we really did just take the bio
         * of the top of the list (no pretending) and so remove it from
         * bio_list, and call into ->make_request() again.
         */
        BUG_ON(bio->bi_next);
        bio_list_init(&bio_list_on_stack);
        current->bio_list = &bio_list_on_stack;
        do {
                struct request_queue *q = bdev_get_queue(bio->bi_bdev);

                q->make_request_fn(q, bio);

                bio = bio_list_pop(current->bio_list);
        } while (bio);
        current->bio_list = NULL; /* deactivate */
}
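The current->bio_list trick above is worth seeing in isolation: a recursive resubmission from a stacked device is deferred onto a per-task list and replayed by the top-level loop, so C-stack depth stays constant no matter how many layers are stacked. Below is a self-contained userspace sketch under invented `toy_` names (the counters are instrumentation added for the demonstration, not part of the kernel pattern).

```c
#include <stddef.h>

struct toy_bio {
    struct toy_bio *bi_next;
    int depth;                     /* how many stacked layers remain below */
};

struct toy_bio_list { struct toy_bio *head, *tail; };

/* NULL means "no generic_make_request active in this task" */
static struct toy_bio_list *current_bio_list;

static int submissions;            /* total ->make_request_fn invocations */
static int cur_depth, max_depth;   /* observed C-stack nesting */

static void toy_bio_list_add(struct toy_bio_list *l, struct toy_bio *b)
{
    b->bi_next = NULL;
    if (l->tail)
        l->tail->bi_next = b;
    else
        l->head = b;
    l->tail = b;
}

static struct toy_bio *toy_bio_list_pop(struct toy_bio_list *l)
{
    struct toy_bio *b = l->head;
    if (b) {
        l->head = b->bi_next;
        if (!l->head)
            l->tail = NULL;
        b->bi_next = NULL;
    }
    return b;
}

static void toy_generic_make_request(struct toy_bio *bio);

/* A make_request_fn for a stacked device: each layer resubmits one
 * bio to the layer below it by calling back into the entry point. */
static void toy_make_request_fn(struct toy_bio *bio)
{
    static struct toy_bio lower[16];

    if (++cur_depth > max_depth)
        max_depth = cur_depth;
    submissions++;

    if (bio->depth > 0) {
        struct toy_bio *b = &lower[submissions % 16];
        b->depth = bio->depth - 1;
        toy_generic_make_request(b);   /* "recursive" resubmission */
    }
    cur_depth--;
}

static void toy_generic_make_request(struct toy_bio *bio)
{
    struct toy_bio_list bio_list_on_stack = { NULL, NULL };

    /* already active in this task: queue the bio and return at once */
    if (current_bio_list) {
        toy_bio_list_add(current_bio_list, bio);
        return;
    }

    /* top-level caller: drain everything iteratively */
    current_bio_list = &bio_list_on_stack;
    do {
        toy_make_request_fn(bio);
        bio = toy_bio_list_pop(current_bio_list);
    } while (bio);
    current_bio_list = NULL;           /* deactivate */
}
```

Submitting a bio through five stacked layers drives six make_request calls, yet the instrumented nesting depth never exceeds one.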

The block layer's default function for building a request from a bio: blk_make_request
/**
 * blk_make_request - given a bio, allocate a corresponding struct request.
 * @q: target request queue
 * @bio:  The bio describing the memory mappings that will be submitted for IO.
 *        It may be a chained-bio properly constructed by block/bio layer.
 * @gfp_mask: gfp flags to be used for memory allocation
 *
 * blk_make_request is the parallel of generic_make_request for BLOCK_PC
 * type commands. Where the struct request needs to be farther initialized by
 * the caller. It is passed a &struct bio, which describes the memory info of
 * the I/O transfer.
 *
 * The caller of blk_make_request must make sure that bi_io_vec
 * are set to describe the memory buffers. That bio_data_dir() will return
 * the needed direction of the request. (And all bio's in the passed bio-chain
 * are properly set accordingly)
 *
 * If called under none-sleepable conditions, mapped bio buffers must not
 * need bouncing, by calling the appropriate masked or flagged allocator,
 * suitable for the target device. Otherwise the call to blk_queue_bounce will
 * BUG.
 *
 * WARNING: When allocating/cloning a bio-chain, careful consideration should be
 * given to how you allocate bios. In particular, you cannot use __GFP_WAIT for
 * anything but the first bio in the chain. Otherwise you risk waiting for IO
 * completion of a bio that hasn't been submitted yet, thus resulting in a
 * deadlock. Alternatively bios should be allocated using bio_kmalloc() instead
 * of bio_alloc(), as that avoids the mempool deadlock.
 * If possible a big IO should be split into smaller parts when allocation
 * fails. Partial allocation should not be an error, or you risk a live-lock.
 */
struct request *blk_make_request(struct request_queue *q, struct bio *bio,
                                 gfp_t gfp_mask)
{
        struct request *rq = blk_get_request(q, bio_data_dir(bio), gfp_mask);

        if (unlikely(!rq))
                return ERR_PTR(-ENOMEM);

        for_each_bio(bio) {
                struct bio *bounce_bio = bio;
                int ret;

                blk_queue_bounce(q, &bounce_bio);
                ret = blk_rq_append_bio(q, rq, bounce_bio);
                if (unlikely(ret)) {
                        blk_put_request(rq);
                        return ERR_PTR(ret);
                }
        }

        return rq;
}
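The core of the function is the for_each_bio walk: follow the bi_next chain and append every bio in it to the freshly allocated request, bailing out if any append fails. A stripped-down userspace sketch of just that walk, with invented `toy_` types (bouncing and allocation are omitted):

```c
#include <stddef.h>

struct toy_bio { struct toy_bio *bi_next; size_t len; };
struct toy_rq  { size_t total; int nbios; };

/* blk_rq_append_bio analogue: account the bio into the request */
static int toy_rq_append_bio(struct toy_rq *rq, struct toy_bio *bio)
{
    rq->total += bio->len;
    rq->nbios++;
    return 0;
}

/* walk the chained bio exactly like for_each_bio(bio) does */
static int toy_blk_make_request(struct toy_rq *rq, struct toy_bio *bio)
{
    for (; bio; bio = bio->bi_next) {
        int ret = toy_rq_append_bio(rq, bio);
        if (ret)
            return ret;             /* would blk_put_request() and bail */
    }
    return 0;
}
```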

The block layer's generic request-execution function: blk_execute_rq

/**
 * blk_execute_rq - insert a request into queue for execution
 * @q:  queue to insert the request in
 * @bd_disk: matching gendisk
 * @rq:  request to insert
 * @at_head:    insert request at head or tail of queue
 *
 * Description:
 *    Insert a fully prepared request at the back of the I/O scheduler queue
 *    for execution and wait for completion.
 */
int blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
                   struct request *rq, int at_head)
{
        DECLARE_COMPLETION_ONSTACK(wait);
        char sense[SCSI_SENSE_BUFFERSIZE];
        int err = 0;
        unsigned long hang_check;

        /*
         * we need an extra reference to the request, so we can look at
         * it after io completion
         */
        rq->ref_count++;

        if (!rq->sense) {
                memset(sense, 0, sizeof(sense));
                rq->sense = sense;
                rq->sense_len = 0;
        }

        rq->end_io_data = &wait;
        blk_execute_rq_nowait(q, bd_disk, rq, at_head, blk_end_sync_rq);

        /* Prevent hang_check timer from firing at us during very long I/O */
        hang_check = sysctl_hung_task_timeout_secs;
        if (hang_check)
                while (!wait_for_completion_io_timeout(&wait, hang_check * (HZ/2)));
        else
                wait_for_completion_io(&wait);

        if (rq->errors)
                err = -EIO;

        return err;
}
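The synchronous-over-asynchronous pattern here (declare a completion on the stack, hand it to the request via end_io_data, submit with the nowait variant, then sleep until the end_io callback fires) can be modeled in userspace with a pthread condition variable. This is a sketch under invented `toy_` names, with a second thread standing in for the interrupt context; the `-5` is a stand-in for -EIO.

```c
#include <pthread.h>
#include <stddef.h>

struct completion {                /* userspace analogue of the kernel struct */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

struct toy_rq { struct completion *end_io_data; int errors; };

/* end_io callback, the blk_end_sync_rq analogue */
static void toy_end_sync_rq(struct toy_rq *rq)
{
    complete(rq->end_io_data);
}

/* "interrupt" thread: finish the request, then fire the callback */
static void *toy_irq_thread(void *arg)
{
    struct toy_rq *rq = arg;
    rq->errors = 0;
    toy_end_sync_rq(rq);
    return NULL;
}

/* blk_execute_rq analogue: submit, then sleep until completed */
static int toy_execute_rq(struct toy_rq *rq)
{
    struct completion wait;        /* DECLARE_COMPLETION_ONSTACK analogue */
    pthread_t t;

    init_completion(&wait);
    rq->end_io_data = &wait;

    pthread_create(&t, NULL, toy_irq_thread, rq);
    wait_for_completion(&wait);
    pthread_join(&t, NULL);
    return rq->errors ? -5 /* -EIO stand-in */ : 0;
}
```

The completion must live on the submitter's stack only because the submitter provably outlives the request: it does not return until the callback has run.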


/* return id (s/n) string for *disk to *id_str
 */
static int virtblk_get_id(struct gendisk *disk, char *id_str)
{
        struct virtio_blk *vblk = disk->private_data;
        struct request *req;
        struct bio *bio;
        int err;

        /* Build a bio and map id_str to addresses the block device can
         * use, recording them in the bio's bio_vec array (each element
         * is one bio_vec, i.e. one start address plus a length, so the
         * array describes several memory regions). Because id_str may
         * span several pages, the number of bio_vec entries equals the
         * number of pages id_str occupies. */
        bio = bio_map_kern(vblk->disk->queue, id_str, VIRTIO_BLK_ID_BYTES,
                           GFP_KERNEL);
        if (IS_ERR(bio))
                return PTR_ERR(bio);

        req = blk_make_request(vblk->disk->queue, bio, GFP_KERNEL);
        if (IS_ERR(req)) {
                bio_put(bio);
                return PTR_ERR(req);
        }

        req->cmd_type = REQ_TYPE_SPECIAL;
        err = blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
        blk_put_request(req);

        return err;
}

 
