How virtio-blk works

virtio-blk is an efficient paravirtualized block driver in the Linux kernel that exchanges data with the host through a virtio_ring. Data requests take one of two paths: the request path or the bio path; the bio path skips the I/O scheduler to improve performance. Requests are handed through the request_queue to qemu-kvm, and completions are handled by virtblk_done; the functions involved are virtblk_request and virtblk_make_request. The virtio-blk layer has well-defined entry and exit points, and the bio structure describes which regions of the block device to read or write.


How virtio-blk works:
1. Data requests take one of two paths
1) The request path: virtblk_request
The request_queue inside the gendisk of the virtio_blk structure receives bio requests from the block layer. On the default path, a bio is converted into a request in the I/O scheduler, placed on the request_queue, and finally virtblk_request turns the request into a vbr (virtblk_req) structure.

2) The bio path: virtblk_make_request
The default path is bypassed and the bio is converted directly into a vbr.

2. The vbr is sent to qemu-kvm through the request queue that virtio-blk obtained from virtio_ring at initialization time.

3. After qemu-kvm has processed the vbr, it puts it back on the virtio_ring request queue and raises an interrupt for the queue; the queue's interrupt handler vring_interrupt invokes the queue's callback, virtblk_done.

4. virtblk_done calls virtblk_bio_done if the request came in on the bio path, otherwise virtblk_request_done.


The virtio-blk layer has two entry points:
1. The request path: virtblk_request
2. The bio path: virtblk_make_request
The advantage of the bio path is that it skips the I/O scheduler (whose main job is merging multiple bios into a request), which improves performance; for slow devices, however, performance actually drops.

The virtio-blk layer has only one return point: the queue callback virtblk_done.
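Steps 2 through 4 above can be sketched in userspace C. This is a toy model under invented names (`toy_vring`, `toy_virtqueue_add`, etc.), not the real virtio ring ABI; it only shows the round trip: the guest posts a vbr, the "host" consumes it, and the interrupt handler forwards completion to the queue callback, which dispatches by entry path.

```c
#include <stddef.h>

#define RING_SIZE 8

struct vbr {                          /* toy stand-in for virtblk_req */
    int done;
    int from_bio_path;                /* set by the entry path that built it */
};

struct toy_vring {                    /* toy stand-in for the virtio_ring queue */
    struct vbr *slot[RING_SIZE];
    int avail;                        /* requests the guest has posted */
    int used;                         /* requests the host has finished */
    void (*callback)(struct toy_vring *q);   /* virtblk_done analogue */
};

static int bio_done, request_done;

/* step 2: place a vbr on the queue for the host to pick up */
static void toy_virtqueue_add(struct toy_vring *q, struct vbr *r)
{
    q->slot[q->avail++ % RING_SIZE] = r;
}

/* host side: "qemu-kvm" consumes posted requests and marks them done */
static void toy_host_process(struct toy_vring *q)
{
    while (q->used < q->avail)
        q->slot[q->used++ % RING_SIZE]->done = 1;
}

/* step 3: the interrupt handler only forwards to the queue callback */
static void toy_vring_interrupt(struct toy_vring *q)
{
    q->callback(q);
}

/* step 4: virtblk_done dispatches on how the request was built */
static void toy_virtblk_done(struct toy_vring *q)
{
    for (int i = 0; i < q->used; i++) {
        if (q->slot[i % RING_SIZE]->from_bio_path)
            bio_done++;               /* would call virtblk_bio_done */
        else
            request_done++;           /* would call virtblk_request_done */
    }
}
```

The point of the shape is that the driver never polls: completion flows back only through the interrupt, and the single callback is the layer's single return point.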

 

1. The job of a bio is to name which block-device addresses to read or write and for what length, so its most important member is the bio_vec array: each bio_vec corresponds to one address and length (i.e. one region), and the array as a whole describes the set of device regions to be read or written.
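A minimal userspace model of that layout, with invented `toy_` names (the real `struct bio_vec` holds a page pointer plus offset rather than a raw address):

```c
#include <stddef.h>

#define BIO_MAX_VECS 4

struct toy_bio_vec {
    void   *bv_addr;   /* start of one memory segment */
    size_t  bv_len;    /* bytes in this segment */
};

struct toy_bio {
    unsigned long bi_sector;                  /* target sector on the device */
    int           bi_vcnt;                    /* segments in use */
    struct toy_bio_vec bi_io_vec[BIO_MAX_VECS];
};

/* total bytes this bio will read or write: the sum over its segments */
static size_t toy_bio_bytes(const struct toy_bio *bio)
{
    size_t n = 0;
    for (int i = 0; i < bio->bi_vcnt; i++)
        n += bio->bi_io_vec[i].bv_len;
    return n;
}
```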

2. When a bio is handed to the I/O scheduler it is converted into a request; one request may contain several bios whose target addresses are adjacent, which improves read/write performance.
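The adjacency test behind that merging can be sketched like this. This is a simplified model with invented types, not the elevator's actual code: a bio joins an existing request only when its sector range is contiguous with the request's range (a back merge or a front merge).

```c
struct toy_rq {
    unsigned long start, end;  /* covered range [start, end), in sectors */
    int nbios;                 /* bios merged into this request */
};

/* returns 1 if the bio [sector, sector + nsect) was merged into rq */
static int toy_try_merge(struct toy_rq *rq, unsigned long sector,
                         unsigned long nsect)
{
    if (rq->nbios == 0) {                     /* empty request: just take it */
        rq->start = sector;
        rq->end = sector + nsect;
    } else if (sector == rq->end) {           /* back merge */
        rq->end += nsect;
    } else if (sector + nsect == rq->start) { /* front merge */
        rq->start = sector;
    } else {
        return 0;                             /* gap: a new request is needed */
    }
    rq->nbios++;
    return 1;
}
```

One sequential request to the device then covers what would otherwise be several scattered submissions.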

3. The gendisk structure embedded in a block device contains a request_queue; this queue receives the requests sent down by the I/O scheduler.

4. The request_queue in the gendisk carries a set of callbacks that drive a request through its whole life cycle:
The queue's callbacks:
 /* request process function - handles a request */
 request_fn_proc  *request_fn;
 /* make request function - turns a bio into a request */
 make_request_fn  *make_request_fn;
 /* prepare request function - run when a request is created */
 prep_rq_fn  *prep_rq_fn;
 /* unprepare request function */
 unprep_rq_fn  *unprep_rq_fn;
 /* merge bio_vec function - merges a bio into a request */
 merge_bvec_fn  *merge_bvec_fn;
 /* softirq handler, called back when a request completes */
 softirq_done_fn  *softirq_done_fn;
 /* timeout handler */
 rq_timed_out_fn  *rq_timed_out_fn;
 dma_drain_needed_fn *dma_drain_needed;
 lld_busy_fn  *lld_busy_fn;
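The design is a table of hooks: the block core only calls through the pointers, and a driver customizes behavior by filling them in. A toy sketch with invented names (this is how virtblk_make_request ends up on the bio path, not the kernel's actual registration code):

```c
struct toy_bio { unsigned long sector; };

struct toy_request_queue {
    /* virtblk_request would plug in here on the request path */
    void (*request_fn)(struct toy_request_queue *q);
    /* virtblk_make_request plugs in here on the bio path */
    void (*make_request_fn)(struct toy_request_queue *q, struct toy_bio *bio);
    void *queuedata;               /* driver-private data */
};

static int bio_path_hits;

/* driver hook: would build a vbr directly from the bio */
static void toy_virtblk_make_request(struct toy_request_queue *q,
                                     struct toy_bio *bio)
{
    (void)q;
    (void)bio;
    bio_path_hits++;
}

/* the block core only sees the hook, never the driver by name */
static void toy_submit(struct toy_request_queue *q, struct toy_bio *bio)
{
    q->make_request_fn(q, bio);
}
```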

The entry point into the block layer: generic_make_request
/**
 * generic_make_request - hand a buffer to its device driver for I/O
 * @bio:  The bio describing the location in memory and on the device.
 *
 * generic_make_request() is used to make I/O requests of block
 * devices. It is passed a &struct bio, which describes the I/O that needs
 * to be done.
 *
 * generic_make_request() does not return any status.  The
 * success/failure status of the request, along with notification of
 * completion, is delivered asynchronously through the bio->bi_end_io
 * function described (one day) else where.
 *
 * The caller of generic_make_request must make sure that bi_io_vec
 * are set to describe the memory buffer, and that bi_dev and bi_sector are
 * set to describe the device address, and the
 * bi_end_io and optionally bi_private are set to describe how
 * completion notification should be signaled.
 *
 * generic_make_request and the drivers it calls may use bi_next if this
 * bio happens to be merged with someone else, and may resubmit the bio to
 * a lower device by calling into generic_make_request recursively, which
 * means the bio should NOT be touched after the call to ->make_request_fn.
 */
void generic_make_request(struct bio *bio)
{
        struct bio_list bio_list_on_stack;

        if (!generic_make_request_checks(bio))
                return;

        /*
         * We only want one ->make_request_fn to be active at a time, else
         * stack usage with stacked devices could be a problem.  So use
         * current->bio_list to keep a list of requests submited by a
         * make_request_fn function.  current->bio_list is also used as a
         * flag to say if generic_make_request is currently active in this
         * task or not.  If it is NULL, then no make_request is active.  If
         * it is non-NULL, then a make_request is active, and new requests
         * should be added at the tail
         */
        if (current->bio_list) {
                bio_list_add(current->bio_list, bio);
                return;
        }

        /* following loop may be a bit non-obvious, and so deserves some
         * explanation.
         * Before entering the loop, bio->bi_next is NULL (as all callers
         * ensure that) so we have a list with a single bio.
         * We pretend that we have just taken it off a longer list, so
         * we assign bio_list to a pointer to the bio_list_on_stack,
         * thus initialising the bio_list of new bios to be
         * added.  ->make_request() may indeed add some more bios
         * through a recursive call to generic_make_request.  If it
         * did, we find a non-NULL value in bio_list and re-enter the loop
         * from the top.  In this case we really did just take the bio
         * of the top of the list (no pretending) and so remove it from
         * bio_list, and call into ->make_request() again.
         */
        BUG_ON(bio->bi_next);
        bio_list_init(&bio_list_on_stack);
        current->bio_list = &bio_list_on_stack;
        do {
                struct request_queue *q = bdev_get_queue(bio->bi_bdev);

                q->make_request_fn(q, bio);

                bio = bio_list_pop(current->bio_list);
        } while (bio);
        current->bio_list = NULL; /* deactivate */
}
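The current->bio_list trick above is worth seeing in isolation: a recursive resubmission from a stacked device is deferred onto a per-task list and replayed by the top-level loop, so C-stack depth stays constant no matter how many layers are stacked. Below is a self-contained userspace sketch under invented `toy_` names (the counters are instrumentation added for the demonstration, not part of the kernel pattern).

```c
#include <stddef.h>

struct toy_bio {
    struct toy_bio *bi_next;
    int depth;                     /* how many stacked layers remain below */
};

struct toy_bio_list { struct toy_bio *head, *tail; };

/* NULL means "no generic_make_request active in this task" */
static struct toy_bio_list *current_bio_list;

static int submissions;            /* total ->make_request_fn invocations */
static int cur_depth, max_depth;   /* observed C-stack nesting */

static void toy_bio_list_add(struct toy_bio_list *l, struct toy_bio *b)
{
    b->bi_next = NULL;
    if (l->tail)
        l->tail->bi_next = b;
    else
        l->head = b;
    l->tail = b;
}

static struct toy_bio *toy_bio_list_pop(struct toy_bio_list *l)
{
    struct toy_bio *b = l->head;
    if (b) {
        l->head = b->bi_next;
        if (!l->head)
            l->tail = NULL;
        b->bi_next = NULL;
    }
    return b;
}

static void toy_generic_make_request(struct toy_bio *bio);

/* A make_request_fn for a stacked device: each layer resubmits one
 * bio to the layer below it by calling back into the entry point. */
static void toy_make_request_fn(struct toy_bio *bio)
{
    static struct toy_bio lower[16];

    if (++cur_depth > max_depth)
        max_depth = cur_depth;
    submissions++;

    if (bio->depth > 0) {
        struct toy_bio *b = &lower[submissions % 16];
        b->depth = bio->depth - 1;
        toy_generic_make_request(b);   /* "recursive" resubmission */
    }
    cur_depth--;
}

static void toy_generic_make_request(struct toy_bio *bio)
{
    struct toy_bio_list bio_list_on_stack = { NULL, NULL };

    /* already active in this task: queue the bio and return at once */
    if (current_bio_list) {
        toy_bio_list_add(current_bio_list, bio);
        return;
    }

    /* top-level caller: drain everything iteratively */
    current_bio_list = &bio_list_on_stack;
    do {
        toy_make_request_fn(bio);
        bio = toy_bio_list_pop(current_bio_list);
    } while (bio);
    current_bio_list = NULL;           /* deactivate */
}
```

Submitting a bio through five stacked layers drives six make_request calls, yet the instrumented nesting depth never exceeds one.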

The block layer's default function for building a request from a bio: blk_make_request
/**
 * blk_make_request - given a bio, allocate a corresponding struct request.
 * @q: target request queue
 * @bio:  The bio describing the memory mappings that will be submitted for IO.
 *        It may be a chained-bio properly constructed by block/bio layer.
 * @gfp_mask: gfp flags to be used for memory allocation
 *
 * blk_make_request is the parallel of generic_make_request for BLOCK_PC
 * type commands. Where the struct request needs to be farther initialized by
 * the caller. It is passed a &struct bio, which describes the memory info of
 * the I/O transfer.
 *
 * The caller of blk_make_request must make sure that bi_io_vec
 * are set to describe the memory buffers. That bio_data_dir() will return
 * the needed direction of the request. (And all bio's in the passed bio-chain
 * are properly set accordingly)
 *
 * If called under none-sleepable conditions, mapped bio buffers must not
 * need bouncing, by calling the appropriate masked or flagged allocator,
 * suitable for the target device. Otherwise the call to blk_queue_bounce will
 * BUG.
 *
 * WARNING: When allocating/cloning a bio-chain, careful consideration should be
 * given to how you allocate bios. In particular, you cannot use __GFP_WAIT for
 * anything but the first bio in the chain. Otherwise you risk waiting for IO
 * completion of a bio that hasn't been submitted yet, thus resulting in a
 * deadlock. Alternatively bios should be allocated using bio_kmalloc() instead
 * of bio_alloc(), as that avoids the mempool deadlock.
 * If possible a big IO should be split into smaller parts when allocation
 * fails. Partial allocation should not be an error, or you risk a live-lock.
 */
struct request *blk_make_request(struct request_queue *q, struct bio *bio,
                                 gfp_t gfp_mask)
{
        struct request *rq = blk_get_request(q, bio_data_dir(bio), gfp_mask);

        if (unlikely(!rq))
                return ERR_PTR(-ENOMEM);

        for_each_bio(bio) {
                struct bio *bounce_bio = bio;
                int ret;

                blk_queue_bounce(q, &bounce_bio);
                ret = blk_rq_append_bio(q, rq, bounce_bio);
                if (unlikely(ret)) {
                        blk_put_request(rq);
                        return ERR_PTR(ret);
                }
        }

        return rq;
}
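The core of the function is the for_each_bio walk: follow the bi_next chain and append every bio in it to the freshly allocated request, bailing out if any append fails. A stripped-down userspace sketch of just that walk, with invented `toy_` types (bouncing and allocation are omitted):

```c
#include <stddef.h>

struct toy_bio { struct toy_bio *bi_next; size_t len; };
struct toy_rq  { size_t total; int nbios; };

/* blk_rq_append_bio analogue: account the bio into the request */
static int toy_rq_append_bio(struct toy_rq *rq, struct toy_bio *bio)
{
    rq->total += bio->len;
    rq->nbios++;
    return 0;
}

/* walk the chained bio exactly like for_each_bio(bio) does */
static int toy_blk_make_request(struct toy_rq *rq, struct toy_bio *bio)
{
    for (; bio; bio = bio->bi_next) {
        int ret = toy_rq_append_bio(rq, bio);
        if (ret)
            return ret;             /* would blk_put_request() and bail */
    }
    return 0;
}
```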

The block layer's generic request-execution function: blk_execute_rq

/**
 * blk_execute_rq - insert a request into queue for execution
 * @q:  queue to insert the request in
 * @bd_disk: matching gendisk
 * @rq:  request to insert
 * @at_head:    insert request at head or tail of queue
 *
 * Description:
 *    Insert a fully prepared request at the back of the I/O scheduler queue
 *    for execution and wait for completion.
 */
int blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
                   struct request *rq, int at_head)
{
        DECLARE_COMPLETION_ONSTACK(wait);
        char sense[SCSI_SENSE_BUFFERSIZE];
        int err = 0;
        unsigned long hang_check;

        /*
         * we need an extra reference to the request, so we can look at
         * it after io completion
         */
        rq->ref_count++;

        if (!rq->sense) {
                memset(sense, 0, sizeof(sense));
                rq->sense = sense;
                rq->sense_len = 0;
        }

        rq->end_io_data = &wait;
        blk_execute_rq_nowait(q, bd_disk, rq, at_head, blk_end_sync_rq);

        /* Prevent hang_check timer from firing at us during very long I/O */
        hang_check = sysctl_hung_task_timeout_secs;
        if (hang_check)
                while (!wait_for_completion_io_timeout(&wait, hang_check * (HZ/2)));
        else
                wait_for_completion_io(&wait);

        if (rq->errors)
                err = -EIO;

        return err;
}
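The synchronous-over-asynchronous pattern here (declare a completion on the stack, hand it to the request via end_io_data, submit with the nowait variant, then sleep until the end_io callback fires) can be modeled in userspace with a pthread condition variable. This is a sketch under invented `toy_` names, with a second thread standing in for the interrupt context; the `-5` is a stand-in for -EIO.

```c
#include <pthread.h>
#include <stddef.h>

struct completion {                /* userspace analogue of the kernel struct */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

struct toy_rq { struct completion *end_io_data; int errors; };

/* end_io callback, the blk_end_sync_rq analogue */
static void toy_end_sync_rq(struct toy_rq *rq)
{
    complete(rq->end_io_data);
}

/* "interrupt" thread: finish the request, then fire the callback */
static void *toy_irq_thread(void *arg)
{
    struct toy_rq *rq = arg;
    rq->errors = 0;
    toy_end_sync_rq(rq);
    return NULL;
}

/* blk_execute_rq analogue: submit, then sleep until completed */
static int toy_execute_rq(struct toy_rq *rq)
{
    struct completion wait;        /* DECLARE_COMPLETION_ONSTACK analogue */
    pthread_t t;

    init_completion(&wait);
    rq->end_io_data = &wait;

    pthread_create(&t, NULL, toy_irq_thread, rq);
    wait_for_completion(&wait);
    pthread_join(&t, NULL);
    return rq->errors ? -5 /* -EIO stand-in */ : 0;
}
```

The completion must live on the submitter's stack only because the submitter provably outlives the request: it does not return until the callback has run.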


/* return id (s/n) string for *disk to *id_str
 */
static int virtblk_get_id(struct gendisk *disk, char *id_str)
{
        struct virtio_blk *vblk = disk->private_data;
        struct request *req;
        struct bio *bio;
        int err;

        /* Build a bio and map id_str to addresses the block device can
         * use, recording them in the bio's bio_vec array (each element
         * is one bio_vec, i.e. one start address plus a length, so the
         * array describes several memory regions). Because id_str may
         * span several pages, the number of bio_vec entries equals the
         * number of pages id_str occupies. */
        bio = bio_map_kern(vblk->disk->queue, id_str, VIRTIO_BLK_ID_BYTES,
                           GFP_KERNEL);
        if (IS_ERR(bio))
                return PTR_ERR(bio);

        req = blk_make_request(vblk->disk->queue, bio, GFP_KERNEL);
        if (IS_ERR(req)) {
                bio_put(bio);
                return PTR_ERR(req);
        }

        req->cmd_type = REQ_TYPE_SPECIAL;
        err = blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
        blk_put_request(req);

        return err;
}

 
