I've been looking into multi-queue I/O recently and came across the two articles below, so I'm recording them here. Later I plan to write something more in-depth on this topic myself.
0: Background
Today's I/O virtualization mainly comes in the following flavors:
1): Device emulation
Device emulation comes in two main forms: emulation done directly inside the VMM (e.g. xen, vmware), and emulation done in a separate application (e.g. kvm/qemu).
Figure 1: I/O virtualization based on emulation in the VMM
Figure 2: I/O virtualization based on user-space emulation
2): I/O device passthrough
For I/O devices that are hard to emulate (graphics cards, serial ports, etc.), passthrough (Intel VT-d, AMD IOMMU) can hand the device directly to the VM. Because the host keeps some device state, passthrough has relatively weak support for migration.
Figure 3: Device passthrough
3): SR-IOV
With SR-IOV, a PCIe device can export not only multiple PCI physical functions but also a set of virtual functions that share the resources of that I/O device. In this model no additional pass-through arrangement is needed, because virtualization happens on the endpoint device itself, which lets the hypervisor simply map a virtual function into a VM for native device performance with the security of isolation.
Figure 4: The SR-IOV model
1:virtio
Under full virtualization, every guest I/O operation is trapped by kvm; kvm then notifies qemu to emulate the I/O in software, and when the emulation finishes, control goes back through kvm, which returns the I/O result to the guest. The overall path is long and performance is poor.
That is why kvm disk I/O now basically uses the paravirtualized virtio scheme. Under paravirtualization, the guest and Qemu cooperate as front end and back end, passing information through one or more vqueue ring buffers. The back end interacts with the host device directly rather than emulating real hardware in software, and multiple I/O operations can be merged and batched, which reduces the number of guest/host switches and improves I/O performance considerably. (A minimal guest-side sketch follows the driver lists below.)
Figure 5: virtio architecture
1): Front-end drivers implemented in the GuestOS
virtio_blk ------- drivers/block/virtio_blk.c
virtio_net ------- drivers/net/virtio_net.c
virtio_balloon --- drivers/virtio/virtio_balloon.c
virtio_scsi ------ drivers/scsi/virtio_scsi.c
virtio_console --- drivers/char/virtio_console.c
2): Back-end drivers implemented on the qemu side
./hw/block/dataplane/virtio-blk.c
./hw/virtio/virtio-balloon.c
./hw/net/virtio-net.c
./hw/char/virtio-console.c
./hw/scsi/virtio-scsi.c
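Before diving into the disk path, here is a minimal sketch of what a guest-side (front-end) virtio driver does to hand a buffer to the back end. The demo_send_buf function and its parameters are made up for illustration; only the sg_init_one and virtqueue_* calls are the real kernel API used by the drivers listed above.

/*
 * Minimal guest-side sketch: hand one buffer to the back end over a virtqueue.
 * demo_send_buf() is a made-up illustration; virtqueue_add_sgs(),
 * virtqueue_kick_prepare() and virtqueue_notify() are the real kernel API.
 */
#include <linux/scatterlist.h>
#include <linux/virtio.h>

static int demo_send_buf(struct virtqueue *vq, void *buf, unsigned int len)
{
	struct scatterlist sg;
	struct scatterlist *sgs[] = { &sg };
	int err;

	sg_init_one(&sg, buf, len);

	/* Put the buffer into a free descriptor chain; "buf" doubles as the
	 * opaque token that virtqueue_get_buf() returns once the back end is
	 * done with it. */
	err = virtqueue_add_sgs(vq, sgs, 1 /* out */, 0 /* in */, buf, GFP_ATOMIC);
	if (err)
		return err;

	/* Kick the back end unless it has suppressed notifications; this is
	 * what eventually traps out to the host (see virtqueue_notify later). */
	if (virtqueue_kick_prepare(vq))
		virtqueue_notify(vq);

	return 0;
}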
2: Disk I/O virtualization
kvm can create storage pools on the host (default path /etc/libvirt/storage), so local directories or storage provided remotely (scsi, san, nfs, etc.) can be made available to VMs through a pool. Volumes in these pools are mapped into the VM as individual virtual disks (the disk type is specified in the domain xml; for example, a local host volume of any storage type can appear inside the VM as a virtio device (vda, vdb, ...) or a scsi device (sda, sdb, ...)). When the VM performs I/O on these disks, it can either use full virtualization, so the guest is unaware that it is running in a virtualized environment, or use paravirtualization to improve I/O performance.
1:virtio_blk
virtio_blk adds a kernel module in the GuestOS as the front-end driver and talks to the Qemu back end through a virtqueue. This reduces data copies, and because the virtqueue lets multiple I/O operations be merged and submitted in one batch, the number of vm exits drops and overall I/O performance improves.
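For reference, this is the guest-visible request header that the virtio_blk front end places at the head of each descriptor chain (from include/uapi/linux/virtio_blk.h; older headers spell the fields as __u32/__u64):

/* Header placed at the start of every virtio-blk request descriptor chain. */
struct virtio_blk_outhdr {
	__virtio32 type;	/* VIRTIO_BLK_T_IN (read), VIRTIO_BLK_T_OUT (write), ... */
	__virtio32 ioprio;	/* I/O priority */
	__virtio64 sector;	/* starting sector (512-byte units) */
};

/*
 * One request on the virtqueue is a descriptor chain laid out as:
 *   [struct virtio_blk_outhdr] -> [data buffer(s)] -> [one status byte written by the back end]
 */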
2: Tracing a disk I/O operation
Figure 6: virtio_blk processing flow
1): The GuestOS issues a write system call. Taking ext4 as an example, the path eventually reaches ext4_io_submit, which submits the I/O.
2): The submitted I/O (bio) is then processed by the block layer (the loop below is from generic_make_request):
	do {
		struct request_queue *q = bdev_get_queue(bio->bi_bdev);

		if (likely(blk_queue_enter(q, false) == 0)) {
			struct bio_list lower, same;

			/* Create a fresh bio_list for all subordinate requests */
			bio_list_on_stack[1] = bio_list_on_stack[0];
			bio_list_init(&bio_list_on_stack[0]);
			ret = q->make_request_fn(q, bio);

			blk_queue_exit(q);

			/* sort new bios into those for a lower level
			 * and those for the same level
			 */
			bio_list_init(&lower);
			bio_list_init(&same);
			while ((bio = bio_list_pop(&bio_list_on_stack[0])) != NULL)
				if (q == bdev_get_queue(bio->bi_bdev))
					bio_list_add(&same, bio);
				else
					bio_list_add(&lower, bio);
			/* now assemble so we handle the lowest level first */
			bio_list_merge(&bio_list_on_stack[0], &lower);
			bio_list_merge(&bio_list_on_stack[0], &same);
			bio_list_merge(&bio_list_on_stack[0], &bio_list_on_stack[1]);
		} else {
			bio_io_error(bio);
		}
		bio = bio_list_pop(&bio_list_on_stack[0]);
	} while (bio);
3): The make_request_fn here is installed when virtblk_probe sets up the request queue (through blk_queue_make_request / blk-mq queue initialization), and in that make_request path every I/O is packaged into a request.
4): Processing then reaches blk_mq_run_hw_queue, which kicks off dispatching the requests:
void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
{
	__blk_mq_delay_run_hw_queue(hctx, async, 0);
}
5): blk_mq_dispatch_rq_list hands each request to the driver through q->mq_ops->queue_rq (virtio_queue_rq for virtio-blk), which grabs an available buffer in the virtqueue and fills in the request's I/O data:
	do {
		struct blk_mq_queue_data bd;

		rq = list_first_entry(list, struct request, queuelist);
		if (!blk_mq_get_driver_tag(rq, &hctx, false)) {
			if (!queued && reorder_tags_to_front(list))
				continue;

			/*
			 * The initial allocation attempt failed, so we need to
			 * rerun the hardware queue when a tag is freed.
			 */
			if (!blk_mq_dispatch_wait_add(hctx))
				break;

			/*
			 * It's possible that a tag was freed in the window
			 * between the allocation failure and adding the
			 * hardware queue to the wait queue.
			 */
			if (!blk_mq_get_driver_tag(rq, &hctx, false))
				break;
		}

		list_del_init(&rq->queuelist);

		bd.rq = rq;

		/*
		 * Flag last if we have no more requests, or if we have more
		 * but can't assign a driver tag to it.
		 */
		if (list_empty(list))
			bd.last = true;
		else {
			struct request *nxt;

			nxt = list_first_entry(list, struct request, queuelist);
			bd.last = !blk_mq_get_driver_tag(nxt, NULL, false);
		}

		ret = q->mq_ops->queue_rq(hctx, &bd);
		switch (ret) {
		case BLK_MQ_RQ_QUEUE_OK:
			queued++;
			break;
		case BLK_MQ_RQ_QUEUE_BUSY:
			blk_mq_put_driver_tag_hctx(hctx, rq);
			list_add(&rq->queuelist, list);
			__blk_mq_requeue_request(rq);
			break;
		default:
			pr_err("blk-mq: bad return on queue: %d\n", ret);
		case BLK_MQ_RQ_QUEUE_ERROR:
			errors++;
			blk_mq_end_request(rq, -EIO);
			break;
		}

		if (ret == BLK_MQ_RQ_QUEUE_BUSY)
			break;
	} while (!list_empty(list));
6): After the I/O has been placed into the virtqueue buffer, the driver calls virtqueue_notify. The notify callback performs an iowrite (iowrite16(vq->index, (void __iomem *)vq->priv);), and that port write causes a vm exit to the host. vq->priv was set in setup_vq to the offset of VIRTIO_PCI_QUEUE_NOTIFY (vq->priv = (void __force *)vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY). Later, when kvm detects the I/O access and hands control to qemu, qemu uses this offset to find the entry point of the I/O back end (virtio_queue_notify). (The notify callback itself is shown after the snippet below.)
bool virtqueue_notify(struct virtqueue *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	if (unlikely(vq->broken))
		return false;

	/* Prod other side to tell it about changes. */
	if (!vq->notify(_vq)) {
		vq->broken = true;
		return false;
	}
	return true;
}
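For the legacy virtio-pci transport, the vq->notify callback invoked above is vp_notify (drivers/virtio/virtio_pci_common.c), which is exactly the iowrite16 mentioned in step 6:

/* the notify function used when creating a virt queue */
bool vp_notify(struct virtqueue *vq)
{
	/* we write the queue's selector into the notification register to
	 * signal the other end */
	iowrite16(vq->index, (void __iomem *)vq->priv);
	return true;
}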
7): When kvm traps the I/O access, it looks up the VM-exit handler table by exit reason and finds the corresponding handler, handle_io:
static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[EXIT_REASON_EXCEPTION_NMI] = handle_exception,
[EXIT_REASON_EXTERNAL_INTERRUPT] = handle_external_interrupt,
[EXIT_REASON_TRIPLE_FAULT] = handle_triple_fault,
[EXIT_REASON_NMI_WINDOW] = handle_nmi_window,
[EXIT_REASON_IO_INSTRUCTION] = handle_io,
[EXIT_REASON_CR_ACCESS] = handle_cr,
[EXIT_REASON_DR_ACCESS] = handle_dr,
[EXIT_REASON_CPUID] = handle_cpuid,
[EXIT_REASON_MSR_READ] = handle_rdmsr,
[EXIT_REASON_MSR_WRITE] = handle_wrmsr,
[EXIT_REASON_PENDING_INTERRUPT] = handle_interrupt_window,
[EXIT_REASON_HLT] = handle_halt,
[EXIT_REASON_INVD] = handle_invd,
[EXIT_REASON_INVLPG] = handle_invlpg,
[EXIT_REASON_RDPMC] = handle_rdpmc,
[EXIT_REASON_VMCALL] = handle_vmcall,
[EXIT_REASON_VMCLEAR] = handle_vmclear,
[EXIT_REASON_VMLAUNCH] = handle_vmlaunch,
[EXIT_REASON_VMPTRLD] = handle_vmptrld,
[EXIT_REASON_VMPTRST] = handle_vmptrst,
[EXIT_REASON_VMREAD] = handle_vmread,
[EXIT_REASON_VMRESUME] = handle_vmresume,
[EXIT_REASON_VMWRITE] = handle_vmwrite,
[EXIT_REASON_VMOFF] = handle_vmoff,
[EXIT_REASON_VMON] = handle_vmon,
[EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold,
[EXIT_REASON_APIC_ACCESS] = handle_apic_access,
[EXIT_REASON_APIC_WRITE] = handle_apic_write,
[EXIT_REASON_EOI_INDUCED] = handle_apic_eoi_induced,
[EXIT_REASON_WBINVD] = handle_wbinvd,
[EXIT_REASON_XSETBV] = handle_xsetbv,
[EXIT_REASON_TASK_SWITCH] = handle_task_switch,
[EXIT_REASON_MCE_DURING_VMENTRY] = handle_machine_check,
[EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,
[EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig,
[EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
[EXIT_REASON_MWAIT_INSTRUCTION] = handle_mwait,
[EXIT_REASON_MONITOR_TRAP_FLAG] = handle_monitor_trap,
[EXIT_REASON_MONITOR_INSTRUCTION] = handle_monitor,
[EXIT_REASON_INVEPT] = handle_invept,
[EXIT_REASON_INVVPID] = handle_invvpid,
[EXIT_REASON_XSAVES] = handle_xsaves,
[EXIT_REASON_XRSTORS] = handle_xrstors,
[EXIT_REASON_PML_FULL] = handle_pml_full,
[EXIT_REASON_PREEMPTION_TIMER] = handle_preemption_timer,
};
handle_io eventually calls emulator_pio_in_out and sets the exit reason to KVM_EXIT_IO (PIO stands for "Programmed Input/Output", a transfer mode in which the CPU reads and writes data by executing I/O port instructions). In kernel_pio, KVM checks whether an in-kernel handler (for example the ioeventfd registered by vhost_net) claims the port: if so, the vhost kernel thread is woken to do the back-end processing; if not, control returns to qemu, which carries on with the back-end processing. (A paraphrase of kernel_pio follows the snippet below.)
static int emulator_pio_in_out(struct kvm_vcpu *vcpu, int size,
			       unsigned short port, void *val,
			       unsigned int count, bool in)
{
	vcpu->arch.pio.port = port;
	vcpu->arch.pio.in = in;
	vcpu->arch.pio.count = count;
	vcpu->arch.pio.size = size;

	if (!kernel_pio(vcpu, vcpu->arch.pio_data)) {
		vcpu->arch.pio.count = 0;
		return 1;
	}

	vcpu->run->exit_reason = KVM_EXIT_IO;
	vcpu->run->io.direction = in ? KVM_EXIT_IO_IN : KVM_EXIT_IO_OUT;
	vcpu->run->io.size = size;
	vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE;
	vcpu->run->io.count = count;
	vcpu->run->io.port = port;

	return 0;
}
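kernel_pio, called above, is what gives vhost its shortcut: before exiting to user space, KVM offers the port access to in-kernel handlers first. A lightly paraphrased sketch (based on arch/x86/kvm/x86.c, not a verbatim copy):

/*
 * Lightly paraphrased sketch of kernel_pio(): the port access is first offered
 * to in-kernel devices registered on KVM_PIO_BUS. vhost registers an ioeventfd
 * there, so a virtqueue kick becomes an eventfd signal that wakes the vhost
 * worker without ever leaving the kernel; only when no in-kernel handler
 * claims the port does KVM fall back to the KVM_EXIT_IO path handled by qemu.
 */
static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
{
	int r = 0, i;

	for (i = 0; i < vcpu->arch.pio.count; i++) {
		if (vcpu->arch.pio.in)
			r = kvm_io_bus_read(vcpu, KVM_PIO_BUS, vcpu->arch.pio.port,
					    vcpu->arch.pio.size, pd);
		else
			r = kvm_io_bus_write(vcpu, KVM_PIO_BUS, vcpu->arch.pio.port,
					     vcpu->arch.pio.size, pd);
		if (r)
			break;
		pd += vcpu->arch.pio.size;
	}
	return r;	/* 0 means the access was handled inside the kernel */
}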
8): The vcpu then exits the big KVM_RUN loop and returns to qemu user space:
    case KVM_EXIT_IO:
        DPRINTF("handle_io\n");
        /* Called outside BQL */
        kvm_handle_io(run->io.port, attrs,
                      (uint8_t *)run + run->io.data_offset,
                      run->io.direction,
                      run->io.size,
                      run->io.count);
        ret = 0;
        break;
After qemu returns from KVM_RUN and sees that the exit reason is KVM_EXIT_IO, it calls kvm_handle_io, which eventually reaches the back-end entry point virtio_queue_notify.
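kvm_handle_io itself is short; a lightly paraphrased version (from qemu's kvm-all.c) shows how the PIO data recorded in the shared kvm_run page is replayed onto qemu's I/O address space, which routes it to the virtio-pci device model and from there into virtio_queue_notify:

/* Lightly paraphrased from qemu's kvm-all.c: replay the PIO access recorded
 * in the shared kvm_run page onto the I/O address space. */
static void kvm_handle_io(uint16_t port, MemTxAttrs attrs, void *data,
                          int direction, int size, uint32_t count)
{
    int i;
    uint8_t *ptr = data;

    for (i = 0; i < count; i++) {
        address_space_rw(&address_space_io, port, attrs,
                         ptr, size, direction == KVM_EXIT_IO_OUT);
        ptr += size;
    }
}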
9): virtio_queue_notify tells the virtio back end that there is data to process. The back end uses virtqueue_pop (called from virtio_blk_get_request) to extract the buffers that were placed in the virtqueue, merges the requests through virtio_blk_handle_request, and submits them in one go in virtio_blk_submit_multireq.
bool virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
{
    VirtIOBlockReq *req;
    MultiReqBuffer mrb = {};
    bool progress = false;

    aio_context_acquire(blk_get_aio_context(s->blk));
    blk_io_plug(s->blk);

    do {
        virtio_queue_set_notification(vq, 0);

        while ((req = virtio_blk_get_request(s, vq))) {
            progress = true;
            if (virtio_blk_handle_request(req, &mrb)) {
                virtqueue_detach_element(req->vq, &req->elem, 0);
                virtio_blk_free_request(req);
                break;
            }
        }

        virtio_queue_set_notification(vq, 1);
    } while (!virtio_queue_empty(vq));

    if (mrb.num_reqs) {
        virtio_blk_submit_multireq(s->blk, &mrb);
    }

    blk_io_unplug(s->blk);
    aio_context_release(blk_get_aio_context(s->blk));
    return progress;
}
10): Once qemu finishes the processing, it goes back into the KVM_RUN vcpu loop, and kvm pulls the vcpu back into the guest to continue execution.
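For completeness, a condensed paraphrase of the loop this step returns to (qemu's kvm_cpu_exec): as long as the exit handler leaves ret == 0, KVM_RUN is issued again and the vcpu resumes inside the guest.

/* Condensed paraphrase of qemu's kvm_cpu_exec() vcpu loop (not verbatim). */
do {
    run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);   /* enter the guest */

    switch (run->exit_reason) {
    case KVM_EXIT_IO:
        /* kvm_handle_io() as shown in step 8, then ret = 0 */
        break;
    /* ... other exit reasons ... */
    }
} while (ret == 0);                              /* ret == 0: re-enter the guest */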
2: Cache policies for virtualized disk I/O
Figure 7: Virtualized disk I/O path
The figure above shows the disk I/O path under virtualization. When the guest performs an I/O operation, the data first goes through the GuestOS-level vfs, page cache, I/O scheduler and so on before reaching the virtual disk. That virtual disk corresponds to a file on the host, so the data must also pass through the host-level vfs, page cache and I/O scheduler before it is finally written to the physical disk.
On the host, several cache policies are available; currently the qemu/kvm default is writethrough. (A rough mapping of these modes to host open(2) flags is sketched after the list.)
writethrough: data is written straight through to the physical disk. This is relatively safe, but every operation has to hit the physical disk and complete before returning, so performance is poor.
writeback: the I/O can return as soon as the data reaches the page cache. Performance is good, but safety is not: if the machine loses power while data is still sitting in the page cache and has not yet been flushed to the physical disk, that data is lost.
none: the I/O returns once the data reaches the buffer/device cache (the host page cache is bypassed). Performance is better than writethrough, and safety is better than writeback.
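A rough, illustrative mapping of these cache modes to the flags used when opening the backing file on the host; the helper below is not qemu code, just a sketch of the documented semantics:

/*
 * Illustrative sketch only (not qemu code): roughly how the three cache modes
 * map to host open(2) flags according to qemu's documented semantics.
 */
#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <string.h>

static int open_backing_file(const char *path, const char *cache_mode)
{
	int flags = O_RDWR;

	if (strcmp(cache_mode, "none") == 0)
		flags |= O_DIRECT;	/* bypass the host page cache */
	else if (strcmp(cache_mode, "writethrough") == 0)
		flags |= O_DSYNC;	/* use the page cache, but each write reaches the disk before returning */
	/* "writeback": plain buffered I/O; data may linger in the host page cache */

	return open(path, flags);
}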
Intel has implemented a writeback + pass-through scheme that takes both performance and safety into account ----- something to study later.
3: virtio_blk performance acceleration schemes (implemented by Red Hat, not yet merged upstream)
1):Bio-based virtio-blk
2):vhost-blk
Figure 8: vhost-blk architecture
The figure above shows the vhost-blk architecture. The difference between vhost-blk and virtio-blk is that the back end is implemented in a host kernel module (vhost-blk.ko). When vhost-blk.ko is loaded and the device is set up, a kernel thread vhost-<pid> is created, which normally sleeps. When the guest issues I/O it still traps out to the host; while emulating the PIO access in the host (kvm) the vhost thread is woken, the I/O is then performed in the vhost thread, and the completion result is reported to the guest by interrupt injection.
The advantage of this scheme is that the back-end emulation is done directly on the host, with no need to go back to Qemu user space, which reduces the number of kernel/user mode switches.
3: Network I/O virtualization
1: Creating the virtual NIC
2: virtio-net
Similar to virtio-blk, except that two virtqueues are implemented here (one for receive, one for transmit), whereas virtio-blk uses a single virtqueue.
Figure 9: Overall virtio_net architecture
3: vhost-net
The figure below shows the overall vhost_net design. vhost_net does not need qemu as the back end; instead, a vhost_net module is added to the kernel. The vhost_net kernel module shares the virtqueue with the guest, which avoids switching from the host to qemu user space when handling packets.
Figure 10: Overall vhost-net architecture
1): Loading vhost_net.ko creates the /dev/vhost-net device file.
2): When qemu starts, it obtains a descriptor for /dev/vhost-net and, via the VHOST_SET_OWNER ioctl, asks the host to create a vhost-<pid> kernel thread, where pid is the pid of the current qemu thread. (A minimal userspace sketch follows the kernel code below.)
long vhost_dev_set_owner(struct vhost_dev *dev)
{
	struct task_struct *worker;
	int err;

	/* Is there an owner already? */
	if (vhost_dev_has_owner(dev)) {
		err = -EBUSY;
		goto err_mm;
	}

	/* No owner, become one */
	dev->mm = get_task_mm(current);
	worker = kthread_create(vhost_worker, dev, "vhost-%d", current->pid);
	if (IS_ERR(worker)) {
		err = PTR_ERR(worker);
		goto err_worker;
	}

	dev->worker = worker;
	wake_up_process(worker);	/* avoid contributing to loadavg */

	err = vhost_attach_cgroups(dev);
	if (err)
		goto err_cgroup;

	err = vhost_dev_alloc_iovecs(dev);
	if (err)
		goto err_cgroup;

	return 0;
err_cgroup:
	kthread_stop(worker);
	dev->worker = NULL;
err_worker:
	if (dev->mm)
		mmput(dev->mm);
	dev->mm = NULL;
err_mm:
	return err;
}
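On the qemu side, the setup that triggers vhost_dev_set_owner is just an open plus an ioctl on /dev/vhost-net. A minimal userspace sketch (error handling omitted; the function name is made up):

/* Minimal userspace sketch (not qemu code): create a vhost worker thread. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

int vhost_net_setup_demo(void)
{
	int vhost_fd = open("/dev/vhost-net", O_RDWR);

	/* Triggers vhost_dev_set_owner() above, which spawns the
	 * "vhost-<pid>" kthread. */
	ioctl(vhost_fd, VHOST_SET_OWNER);

	/* qemu then shares guest memory and the virtqueues with the kernel
	 * via VHOST_SET_MEM_TABLE / VHOST_SET_VRING_* ioctls. */
	return vhost_fd;
}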
3): The GuestOS adds buffers to the virtqueue and then calls virtqueue_kick, trapping into the host, where the vhost thread is woken up. (The receive-side refill path, try_fill_recv, is shown below; the transmit path in start_xmit places outgoing packets into the TX virtqueue in the same way.)
static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
			  gfp_t gfp)
{
	int err;
	bool oom;

	gfp |= __GFP_COLD;
	do {
		if (vi->mergeable_rx_bufs)
			err = add_recvbuf_mergeable(vi, rq, gfp);
		else if (vi->big_packets)
			err = add_recvbuf_big(vi, rq, gfp);
		else
			err = add_recvbuf_small(vi, rq, gfp);

		oom = err == -ENOMEM;
		if (err)
			break;
	} while (rq->vq->num_free);

	virtqueue_kick(rq->vq);
	return !oom;
}
4): As the vhost worker function shows, when the thread is woken it calls work->fn to handle packet transmission/reception, then puts itself back to sleep. So where does fn get set?
static int vhost_worker(void *data)
{
	struct vhost_dev *dev = data;
	struct vhost_work *work, *work_next;
	struct llist_node *node;
	mm_segment_t oldfs = get_fs();

	set_fs(USER_DS);
	use_mm(dev->mm);

	for (;;) {
		/* mb paired w/ kthread_stop */
		set_current_state(TASK_INTERRUPTIBLE);

		if (kthread_should_stop()) {
			__set_current_state(TASK_RUNNING);
			break;
		}

		node = llist_del_all(&dev->work_list);
		if (!node)
			schedule();

		node = llist_reverse_order(node);
		/* make sure flag is seen after deletion */
		smp_wmb();
		llist_for_each_entry_safe(work, work_next, node, node) {
			clear_bit(VHOST_WORK_QUEUED, &work->flags);
			__set_current_state(TASK_RUNNING);
			work->fn(work);
			if (need_resched())
				schedule();
		}
	}
	unuse_mm(dev->mm);
	set_fs(oldfs);
	return 0;
}
When /dev/vhost-net is opened, vhost_net_open runs; it first assigns a handle_kick function to each of the TX and RX virtqueues:
static int vhost_net_open(struct inode *inode, struct file *f)
{
	struct vhost_net *n;
	struct vhost_dev *dev;
	struct vhost_virtqueue **vqs;
	int i;

	n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_REPEAT);
	if (!n)
		return -ENOMEM;
	vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL);
	if (!vqs) {
		kvfree(n);
		return -ENOMEM;
	}

	dev = &n->dev;
	vqs[VHOST_NET_VQ_TX] = &n->vqs[VHOST_NET_VQ_TX].vq;
	vqs[VHOST_NET_VQ_RX] = &n->vqs[VHOST_NET_VQ_RX].vq;
	n->vqs[VHOST_NET_VQ_TX].vq.handle_kick = handle_tx_kick;
	n->vqs[VHOST_NET_VQ_RX].vq.handle_kick = handle_rx_kick;
	for (i = 0; i < VHOST_NET_VQ_MAX; i++) {
		n->vqs[i].ubufs = NULL;
		n->vqs[i].ubuf_info = NULL;
		n->vqs[i].upend_idx = 0;
		n->vqs[i].done_idx = 0;
		n->vqs[i].vhost_hlen = 0;
		n->vqs[i].sock_hlen = 0;
	}
	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);

	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);

	f->private_data = n;

	return 0;
}
It then runs the vhost device initialization, vhost_dev_init, which eventually calls vhost_work_init to set work->fn to the handle_kick function assigned above. Thus, whenever the vhost thread is woken it runs the handle_kick function (handle_tx_kick or handle_rx_kick). handle_tx_kick pulls the packets out of the vring buffer and sends them out via the socket's sendmsg. (handle_tx_kick itself is shown after the snippet below.)
void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
{
	clear_bit(VHOST_WORK_QUEUED, &work->flags);
	work->fn = fn;
	init_waitqueue_head(&work->done);
}
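And handle_tx_kick itself (drivers/vhost/net.c, lightly paraphrased): it recovers the vhost_net instance from the work item embedded in the virtqueue and runs the transmit path, which pops buffers from the vring and pushes them to the tap socket:

/* Lightly paraphrased from drivers/vhost/net.c. */
static void handle_tx_kick(struct vhost_work *work)
{
	struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue,
						  poll.work);
	struct vhost_net *net = container_of(vq->dev, struct vhost_net, dev);

	handle_tx(net);	/* pop vring buffers and sendmsg() them to the tap/macvtap socket */
}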
5): When a packet is transmitted to the tap device (i.e. traffic destined for the guest), tun_net_xmit is entered. It wakes the vhost thread, which then runs handle_rx_kick; handle_rx_kick fills the packet into the vring buffer and notifies the guest, via interrupt injection (vp_interrupt on the guest side), to pick up the packet.
static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct tun_struct *tun = netdev_priv(dev);
	int txq = skb->queue_mapping;
	struct tun_file *tfile;
	u32 numqueues = 0;

	rcu_read_lock();
	tfile = rcu_dereference(tun->tfiles[txq]);
	numqueues = ACCESS_ONCE(tun->numqueues);

	/* Drop packet if interface is not attached */
	if (txq >= numqueues)
		goto drop;

#ifdef CONFIG_RPS
	if (numqueues == 1 && static_key_false(&rps_needed)) {
		/* Select queue was not called for the skbuff, so we extract the
		 * RPS hash and save it into the flow_table here.
		 */
		__u32 rxhash;

		rxhash = skb_get_hash(skb);
		if (rxhash) {
			struct tun_flow_entry *e;

			e = tun_flow_find(&tun->flows[tun_hashfn(rxhash)],
					  rxhash);
			if (e)
				tun_flow_save_rps_rxhash(e, rxhash);
		}
	}
#endif

	tun_debug(KERN_INFO, tun, "tun_net_xmit %d\n", skb->len);

	BUG_ON(!tfile);

	/* Drop if the filter does not like it.
	 * This is a noop if the filter is disabled.
	 * Filter can be enabled only for the TAP devices. */
	if (!check_filter(&tun->txflt, skb))
		goto drop;

	if (tfile->socket.sk->sk_filter &&
	    sk_filter(tfile->socket.sk, skb))
		goto drop;

	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
		goto drop;

	skb_tx_timestamp(skb);

	/* Orphan the skb - required as we might hang on to it
	 * for indefinite time.
	 */
	skb_orphan(skb);

	nf_reset(skb);

	if (skb_array_produce(&tfile->tx_array, skb))
		goto drop;

	/* Notify and wake up reader process */
	if (tfile->flags & TUN_FASYNC)
		kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
	tfile->socket.sk->sk_data_ready(tfile->socket.sk);

	rcu_read_unlock();
	return NETDEV_TX_OK;

drop:
	this_cpu_inc(tun->pcpu_stats->tx_dropped);
	skb_tx_error(skb);
	kfree_skb(skb);
	rcu_read_unlock();
	return NET_XMIT_DROP;
}
4: vhost_user
The vhost_net mechanism above sends and receives packets entirely in kernel space; compared with plain virtio it reduces the number of kernel/user mode switches and system calls (Qemu has to issue system calls to push data to the tap device). However, since vhost_net still goes through the kernel network stack, it still lags behind user-space stacks such as DPDK. Hence vhost_user was developed: it essentially moves the kernel-side processing of vhost_net into user space and, combined with DPDK, ovs and similar technologies, drives the physical NIC directly from user space to handle packet transmission and reception.
---------------------
Author: zgy666
Source: CSDN
Original article: https://blog.csdn.net/zgy666/article/details/78469142
Copyright notice: this is the blogger's original article; please include a link to the original when reposting.
https://www.cnblogs.com/scottieyuyang/p/6053376.html