PCI passthrough in nested virtualization (RHEL 7)

First, you need an Intel host that supports VT-d and has nested virtualization enabled.
By default, nested virtualization is disabled:

[root@kvm-hypervisor ~]# cat /sys/module/kvm_intel/parameters/nested
N
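
If in doubt whether the host CPU supports VT-x at all, a quick check is to look for the vmx CPU flag:

[root@kvm-hypervisor ~]# grep -wo vmx /proc/cpuinfo | sort -u
vmx
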
  1. To enable nested virtualization:
    [root@kvm-hypervisor ~]# vi /etc/modprobe.d/kvm-nested.conf
    options kvm-intel nested=1
    options kvm-intel enable_shadow_vmcs=1
    options kvm-intel enable_apicv=1
    options kvm-intel ept=1

    Save & exit the file

    [root@kvm-hypervisor ~]# modprobe -r kvm_intel
    [root@kvm-hypervisor ~]# modprobe -a kvm_intel

Now verify whether the nested virtualization feature is enabled:

[root@kvm-hypervisor ~]# cat /sys/module/kvm_intel/parameters/nested
Y
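
The other options from kvm-nested.conf can be checked the same way; each parameter file should normally report Y if the option took effect:

[root@kvm-hypervisor ~]# cat /sys/module/kvm_intel/parameters/ept
[root@kvm-hypervisor ~]# cat /sys/module/kvm_intel/parameters/enable_apicv
[root@kvm-hypervisor ~]# cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs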
  2. The host should have VT-d enabled and "iommu=pt intel_iommu=on" on the kernel command line, as shown in the example below.
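
    For example, on the host (the same procedure used for the L1 guest in step 4 below; the grub.cfg path assumes a BIOS-booted host, use the OVMF path from step 4 for UEFI):

    # vi /etc/default/grub   (append "iommu=pt intel_iommu=on" to GRUB_CMDLINE_LINUX)
    # grub2-mkconfig -o /boot/grub2/grub.cfg
    # reboot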

  3. To enable the L2 guest to use PCI passthrough, configure the L1 guest as below:
<domain>
  ......
  <os>
    <type arch='x86_64' machine='pc-q35-rhelxxx'>hvm</type>
    ......
  </os>
  <features>
    ......
    <ioapic driver='qemu'/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <feature policy='require' name='vmx'/>
  </cpu>
  ......
  <devices>
    ......
    <iommu model='intel'>
      <driver intremap='on' caching_mode='on' iotlb='on'/>
    </iommu>
    ...
    <controller type='pci' index='8' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='254'>
        <node>1</node>
      </target>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x17'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </controller>
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:19:78:c6'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x82' slot='0x0a' function='0x3'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </interface>
    ...
  </devices>
</domain>
  • vmx is needed for nested virtualization (the L1 guest should use a 'host-model' or 'host-passthrough' CPU, or have the vmx feature marked as required);
  • The guest vIOMMU is a regular device in QEMU. Currently only the Q35 platform supports a guest vIOMMU;
  • intremap=[on|off] controls whether the guest vIOMMU supports interrupt remapping. To fully enable vIOMMU functionality, intremap=on is required. Currently, interrupt remapping does not support a full kernel irqchip; only "split" and "off" are supported, which is what <ioapic driver='qemu'/> above provides;
  • Most fully emulated devices (e.g. e1000) should work seamlessly with the Intel vIOMMU. However, some special devices need extra care. These devices are:
     Assigned devices (e.g. vfio-pci)
     Virtio devices (e.g. virtio-net-pci)
  • caching-mode=on is required when devices are assigned alongside the intel-iommu device. The above example assigns the host PCI device 0000:82:0a.3 to the L1 guest;
  • These options make the QEMU command line look like this:
    ......kernel_irqchip=split .... -device intel-iommu,intremap=on,caching-mode=on
  • virtio devices (e.g. the memballoon device) need "iommu_platform=on,ats=on" set on the device and "device-iotlb=on" on the iommu device; in the libvirt XML this corresponds to <driver iommu='on' ats='on'/> on the virtio device and iotlb='on' in the <iommu> driver element (see the sketch after this list);
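
A minimal sketch of what this looks like in the L1 guest XML, using the memballoon device as an example (the same iommu/ats driver attributes apply to other virtio devices such as virtio-net):

<memballoon model='virtio'>
  <driver iommu='on' ats='on'/>
</memballoon>
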
  4. On the L1 guest, also add "iommu=pt intel_iommu=on" to the kernel command line.
    # vim /etc/default/grub  (append "iommu=pt intel_iommu=on" to GRUB_CMDLINE_LINUX)

    if you use SeaBIOS:

    # grub2-mkconfig -o /boot/grub2/grub.cfg

    if you use OVMF:

    # grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

    Reboot the L1 guest, then on the L1 guest check that the environment is OK:
    1). The kvm device is present; otherwise, revisit the 'enable nested virtualization' step (step 1).

    # ls -al /dev/kvm
    crw-rw-rw-. 1 root kvm 10, 232 Jul  3 14:30 /dev/kvm
    # lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                10
    On-line CPU(s) list:   0-9
    Thread(s) per core:    1
    Core(s) per socket:    1
    Socket(s):             10
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 63
    Model name:            Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
    Stepping:              2
    CPU MHz:               2397.222
    BogoMIPS:              4794.44
    Virtualization:        VT-x
    Hypervisor vendor:     KVM
    Virtualization type:   full
    ......

2). Check that the vIOMMU is enabled

# dmesg  | grep -i DMAR
[    0.000000] ACPI: DMAR 0x000000007FFE2541 000048 (v01 BOCHS  BXPCDMAR 00000001 BXPC 00000001)
[    0.000000] DMAR: IOMMU enabled
[    0.203737] DMAR: Host address width 39
[    0.203739] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[    0.203776] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 12008c22260206 ecap f02
[    2.910862] DMAR: No RMRR found
[    2.910863] DMAR: No ATSR found
[    2.914870] DMAR: dmar0: Using Queued invalidation
[    2.914924] DMAR: Setting RMRR:
[    2.914926] DMAR: Prepare 0-16MiB unity mapping for LPC
[    2.915039] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[    2.915140] DMAR: Intel(R) Virtualization Technology for Directed I/O

Make sure the "DMAR: Intel(R) Virtualization Technology for Directed I/O" line is there; if it is missing, something went wrong. Do not be misled by the earlier "DMAR: IOMMU enabled" line, which merely says the kernel saw the "intel_iommu=on" command line option.

3). The IOMMU should also have registered the PCI devices into various groups

# dmesg  | grep -i iommu  |grep device
[    2.915212] iommu: Adding device 0000:00:00.0 to group 0
[    2.915226] iommu: Adding device 0000:00:01.0 to group 1
...snip...
[    5.588723] iommu: Adding device 0000:b5:00.0 to group 14
[    5.588737] iommu: Adding device 0000:b6:00.0 to group 15
[    5.588751] iommu: Adding device 0000:b7:00.0 to group 16

Now you can assign the three interfaces to the L2 guest, for example as sketched below.
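
For example, one of them could be handed to the L2 guest with a hostdev-type interface and virsh attach-device, run on the L1 guest. The domain name l2-guest is a placeholder, and the PCI address 0000:b5:00.0 is taken from the dmesg output above:

# cat hostdev-net.xml
<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0000' bus='0xb5' slot='0x00' function='0x0'/>
  </source>
</interface>
# virsh attach-device l2-guest hostdev-net.xml --config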

The above steps are expected to work well, but in practice some devices end up sharing the same IOMMU group (see the check below). How to place such devices into separate IOMMU groups is left as an open question.
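
To see which devices ended up in which group (devices in the same group can only be assigned together), the groups can be listed from sysfs:

# find /sys/kernel/iommu_groups/ -type l | sort
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
...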

Reference:
https://www.linuxtechi.com/enable-nested-virtualization-kvm-centos-7-rhel-7/
https://www.linux-kvm.org/page/Nested_Guests
https://www.redhat.com/en/blog/inception-how-usable-are-nested-kvm-guests
https://www.berrange.com/posts/2017/02/16/setting-up-a-nested-kvm-guest-for-developing-testing-pci-device-assignment-with-numa/
https://wiki.qemu.org/Features/VT-d

Reposted from: https://blog.51cto.com/11527071/2135675
