System environment: the processor is an Intel® Core™ i7-4790K CPU @ 4.00GHz.
# cat /etc/issue
Ubuntu 20.04 LTS \n \l
#
# uname -a
Linux flyingshark 5.4.0-31-generic #35-Ubuntu SMP Thu May 7 20:20:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
#
VT-x and VT-d support must be enabled in the BIOS, and the kernel boot parameters "iommu=pt intel_iommu=on" must be added.
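After rebooting, it is worth confirming that the parameters actually took effect. A minimal check, written as a shell function (the file path is parameterized only for testability; on a live system it reads /proc/cmdline):

```shell
# Report whether intel_iommu=on is present in the kernel command line.
# Reads /proc/cmdline by default; the path argument exists only for testing.
check_iommu() {
  local cmdline="${1:-/proc/cmdline}"
  if grep -q 'intel_iommu=on' "$cmdline"; then
    echo "intel_iommu enabled"
  else
    echo "intel_iommu NOT enabled - fix the boot parameters and reboot"
  fi
}
# check_iommu                      # on the target machine
# ls /sys/kernel/iommu_groups/     # non-empty when the IOMMU is active
```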
The DPDK version used is dpdk-stable-20.02.1. The l2fwd program is built with the following commands; on Ubuntu 20.04 the libnuma-dev package must be installed first.
# make config T=x86_64-native-linux-gcc
# make T=x86_64-native-linux-gcc
# export RTE_SDK=`pwd`
# cd examples/l2fwd
# make
As preparation, set the number of hugepages to 64 and mount hugetlbfs.
# echo 64 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# mkdir -p /mnt/huge
# mount -t hugetlbfs nodev /mnt/huge
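The echo and mount above do not survive a reboot. A common way to make both persistent (a sketch; the file name under /etc/sysctl.d is a convention, any name works):

```shell
# Persist the hugepage reservation via sysctl (applied at every boot):
echo 'vm.nr_hugepages = 64' > /etc/sysctl.d/80-hugepages.conf
sysctl -p /etc/sysctl.d/80-hugepages.conf

# Persist the hugetlbfs mount via fstab:
echo 'nodev /mnt/huge hugetlbfs defaults 0 0' >> /etc/fstab
```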
As shown below, Hugepagesize is 2MB; with 64 pages reserved, the total is 128MB.
# cat /proc/meminfo | grep Huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 64
HugePages_Free: 47
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 131072 kB
By default, Ubuntu 20.04 builds the vfio-pci driver into the kernel, so no module needs to be loaded. Check that the driver is present, and adjust the permissions on /dev/vfio:
# ls /sys/bus/pci/drivers/vfio-pci
# chmod a+x /dev/vfio
Finally, bind the network devices to the vfio-pci driver. The system has the following network devices: four X710 10GbE ports and four 82599ES 10GbE ports. To test l2fwd, the two X710 ports 01:00.0 and 01:00.1 will be switched to the VFIO driver.
# lspci -t -v
-[0000:00]-+-00.0 Intel Corporation 4th Gen Core Processor DRAM Controller
+-01.0-[01]--+-00.0 Intel Corporation Ethernet Controller X710 for 10GbE SFP+
| +-00.1 Intel Corporation Ethernet Controller X710 for 10GbE SFP+
| +-00.2 Intel Corporation Ethernet Controller X710 for 10GbE SFP+
| \-00.3 Intel Corporation Ethernet Controller X710 for 10GbE SFP+
+-01.1-[02-06]----00.0-[03-06]--+-00.0-[04]--+-00.0 Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
| | \-00.1 Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
| +-01.0-[05]--+-00.0 Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
| | \-00.1 Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection
| \-08.0-[06]--
+-02.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller
+-03.0 Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller
Bind the two ports to the vfio-pci driver as follows:
# echo "0000:01:00.0" > /sys/bus/pci/drivers/i40e/unbind
# echo "0000:01:00.1" > /sys/bus/pci/drivers/i40e/unbind
# echo "vfio-pci" > /sys/bus/pci/devices/0000:01:00.0/driver_override
# echo "0000:01:00.0" > /sys/bus/pci/drivers/vfio-pci/bind
# echo "\00" > /sys/bus/pci/devices/0000:01:00.0/driver_override
# echo "vfio-pci" > /sys/bus/pci/devices/0000:01:00.1/driver_override
# echo "0000:01:00.1" > /sys/bus/pci/drivers/vfio-pci/bind
# echo "\00" > /sys/bus/pci/devices/0000:01:00.1/driver_override
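The eight echo commands above follow one pattern, so they can be wrapped in a small helper (a sketch; the sysfs root is parameterized only so the logic can be exercised against a mock tree, and writing an empty string to driver_override clears it, equivalent to the "\00" writes above):

```shell
# Wrap the repeated unbind / override / bind sequence in one helper.
# On a real system the sysfs root defaults to /sys; the parameter
# exists only so the function can be tested without hardware.
bind_vfio() {
  local dev="$1" from="$2" root="${3:-/sys}"
  echo "$dev"     > "$root/bus/pci/drivers/$from/unbind"
  echo "vfio-pci" > "$root/bus/pci/devices/$dev/driver_override"
  echo "$dev"     > "$root/bus/pci/drivers/vfio-pci/bind"
  echo ""         > "$root/bus/pci/devices/$dev/driver_override"  # clear override
}
# bind_vfio 0000:01:00.0 i40e
# bind_vfio 0000:01:00.1 i40e
```

DPDK also ships usertools/dpdk-devbind.py, which performs the same unbind/bind sequence: `./usertools/dpdk-devbind.py --bind=vfio-pci 0000:01:00.0 0000:01:00.1`, with `--status` showing the current binding of every NIC.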
Checking the driver shows that 01:00.0 is now using vfio-pci:
# lspci -n -s 0000:01:00.0 -v
01:00.0 0200: 8086:1572 (rev 02)
Subsystem: 8086:0000
Flags: fast devsel, IRQ 16
Memory at f0000000 (64-bit, prefetchable) [size=8M]
Memory at f2800000 (64-bit, prefetchable) [size=32K]
Expansion ROM at f7c80000 [disabled] [size=512K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Capabilities: [70] MSI-X: Enable- Count=129 Masked-
Capabilities: [a0] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Device Serial Number 7d-c8-6f-ff-ff-e0-60-00
Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
Capabilities: [1a0] Transaction Processing Hints
Capabilities: [1b0] Access Control Services
Capabilities: [1d0] Secondary PCI Express
Kernel driver in use: vfio-pci
Kernel modules: i40e
Finally, start l2fwd. The following error occurs: "VFIO group is not viable! Not all devices in IOMMU group bound to VFIO or unbound".
# ./l2fwd -l 0-3 -n 4 -- -q 8 -p 3
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:01:00.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:1572 net_i40e
EAL: 0000:01:00.0 VFIO group is not viable! Not all devices in IOMMU group bound to VFIO or unbound
EAL: Requested device 0000:01:00.0 cannot be used
EAL: PCI device 0000:01:00.1 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:1572 net_i40e
EAL: 0000:01:00.1 VFIO group is not viable! Not all devices in IOMMU group bound to VFIO or unbound
Some research shows that every device in the IOMMU group containing the two vfio-pci interfaces must either be bound to VFIO or left unbound. As shown below, both ports are in group 1.
$ readlink /sys/bus/pci/devices/0000:01:00.0/iommu_group
../../../../kernel/iommu_groups/1
$
$ readlink /sys/bus/pci/devices/0000:01:00.1/iommu_group
../../../../kernel/iommu_groups/1
# ls /dev/vfio/1 -l
crw------- 1 root root 243, 0 May 26 07:18 /dev/vfio/1
The devices in IOMMU group 1 are the four X710 ports (0000:01:00.0, 0000:01:00.1, 0000:01:00.2, 0000:01:00.3), the four 82599 ports (0000:04:00.0, 0000:04:00.1, 0000:05:00.0, 0000:05:00.1), two Intel PCI bridges (0000:00:01.0, 0000:00:01.1), and the ports of a PLX PCIe switch. The topology of these group-1 devices can be seen in the lspci output above.
$ ls /sys/kernel/iommu_groups/1/devices/
0000:00:01.0 0000:01:00.0 0000:01:00.2 0000:02:00.0 0000:03:01.0 0000:04:00.0 0000:05:00.0
0000:00:01.1 0000:01:00.1 0000:01:00.3 0000:03:00.0 0000:03:08.0 0000:04:00.1 0000:05:00.1
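A quick way to audit a group is to list each member device alongside its currently bound driver, which shows exactly which devices still block viability (a sketch; the sysfs root is a parameter only so the logic can be tested offline):

```shell
# Print each device in an IOMMU group with its currently bound driver,
# to spot members that still need rebinding or unbinding.
iommu_group_drivers() {
  local group="$1" root="${2:-/sys}" dev drv
  for dev in "$root/kernel/iommu_groups/$group/devices"/*; do
    if [ -e "$dev/driver" ]; then
      drv=$(basename "$(readlink -f "$dev/driver")")
    else
      drv="(unbound)"
    fi
    echo "$(basename "$dev")  $drv"
  done
}
# iommu_group_drivers 1   # on the live system
```

Note that bridge devices bound to the pcieport driver are whitelisted by VFIO and do not break group viability, which is why only the NICs need rebinding here.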
$
So the four 82599 ports were also switched to the vfio-pci driver (unbinding them, or removing the ixgbe driver, should work as well), and l2fwd was started again.
# ./l2fwd -l 0-3 -n 4 -- -q 8 -p 3
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
...
Checking link status ...done
Port0 Link Up. Speed 10000 Mbps - full-duplex
Port1 Link Up. Speed 10000 Mbps - full-duplex
L2FWD: lcore 1 has nothing to do
L2FWD: lcore 2 has nothing to do
L2FWD: lcore 3 has nothing to do
L2FWD: entering main loop on lcore 0
L2FWD: -- lcoreid=0 portid=0
L2FWD: -- lcoreid=0 portid=1
A traffic tester was used to measure UDP throughput across the two X710 ports with 64-byte packets and bidirectional traffic. Each direction reached 5784 Mbps, a little over half of line rate, with only two cores in use; with multiple queues spread across multiple cores it should reach line rate. For comparison, the kernel i40e driver, with 8 queues distributed evenly across 8 cores, also reached about 50% per direction.
# ./build/app/testpmd -l 0-3 -n 4 -- -i --portmask=0x1 --nb-cores=2
testpmd>
testpmd> show port info all
********************* Infos for port 0 *********************
MAC address: 00:60:E0:6F:C8:7D
Device name: 0000:01:00.0
Driver name: net_i40e
...
Current number of RX queues: 1
Max possible RX queues: 192
...
Current number of TX queues: 1
Max possible TX queues: 192
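The output above shows only one RX/TX queue currently in use per port, far below the 192 the hardware supports. To approach line rate with 64-byte packets, the load can be spread with RSS across several queues and forwarding cores; a sketch of the invocation (the queue and core counts are illustrative, not measured):

```shell
# Run testpmd on both X710 ports with 4 RX/TX queues per port
# and 4 forwarding cores (core 0 stays on the interactive shell).
./build/app/testpmd -l 0-4 -n 4 -- -i --portmask=0x3 \
    --nb-cores=4 --rxq=4 --txq=4
```

Inside the interactive prompt, `start` begins forwarding and `show port stats all` reports the per-port packet rates.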