Network data flow: first, a look at where a user-space protocol stack sits.
DPDK is used as the example here. (The goal is to obtain raw network data; besides DPDK, raw sockets and netmap can also capture Ethernet frames.)
1 Default data flow
By default, network data passes from the physical NIC through the kernel protocol stack and the VFS before finally reaching the application.
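For contrast, the raw-socket route mentioned above already yields raw Ethernet frames, just still via the kernel path. A minimal sketch using Linux AF_PACKET (requires root; the frame count and buffer size are arbitrary choices for illustration):

#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/if_ether.h>   /* ETH_P_ALL */

int main(void)
{
    /* ETH_P_ALL: receive frames of every protocol, from all interfaces,
     * delivered by the kernel stack (so syscall overhead remains) */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }

    unsigned char frame[2048];
    for (int i = 0; i < 10; i++) {            /* read a few frames and exit */
        ssize_t n = recvfrom(fd, frame, sizeof(frame), 0, NULL, NULL);
        if (n < 0) { perror("recvfrom"); break; }
        /* layout: dst MAC (6) + src MAC (6) + EtherType (2) + payload */
        printf("frame of %zd bytes, ethertype 0x%02x%02x\n",
               n, frame[12], frame[13]);
    }
    close(fd);
    return 0;
}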
2 DPDK
DPDK takes over the NIC. It can feed packets into a user-space protocol stack, or pass them back into the kernel's sk_buff path (e.g. through KNI).
Because of this, a user-space protocol stack built on DPDK can read and write application memory directly, control the network data flow more flexibly, and implement more custom functionality. It also avoids the overhead of system calls and kernel/user context switches, reducing per-packet latency and CPU usage while raising throughput.
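To make "DPDK hands raw frames to user space" concrete, here is a heavily trimmed single-port receive loop in the style of DPDK's skeleton/basicfwd samples. This is a sketch under assumptions, not a complete program: it assumes one NIC is already bound to igb_uio/vfio (port 0), uses one RX and one TX queue, picks arbitrary pool/ring sizes, and omits most error handling.

#include <stdlib.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)       /* attach to devices bound to igb_uio/vfio */
        return 1;

    struct rte_mempool *pool = rte_pktmbuf_pool_create(
        "mbuf_pool", 8192, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (pool == NULL)
        return 1;

    uint16_t port = 0;                      /* first DPDK-owned port */
    struct rte_eth_conf conf = {0};         /* default device config */
    rte_eth_dev_configure(port, 1, 1, &conf);   /* 1 RX queue + 1 TX queue */
    rte_eth_rx_queue_setup(port, 0, 128, rte_eth_dev_socket_id(port), NULL, pool);
    rte_eth_tx_queue_setup(port, 0, 512, rte_eth_dev_socket_id(port), NULL);
    rte_eth_dev_start(port);

    struct rte_mbuf *bufs[BURST];
    for (;;) {
        /* poll raw Ethernet frames straight off the NIC: no syscall per packet */
        uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST);
        for (uint16_t i = 0; i < n; i++) {
            /* a user-space stack would parse rte_pktmbuf_mtod(bufs[i], ...) here */
            rte_pktmbuf_free(bufs[i]);
        }
    }
}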
DPDK build and configuration
Set the environment variables
sudo su
cd into dpdk-stable-19.08.2
export RTE_SDK=/path/to/dpdk-stable-19.08.2/
export RTE_TARGET=x86_64-native-linux-gcc
Run ./usertools/dpdk-setup.sh; some of the options are explained below.
Step 1: Select the DPDK environment to build // pick a target/toolchain to build with
[1] arm64-armada-linuxapp-gcc
[2] arm64-armada-linux-gcc
[3] arm64-armv8a-linuxapp-clang
[4] arm64-armv8a-linuxapp-gcc
[5] arm64-armv8a-linux-clang
[6] arm64-armv8a-linux-gcc
[7] arm64-bluefield-linuxapp-gcc
[8] arm64-bluefield-linux-gcc
[9] arm64-dpaa2-linuxapp-gcc
[10] arm64-dpaa2-linux-gcc
[11] arm64-dpaa-linuxapp-gcc
[12] arm64-dpaa-linux-gcc
[13] arm64-octeontx2-linuxapp-gcc
[14] arm64-octeontx2-linux-gcc
[15] arm64-stingray-linuxapp-gcc
[16] arm64-stingray-linux-gcc
[17] arm64-thunderx2-linuxapp-gcc
[18] arm64-thunderx2-linux-gcc
[19] arm64-thunderx-linuxapp-gcc
[20] arm64-thunderx-linux-gcc
[21] arm64-xgene1-linuxapp-gcc
[22] arm64-xgene1-linux-gcc
[23] arm-armv7a-linuxapp-gcc
[24] arm-armv7a-linux-gcc
[25] i686-native-linuxapp-gcc
[26] i686-native-linuxapp-icc
[27] i686-native-linux-gcc
[28] i686-native-linux-icc
[29] ppc_64-power8-linuxapp-gcc
[30] ppc_64-power8-linux-gcc
[31] x86_64-native-bsdapp-clang
[32] x86_64-native-bsdapp-gcc
[33] x86_64-native-freebsd-clang
[34] x86_64-native-freebsd-gcc
[35] x86_64-native-linuxapp-clang
[36] x86_64-native-linuxapp-gcc
[37] x86_64-native-linuxapp-icc
[38] x86_64-native-linux-clang
[39] x86_64-native-linux-gcc // I choose x86_64-native-linux-gcc here, since my system is 64-bit Ubuntu Server
[40] x86_64-native-linux-icc
[41] x86_x32-native-linuxapp-gcc
[42] x86_x32-native-linux-gcc
Step 2: Setup linux environment
[43] Insert IGB UIO module // Load igb_uio, the userspace I/O (UIO) kernel module shipped with DPDK; it exposes the NIC's PCI resources to user space so that DPDK can drive the device directly.
[44] Insert VFIO module // Load VFIO, which assigns a physical device to user space or a VM for direct access; with IOMMU backing it provides better isolation and security than UIO, without sacrificing I/O performance.
[45] Insert KNI module // Load the Kernel NIC Interface (KNI) module, which supports passing packets between user space and the kernel.
[46] Setup hugepage mappings for non-NUMA systems // reserve hugepages on a non-NUMA system
[47] Setup hugepage mappings for NUMA systems
// A NUMA system is a multiprocessor architecture in which multiple cores and multiple memory banks share a single unified address space (with non-uniform access latency).
// When receiving 10G traffic with only 4 KB pages, page-table lookups and page replacement become frequent and efficiency suffers, so reserving hugepages sized to the workload is well worth it (a hugepage mmap sketch follows the menu below).
[48] Display current Ethernet/Baseband/Crypto device settings
// Display the current Ethernet/baseband/crypto device status: each device's PCI address, the kernel driver currently in use, and which DPDK-compatible drivers it could be bound to instead (see the option 48 output below).
[49] Bind Ethernet/Baseband/Crypto device to IGB UIO module // bind an Ethernet/baseband/crypto device to the IGB UIO module
[50] Bind Ethernet/Baseband/Crypto device to VFIO module // bind an Ethernet/baseband/crypto device to the VFIO module
[51] Setup VFIO permissions // grant permissions on /dev/vfio so the device can be used from a VM or an unprivileged process
Step 3: Run test application for linux environment
[52] Run test application ($RTE_TARGET/app/test)
[53] Run testpmd application in interactive mode ($RTE_TARGET/app/testpmd)
Step 4: Other tools
[54] List hugepage info from /proc/meminfo
Step 5: Uninstall and system cleanup
[55] Unbind devices from IGB UIO or VFIO driver
[56] Remove IGB UIO module
[57] Remove VFIO module
[58] Remove KNI module
[59] Remove hugepage mappings
[60] Exit Script
Option: // enter 39 here and press Enter to build with the environment chosen in Step 1 ([39] x86_64-native-linux-gcc)
You only need to build once. After the build completes, run the Step 2 options as needed; once configured, you can run the test applications in Step 3. Steps 4 and 5 can be used as the situation requires.
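As noted at option 46/47, DPDK's mempools are carved out of hugepage-backed memory. Outside DPDK, Linux also lets a process map such memory directly with MAP_HUGETLB, which fails unless hugepages were reserved, so this sketch doubles as a check that the reservation worked (2 MB page size assumed):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 2UL * 1024 * 1024;   /* one 2 MB hugepage */
    /* MAP_HUGETLB requests hugepage-backed anonymous memory; it fails
     * if no hugepages were reserved (menu option 46/47) */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) { perror("mmap(MAP_HUGETLB)"); return 1; }

    memset(p, 0, len);                /* touch it: one TLB entry now covers 2 MB */
    printf("hugepage-backed mapping at %p\n", p);
    munmap(p, len);
    return 0;
}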
Transcript of the steps when running ./usertools/dpdk-setup.sh
- Enter 43 to set up the UIO module
Option: 43
Unloading any existing DPDK UIO module
Loading uio module
Loading DPDK UIO module
- Enter 44 to set up the VFIO module
Option: 44
Unloading any existing VFIO module
Loading VFIO module
chmod /dev/vfio
OK
- Enter 45 to set up the KNI module
Option: 45
Unloading any existing DPDK KNI module
Loading DPDK KNI module
- Enter 46 to set up hugepages
Option: 46
Removing currently reserved hugepages
Unmounting /mnt/huge and removing directory
Input the number of 1048576kB hugepages
Example: to have 128MB of hugepages available in a 2MB huge page system,
enter '64' to reserve 64 * 2MB pages
Number of pages: 512
Reserving hugepages
Creating /mnt/huge and mounting as hugetlbfs
- Enter 47 to set up hugepages for each NUMA node
Option: 47
Removing currently reserved hugepages
Unmounting /mnt/huge and removing directory
Input the number of 1048576kB hugepages for each node
Example: to have 128MB of hugepages available per node in a 2MB huge page system,
enter '64' to reserve 64 * 2MB pages on each node
Number of pages for node0: 512
Reserving hugepages
Creating /mnt/huge and mounting as hugetlbfs
- Enter 48 to display devices
Network devices using kernel driver
===================================
0000:02:01.0 '82545EM Gigabit Ethernet Controller (Copper) 100f' if=eth2 drv=e1000 unused=igb_uio,vfio-pci
0000:02:06.0 '82545EM Gigabit Ethernet Controller (Copper) 100f' if=eth3 drv=e1000 unused=igb_uio,vfio-pci
0000:03:00.0 'VMXNET3 Ethernet Controller 07b0' if=eth0 drv=vmxnet3 unused=igb_uio,vfio-pci
0000:0b:00.0 'VMXNET3 Ethernet Controller 07b0' if=eth1 drv=vmxnet3 unused=igb_uio,vfio-pci *Active*
No 'Baseband' devices detected
==============================
No 'Crypto' devices detected
============================
No 'Eventdev' devices detected
==============================
No 'Mempool' devices detected
=============================
No 'Compress' devices detected
==============================
No 'Misc (rawdev)' devices detected
===================================
- Enter 49 to change which driver a device is bound to; here: bind to IGB UIO driver: eth0 (entering its PCI address, 0000:03:00.0, also works)
This binding is what lets DPDK take over the NIC.
Option: 49
Network devices using kernel driver
===================================
0000:02:01.0 '82545EM Gigabit Ethernet Controller (Copper) 100f' if=eth2 drv=e1000 unused=igb_uio,vfio-pci
0000:02:06.0 '82545EM Gigabit Ethernet Controller (Copper) 100f' if=eth3 drv=e1000 unused=igb_uio,vfio-pci
0000:03:00.0 'VMXNET3 Ethernet Controller 07b0' if=eth0 drv=vmxnet3 unused=igb_uio,vfio-pci
0000:0b:00.0 'VMXNET3 Ethernet Controller 07b0' if=eth1 drv=vmxnet3 unused=igb_uio,vfio-pci *Active*
No 'Baseband' devices detected
==============================
No 'Crypto' devices detected
============================
No 'Eventdev' devices detected
==============================
No 'Mempool' devices detected
=============================
No 'Compress' devices detected
==============================
No 'Misc (rawdev)' devices detected
===================================
Enter PCI address of device to bind to IGB UIO driver: (enter the PCI address here) 0000:0b:00.0
Warning: routing table indicates that interface 0000:0b:00.0 is active. Not modifying ====> note the warning: the device is currently in use,
in which case you can open another terminal, run sudo ifconfig eth0 down to bring the interface down, and retry
(to inspect NIC info: lspci -k | grep -A 2 -i "Ethernet")
OK
A device that has been bound can be unbound by entering 55, as follows; vmxnet3 is exactly the driver configured in the .vmx file (unbinding):
Enter PCI address of device to unbind: 0000:03:00.0
Enter name of kernel driver to bind the device to: vmxnet3
Notes
With multiple NICs, the eth0/eth1 reported by ifconfig may not map one-to-one onto ethernet0/ethernet1 in the .vmx file.
Testing
Option: 53
Enter hex bitmask of cores to execute testpmd app on
Example: to execute app on cores 0 to 7, enter 0xff
bitmask: 7
Launching app
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:02:01.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:100f net_e1000_em
EAL: PCI device 0000:02:06.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:100f net_e1000_em
EAL: PCI device 0000:03:00.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 15ad:7b0 net_vmxnet3
EAL: PCI device 0000:0b:00.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 15ad:7b0 net_vmxnet3
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=163456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
Configuring Port 0 (socket 0)
Port 0: 00:0C:29:A3:11:BF
Checking link statuses...
Done
testpmd> help
Help is available for the following sections:
help control : Start and stop forwarding.
help display : Displaying port, stats and config information.
help config : Configuration information.
help ports : Configuring ports.
help registers : Reading and setting port registers.
help filters : Filters configuration help.
help traffic_management : Traffic Management commmands.
help devices : Device related cmds.
help all : All of the above sections.
testpmd> help control
Control forwarding:
-------------------
start
Start packet forwarding with current configuration.
start tx_first
Start packet forwarding with current config after sending one burst of packets.
stop
Stop packet forwarding, and display accumulated statistics.
quit
Quit to prompt.
testpmd> start
io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
Logical Core 1 (socket 0) forwards packets on 1 streams:
RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
io packet forwarding packets/burst=32
nb forwarding cores=1 - nb forwarding ports=1
port 0: RX queue number: 1 Tx queue number: 1
Rx offloads=0x0 Tx offloads=0x0
RX queue: 0
RX desc=0 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
RX Offloads=0x0
TX queue: 0
TX desc=0 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX offloads=0x0 - TX RS bit threshold=0
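Conceptually, what start does in io forwarding mode is an RX-to-TX echo loop per stream. A rough sketch of the idea (this is not testpmd's actual code; a single port and queue are assumed, matching the run above):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* io mode: forward each received burst back out unchanged */
static void io_forward(uint16_t port)
{
    struct rte_mbuf *bufs[32];                /* packets/burst=32, as above */
    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, 32);
        if (nb_rx == 0)
            continue;
        uint16_t nb_tx = rte_eth_tx_burst(port, 0, bufs, nb_rx);
        while (nb_tx < nb_rx)                 /* free what the TX queue refused */
            rte_pktmbuf_free(bufs[nb_tx++]);
    }
}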
testpmd> show port info 0
********************* Infos for port 0 *********************
MAC address: 00:0C:29:A3:11:BF
Device name: 0000:03:00.0
Driver name: net_vmxnet3
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: 10000 Mbps
Link duplex: full-duplex
MTU: 1500
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 1
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
strip off
filter off
qinq(extend) off
Supported RSS offload flow types:
ipv4
ipv4-tcp
ipv6
ipv6-tcp
Minimum size of RX buffer: 1646
Maximum configurable length of RX packet: 16384
Current number of RX queues: 1
Max possible RX queues: 16
Max possible number of RXDs per queue: 4096
Min possible number of RXDs per queue: 128
RXDs number alignment: 1
Current number of TX queues: 1
Max possible TX queues: 8
Max possible number of TXDs per queue: 4096
Min possible number of TXDs per queue: 512
TXDs number alignment: 1
Max segment number per packet: 255
Max segment number per MTU/TSO: 16
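An application can query the same information that show port info 0 prints. A sketch against the DPDK 19.08 ethdev API (in 19.08 these getters return void; some later releases changed them to return int):

#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_ether.h>

static void print_port_info(uint16_t port)
{
    struct rte_eth_dev_info info;
    struct rte_ether_addr mac;
    struct rte_eth_link link;

    rte_eth_dev_info_get(port, &info);      /* driver name, queue limits, ... */
    rte_eth_macaddr_get(port, &mac);
    rte_eth_link_get_nowait(port, &link);   /* non-blocking link query */

    printf("driver: %s\n", info.driver_name);
    printf("mac: %02X:%02X:%02X:%02X:%02X:%02X\n",
           mac.addr_bytes[0], mac.addr_bytes[1], mac.addr_bytes[2],
           mac.addr_bytes[3], mac.addr_bytes[4], mac.addr_bytes[5]);
    printf("link: %s, %u Mbps\n",
           link.link_status ? "up" : "down", link.link_speed);
    printf("max rx queues: %u, max tx queues: %u\n",
           info.max_rx_queues, info.max_tx_queues);
}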
This article draws on the C/C++ Linux server advanced-architecture course from <零声教育>: link