A Summary of Linux NICs

Line Rate

Small-packet line rate on a 10 GbE NIC:
64B + 7B (preamble) + 1B (SFD) + 12B (IFG) = 84B on the wire
10 × 10^9 / (84 × 8) = 14,880,952 pps

Large-packet line rate on a 10 GbE NIC:
1518B + 7B (preamble) + 1B (SFD) + 12B (IFG) = 1538B on the wire
10 × 10^9 / (1538 × 8) = 812,743 pps
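The two calculations above generalize to any frame size; a minimal shell sketch:

```shell
# Line rate in packets/s for a 10 Gbit/s link:
# every frame carries 7B preamble + 1B SFD + 12B inter-frame gap of overhead.
for frame in 64 1518; do
  wire=$((frame + 7 + 1 + 12))          # bytes actually occupied on the wire
  pps=$((10000000000 / (wire * 8)))     # link bits/s divided by bits per frame
  echo "${frame}B frame -> ${wire}B on wire -> ${pps} pps"
done
```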

Inspecting NIC Information

# show NIC settings
[root@localhost ~]# ethtool enp7s0f0
Settings for enp7s0f0:
	...
	Speed: 10000Mb/s # link speed
	Duplex: Full
	Port: FIBRE
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: off
	Supports Wake-on: d
	Wake-on: d
	Current message level: 0x00000007 (7)
			       drv probe link
	Link detected: yes
# show driver information
[root@localhost ~]# ethtool -i enp7s0f0
driver: ixgbe # driver name
version: 5.1.0-k-rh7.5 # driver version
firmware-version: 0x8000084b
expansion-rom-version:
bus-info: 0000:07:00.0 # PCI bus address
...
# show offload settings
[root@localhost ~]# ethtool -k enp7s0f0|grep offload
tcp-segmentation-offload: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
...

Inspecting PCI Information

# show PCI device details
[root@localhost ~]# lspci -vvvs 07:00.0
07:00.0 Ethernet controller: Intel Corporation Ethernet Connection X553 10 GbE SFP+ (rev 11)
	...
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at df800000 (64-bit, prefetchable) [size=2M]
	Region 4: Memory at dfa04000 (64-bit, prefetchable) [size=16K]
	Expansion ROM at dfc80000 [disabled] [size=512K]
	...
	# up to 64 MSI-X interrupts supported
	Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00002000
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
			MaxPayload 128 bytes, MaxReadReq 256 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		# PCIe bandwidth; see https://en.wikipedia.org/wiki/PCI_Express
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		...
Bus address format: domain:bus:slot.func (16-bit domain, 8-bit bus, 5-bit device, 3-bit function)
List all PCI devices: lspci -vvv
List all NICs: lspci -vvv|grep Ethernet
NUMA node of a PCI device: cat /sys/bus/pci/devices/0000\:07\:00.0/numa_node
NUMA node of a NIC: cat /sys/class/net/enp7s0f0/device/numa_node
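The address fields can be split apart with plain shell parameter expansion; a sketch using the example NIC's bus-info:

```shell
# Split a PCI address (domain:bus:slot.func) into its four fields.
addr="0000:07:00.0"
domain=${addr%%:*}                  # 16-bit domain
rest=${addr#*:}
bus=${rest%%:*}                     # 8-bit bus
slot=${rest#*:}; slot=${slot%.*}    # 5-bit device
func=${addr##*.}                    # 3-bit function
echo "domain=$domain bus=$bus slot=$slot func=$func"
```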

RSS(Receive Side Scaling)

If there are at most 16 logical CPUs, RSS alone is enough to spread the interrupt load; with more CPUs, combine RSS with RPS.

# show interrupts
[root@localhost ~]# cat /proc/interrupts|egrep 'CPU|enp7s0f0'
           CPU0       CPU1       CPU2       CPU3
 78:        880          0          0          0   PCI-MSI-edge      enp7s0f0-TxRx-0
 79:        862          0          0          0   PCI-MSI-edge      enp7s0f0-TxRx-1
 80:        868          0          0          0   PCI-MSI-edge      enp7s0f0-TxRx-2
 81:        860          0          0          0   PCI-MSI-edge      enp7s0f0-TxRx-3
 82:          2          0          0          0   PCI-MSI-edge      enp7s0f0

# show CPU affinity
[root@localhost ~]# cat /proc/irq/78/smp_affinity
1

# set CPU affinity
[root@localhost ~]# echo 1 > /proc/irq/78/smp_affinity
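Spreading the queue IRQs round-robin over the CPUs is a matter of writing one bit mask per IRQ. This sketch only prints the commands (the IRQ numbers match the `/proc/interrupts` output above; run the writes as root on real hardware):

```shell
# One bit per CPU in the mask: queue i goes to CPU (i mod ncpu).
ncpu=4
i=0
for irq in 78 79 80 81; do
  mask=$(printf '%x' $((1 << (i % ncpu))))
  echo "echo $mask > /proc/irq/$irq/smp_affinity"
  i=$((i + 1))
done
```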

# show the hash indirection table
[root@localhost ~]# ethtool -x enp7s0f0
RX flow hash indirection table for enp7s0f0 with 4 RX ring(s):
    0:      0     1     2     3     0     1     2     3
    8:      0     1     2     3     0     1     2     3
   16:      0     1     2     3     0     1     2     3
   24:      0     1     2     3     0     1     2     3
   32:      0     1     2     3     0     1     2     3
   40:      0     1     2     3     0     1     2     3
   48:      0     1     2     3     0     1     2     3
   56:      0     1     2     3     0     1     2     3
   64:      0     1     2     3     0     1     2     3
   72:      0     1     2     3     0     1     2     3
   80:      0     1     2     3     0     1     2     3
   88:      0     1     2     3     0     1     2     3
   96:      0     1     2     3     0     1     2     3
  104:      0     1     2     3     0     1     2     3
  112:      0     1     2     3     0     1     2     3
  120:      0     1     2     3     0     1     2     3
  ...

# modify the hash indirection table
[root@localhost ~]# ethtool -X enp7s0f0 equal 16
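The indirection table is what turns a hash into a queue: the low 7 bits of the RSS hash index one of the 128 entries. A minimal sketch of the lookup, with a made-up hash value:

```shell
# Hypothetical RSS hash; only its low 7 bits matter for the lookup.
hash=$((0x5ad3))
idx=$((hash & 127))    # index into the 128-entry table
queue=$((idx % 4))     # the table shown above repeats the pattern 0 1 2 3
echo "table index $idx -> queue $queue"
```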

# show the hash input
[root@localhost ~]# ethtool -n enp7s0f0 rx-flow-hash udp4
UDP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA

# modify the hash input
[root@localhost ~]# ethtool -N enp7s0f0 rx-flow-hash udp4 sdfn

RPS(Receive Packet Steering)

# list queues
[root@localhost ~]# ls /sys/class/net/enp7s0f0/queues
rx-0  rx-1  rx-2  rx-3  tx-0  tx-1  tx-2  tx-3

# show CPU affinity
[root@localhost ~]# cat /sys/class/net/enp7s0f0/queues/rx-0/rps_cpus
1

# set CPU affinity
[root@localhost ~]# echo 1 > /sys/class/net/enp7s0f0/queues/rx-0/rps_cpus
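rps_cpus is a hexadecimal CPU bit mask. This sketch builds a mask covering CPUs 4-7 (a hypothetical choice) and prints the write for every rx queue; run the writes as root to apply them:

```shell
# Build the bit mask 11110000b = f0 covering CPUs 4..7.
mask=0
for cpu in 4 5 6 7; do
  mask=$((mask | (1 << cpu)))
done
hex=$(printf '%x' "$mask")
for q in rx-0 rx-1 rx-2 rx-3; do
  echo "echo $hex > /sys/class/net/enp7s0f0/queues/$q/rps_cpus"
done
```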

# show network softirqs
[root@localhost ~]# cat /proc/softirqs|egrep 'CPU|TX|RX'
                    CPU0       CPU1       CPU2       CPU3
      NET_TX:        954       1301          0          0
      NET_RX:      87032          0          0          0

XPS(Transmit Packet Steering)

# list queues
[root@localhost ~]# ls /sys/class/net/enp7s0f0/queues
rx-0  rx-1  rx-2  rx-3  tx-0  tx-1  tx-2  tx-3

# show CPU affinity
[root@localhost ~]# cat /sys/class/net/enp7s0f0/queues/tx-0/xps_cpus
1

# set CPU affinity
[root@localhost ~]# echo 1 > /sys/class/net/enp7s0f0/queues/tx-0/xps_cpus

# show network softirqs
[root@localhost ~]# cat /proc/softirqs|egrep 'CPU|TX|RX'
                    CPU0       CPU1       CPU2       CPU3
      NET_TX:        954       1301          0          0
      NET_RX:      87032          0          0          0

FD(Flow Director)

Both FD and RSS act on the receive path, and FD takes priority over RSS. A typical use of FD is guaranteeing that reply packets land on the same queue that sent the request.

RSS load-balances packets across queues by hashing the five-tuple, but it cannot guarantee that reply packets hash to the same queue. A symmetric hash (one whose value is unchanged when src and dst are swapped) partially solves this, but it breaks on devices that perform NAT, such as load balancers. FD does solve it; see the article MGW——美团点评高性能四层负载均衡 (Meituan-Dianping's high-performance L4 load balancer MGW).

# on/off means FD is supported; [fixed] means it is not
[root@localhost ~]# ethtool -k enp7s0f0|grep ntuple
ntuple-filters: off

# enable FD
[root@localhost ~]# ethtool -K enp7s0f0 ntuple on

# disable FD
[root@localhost ~]# ethtool -K enp7s0f0 ntuple off

# steer UDP flows with destination IP 192.168.0.1 to queue 0
[root@localhost ~]# ethtool -N enp7s0f0 flow-type udp4 dst-ip 192.168.0.1 action 0

Rx/Tx Ring Buffer

# show Rx/Tx ring buffer sizes
[root@localhost ~]# ethtool -g enp7s0f0
Ring parameters for enp7s0f0:
Pre-set maximums:
RX:     4096
RX Mini:    0
RX Jumbo:   0
TX:     4096
Current hardware settings:
RX:     512
RX Mini:    0
RX Jumbo:   0
TX:     512

# set Rx/Tx ring buffer sizes
[root@localhost ~]# ethtool -G enp7s0f0 rx 4096 tx 4096

NIC Multi-Queue

Taking the 82599 as an example: it provides 128 hardware transmit queues (Tx FIFOs) and 128 hardware receive queues (Rx FIFOs); how many of them are actually used is determined mainly by DCB and RSS.

DCB(Data Center Bridging)
Packets are classified into one of several (up to eight) Traffic Classes (TCs). Each TC is associated with a single unique packet buffer. Packets that reside in a specific packet buffer are then routed to one of a set of Rx queues based on their TC value and other considerations such as RSS and virtualization.

RSS(Receive Side Scaling)
RSS assigns to each received packet an RSS index. Packets are routed to one of a set of Rx queues based on their RSS index and other considerations such as DCB and virtualization.

The hardware receive queue index is 7 bits wide: the high 3 bits are determined by DCB and the low 4 bits by RSS, so RSS supports at most 2^4 = 16 queues.
The four combinations:

- No DCB, no RSS: queue 0 is used for all packets
- No DCB, RSS: a set of 16 queues is allocated for RSS
- DCB, no RSS: a single queue is allocated per TC, for a total of eight queues (if the number of TCs is eight) or four queues (if the number of TCs is four)
- DCB, RSS: a packet is assigned to one of 128 queues (8 TCs × 16 RSS) or one of 64 queues (4 TCs × 16 RSS)

As an example, enable DCB and RSS together, with DCB configured for 4 TCs and 16 queues per TC. The upper bank of 64 hardware queues serves the 4 TCs; within TC0, TC1, TC2, and TC3, RSS uses 8, 4, 4, and 8 hardware queues respectively. The lower bank of 64 hardware queues serves the other filters.
A packet first passes through the filters; if one matches, the packet is sent to that filter's queue. Otherwise, the RSS index and TC index are computed and combined into the queue index, which selects the queue.
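The combination step is plain bit arithmetic: per the layout described above, the TC index fills the high 3 bits of the 7-bit queue index and the RSS index fills the low 4 bits (the TC/RSS values here are made up):

```shell
tc=3    # hypothetical traffic class (0..7)
rss=9   # hypothetical RSS index (0..15)
queue=$(( (tc << 4) | rss ))
echo "TC=$tc RSS=$rss -> queue $queue"
```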
