Linux Networking: A Case of Abnormal NIC Throughput (1)

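1. Preface

Given the limits of the author's ability, this article may contain errors. The author accepts no liability for any loss readers may suffer as a result.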

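2. Problem Description

A 1000 Mbps PHY chip was brought up on a TI AM335x platform. Traffic was generated with iperf3, and the TCP bandwidth test results were as follows: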

# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.201, port 55944
[  5] local 192.168.1.88 port 5201 connected to 192.168.1.201 port 55945
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  51.5 MBytes   431 Mbits/sec                  
[  5]   1.00-2.00   sec  53.0 MBytes   444 Mbits/sec                  
[  5]   2.00-3.00   sec  52.1 MBytes   438 Mbits/sec                  
[  5]   3.00-4.00   sec  52.6 MBytes   442 Mbits/sec                  
[  5]   4.00-5.00   sec  53.0 MBytes   443 Mbits/sec                  
[  5]   5.00-6.00   sec  52.4 MBytes   441 Mbits/sec                  
[  5]   6.00-7.00   sec  52.0 MBytes   435 Mbits/sec                  
[  5]   7.00-8.00   sec  52.1 MBytes   438 Mbits/sec                  
[  5]   8.00-9.00   sec  52.2 MBytes   439 Mbits/sec                  
[  5]   9.00-10.00  sec  53.0 MBytes   444 Mbits/sec                  
[  5]  10.00-11.00  sec  52.9 MBytes   442 Mbits/sec                  
[  5]  11.00-12.00  sec  52.9 MBytes   445 Mbits/sec                  
[  5]  12.00-13.00  sec  52.4 MBytes   439 Mbits/sec                  
[  5]  13.00-14.00  sec  52.6 MBytes   441 Mbits/sec                  
[  5]  14.00-15.00  sec  51.6 MBytes   433 Mbits/sec                  
[  5]  15.00-16.00  sec  52.5 MBytes   442 Mbits/sec                  
[  5]  16.00-17.00  sec  52.2 MBytes   438 Mbits/sec                  
[  5]  17.00-18.00  sec  52.1 MBytes   437 Mbits/sec                  
[  5]  18.00-19.00  sec  52.1 MBytes   438 Mbits/sec                  
[  5]  19.00-19.54  sec  27.8 MBytes   431 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-19.54  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-19.54  sec  1023 MBytes   439 Mbits/sec                  receiver
-----------------------------------------------------------
# iperf3 -c 192.168.1.201 -p 5201 -n 2G
Connecting to host 192.168.1.201, port 5201
[  4] local 192.168.1.88 port 45292 connected to 192.168.1.201 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.02   sec  45.0 MBytes   371 Mbits/sec    0    130 KBytes       
[  4]   1.02-2.01   sec  43.8 MBytes   370 Mbits/sec    0    130 KBytes       
[  4]   2.01-3.02   sec  45.0 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]   3.02-4.01   sec  43.8 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]   4.01-5.02   sec  45.0 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]   5.02-6.01   sec  43.8 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]   6.01-7.02   sec  45.0 MBytes   374 Mbits/sec    0    130 KBytes       
[  4]   7.02-8.00   sec  43.8 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]   8.00-9.01   sec  45.0 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]   9.01-10.03  sec  45.0 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  10.03-11.01  sec  43.8 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  11.01-12.02  sec  45.0 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  12.02-13.00  sec  43.8 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  13.00-14.02  sec  45.0 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  14.02-15.00  sec  43.8 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  15.00-16.02  sec  45.0 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  16.02-17.00  sec  43.8 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  17.00-18.02  sec  45.0 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  18.02-19.00  sec  43.8 MBytes   371 Mbits/sec    0    130 KBytes       
[  4]  19.00-20.01  sec  45.0 MBytes   374 Mbits/sec    0    130 KBytes       
[  4]  20.01-21.03  sec  45.0 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  21.03-22.01  sec  43.8 MBytes   374 Mbits/sec    0    130 KBytes       
[  4]  22.01-23.02  sec  45.0 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  23.02-24.01  sec  43.8 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  24.01-25.03  sec  45.0 MBytes   371 Mbits/sec    0    130 KBytes       
[  4]  25.03-26.01  sec  43.8 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  26.01-27.00  sec  43.8 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  27.00-28.01  sec  45.0 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  28.01-29.00  sec  43.8 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  29.00-30.01  sec  45.0 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  30.01-31.00  sec  43.8 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  31.00-32.02  sec  45.0 MBytes   371 Mbits/sec    0    130 KBytes       
[  4]  32.02-33.01  sec  43.8 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  33.01-34.02  sec  45.0 MBytes   374 Mbits/sec    0    130 KBytes       
[  4]  34.02-35.03  sec  45.0 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  35.03-36.01  sec  43.8 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  36.01-37.03  sec  45.0 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  37.03-38.02  sec  43.8 MBytes   371 Mbits/sec    0    130 KBytes       
[  4]  38.02-39.00  sec  43.8 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  39.00-40.01  sec  45.0 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  40.01-41.02  sec  45.0 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  41.02-42.01  sec  43.8 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  42.01-43.03  sec  45.0 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  43.03-44.02  sec  43.8 MBytes   371 Mbits/sec    0    130 KBytes       
[  4]  44.02-45.00  sec  43.8 MBytes   373 Mbits/sec    0    130 KBytes       
[  4]  45.00-46.02  sec  45.0 MBytes   372 Mbits/sec    0    130 KBytes       
[  4]  46.02-46.16  sec  6.25 MBytes   368 Mbits/sec    0    130 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-46.16  sec  2.00 GBytes   372 Mbits/sec    0             sender
[  4]   0.00-46.16  sec  2.00 GBytes   372 Mbits/sec                  receiver

iperf Done.

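As shown above, in the TCP bandwidth test the average bandwidth is 439 Mbits/sec in the receive (RX) direction and 372 Mbits/sec in the transmit (TX) direction. Both are far below 1000 Mbps.
In the UDP bandwidth test, with iperf3 on the TI AM335x side running as the server, an extremely high packet loss rate was observed.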
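
The averages reported by iperf3 can be cross-checked from the two summary lines of the TCP tests. The following is a minimal Python sketch, assuming iperf3's usual unit convention (MBytes and GBytes are binary multiples, Mbits/sec is decimal). The helper name `mbps` and the constant names are made up for illustration.

```python
# Assumption: iperf3 reports transfer sizes in binary units
# (1 MByte = 2**20 bytes, 1 GByte = 2**30 bytes) and rates in decimal Mbits/sec.
MIB = 2**20
GIB = 2**30
LINK_MBPS = 1000.0  # nominal rate of the PHY, from the text


def mbps(transferred_bytes: float, seconds: float) -> float:
    """Average rate in Mbits/sec for a transfer of the given size and duration."""
    return transferred_bytes * 8 / seconds / 1e6


# Figures copied from the summary lines of the two TCP runs.
rx = mbps(1023 * MIB, 19.54)   # board as server (receive direction)
tx = mbps(2.00 * GIB, 46.16)   # board as client (transmit direction)

for name, rate in (("RX", rx), ("TX", tx)):
    share = rate / LINK_MBPS * 100
    print(f"{name}: {rate:.1f} Mbits/sec, {share:.1f}% of the nominal link rate")
```

Both directions come out at well under half of the nominal 1000 Mbps, which matches the averages iperf3 printed.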

3. Brief Analysis

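First, some background: the TI AM335x CPU is single-core with a maximum frequency of 1 GHz, and the system has 256 MB of RAM. During testing, all programs that were irrelevant or might interfere with the test were removed. Hardware-related problems were ruled out first: the network cable, transformer, resistors and capacitors, and power-up and reset timing can all cause network communication anomalies, but none of them applies to the case in this article. With hardware ruled out, what remains is software. During the TCP bandwidth test (whether running in server or client mode), the basic state of the system was first observed with top: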

# top
Mem: 25632K used, 220844K free, 180K shrd, 0K buff, 1528K cached
CPU:   0% usr  63% sys   0% nic   0% idle   0% io   0% irq  36% sirq
Load average: 0.45 0.17 0.06 3/51 133
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
  132     1 root     R     2084   1%  97% iperf3 -s -D
    9     2 root     SW       0   0%   2% [ksoftirqd/0]
   10     2 root     RW       0   0%   1% [rcu_preempt]
  133   126 root     R     2076   1%   0% top
   16     2 root     IW       0   0%   0% [kworker/0:1-eve]
    7     2 root     IW       0   0%   0% [kworker/u2:0-ev]
  126     1 root     S     2076   1%   0% -sh
  124     1 root     S     2072   1%   0% inetd
    1     0 root     S     1976   1%   0% init
   71     1 root     S     1976   1%   0% /sbin/syslogd -n
   72     1 root     S     1976   1%   0% /sbin/klogd -n
   45     2 root     IW       0   0%   0% [kworker/u2:1-ev]
   13     2 root     SW       0   0%   0% [kdevtmpfs]
   56     2 root     SW       0   0%   0% [ubi_bgt0d]
   57     2 root     SW       0   0%   0% [ubifs_bgt0_0]
    2     0 root     SW       0   0%   0% [kthreadd]
    3     2 root     IW<      0   0%   0% [rcu_gp]
    4     2 root     IW<      0   0%   0% [rcu_par_gp]
    5     2 root     IW       0   0%   0% [kworker/0:0-pm]

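As shown above, there is no pressure on RAM, but the CPU usage of the iperf3 process has reached 97%. Looking further at the CPU line:

CPU:   0% usr  63% sys   0% nic   0% idle   0% io   0% irq  36% sirq

we can see that the CPU spends 63% of its time in kernel space (sys) and 36% in softirq, while io accounts for 0%. Packet transmission and reception in the network subsystem both go through softirq processing, so part of the time is spent in softirq and the rest in the Linux network protocol stack. Further ftrace analysis of the data path through the protocol stack confirmed this. For various reasons, the details of that verification could not be presented systematically in this article.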
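
To make the reading of the CPU line explicit, here is a small Python sketch that splits a BusyBox top summary line into its fields and adds up the kernel-side share. The function name `parse_top_cpu` is hypothetical, and the regular expression assumes the line layout shown above.

```python
import re


def parse_top_cpu(line: str) -> dict:
    """Turn a BusyBox top 'CPU:' summary line into {field_name: percent}."""
    # Each field looks like '63% sys'; capture the number and the label.
    return {name: int(pct) for pct, name in re.findall(r"(\d+)%\s+(\w+)", line)}


# The CPU line copied from the top output above.
line = "CPU:   0% usr  63% sys   0% nic   0% idle   0% io   0% irq  36% sirq"
cpu = parse_top_cpu(line)

# Time spent in the kernel: system calls plus hard and soft interrupts.
kernel_share = cpu["sys"] + cpu["irq"] + cpu["sirq"]

print(f"user: {cpu['usr']}%  kernel (sys+irq+sirq): {kernel_share}%  idle: {cpu['idle']}%")
```

With no idle time left and almost all of it spent on the kernel side, the single core is saturated while the test runs.
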
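The MAC and PHY drivers were also examined. The PHY was checked through the readable debug registers provided by the vendor: reading the debug information after network traffic revealed no problems. For the MAC, besides reviewing its driver code, the statistics after traffic were dumped with ethtool, as follows (results from only one test run are shown):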

# ethtool -S eth0
NIC statistics:
     Good Rx Frames: 1108831
     Broadcast Rx Frames: 63
     Multicast Rx Frames: 2762
     Pause Rx Frames: 0
     Rx CRC Errors: 0
     Rx Align/Code Errors: 0
     Oversize Rx Frames: 0
     Rx Jabbers: 0
     Undersize (Short) Rx Frames: 0
     Rx Fragments: 0
     Rx Octets: 1141776072
     Good Tx Frames: 1104727
     Broadcast Tx Frames: 2
     Multicast Tx Frames: 20
     Pause Tx Frames: 0
     Deferred Tx Frames: 0
     Collisions: 0
     Single Collision Tx Frames: 0
     Multiple Collision Tx Frames: 0
     Excessive Collisions: 0
     Late Collisions: 0
     Tx Underrun: 0
     Carrier Sense Errors: 0
     Tx Octets: 1141221834
     Rx + Tx 64 Octet Frames: 370190
     Rx + Tx 65-127 Octet Frames: 369612
     Rx + Tx 128-255 Octet Frames: 198
     Rx + Tx 256-511 Octet Frames: 34
     Rx + Tx 512-1023 Octet Frames: 6
     Rx + Tx 1024-Up Octet Frames: 1473518
     Net Octets: 2282997906
     Rx Start of Frame Overruns: 0
     Rx Middle of Frame Overruns: 0
     Rx DMA Overruns: 0
     Rx DMA chan 0: head_enqueue: 1
     Rx DMA chan 0: tail_enqueue: 1106205
     Rx DMA chan 0: pad_enqueue: 0
     Rx DMA chan 0: misqueued: 0
     Rx DMA chan 0: desc_alloc_fail: 0
     Rx DMA chan 0: pad_alloc_fail: 0
     Rx DMA chan 0: runt_receive_buf: 0
     Rx DMA chan 0: runt_transmit_bu: 0
     Rx DMA chan 0: empty_dequeue: 0
     Rx DMA chan 0: busy_dequeue: 1071101
     Rx DMA chan 0: good_dequeue: 1106078
     Rx DMA chan 0: requeue: 0
     Rx DMA chan 0: teardown_dequeue: 0
     Tx DMA chan 0: head_enqueue: 369476
     Tx DMA chan 0: tail_enqueue: 735251
     Tx DMA chan 0: pad_enqueue: 0
     Tx DMA chan 0: misqueued: 17
     Tx DMA chan 0: desc_alloc_fail: 0
     Tx DMA chan 0: pad_alloc_fail: 0
     Tx DMA chan 0: runt_receive_buf: 0
     Tx DMA chan 0: runt_transmit_bu: 368846
     Tx DMA chan 0: empty_dequeue: 369476
     Tx DMA chan 0: busy_dequeue: 447896
     Tx DMA chan 0: good_dequeue: 1104727
     Tx DMA chan 0: requeue: 3
     Tx DMA chan 0: teardown_dequeue: 0

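Nothing problematic shows up in the statistics above either. The output differs between MAC drivers, and the exact meaning of each counter has to be read together with the driver code.
After this series of checks, the problem was finally attributed to insufficient CPU resources. On a single-core system, multi-queue NICs, RPS, RFS, and the other built-in kernel optimizations that spread packet processing across cores bring no benefit, because there is only one CPU core.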
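
Scanning a long counter dump by eye is error-prone, so a check like this can be scripted. The Python sketch below parses an excerpt of the `ethtool -S` output shown above and lists any nonzero counter whose name suggests an error. The keyword list `ERROR_HINTS` is an assumption made for illustration, not something defined by the driver; counters it does not match (for example `misqueued`) still need to be interpreted against the driver code.

```python
# Excerpt copied from the ethtool -S output above.
SAMPLE = """\
NIC statistics:
     Good Rx Frames: 1108831
     Rx CRC Errors: 0
     Rx Align/Code Errors: 0
     Rx Octets: 1141776072
     Good Tx Frames: 1104727
     Collisions: 0
     Tx Underrun: 0
     Carrier Sense Errors: 0
     Tx Octets: 1141221834
     Rx DMA Overruns: 0
     Rx DMA chan 0: desc_alloc_fail: 0
     Tx DMA chan 0: misqueued: 17
"""

# Assumption: substrings that mark a counter as error-like in this output.
ERROR_HINTS = ("error", "overrun", "underrun", "collision", "jabber", "alloc_fail")


def parse_stats(text: str) -> dict:
    """Parse 'name: value' lines; names may themselves contain colons."""
    stats = {}
    for raw in text.splitlines():
        # Split on the last colon so 'Rx DMA chan 0: misqueued' stays one name.
        name, sep, value = raw.strip().rpartition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


stats = parse_stats(SAMPLE)
suspicious = {
    name: count
    for name, count in stats.items()
    if count and any(hint in name.lower() for hint in ERROR_HINTS)
}
print("nonzero error-like counters:", suspicious or "none")
```

On a target board the same function could be fed the captured output of `ethtool -S eth0` instead of the pasted excerpt.
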
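
A rough cycle budget illustrates why the CPU is the limit. The Python sketch below is back-of-the-envelope arithmetic under two assumptions: the core really runs at its 1 GHz maximum during the test, and essentially all cycles go to moving network data (top showed 0% idle). The function name `cycles_per_byte` is made up for illustration.

```python
CPU_HZ = 1e9  # assumption: the core runs at its 1 GHz maximum during the test


def cycles_per_byte(rate_mbps: float, cpu_hz: float = CPU_HZ) -> float:
    """CPU cycles available per payload byte at a given throughput."""
    bytes_per_second = rate_mbps * 1e6 / 8
    return cpu_hz / bytes_per_second


# Measured TCP averages from the iperf3 runs, plus the nominal link rate.
for label, rate in (("RX measured", 439), ("TX measured", 372), ("line rate", 1000)):
    print(f"{label:12s} {rate:5d} Mbits/sec -> {cycles_per_byte(rate):5.1f} cycles/byte")

# How much cheaper per-byte processing would have to become to reach line rate.
print(f"required speedup: RX x{1000 / 439:.2f}, TX x{1000 / 372:.2f}")
```

Under these assumptions the stack currently spends roughly 18 to 22 cycles per byte, while line rate would leave only 8, so the per-byte cost would have to fall by more than half on a core that is already fully busy.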
