[R&D Issue Series] e1000 NIC Anomaly

Symptoms:

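On a server board where the master and slave CPUs communicate with each other, running systemctl restart network on the master CPU caused the NIC driver on the slave CPU to report the following errors: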

[49549.895528] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[49820.809561] e1000e: eth0 NIC Link is Down
[49831.436862] e1000e: eth0 NIC Link is Up 10 Mbps Full Duplex, Flow Control: Rx/Tx
[49831.436867] e1000e 0000:00:1f.6 eth0: 10/100 speed: disabling TSO
[49847.586075] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
TDH <87>
TDT <97>
next_to_use <97>
next_to_clean <87>
buffer_info[next_to_clean]:
time_stamp <102f3ce93>
next_to_watch <87>
jiffies <102f40da4>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <7800>
PHY Extended Status <3000>
PCI Status <10>
[49849.586002] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
TDH <87>
TDT <97>
next_to_use <97>
next_to_clean <87>
buffer_info[next_to_clean]:
time_stamp <102f3ce93>
next_to_watch <87>
jiffies <102f41574>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <7800>
PHY Extended Status <3000>
PCI Status <10>
[49849.887810] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[49849.888980] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[49849.890104] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[49849.891214] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[49851.585989] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
TDH <87>
TDT <97>
next_to_use <97>
next_to_clean <87>
buffer_info[next_to_clean]:
time_stamp <102f3ce93>
next_to_watch <87>
jiffies <102f41d44>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <7800>
PHY Extended Status <3000>
PCI Status <10>
[49853.585948] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
TDH <87>
TDT <97>
next_to_use <97>
next_to_clean <87>
buffer_info[next_to_clean]:
time_stamp <102f3ce93>
next_to_watch <87>
jiffies <102f42514>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <7800>
PHY Extended Status <3000>
PCI Status <10>
[49855.585869] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
TDH <87>
TDT <97>
next_to_use <97>
next_to_clean <87>
buffer_info[next_to_clean]:
time_stamp <102f3ce93>
next_to_watch <87>
jiffies <102f42ce4>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <7800>
PHY Extended Status <3000>
PCI Status <10>
[49856.589607] ------------[ cut here ]------------
[49856.589634] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x21b/0x230()
[49856.589637] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
[49856.589638] Modules linked in: iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ntb_netdev(O) ntb_hw_plx86xx(O) ntb_transport(O) ntb(O) dma_hw_plx86xx(O) eapi_drv(O) x86_pkg_temp_thermal i2c_i801 i915 coretemp ip_tables
[49856.589650] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.4.0 #-svn188955
[49856.589652] Hardware name: Default string Default string/SKYBAY, BIOS 0.8.0-DH12 12/15/2018
[49856.589654] ffffffff81ea7075 ffff88046dc03dc0 ffffffff81333518 ffff88046dc03e08
[49856.589657] ffff88046dc03df8 ffffffff81054671 0000000000000000 ffff880436f48000
[49856.589659] 0000000000000001 0000000000000000 ffff880436f48000 ffff88046dc03e58
[49856.589662] Call Trace:
[49856.589664] [] dump_stack+0x44/0x5c
[49856.589671] [] warn_slowpath_common+0x81/0xc0
[49856.589674] [] warn_slowpath_fmt+0x47/0x50
[49856.589677] [] dev_watchdog+0x21b/0x230
[49856.589680] [] ? dev_deactivate_queue.constprop.34+0x60/0x60
[49856.589684] [] call_timer_fn+0x30/0xe0
[49856.589686] [] ? dev_deactivate_queue.constprop.34+0x60/0x60
[49856.589690] [] run_timer_softirq+0x1d2/0x280
[49856.589692] [] __do_softirq+0xc7/0x240
[49856.589695] [] irq_exit+0x86/0x90
[49856.589698] [] smp_apic_timer_interrupt+0x3d/0x50
[49856.589701] [] apic_timer_interrupt+0x7f/0x90
[49856.589702] [] ? cpuidle_enter_state+0xac/0x210
[49856.589708] [] ? cpuidle_enter_state+0x8f/0x210
[49856.589711] [] cpuidle_enter+0x12/0x20
[49856.589714] [] call_cpuidle+0x2d/0x50
[49856.589717] [] ? cpuidle_select+0xe/0x10
[49856.589720] [] cpu_startup_entry+0x20c/0x2c0
[49856.589723] [] rest_init+0x77/0x80
[49856.589726] [] start_kernel+0x435/0x442
[49856.589728] [] ? set_init_arg+0x55/0x55
[49856.589731] [] x86_64_start_reservations+0x2a/0x2c
[49856.589734] [] x86_64_start_kernel+0xea/0xed
[49856.589736] ---[ end trace e4ec7fa8f18d3ca9 ]---
[49856.590327] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
[49860.114975] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[50149.892722] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[50149.893841] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[50149.894981] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[50149.896097] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[50449.870398] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[50449.873351] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[50449.874480] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[50449.875602] IPv6: ADDRCONF(NETDEV_UP): irlan0: link is not ready
[50465.128222] systemd-journald[1589]: /run/log/journal/edc2b61d888c48a5801a7e759bc51f46/system.journal: Allocation limit reached, rotating.
[50465.128227] systemd-journald[1589]: Rotating...
[50465.129108] systemd-journald[1589]: Reserving 181418 entries in hash table.
[50465.133203] systemd-journald[1589]: Vacuuming...
[50465.133278] systemd-journald[1589]: Deleted archived journal /run/log/journal/edc2b61d888c48a5801a7e759bc51f46/system@7796662e0cbb4065b78909299601f32a-00000000000cd4df-0005bd9446c359e9.journal (99.6M).
[50465.133282] systemd-journald[1589]: Vacuuming done, freed 99.6M of archived journals on disk.
[50465.133285] systemd-journald[1589]: Retrying write.
[root@MCU9530_v2 ~]#
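
The hang dumps above can be read a little more closely with some arithmetic. The following is a minimal sketch, not part of the original investigation. It assumes that the bracketed values are hexadecimal, that time_stamp is the jiffies value recorded when the oldest pending descriptor was queued, and that the TX ring holds 256 descriptors (the ring size is a guess and does not affect the result here, since TDT is larger than TDH). The helper names parse_hangs, estimate_hz and analyse are made up for illustration. The tick rate is not assumed; it is estimated from two consecutive dumps.

```python
import re

# Fields copied from the first two hang dumps in the log above.
LOG = """\
[49847.586075] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
TDH <87>
TDT <97>
time_stamp <102f3ce93>
jiffies <102f40da4>
[49849.586002] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
TDH <87>
TDT <97>
time_stamp <102f3ce93>
jiffies <102f41574>
"""

HEADER = re.compile(r"^\[\s*(\d+\.\d+)\].*Detected Hardware Unit Hang:")
FIELD = re.compile(r"^\s*(TDH|TDT|time_stamp|jiffies)\s+<([0-9a-fA-F]+)>")


def parse_hangs(text):
    """Return one dict per hang dump: kernel timestamp plus selected fields."""
    hangs = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            hangs.append({"t": float(header.group(1))})
            continue
        field = FIELD.match(line)
        if field and hangs:
            # Assumption: the driver prints these values in hexadecimal.
            hangs[-1][field.group(1)] = int(field.group(2), 16)
    return hangs


def estimate_hz(first, second):
    """Estimate jiffies per second from two dumps taken a known time apart."""
    return round((second["jiffies"] - first["jiffies"]) / (second["t"] - first["t"]))


def analyse(text, ring_size=256):
    """ring_size is an assumed value; it only matters if the ring has wrapped."""
    first, second = parse_hangs(text)[:2]
    hz = estimate_hz(first, second)
    pending = (first["TDT"] - first["TDH"]) % ring_size
    stuck_s = (first["jiffies"] - first["time_stamp"]) / hz
    return {
        "hz": hz,
        "pending": pending,
        "stuck_s": stuck_s,
        "queued_at": first["t"] - stuck_s,
    }


if __name__ == "__main__":
    print(analyse(LOG))
```

Under those assumptions, the tick rate comes out at 1000 jiffies per second, 16 descriptors are waiting between TDH and TDT, and the oldest one had been pending for about 16.1 s at the first dump. That places its queue time near 49831.44, a few milliseconds after the "NIC Link is Up 10 Mbps" line. This suggests, but does not prove, that transmission stalled right after the link renegotiated at 10 Mbps.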

Analysis
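1) Besides the slave taking more than 3 minutes to register, this issue shows two further symptoms: the previously reported long wait when logging in to the slave CPU from the master over ssh after the password prompt, and the slave CPU's eth0 missing its 192.168.17*.** IP address.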

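2) Configuring eno2 through the web UI restarts the entire network stack on the master CPU, and the slave CPU's eth0 obtains its IP through a DHCP request to the master, so the network between the master and slave nodes is bound to be interrupted for a few seconds.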

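3) Inspection showed that one slave CPU does not send a DHCP request on its own and its NIC driver reports errors, while the other slave CPU behaves normally. The problem reproduces every time; see the log above.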

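4) For now, the way the slave CPU's eth0 obtains its IP has been changed: the IP is fixed after system boot and no longer acquired through a DHCP request.
However, one slave CPU still restarts its NIC driver after the master CPU's network stack is restarted (the vendor needs to step in and analyze how the internal networks of the two slave CPUs differ), and it takes about 1 minute for the network to recover and for service registration to succeed.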
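
One way to confirm the "eth0 is missing its IP" symptom from item 1, or to verify that the fixed address from item 4 is still present after the master restarts its network, is to parse the one-line output of `ip -o -4 addr show`. The following is a rough sketch rather than the tooling actually used in this investigation. The function names are hypothetical, and the sample address 192.0.2.10 is a documentation placeholder, not the real (masked) address of the board.

```python
import re
import subprocess

# Matches lines such as:
# "2: eth0    inet 192.0.2.10/24 brd 192.0.2.255 scope global eth0\  ..."
INET = re.compile(r"^\d+:\s+(\S+)\s+inet\s+(\d+\.\d+\.\d+\.\d+)/(\d+)")


def ipv4_addresses(ip_output, ifname):
    """Return the IPv4 addresses (with prefix length) configured on ifname."""
    addrs = []
    for line in ip_output.splitlines():
        match = INET.match(line)
        if match and match.group(1) == ifname:
            addrs.append(f"{match.group(2)}/{match.group(3)}")
    return addrs


def read_ip_output():
    """Run the iproute2 command; needs a Linux host with `ip` installed."""
    result = subprocess.run(
        ["ip", "-o", "-4", "addr", "show"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Placeholder sample, used so the parser can be tried without a real board.
SAMPLE = (
    "1: lo    inet 127.0.0.1/8 scope host lo\\       valid_lft forever preferred_lft forever\n"
    "2: eth0    inet 192.0.2.10/24 brd 192.0.2.255 scope global eth0\\       valid_lft forever preferred_lft forever\n"
)

if __name__ == "__main__":
    found = ipv4_addresses(read_ip_output(), "eth0")
    print("eth0 IPv4:", found if found else "missing")
```

Running this periodically on the slave CPU while restarting the network on the master would show how long eth0 stays without an address, which is easier to compare between the two slave CPUs than reading dmesg by hand.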
