Delayed outgoing packets causing NFS timeouts

Environment

SUSE Linux Enterprise Server 15 SP2
SUSE Linux Enterprise Server 12 SP5
 

Situation

This problem manifests itself in several ways; the one observed in the field was a Linux NFS client timing out, with the following message logged in the system log:

        nfs: server *HOSTNAME* not responding, still trying

and, after several minutes:

        nfs: server *HOSTNAME* OK

Because of an in-kernel retransmit timer (or another packet being queued), the stuck packet will eventually be sent out, after a delay.

In a tcpdump packet capture, this problem can be identified by repeated send attempts of the same packet (with an identical TSval) spaced a long time apart.
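
The capture check described above can be sketched as a small shell filter. This is only an illustration and not part of the original advisory: the function name find_stalled_resends and the 60-second threshold in the usage line are assumptions, and the parsing assumes text output from `tcpdump -tt -n` in which TCP timestamps appear as `TS val N`.

```shell
# find_stalled_resends THRESHOLD_SECONDS   (hypothetical helper)
# Reads `tcpdump -tt -n` text output on stdin and reports every TCP segment
# whose endpoints, seq range and TS val show up again more than
# THRESHOLD_SECONDS after they were first seen.
find_stalled_resends() {
    awk -v gap="$1" '
    match($0, /seq [0-9]+:[0-9]+/) {
        seq = substr($0, RSTART + 4, RLENGTH - 4)
        # Skip segments that carry no TCP timestamp option.
        if (!match($0, /TS val [0-9]+/)) next
        tsval = substr($0, RSTART + 7, RLENGTH - 7)
        dst = $5
        sub(/:$/, "", dst)
        key = $3 " > " dst " seq " seq " TSval " tsval
        if (!(key in first))
            first[key] = $1            # remember the first time we saw it
        else if ($1 - first[key] > gap)
            printf "%s resent after %.1f s\n", key, $1 - first[key]
    }'
}
```

A possible invocation, assuming the capture was saved to a file named nfs.pcap, is `tcpdump -tt -n -r nfs.pcap tcp | find_stalled_resends 60`. Segments without a `seq A:B` range (pure ACKs) are ignored.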

Resolution

This problem has been reported upstream [3], and a proper fix is still being worked on by the Linux kernel community.

SUSE has released a kernel maintenance update that mitigates this problem by disabling the lockless optimization in the pfifo_fast qdisc (currently the only qdisc that uses this optimization) [4].
The issue is solved in the following kernel versions:

  • SLES15 SP2: 5.3.18-24.61
  • SLES12 SP5: 4.12.14-122.66
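
To compare a running kernel against the versions listed above, something along these lines may help. It is a sketch under assumptions: the helper names version_ge and strip_flavor and the FIXED_VERSION variable are made up for illustration, the comparison relies on the `-V` (version sort) option of GNU sort, and the result is only meaningful on the two products named in this article.

```shell
#!/bin/sh
# version_ge A B (hypothetical helper): succeeds when version A is at or
# above version B according to GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# strip_flavor RELEASE (hypothetical helper): drops a trailing flavor
# suffix such as "-default" from a kernel release string.
strip_flavor() {
    printf '%s\n' "$1" | sed 's/-[a-z][a-z_]*$//'
}

# Fixed version for the installed product; the default below is the
# SLES15 SP2 value, use 4.12.14-122.66 on SLES12 SP5.
fixed=${FIXED_VERSION:-5.3.18-24.61}
running=$(strip_flavor "$(uname -r)")

if version_ge "$running" "$fixed"; then
    echo "kernel $running is at or above $fixed"
else
    echo "kernel $running is older than $fixed"
fi
```

On SLES12 SP5 the script would be run as `FIXED_VERSION=4.12.14-122.66 sh check.sh`, where check.sh is whatever name the script was saved under.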

Alternatively, to address the problem without a kernel update and reboot, switch away from the pfifo_fast qdisc and make the change persist across reboots with the following commands:

  echo 'net.core.default_qdisc = fq_codel' >>/etc/sysctl.conf
  sysctl -w net.core.default_qdisc=fq_codel
  tc qdisc add dev $devname root handle 1: mq
  tc qdisc del dev $devname root

If $devname is not a multiqueue-capable device, use the following commands instead:

  echo 'net.core.default_qdisc = fq_codel' >>/etc/sysctl.conf
  sysctl -w net.core.default_qdisc=fq_codel
  tc qdisc add dev $devname root handle 1: fq_codel
  tc qdisc del dev $devname root
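
The two helpers below are a hedged sketch for choosing between the command sets above and for checking the result afterwards. Both function names are hypothetical. The first assumes that a device exposes one tx-N directory per transmit queue under /sys/class/net/DEVICE/queues, so a count above 1 suggests a multiqueue device; the second assumes that each line of `tc qdisc show` output starts with the word qdisc followed by the qdisc kind.

```shell
# count_tx_queues DEVICE [SYSFS_ROOT]   (hypothetical helper)
# Prints the number of transmit queues the device exposes in sysfs.
# SYSFS_ROOT defaults to /sys and exists only to make the function testable.
count_tx_queues() {
    root=${2:-/sys}
    n=0
    for q in "$root/class/net/$1/queues"/tx-*; do
        [ -e "$q" ] && n=$((n + 1))
    done
    echo "$n"
}

# uses_pfifo_fast   (hypothetical helper)
# Reads `tc qdisc show dev DEVICE` output on stdin and succeeds if any
# listed qdisc is still of kind pfifo_fast.
uses_pfifo_fast() {
    awk '$1 == "qdisc" && $2 == "pfifo_fast" { found = 1 } END { exit !found }'
}
```

For example, `count_tx_queues "$devname"` can be run before picking a command set, and `tc qdisc show dev "$devname" | uses_pfifo_fast && echo "still using pfifo_fast"` afterwards. `sysctl net.core.default_qdisc` shows the default that newly created qdiscs will use.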

Cause

The Linux kernel implements various algorithms, called queuing disciplines (qdiscs), for scheduling outgoing network packets. Starting with Linux kernel 4.16, an optimization allows these algorithms to process packets without acquiring any locks, with the ultimate goal of improving throughput. These changes were implemented upstream in commits [1] and [2], and SUSE has backported those commits to the SLE12-SP5 and SLE15-SP2 codestreams.

However, the lockless optimization has a design flaw which (under certain very specific circumstances) opens a window for a race condition that causes the "last" packet in the queue to get stuck (and not be sent out on the wire) for a potentially unbounded amount of time, causing network stalls.

Additional Information

[1] kernel/git/torvalds/linux.git - Linux kernel source tree
[2] kernel/git/torvalds/linux.git - Linux kernel source tree
[3] Packet gets stuck in NOLOCK pfifo_fast qdisc
[4] https://github.com/openSUSE/kernel-source/commit/1c59b584ef0cc166f6f5c9f8ed6f47e2e811e1c0
[5] https://github.com/openSUSE/kernel-source/commit/3aa0c01fad38360cc9cd840d49bdfdc565e2e718
