https://access.redhat.com/solutions/2217521
SOLUTION VERIFIED - Updated August 13, 2018 12:45
Environment
- Red Hat Enterprise Linux (all versions)
- Bonding or Teaming
- Large streaming TCP traffic such as NFS, Samba/CIFS, iSCSI, rsync over SSH/SCP, and backups
Issue
- What is the best bonding mode for TCP traffic such as NFS and Samba/CIFS?
- NFS repeatedly logs nfs: server not responding, still trying when no network issue is present
- A packet capture displays many TCP retransmissions, TCP out-of-order segments, and RPC retransmissions when there should be no reason for them
Resolution
Use a bonding mode which guarantees in-order delivery of TCP traffic, such as:
- Bonding Mode 1 (active-backup)
- Bonding Mode 2 (balance-xor)
- Bonding Mode 4 (802.3ad, aka LACP)
- Bonding Mode 5 (balance-tlb) with tlb_dynamic_lb=0
- Bonding Mode 6 (balance-alb) with tlb_dynamic_lb=0
Note that Bonding Mode 2 (balance-xor) requires an EtherChannel or similar configured on the switch, and Mode 4 (802.3ad) requires an EtherChannel with LACP on the switch. Bonding Mode 1 (active-backup) requires no switch configuration.
Bonding Modes 5 (balance-tlb) and 6 (balance-alb) also do not require switch configuration. Mode 5 has no capability to balance return traffic coming back into the bond. Mode 6 balances transmitted traffic by intercepting ARP requests, so it may not be suitable for all situations, such as where traffic mostly passes through a default gateway. A minimal configuration sketch is included at the end of this section.
For advice on configuring bonding, refer to How do I configure a bonding device on Red Hat Enterprise Linux (RHEL)?
For advice on picking a specific hash policy for your traffic, refer to Why are all interfaces not used in bonding Mode 2 or Mode 4?
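As an illustration only (not a full procedure; the device and interface names bond0, eth0 and eth1 are placeholders, and option values such as miimon=100 are assumptions), an active-backup bond could be created with NetworkManager roughly as follows:

    # Create the bond device in Mode 1 (active-backup); no switch configuration is needed.
    nmcli connection add type bond con-name bond0 ifname bond0 \
        bond.options "mode=active-backup,miimon=100"

    # Attach two physical interfaces as ports of the bond.
    nmcli connection add type ethernet con-name bond0-port1 ifname eth0 master bond0
    nmcli connection add type ethernet con-name bond0-port2 ifname eth1 master bond0

    # Activate the bond.
    nmcli connection up bond0

For Mode 5 or Mode 6, the options string would instead be something like "mode=balance-tlb,miimon=100,tlb_dynamic_lb=0", subject to kernel support for the tlb_dynamic_lb option.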
Root Cause
The following bonding modes:
- Bonding Mode 0 (round-robin)
- Bonding Mode 3 (broadcast)
- Bonding Mode 5 (balance-tlb) with tlb_dynamic_lb=1
- Bonding Mode 6 (balance-alb) with tlb_dynamic_lb=1
Do not guarantee in-order delivery of TCP streams, as each packet of a stream may be transmitted down a different slave, and no switch guarantees that packets received on different switchports will be delivered in order.
Given the following example configuration:
.---------------------------.
|bond0, mode 0 (round-robin)|
'---------------------------'
| eth0 | eth1 | eth2 | eth3 |
'--=---'--=---'---=--'---=--'
   |      |       |      |
   |      |       |      |
.--=------=-------=------=--.
|          switch           |
'---------------------------'
The bond system may send traffic out each slave in the correct order, like ABCD ABCD ABCD, but the switch may forward this traffic in any random order, like CADB BDCA DACB.
Because TCP on the receiver expects to be presented with the stream in order, this causes the receiver to believe it has missed packets and to request retransmissions, forces it to spend a great deal of time reassembling out-of-order traffic into the correct order, and causes the sender to waste bandwidth on retransmissions which are not actually required.
The following bonding modes:
- Bonding Mode 1 (active-backup)
- Bonding Mode 2 (balance-xor)
- Bonding Mode 4 (802.3ad, aka LACP)
Avoid this issue by transmitting all traffic for one destination down a single slave. The balancing algorithm of Mode 2 and Mode 4 can be altered with the xmit_hash_policy bonding option, but they will never split a single TCP stream across different ports, and so they avoid the problematic behaviour discussed above.
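As a sketch only (bond0 and the option values shown are placeholders), the hash policy can be set alongside the mode in the bonding options, and the policy in effect can then be checked in /proc:

    # RHEL ifcfg style: BONDING_OPTS line in /etc/sysconfig/network-scripts/ifcfg-bond0
    BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"

    # NetworkManager style:
    nmcli connection modify bond0 bond.options "mode=802.3ad,miimon=100,xmit_hash_policy=layer3+4"

    # Verify the policy in effect.
    grep "Transmit Hash Policy" /proc/net/bonding/bond0

Whichever policy is chosen, any single TCP connection always hashes to exactly one slave, which is what preserves in-order delivery.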
It is not possible to effectively balance a single TCP stream across multiple bonding or teaming devices. If higher speed is required for a single stream, then faster interfaces (and possibly faster network infrastructure) must be used.
This theory applies to all TCP streams. The most common occurrences of this issue are seen on high-speed, long-lived TCP streams such as NFS, Samba/CIFS, iSCSI, rsync over SSH/SCP, and so on.
Diagnostic Steps
Inspect syslog for nfs: server X not responding, still trying and nfs: server X OK messages when there are no other network issues.
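For example, a quick check (the log path /var/log/messages and the pattern below are assumptions; adjust them to the local syslog configuration):

    # List recent NFS "not responding" / "OK" events from syslog.
    grep -E "nfs: server .* (not responding|OK)" /var/log/messages | tail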
Inspect a packet capture for many occurrences of TCP retransmission, TCP Out-of-Order, RPC retransmission, or other similar messages.
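For example, with tshark (capture.pcap is a placeholder file name), Wireshark's TCP analysis flags can be used as a display filter:

    # Show segments flagged as retransmissions or out-of-order by Wireshark's TCP analysis.
    tshark -r capture.pcap -Y "tcp.analysis.retransmission || tcp.analysis.out_of_order"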
Inspect the bonding mode in /proc/net/bonding/bondX.
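For example (bond0 is a placeholder for the bond device in question):

    # The "Bonding Mode" line reports the mode in use.
    grep "Bonding Mode" /proc/net/bonding/bond0

    # On kernels that support it, the tlb_dynamic_lb setting is visible in sysfs.
    cat /sys/class/net/bond0/bonding/tlb_dynamic_lb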