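Most operating systems enable Path MTU Discovery (PMTUD) by default: they set the DF bit on outgoing datagrams and use it to discover the MTU of the path. For various reasons on the network, however, PMTUD can fail. Packets that must not be fragmented are then dropped, and application-layer services become unreachable.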

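When two remote PCs communicate, their data must cross many routers and many kinds of network media to reach the other end, and different media have different MTUs. Think of a long pipeline assembled from pipes of different diameters (different MTUs): the most water that can pass through it is determined by the narrowest pipe in the middle.
     Upper-layer protocols (taking the TCP/IP protocol suite as the example) do not care how wide the pipe is; they regard that as the network layer's job. The IP layer checks the size of every packet handed down from the upper layer and decides, based on the local MTU, whether to fragment it. The biggest drawback of fragmentation is reduced transmission performance: what could have been done in one go is done in several. Implementations of the layer above the network layer (the transport layer) therefore tend to pay attention to it. Some upper layers, for their own reasons, insist that the loaf must not be sliced and must arrive whole, so they mark the IP header with a flag: DF (Don't Fragment). When such a packet travels through a long stretch of network (the pipe) and meets a link whose MTU is smaller than the packet, the forwarding device drops the packet as instructed and returns an error message to the sender. This often causes communication problems; fortunately, most network links have an MTU of 1500 or more.
     UDP is a connectionless protocol that pays little attention to the order in which packets arrive or whether they arrive correctly, so UDP applications generally have no special requirement regarding fragmentation.
     TCP is different. It is a connection-oriented protocol that cares a great deal about the order in which packets arrive and about errors in transit, so some TCP applications do impose a requirement: no fragmentation (DF).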
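
The pipe analogy can be sketched in a few lines of Python. This is only an illustration of the idea, not a real forwarding implementation: the function names path_mtu and forward and the example MTU values are assumptions made up for the sketch.

```python
def path_mtu(link_mtus):
    """The path MTU is the MTU of the narrowest link on the path."""
    return min(link_mtus)


def forward(packet_size, df_set, link_mtus):
    """Walk a packet along the path and report what happens to it.

    Returns ("delivered", None) if every link is wide enough. Otherwise
    returns ("dropped", hop) when DF is set or ("fragmented", hop) when
    DF is clear, where hop is the index of the first link that is too small.
    """
    for hop, mtu in enumerate(link_mtus):
        if packet_size > mtu:
            return ("dropped", hop) if df_set else ("fragmented", hop)
    return ("delivered", None)


links = [1500, 1500, 1400, 1500]    # hypothetical path with one narrow link
print(path_mtu(links))
print(forward(1500, True, links))   # too large, DF set
print(forward(1500, False, links))  # too large, DF clear
print(forward(1300, True, links))   # small enough for every link
```

A 1500-byte packet with DF set is dropped at the narrow link, while the same packet with DF clear is fragmented there and continues.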


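To resolve this problem:
1. Disable PMTUD in the operating system.
2. On a Cisco router, use a route-map with set ip df 0 to clear the DF bit.
3. Under the Cisco router interface, set ip tcp adjust-mss <value> to influence the MSS negotiated during the TCP SYN phase.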
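
The third option works because each TCP endpoint announces its MSS in the SYN, and a router doing MSS adjustment lowers that announced value when it exceeds the configured one. The sketch below models only that comparison. The function names and the numbers (a host announcing 1460 and a router configured with 1360) are hypothetical, and the 40 bytes assume IPv4 and TCP headers without options.

```python
HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header, no options (assumption)


def clamp_mss(syn_mss, configured_mss):
    """MSS value the SYN carries after passing the adjusting router."""
    return min(syn_mss, configured_mss)


def largest_packet(mss):
    """Largest IP packet the peer will send when it honours this MSS."""
    return mss + HEADERS


announced = 1460                      # host on Ethernet announces 1500 - 40
seen_by_peer = clamp_mss(announced, 1360)
print(seen_by_peer)
print(largest_packet(seen_by_peer))   # fits a 1400-byte link without fragmentation
```

A SYN that already announces a smaller MSS than the configured value passes through unchanged.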

Attachment:
Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC.pdf.pdf describes the causes and the solutions in detail.

Due to network hardware malfunction, misconfiguration, or software defects, you might observe a condition where small TCP data transfers work without a problem. But large data transfers, ones with full-length packets, hang and then time out. A workaround is to configure the sending nodes to do one or both of these actions:
  • Disable PMTUD.
  • Shrink the TCP MSS and/or the IP MTU in order to reduce the maximum packet size.

Problem Description and Possible Causes

Sometimes, over some IP paths, a TCP/IP node can send small amounts of data (typically less than 1500 bytes) with no difficulty, but transmission attempts with larger amounts of data hang, then time out. Often this is observed as a unidirectional problem in that large data transfers succeed in one direction but fail in the other direction. This problem is likely caused by the TCP MSS value, PMTUD failure, different LAN media types, or defective links. These subsections describe the problems:

TCP MSS Value

The TCP MSS value specifies the maximum amount of TCP data in a single IP datagram that the local system can accept (reassemble). The IP datagram can be fragmented into multiple packets when sent. Theoretically, this value can be as large as 65495, but such a large value is never used. Typically, an end system uses the "outgoing interface MTU" minus 40 as its reported MSS. For example, an Ethernet MSS value is 1460 (1500 - 40 = 1460).
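
The arithmetic behind these numbers is short enough to write out. The following is a minimal sketch that assumes IPv4 and TCP headers of 20 bytes each with no options; the constant and function names are chosen here for illustration.

```python
IP_HEADER = 20          # IPv4 header without options
TCP_HEADER = 20         # TCP header without options
MAX_IP_DATAGRAM = 65535 # largest value of the 16-bit IPv4 total-length field


def mss_for_mtu(mtu):
    """MSS an end system typically reports for an outgoing interface MTU."""
    return mtu - IP_HEADER - TCP_HEADER


print(mss_for_mtu(1500))             # Ethernet: 1460
print(mss_for_mtu(MAX_IP_DATAGRAM))  # theoretical maximum: 65495
```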

PMTUD Failure

PMTUD is an algorithm described in RFC 1191 and implemented in recent TCP/IP stacks. This algorithm attempts to discover the largest IP datagram that can be sent without fragmentation through an IP path and maximizes data transfer throughput.
PMTUD is implemented when you have an IP sender set the "Don't Fragment" (DF) flag in the IP header. If an IP packet with this flag set reaches a router whose next-hop link has too small an MTU to send the packet without fragmentation, that router discards that packet and sends an ICMP "Fragmentation needed but DF set" error to the IP sender. When the IP sender receives this Internet Control Message Protocol (ICMP) message, it learns to use a smaller IP MTU for packets sent to this destination, and subsequent packets are able to get through.
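
The exchange described above can be simulated as a loop. This is a simplified sketch, not a real stack: it assumes every router reports its next-hop MTU in the ICMP error (as RFC 1191 specifies) and that every error reaches the sender. The function names and the example path are hypothetical.

```python
def send_with_df(packet_size, link_mtus):
    """Send one DF packet along the path.

    Returns None if the packet is delivered, otherwise the next-hop MTU
    reported in the ICMP "Fragmentation needed but DF set" error.
    """
    for mtu in link_mtus:
        if packet_size > mtu:
            return mtu  # this router discards the packet and reports its MTU
    return None


def discover_path_mtu(first_hop_mtu, link_mtus):
    """Lower the estimate on each ICMP error until a packet gets through."""
    estimate = first_hop_mtu
    attempts = []
    while True:
        attempts.append(estimate)
        reported = send_with_df(estimate, link_mtus)
        if reported is None:
            return estimate, attempts
        estimate = reported  # always smaller than before, so the loop ends


print(discover_path_mtu(1500, [1500, 1400, 1300, 1500]))
```

Each ICMP error lowers the estimate to the MTU of the router that rejected the packet, so a path with two progressively narrower links takes two rounds to settle.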
Various problems can cause the PMTUD algorithm to fail. The IP sender never learns the smaller path MTU but continues unsuccessfully to retransmit the too-large packet, until the retransmissions time out. Some problems include:
  • The router with the too-small next-hop link fails to generate the necessary ICMP error message.
  • A router in the reverse path between the small-MTU router and the IP sender discards the ICMP error message before it can reach the IP sender.
  • Confusion in the IP sender's stack in which it ignores the received ICMP error message.
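
All three failures look the same from the sender's side: the ICMP error never takes effect. The sketch below models that with a single flag and shows why small transfers still work while full-size packets hang. The function name, the retry limit of 5, and the MTU values are assumptions for illustration; real stacks use timers rather than a fixed retry count.

```python
def transfer(packet_size, path_mtu, icmp_reaches_sender, max_retries=5):
    """Send one DF packet, retransmitting until delivered or out of retries.

    Returns (status, attempts_used, final_packet_size).
    """
    size = packet_size
    for attempt in range(1, max_retries + 1):
        if size <= path_mtu:
            return ("delivered", attempt, size)
        if icmp_reaches_sender:
            size = path_mtu  # sender learns the smaller MTU and shrinks the packet
        # otherwise the sender retransmits the same too-large packet
    return ("timed out", max_retries, size)


print(transfer(500, 1400, icmp_reaches_sender=False))   # small packet is unaffected
print(transfer(1500, 1400, icmp_reaches_sender=True))   # PMTUD works
print(transfer(1500, 1400, icmp_reaches_sender=False))  # PMTUD fails, transfer hangs
```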
A workaround for these problems is to configure the IP sender to disable PMTUD. This causes the IP sender to send its datagrams with the DF flag clear. When the large packets reach the small-MTU router, that router fragments them into multiple smaller packets. The fragments reach the destination, where they are reassembled into the original large packet.
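
What the small-MTU router does with a DF-clear packet can be sketched as follows. This is a simplified model of IPv4 fragmentation that assumes a 20-byte header without options; the function name and the example sizes are hypothetical. The one detail worth noticing is that every fragment except the last must carry a multiple of 8 data bytes, because the IPv4 fragment offset is counted in 8-byte units.

```python
def fragment(total_length, mtu, header_len=20):
    """Split an IPv4 datagram for a link with the given MTU.

    Returns a list of (data_offset_in_bytes, fragment_total_length,
    more_fragments_flag) tuples, one per fragment.
    """
    payload = total_length - header_len
    max_data = (mtu - header_len) // 8 * 8  # round down to a multiple of 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload:
        data = min(max_data, payload - offset)
        more = offset + data < payload
        fragments.append((offset, data + header_len, more))
        offset += data
    return fragments


for frag in fragment(1500, 1400):
    print(frag)
```

A 1500-byte packet crossing a 1400-byte link becomes two packets of 1396 and 124 bytes, so each original packet now costs two headers and two forwarding operations.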