TCP - Long Fat Pipes (RFC 1323)

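1. RFC 1323

RFC 1323 defines TCP extensions for high-performance networks. It addresses the problems TCP runs into on paths with both high bandwidth and high delay.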

2. Long fat pipes

  TCP  performance problems arise when the bandwidth*delay product is
  large.  We refer to an Internet path operating in this region as a
  "long, fat pipe", and a network containing this path as an "LFN"
  (pronounced "elephan(t)")
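
To make "large bandwidth*delay product" concrete, here is a minimal sketch that computes it. The function names and the 1 Gbps / 100 ms example path are assumptions chosen for illustration; they do not come from the RFC.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit TCP window field


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth*delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bps * rtt_ms // 8000  # (bits/s * ms) -> bytes


def max_throughput_bps(window_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 * 1000 // rtt_ms


if __name__ == "__main__":
    # Hypothetical path: 1 Gbps, 100 ms round-trip time.
    bdp = bdp_bytes(10**9, 100)
    print("BDP in bytes:", bdp, "exceeds 64K window:", bdp > MAX_UNSCALED_WINDOW)
    print("max bps with unscaled window:", max_throughput_bps(MAX_UNSCALED_WINDOW, 100))
```

On this assumed path the pipe holds about 12.5 MB, while an unscaled window caps throughput at roughly 5 Mbps.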

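3. Problems with long fat pipes

3.1 Window size limit

The TCP window field is 2 bytes (16 bits), so the advertised window is at most 64 KB (65,535 bytes). That is not enough for a long fat pipe, so a TCP option is used to enlarge it.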

     To circumvent this problem, Section 2 of this memo defines a
     new TCP option, "Window Scale", to allow windows larger than
     2**16.  This option defines an implicit scale factor, which
     is used to multiply the window size value found in a TCP
     header to obtain the true window size.

     This option may be sent in an initial <SYN> segment (i.e., a
     segment with the SYN bit on and the ACK bit off).  It may also
     be sent in a <SYN,ACK> segment, but only if a Window Scale
     option was received in the initial <SYN> segment.  A Window Scale
     option in a segment without a SYN bit should be ignored.
     The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment
     itself is never scaled.

    TCP Window Scale Option (WSopt):
    Kind: 3 Length: 3 bytes
            +---------+---------+---------+
            | Kind=3  |Length=3 |shift.cnt|
            +---------+---------+---------+
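
The option layout and the scaling rule above can be sketched as follows. This is an illustrative sketch, not a real TCP stack: the helper names are hypothetical, and the upper limit of 14 on shift.cnt comes from RFC 1323 rather than from the excerpt quoted here.

```python
import struct

WSOPT_KIND = 3
WSOPT_LEN = 3
MAX_SHIFT = 14  # RFC 1323 limits shift.cnt to 14


def encode_wsopt(shift_cnt: int) -> bytes:
    """Build the 3-byte Window Scale option: Kind, Length, shift.cnt."""
    if not 0 <= shift_cnt <= MAX_SHIFT:
        raise ValueError("shift.cnt must be in 0..14")
    return struct.pack("!BBB", WSOPT_KIND, WSOPT_LEN, shift_cnt)


def true_window(window_field: int, shift_cnt: int, is_syn: bool = False) -> int:
    """Scale the 16-bit window field; the field in a SYN is never scaled."""
    return window_field if is_syn else window_field << shift_cnt


def pick_shift(desired_window: int) -> int:
    """Smallest shift.cnt whose scaled 16-bit field can cover desired_window."""
    shift = 0
    while (desired_window >> shift) > 0xFFFF and shift < MAX_SHIFT:
        shift += 1
    return shift
```

For example, `pick_shift(12_500_000)` returns 8, so a receiver wanting a 12.5 MB window would advertise shift.cnt = 8 in its SYN.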

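3.2 Packet loss recovery

On a long fat pipe, retransmission is a major problem: if a segment at the head of the window is lost, the connection can run into repeated RTO timeouts, and this needs to be addressed.
Fast Retransmit and Fast Recovery handle the loss of one packet per window. Losing more than one packet leads to a retransmission timeout and slow start (this refers to the early congestion control algorithms).
Selective acknowledgment is meant to cover the multiple-loss case that Fast Retransmit/Fast Recovery cannot handle.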

       Recently, the Fast Retransmit and Fast Recovery
       algorithms [Jacobson90c] have been introduced.  Their
       combined effect is to recover from one packet loss per
       window, without draining the pipeline.  However, more than
       one packet loss per window typically results in a
       retransmission timeout and the resulting pipeline drain and
       slow start.

       To generalize the Fast Retransmit/Fast Recovery mechanism to
       handle multiple packets dropped per window, selective
       acknowledgments are required. 
      
       However, in the non-LFN
       regime, selective acknowledgments reduce the number of
       packets retransmitted but do not otherwise improve
       performance, making their complexity of questionable value.
       However, selective acknowledgments are expected to become
       much more important in the LFN regime.

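3.3 RTT measurement

The third problem with long fat pipes is RTT measurement. When many segments are retransmitted, the RTT cannot be measured: when an ACK arrives, the sender cannot tell whether it acknowledges the original transmission or a retransmission. During a long period of congestion the RTT estimate is therefore not updated, the RTO becomes inaccurate, and retransmission behavior suffers.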

4. Approaches

4.1 Handling duplicate sequence numbers

Duplicate TCP sequence numbers undermine TCP's reliability.

   Duplication of sequence numbers might happen in either of two
  ways:
  (1)  Sequence number wrap-around on the current connection
       A TCP sequence number contains 32 bits.  At a high enough
       transfer rate, the 32-bit sequence space may be "wrapped"
       (cycled) within the time that a segment is delayed in queues.

  (2)  Earlier incarnation of the connection
       Suppose that a connection terminates, either by a proper
       close sequence or due to a host crash, and the same
       connection (i.e., using the same pair of sockets) is
       immediately reopened.  A delayed segment from the terminated
       connection could fall within the current window for the new
       incarnation and be accepted as valid.

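Approach

  Problem (1): wrap-around becomes possible once the bandwidth B is high
  enough to cycle through 2**31 bytes of sequence space within one MSL:
                 2**31 / B  <  MSL (secs), B in bytes/sec     [1]
  Expressed in bits per second, with MSL = 120 seconds:
                B > 16G/MSL (bps) = 16G/120 ~= 143M bps  (16G = 2**34)
  This can be solved either with 64-bit sequence numbers (not compatible
  with existing TCP) or with PAWS, which relies on the timestamp option.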

  A possible fix for the problem of cycling the sequence space would
  be to increase the size of the TCP sequence number field.  For
  example, the sequence number field (and also the acknowledgment
  field) could be expanded to 64 bits.  This could be done either by
  changing the TCP header or by means of an additional option.

  PAWS uses the TCP Timestamps option
  defined in Section 4 to protect against old duplicates from the
  same connection.

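   Problem (2) can be solved by waiting 2*MSL before the connection is
   reused. If the 2*MSL wait is skipped, for example by a fast-recycling
   mechanism, then in extreme cases the new connection may accept data
   that belongs to the old one.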
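
For problem (1), the wrap-around condition can be checked numerically. The sketch below assumes MSL = 120 seconds (the 2-minute value from RFC 793) and uses hypothetical function names; it computes how long it takes to send 2**31 bytes at a given bit rate.

```python
MSL_SECONDS = 120  # assumption: the 2-minute MSL from RFC 793


def twrap_seconds(bandwidth_bps: float) -> float:
    """Time to send 2**31 bytes (half the sequence space) at bandwidth_bps."""
    return 2**31 * 8 / bandwidth_bps


def wrap_possible(bandwidth_bps: float, msl: float = MSL_SECONDS) -> bool:
    """True when the sequence space can wrap within one MSL."""
    return twrap_seconds(bandwidth_bps) < msl


def min_wrap_bandwidth_bps(msl: float = MSL_SECONDS) -> float:
    """Bit rate above which wrap-around within one MSL becomes possible."""
    return 2**34 / msl


if __name__ == "__main__":
    for bps in (10**7, 10**8, 10**9):
        print(bps, round(twrap_seconds(bps), 1), wrap_possible(bps))
    print("threshold bps:", round(min_wrap_bandwidth_bps()))
```

At 1 Gbps the wrap time is about 17 seconds, far below the MSL, which is why a check beyond the 32-bit sequence number is needed.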

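4.2 The cost of the TCP timestamp option

The option adds 12 bytes to the 20-byte header (1 kind + 1 length + 4 timestamp + 4 echo reply + 2 padding for alignment). This overhead only pays off when the retransmissions it saves outweigh the extra bytes. TCP headers keep getting bigger, so on a poor network this option can be left out.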

5. RTT measurement methods

The sampling-based (statistical) approach and the RTTM mechanism

  A good RTT estimator with a conservative retransmission timeout
  calculation can tolerate aliasing when the sampling frequency is
  "close" to the data frequency.   For example, with a window of 8
  packets, the sample rate is 1/8 the data frequency -- less than an
  order of magnitude different.  However, when the window is tens or
  hundreds of packets, the RTT estimator may be seriously in error,
  resulting in spurious retransmissions.

  Using TCP options, the sender places
  a timestamp in each data segment, and the receiver reflects these
  timestamps back in ACK segments.  Then a single subtract gives the
  sender an accurate RTT measurement for every ACK segment (which
  will correspond to every other data segment, with a sensible
  receiver).  We call this the RTTM (Round-Trip Time Measurement)
  mechanism.
  
  A TSecr value received in a segment is used to update the
  averaged RTT measurement only if the segment acknowledges
  some new data, i.e., only if it advances the left edge of the
  send window.
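
The update rule above can be sketched as a small estimator. The smoothing constants (1/8 and 1/4) and the RTO formula are the commonly used Jacobson-style values and are an assumption here, since the quoted text does not specify them. The class and method names are hypothetical, and sequence-number wrap is ignored for brevity.

```python
class RttEstimator:
    ALPHA = 1 / 8  # gain for the smoothed RTT (assumed)
    BETA = 1 / 4   # gain for the RTT variation (assumed)

    def __init__(self):
        self.snd_una = 0    # left edge of the send window
        self.srtt = None    # smoothed RTT
        self.rttvar = None  # RTT variation

    def on_ack(self, ack, tsecr, now):
        """Use TSecr only if the ACK advances the left edge of the window."""
        if ack <= self.snd_una:
            return None  # duplicate or old ACK: no RTT sample taken
        self.snd_una = ack
        sample = now - tsecr  # a single subtraction gives the RTT sample
        if self.srtt is None:
            self.srtt = float(sample)
            self.rttvar = sample / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return sample

    def rto(self):
        """Conservative timeout; valid once at least one sample was taken."""
        return self.srtt + 4 * self.rttvar
```

Because every ACK that advances the window yields a sample, the estimate keeps tracking the path even when the window spans hundreds of packets.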

  TCP Timestamps Option (TSopt):
  Kind: 8
  Length: 10 bytes
      +-------+-------+---------------------+---------------------+
      |Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
      +-------+-------+---------------------+---------------------+
          1       1              4                     4
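
A minimal sketch of how the option might be laid out on the wire. It assumes the option is preceded by two NOP bytes (kind 1) so that it occupies 12 bytes and the timestamps are 4-byte aligned, which matches the 12-byte overhead discussed in section 4.2. The function names are hypothetical.

```python
import struct

NOP = 1
TSOPT_KIND = 8
TSOPT_LEN = 10
MASK32 = 0xFFFFFFFF


def encode_tsopt(tsval: int, tsecr: int) -> bytes:
    """NOP, NOP, Kind=8, Length=10, TSval, TSecr: 12 bytes in total."""
    return struct.pack("!BBBBII", NOP, NOP, TSOPT_KIND, TSOPT_LEN,
                       tsval & MASK32, tsecr & MASK32)


def decode_tsopt(data: bytes):
    """Return (TSval, TSecr) from a 12-byte padded Timestamps option."""
    nop1, nop2, kind, length, tsval, tsecr = struct.unpack("!BBBBII", data)
    if (nop1, nop2, kind, length) != (NOP, NOP, TSOPT_KIND, TSOPT_LEN):
        raise ValueError("not a padded Timestamps option")
    return tsval, tsecr


def rtt_from_echo(now: int, tsecr: int) -> int:
    """RTT in timestamp clock ticks; the mask handles 32-bit clock wrap."""
    return (now - tsecr) & MASK32
```

The sender keeps no per-segment timing state: the echoed TSecr in the ACK is all it needs for the subtraction.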

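5.1 Which segment's timestamp should TCP echo

1. Delayed ACKs -- echo the earliest unacknowledged segment; the measured RTT is higher than the actual network RTT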
       Many TCP's acknowledge only every Kth segment out of a group
       of segments arriving within a short time interval; this
       policy is known generally as "delayed ACKs".  The data-sender
       TCP must measure the effective RTT, including the additional
       time due to delayed ACKs, or else it will retransmit
       unnecessarily.  Thus, when delayed ACKs are in use, the
       receiver should reply with the TSval field from the earliest
       unacknowledged segment.
2. A hole in the sequence space -- echo the most recent segment that advanced the window, not the out-of-order segment; this errs toward overestimating the RTT
       The lost segment is probably a sign of congestion, and in
       that situation the sender should be conservative about
       retransmission.  Furthermore, it is better to overestimate
       than underestimate the RTT.  An ACK for an out-of-order
       segment should therefore contain the timestamp from the most
       recent segment that advanced the window.
3. A filled hole -- the timestamp of the segment that fills the hole must be echoed; it is the most accurate measurement
       The segment that fills the hole represents the most recent
       measurement of the network characteristics.  On the other
       hand, an RTT computed from an earlier segment would probably
       include the sender's retransmit time-out, badly biasing the
       sender's average RTT estimate.  Thus, the timestamp from the
       latest segment (which filled the hole) must be echoed.
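
The three cases above can be covered by one receiver-side rule, which RFC 1323 states in terms of two variables, TS.Recent and Last.ACK.sent: copy the segment's TSval into TS.Recent only when Last.ACK.sent falls inside the segment's sequence range. The sketch below is a simplified illustration of that rule, not a complete receiver: the class name and the `send_ack` flag are hypothetical, and sequence-number wrap and the receive window are ignored.

```python
class Receiver:
    def __init__(self, irs: int = 0):
        self.rcv_nxt = irs        # next in-order byte expected
        self.last_ack_sent = irs  # Last.ACK.sent
        self.ts_recent = 0        # TS.Recent, echoed as TSecr
        self.out_of_order = {}    # buffered segments: seq -> length

    def on_segment(self, seq, length, tsval, send_ack=True):
        """Process a segment; return (ack, tsecr), or None if the ACK is delayed."""
        # Record TSval only if the segment covers Last.ACK.sent.
        if seq <= self.last_ack_sent < seq + length:
            self.ts_recent = tsval
        if seq == self.rcv_nxt:
            self.rcv_nxt += length
            # Pull in buffered segments that are now contiguous.
            while self.rcv_nxt in self.out_of_order:
                self.rcv_nxt += self.out_of_order.pop(self.rcv_nxt)
        elif seq > self.rcv_nxt:
            self.out_of_order[seq] = length  # a hole precedes this segment
        if not send_ack:
            return None  # delayed ACK: wait for another segment
        self.last_ack_sent = self.rcv_nxt
        return (self.rcv_nxt, self.ts_recent)


if __name__ == "__main__":
    r = Receiver()
    r.on_segment(0, 100, tsval=1, send_ack=False)  # ACK delayed
    print(r.on_segment(100, 100, tsval=2))  # echoes the earliest unacked TSval
    print(r.on_segment(300, 100, tsval=4))  # hole at 200: echo is unchanged
    print(r.on_segment(200, 100, tsval=5))  # hole filled: echoes TSval 5
```

In the demo, the delayed ACK echoes the earliest unacknowledged segment's timestamp, the out-of-order segment does not change the echoed value, and the segment that fills the hole replaces it.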

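5.2 PAWS: PROTECT AGAINST WRAPPED SEQUENCE NUMBERS (how to reject duplicate sequence numbers)

Besides measuring the RTT, the timestamp is also used to check whether a segment is valid. Timestamps on a connection must not go backwards, so a segment whose timestamp is older than the most recently accepted one is dropped.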
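
The timestamp check might look like the sketch below. It compares 32-bit timestamps with modular arithmetic so that the check keeps working when the timestamp clock itself wraps. The function names are hypothetical, and the sketch leaves out the rest of PAWS, such as the special handling of connections that have been idle for a long time.

```python
MASK32 = 0xFFFFFFFF
HALF = 2**31


def ts_older(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is strictly older than b (modular compare)."""
    return 0 < ((b - a) & MASK32) < HALF


def paws_accept(seg_tsval: int, ts_recent: int) -> bool:
    """Accept unless the segment's TSval is older than TS.Recent."""
    return not ts_older(seg_tsval, ts_recent)


if __name__ == "__main__":
    print(paws_accept(10, 5))          # newer timestamp: accepted
    print(paws_accept(5, 10))          # old duplicate: rejected
    print(paws_accept(3, 0xFFFFFFF0))  # clock wrapped, still newer: accepted
```

An old duplicate from before a sequence-number wrap carries an earlier timestamp, so it fails this check even though its sequence number falls inside the current window.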
