TCP Tuning - Background Information

Original article: http://fasterdata.es.net/host-tuning/background/

Overview

    Proper host tuning can lead to performance increases of up to 100x.

    Here are the reasons why.

TCP Buffer Sizing

    TCP uses what is called the "congestion window" (CWND) to determine how many packets can be sent at one time. The larger the congestion window, the higher the throughput. The TCP "slow start" and "congestion avoidance" algorithms determine the size of the congestion window. The maximum congestion window is related to the amount of buffer space the kernel allocates for each socket. Each socket has a default buffer size, which the program can change with a system call (setsockopt) just before opening the socket. There is also a kernel-enforced maximum buffer size. The buffer size can be adjusted on both the send and receive ends of the socket.
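As a sketch of the per-socket calls described above (a minimal Python example; the 4 MB figure is an arbitrary illustration, not a tuned recommendation):

```python
import socket

# Request larger send/receive buffers *before* connecting.
# 4 MB is an arbitrary example value, not a recommendation.
BUF_SIZE = 4 * 1024 * 1024

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_SIZE)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_SIZE)

# The kernel may cap the request at its enforced maximum (and Linux
# doubles it for bookkeeping); getsockopt reports what was granted.
granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("send buffer:", granted_snd, "recv buffer:", granted_rcv)
sock.close()
```

Note that the granted size often differs from the requested size, which is why checking with getsockopt is worthwhile.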

    To get maximal throughput, it is critical to use optimal TCP send and receive buffer sizes for the link you are using. If the buffers are too small, the TCP congestion window will never fully open up. If the receive buffers are too large, TCP flow control breaks and the sender can overrun the receiver, which causes the TCP window to shut down. This is likely to happen when the sending host is faster than the receiving host. An overly large send buffer is not usually a problem, as long as you have enough memory.

    The optimal buffer size is twice the bandwidth-delay product of the link:

    buffer size = 2 * bandwidth * delay

    The ping program can be used to get the delay. Determining the end-to-end capacity (the bandwidth of the slowest hop in the path) is trickier, and may require you to ask around to find out the capacity of the various networks in the path. Tools such as pathrate can give you an estimate of the network capacity. Since ping reports the round-trip time (RTT), the formula above can be replaced by:

    buffer size = bandwidth * RTT

    For example, if the ping RTT is 50 ms and the end-to-end path consists entirely of 1G or 10G Ethernet, the TCP buffers should be:

    .05 sec * (1 Gbit / 8 bits) = 6.25 MBytes.
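The arithmetic above is easy to check directly (plain Python, using only the numbers from the example):

```python
# Bandwidth-delay product for the example above:
# 50 ms RTT on a 1 Gbit/s path.
rtt_sec = 0.05
bandwidth_bits_per_sec = 1e9

# bits/s -> bytes/s, then multiply by the RTT.
buffer_bytes = bandwidth_bits_per_sec / 8 * rtt_sec
print(buffer_bytes / 1e6, "MBytes")  # 6.25 MBytes
```

On a 10G path the same RTT gives ten times the figure, 62.5 MBytes, which is why high-speed paths need the larger kernel maximums discussed below.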

    Historically, to get full bandwidth the application had to specify the buffer size for the network path in use, setting it with setsockopt. Luckily, Linux, FreeBSD, Windows, and Mac OSX all now support TCP autotuning, so users no longer need to worry about setting the default buffer sizes.

TCP Autotuning

    Beginning with Linux 2.6, Mac OSX 10.5, Windows Vista, and FreeBSD 7.0, both sender and receiver autotuning are available, eliminating the need to set buffer sizes by hand. However, for many high-speed network paths the maximum buffer sizes are still too small, and must be increased as described in the documentation for each operating system. For more information see: http://www.psc.edu/networking/projects/auto/
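On Linux specifically, the autotuning limits can be inspected under /proc/sys (a Linux-only sketch; on other operating systems these paths simply do not exist):

```python
from pathlib import Path

# Linux-only: each file holds three numbers, the min/default/max
# buffer sizes in bytes that TCP autotuning works within.
for name in ("tcp_rmem", "tcp_wmem"):
    path = Path("/proc/sys/net/ipv4") / name
    if path.exists():
        minimum, default, maximum = map(int, path.read_text().split())
        print(f"{name}: min={minimum} default={default} max={maximum}")
    else:
        print(f"{name}: not available (not Linux?)")
```

It is the third (max) value that is often too small for high-speed paths; raising it is what the per-OS tuning pages describe.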

TCP Congestion Avoidance Algorithms

    For many years, reno was the default congestion avoidance algorithm in all TCP implementations. However, as networks got faster and faster, it became clear that reno would not work well for networks with a large bandwidth-delay product. To address this, a number of new congestion avoidance algorithms were developed, including:

· reno: Traditional TCP used by almost all other operating systems. (default)

· cubic: CUBIC-TCP

· bic: BIC-TCP

· htcp: Hamilton TCP

· vegas: TCP Vegas

· westwood: optimized for lossy networks

Most Linux distributions now use cubic by default, and Windows uses Compound TCP. If you are using an older version of Linux, be sure to change the default from reno to cubic or htcp. More details can be found at: http://en.wikipedia.org/wiki/TCP_congestion_avoidance_algorithm
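On Linux, the algorithm can also be queried (and, for algorithms the kernel makes available, changed) per socket via the TCP_CONGESTION socket option. A hedged sketch, assuming Linux and Python 3.6+ (which is when Python gained the constant):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# TCP_CONGESTION is Linux-only; getsockopt returns the algorithm
# name as a null-padded byte string.
if hasattr(socket, "TCP_CONGESTION"):
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    print("current algorithm:", raw.split(b"\x00")[0].decode())
    # Switching this socket to another loaded algorithm, e.g.:
    # sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"reno")
else:
    print("TCP_CONGESTION not available on this platform")
sock.close()
```

The system-wide default lives in /proc/sys/net/ipv4/tcp_congestion_control, which is what the sysctl-based advice above changes.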




Reposted from: https://my.oschina.net/astute/blog/92969
