Optimizing Your Linux Stack for Maximum Mobile Web Performance

http://blog.chunshengster.me/2013/12/optimizing_your_linux_stack_for_maximum_mobile_web_performance.html


Original article: http://blog.cloudflare.com/optimizing-the-linux-stack-for-mobile-web-per

 

The following is a technical post written by Ian Applegate (@AppealingTea), a member of our Systems Engineering team, on how to tune the Linux TCP stack to improve performance for mobile visitors. It was originally published in the 2012 Web Performance Calendar. At CloudFlare, we spend a great deal of time optimizing our network stack so that visitors get the best possible performance no matter what device or network they connect from. We wanted to share these technical details to help companies looking to improve mobile network performance, even if they don't use CloudFlare. And of course, if you are using CloudFlare, your site already gets the fastest possible TCP performance when mobile visitors access it.

 

We spend a lot of time at CloudFlare thinking about how to make the Internet fast on mobile devices. Currently there are over 1.2 billion active mobile users and that number is growing rapidly. Earlier this year mobile Internet access passed fixed Internet access in India and that’s likely to be repeated the world over. So, mobile network performance will only become more and more important.


Most of the focus today on improving mobile performance is on Layer 7 with front end optimizations (FEO). At CloudFlare, we've done significant work in this area with front end optimization technologies like Rocket Loader, Mirage, and Polish that dynamically modify web content to make it load quickly whatever device is being used. However, while FEO is important to make mobile fast, the unique characteristics of mobile networks also mean we have to pay attention to the underlying performance of the technologies down at Layer 4 of the network stack.


This article is about the challenges mobile devices present, how the default TCP configuration is ill-suited for optimal mobile performance, and what you can do to improve performance for visitors connecting via mobile networks. Before diving into the details, a quick technical note. At CloudFlare, we've built most of our systems on top of a custom version of Linux so, while the underlying technologies can apply to other operating systems, the examples I'll use are from Linux.


TCP Congestion Control


To understand the challenges of mobile network performance at Layer 4 of the networking stack you need to understand TCP Congestion Control. TCP Congestion Control is the gatekeeper that determines how to control the flow of packets from your server to your clients. Its goal is to prevent Internet congestion by detecting when congestion occurs and slowing down the rate data is transmitted. This helps ensure that the Internet is available to everyone, but can cause problems on mobile networks when TCP mistakes mobile network problems for congestion.


 

TCP Congestion Control holds back the floodgates if it detects congestion (i.e. packet loss) on the remote end. A network is, inherently, a shared resource. The purpose of TCP Congestion Control was to ensure that every device on the network cooperates to not overwhelm its resources. On a wired network, if packet loss is detected it is a fairly reliable indicator that a port along the connection is overburdened. What is typically going on in these cases is that a memory buffer in a switch somewhere has filled beyond its capacity because packets are coming in faster than they can be sent out and data is being discarded. TCP Congestion Control on clients and servers is set up to "back off" in these cases in order to ensure that the network remains available for all its users.


But figuring out what packet loss means on a mobile network is a different matter. Radio networks are inherently susceptible to interference which results in packet loss. If packets are being dropped does that mean a switch is overburdened, like we can infer on a wired network? Or did someone travel from an undersubscribed wireless cell to an oversubscribed one? Or did someone just turn on a microwave? Or maybe it was just a random solar flare? Since it's not as clear what packet loss means on a mobile network, it's not clear what action a TCP Congestion Control algorithm should take.


A Series of Leaky Tubes


To optimize for lossy networks like those used by mobile devices, it's important to understand exactly how TCP Congestion Control algorithms are designed. While the high level concept makes sense, the details of TCP Congestion Control are not widely understood by most people working in the web performance industry. That said, it is an important core part of what makes the Internet reliable and the subject of very active research and development.


 

To understand how TCP Congestion Control algorithms work, imagine the following analogy. Think of your web server as your local water utility plant. You've built out a large network of pipes in your hometown and you need to guarantee that each pipe is as pressurized as possible for delivery, but you don't want to burst the pipes. (Note: I recognize the late Senator Ted Stevens got a lot of flack for describing the Internet as a "series of tubes," but the metaphor is surprisingly accurate.)


Your client, Crazy Arty, runs a local water bottling plant that connects to your pipe network. Crazy Arty's infrastructure is built on old pipes that are leaky and brittle. For you to get water to him without bursting his pipes, you need to infer the capability of Crazy Arty's system. If you don't know in advance then you do a test — you send a known amount of water to the line and then measure the pressure. If the pressure is suddenly lost then you can infer that you broke a pipe. If not, then that level is likely safe and you can add more water pressure and repeat the test. You can iterate this test until you burst a pipe, see the drop in pressure, write down the maximum water volume, and going forward ensure you never exceed it.


Imagine, however, that there's some exogenous factor that could decrease the pressure in the pipe without actually indicating a pipe had burst. What if, for example, Crazy Arty ran a pump that he only turned on randomly from time to time and you didn't know about? If the only signal you have is observing a loss in pressure, you'd have no way of knowing whether you'd burst a pipe or if Crazy Arty had just plugged in the pump. The effect would be that you'd likely record a pressure level much less than the amount the pipes could actually withstand — leading to all your customers on the network potentially having lower water pressure than they should.


Optimizing for Congestion or Loss


If you’ve been following up to this point then you already know more about TCP Congestion Control than you would guess. The initial amount of water we talked about in TCP is known as the Initial Congestion Window (initcwnd) it is the initial number of packets in flight across the network. The congestion window (cwnd) either shrinks, grows, or stays the same depending on how many packets make it back and how fast (in ACK trains) they return after the initial burst. In essence, TCP Congestion Control is just like the water utility — measuring the pressure a network can withstand and then adjusting the volume in an attempt to maximize flow without bursting any pipes.

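If you want to see these values on a live system, the iproute2 tools (assuming they are installed, as they are on most distributions) can show both the per-route settings and the congestion window of active connections:

# list routes; any explicit initcwnd/initrwnd settings appear at the end of each entry
ip route show
# show per-connection TCP details, including the current cwnd
ss -nti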

When a TCP connection is first established it attempts to ramp up the cwnd quickly. This phase of the connection, where TCP grows the cwnd rapidly, is called Slow Start. That's a bit of a misnomer since it is generally an exponential growth function which is quite fast and aggressive. Just like when the water utility in the example above detects a drop in pressure it turns down the volume of water, when TCP detects packets are lost it reduces the size of the cwnd and delays the time before another burst of packets is delivered. The time between packet bursts is known as the Retransmission Timeout (RTO). The algorithm within TCP that controls these processes is called the Congestion Control Algorithm. There are many congestion control algorithms and clients and servers can use different strategies based on the characteristics of their networks. Most Congestion Control Algorithms focus on optimizing for one type of network loss or another: congestive loss (like you see on wired networks) or random loss (like you see on mobile networks).


In the example above, a pipe bursting would be an indication of congestive loss. There is a physical limit to the pipes; when it is exceeded, the appropriate response is to back off. On the other hand, Crazy Arty's pump is analogous to random loss. The capacity is still available on the network and only a temporary disturbance causes the water utility to see the pipes as overfull. The Internet started as a network of wired devices, and, as its name suggests, congestion control was largely designed to optimize for congestive loss. As a result, the default Congestion Control Algorithm in many operating systems is good for communicating with wired networks but not as good for communicating with mobile networks.


A few Congestion Control algorithms try to bridge the gap by using the time of the delay from "pressure increase" to "expected capacity" to figure out the cause of the loss. These are known as bandwidth estimation algorithms, and examples include Vegas, Veno, and Westwood+. Unfortunately, all of these methods are reactive and reuse no information across two similar streams.

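If you're curious which of these algorithms your kernel can actually use, sysctl will tell you. The first key lists the algorithms currently available (others, such as Vegas, Veno, or Westwood+, usually ship as loadable modules and only appear here once loaded); the second shows the default in use:

sysctl net.ipv4.tcp_available_congestion_control
sysctl net.ipv4.tcp_congestion_control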

At companies that see a significant amount of network traffic, like CloudFlare or Google, it is possible to map the characteristics of the Internet's networks and choose a specific congestion control algorithm in order to maximize performance for that network. Unfortunately, unless you are seeing large amounts of traffic as we do and can record data on network performance, the ability to instrument your congestion control or build a "weather forecast" is usually impossible. Fortunately, there are still several things you can do to make your server more responsive to visitors even when they're coming from lossy, mobile devices.


Compelling Reasons to Upgrade Your Kernel


The Linux network stack has been under extensive development to bring about some sensible defaults and mechanisms for dealing with the network topology of 2012. A mixed network of high bandwidth, low latency connections and high bandwidth, high latency, lossy connections was never fully anticipated by the kernel developers of 2009, and if you check your server's kernel version, chances are it's running a 2.6.32.x kernel from that era.


uname -a

There are a number of reasons that if you're running an old kernel on your web server and want to increase web performance, especially for mobile devices, you should investigate upgrading. To begin, Linux 2.6.38 bumps the default initcwnd and initrwnd (initial receive window) from 3 to 10. This is an easy, big win. It allows for 14.2KB (vs 5.7KB) of data to be sent or received in the initial round trip before slow start grows the cwnd further. This is important for HTTP and SSL because it gives you more room to fit the header in the initial set of packets. If you are running an older kernel you may be able to run the following command on a bash shell (use caution) to set all of your routes' initcwnd and initrwnd to 10. On average, this small change can be one of the biggest boosts when you're trying to maximize web performance.


ip route | while read p; do ip route change $p initcwnd 10 initrwnd 10; done
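You can verify the change by listing your routes again and looking for the initcwnd and initrwnd values at the end of each entry. Note that this change does not persist across reboots, so you would need to reapply it at boot (for example from rc.local or your distribution's network scripts):

ip route show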

Linux kernel 3.2 implements Proportional Rate Reduction (PRR). PRR decreases the time it takes for a lossy connection to recover its full speed, potentially improving HTTP response times by 3-10%. The benefits of PRR are significant for mobile networks. To understand why, it’s worth diving back into the details of how previous congestion control strategies interacted with loss.


Many congestion control algorithms halve the cwnd when a loss is detected. When multiple losses occur this can result in a case where the cwnd is lower than the slow start threshold. Unfortunately, the connection never goes through slow start again. The result is that a few network interruptions can result in TCP slowing to a crawl for all the connections in the session.


This is even more deadly when combined with the tcp_no_metrics_save=0 sysctl setting on unpatched kernels before 3.2. This setting will save data on connections and attempt to use it to optimize the network. Unfortunately, this can actually make performance worse because TCP will apply the exception case to every new connection from a client within a window of a few minutes. In other words, in some cases, one person surfing your site from a mobile phone who has some random packet loss can reduce your server's performance to this visitor even when their temporary loss has cleared.


If you expect your visitors to be coming from mobile, lossy connections and you cannot upgrade or patch your kernel, I recommend setting tcp_no_metrics_save=1. If you're comfortable doing some hacking, you can patch older kernels.

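As a minimal sketch (the full sysctl key is net.ipv4.tcp_no_metrics_save), the setting can be applied to the running kernel and persisted so it survives a reboot:

sysctl -w net.ipv4.tcp_no_metrics_save=1
echo "net.ipv4.tcp_no_metrics_save = 1" >> /etc/sysctl.conf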

The good news is that Linux 3.2 implements PRR, which decreases the amount of time that a lossy connection will impact TCP performance. If you can upgrade, it may be one of the most significant things you can do in order to increase your web performance.


More Improvements Ahead


Linux 3.2 also has another important improvement with RFC 2988bis (now RFC 6298). The initial Retransmission Timeout (initRTO) has been changed to 1s from 3s. If loss happens after sending the initcwnd, two seconds of waiting time are saved when trying to resend the data. With TCP streams being so short this can have a very noticeable improvement if a connection experiences loss at the beginning of the stream. Like the PRR patch this can also be applied (with modification) to older kernels if for some reason you cannot upgrade (here's the patch).


Looking forward, Linux 3.3 has Byte Queue Limits which, when teamed with CoDel (controlled delay) in the 3.5 kernel, help fight the long standing issue of Bufferbloat by intelligently managing packet queues. Bufferbloat is when the caching overhead on TCP becomes inefficient because it's littered with stale data. Linux 3.3 has features to auto QoS important packets (SYN/DNS/ARP/etc.), keep down buffer queues, and thereby reduce bufferbloat and improve latency on loaded servers.

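As an illustration only, assuming an interface named eth0 and a kernel built with these features, Byte Queue Limits expose their state through sysfs and fq_codel can be enabled per interface with tc:

# current byte queue limit on the first transmit queue of eth0
cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
# replace the root queueing discipline with fq_codel (needs an iproute2 build that knows about fq_codel)
tc qdisc replace dev eth0 root fq_codel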

Linux 3.5 implements TCP Early Retransmit with some safeguards for connections that have a small amount of packet reordering. This allows connections, under certain conditions, to trigger fast retransmit and bypass the costly Retransmission Timeout (RTO) mentioned earlier. By default it is enabled in the failsafe mode tcp_early_retrans=2. If for some reason you are sure your clients have loss but no reordering then you could set tcp_early_retrans=1 to save one quarter of an RTT on recovery.

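On a 3.5 or newer kernel you can check and adjust this through sysctl; as noted above, only drop it to 1 if you are confident your clients see loss without reordering:

sysctl net.ipv4.tcp_early_retrans
sysctl -w net.ipv4.tcp_early_retrans=1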

One of the most extensive changes to 3.6 that hasn't gotten much press is the removal of the IPv4 routing cache. In a nutshell it was an extraneous caching layer in the kernel that mapped interfaces to routes to IPs and saved a lookup to the Forwarding Information Base (FIB). The FIB is a routing table within the network stack. The IPv4 routing cache was intended to eliminate a FIB lookup and increase performance. While a good idea in principle, unfortunately it provided a very small performance boost in less than 10% of connections. In the 3.2.x-3.5.x kernels it was extremely vulnerable to certain DDoS techniques so it has been removed.

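A quick way to see this for yourself, assuming iproute2 is available: on a 3.6+ kernel the IPv4 route cache no longer exists, so listing it prints nothing, while older kernels will show cached entries:

ip route show cache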

Finally, one important setting you should check, regardless of the Linux kernel you are running: tcp_slow_start_after_idle. If you're concerned about web performance, it has been proclaimed the sysctl setting of the year. It can be enabled in almost any kernel. By default this is set to 1 which will aggressively reduce cwnd on idle connections and negatively impact any long lived connections such as SSL. The following command will set it to 0 and can significantly improve performance:

sysctl -w net.ipv4.tcp_slow_start_after_idle=0


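The command above only changes the running kernel. To make the setting survive a reboot, add it to /etc/sysctl.conf (or a file under /etc/sysctl.d/) and reload:

echo "net.ipv4.tcp_slow_start_after_idle = 0" >> /etc/sysctl.conf
sysctl -p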

The Missing Congestion Control Algorithm


You may be curious as to why I haven't made a recommendation as far as a quick and easy change of congestion control algorithms. Since Linux 2.6.19, the default congestion control algorithm in the Linux kernel is CUBIC, which is time based and optimized for high speed and high latency networks. Its killer feature, known as Hybrid Slow Start (HyStart), allows it to safely exit slow start by measuring the ACK trains and not overshoot the cwnd. It can improve startup throughput by up to 200-300%.

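To confirm that CUBIC is in use and HyStart is enabled, you can check the default congestion control algorithm and the hystart module parameter of tcp_cubic (exposed under /sys/module even when CUBIC is built into the kernel); a non-zero value means HyStart is active:

sysctl net.ipv4.tcp_congestion_control
cat /sys/module/tcp_cubic/parameters/hystart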

 

While other Congestion Control Algorithms may seem like performance wins on connections experiencing high amounts of loss (>.1%) (e.g., TCP Westwood+ or Hybla), unfortunately these algorithms don’t include HyStart. The net effect is that, in our tests, they underperform CUBIC for general network performance. Unless a majority of your clients are on lossy connections, I recommend staying with CUBIC.

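For completeness, if a majority of your clients really were on lossy connections, switching is straightforward; tcp_westwood and tcp_hybla ship as modules with most distribution kernels, and the new default only applies to connections opened after the change:

modprobe tcp_hybla
sysctl -w net.ipv4.tcp_congestion_control=hybla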

Of course the real answer here is to dynamically swap out congestion control algorithms based on historical data to better serve these edge cases. Unfortunately, that is difficult for the average web server unless you’re seeing a very high volume of traffic and are able to record and analyze network characteristics across multiple connections. The good news is that loss predictors and hybrid congestion control algorithms are continuing to mature, so maybe we will have an answer in an upcoming kernel.


