Tuning Linux Kernel Parameters to Support Million-Scale Concurrency


    Note: Concurrency, as defined in this article, is the same as it is for The C10k problem: concurrent clients (or sockets).

At Urban Airship we recently published a blog post about scaling beyond 500,000 concurrent socket connections. Hitting these numbers was not a trivial exercise so we’re going to share what we’ve come across during our testing. This guide is specific to Linux and has some information related to Amazon EC2, but it is not EC2-centric. These principles should apply to just about any Linux platform.

For our usage, squeezing as many socket connections as possible out of each server is valuable. Instead of running 100 servers with 10,000 connections each, we’d rather run 2 servers with 500,000 connections apiece. To do this we made the socket servers pretty much just socket servers. Any communication between the client and server is passed through a queue and processed by a worker. Having less for the socket server to do means less code, less CPU usage, and less RAM usage.

To get to these numbers we must consider the Linux kernel itself. A number of configurations needed tweaking. But first, an anecdote.

The Kernel, OOM, LOWMEM, and You

We first tested our code on a local 64-bit Ubuntu box with 6GB of RAM, connecting from several Ubuntu VMs acting as clients, each using a bridged network adapter, so we could ramp up our connections. We’d fire up the server and run our clients locally to see just how many connections we could hit. We noticed that we could hit 512,000 with our Java server not even breaking a sweat.

The next step was to test on EC2. We first wanted to see what sort of numbers we could get on “Small” instances, which are 1.7GB 32-bit VMs. We also had to fire up a number of other EC2 instances to act as clients.

We watched the numbers go up and up without a hitch until, seemingly randomly, the Java server fell over. It didn’t print any exceptions or die gracefully—it was killed.

We tried the same process again to see if we could replicate the behavior. Killed again.

Grepping through syslog, we found this line:

Out of Memory: Killed process 2178 java

The OOM-killer had killed the Java process. This was odd: we had been watching free RAM closely, and at least 500MB was free at the time of the kill.

The next time we ran it we watched the contents of /proc/meminfo. What we noticed was a steady decline of the field “LowFree”, the amount of LOWMEM that is available. LOWMEM is the kernel-addressable RAM space used for kernel data. Data like socket buffers.
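As a minimal sketch of that kind of monitoring, the snippet below pulls the LowFree field out of /proc/meminfo. It assumes a 32-bit kernel with high-memory support; on kernels that do not expose the field, the parser returns None. The function names `low_free_kb` and `watch_low_free` are our own illustrative choices, not part of any tool mentioned here.

```python
import time
from typing import Optional


def low_free_kb(meminfo_text: str) -> Optional[int]:
    """Return the LowFree value (in kB) from /proc/meminfo text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("LowFree:"):
            # Lines look like "LowFree:           51234 kB"
            return int(line.split()[1])
    return None


def watch_low_free(samples: int = 10, interval_s: float = 1.0) -> None:
    """Print LowFree a fixed number of times, pausing between reads."""
    for _ in range(samples):
        with open("/proc/meminfo") as f:
            print(low_free_kb(f.read()))
        time.sleep(interval_s)
```

Calling `watch_low_free()` on the server while the clients ramp up should show the value shrinking as sockets are added.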

As we increased the number of sockets, each socket’s buffers increased the amount of LOWMEM used. Once LOWMEM was full, the kernel (instead of simply panicking) found the user process responsible for the usage and promptly killed it so it could continue to function.

On a standard EC2 Small, the configuration is such that LOWMEM is around 717MB and the rest is “given” to the user. The kernel is smart about reallocating LOWMEM for the user, but not the other way around. The assumption is that the kernel will use very little RAM, or at least a predictable finite amount, and the user should be allowed to go crazy. What we needed with our socket server was just the opposite. We needed the kernel to use all the RAM it needed, since our Java server rarely uses more than a few hundred MB.

(For an in-depth rundown, take a look at High Memory In The Linux Kernel)

On a 32-bit system the entire addressable space is only 4GB, and the kernel gets just a portion of it, so making sure enough space is reserved for the kernel is important. But on 64-bit (x86-64) Linux the kernel-addressable space is 64TB (terabytes). At the current state of computing this is effectively limitless, and as such you will not even see LowFree in /proc/meminfo because it is all LOWMEM.

So we created some EC2 Large instances (each of which is 64-bit with 7.5GB of RAM) and ran our tests again, this time without any surprises. The sockets were added happily and the kernel took all the RAM it needed.

Long story short, you can only scale to so many sockets on a 32-bit platform.

Kernel Options

Several kernel parameters allow socket-related behavior to be tuned. We modified a few of them in /etc/sysctl.conf.

First is fs.file-max, the system-wide limit on open file descriptors. The default is quite low for this kind of workload, so it should be raised. Be careful about setting it extremely high if you don’t actually need that many descriptors.

Second, we have the socket buffer parameters net.ipv4.tcp_rmem and net.ipv4.tcp_wmem. These are the buffers for reads and writes respectively. Each takes three integers: min, default, and max. Each value is a number of bytes that may be buffered for a socket. Set the min and default low, with a tolerant max, to reduce the amount of RAM used for each socket.

The relevant portions of our config look like this:

fs.file-max = 999999
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216

This means the kernel allows up to 999,999 open file descriptors, and each socket buffer has a minimum and default size of 4096 bytes, with a sensible max of 16MB.
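To see why the small default matters, here is a rough back-of-the-envelope estimate of buffer memory at the target connection count. It assumes every socket holds exactly one default-sized read buffer and one default-sized write buffer. The kernel sizes buffers dynamically and keeps other per-socket structures, so real usage will differ. The function name `buffer_bytes` is hypothetical.

```python
GIB = 1024 ** 3


def buffer_bytes(sockets: int, rmem_default: int, wmem_default: int) -> int:
    """Estimate buffer memory if each socket holds one default read and write buffer."""
    return sockets * (rmem_default + wmem_default)


# 500,000 sockets with the 4096-byte defaults from the config above
estimate = buffer_bytes(500_000, 4096, 4096)
print(f"{estimate / GIB:.2f} GiB")  # 3.81 GiB
```

Even with the smallest buffers, half a million sockets can account for several gigabytes of kernel memory, which is far more than the LOWMEM available on a 32-bit Small instance.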

We also modified /etc/security/limits.conf to allow for 999,999 open file descriptors for all users.

#<domain>      <type>  <item>         <value>
*               -       nofile         999999
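A quick way to confirm that the new limit is actually in effect for a process is to ask the kernel directly. The sketch below uses Python's standard `resource` module (Unix only); the helper names `nofile_limits` and `can_open` are our own. Note that limits.conf is applied at login, so the new value only shows up in sessions started after the change.

```python
import resource
from typing import Tuple


def nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def can_open(target: int) -> bool:
    """True if the soft limit already allows `target` open descriptors."""
    soft, _hard = nofile_limits()
    return soft == resource.RLIM_INFINITY or soft >= target


soft, hard = nofile_limits()
print(f"soft={soft} hard={hard} enough_for_999999={can_open(999_999)}")
```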

You may want to look at the manpage for more information.

Testing

When testing, we were able to get about 64,000 connections per client by increasing the number of ephemeral ports allowed on both the client and the server.

echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range

This effectively allows every port from 1024 through 65535 to be used as an ephemeral port, instead of the much narrower (and typically more sane) default range.
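The "about 64,000" figure follows directly from the size of that range. A small sketch, with the hypothetical helper `ephemeral_port_count`, that counts the ports in a range string in the same format as /proc/sys/net/ipv4/ip_local_port_range:

```python
def ephemeral_port_count(range_text: str) -> int:
    """Count the ports in an ip_local_port_range string such as '1024 65535'."""
    low, high = (int(value) for value in range_text.split())
    return high - low + 1  # the range is inclusive at both ends


print(ephemeral_port_count("1024 65535"))  # 64512
```

On a live system, the same function can be fed the contents of /proc/sys/net/ipv4/ip_local_port_range to see how many outgoing connections one client IP can make to a single server IP and port.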

The 64k Connection Myth

It’s a common misconception that you can only accept 64,000 connections per IP address and the only way around it is to add more IPs. This is absolutely false.

The misconception begins with the premise that there are only so many ephemeral ports per IP. The truth is that the limit applies per IP pair, or said another way, per combination of client IP and server IP (for a given server port). A single client IP can connect to a server IP about 64,000 times, and so can every other client IP.
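One way to picture this: TCP identifies a connection by the four values (client IP, client port, server IP, server port), so the ephemeral port limit is per client IP, not per server. The sketch below simply enumerates those tuples. All addresses and the server port are made-up values for illustration.

```python
from itertools import product

SERVER = ("10.0.0.1", 8080)            # hypothetical server IP and listening port
CLIENT_IPS = ["10.0.1.1", "10.0.1.2"]  # hypothetical client IPs
CLIENT_PORTS = range(1024, 65536)      # ephemeral ports 1024..65535

# Each distinct (client IP, client port, server IP, server port) is its own connection.
connections = {
    (client_ip, client_port, *SERVER)
    for client_ip, client_port in product(CLIENT_IPS, CLIENT_PORTS)
}
print(len(connections))  # 129024
```

Two clients already yield twice the per-client figure against one server IP and port, and each additional client IP adds another 64,512 possible connections.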

Were this myth true it would be a significant and easy-to-exploit DDoS vector.

Scaling for Everyone

When we set out to establish half a million connections on a single server we were diving deep into water that wasn’t well documented. Sure, we know that C10k is relatively trivial, but how about an order of magnitude (and then some) above that?

Fortunately we’ve been able to achieve success without too many serious problems. Hopefully our solutions can help save time for those out there looking to solve the same problems.
