How we fine-tuned HAProxy to achieve 2,000,000 concurrent SSL connections

by Sachin Malhotra

If you look at the above screenshot closely, you’ll find two important pieces of information:

  1. This machine has 2.38 million TCP connections established, and

  2. The amount of RAM being used is around 48 Gigabytes.

Pretty awesome, right? What would be even more awesome is if someone provided the setup components and the tunings required to achieve this kind of scale on a single HAProxy machine. Well, I’ll do just that in this post ;)

This is the final part of the multipart series on load testing HAProxy. If you have time, I recommend you read the first two parts in the series first. These will help you get the hang of the kernel-level tunings required on all the machines in this setup.
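
To give a rough idea of what those kernel-level tunings look like, here is a small, illustrative set of the kind of settings the earlier posts deal with (example values only, not the exact ones to copy blindly; they depend on your hardware):

# Illustrative kernel tunings for a box that has to hold millions of sockets.
# These are example values, not the exact production settings.
sysctl -w fs.file-max=5000000                         # system-wide file descriptor ceiling
sysctl -w net.ipv4.ip_local_port_range="1024 65535"   # widen the ephemeral port range on client boxes
sysctl -w net.core.somaxconn=65535                    # larger accept backlog
ulimit -n 4000000                                     # per-process open file limit in the shell that starts HAProxy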

  • Load Testing HAProxy (Part 1) - medium.com
  • Load Testing HAProxy (Part 2) - medium.com

There are a lot of small components that helped us bring together the entire setup and achieve these numbers.

Before I tell you the final HAProxy configuration we used (if you’re super impatient you can scroll to the bottom) I want to build up to it by walking you through our thinking.

What we wanted to test

The component we wanted to test was HAProxy version 1.6. We are using it in production right now on 4 core, 30 Gig machines. However, all the connectivity is non-SSL based.

We wanted to test two things out of this exercise:

  1. The CPU percentage increase when we shift the entire load from non-SSL connections to SSL connections. The CPU usage should definitely increase, owing to the longer handshake (the TLS handshake on top of the TCP handshake) and then the packet encryption.

  2. Secondly, we wanted to test the limits of our current production setup in terms of the number of requests and the max number of concurrent connections that can be supported before performance starts to degrade.

We required the first part because of a major feature rollout that’s in full swing, which requires communication over SSL. We required the second part so that we could reduce the amount of hardware dedicated to HAProxy machines in production.

The Components Involved

  • Multiple client machines to stress the HAProxy.

  • Single HAProxy machine version 1.6 on various setups

    * 4 core, 30 Gig

    * 16 core, 30 Gig

    * 16 core, 64 Gig

  • Backend servers that will help support all these concurrent connections.

HTTP and MQTT

If you’ve gone through the first article in this series, you should know that our entire infrastructure is supported over two protocols:

  • HTTP and

  • MQTT.

In our stack, we don’t use HTTP 2.0 and hence don’t have the functionality of persistent connections on HTTP. So in production, the max number of TCP connections that we see is somewhere around (2 * 150k) on a single HAProxy machine (inbound + outbound). Although the number of concurrent connections is rather low, the number of requests per second is quite high.

On the other hand, MQTT is an altogether different way of communicating. It offers great quality-of-service parameters and persistent connectivity as well. So bidirectional, continuous communication can happen over an MQTT channel. As for the HAProxy that supports MQTT (underlying TCP) connections, we see somewhere around 600–700k TCP connections at peak time on a single machine.

We wanted to do a load test that would give us precise results for both HTTP and MQTT based connections.

There are a lot of tools out there that help us load test an HTTP server easily, and a lot of these tools provide advanced functionality like summarized results, converting text-based results to graphs, etc. We could not, however, find any stress testing tool for MQTT. We do have a tool that we developed ourselves, but it was not stable enough to support this kind of load in the timeframe we had.

So we decided to use HTTP load testing clients and simulate the MQTT setup using the same ;) Interesting, right?

Well, read on.

The Initial Setup

This is going to be a long post, as I will be providing a lot of details that I think would be really helpful to someone doing similar load testing or fine tuning.

  • We took a 16 core, 30 Gig machine for setting up HAProxy initially. We did not go with our current production setup because we thought the CPU hit from SSL termination happening at the HAProxy end would be tremendous.

  • For the server end, we went with a simple NodeJS server that replies with pong on receiving a ping request.

  • As for the client, we ended up using Apache Bench initially. We went with ab because it was a very well-known and stable tool for load testing HTTP endpoints, and also because it provides beautiful summarized results that would help us a lot.

The ab tool provides a lot of interesting parameters that we used for our load test, like:

  • -c, concurrency: Specifies the number of concurrent requests that will hit the server.

  • -n, no. of requests: As the name suggests, specifies the total number of requests for the current load run.

  • -p POST file: Contains the body of the POST request (if that is what you want to test).

If you look at these parameters closely, you will find that a lot of permutations are possible by tweaking all three. A sample ab request would look something like this:

ab -S -p post_smaller.txt -T application/json -q -n 100000 -c 3000 http://test.haproxy.in:80/ping

A sample result of such a request looks something like this

The numbers that we were interested in were:

  • 99% latency.

  • Time per request.

  • No. of failed requests.

  • Requests per second.

The biggest problem with ab is that it does not provide a parameter to control the number of requests per second. We had to tweak the concurrency level to get our desired requests per second, and this led to a lot of trial and error.
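
Since there is no rate flag, the practical approach is to sweep the concurrency level and read the "Requests per second" figure that ab prints after each run. A small sketch of such a sweep (using the same POST file and URL as the sample request above):

# Sweep ab's concurrency and record the throughput each run reports.
for c in 500 1000 2000 3000 4000; do
    echo "concurrency=$c"
    ab -S -p post_smaller.txt -T application/json -q -n 100000 -c $c \
       http://test.haproxy.in:80/ping | grep "Requests per second"
done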

The Almighty Graph

We could not just randomly do multiple load runs and keep collecting results, because that would not give us any meaningful information. We had to perform these tests in some specific way so as to get meaningful results out of them. So we followed this graph:

This graph states that up until a certain point, if we keep increasing the number of requests, the latency will remain almost the same. However, beyond a certain tipping point, the latency will start to increase exponentially. It is this tipping point for a machine or for a setup that we intended to measure.

Ganglia

Before providing some test results, I would like to mention Ganglia.

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.

Look at the following screenshot of one of our machines to get an idea about what Ganglia is and what sort of information it provides about the underlying machine.

Pretty interesting, eh?

Moving on, we constantly monitored Ganglia for our HAProxy machine to keep an eye on some important things.

  1. TCP established: This tells us the total number of TCP connections established on the system. NOTE: this is the sum of inbound as well as outbound connections.

  2. packets sent and received: We wanted to see the total number of TCP packets being sent and received by our HAProxy machine.

  3. bytes sent and received: This shows us the total data sent and received by the machine.

  4. memory: The amount of RAM being used over time.

  5. network: The network bandwidth consumption because of the packets being sent over the wire.

Following are the known limits found via previous tests, i.e., the numbers that we wanted to achieve via our load test.

700k TCP established connections,

50k packets sent, 60k packets received,

10–15MB bytes sent as well as received,

14–15Gig memory at peak,

7MB network.

ALL these values are on a per second basis.

HAProxy Nbproc

Initially, when we began load testing HAProxy, we found out that with SSL the CPU was being hit pretty early on in the process, but the requests per second were very low. On investigating with the top command, we found that HAProxy was using only 1 core, whereas we had 15 more cores to spare.

Googling for about 10 minutes led us to find this interesting setting in HAProxy that lets HAProxy use multiple cores.

It’s called nbproc, and to get a better hang of what it is and how to set it, check out this article:

http://blog.onefellow.com/post/82478335338/haproxy-mapping-process-to-cpu-core-for-maximum

Tuning this setting formed the base of our load testing strategy moving forward, because HAProxy’s ability to use multiple cores gave us the power to form multiple combinations for our load testing suite.
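
For reference, the relevant lines live in the global section of the HAProxy config. A minimal sketch (the process-to-core mapping here is illustrative; the article linked above goes into the details):

global
    nbproc 3        # run three HAProxy processes
    cpu-map 1 1     # pin process 1 to core 1
    cpu-map 2 2     # pin process 2 to core 2
    cpu-map 3 3     # pin process 3 to core 3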

Load Testing with AB

When we started out on our load testing journey, we were not clear on the things we should be measuring and what we needed to achieve.

Initially we had only one goal in mind, and that was to find the tipping point by varying all of the below-mentioned parameters.

I maintained a table of all the results for the various load tests that we ran. All in all, I did over 500 test runs to get to the ultimate result. As you can clearly see, there are a lot of moving parts to each and every test.

Single Client issues

We started seeing that the client was becoming a bottleneck as we kept on increasing our requests per second. Apache Bench uses a single core, and from the documentation it is evident that it does not provide any feature for using multiple cores.

To run multiple clients efficiently, we found an interesting Linux utility called parallel. As the name suggests, it helps you run multiple commands in parallel and utilises multiple cores. Exactly what we wanted.

Have a look at a sample command that runs multiple clients using parallel.

看一下使用并行运行多个客户端的示例命令。

cat hosts.txt | parallel 'ab -S -p post_smaller.txt -T application/json -n 100000 -c 3000 {}'

sachinm@ip-192-168-0-124:~$ cat hosts.txt
http://test.haproxy.in:80/ping
http://test.haproxy.in:80/ping
http://test.haproxy.in:80/ping

The above command would run 3 ab clients hitting the same URL. This helped us remove the client-side bottleneck.

The Sleep and Times parameter

We talked about some parameters in Ganglia that we wanted to track. Let’s discuss them one by one.

  1. packets sent and received: This can be simulated by sending some data as part of the POST request. This would also help us generate some traffic for the network as well as the bytes sent and received portions in Ganglia.

  2. tcp_established: This is something that took us a long, long time to actually simulate in our scenario. Imagine that a single ping request takes about a second; it would then take about 700k requests per second to reach our tcp_established milestone.

    Now this number might seem easier to achieve on production, but it was impossible to generate it in our scenario.

What did we do, you might ask? We introduced a sleep parameter in our POST call that specifies the number of milliseconds the server needs to sleep before sending out a response. This would simulate a long-running request on production. So now, say we have a sleep of about 20 minutes (yep), that would take us around 583 requests per second to reach the 700k mark.

Additionally, we also introduced another parameter in our POST calls to HAProxy, and that was the times parameter. It specified the number of times the server should write a response on the TCP connection before terminating it. This helped us simulate even more data transferred over the wire.
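
To make the mechanics concrete: both sleep and times travel as plain HTTP headers on the POST call, and the backend reads them off the request (see the backend code further down). A hand-rolled request with illustrative values would look something like this (the -k is only needed if the test endpoint uses a self-signed certificate):

curl -k -X POST https://test.haproxy.in:443/ping \
     -H "Content-Type: application/json" \
     -H "sleep: 30000" \
     -H "times: 2" \
     --data @post_smaller.txt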

Issues with Apache Bench

Although we found out a lot of results with Apache Bench, we also faced a lot of issues along the way. I won’t be mentioning all of them here, as they are not important for this post and I’ll be introducing another client shortly.

We were pretty content with the numbers we were getting out of Apache Bench, but at one point, generating the required TCP connections just became impossible. Somehow Apache Bench was not handling the sleep parameter we had introduced properly, and was not scaling for us.

Although running multiple ab clients on a single machine was sorted out by using the parallel utility, running this setup across multiple client machines was still a pain for us. I had not heard of the pdsh utility by then and was practically stuck.

Also, we were not focusing on timeouts at all. There are default timeouts on HAProxy, the ab client, and the server, and we had completely ignored these. We figured out a lot of things along the way and organized ourselves a lot better on how to go about testing.

We used to talk about the tipping point graph but we deviated a lot from it as time went on. Meaningful results, however, could only be found by focusing on that.

With Apache Bench, a point came where the number of TCP connections was not increasing. We had around 40–45 clients running on 5–6 different client boxes, but were not able to achieve the scale we wanted. Theoretically, the number of TCP connections should have jumped as we went on increasing the sleep time, but it wasn’t working for us.

Enter Vegeta

I was searching for some other load testing tools that might be more scalable and offer better functionality compared to Apache Bench, when I came across Vegeta.

From my personal experience, I have seen Vegeta to be extremely scalable, and it provides much better functionality compared to Apache Bench. In our load test, a single Vegeta client was able to produce a level of throughput equivalent to 15 Apache Bench clients.

Moving forward, I will be providing load test results that were gathered using Vegeta itself.

Load Testing with Vegeta

First, have a look at the command that we used to run a single Vegeta client. Interestingly, the command to put load on the backend servers is called attack :p

echo "POST https://test.haproxy.in:443/ping" | vegeta -cpus=32 attack -duration=10m  -header="sleep:30000"  -body=post_smaller.txt -rate=2000 -workers=500  | tee reports.bin | vegeta report

I just love the parameters provided by Vegeta. Let’s have a look at some of these below.

  1. -cpus=32: Specifies the number of cores to be used by this client. We had to expand our client machines to 32 core, 64 Gig because of the amount of load to be generated. If you look closely above, the rate isn’t much, but it becomes difficult to sustain such a load when a lot of connections are in a sleep state on the server end.

  2. -duration=10m: I guess this is self-explanatory. If you don’t specify a duration, the test will run forever.

  3. -rate=2000: The number of requests per second.

So as you can see above, we reached a hefty 32k requests per second on a mere 4 core machine. If you remember the tipping point graph, you will be able to notice it clearly enough above. So the tipping point in this case is 31.5k non-SSL requests.

Have a look at some more results from the load test.

16k SSL connections is also not bad at all. Please note that at this point in our load testing journey, we had to start from scratch because we had adopted a new client and it was giving us way better results than ab. So we had to do a lot of stuff again.

An increase in the number of cores led to an increase in the number of requests per second that the machine could take before the CPU limit was hit.

We found that there wasn’t a substantial increase in the number of requests per second when we increased the number of cores from 8 to 16. Also, if we finally decided to go with an 8 core machine in production, we would never allocate all of the cores to HAProxy, or to any other process for that matter. So we decided to perform some tests with 6 cores as well, to see if we had acceptable numbers.

Not bad.

Introducing the sleep

We were pretty satisfied with our load test results so far. However, these tests did not simulate the real production scenario. That happened only when we introduced a sleep time as well, which had been absent till now in our tests.

echo "POST https://test.haproxy.in:443/ping" | vegeta -cpus=32 attack -duration=10m  -header="sleep:1000"  -body=post_smaller.txt -rate=2000 -workers=500  | tee reports.bin | vegeta report

So a sleep time of 1000 milliseconds would lead to the server sleeping for x milliseconds, where 0 < x < 1000 and x is selected randomly. So, on average, the above load test would give a latency of ≥ 500 ms.

The numbers in the last cell represent

TCP established, Packets Rec, Packets Sent

respectively. As you can clearly see, the max requests per second that the 6 core machine can support has decreased to 8k from 20k. Clearly, the sleep has its impact, and that impact is an increase in the number of TCP connections established. This is, however, nowhere near the 700k mark that we set out to achieve.

Milestone 1

How do we increase the number of TCP connections? Simple: we keep on increasing the sleep time, and they should rise. We kept playing around with the sleep time and stopped at a 60 second sleep time. That would mean an average latency of around 30 seconds.

There is an interesting result parameter that Vegeta provides, and that is the percentage of successful requests. We saw that with the above sleep time, only 50% of the calls were succeeding. See the results below.

We achieved a whopping 400k TCP established connections with 8k requests per second and a 60000 ms sleep time. The R in 60000R means Random.

The first real discovery we made was that Vegeta has a default call timeout of 30 seconds, and that explained why 50% of our calls were failing. So we increased it to about 70s for our further tests and kept on varying it as and when the need arose.
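
On the client side this is just Vegeta’s -timeout flag; something along these lines (we kept bumping the value up as the sleep times grew, and the final script further down uses -timeout=2h):

echo "POST https://test.haproxy.in:443/ping" | vegeta -cpus=32 attack -duration=10m -timeout=70s -header="sleep:60000" -body=post_smaller.txt -rate=2000 -workers=500 | tee reports.bin | vegeta report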

We hit the 700k mark easily after tweaking the timeout value on the client end. The only problem with this was that the numbers were not consistent; these were just peaks. So the system hit a peak of 600k or 700k but did not stay there for very long.

We, however, wanted something similar to this:

This shows a stable state where 780k connections are maintained. If you look closely at the stats above, the number of requests per second is very high. In production, however, we have a much smaller number of requests (somewhere around 300) on a single HAProxy machine.

We were sure that if we drastically reduced the number of HAProxy machines we have in production (somewhere around 30, which means 30 * 300 ~ 9k connects per second), we would hit the machine limits on the number of TCP connections first, and not the CPU.

So we decided to aim for 900 requests per second, a 30 MB/s network, and 2.1 million TCP established connections. We agreed upon these numbers as they would be 3 times our production load on a single HAProxy.

Plus, till now we had settled on 6 cores being used by HAProxy. We wanted to test with only 3 cores, because this is what would be easiest for us to roll out on our production machines (our production machines, as mentioned before, are 4 core, 30 Gig, so rolling out changes with nbproc = 3 would be easiest for us).

REMEMBER: the machine we had at this point in time was a 16 core, 30 Gig machine, with 3 cores being allocated to HAProxy.

Milestone 2

Now that we had the max limits on requests per second that different variations of machine configuration could support, we only had one task left, as mentioned above.

Achieve 3X the production load, which is:

  • 900 requests per second

  • 2.1 million TCP connections established, and

  • 30 MB/s network.

We got stuck yet again, as the number of TCP connections established was taking a hard hit at 220k. No matter what the number of client machines or what the sleep time was, the number of TCP connections seemed to be stuck there.

Let’s look at some calculations. 220k TCP established connections and 900 requests per second gives 110,000 / 900 ~= 120 seconds. I took 110k because the 220k connections include both incoming and outgoing. So it’s two-way.

Our suspicion that 2 minutes was a limit somewhere in the system was verified when we introduced logs on the HAProxy side. We could see 120000 ms as the total time for a lot of connections in the logs.

Mar 23 13:24:24 localhost haproxy[53750]: 172.168.0.232:48380 [23/Mar/2017:13:22:22.686] api~ api-backend/http31 39/0/2062/-1/122101 -1 0 - - SD-- 1714/1714/1678/35/0 0/0 {0,"",""} "POST /ping HTTP/1.1"
122101 is the timeout value. See the HAProxy documentation for the meanings of all these values.

On investigating further, we found out that NodeJS has a default request timeout of 2 minutes. Voila!

  • how to modify the nodejs request default timeout time? (stackoverflow.com)
  • HTTP | Node.js v7.8.0 Documentation (nodejs.org)

But our happiness was apparently short-lived. At 1.3 million, the HAProxy connections suddenly dropped to 0 and started increasing again. We soon checked the dmesg command, which provided us some useful kernel-level information about our HAProxy process.
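
If you want to look for the same thing yourself, the OOM killer leaves its traces in the kernel ring buffer, so a grep along these lines is usually enough (the exact wording of the kernel message varies between kernel versions):

dmesg | grep -i -E "out of memory|killed process"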

Basically, the HAProxy process had run out of memory. So we decided to increase the machine RAM, and we shifted to a 16 core, 64 Gig machine with nbproc = 3. Because of this change, we were able to reach 2.4 million connections.

Backend Code

Following is the backend server code that was being used. We had also used statsd in the server code to get consolidated data on the requests per second being received from the clients.

var http = require('http');
var createStatsd = require('uber-statsd-client');
qs = require('querystring');

var sdc = createStatsd({ host: '172.168.0.134', port: 8125 });

var argv = process.argv;
var port = argv[2];

function randomIntInc(low, high) {
    return Math.floor(Math.random() * (high - low + 1) + low);
}

// Writes 'pong', then re-schedules itself 'times' more times before ending the response.
function sendResponse(res, times, old_sleep) {
    res.write('pong');
    if (times == 0) {
        res.end();
    } else {
        sleep = randomIntInc(0, old_sleep + 1);
        setTimeout(sendResponse, sleep, res, times - 1, old_sleep);
    }
}

var server = http.createServer(function(req, res) {
    headers = req.headers;
    old_sleep = parseInt(headers["sleep"]);   // sleep ceiling (ms) taken from the request header
    times = headers["times"] || 0;            // how many extra writes before closing the connection
    sleep = randomIntInc(0, old_sleep + 1);
    console.log(sleep);
    sdc.increment("ssl.server.http");
    res.writeHead(200);
    setTimeout(sendResponse, sleep, res, times, old_sleep);
});

// Raise NodeJS's default 2-minute request timeout (see the milestone discussion above).
server.timeout = 3600000;
server.listen(port);

We also had a small script to run multiple backend servers. We had 8 machines with 10 backend servers EACH (yeah!). We literally took the idea of clients and backend servers being infinite for the load test seriously.

counter=0
while [ $counter -le 9 ]
do
   port=$((8282+$counter))
   nodejs /opt/local/share/test-tools/HikeCLI/nodeclient/httpserver.js $port &
   echo "Server created on port " $port
   ((counter++))
done

echo "Created all servers"

Client Code

As for the client, there was a limitation of 63k TCP connections per IP. If you are not sure about this concept, please refer to my previous article in this series.

So in order to achieve 2.4 million connections (two-sided, which is 1.2 million from the client machines), we needed somewhere around 20 machines. It’s a real pain to run the Vegeta command on all 20 machines one by one, and even if you found a way to do that using something like csshx, you would still need something to combine all the results from all the Vegeta clients.
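
The 63k figure comes from the ephemeral port range: one client IP talking to one destination IP and port can only use around 64k source ports, so that is the ceiling on outbound connections per IP no matter what else you tune. Widening the range gets you close to that ceiling; beyond it, the only options are more IPs or more machines. A quick check and an illustrative widening on a client box:

# How many ephemeral ports can this client box currently use?
sysctl net.ipv4.ip_local_port_range

# Widen it to roughly 64k usable ports (illustrative; persist it in /etc/sysctl.conf if you keep it).
sysctl -w net.ipv4.ip_local_port_range="1024 65535"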

Check out the script below.

result_file=$1
declare -a machines=("172.168.0.138" "172.168.0.141" "172.168.0.142" "172.168.0.18" "172.168.0.5" "172.168.0.122" "172.168.0.123" "172.168.0.124" "172.168.0.232" " 172.168.0.244" "172.168.0.170" "172.168.0.179" "172.168.0.59" "172.168.0.68" "172.168.0.137" "172.168.0.155" "172.168.0.154" "172.168.0.45" "172.168.0.136" "172.168.0.143")
bins=""commas=""
for i in "${machines[@]}"; do bins=$bins","$i".bin"; commas=$commas","$i;  done;
bins=${bins:1}
commas=${commas:1}
pdsh -b -w "$commas" 'echo "POST http://test.haproxy.in:80/ping" | /home/sachinm/.linuxbrew/bin/vegeta -cpus=32 attack -connections=1000000 -header="sleep:20" -header="times:2" -body=post_smaller.txt -timeout=2h -rate=3000 -workers=500 > ' $result_file
for i in "${machines[@]}"; do  scp sachinm@$i:/home/sachinm/$result_file $i.bin ; done;
vegeta report -inputs="$bins"

Apparently, there is this utility called pdsh that lets you run a command concurrently on multiple machines remotely. Additionally, Vegeta allows us to combine multiple results into one, and that’s really all we wanted.

HAProxy Configuration

This is probably what you came here looking for: below is the HAProxy config that we used in our load test runs. The most important parts are the nbproc setting and the maxconn setting. The maxconn setting allows us to set the maximum number of TCP connections that HAProxy can support overall (one way).
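
As a rough sketch of the shape of that config, built only from the settings discussed in this post (nbproc with cpu-map, a 2 million maxconn, SSL termination on the frontend, and generous timeouts for the long sleeps), it looks something like the block below. The certificate path, server addresses, and exact timeouts here are placeholders rather than the exact values we ran with.

global
    nbproc 3
    cpu-map 1 1
    cpu-map 2 2
    cpu-map 3 3
    maxconn 2000000                 # overall TCP connection ceiling (one way)

defaults
    mode http
    maxconn 2000000
    timeout connect 5s
    timeout client  2h              # generous, because of the long server-side sleeps
    timeout server  2h

frontend api
    bind *:443 ssl crt /etc/haproxy/certs/test.haproxy.in.pem
    default_backend api-backend

backend api-backend
    balance roundrobin
    server http31 192.168.0.31:8282 check    # placeholder addresses; our servers ran as http30 ... http83
    server http32 192.168.0.32:8283 check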

Changes to the maxconn setting lead to an increase in the HAProxy process’s ulimit. Take a look below:

The max open files limit has increased to 4 million because the max connections for HAProxy is set at 2 million. Neat!
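
An easy way to verify this on a running box is to read the limits file of one of the HAProxy processes (with nbproc = 3, pidof returns several PIDs, and any one of them will do):

# "Max open files" should reflect roughly two file descriptors per connection.
cat /proc/$(pidof haproxy | awk '{print $1}')/limits | grep "Max open files"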

Check the article below for a whole lot of HAProxy optimisations that you can and should do to achieve the kind of stats we achieved.

Use HAProxy to load balance 300k concurrent tcp socket connections: Port Exhaustion, Keep-alive and… (www.linangran.com)

The http30 goes on to http83 :p

That’s all for now, folks. If you’ve made it this far, I’m truly amazed :)

A special shout out to Dheeraj Kumar Sidana who helped us all the way through this and without whose help we would not have been able to reach any meaningful results. :)

Do let me know how this blog post helped you. Also, please recommend (❤) and spread the love as much as possible for this post if you think this might be useful for someone.

Original article: https://www.freecodecamp.org/news/how-we-fine-tuned-haproxy-to-achieve-2-000-000-concurrent-ssl-connections-d017e61a4d27/
