Tuning a Rails server for concurrent requests

Latest recommended article published 2018-12-26 20:22:47

saint1126



Tags: server rails processing server apache path

Article link: https://blog.csdn.net/saint1126/article/details/5891723


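Background: after the fifth exhibition opened, concurrent traffic was high and 502 errors appeared repeatedly.

The log contained a large number of errors like this one: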

1722317 connect() to unix:/tmp/passenger.19461/master/helper_server.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 192.168.10.107, server: www.fwxgx.com, request: "GET /tje/exhibition/get_vote_number HTTP/1.0", upstream: "passenger://unix:/tmp/passenger.19461/master/helper_server.sock:", host: "www.fwxgx.com", referrer: http://www.fwxgx.com/tje/
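To see which pages hit this error most often, the error log can be tallied per request path. Below is a minimal Python sketch; the regular expression is an assumption modeled on the single sample line above, so other nginx versions or log formats may need a different pattern.

```python
import re
from collections import Counter

# Pattern assumed from the sample log line: an upstream connect() that failed with
# errno 11 (EAGAIN), followed later on the line by the request: "METHOD PATH HTTP/x" field.
PATTERN = re.compile(
    r'connect\(\) to (?P<upstream>\S+) failed \(11: Resource temporarily unavailable\)'
    r'.*?request: "(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"'
)

SAMPLE = (
    '1722317 connect() to unix:/tmp/passenger.19461/master/helper_server.sock failed '
    '(11: Resource temporarily unavailable) while connecting to upstream, '
    'client: 192.168.10.107, server: www.fwxgx.com, '
    'request: "GET /tje/exhibition/get_vote_number HTTP/1.0", '
    'upstream: "passenger://unix:/tmp/passenger.19461/master/helper_server.sock:", '
    'host: "www.fwxgx.com", referrer: http://www.fwxgx.com/tje/'
)


def count_eagain_by_path(lines):
    """Return a Counter mapping request path -> number of EAGAIN upstream failures."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[m.group("path")] += 1
    return counts


if __name__ == "__main__":
    # In practice, pass an open error-log file object instead of this one-line list.
    print(count_eagain_by_path([SAMPLE]))
```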

The number of concurrent TCP connections had reached 7,000 on the master server and 700+ on the slave server.
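The post does not say how these connection counts were obtained. On Linux, one way is to tally the state column of /proc/net/tcp (the file netstat reads). A minimal sketch follows; the state codes come from the kernel's include/net/tcp_states.h, and reading tcp6 as well is an assumption about what you want to count.

```python
import os
from collections import Counter

# Socket state codes used in /proc/net/tcp (see include/net/tcp_states.h).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING", "0C": "NEW_SYN_RECV",
}


def count_tcp_states(text):
    """Count sockets per TCP state in the text of /proc/net/tcp (or /proc/net/tcp6)."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # skip the column-header line
        fields = line.split()
        if len(fields) > 3:
            state = fields[3].upper()  # 4th column is the state, in hex
            counts[TCP_STATES.get(state, state)] += 1
    return counts


if __name__ == "__main__":
    total = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(path):  # these files exist only on Linux
            with open(path) as f:
                total.update(count_tcp_states(f.read()))
    for state, n in total.most_common():
        print(f"{state:12} {n}")
```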

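We took the following emergency measures:

1. Changed keepalive_timeout = 32 to keepalive_timeout = 120.

2. At this point, requests routed to the upstream on the master server were returning 502 quite often, so the upstream weights were changed to 3:7.

3. That still did not help, so in the end we stopped load-balancing to the master server and used only the slave server.

4. This helped somewhat, but 502 errors still appeared as soon as the flash sale started.

5. We then stopped the chat server.

6. Because the product required it, the chat server was brought back, with its message-sending interval lengthened from 2 s to 5 s. This still did not fix the root cause.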
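Step 2 relies on weighted load balancing across the nginx upstream servers. The sketch below shows smooth weighted round-robin, the algorithm used by later nginx releases (nginx 0.7.65 may distribute requests differently). The server names, and the assumption that the master got weight 3 and the slave weight 7, are hypothetical. Over every cycle of total-weight requests, each server is picked exactly as many times as its weight, and picks are interleaved rather than bunched.

```python
def smooth_wrr(weights, n):
    """Return n picks using smooth weighted round-robin.

    weights: dict mapping server name -> positive integer weight.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every server earns its weight each round...
        for name, w in weights.items():
            current[name] += w
        # ...the richest one is chosen and pays back the total.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


if __name__ == "__main__":
    # Hypothetical names; assumes master = 3 and slave = 7 as in step 2.
    picks = smooth_wrr({"master": 3, "slave": 7}, 10)
    print(picks)
    print(picks.count("master"), picks.count("slave"))  # 3 7
```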

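Problem:
    Log analysis shows that the number of visitors was about the same as during the previous exhibitions, yet this time the 502 errors were far more severe. The cause has not been identified yet.

    However, I tried to reproduce the error with an ab (ApacheBench) test.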

ab -n 10000 -c 100 http://192.168.10.106:8080/tje/exhibition/chatroom

top:
Cpu(s): 87.0%us,  2.2%sy,  0.0%ni,  9.6%id,  0.0%wa,  0.2%hi,  1.0%si,  0.0%st
Mem:   4041312k total,  3808708k used,   232604k free,   177584k buffers
Swap:  6094840k total,      124k used,  6094716k free,  1297316k cached

Server Software:        nginx/0.7.65
Server Hostname:        192.168.10.106
Server Port:            8080

Document Path:          /tje/exhibition/chatroom
Document Length:        25288 bytes

Concurrency Level:      100
Time taken for tests:   55.754484 seconds
Complete requests:      10000
Failed requests:        2
   (Connect: 0, Length: 2, Exceptions: 0)
Write errors:           0
Total transferred:      259050873 bytes
HTML transferred:       252879994 bytes
Requests per second:    179.36 [#/sec] (mean)
Time per request:       557.545 [ms] (mean)
Time per request:       5.575 [ms] (mean, across all concurrent requests)
Transfer rate:          4537.38 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    2   3.1      1      27
Processing:   130  552  91.1    550     970
Waiting:      124  540  90.8    538     954
Total:        130  554  90.4    552     971

Percentage of the requests served within a certain time (ms)
  50%    552
  66%    585
  75%    609
  80%    623
  90%    666
  95%    707
  98%    756
  99%    800
 100%    971 (longest request)

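According to the output above, only two requests failed (both counted under "Length", meaning ab received a body whose length differed from the first response), which is acceptable.

When I raised the concurrency level to 250: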

ab -n 10000 -c 250 http://192.168.10.106:8080/tje/exhibition/chatroom
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.10.106 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Finished 10000 requests

Server Software:        nginx/0.7.65
Server Hostname:        192.168.10.106
Server Port:            8080

Document Path:          /tje/exhibition/chatroom
Document Length:        25288 bytes

Concurrency Level:      250
Time taken for tests:   25.648259 seconds
Complete requests:      10000
Failed requests:        5885
   (Connect: 0, Length: 5885, Exceptions: 0)
Write errors:           0
Non-2xx responses:      5885
Total transferred:      108389086 bytes
HTML transferred:       105078225 bytes
Requests per second:    389.89 [#/sec] (mean)
Time per request:       641.206 [ms] (mean)
Time per request:       2.565 [ms] (mean, across all concurrent requests)
Transfer rate:          4126.91 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1  42.5      0    3001
Processing:     0  622 768.2      2    3432
Waiting:        0  619 764.8      2    3428
Total:          0  623 771.0      2    4526

Percentage of the requests served within a certain time (ms)
  50%      2
  66%   1361
  75%   1415
  80%   1442
  90%   1509
  95%   1631
  98%   2049
  99%   2780
 100%   4526 (longest request)

ab -n 10000 -c 300 http://192.168.10.106:8080/tje/exhibition/chatroom

More than 90% of the responses were 502 errors.

My fix was to change a single parameter.

I added the following line to /etc/sysctl.conf:

net.core.somaxconn = 1024

Then I ran sysctl -p to apply the change

and restarted the web server.
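To confirm the new value is actually in effect, it can be read back (the same value that `sysctl net.core.somaxconn` prints). A minimal Python sketch that reads it from /proc/sys; the `root` parameter is a hypothetical addition so the function can be pointed at a test directory.

```python
import os


def read_sysctl(name, root="/proc/sys"):
    """Read a sysctl value such as 'net.core.somaxconn' (dots map to path separators)."""
    path = os.path.join(root, *name.split("."))
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    if os.path.exists("/proc/sys/net/core/somaxconn"):  # Linux only
        print("net.core.somaxconn =", read_sysctl("net.core.somaxconn"))
```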

Test:

ab -n 10000 -c 300 http://192.168.10.106:8080/tje/exhibition/chatroom
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.10.106 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Finished 10000 requests

Server Software:        nginx/0.7.65
Server Hostname:        192.168.10.106
Server Port:            8080

Document Path:          /tje/exhibition/chatroom
Document Length:        25288 bytes

Concurrency Level:      300
Time taken for tests:   59.251878 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      259052525 bytes
HTML transferred:       252880000 bytes
Requests per second:    168.77 [#/sec] (mean)
Time per request:       1777.556 [ms] (mean)
Time per request:       5.925 [ms] (mean, across all concurrent requests)
Transfer rate:          4269.57 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   2.9      0      29
Processing:  1263 1749 270.8   1681    3989
Waiting:     1261 1742 269.8   1674    3986
Total:       1263 1750 272.1   1682    3998

Percentage of the requests served within a certain time (ms)
  50%   1682
  66%   1719
  75%   1751
  80%   1775
  90%   1896
  95%   2321
  98%   2715
  99%   3072
 100%   3998 (longest request)

No errors at 300 concurrent connections. Next, a test at 1000:

ab -n 10000 -c 1000 http://192.168.10.106:8080/tje/exhibition/chatroom
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.10.106 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Finished 10000 requests

Server Software:        nginx/0.7.65
Server Hostname:        192.168.10.106
Server Port:            8080

Document Path:          /tje/exhibition/chatroom
Document Length:        25288 bytes

Concurrency Level:      1000
Time taken for tests:   55.407599 seconds
Complete requests:      10000
Failed requests:        1
   (Connect: 0, Length: 1, Exceptions: 0)
Write errors:           0
Total transferred:      259051877 bytes
HTML transferred:       252879998 bytes
Requests per second:    180.48 [#/sec] (mean)
Time per request:       5540.760 [ms] (mean)
Time per request:       5.541 [ms] (mean, across all concurrent requests)
Transfer rate:          4565.80 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    5  14.0      1      80
Processing:   217 5250 996.2   5478    6269
Waiting:      200 5242 996.7   5471    6252
Total:        218 5255 989.6   5480    6269

Percentage of the requests served within a certain time (ms)
  50%   5480
  66%   5550
  75%   5598
  80%   5637
  90%   5822
  95%   5940
  98%   5997
  99%   6030
 100%   6269 (longest request)

Still no real problems; response times simply climbed above 5 s (median 5480 ms).
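The growth in latency follows from Little's law: mean number of requests in flight = throughput × mean latency. ab derives both "Requests per second" and "Time per request (mean)" from the same totals, so the agreement below holds essentially by construction; what it makes visible is that with throughput stuck at roughly 170-180 requests/s, latency must grow in proportion to concurrency. A minimal sketch using the numbers from the three runs above:

```python
def littles_law_latency(concurrency, throughput_rps):
    """Mean time in system (seconds) = requests in flight / throughput (Little's law)."""
    return concurrency / throughput_rps


# (concurrency, requests/sec, ab's mean "Time per request" in ms) from the runs above.
RUNS = [(100, 179.36, 557.545), (300, 168.77, 1777.556), (1000, 180.48, 5540.760)]

if __name__ == "__main__":
    for c, rps, measured_ms in RUNS:
        predicted_ms = littles_law_latency(c, rps) * 1000
        print(f"c={c}: predicted {predicted_ms:.0f} ms, measured {measured_ms:.0f} ms")
```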

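Since there was still free memory, I tried raising passenger_max_pool_size from 32 to 40.

Then I ran the tests again at concurrency levels of 100, 300, and 1000.

Throughput did not improve: it stayed at roughly 170 requests per second. When I lowered the value to 20, throughput dropped.

The right value depends on your situation. If the Rails app itself uses a lot of memory, a larger pool can make things worse. I conservatively budget 80-100 MB per process.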
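The per-process budget turns into a simple upper bound for passenger_max_pool_size. A minimal sketch; the 1024 MB reserve for the OS, nginx, and other services is a hypothetical figure, and the total comes from the "Mem: 4041312k total" line in the first top output.

```python
def max_pool_size(total_mb, reserved_mb, per_process_mb):
    """Largest number of app processes that fit in the RAM left after a reserve."""
    available = total_mb - reserved_mb
    if available <= 0:
        return 0
    return available // per_process_mb


if __name__ == "__main__":
    total_mb = 4041312 // 1024  # "Mem: 4041312k total" from top, in MB
    reserved_mb = 1024          # hypothetical reserve for OS, nginx, DB, page cache
    for per_proc in (80, 100):  # the 80-100 MB per-process budget from the text
        print(per_proc, "MB/process ->", max_pool_size(total_mb, reserved_mb, per_proc))
```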

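That solved the 502 errors under concurrent load.

So what is the underlying mechanism?

Many posts online suggest changing the backlog, but Passenger had already raised its backlog to 1024 in version 2.2.6.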

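Moreover, man 2 listen explains that this backlog is the queue of TCP connections that have completed the three-way handshake; in other words, connections that are already established and waiting for the server to accept() them.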
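One detail from the same man page ties this to the fix above: on Linux, if the backlog passed to listen() is larger than /proc/sys/net/core/somaxconn, it is silently truncated to that value. The default was 128 on kernels of that era (Linux 5.4 raised it to 4096), so, assuming the default, Passenger's 1024 would effectively have been capped at 128 until somaxconn was raised. A minimal sketch of that rule:

```python
def effective_backlog(requested, somaxconn):
    """Accept-queue length actually used for listen(fd, requested) on Linux.

    Per listen(2), a backlog larger than net.core.somaxconn is silently
    truncated to somaxconn.
    """
    return min(requested, somaxconn)


if __name__ == "__main__":
    # 1024 = Passenger's backlog as stated in the text; 128 = assumed old default.
    for somaxconn in (128, 1024):
        print(f"somaxconn={somaxconn}: effective backlog {effective_backlog(1024, somaxconn)}")
```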

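Under concurrent load, many clients send SYN j; the server replies with SYN k, ACK j+1, and the client must then answer with ACK k+1. If a client does not answer, or cannot send ACK k+1 in time, the result is a half-open connection, which is common on busy systems. Linux keeps a queue of these half-open connections; if that queue overflows, new connections are refused. This is the basic principle behind DoS tools.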
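The handshake and the bounded half-open queue can be illustrated with a toy model. This is a hypothetical sketch, not the kernel's implementation (the real kernel also has SYN cookies, retransmission timers, and other details); it only shows that once the queue is full, new SYNs are turned away until pending handshakes complete.

```python
class HalfOpenQueue:
    """Toy model of a bounded queue of half-open (SYN_RECV) connections."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = {}       # client -> server sequence number k (awaiting ACK k+1)
        self.established = []   # completed handshakes waiting for accept()

    def on_syn(self, client, j, k):
        """Client sent SYN j; reply SYN k, ACK j+1 if there is room, else drop."""
        if len(self.pending) >= self.capacity:
            return None         # queue overflow: the SYN is dropped
        self.pending[client] = k
        return ("SYN", k, "ACK", j + 1)

    def on_ack(self, client, ack):
        """Client sent ACK; complete the handshake if it acknowledges k+1."""
        k = self.pending.get(client)
        if k is None or ack != k + 1:
            return False
        del self.pending[client]
        self.established.append(client)
        return True


if __name__ == "__main__":
    q = HalfOpenQueue(capacity=2)
    print(q.on_syn("a", 100, 500))  # accepted
    print(q.on_syn("b", 200, 600))  # accepted
    print(q.on_syn("c", 300, 700))  # None: queue full, "a" and "b" have not ACKed
    print(q.on_ack("a", 501))       # True: "a" is established, a slot frees up
    print(q.on_syn("c", 300, 700))  # accepted now
```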

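So we need to increase the length of the half-open connection queue. Two settings are involved, and they can be checked with the following commands: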