Tuning rabbitmq-web-stomp: a walkthrough
Environment
OS: CentOS 7 x86_64
Erlang: 19.0.3
RabbitMQ: 3.6.5
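Symptoms
The OA system showed about 2,200 users online, but the number of RabbitMQ connections always stayed below 1,300.
Under IE8, connection errors occurred frequently. After restarting RabbitMQ everything ran normally for about 30 minutes, then the problem came back.
Reference code: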
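var stompClient = null;

// Open a SockJS connection to the Web STOMP endpoint and start a STOMP session.
function stompConnect() {
    var ws = new SockJS('http://localhost:15674/stomp');
    stompClient = Stomp.over(ws);
    // Arguments: login, passcode, connect callback, error callback, virtual host.
    stompClient.connect('guest', 'guest', on_connected, on_error, 'test');
}

// Called once the STOMP session is established.
var on_connected = function (result) {
    if (stompClient != null) {
        // `pattern` is the subscription destination; it is not defined in this snippet.
        stompClient.subscribe(pattern, function (body) {
            // todo
        });
    }
};

// Called on a connection error: retry after 10 seconds.
var on_error = function (x) {
    setTimeout(stompConnect, 10000);
};

stompConnect();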
With few users the code usually reaches on_connected. As the number of users grows, some users lose their connection to RabbitMQ and on_error fires over and over.
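The reference code retries at a fixed 10-second interval, so clients that were refused together also retry together. The sketch below shows one common alternative, exponential backoff with jitter. It is not part of the original system: the helper name reconnectDelay and its parameters are hypothetical, and the base and cap values are only examples.

```javascript
// Hypothetical helper (not in the original code): delay in milliseconds
// before reconnect attempt number `attempt` (0-based).
// `rand` is optional and exists only so the function can be tested
// deterministically; it defaults to Math.random.
function reconnectDelay(attempt, baseMs, maxMs, rand) {
    var random = rand || Math.random;
    // Double the ceiling on every attempt, but never exceed maxMs.
    var ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
    // Wait at least half the ceiling, plus a random share of the other half,
    // so that clients do not all retry at the same instant.
    return Math.floor(ceiling / 2 + random() * (ceiling / 2));
}
```

Under these assumptions, on_error could keep a retry counter and call setTimeout(stompConnect, reconnectDelay(attempts++, 1000, 60000)), resetting the counter in on_connected. This only spreads out retries; it does not raise the server-side connection limit.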
Tuning process
Check the number of concurrent connections on the system
root@alice:~ # netstat -an | awk '/^tcp/ { ++S[$NF]} END { for (a in S) print a, S[a]}'
LISTEN 19
ESTABLISHED 1674
FIN_WAIT1 1
FIN_WAIT2 537
TIME_WAIT 5295
The concurrency is roughly what we expected: the OA system had about 2,200 users online.
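For readers less familiar with awk, the tally done by the one-liner above can be sketched in JavaScript as follows. The function name tallyTcpStates is hypothetical, and the sketch assumes the netstat output is passed in as a single string.

```javascript
// Hypothetical helper: count TCP connections per state, mirroring the awk
// one-liner (keep lines starting with "tcp", group by the last field).
function tallyTcpStates(netstatOutput) {
    var counts = {};
    netstatOutput.split('\n').forEach(function (line) {
        // Same filter as /^tcp/ in awk; this also matches "tcp6" lines.
        if (line.indexOf('tcp') !== 0) return;
        var fields = line.trim().split(/\s+/);
        var state = fields[fields.length - 1]; // same as $NF in awk
        counts[state] = (counts[state] || 0) + 1;
    });
    return counts;
}
```

The result is a plain object keyed by state name, for example ESTABLISHED or TIME_WAIT, holding the number of lines seen in that state.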
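Check the number of open files on port 15674
This is the number of connections established on port 15674; OA users talk to RabbitMQ through web_stomp.
root@alice:~# lsof -i:15674|grep 'TCP'|wc -l
1034
Running this command repeatedly showed that the count never exceeded 1034. With 2,200 users online in the OA system, the count should have been able to go above that value (a guess at this point).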
Check the RabbitMQ error log
root@alice:~# tail /data/rabbitmq_server-3.6.5/var/log/rabbitmq/rabbit@alice.log -n 1000|grep 'ERROR' -B 2 -A 2
Error in process <0.12547.685> on node 'rabbit@alice' with exit value: {[{reason,{badmatch,{error,timeout}}},{mfa,{sockjs_cowboy_handler,handle,2}},{stacktrace,[{sockjs_http,body,1,[{file,"src/sockjs_http.erl"},{line,36}]},{sockjs_action,xhr_send,4,[{file,"src/sockjs_action.erl"},{line,144}]},{sockjs_handler...
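The official documentation describes the Web STOMP and Ranch configuration, which raised the suspicion that a Ranch setting was capping port 15674 at 1034 connections.
However, none of the Ranch-related options listed there appears to concern the number of connections.
The concurrent connections section of the Ranch 1.3 User Guide does describe a limit on the maximum number of connections: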
The max_connections transport option allows you to limit the number of concurrent connections.
It defaults to 1024. Its purpose is to prevent your system from being overloaded and
ensuring all the connections are handled optimally.
Customizing the maximum number of concurrent connections
{ok, _} = ranch:start_listener(tcp_echo, 100,
ranch_tcp, [{port, 5555}, {max_connections, 100}],
echo_protocol, []
).
You can disable this limit by setting its value to the atom infinity.
Disabling the limit for the number of connections
{ok, _} = ranch:start_listener(tcp_echo, 100,
ranch_tcp, [{port, 5555}, {max_connections, infinity}],
echo_protocol, []
).
The maximum number of connections is a soft limit. In practice,
it can reach max_connections + the number of acceptors.
When the maximum number of connections is reached, Ranch will stop accepting connections.
This will not result in further connections being rejected, as the kernel option allows queueing incoming connections.
The size of this queue is determined by the backlog option and defaults to 1024. Ranch does not know about the number of
connections that are in the backlog.
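As the guide explains, max_connections is a soft limit: the actual ceiling is max_connections plus the number of acceptors. The number of acceptors defaults to 10, which is why
lsof -i:15674|grep TCP|wc -l
returns 1034:
1034 = 1024 (max_connections) + 10 (acceptors)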
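The arithmetic above can be written out as a small worked example. The function name effectiveConnectionCeiling is hypothetical, and the sketch assumes the number of acceptors stays at 10 when max_connections is changed.

```javascript
// Hypothetical helper: upper bound on concurrent connections for a Ranch
// listener, following the soft-limit rule quoted from the Ranch guide.
function effectiveConnectionCeiling(maxConnections, numAcceptors) {
    // The atom `infinity` in the Ranch config disables the limit.
    if (maxConnections === 'infinity') return Infinity;
    return maxConnections + numAcceptors;
}

var onlineUsers = 2200; // figure reported by the OA system

var defaultCeiling = effectiveConnectionCeiling(1024, 10);
console.log(defaultCeiling, onlineUsers > defaultCeiling); // 1034 true

var raisedCeiling = effectiveConnectionCeiling(5000, 10);
console.log(raisedCeiling, onlineUsers > raisedCeiling);   // 5010 false
```

With the default of 1024 the ceiling sits below the number of online users, which matches the observed plateau at 1034; a max_connections of 5000 leaves headroom.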
At this point the concurrency limit of rabbitmq_web_stomp can be raised by setting max_connections:
[
  {rabbit, [
    {hipe_compile, true},
    {disk_free_limit, "5GB"}
  ]},
  {rabbitmq_web_stomp, [
    {tcp_config, [
      {backlog, 3000},
      %% Raise the Ranch soft limit (default 1024).
      {max_connections, 5000}
    ]}
  ]}
].
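This raises the maximum number of connections to 5000. Unfortunately, the Ranch configuration options in the documentation do not explicitly mention max_connections, which led to quite a few detours.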
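Results
After changing the configuration and restarting the RabbitMQ service, the log and the number of connections on port 15674 were monitored continuously, and both looked normal.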
lsof -i:15674|grep TCP|wc -l
1683
root@alice:~# tail /data/rabbitmq_server-3.6.5/var/log/rabbitmq/rabbit@alice.log -n 1000|grep 'ERROR' -B 2 -A 2
No error messages
Summary
The load was underestimated when the new system went live.
Our familiarity with RabbitMQ needs to improve.