Troubleshooting Network Packet Loss

During recent testing, the database middleware program was found to drop network packets. The test tool was mysqlslap.

Once the concurrency reaches a certain level, there is some probability that mysqlslap hangs and never returns.

The test command was:
[root@db_slave1 cwinfocenter]# mysqlslap --concurrency=300,300,300,400,500 --number-of-queries=6000 --iterations=1  --create-schema=chinaweather_infocenter -h172.16.80.71 -P3307 -uroot -p111111 --query=test4.sql
Benchmark
      Average number of seconds to run all queries: 2.613 seconds
      Minimum number of seconds to run all queries: 2.613 seconds
      Maximum number of seconds to run all queries: 2.613 seconds
      Number of clients running queries: 300
      Average number of queries per client: 20

Benchmark
      Average number of seconds to run all queries: 2.677 seconds
      Minimum number of seconds to run all queries: 2.677 seconds
      Maximum number of seconds to run all queries: 2.677 seconds
      Number of clients running queries: 300
      Average number of queries per client: 20

Benchmark
      Average number of seconds to run all queries: 2.689 seconds
      Minimum number of seconds to run all queries: 2.689 seconds
      Maximum number of seconds to run all queries: 2.689 seconds
      Number of clients running queries: 300
      Average number of queries per client: 20

Benchmark
      Average number of seconds to run all queries: 2.906 seconds
      Minimum number of seconds to run all queries: 2.906 seconds
      Maximum number of seconds to run all queries: 2.906 seconds
      Number of clients running queries: 400
      Average number of queries per client: 15


At a concurrency of 500, mysqlslap never returned.
[root@db_slave1 cwinfocenter]# ps -eLf | grep mysqlslap >/tmp/ps-slap

About 93 threads had not returned. Tracing one of the stuck threads with pstack:
[root@db_slave1 cwinfocenter]# pstack 23085
Thread 1 (process 23085):
#0  0x0000003259e0e54d in read () from /lib64/libpthread.so.0
#1  0x000000000042a002 in vio_read_buff ()
#2  0x000000000041a659 in my_real_read(st_net*, unsigned long*) ()
#3  0x000000000041aa34 in my_net_read ()
#4  0x000000000041498a in cli_safe_read ()
#5  0x0000000000416938 in mysql_real_connect ()
#6  0x0000000000408a0d in slap_connect ()
#7  0x000000000040c5b6 in run_task ()
#8  0x0000003259e07851 in start_thread () from /lib64/libpthread.so.0
#9  0x0000003259ae767d in clone () from /lib64/libc.so.6

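The stack shows mysqlslap blocked inside mysql_real_connect, waiting in read() for the server's response, so packets were being lost during connection setup.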
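One way to check this suspicion on the middleware host, rather than infer it only from the client stack, is to look at the kernel's listen-queue counters. The sketch below assumes a Linux host that exposes /proc/net/netstat; the function name `listen_counters` is made up for illustration.

```shell
#!/bin/sh
# Print the ListenOverflows and ListenDrops counters from a file in
# /proc/net/netstat format: a "TcpExt:" line of names followed by a
# "TcpExt:" line of values. The function name is hypothetical.
listen_counters() {
    awk '
        # First TcpExt line: remember the column names.
        $1 == "TcpExt:" && !seen {
            for (i = 2; i <= NF; i++) name[i] = $i
            seen = 1
            next
        }
        # Second TcpExt line: print the two counters of interest.
        $1 == "TcpExt:" && seen {
            for (i = 2; i <= NF; i++)
                if (name[i] == "ListenOverflows" || name[i] == "ListenDrops")
                    print name[i] "=" $i
            exit
        }
    ' "${1:-/proc/net/netstat}"
}

# Run on the middleware host before and after a test run.
if [ -r /proc/net/netstat ]; then
    listen_counters
fi
```

If both counters grow while the 500-client run is hanging, the accept queue of a listening socket is probably overflowing. `ss -lnt` gives a complementary view: for listening sockets, Recv-Q is the current accept-queue length and Send-Q is its limit.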

Adjusted the operating system configuration on the middleware host, raising the file descriptor limit and the SYN backlog:
ulimit -n 10240
echo 20480 > /proc/sys/net/ipv4/tcp_max_syn_backlog

Retesting showed the problem was still there.
After some searching on Google, it turned out that one more parameter needs adjusting:

echo 20480 > /proc/sys/net/core/somaxconn
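
Values written with echo into /proc/sys are lost on reboot, so it can help to verify the current settings and persist them. A minimal sketch, assuming a Linux host; the helper name `sysctl_path` is hypothetical, and the /etc/sysctl.conf lines are left as comments because they require root.

```shell
#!/bin/sh
# Convert a sysctl key such as net.core.somaxconn to its /proc/sys path.
# Hypothetical helper for this sketch.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Show the current values of the two kernel settings changed above.
for key in net.core.somaxconn net.ipv4.tcp_max_syn_backlog; do
    path=$(sysctl_path "$key")
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$key" "$(cat "$path")"
    else
        printf '%s = (not readable on this host)\n' "$key"
    fi
done

# To keep the values across reboots, lines like these could be added to
# /etc/sysctl.conf and loaded with "sysctl -p" (as root):
#   net.core.somaxconn = 20480
#   net.ipv4.tcp_max_syn_backlog = 20480
```

The new somaxconn value is applied when a program calls listen(), so the middleware likely needs a restart before retesting.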

The reason (excerpted from the web):

The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests.

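Note the sentence above: backlog now refers to sockets that are fully connected but not yet accepted, not to sockets still in the SYN state. I usually set it to around 64. So the parameter to focus on is probably /proc/sys/net/core/somaxconn rather than tcp_max_syn_backlog; the latter should still matter for some firewalls (SYN flood attacks).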

The maximum length of the queue for incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog. When syncookies are enabled there is no logical maximum length and this setting is ignored. See tcp(7) for more information.

If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn, then it is silently truncated to that value; the default value in this file is 128. In kernels before 2.4.25, this limit was a hard coded value, SOMAXCONN, with the value 128.
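
The truncation rule quoted above amounts to taking the smaller of the two numbers. A small sketch of that arithmetic follows; the function name `effective_backlog` and the requested backlog of 1024 are assumptions for illustration, since the backlog the middleware actually passes to listen() is not stated here.

```shell
#!/bin/sh
# Effective accept-queue limit: the backlog passed to listen(), capped at
# net.core.somaxconn. The function name is hypothetical.
effective_backlog() {
    requested=$1
    somaxconn=$2
    if [ "$requested" -gt "$somaxconn" ]; then
        echo "$somaxconn"
    else
        echo "$requested"
    fi
}

effective_backlog 1024 128     # default somaxconn caps the request: 128
effective_backlog 1024 20480   # after raising somaxconn: 1024
effective_backlog 64 20480     # a small backlog in the program still limits: 64
```

This also shows why raising somaxconn alone only helps when the program itself asks for a backlog larger than the old limit.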


After changing somaxconn, the packet loss no longer occurred in testing.


When reposting, please credit Gao Xiaoxin's (高孝鑫) blog.

