Troubleshooting network packet loss

gao1738

Article link: https://blog.csdn.net/gao1738/article/details/42839729


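During recent testing I found that our database middleware was dropping network packets. The test tool was mysqlslap.

Once the concurrency reached a certain level, there was some probability that mysqlslap would hang indefinitely and never return.

The test command was: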
[root@db_slave1 cwinfocenter]# mysqlslap --concurrency=300,300,300,400,500 --number-of-queries=6000 --iterations=1 --create-schema=chinaweather_infocenter -h172.16.80.71 -P3307 -uroot -p111111 --query=test4.sql
Benchmark
Average number of seconds to run all queries: 2.613 seconds
Minimum number of seconds to run all queries: 2.613 seconds
Maximum number of seconds to run all queries: 2.613 seconds
Number of clients running queries: 300
Average number of queries per client: 20

Benchmark
Average number of seconds to run all queries: 2.677 seconds
Minimum number of seconds to run all queries: 2.677 seconds
Maximum number of seconds to run all queries: 2.677 seconds
Number of clients running queries: 300
Average number of queries per client: 20

Benchmark
Average number of seconds to run all queries: 2.689 seconds
Minimum number of seconds to run all queries: 2.689 seconds
Maximum number of seconds to run all queries: 2.689 seconds
Number of clients running queries: 300
Average number of queries per client: 20

Benchmark
Average number of seconds to run all queries: 2.906 seconds
Minimum number of seconds to run all queries: 2.906 seconds
Maximum number of seconds to run all queries: 2.906 seconds
Number of clients running queries: 400
Average number of queries per client: 15

At a concurrency of 500, mysqlslap never returned.
[root@db_slave1 cwinfocenter]# ps -eLf | grep mysqlslap >/tmp/ps-slap

About 93 threads had not returned. Tracing one of them with pstack:
[root@db_slave1 cwinfocenter]# pstack 23085
Thread 1 (process 23085):
#0 0x0000003259e0e54d in read () from /lib64/libpthread.so.0
#1 0x000000000042a002 in vio_read_buff ()
#2 0x000000000041a659 in my_real_read(st_net*, unsigned long*) ()
#3 0x000000000041aa34 in my_net_read ()
#4 0x000000000041498a in cli_safe_read ()
#5 0x0000000000416938 in mysql_real_connect ()
#6 0x0000000000408a0d in slap_connect ()
#7 0x000000000040c5b6 in run_task ()
#8 0x0000003259e07851 in start_thread () from /lib64/libpthread.so.0
#9 0x0000003259ae767d in clone () from /lib64/libc.so.6

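The stack shows mysqlslap blocked in read() inside mysql_real_connect, that is, still waiting for the server's reply while connecting, so the connection packets were being lost.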
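
One way to check this kind of diagnosis on the middleware host is to look at the kernel's listen-queue counters. The sketch below is not part of the original investigation: `tcpext_counter` is a hypothetical helper name, and the sample numbers are made up for illustration. It assumes a Linux kernel that exposes `ListenOverflows` and `ListenDrops` in the `TcpExt` section of /proc/net/netstat.

```shell
#!/bin/sh
# Hypothetical helper: print one TcpExt counter from text in
# /proc/net/netstat format (a header line of names, then a line of values).
tcpext_counter() {
    awk -v name="$1" '
        # First TcpExt line: find the column that holds the counter name.
        $1 == "TcpExt:" && !seen {
            for (i = 2; i <= NF; i++) if ($i == name) col = i
            seen = 1
            next
        }
        # Second TcpExt line: print the value in the same column.
        $1 == "TcpExt:" && seen { if (col) print $col; exit }
    '
}

# Illustrative input only; these values are invented.
printf 'TcpExt: SyncookiesSent ListenOverflows ListenDrops\nTcpExt: 0 7 9\n' |
    tcpext_counter ListenOverflows

# On a live Linux host, read the real counters if the file is available.
if [ -r /proc/net/netstat ]; then
    tcpext_counter ListenOverflows < /proc/net/netstat
    tcpext_counter ListenDrops < /proc/net/netstat
fi
```

If these counters grow while mysqlslap is running, the accept queue of a listening socket is probably overflowing, which would be consistent with clients stuck while connecting.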

I changed the OS settings on the middleware host, raising the file descriptor limit and the backlog:
ulimit -n 10240
echo 20480 > /proc/sys/net/ipv4/tcp_max_syn_backlog

Retesting showed the problem was still there. After some searching on Google, I found one more parameter that needed adjusting:

echo 20480 > /proc/sys/net/core/somaxconn
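
Values written with echo into /proc are lost on reboot. As a minimal sketch of how the same settings might be made persistent, the hypothetical helper `sysctl_lines` below prints the matching sysctl.conf entries; the use of /etc/sysctl.conf is an assumption about the host's setup, not something the original test did.

```shell
#!/bin/sh
# Hypothetical helper: print sysctl.conf entries for the two parameters
# tuned above. $1 = somaxconn value, $2 = tcp_max_syn_backlog value.
sysctl_lines() {
    printf 'net.core.somaxconn = %s\n' "$1"
    printf 'net.ipv4.tcp_max_syn_backlog = %s\n' "$2"
}

# Show the entries for the values used in this post.
sysctl_lines 20480 20480

# To persist them (as root, assuming /etc/sysctl.conf is in use):
#   sysctl_lines 20480 20480 >> /etc/sysctl.conf && sysctl -p
```

Note that somaxconn is applied when a program calls listen(), so a middleware process that is already listening would need to be restarted before a new value takes effect.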

The reason (quoted from the web):

The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests.

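Note the sentence above: backlog now refers to sockets that are fully connected but not yet accept()ed, not sockets still in the SYN state. I usually set it to around 64. So the parameter to focus on is probably /proc/sys/net/core/somaxconn rather than tcp_max_syn_backlog; the latter should be useful for some firewalls (SYN flood, i.e. half-open connection attacks).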

The maximum length of the queue for incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog. When syncookies are enabled there is no logical maximum length and this setting is ignored. See tcp(7) for more information.

If the backlog argument is greater than the value in /proc/sys/net/core/somaxconn, then it is silently truncated to that value; the default value in this file is 128. In kernels before 2.4.25, this limit was a hard coded value, SOMAXCONN, with the value 128.
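
The truncation rule quoted above amounts to taking the smaller of the two values. A small sketch, where `effective_backlog` is a hypothetical helper name and 1024 is an assumed listen() backlog (the post does not say what the middleware actually passes):

```shell
#!/bin/sh
# Hypothetical helper: the accept-queue limit the kernel ends up using is
# min(backlog argument of listen(), /proc/sys/net/core/somaxconn).
effective_backlog() {
    requested=$1   # backlog the application passes to listen() (assumed)
    somaxconn=$2   # current value of net.core.somaxconn
    if [ "$requested" -gt "$somaxconn" ]; then
        echo "$somaxconn"
    else
        echo "$requested"
    fi
}

effective_backlog 1024 128      # default somaxconn: truncated to 128
effective_backlog 1024 20480    # after raising somaxconn: stays 1024

# With the live value, if this is a Linux host:
if [ -r /proc/sys/net/core/somaxconn ]; then
    effective_backlog 1024 "$(cat /proc/sys/net/core/somaxconn)"
fi
```

This is why raising tcp_max_syn_backlog alone did not help: with the default somaxconn of 128, any larger backlog requested by the application is silently cut down to 128.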

After changing somaxconn, the test no longer showed packet loss.
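
To confirm that a new limit is really in effect, one option is to inspect the listening socket with `ss -lnt`. On reasonably recent kernels, for sockets in the LISTEN state, Recv-Q shows the current accept-queue length and Send-Q shows its limit. The sketch below assumes that column layout; `listen_queue` is a hypothetical helper, and the sample line is illustrative rather than captured output.

```shell
#!/bin/sh
# Hypothetical helper: from `ss -lnt`-style text on stdin, print
# "Recv-Q Send-Q" for the listening socket on the given port.
listen_queue() {
    awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") { print $2, $3 }'
}

# Illustrative input only (port 3307 is the middleware port used above).
printf 'LISTEN 0 128 *:3307 *:*\n' | listen_queue 3307

# Against a live system:
#   ss -lnt | listen_queue 3307
```

A Send-Q that still shows 128 after raising somaxconn would suggest the listening process has not been restarted, or that it passes a small backlog to listen() itself.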

When reposting, please credit Gao Xiaoxin's blog.