I. Test Environment
Item | Value |
CPU | Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz |
Operating system | CentOS Linux release 7.9.2009 (Core) |
Memory | 3 GB |
Logical cores | 2 |
GBase 8a version | 8.6.2-R43 |
II. Reproducing the Problem
1. Check the operating system parameters
[root@czg2 ~]# cat /proc/sys/net/ipv4/ip_local_port_range
60996 61000
[root@czg2 ~]# cat /proc/sys/net/ipv4/tcp_fin_timeout
60
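Parameter | Description |
net.ipv4.ip_local_port_range | The range of local ports used for outgoing connections, which determines how many ports a single client IP has available. |
net.ipv4.tcp_fin_timeout | When the local end initiates the close, how long the socket is kept in the FIN-WAIT-2 state. In other words, after the client closes the connection, the socket stays in a waiting state for a while before it is released. |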
net.ipv4.ip_local_port_range was deliberately set this small in order to reproduce the symptom: 60996 through 61000 gives only five usable ports.
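The port count can be double-checked with a few lines of Python. This is a minimal sketch: the helper name `ephemeral_port_count` is made up for illustration, and the calculation ignores any ports excluded through net.ipv4.ip_local_reserved_ports.

```python
from pathlib import Path

# Kernel file that holds the range as "low<TAB>high".
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def ephemeral_port_count(text: str) -> int:
    """Return how many ports a 'low high' range contains (both ends inclusive)."""
    low, high = (int(field) for field in text.split())
    if low > high:
        raise ValueError(f"invalid port range: {text!r}")
    return high - low + 1


if __name__ == "__main__":
    # The value used in this experiment.
    print(ephemeral_port_count("60996\t61000"))
    # The value currently active on this machine, if the file exists.
    if PORT_RANGE_FILE.exists():
        print(ephemeral_port_count(PORT_RANGE_FILE.read_text()))
```

Both ends of the range are usable, which is why the count is high minus low plus one.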
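2. Run SQL with gccli
The query is run six times here to reproduce the error: with only five ports available, the sixth run fails.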
[gbase@czg2 ~]$ gccli -e 'select * from czg.sun limit 10;'
+------+-----------------+-------+
| a | CURRENCY_TYPE_5 | moon2 |
+------+-----------------+-------+
| 光 | B | NULL |
+------+-----------------+-------+
[gbase@czg2 ~]$ gccli -e 'select * from czg.sun limit 10;'
+------+-----------------+-------+
| a | CURRENCY_TYPE_5 | moon2 |
+------+-----------------+-------+
| 光 | B | NULL |
+------+-----------------+-------+
[gbase@czg2 ~]$ gccli -e 'select * from czg.sun limit 10;'
+------+-----------------+-------+
| a | CURRENCY_TYPE_5 | moon2 |
+------+-----------------+-------+
| 光 | B | NULL |
+------+-----------------+-------+
[gbase@czg2 ~]$ gccli -e 'select * from czg.sun limit 10;'
+------+-----------------+-------+
| a | CURRENCY_TYPE_5 | moon2 |
+------+-----------------+-------+
| 光 | B | NULL |
+------+-----------------+-------+
[gbase@czg2 ~]$ gccli -e 'select * from czg.sun limit 10;'
+------+-----------------+-------+
| a | CURRENCY_TYPE_5 | moon2 |
+------+-----------------+-------+
| 光 | B | NULL |
+------+-----------------+-------+
[gbase@czg2 ~]$ gccli -e 'select * from czg.sun limit 10;'
ERROR 1708 (HY000) at line 1: [192.168.142.12:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: Can't connect to GBase server on '::ffff:192.168.142.12' (99)
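The repeated runs above can also be scripted. The sketch below is one way to do it, under the assumption (not confirmed by this test) that gccli exits with a non-zero status when the query fails; the function name `first_failing_run` is illustrative.

```python
import subprocess
import sys
from typing import List, Optional


def first_failing_run(command: List[str], attempts: int) -> Optional[int]:
    """Run `command` up to `attempts` times.

    Returns the 1-based number of the first run that exits non-zero,
    or None if every run succeeds.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"run {attempt} failed: {result.stderr.strip()}")
            return attempt
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python repro.py gccli -e 'select * from czg.sun limit 10;'
    print(first_failing_run(sys.argv[1:], attempts=6))
```

With the five-port range configured above, the expectation would be a failure on the sixth run, provided the earlier connections are still in TIME-WAIT.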
3. View socket statistics
[root@czg2 ~]# ss -an |grep :5050
tcp LISTEN 0 32767 [::]:5050 [::]:*
tcp TIME-WAIT 0 0 [::ffff:192.168.142.12]:60999 [::ffff:192.168.142.12]:5050
tcp TIME-WAIT 0 0 [::ffff:192.168.142.12]:60996 [::ffff:192.168.142.12]:5050
tcp TIME-WAIT 0 0 [::ffff:192.168.142.12]:60998 [::ffff:192.168.142.12]:5050
tcp TIME-WAIT 0 0 [::ffff:192.168.142.12]:60997 [::ffff:192.168.142.12]:5050
tcp TIME-WAIT 0 0 [::ffff:192.168.142.12]:61000 [::ffff:192.168.142.12]:5050
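LISTEN: passive open; the socket is waiting for incoming connections.
ESTAB: the connection is established and data can be transferred.
TIME-WAIT: the side that closed the connection first waits for a period before the socket and its local port are released.
If any of the three explanations above is inaccurate, corrections are welcome.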
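Five sockets are indeed in TIME-WAIT, one for each port in the range. This suggests an approach for troubleshooting the problem: check the kernel parameter net.ipv4.ip_local_port_range, then use ss to count the sockets in TIME-WAIT and see whether the two numbers are close. If they are, a reasonable first diagnosis is that the application connects to the server frequently and closes each connection shortly afterwards. That leaves too many sockets in TIME_WAIT, uses up the available port numbers, and leaves new connections with no port to bind to.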
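The comparison described above can be sketched in Python. The code assumes `ss -an` style lines with the columns shown in the output above (Netid, State, Recv-Q, Send-Q, Local, Peer). The function names and the 0.9 threshold are illustrative choices, not established rules.

```python
from collections import Counter


def count_states(ss_output: str, server_port: int) -> Counter:
    """Count TCP sockets per state whose peer address ends in :server_port."""
    counts: Counter = Counter()
    suffix = f":{server_port}"
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "tcp":
            continue  # skip headers, blank lines, and non-TCP sockets
        state, peer = fields[1], fields[5]
        if peer.endswith(suffix):
            counts[state] += 1
    return counts


def ports_nearly_exhausted(time_wait: int, port_count: int, ratio: float = 0.9) -> bool:
    """True when TIME-WAIT sockets occupy at least `ratio` of the port range."""
    return time_wait >= ratio * port_count
```

Fed the ss output from this experiment, `count_states(text, 5050)["TIME-WAIT"]` is 5, and `ports_nearly_exhausted(5, 5)` is true. The LISTEN line is not counted because its peer column is `[::]:*`.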
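III. Solution
The version I tested reports Can't connect to GBase server; some versions report Cannot assign requested address instead. The (99) at the end of my error message is the Linux error number for EADDRNOTAVAIL, whose message text is Cannot assign requested address, so the two messages describe the same failure. Either one may be caused by the port exhaustion analyzed above.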
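The number in parentheses at the end of the error message can be looked up with Python's standard library. This is a small sketch; `describe_errno` is an illustrative helper name, and the numeric values of error codes are platform-specific.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an operating system error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # On Linux, EADDRNOTAVAIL has the value 99 seen in the error message.
    print(errno.EADDRNOTAVAIL, describe_errno(errno.EADDRNOTAVAIL))
```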
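1. Edit /etc/sysctl.conf
Set net.ipv4.ip_local_port_range = 30000 61000 and net.ipv4.tcp_fin_timeout = 30. Widening the port range is the change that relieves the exhaustion. tcp_fin_timeout shortens only the FIN-WAIT-2 state; the TIME-WAIT interval itself is fixed at 60 seconds in the Linux kernel.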
[root@czg2 gcinstall]# cat /etc/sysctl.conf
kernel.core_uses_pid = 1
net.core.netdev_max_backlog = 262144
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_default = 8388608
net.core.wmem_max = 16777216
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_sack = 1
net.ipv4.ip_local_reserved_ports = 5050,5258,5288,6666,6268
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096 16384 4194304
vm.vfs_cache_pressure = 1024
vm.swappiness = 1
vm.overcommit_memory = 0
vm.zone_reclaim_mode = 0
vm.max_map_count = 188529
net.core.somaxconn = 32767
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_max_orphans = 3276800
net.ipv4.tcp_max_tw_buckets = 20000
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
#vm.min_free_kbytes = 4096 #Commented out by gcluster
vm.min_free_kbytes = 301645
net.ipv4.ip_local_port_range = 30000 61000
2. Apply the kernel parameters
[root@czg2 gcinstall]# sysctl -p
kernel.core_uses_pid = 1
net.core.netdev_max_backlog = 262144
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_default = 8388608
net.core.wmem_max = 16777216
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_sack = 1
net.ipv4.ip_local_reserved_ports = 5050,5258,5288,6666,6268
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096 16384 4194304
vm.vfs_cache_pressure = 1024
vm.swappiness = 1
vm.overcommit_memory = 0
vm.zone_reclaim_mode = 0
vm.max_map_count = 188529
net.core.somaxconn = 32767
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_max_orphans = 3276800
net.ipv4.tcp_max_tw_buckets = 20000
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
vm.min_free_kbytes = 301645
net.ipv4.ip_local_port_range = 30000 61000
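To confirm that the running kernel matches the file, the settings can be compared against /proc/sys. The sketch below is a simplified check with illustrative helper names. It maps a key to a path by replacing dots with slashes, which works for the keys shown here but not for keys that embed dotted interface names.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and # or ; comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Collapse whitespace so "30000   61000" and "30000\t61000" compare equal.
        settings[key.strip()] = " ".join(value.split())
    return settings


def mismatches(
    settings: Dict[str, str], proc_root: Path = Path("/proc/sys")
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (wanted, actual)} for settings that differ from the live value."""
    diff: Dict[str, Tuple[str, Optional[str]]] = {}
    for key, wanted in settings.items():
        try:
            actual: Optional[str] = " ".join(
                (proc_root / key.replace(".", "/")).read_text().split()
            )
        except OSError:
            actual = None  # parameter missing or unreadable on this kernel
        if actual != wanted:
            diff[key] = (wanted, actual)
    return diff


if __name__ == "__main__":
    conf = Path("/etc/sysctl.conf")
    if conf.exists():
        print(mismatches(parse_sysctl_conf(conf.read_text())))
```

An empty result means every parameter in the file is active. A later line for the same key overrides an earlier one, as with vm.min_free_kbytes above, where the first occurrence is commented out.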