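Reclaiming sockets stuck in an abnormal state (FIN_WAIT_2)
Symptom: when our front-end machine exchanged messages with a particular customer, transfers frequently stopped for no apparent reason. Checking with netstat showed more than 1000 connections to that customer, the vast majority of them in the FIN_WAIT_2 state.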
#netstat -an|grep 10.116.50.30
tcp 0 0 192.168.129.44.64306 10.116.50.30.53081 FIN_WAIT_2
tcp 0 0 192.168.129.44.49734 10.116.50.30.53660 FIN_WAIT_2
tcp 0 0 192.168.129.44.63611 10.116.50.30.57966 FIN_WAIT_2
tcp 0 0 192.168.129.44.63416 10.116.50.30.57946 FIN_WAIT_2
tcp 0 0 192.168.129.44.57835 10.116.50.30.49188 FIN_WAIT_2
tcp 0 0 192.168.129.44.57502 10.116.50.30.52615 ESTABLISHED
tcp 0 0 192.168.129.44.50387 10.116.50.30.58301 FIN_WAIT_2
tcp 0 0 192.168.129.44.53297 10.116.50.30.53943 FIN_WAIT_2
tcp 0 0 192.168.129.44.55202 10.116.50.30.54141 FIN_WAIT_2
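With over a thousand connections, a per-state tally is easier to read than the raw listing. The following is a minimal sketch in Python, assuming the six-column HP-UX style output shown above (address and port joined by a dot); `count_states` and `SAMPLE` are hypothetical names introduced only for illustration.

```python
from collections import Counter

# A few lines in the same format as the netstat output above.
SAMPLE = """\
tcp 0 0 192.168.129.44.64306 10.116.50.30.53081 FIN_WAIT_2
tcp 0 0 192.168.129.44.57502 10.116.50.30.52615 ESTABLISHED
tcp 0 0 192.168.129.44.50387 10.116.50.30.58301 FIN_WAIT_2
"""

def count_states(netstat_output, remote_ip):
    """Count TCP connections to remote_ip, grouped by connection state."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: proto, recv-q, send-q, local addr, remote addr, state.
        if len(fields) != 6 or fields[0] != "tcp":
            continue
        # The remote address is "a.b.c.d.port"; strip the trailing port.
        remote_host = fields[4].rsplit(".", 1)[0]
        if remote_host == remote_ip:
            counts[fields[5]] += 1
    return counts

if __name__ == "__main__":
    for state, n in count_states(SAMPLE, "10.116.50.30").most_common():
        print(state, n)
```

In practice the text would come from the output of `netstat -an` rather than from a hard-coded sample.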
For the TCP connection states, see the summary quoted below.
(Quoted) TCP states
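Every socket starts in the CLOSED state. When a client initiates a connection, it sends a SYN packet to the server and enters the SYN_SENT state.
The server receives the SYN and replies with a SYN-ACK. The client receives it, replies with an ACK, and moves to ESTABLISHED. If the client does not receive a SYN-ACK for a long time, it times out and returns to CLOSED.
When a server binds to a port and listens on it, the socket is in the LISTEN state. When a client tries to connect, the server receives a SYN, replies with a SYN-ACK, and moves to SYN_RCVD. When the client's ACK arrives, the server socket becomes ESTABLISHED.
A program in the ESTABLISHED state can close the connection in two ways: an active close or a passive close.
Active close: you send a FIN. When your program calls closesocket or shutdown (with the appropriate flag), it sends a FIN to the peer and your socket enters FIN_WAIT_1. The peer replies with an ACK and your socket enters FIN_WAIT_2. If the peer is also closing the connection, it sends a FIN to your machine; you reply with an ACK and move to TIME_WAIT. TIME_WAIT is also called the 2MSL wait state. MSL stands for Maximum Segment Lifetime, the longest time a segment can exist on the network before being discarded. Every IP packet carries a TTL (time to live); each router decrements the TTL by one before forwarding the packet, and the packet is discarded when the TTL reaches 0. A socket stays in TIME_WAIT for 2 MSL, which allows TCP to resend the final ACK if it is lost and the peer retransmits its FIN. Once the 2MSL wait is over, the socket enters CLOSED.
Passive close: when the program receives a FIN from the peer, it replies with an ACK and its socket enters CLOSE_WAIT. The peer has closed its sending side and will send no more data, but the program can still send. To close the connection, the program sends its own FIN to the peer, which moves its socket to LAST_ACK; when the ACK arrives from the peer, the socket enters CLOSED.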
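The close transitions described above can be written down as a small lookup table. This is a simplified sketch, not a full TCP state machine: it omits simultaneous close (CLOSING) and connection setup, and the event names (`close`, `recv_ack`, `recv_fin`, `timeout_2msl`) are labels invented for this illustration.

```python
# (current state, event) -> next state, for the two close paths only.
TRANSITIONS = {
    # Active close: we send the first FIN.
    ("ESTABLISHED", "close"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
    # Passive close: the peer sends the first FIN.
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "close"): "LAST_ACK",
    ("LAST_ACK", "recv_ack"): "CLOSED",
}

def run(state, events):
    """Apply events in order and return the resulting state."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

if __name__ == "__main__":
    # The peer ACKs our FIN but never sends its own: we stay in FIN_WAIT_2.
    print(run("ESTABLISHED", ["close", "recv_ack"]))
```

The table makes the problem in this article visible: the only way out of FIN_WAIT_2 is the peer's FIN, so a peer that never sends it leaves the socket there unless a timer intervenes.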
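TCP terminates a connection with a four-way handshake, as shown in the diagram below.
FIN_WAIT_1: after the client sends its FIN, its state becomes FIN_WAIT_1. When the server receives the client's FIN, it immediately sends an ACK and enters CLOSE_WAIT.
The essence of FIN_WAIT_2 is that only the first FIN/ACK exchange has completed: the client keeps waiting for the server's FIN, but the peer cannot send it because it is overloaded or the connection terminated abnormally. The official description is: Socket closed, waiting for shutdown from remote.
      Client                                                          Server
1  FIN_WAIT_1   ----------------------------FIN------------------------------>
2  FIN_WAIT_2   <---------------------------ACK-------------------------------   CLOSE_WAIT
3               <---------------------------FIN-------------------------------   LAST_ACK
4  TIME_WAIT    ----------------------------ACK------------------------------>   CLOSED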
Impact:
If too many FIN_WAIT_2 sessions build up, they can fill the space allocated for storing connection information and crash the kernel.
Resolution or workaround:
The right way to handle this problem is for the TCP/IP stack to have a FIN_WAIT_2 timer that shuts down sockets stuck in the FIN_WAIT_2 state.
How long those FIN_WAIT_2 sockets stay in that state depends on the "tcp_fin_wait_2_timeout" TCP/IP parameter. By default, HP-UX keeps FIN_WAIT_2 sockets around forever. To find out the current value, issue:
#ndd -get /dev/tcp tcp_fin_wait_2_timeout
To change this value to, say, 1 hour, issue:
#ndd -set /dev/tcp tcp_fin_wait_2_timeout 3600000
Changing the parameter with the above ndd command takes effect immediately, but the change is lost when the system is rebooted. To make the change permanent, you need to edit the /etc/rc.config.d/nddconf file. With "tcp_fin_wait_2_timeout" set to 1 hour, FIN_WAIT_2 sockets are closed after 1 hour.
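The value 3600000 used for one hour implies that the parameter is expressed in milliseconds. A small sketch of that arithmetic, assuming milliseconds as the unit; the helper names `fin_wait_2_timeout_ms` and `ndd_set_command` are hypothetical and only format the command string shown earlier, they do not run it.

```python
def fin_wait_2_timeout_ms(hours=0, minutes=0, seconds=0):
    """Convert a duration to milliseconds (the unit assumed for ndd here)."""
    return ((hours * 60 + minutes) * 60 + seconds) * 1000

def ndd_set_command(timeout_ms):
    """Build the ndd command line for a given timeout; does not execute it."""
    return f"ndd -set /dev/tcp tcp_fin_wait_2_timeout {timeout_ms}"

if __name__ == "__main__":
    print(ndd_set_command(fin_wait_2_timeout_ms(hours=1)))
    print(ndd_set_command(fin_wait_2_timeout_ms(minutes=10)))
```

A shorter timeout reclaims stuck sockets sooner but raises the risk of cutting off a slow yet healthy peer, which is the trade-off the following caution describes.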
The FIN_WAIT_2 timer must be used with caution because, while TCP is in the FIN_WAIT_2 state, the remote is still allowed to send data. In addition, if the remote TCP is healthy and would eventually terminate normally (it is neither hung nor terminating abnormally), the FIN_WAIT_2 timer may close the connection prematurely.
Data may be lost if the remote sends a window update or FIN after the local TCP has closed the connection. In this situation, the local TCP will send a RESET. According to the TCP protocol specification, the remote TCP should flush its receive queue when it receives the RESET. This may cause data to be lost.
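The caution above rests on the fact that a half-closed connection still carries data from the remote side. The sketch below illustrates this on the loopback interface with Python's standard socket module: the client sends its FIN with `shutdown(SHUT_WR)`, and the server, which has not yet closed, still delivers a reply. The function name `half_close_demo` is made up for this example, and it assumes loopback networking is available.

```python
import socket

def half_close_demo():
    """Return (bytes the server read, bytes the client read after its FIN)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        client.sendall(b"request")
        # Active close of the sending direction: the client sends its FIN.
        client.shutdown(socket.SHUT_WR)

        # The server reads until end-of-stream (the client's FIN).
        seen = b""
        while True:
            chunk = server.recv(1024)
            if not chunk:
                break
            seen += chunk

        # The server has not closed yet, so it can still send data.
        server.sendall(b"late reply")
        server.close()  # only now does the server send its own FIN

        reply = b""
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            reply += chunk
        return seen, reply
    finally:
        client.close()
        server.close()
        listener.close()

if __name__ == "__main__":
    print(half_close_demo())
```

If the local stack had torn the connection down before "late reply" arrived, that data would have been answered with a RESET instead of being delivered, which is the loss scenario described above.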