1. The incident:
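One night, the zk client on one of our servers lost its connection to the zk server. Normally the program's automatic reconnect logic reconnects without trouble, but this time no reconnect attempt would stick.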
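For context on when the client decides the connection is lost: the ZooKeeper Java client (3.4.x, as far as I know) sets its read timeout to two thirds of the negotiated session timeout, and logs "Client session timed out" once it has heard nothing from the server for that long. The sketch below only reproduces that arithmetic for the numbers in this incident's client log (negotiated timeout = 10000, 7302ms of silence). The class and method names are made up for illustration, and it assumes the negotiated timeout was already 10000 ms before the disconnect.

```java
// Minimal sketch of the ZooKeeper client's read-timeout arithmetic.
// ZkTimeoutSketch and its methods are hypothetical names, not ZooKeeper API.
final class ZkTimeoutSketch {

    // The client derives its read timeout as 2/3 of the negotiated
    // session timeout (integer arithmetic).
    static int readTimeoutMs(int negotiatedSessionTimeoutMs) {
        return negotiatedSessionTimeoutMs * 2 / 3;
    }

    // The connection counts as timed out once the time since the last
    // response from the server reaches the read timeout.
    static boolean timedOut(int negotiatedSessionTimeoutMs, long idleRecvMs) {
        return readTimeoutMs(negotiatedSessionTimeoutMs) - idleRecvMs <= 0;
    }

    public static void main(String[] args) {
        int negotiated = 10000; // "negotiated timeout = 10000" in the log
        long idleRecv = 7302;   // "have not heard from server in 7302ms"

        System.out.println("read timeout (ms): " + readTimeoutMs(negotiated));
        System.out.println("timed out: " + timedOut(negotiated, idleRecv));
    }
}
```

With a 10000 ms session timeout the read timeout is 6666 ms, so 7302 ms without a response is enough for the client to close the socket and start reconnecting, even though the session itself has not yet expired on the server.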
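The client-side log is below. The client just keeps reconnecting and never stays connected. From the log it looks as if the server had gone down, yet the other clients were working fine, which means the server had not gone down.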
2017-10-07 23:59:44.466 [DEBUG] [org.apache.zookeeper.ClientCnxn:717] Got ping response for sessionid: 0x35e60a652b00001 after 0ms
2017-10-07 23:59:47.802 [DEBUG] [org.apache.zookeeper.ClientCnxn:717] Got ping response for sessionid: 0x35e60a652b00001 after 0ms
2017-10-07 23:59:55.104 [INFO] [org.apache.zookeeper.ClientCnxn:1096] Client session timed out, have not heard from server in 7302ms for sessionid 0x35e60a652b00001, closing socket connection and attempting reconnect
2017-10-07 23:59:55.219 [INFO] [org.apache.curator.framework.state.ConnectionStateManager:228] State change: SUSPENDED
2017-10-07 23:59:55.225 [INFO] [org.apache.zookeeper.ClientCnxn:975] Opening socket connection to server 10.200.151.145/10.200.151.145:2181. Will not attempt to authenticate using SASL (unknown error)
2017-10-07 23:59:55.226 [INFO] [org.apache.zookeeper.ClientCnxn:852] Socket connection established to 10.200.151.145/10.200.151.145:2181, initiating session
2017-10-07 23:59:55.227 [DEBUG] [org.apache.zookeeper.ClientCnxn:892] Session establishment request sent on 10.200.151.145/10.200.151.145:2181
2017-10-07 23:59:55.229 [INFO] [org.apache.zookeeper.ClientCnxn:1235] Session establishment complete on server 10.200.151.145/10.200.151.145:2181, sessionid = 0x35e60a652b00001, negotiated timeout = 10000
2017-10-07 23:59:55.237 [INFO] [org.apache.curator.framework.state.ConnectionStateManager:228] State change: RECONNECTED
2017-10-07 23:59:55.248 [DEBUG] [org.apache.zookeeper.ClientCnxn:733] Got auth sessionid:0x35e60a652b00001
2017-10-07 23:59:55.249 [INFO] [org.apache.zookeeper.ClientCnxn:1098] Unable to read additional data from server sessionid 0x35e60a652b00001, likely server has closed socket, closing socket connection and attempting reconnect
2017-10-07 23:59:55.832 [DEBUG] [org.apache.curator.RetryLoop:187] Retry-able exception received
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /root/project
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
at org.apache.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:226) ~[curator-framework-2.10.0.jar:na]
at org.apache.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:215) ~[curator-framework-2.10.0.jar:na]
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108) ~[curator-client-2.10.0.jar:na]
at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForegroundStandard(ExistsBuilderImpl.java:212) [curator-framework-2.10.0.jar:na]
at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:205) [curator-framework-2.10.0.jar:na]
at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:168) [curator-framework-2.10.0.jar:na]
at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:39) [curator-framew