I deployed three ZooKeeper containers (version 3.5.7) with Docker on a virtual machine.
ZooKeeper cluster
-
Create the zookeeper1 container:
docker run --name zookeeper1 -d \
  --network app-tier \
  -e ALLOW_ANONYMOUS_LOGIN=yes \
  -e ZOO_SERVER_ID=1 \
  -e ZOO_SERVERS=0.0.0.0:2881:3881,zookeeper2:2882:3882,zookeeper3:2883:3883 \
  -p 2181:2181 \
  -p 2881:2888 \
  -p 3881:3888 \
  bitnami/zookeeper:latest
-
Create the zookeeper2 container:
docker run --name zookeeper2 -d \
  --network app-tier \
  -e ALLOW_ANONYMOUS_LOGIN=yes \
  -e ZOO_SERVER_ID=2 \
  -e ZOO_SERVERS=zookeeper1:2881:3881,0.0.0.0:2882:3882,zookeeper3:2883:3883 \
  -p 2182:2181 \
  -p 2882:2888 \
  -p 3882:3888 \
  bitnami/zookeeper:latest
-
Create the zookeeper3 container:
docker run --name zookeeper3 -d \
  --network app-tier \
  -e ALLOW_ANONYMOUS_LOGIN=yes \
  -e ZOO_SERVER_ID=3 \
  -e ZOO_SERVERS=zookeeper1:2881:3881,zookeeper2:2882:3882,0.0.0.0:2883:3883 \
  -p 2183:2181 \
  -p 2883:2888 \
  -p 3883:3888 \
  bitnami/zookeeper:latest
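The three commands above differ only in the server ID, the published host ports, and which ZOO_SERVERS entry is replaced by 0.0.0.0. As a minimal sketch, that pattern can be generated in a loop. The helper names zoo_servers_for and start_cluster are hypothetical, and the step that creates the app-tier network is an assumption, since the commands above presume the network already exists. The 288N/388N port scheme only works for up to nine nodes.

```shell
#!/bin/sh
# Hypothetical helper: print the ZOO_SERVERS value for node $1 in a cluster of $2 nodes.
# The node's own entry uses 0.0.0.0; every peer is addressed by its container name.
zoo_servers_for() {
    self=$1
    total=$2
    out=""
    i=1
    while [ "$i" -le "$total" ]; do
        if [ "$i" -eq "$self" ]; then
            host="0.0.0.0"
        else
            host="zookeeper${i}"
        fi
        entry="${host}:288${i}:388${i}"
        if [ -z "$out" ]; then
            out="$entry"
        else
            out="${out},${entry}"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# Hypothetical helper: start $1 containers with the same flags as the manual commands.
start_cluster() {
    total=$1
    # Assumption: create the user-defined network only if it is missing.
    docker network inspect app-tier >/dev/null 2>&1 || docker network create app-tier
    n=1
    while [ "$n" -le "$total" ]; do
        docker run --name "zookeeper${n}" -d \
            --network app-tier \
            -e ALLOW_ANONYMOUS_LOGIN=yes \
            -e ZOO_SERVER_ID="$n" \
            -e ZOO_SERVERS="$(zoo_servers_for "$n" "$total")" \
            -p "218${n}:2181" \
            -p "288${n}:2888" \
            -p "388${n}:3888" \
            bitnami/zookeeper:latest
        n=$((n + 1))
    done
}
```

Running `start_cluster 3` after sourcing these functions should issue the same three docker run commands as above.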
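Once the containers are created, enter one with
docker exec -it zookeeper1 /bin/bash
and check the ZooKeeper status from inside it.
Across several attempts, it was always the ZooKeeper instance in the container with ZOO_SERVER_ID=1 that failed to join the cluster. Checking its status showed: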
Error contacting service. It is probably not running.
At the same time, the log on that node showed:
2020-02-24 02:46:17,787 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (2, 1)
2020-02-24 02:46:17,788 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 1)
2020-02-24 02:46:17,789 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@679] - Notification: 2 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2020-02-24 02:46:17,998 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (2, 1)
2020-02-24 02:46:18,002 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (3, 1)
2020-02-24 02:46:18,003 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):FastLeaderElection@919] - Notification time out: 400
2020-02-24 02:46:18,412 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumCnxManager@438] - Have smaller server identifier, so dropping the connection: (2, 1)
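Searching on Google showed that this is a known problem in ZooKeeper itself, and it had not been fixed at the time of writing. There is a workaround, though: first identify the leader node of the ZooKeeper cluster, then restart that leader node. The failing node can then rejoin the cluster.
Note: restarting the failing node does not solve the problem. The node you restart must be the leader.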
The original issue is at https://issues.apache.org/jira/browse/ZOOKEEPER-2938 (reachable as long as the site is not blocked on your network).
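Finding the leader by hand means checking the status of each container in turn. The following is a rough sketch of automating that step. It assumes that zkServer.sh is on the PATH inside the bitnami/zookeeper image and that `zkServer.sh status` prints a `Mode:` line for a serving node; the function names parse_mode and find_leader are hypothetical.

```shell
#!/bin/sh
# Read `zkServer.sh status` output on stdin and print the value of its "Mode:" line
# (for example "leader" or "follower"). Prints nothing if there is no such line.
parse_mode() {
    sed -n 's/^Mode: *//p'
}

# Print the name of the first container (from the arguments) that reports itself as leader.
# Returns non-zero if none of them does.
find_leader() {
    for c in "$@"; do
        # Assumption: zkServer.sh is on the PATH inside the container.
        mode=$(docker exec "$c" zkServer.sh status 2>/dev/null | parse_mode)
        if [ "$mode" = "leader" ]; then
            printf '%s\n' "$c"
            return 0
        fi
    done
    return 1
}

# Example (not run here):
#   leader=$(find_leader zookeeper1 zookeeper2 zookeeper3) && docker restart "$leader"
```

A node that prints "Error contacting service" yields no `Mode:` line, so it is simply skipped.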
Restarting a container under Docker is simple. My leader node was zookeeper3, so I ran
docker restart zookeeper3
and the problem was solved. The log on node 1 then showed it joining as a follower:
2020-02-24 03:05:32,993 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@679] - Notification: 2 (message format version), 2 (n.leader), 0x100000004 (n.zxid), 0x2 (n.round), LOOKING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2020-02-24 03:05:33,196 [myid:1] - INFO [QuorumPeer[myid=1](plain=0.0.0.0:2181)(secure=disabled):QuorumPeer@1251] - FOLLOWING
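To confirm that every node has rejoined, not just node 1, the status check can be repeated for all three containers. This is a small sketch under the same assumptions as before: zkServer.sh is on the PATH inside the image, and a serving node prints a `Mode:` line. The function name cluster_modes and the "not serving" placeholder are made up for illustration.

```shell
#!/bin/sh
# Print "<container>: <mode>" for each container named in the arguments.
# A node that reports no "Mode:" line is shown as "not serving".
cluster_modes() {
    for c in "$@"; do
        # Assumption: zkServer.sh is on the PATH inside the container.
        mode=$(docker exec "$c" zkServer.sh status 2>/dev/null | sed -n 's/^Mode: *//p')
        printf '%s: %s\n' "$c" "${mode:-not serving}"
    done
}

# Example (not run here):
#   cluster_modes zookeeper1 zookeeper2 zookeeper3
```

A healthy three-node cluster should list exactly one leader and two followers.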