While working with Storm today, I needed to start the ZooKeeper cluster it depends on, so I started ZooKeeper with the command bin/zkServer.sh start:
[root@master bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@master bin]# jps
2936 Jps
2924 QuorumPeerMain
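To check whether ZooKeeper had started on the master node, I ran jps to look for the process. As shown above, the QuorumPeerMain process was running. I then wanted to see the state of ZooKeeper on the master node, whether it was a leader or a follower, to confirm that the node had really come up. I ran bin/zkServer.sh status, and it unexpectedly reported an error: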
[root@master bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
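So I went to the bin directory, where zkServer.sh had been launched and had therefore written its output, and looked at the zookeeper.out log: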
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:614)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:913)
2018-11-02 09:42:48,188 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer$QuorumServer@167] - Resolved hostname: 192.168.83.133 to address: /192.168.83.133
2018-11-02 09:42:48,188 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@852] - Notification time out: 60000
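The error was "Connection refused". I went through each node's configuration and confirmed it was all correct. Some posts online suggested that duplicate myid values could be the cause, so I checked the id setting on each node.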
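The myid files in each node's data directory turned out to be unique, which ruled out duplicate myid values as the cause. I then started ZooKeeper on the second and third nodes and checked the running processes and the cluster status on each.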
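The myid check described above can be scripted rather than done by hand. The following is only a minimal sketch: the function name find_duplicate_myids, the hostnames node1 to node3, and the data directory path in the usage comment are assumptions for illustration, not values from this cluster.

```shell
# Sketch: detect duplicate ZooKeeper myid values.
# Input (stdin): one "host id" pair per line.
# Output: each duplicated id, one per line (nothing if all ids are unique).
find_duplicate_myids() {
  awk '{ print $2 }' | sort | uniq -d
}

# Hypothetical usage; node1..node3 and /usr/zookeeper/data are assumed names,
# and passwordless ssh between the nodes is assumed as well:
# for h in node1 node2 node3; do
#   echo "$h $(ssh "$h" cat /usr/zookeeper/data/myid)"
# done | find_duplicate_myids
```

If the function prints nothing, every node has its own id, which matches what I found here.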
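Both slave nodes reported their status normally. That exposed the real problem: when I ran bin/zkServer.sh status on the master node, only the master's ZooKeeper process was running. The processes on the other nodes had not been started, so the master could not communicate with the slave nodes. The "Connection refused" entries in the log were the master trying, and failing, to reach its peers for leader election, as the FastLeaderElection.lookForLeader frame in the stack trace shows. A ZooKeeper cluster relies on an election to decide which node is the leader and which are followers. If the nodes cannot talk to each other, no leader or follower can be elected, and the status command has no state to report. In summary: in a cluster configuration, a node started on its own cannot report its status. It can only do so once enough of the other nodes are up to form a majority.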
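The majority rule behind the election is simple arithmetic, sketched below. The function names quorum_size and has_quorum are made up for this illustration and are not part of ZooKeeper.

```shell
# Smallest number of running servers that forms a majority of an ensemble.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit status 0) when `running` servers out of `total` form a majority.
has_quorum() {
  local running=$1 total=$2
  [ "$running" -ge "$(quorum_size "$total")" ]
}

# Example for the three-node cluster in this post:
# has_quorum 1 3 && echo "can elect" || echo "cannot elect"
# has_quorum 2 3 && echo "can elect" || echo "cannot elect"
```

For three nodes the majority is two, so the master alone could never finish an election, while starting just one more node would already have been enough.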
Once the other two nodes were started properly, status on the master showed:
[root@master bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
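To avoid checking status before the rest of the cluster is up, the start and status commands can be run across all nodes in one go. This is a rough sketch under several assumptions: the helper name zk_all is invented here, slave1 and slave2 are placeholder hostnames, every node uses the same install path as the master, and passwordless ssh with java on the remote PATH is available.

```shell
# Run a zkServer.sh subcommand (start, status, stop) on every listed host.
# ZK_HOME defaults to the install path seen in the transcripts in this post;
# it is assumed to be the same on every node.
zk_all() {
  local action=$1; shift
  local zk_home=${ZK_HOME:-/usr/zookeeper/zookeeper-3.4.10}
  local host
  for host in "$@"; do
    echo "== $host: $action =="
    ssh "$host" "$zk_home/bin/zkServer.sh $action"
  done
}

# Hypothetical usage (slave1 and slave2 are assumed hostnames):
# zk_all start master slave1 slave2
# sleep 5   # give the election a moment to finish
# zk_all status master slave1 slave2
```

Run this way, status is only queried after every node has been asked to start, so each node should report either leader or follower.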
With that, the problem was solved.