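How to Locate and Fix a Problem When One Occurs

Preface

Running into errors and problems is nothing to be afraid of. What matters is whether you can quickly track down the place where things actually went wrong and then fix it.

Here is an example.

Start three servers: hadoop001, hadoop002, and hadoop003.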

[root@hadoop001 ~]# jps
1479 Jps

Right after a server boots, hardly any Java processes are running (jps lists only itself).

Now we want to start the ZooKeeper cluster.

[hadoop@hadoop001 bin]$ ./zkServer.sh start
JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... already running as process 1503.
[hadoop@hadoop001 bin]$ jps
1514 Jps
[hadoop@hadoop001 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.

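The startup output above shows that ZooKeeper did not come up. So how do we locate the problem, and how do we fix it?

① Check the logs

(You can use tail -200f to view the last 200 lines of the log, or more, then copy them into a local text editor and read them carefully. In production you may prefer to download the whole log file and analyze it locally.)
Go to the directory /home/hadoop/app/zookeeper/bin (the directory where your own problem lives will not necessarily be the same).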

[hadoop@hadoop001 bin]$ tail -200f zookeeper.out

In the last 200 lines you can see:

2019-04-16 19:59:52,408 [myid:1] - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@849] - Notification time out: 60000
2019-04-16 20:00:52,409 [myid:1] - WARN  [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address hadoop002/172.19.12.133:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2019-04-16 20:00:52,410 [myid:1] - WARN  [QuorumPeer[myid=1]/0.0.0.0:2181:QuorumCnxManager@382] - Cannot open channel to 3 at election address hadoop003/172.19.12.135:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2019-04-16 20:00:52,410 [myid:1] - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@849] - Notification time out: 60000

The log above clearly contains many entries like this:

 Cannot open channel to 2 at election address hadoop002/172.19.12.133:3888
java.net.ConnectException: Connection refused
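
When the log is long, it can help to count how often each peer shows up in these warnings instead of reading them one by one. The following is a minimal sketch; the function name count_refused_peers is hypothetical, and it assumes the message format shown in the log above.

```shell
# Hypothetical helper: summarize which peers the election could not reach.
# Usage: count_refused_peers /path/to/zookeeper.out
count_refused_peers() {
    log_file="$1"
    # Keep only the matching part of each warning, then count occurrences
    # of the last token (the "host/ip:port" election address).
    grep -o 'Cannot open channel to [0-9]* at election address [^ ]*' "$log_file" \
        | awk '{ count[$NF]++ } END { for (peer in count) print count[peer], peer }' \
        | sort -k2
}
```

Each output line is a count followed by the election address, so a peer that is refused on every election round stands out immediately.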

This shows that the connections between the three machines hadoop001, hadoop002, and hadoop003 are failing.
Try a ping:

[hadoop@hadoop001 bin]$ ping hadoop002
PING hadoop002 (172.19.12.133) 56(84) bytes of data.
64 bytes from hadoop002 (172.19.12.133): icmp_seq=1 ttl=64 time=0.220 ms
[hadoop@hadoop001 bin]$ ping hadoop003
PING hadoop003 (172.19.12.135) 56(84) bytes of data.
64 bytes from hadoop003 (172.19.12.135): icmp_seq=1 ttl=64 time=0.282 ms

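Both hosts answer the ping, so the machines themselves are reachable. Look again at "Cannot open channel to 2 at election address hadoop002/172.19.12.133:3888": the failure is in communication on port 3888, the election port.
(Searching online turns up many suggested causes: the firewall, myid, a restart, the server entries in zoo.cfg, and so on.)
In fact the cause is simple: ZooKeeper had not been started on the other two machines, so nothing was listening on port 3888 and the connection was refused. Running ./zkServer.sh start on those two machines fixes it.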
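
Since ping only proves that a host is reachable, not that a port is open, a direct TCP check narrows things down faster. The sketch below is one possible way to do it, assuming bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are available; the function names port_open and check_peers are hypothetical.

```shell
# Hypothetical helper: succeed only if a TCP connection to host:port opens.
# Assumes bash (for /dev/tcp) and coreutils `timeout` are installed.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Hypothetical helper: report the state of one port on several hosts.
# Usage: check_peers PORT HOST...
check_peers() {
    port="$1"; shift
    for host in "$@"; do
        if port_open "$host" "$port"; then
            echo "$host:$port is accepting connections"
        else
            echo "$host:$port is NOT accepting connections"
        fi
    done
}
```

For the cluster in this article the call would be check_peers 3888 hadoop002 hadoop003. A refused port on a host that answers ping suggests that the service is not running there, or that a firewall is rejecting the connection.
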
Check the status again:

[hadoop@hadoop001 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Mode: follower
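
With three nodes to check, reducing the status output to a single word makes the result easier to scan. This is a small sketch, not part of ZooKeeper: the function name zk_mode and the "not-running" placeholder are hypothetical, and it assumes the status output has the form shown above.

```shell
# Hypothetical helper: reduce `zkServer.sh status` output (read from stdin)
# to the value of its "Mode:" line, or "not-running" if there is none.
zk_mode() {
    awk -F': ' '/^Mode:/ { print $2; found = 1 }
                END { if (!found) print "not-running" }'
}
```

A typical call would be ./zkServer.sh status 2>&1 | zk_mode, run on each node in turn.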

② Use the shell script's -x debug mode to track down the cause

Edit the shell script that performs the startup and append -x to the end of its first line (the shebang line):

[hadoop@hadoop001 bin]$ vi zkServer.sh 
#!/usr/bin/bash -x
........

Then run the script and analyze the trace:

[hadoop@hadoop001 bin]$ ./zkServer.sh start
+ '[' x = x ']'
+ JMXLOCALONLY=false
+ '[' x = x ']'
+ echo 'JMX enabled by default'
JMX enabled by default
.......

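Each line beginning with + is a command that was executed, so you can walk through the script step by step. (The trace reveals no problem in this case, but it is useful at times.)
For example, the trace shows
+ _ZOO_DAEMON_OUT=./zookeeper.out
which is the path of the log file: it is written to the directory from which the command was run. To change where the log file goes, this is the place to do it.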
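
The same trace can be obtained without editing the script, by passing -x to bash on the command line. The sketch below assumes the script is meant to run under bash; the function name trace_script and the trace file path are hypothetical.

```shell
# Hypothetical helper: run a script under `bash -x` without modifying it.
# bash writes the trace to stderr, so stderr is redirected to a file;
# note that the script's own error messages end up in that file as well.
# Usage: trace_script TRACE_FILE SCRIPT [ARGS...]
trace_script() {
    trace_file="$1"; shift
    bash -x "$@" 2> "$trace_file"
}
```

For example, trace_script /tmp/zk.trace ./zkServer.sh start keeps the normal output on the terminal and leaves the + lines in /tmp/zk.trace for later reading.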

More to be added later.
