HBase fully distributed mode, problem 3: a cluster node with missing hostname-to-IP mappings

Symptom: a short while after running start-hbase.sh, every HMaster and RegionServer process dies on its own.

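Because both the HMaster and HRegionServer processes died, I kept assuming the cause lay somewhere else, and I did not have the patience to read the logs. I wasted a lot of time on trial and error before noticing, by accident, that two of my nodes could not resolve the hostname of another node (hadoop.lsd4.com). That was the cause. Here is the log: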

2017-03-13 09:18:41,194 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181 sessionTimeout=90000 watcher=regionserver:160200x0, quorum=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181, baseZNode=/hbase

2017-03-13 09:18:41,195 WARN  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.RecoverableZooKeeper: Unable to create ZooKeeper Connection

java.net.UnknownHostException: hadoop.lsd4.com

at java.net.InetAddress.getAllByName0(InetAddress.java:1259)

at java.net.InetAddress.getAllByName(InetAddress.java:1171)

at java.net.InetAddress.getAllByName(InetAddress.java:1105)

at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)

at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)

at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1236)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1225)

at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1416)

at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1090)

at java.lang.Thread.run(Thread.java:745)

2017-03-13 09:18:42,206 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181 sessionTimeout=90000 watcher=regionserver:160200x0, quorum=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181, baseZNode=/hbase

2017-03-13 09:18:42,207 WARN  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.RecoverableZooKeeper: Unable to create ZooKeeper Connection

java.net.UnknownHostException: hadoop.lsd4.com

at java.net.InetAddress.getAllByName0(InetAddress.java:1259)

at java.net.InetAddress.getAllByName(InetAddress.java:1171)

at java.net.InetAddress.getAllByName(InetAddress.java:1105)

at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)

at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)

at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1236)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1225)

at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1416)

at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1090)

at java.lang.Thread.run(Thread.java:745)

2017-03-13 09:18:44,207 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181 sessionTimeout=90000 watcher=regionserver:160200x0, quorum=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181, baseZNode=/hbase

2017-03-13 09:18:44,208 WARN  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.RecoverableZooKeeper: Unable to create ZooKeeper Connection

java.net.UnknownHostException: hadoop.lsd4.com

at java.net.InetAddress.getAllByName0(InetAddress.java:1259)

at java.net.InetAddress.getAllByName(InetAddress.java:1171)

at java.net.InetAddress.getAllByName(InetAddress.java:1105)

at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)

at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)

at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1236)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1225)

at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1416)

at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1090)

at java.lang.Thread.run(Thread.java:745)

2017-03-13 09:18:48,208 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181 sessionTimeout=90000 watcher=regionserver:160200x0, quorum=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181, baseZNode=/hbase

2017-03-13 09:18:48,209 WARN  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.RecoverableZooKeeper: Unable to create ZooKeeper Connection

java.net.UnknownHostException: hadoop.lsd4.com: unknown error

at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)

at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:907)

at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)

at java.net.InetAddress.getAllByName0(InetAddress.java:1255)

at java.net.InetAddress.getAllByName(InetAddress.java:1171)

at java.net.InetAddress.getAllByName(InetAddress.java:1105)

at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)

at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)

at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1236)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1225)

at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1416)

at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1090)

at java.lang.Thread.run(Thread.java:745)

2017-03-13 09:18:56,210 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop.lsd1.com:2181,hadoop.lsd2.com:2181,hadoop.lsd3.com:2181,hadoop.lsd4.com:2181 sessionTimeout=90000 watcher=regionserver:160200x0, 

2017-03-13 09:18:56,211 ERROR [regionserver/hadoop.lsd2.com/192.168.56.12:16020] zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 4 attempts

2017-03-13 09:18:56,212 WARN  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] regionserver.HRegionServer: Failed deleting my ephemeral node

org.apache.zookeeper.KeeperException$OperationTimeoutException: KeeperErrorCode = OperationTimeout

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:144)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1236)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1225)

at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1416)

at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1090)

at java.lang.Thread.run(Thread.java:745)

2017-03-13 09:18:56,213 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] regionserver.HRegionServer: stopping server hadoop.lsd2.com,16020,1489411066501; zookeeper connection closed.

2017-03-13 09:18:56,213 INFO  [regionserver/hadoop.lsd2.com/192.168.56.12:16020] regionserver.HRegionServer: regionserver/hadoop.lsd2.com/192.168.56.12:16020 exiting

2017-03-13 09:18:56,225 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting

java.lang.RuntimeException: HRegionServer Aborted

at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:68)

at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)

at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2677)

[root@hadoop logs]# 

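Looking back, the problem is not hard to find if you read the log carefully. In my case, the /etc/hosts files on two nodes, hadoop.lsd2.com and hadoop.lsd3.com, had no mapping for hadoop.lsd4.com (I probably deleted it during an earlier experiment and never restored it), so those nodes could not resolve that hostname when they tried to communicate. Judging from the stack trace, the lookup fails while the ZooKeeper client is being constructed (StaticHostProvider.<init>), which is why a single unresolvable host in the connectString is enough to stop the connection from being created.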
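To make the "read the log carefully" step less tedious, here is a minimal sketch in Python that pulls the offending hostnames out of a log like the one above. The function name `unresolved_hosts` is my own invention for illustration, and the pattern only assumes the `java.net.UnknownHostException: <host>` line format visible in this log.

```python
import re

# Matches lines such as
#   java.net.UnknownHostException: hadoop.lsd4.com
#   java.net.UnknownHostException: hadoop.lsd4.com: unknown error
# and captures only the hostname (everything up to whitespace or a colon).
_UNKNOWN_HOST = re.compile(r"java\.net\.UnknownHostException:\s*([^\s:]+)")


def unresolved_hosts(log_text):
    """Return the sorted, de-duplicated hostnames named in UnknownHostException lines.

    Hypothetical helper for illustration; not part of HBase or ZooKeeper.
    """
    return sorted(set(_UNKNOWN_HOST.findall(log_text)))
```

Reading a RegionServer log file into a string and passing it to `unresolved_hosts` should list each hostname that the node failed to resolve, however many times the retry loop repeated the stack trace.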

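Fix: copy the /etc/hosts file from the node with the most complete set of hostname mappings to every other node, make sure every node's hostname can be resolved on every node, and then restart the cluster.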
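Before restarting, it may be worth checking each node's hosts file against the list of hostnames the cluster needs. The following is a rough sketch in Python, not an official tool: `mapped_names`, `missing_mappings`, and `resolves` are hypothetical helper names, and the parser assumes the usual /etc/hosts layout of an IP address followed by one or more names, with `#` starting a comment.

```python
import socket


def mapped_names(hosts_text):
    """Collect every hostname and alias that appears in /etc/hosts-style text."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()
        names.update(fields[1:])  # fields[0] is the IP address
    return names


def missing_mappings(hosts_text, required):
    """Return the required hostnames that have no entry in hosts_text, in order."""
    present = mapped_names(hosts_text)
    return [host for host in required if host not in present]


def resolves(hostname):
    """Return True if this machine can resolve hostname (hosts file or DNS)."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False
```

On each node, one could read `/etc/hosts` into a string and call `missing_mappings` with the four hostnames from the connectString in the log (hadoop.lsd1.com through hadoop.lsd4.com); an empty list means the file covers them all. `resolves` is a complementary check that asks the operating system's resolver directly, which also catches names that are served by DNS rather than the hosts file.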
