Troubleshooting frequent crashes of the phoenix-hbase service

Our monitoring system was recently rebuilt on Phoenix + HBase, and lately the monitoring in the demo environment kept breaking; a first look showed that HBase had gone down.
Going through the logs revealed that Hadoop HA failover had failed because CentOS 7.0 does not ship the fuser command by default, which took the Hadoop cluster down with it;
the NameNode itself went down because the ZooKeeper-related timeout was set too low.

The detailed troubleshooting process follows.

1. First, check the hbase-master log. It shows that HBase shut down because it could not connect to the Hadoop cluster.

2017-08-01 11:46:46,304 INFO  [master/hadoop171/172.16.31.171:60000] regionserver.HRegionServer: stopping server hadoop171,60000,1501497159832; zookeeper connection closed.
2017-08-01 11:46:46,304 INFO  [master/hadoop171/172.16.31.171:60000] regionserver.HRegionServer: master/hadoop171/172.16.31.171:60000 exiting
2017-08-01 11:46:46,307 ERROR [Thread-7] hdfs.DFSClient: Failed to close inode 63944
java.net.ConnectException: Call From hadoop171/172.16.31.171 to hadoop171:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
        at org.apache.hadoop.ipc.Client.call(Client.java:1475)
        at org.apache.hadoop.ipc.Client.call(Client.java:1408)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:404)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
        at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
        at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1704)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1500)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:668)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)

2. Next, check the hadoop-namenode log. It shows that the NameNode exited because of a ZooKeeper/quorum timeout.
The cluster is configured for HA, so normally when one NameNode goes down the other should take over.
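
The current active/standby roles can also be checked directly with hdfs haadmin. A minimal sketch; the service ids nn1/nn2 are assumptions, use the ids listed under dfs.ha.namenodes.<nameservice> in hdfs-site.xml:

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2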

2017-08-03 05:31:30,999 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for startLogSegment(562081). Succeeded so far: [172.16.31.171:8485]
2017-08-03 05:31:31,984 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: starting log segment 562081 failed for required journal (JournalAndStream(mgr=QJM to [172.16.31.171:8485, 172.16.31.172:8485, 172.16.31.173:8485], stream=null))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.startLogSegment(QuorumJournalManager.java:403)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:107)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet$3.apply(JournalSet.java:222)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:219)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:1206)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1175)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1249)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6422)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1003)
        at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
        at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
2017-08-03 05:31:31,987 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-08-03 05:31:31,996 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop171/172.16.31.171
************************************************************/

3. Next, examine the zkfc logs.
The zkfc log on hadoop171 shows that hadoop171 quit the master election; the relevant lines are below.

2017-08-03 05:31:32,700 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at hadoop171/172.16.31.171:9000: java.io.EOFException End of File Exception between local host is: "hadoop171/172.16.31.171"; destination host is: "hadoop171":9000; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException
2017-08-03 05:31:32,701 INFO org.apache.hadoop.ha.HealthMonitor: Entering state SERVICE_NOT_RESPONDING
2017-08-03 05:31:32,701 INFO org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at hadoop171/172.16.31.171:9000 entered state: SERVICE_NOT_RESPONDING
2017-08-03 05:31:32,704 WARN org.apache.hadoop.hdfs.tools.DFSZKFailoverController: Can't get local NN thread dump due to Connection refused
2017-08-03 05:31:32,704 INFO org.apache.hadoop.ha.ZKFailoverController: Quitting master election for NameNode at hadoop171/172.16.31.171:9000 and marking that fencing is necessary
2017-08-03 05:31:32,704 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
2017-08-03 05:31:32,756 INFO org.apache.zookeeper.ZooKeeper: Session: 0x35d9be43dc1019b closed
2017-08-03 05:31:32,756 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x35d9be43dc1019b
2017-08-03 05:31:32,756 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2017-08-03 05:31:34,758 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop171/172.16.31.171:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
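
As a side note, the election state that zkfc manages lives in ZooKeeper and can be inspected directly. A minimal sketch, assuming the default /hadoop-ha znode layout and a nameservice id of mycluster (the real id comes from dfs.nameservices); the "Old node exists: 0a03..." line in the hadoop172 log below corresponds to the contents of the ActiveBreadCrumb znode:

zkCli.sh -server hadoop171:2181
ls /hadoop-ha/mycluster
get /hadoop-ha/mycluster/ActiveStandbyElectorLock
get /hadoop-ha/mycluster/ActiveBreadCrumb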

The zkfc log on hadoop172 shows that the HA switchover first has to fence hadoop171.
During fencing it reported "fuser: command not found"; it turned out that CentOS 7 does not ship the fuser command, so the failover got stuck retrying in a loop. (A typical sshfence configuration is shown after the log excerpt for reference.)
Reference: http://f.dataguru.cn/hadoop-707120-1-1.html

2017-08-03 05:31:32,812 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2017-08-03 05:31:32,813 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0362656812036e6e311a096861646f6f7031373120a84628d33e
2017-08-03 05:31:32,816 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at hadoop171/172.16.31.171:9000
2017-08-03 05:31:33,822 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop171/172.16.31.171:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2017-08-03 05:31:33,921 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at hadoop171/172.16.31.171:9000 standby (unable to connect)
java.net.ConnectException: Call From hadoop172/172.16.31.172 to hadoop171:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
        at org.apache.hadoop.ipc.Client.call(Client.java:1475)
        at org.apache.hadoop.ipc.Client.call(Client.java:1408)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
        at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:511)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:502)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:60)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:888)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:909)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:808)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:417)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524)
        at org.apache.hadoop.ipc.Client.call(Client.java:1447)
        ... 14 more
2017-08-03 05:31:33,927 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2017-08-03 05:31:33,927 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2017-08-03 05:31:34,092 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to hadoop171...
2017-08-03 05:31:34,096 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop171 port 22
2017-08-03 05:31:34,104 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
2017-08-03 05:31:34,122 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_6.6.1
2017-08-03 05:31:34,122 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
2017-08-03 05:31:34,123 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
2017-08-03 05:31:35,373 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
2017-08-03 05:31:35,374 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
2017-08-03 05:31:35,374 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
2017-08-03 05:31:35,374 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
2017-08-03 05:31:35,374 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
2017-08-03 05:31:35,376 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
2017-08-03 05:31:35,376 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
2017-08-03 05:31:35,377 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
2017-08-03 05:31:35,377 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
2017-08-03 05:31:35,430 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
2017-08-03 05:31:35,431 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
2017-08-03 05:31:35,447 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
2017-08-03 05:31:35,450 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop171' (RSA) to the list of known hosts.
2017-08-03 05:31:35,451 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
2017-08-03 05:31:35,451 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
2017-08-03 05:31:35,456 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
2017-08-03 05:31:35,457 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
2017-08-03 05:31:35,459 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
2017-08-03 05:31:35,460 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
2017-08-03 05:31:35,468 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
2017-08-03 05:31:35,468 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
2017-08-03 05:31:35,628 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentication succeeded (publickey).
2017-08-03 05:31:35,629 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connected to hadoop171
2017-08-03 05:31:35,629 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Looking for process running on port 9000
2017-08-03 05:31:35,840 WARN org.apache.hadoop.ha.SshFenceByTcpPort: PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp 9000 via ssh: bash: fuser: command not found
2017-08-03 05:31:35,844 INFO org.apache.hadoop.ha.SshFenceByTcpPort: rc: 127
2017-08-03 05:31:35,844 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop171 port 22
2017-08-03 05:31:35,847 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2017-08-03 05:31:35,847 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2017-08-03 05:31:35,847 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed
2017-08-03 05:31:35,905 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at hadoop171/172.16.31.171:9000
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:530)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:502)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:60)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:888)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:909)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:808)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:417)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2017-08-03 05:31:35,906 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2017-08-03 05:31:35,967 INFO org.apache.zookeeper.ZooKeeper: Session: 0x35d9be43dc1019c closed
2017-08-03 05:31:36,968 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop171:2181,hadoop172:2181,hadoop173:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@6562a9e9
2017-08-03 05:31:36,973 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hadoop173/172.16.31.173:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-03 05:31:37,731 INFO org.apache.zookeeper.ClientCnxn: Socket connection established, initiating session, client: /172.16.31.172:52192, server: hadoop173/172.16.31.173:2181
2017-08-03 05:31:37,952 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop173/172.16.31.173:2181, sessionid = 0x35d9be43dc1021b, negotiated timeout = 5000
2017-08-03 05:31:37,955 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2017-08-03 05:31:37,956 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2017-08-03 05:31:38,047 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2017-08-03 05:31:38,054 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a0362656812036e6e311a096861646f6f7031373120a84628d33e
2017-08-03 05:31:38,056 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at hadoop171/172.16.31.171:9000
2017-08-03 05:31:39,061 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop171/172.16.31.171:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2017-08-03 05:31:39,064 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at hadoop171/172.16.31.171:9000 standby (unable to connect)
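
The fencing attempt above is driven by the sshfence fencer configured in hdfs-site.xml, which logs in to the old active NameNode over SSH and runs fuser -v -k -n tcp 9000 to kill whatever is listening on the NameNode RPC port. For reference, a typical configuration looks like the sketch below; the property names are the standard Hadoop ones, the private-key path is an assumption:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>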

4. At this point the cause is clear; the fix is to install the package that provides fuser.

Install fuser on both the active and standby NameNode hosts (the DataNode hosts do not need it). A quick verification follows the install commands:

[root@server101 ~]# yum -y install psmisc
[root@server102 ~]# yum -y install psmisc
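
After installing, it is worth confirming that the command the fencer runs is now available; note that fuser -v -n tcp only lists the owning process, it does not kill it:

[root@server101 ~]# which fuser
[root@server101 ~]# fuser -v -n tcp 9000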

There is also the timeout risk to address: raise the timeout from 20000 ms to 50000 ms. The candidate properties are sketched below.
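
The post does not name the exact property, so the following is a sketch of the candidates that match the values seen in the logs: the 20000 ms quorum-journal timeouts in hdfs-site.xml (the NameNode log shows timeout=20000 ms on startLogSegment), and the zkfc ZooKeeper session timeout in core-site.xml (the zkfc log negotiated only 5000 ms):

<!-- hdfs-site.xml -->
<property>
  <name>dfs.qjournal.start-segment.timeout.ms</name>
  <value>50000</value>
</property>
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>50000</value>
</property>

<!-- core-site.xml -->
<property>
  <name>ha.zookeeper.session-timeout.ms</name>
  <value>50000</value>
</property>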

5. A later, closer look at the logs showed that shortly before Hadoop ran into trouble, HBase had run a balance operation (a note on temporarily disabling the balancer follows the log excerpt).
http://openinx.github.io/2016/06/21/hbase-balance/

2017-08-03 05:30:04,345 TRACE [hadoop171,60000,1501568949531_ChoreService_2] access.AccessController: Access allowed for user hadoop; reason: Global check allowed; remote address: ; request: balance; context: (user=hadoop, scope=GLOBAL, action=ADMIN)
2017-08-03 05:30:04,349 DEBUG [htable-pool260-t1] ipc.RpcClientImpl: Use SIMPLE authentication for service ClientService, sasl=false
2017-08-03 05:30:04,349 DEBUG [htable-pool260-t1] ipc.RpcClientImpl: Connecting to hadoop172/172.16.31.172:60020
2017-08-03 05:30:05,994 DEBUG [hadoop171,60000,1501568949531_ChoreService_2] balancer.StochasticLoadBalancer: Finished computing new load balance plan.  Computation took 1646ms to try 73600 different iterations.  Found a solution that moves 16 regions; Going from a computed cost of 402.57591499759764 to a new cost of 87.37110284715531
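
If the balance run is suspected of adding load at the wrong moment, the HBase balancer can be switched off from the hbase shell while investigating and re-enabled once the cluster is healthy again; a minimal sketch:

hbase(main):001:0> balance_switch false
hbase(main):002:0> balance_switch true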

Done.
