[Hadoop HA] 22/01/10 22:50:17 ERROR namenode.NameNode: Failed to start namenode.

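I ran into the errors below while setting up Hadoop HA.
The main symptom: every process appeared to be running normally, but when one NameNode went down, the other NameNode stayed in standby state instead of becoming active.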

22/01/10 22:50:17 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 2 successful responses:
192.168.3.73:8485: false
192.168.3.61:8485: false
1 exceptions thrown:
192.168.3.66:8485: Call From master/192.168.3.73 to slave2:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:286)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:233)
	at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:901)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:202)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1011)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1457)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1582)
2022-01-10 23:42:34,595 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2022-01-10 23:42:34,595 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2022-01-10 23:42:34,619 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to master...
2022-01-10 23:42:34,621 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to master port 22
2022-01-10 23:42:34,626 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
2022-01-10 23:42:34,647 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_7.4
2022-01-10 23:42:34,647 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.54
2022-01-10 23:42:34,647 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
2022-01-10 23:42:34,830 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckKexes: diffie-hellman-group14-sha1,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521
2022-01-10 23:42:34,882 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckSignatures: ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: ssh-rsa,rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: none,zlib@openssh.com
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server: none,zlib@openssh.com
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server:
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server:
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client:
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client:
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-sha1 none
2022-01-10 23:42:34,884 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-sha1 none
2022-01-10 23:42:34,889 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEX_ECDH_INIT sent
2022-01-10 23:42:34,889 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEX_ECDH_REPLY
2022-01-10 23:42:34,897 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
2022-01-10 23:42:34,899 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'master' (RSA) to the list of known hosts.
2022-01-10 23:42:34,902 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
2022-01-10 23:42:34,921 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
2022-01-10 23:42:40,928 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
2022-01-10 23:42:40,928 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
2022-01-10 23:42:40,999 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentication succeeded (publickey).
2022-01-10 23:42:41,000 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connected to master
2022-01-10 23:42:41,000 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Looking for process running on port 8020
2022-01-10 23:42:41,223 WARN org.apache.hadoop.ha.SshFenceByTcpPort: PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp 8020 via ssh: bash: fuser: command not found
2022-01-10 23:42:41,223 INFO org.apache.hadoop.ha.SshFenceByTcpPort: rc: 127
2022-01-10 23:42:41,223 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from master port 22
2022-01-10 23:42:41,227 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2022-01-10 23:42:41,227 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2022-01-10 23:42:41,228 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed
2022-01-10 23:42:41,229 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at master/192.168.3.73:8020
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:533)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:921)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:820)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2022-01-10 23:42:41,229 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2022-01-10 23:42:41,236 INFO org.apache.zookeeper.ZooKeeper: Session: 0x17e449f6e350002 closed
2022-01-10 23:42:42,240 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server slave1/192.168.3.66:2181. Will not attempt to authenticate using SASL (unknown error)
2022-01-10 23:42:42,240 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to slave1/192.168.3.66:2181, initiating session
2022-01-10 23:42:42,245 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server slave1/192.168.3.66:2181, sessionid = 0x27e449f6da60002, negotiated timeout = 5000
2022-01-10 23:42:42,247 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2022-01-10 23:42:42,249 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
2022-01-10 23:42:42,252 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2022-01-10 23:42:42,256 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a096d79636c757374657212036e6e311a066d617374657220d43e28d33e
2022-01-10 23:42:42,258 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at master/192.168.3.73:8020
2022-01-10 23:42:48,935 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at master/192.168.3.73:8020 standby (unable to connect)
java.net.ConnectException: Call From slave1/192.168.3.66 to master:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
        at org.apache.hadoop.ipc.Client.call(Client.java:1480)
        at org.apache.hadoop.ipc.Client.call(Client.java:1413)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
        at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:514)
        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:921)
        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:820)
        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:615)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
        at org.apache.hadoop.ipc.Client.call(Client.java:1452)
        ... 14 more

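One possible approach: the network may be slow and communication between nodes delayed, so raising the timeouts used for communication within the cluster may help.
Add the following to the core-site.xml configuration file: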

<!-- Session timeout for Hadoop's connection to ZooKeeper, in milliseconds -->
<property>
	<name>ha.zookeeper.session-timeout.ms</name>
	<value>30000</value>
	<description>ms</description>
</property>
<!-- Trash retention in minutes; not one of the timeout settings -->
<property>
	<name>fs.trash.interval</name>
	<value>1440</value>
</property>

Add the following to the hdfs-site.xml configuration file:

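<!-- Timeout for the start-segment call to the JournalNode quorum, in milliseconds -->
<property>
	<name>dfs.qjournal.start-segment.timeout.ms</name>
	<value>60000</value>
</property>
<!-- SSH connect timeout for the sshfence fencing method, in milliseconds -->
<property>
	<name>dfs.ha.fencing.ssh.connect-timeout</name>
	<value>30000</value>
</property>
<!-- RPC timeout used by the failover controller CLI when checking NameNode state, in milliseconds -->
<property>
	<name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
	<value>60000</value>
</property>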
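
The ZKFC log above also points at a cause that longer timeouts would not address: the sshfence step logged in to master over SSH successfully, but the `fuser` command was not found there (`rc: 127`), so fencing failed and the standby NameNode could not become active. Installing the package that provides `fuser` on both NameNode hosts is the more direct fix (on CentOS/RHEL that package is psmisc; the distribution is my assumption, the log only shows OpenSSH_7.4). As a hedged sketch, the fencing methods in hdfs-site.xml can also list a fallback after `sshfence`. The `shell(/bin/true)` entry below is an assumption and not part of the original setup; it always reports success without actually killing the old active NameNode, so it trades safety for availability.

```xml
<!-- Fencing methods are tried in order, one per line -->
<property>
	<name>dfs.ha.fencing.methods</name>
	<value>
		sshfence
		shell(/bin/true)
	</value>
</property>
```

After changing this, restart the ZKFC processes and test failover again by stopping the active NameNode.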