Hadoop Automatic Failover

Scenario

In a three-node cluster with two namenodes, one namenode (nn1) went down. Manually promoting the other namenode to active failed:

hdfs haadmin -transitionToActive nn2

21/02/27 00:24:16 INFO ipc.Client: Retrying connect to server: hadoop01/192.168.26.10:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Unexpected error occurred  Call From hadoop02/192.168.26.20 to hadoop01:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
Usage: haadmin [-transitionToActive [--forceactive] <serviceId>]
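
Before forcing anything, it helps to confirm what state each namenode is actually in. The standard state query, using the service IDs from this cluster:

    hdfs haadmin -getServiceState nn1    # fails with the same connection error while nn1 is down
    hdfs haadmin -getServiceState nn2    # should report: standby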

Root cause:

When a namenode is promoted to active, the command first connects to every namenode to confirm that none of them is already active (to prevent split-brain). Because nn1 is down and unreachable, this check fails and the promotion is aborted. In this situation the promotion can be forced.

Solution 1:

Force the promotion:

hdfs haadmin -transitionToActive --forceactive nn2

Solution 2:

Configure automatic failover. A ZKFC (ZKFailoverController) process runs on each namenode host. The ZKFCs race to create an ephemeral znode under a designated path in ZooKeeper, and whichever ZKFC creates it holds the lock; its namenode becomes active. Each ZKFC keeps monitoring the health of its local namenode. If that namenode dies or hangs, its ZKFC releases the lock by deleting the ephemeral znode, and the other ZKFCs race to create it again. Before the new winner's namenode becomes active, the old active namenode is fenced (for example, its process is killed) so that a hung namenode cannot come back to life and leave two active namenodes, i.e. split-brain.
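
You can watch this lock directly in ZooKeeper. A minimal sketch, assuming the HDFS nameservice is named mycluster (a placeholder; the actual name is whatever dfs.nameservices is set to):

    zkCli.sh -server hadoop01:2181

    # list the HA znodes for the nameservice
    ls /hadoop-ha/mycluster

    # the ephemeral lock znode held by the active namenode's ZKFC
    get /hadoop-ha/mycluster/ActiveStandbyElectorLock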

Configuration in hdfs-site.xml:

        <!-- enable automatic failover -->
        <property>
                <name>dfs.ha.automatic-failover.enabled</name>
                <value>true</value>
        </property>

        <!-- fencing method, so that only one namenode serves clients at any moment -->
        <property>
                <name>dfs.ha.fencing.methods</name>
                <value>sshfence</value>
        </property>

        <!-- sshfence requires passwordless ssh login -->
        <property>
                <name>dfs.ha.fencing.ssh.private-key-files</name>
                <value>/home/hadoop/.ssh/id_rsa</value>
        </property>
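
sshfence only works if the ZKFC on each namenode host can ssh to the other namenode host without a password, using the private key configured above. A minimal setup sketch, assuming a hadoop user on hosts hadoop01 and hadoop02 (names follow this cluster; adjust to yours):

    # run on each namenode host as the hadoop user
    ssh-keygen -t rsa                    # accept the defaults; creates ~/.ssh/id_rsa
    ssh-copy-id hadoop@hadoop01
    ssh-copy-id hadoop@hadoop02

    # verify: this must succeed without a password prompt
    ssh hadoop@hadoop01 hostname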

Configuration in core-site.xml:

        <property>
                <name>ha.zookeeper.quorum</name>
                <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
        </property>
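
The ZooKeeper ensemble listed here must be running before the ZKFCs start. A quick health check (a sketch; note that on ZooKeeper 3.5+ the four-letter ruok command must be whitelisted via 4lw.commands.whitelist):

    # on each ZooKeeper host
    zkServer.sh status                # shows Mode: leader or Mode: follower

    # or probe each quorum member remotely; a healthy server answers "imok"
    echo ruok | nc hadoop01 2181
    echo ruok | nc hadoop02 2181
    echo ruok | nc hadoop03 2181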

Restart the cluster:

stop-dfs.sh  -- stop HDFS

hdfs zkfc -formatZK  -- initialize; this creates a hadoop-ha znode under the ZooKeeper root

start-dfs.sh  -- start the cluster; the ZooKeeper znodes now show that the active namenode is hadoop01
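
To exercise automatic failover end to end, kill the active namenode and confirm the standby takes over (a sketch; service IDs nn1/nn2 as above, and the pid is whatever jps reports for NameNode):

    # on hadoop01: find and kill the active NameNode process
    jps | grep NameNode
    kill -9 <NameNode pid>

    # within a few seconds the ZKFC on hadoop02 should win the lock
    hdfs haadmin -getServiceState nn2    # should now report: active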

 

Problem encountered:

2021-02-27 11:44:32,775 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connected to hadoop01
2021-02-27 11:44:32,775 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Looking for process running on port 9000
2021-02-27 11:44:32,968 WARN org.apache.hadoop.ha.SshFenceByTcpPort: PATH=$PATH:/sbin:/usr/sbin fuser -v -k -n tcp 9000 via ssh: bash: fuser: command not found
2021-02-27 11:44:32,968 INFO org.apache.hadoop.ha.SshFenceByTcpPort: rc: 127
2021-02-27 11:44:32,968 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop01 port 22
2021-02-27 11:44:32,968 WARN org.apache.hadoop.ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2021-02-27 11:44:32,968 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2021-02-27 11:44:32,968 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed
2021-02-27 11:44:32,968 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at hadoop01/192.168.26.10:9000
	at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:533)
	at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:505)
	at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)
	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:892)
	at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:921)
	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:820)
	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

If the fuser command is missing, install it:

yum install psmisc
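
fuser comes from the psmisc package; install it on every namenode host, since sshfence runs fuser on whichever node is being fenced (rc 127 in the log above means the command was not found). A quick check after installing:

    # verify fuser is now on the PATH
    command -v fuser                  # typically /usr/sbin/fuser

    # sshfence runs roughly this (with -k added) to kill whatever holds port 9000;
    # without -k it only lists the process, which is a safe test
    fuser -v -n tcp 9000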
