Article preview:
1. Create Alibaba Cloud instances (3)
Open the console
Go to Elastic Compute Service (ECS)
Click "Create Instance"
Choose pay-as-you-go billing, the region, and the resource size, then click Next
Keep the defaults and click Next
Click Next, and configure the remaining items as shown in the figure below
Keep the defaults and click Next
Check the items the arrows point to, then create the instances
2. Prerequisite configuration
In the O&M management section of the web console, open the relevant ports (or all of them)
1. Disable the firewall
Do this on all 3 machines
[root@ruozedata001 ~]# systemctl stop firewalld
[root@ruozedata001 ~]# systemctl disable firewalld
[root@ruozedata001 ~]# setenforce 0
[root@ruozedata001 ~]# iptables -F
Also set SELinux to disabled in its config file (/etc/selinux/config) to turn it off permanently; setenforce 0 only lasts until the next reboot
2. Add a hadoop user
Do this on all 3 machines
[root@ruozedata001 ~]# useradd hadoop
[root@ruozedata001 ~]# echo "123456" | passwd --stdin hadoop
3. Configure host mappings
Add the hosts mappings (on all 3 machines):
[root@ruozedata001 ~]# echo "192.168.153.101 ruozedata001" >> /etc/hosts
[root@ruozedata001 ~]# echo "192.168.153.102 ruozedata002" >> /etc/hosts
[root@ruozedata001 ~]# echo "192.168.153.103 ruozedata003" >> /etc/hosts
4. Passwordless SSH (all 3 machines)
Run ssh-keygen and press Enter through all the prompts; do this on all three machines
Run ssh-copy-id <hostname> to distribute the public key to ruozedata001, 002, and 003; do this on all three machines
Once that's done, run ssh ruozedata001 date, ssh ruozedata002 date, and ssh ruozedata003 date as the hadoop user on each machine; if each prints the date directly, as in the figure below, passwordless login is configured correctly
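The two steps above amount to the following sketch, run as the hadoop user on each of the three machines (the -N "" and -f flags simply pre-answer the ssh-keygen prompts):

```
# Generate a key pair non-interactively (same as pressing Enter through the prompts)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Push the public key to every node, including this one
for host in ruozedata001 ruozedata002 ruozedata003; do
  ssh-copy-id "$host"
done
```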
3. Environment deployment
Cluster role plan:

Component                 ruozedata001  ruozedata002  ruozedata003
ZooKeeper                 √             √             √
NameNode                  √             √
DataNode                  √             √             √
JournalNode               √             √             √
ResourceManager           √             √
NodeManager               √             √             √
DFSZKFailoverController   √             √
JobHistoryServer          √
Create the directories as the hadoop user (on all 3 machines):
[hadoop@ruozedata001 ~]$ mkdir software app tmp data log lib shell sourcecode
Upload the JDK, ZooKeeper, and Hadoop tarballs into software
JDK installation is not covered again here; see my other post on Hadoop deployment for the JDK setup and a gotcha about its owning user
1. Deploy ZooKeeper
1. Unpack and create a symlink
Unpack ZooKeeper into the app directory and create a symlink, as shown in the figure below
2. Add personal environment variables
Add the environment variables to ~/.bashrc, as shown in the figure below
Don't forget to source ~/.bashrc
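The additions to ~/.bashrc are along these lines, assuming the symlink created above is ~/app/zookeeper (the path the zkServer.sh status output later confirms):

```
# ~/.bashrc
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
```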
3. Modify zoo.cfg
Copy the sample config, then change the following two settings in zoo.cfg
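Based on the myid path used below, the two changes are typically:

```
# conf/zoo.cfg (copied from conf/zoo_sample.cfg)
# 1) point dataDir at the directory that will hold myid
dataDir=/home/hadoop/tmp/zookeeper
# 2) list the quorum members at the end of the file
server.1=ruozedata001:2888:3888
server.2=ruozedata002:2888:3888
server.3=ruozedata003:2888:3888
```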
4. Create the dataDir directory and a new myid file
5. Distribute the files
Distribute the ZooKeeper files to machines 002 and 003; the process is shown in the figure below
As in 3.1.1 and 3.1.2, add the environment variables and create the symlink on 002 and 003 as well
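The distribution step can be sketched as follows; the unpacked directory name here is a placeholder, use whatever your tarball extracted to:

```
# Run on ruozedata001 as the hadoop user
scp -r ~/app/zookeeper-3.4.6 ruozedata002:~/app/
scp -r ~/app/zookeeper-3.4.6 ruozedata003:~/app/
```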
6. Assign the myid values
Run the following three commands, one on each of machines 001, 002, and 003 respectively:
echo 1 > /home/hadoop/tmp/zookeeper/myid
echo 2 > /home/hadoop/tmp/zookeeper/myid
echo 3 > /home/hadoop/tmp/zookeeper/myid
7. Start the ZK cluster
Run zkServer.sh start on each of the three machines:
[hadoop@ruozedata003 bin]$ zkServer.sh start
8. Verify
Run zkServer.sh status on each of the three machines and check the Mode line; one leader and the rest followers means the cluster is healthy
[hadoop@ruozedata003 bin]$ zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/app/zookeeper/bin/../conf/zoo.cfg
Mode: leader
2. Deploy Hadoop
1. Unpack and create a symlink
As with ZooKeeper, go to the software directory and unpack the Hadoop tarball into the app directory
2. Add personal environment variables
Add the environment variables to ~/.bashrc
Again, don't forget to source ~/.bashrc
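The ~/.bashrc additions are along these lines, assuming the symlink is ~/app/hadoop (the path used by the config files below):

```
# ~/.bashrc
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```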
3. Configuration files
Edit the following configuration files under /home/hadoop/app/hadoop/etc/hadoop
1. Configure core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- fs.defaultFS specifies the NameNode URI (YARN needs this too) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ruozeclusterg10</value>
</property>
<!--==============================Trash mechanism======================================= -->
<property>
<!-- How often the checkpointer running on the NameNode creates checkpoints from the Current folder; default 0, meaning it follows fs.trash.interval -->
<name>fs.trash.checkpoint.interval</name>
<value>0</value>
</property>
<property>
<!-- After how many minutes a checkpoint directory under .Trash is deleted; the server-side setting takes precedence over the client's; default 0, never delete -->
<name>fs.trash.interval</name>
<value>10080</value>
</property>
<!-- Hadoop temp directory. hadoop.tmp.dir is the base setting the Hadoop filesystem depends on, and many other paths derive from it; if hdfs-site.xml does not configure the namenode and datanode storage locations, they default to this path -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp/hadoop</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>ruozedata001:2181,ruozedata002:2181,ruozedata003:2181</value>
</property>
<!-- ZooKeeper session timeout, in milliseconds -->
<property>
<name>ha.zookeeper.session-timeout.ms</name>
<value>2000</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec
</value>
</property>
</configuration>
2. Configure hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- HDFS superuser group -->
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<!-- Enable WebHDFS -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/dfs/name</value>
<description>Local directory where the NameNode stores the name table (fsimage); change as needed</description>
</property>
<property>
<name>dfs.namenode.edits.dir</name>
<value>${dfs.namenode.name.dir}</value>
<description>Local directory where the NameNode stores the transaction file (edits); change as needed</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/dfs/data</value>
<description>Local directory where DataNodes store blocks; change as needed</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Block size 128M (the default) -->
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<!--======================================================================= -->
<!-- HDFS high-availability configuration -->
<!-- The HDFS nameservice, ruozeclusterg10; must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ruozeclusterg10</value>
</property>
<property>
<!-- NameNode IDs; this version supports at most two NameNodes -->
<name>dfs.ha.namenodes.ruozeclusterg10</name>
<value>nn1,nn2</value>
</property>
<!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID], the RPC address -->
<property>
<name>dfs.namenode.rpc-address.ruozeclusterg10.nn1</name>
<value>ruozedata001:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ruozeclusterg10.nn2</name>
<value>ruozedata002:8020</value>
</property>
<!-- HDFS HA: dfs.namenode.http-address.[nameservice ID], the HTTP address -->
<property>
<name>dfs.namenode.http-address.ruozeclusterg10.nn1</name>
<value>ruozedata001:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.ruozeclusterg10.nn2</name>
<value>ruozedata002:50070</value>
</property>
<!--==================NameNode editlog synchronization============================================ -->
<!-- Ensures the metadata can be recovered -->
<property>
<name>dfs.journalnode.http-address</name>
<value>0.0.0.0:8480</value>
</property>
<property>
<name>dfs.journalnode.rpc-address</name>
<value>0.0.0.0:8485</value>
</property>
<property>
<!-- JournalNode servers used by the QuorumJournalManager to store the editlog -->
<!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>; the port matches dfs.journalnode.rpc-address -->
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://ruozedata001:8485;ruozedata002:8485;ruozedata003:8485/ruozeclusterg10</value>
</property>
<property>
<!-- Directory where JournalNodes store their data -->
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/data/dfs/jn</value>
</property>
<!--==================Client failover============================================ -->
<property>
<!-- Strategy DataNodes and clients use to identify and select the active NameNode -->
<!-- Implementation class for automatic failover -->
<name>dfs.client.failover.proxy.provider.ruozeclusterg10</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!--==================NameNode fencing=============================================== -->
<!-- Prevents the demoted NameNode from coming back after a failover and leaving two active services -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<!-- After how many milliseconds fencing is considered to have failed -->
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<!--==================NameNode auto failover via ZKFC and ZooKeeper====================== -->
<!-- Enable ZooKeeper-based automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- File listing the DataNodes allowed to connect to the NameNode -->
<property>
<name>dfs.hosts</name>
<value>/home/hadoop/app/hadoop/etc/hadoop/slaves</value>
</property>
</configuration>
3. Configure mapred-site.xml
A minimal mapred-site.xml for this cluster; treat it as a sketch, as the JobHistoryServer host and ports below are assumptions following the role plan above and the Hadoop 2.x defaults:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Run MapReduce jobs on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistoryServer RPC address (on ruozedata001 per the role plan) -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>ruozedata001:10020</value>
</property>
<!-- JobHistoryServer web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>ruozedata001:19888</value>
</property>
</configuration>
4. Configure yarn-site.xml
A minimal ResourceManager-HA yarn-site.xml consistent with the rest of this post (001 active, 002 standby, web UIs on port 8088); the cluster-id and per-RM values are a sketch, adjust them to your environment:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Shuffle service the NodeManagers run for MapReduce -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Logical cluster id for RM HA -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>ruozeclusterg10-yarn</value>
</property>
<!-- The two ResourceManager ids -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>ruozedata001</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>ruozedata002</value>
</property>
<!-- RM web UIs, matching the addresses used later in this post -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>ruozedata001:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>ruozedata002:8088</value>
</property>
<!-- ZooKeeper quorum used for RM state storage and leader election -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>ruozedata001:2181,ruozedata002:2181,ruozedata003:2181</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
</configuration>
5. Configure slaves
ruozedata001
ruozedata002
ruozedata003
6. Configure hadoop-env.sh
Point JAVA_HOME in hadoop-env.sh at the JDK installation directory
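For example (the JDK path below is a placeholder; use the directory your JDK actually installed to):

```
# etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_45
```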
4. Distribute the files
Distribute the Hadoop files to machines 002 and 003; the process is shown in the figure below
As in 3.2.1 and 3.2.2, add the environment variables and create the symlink on 002 and 003 as well
5. Start the JournalNodes (all 3)
Run hadoop-daemon.sh start journalnode on all three machines
6. Format the NameNode (001)
Run hadoop namenode -format on 001
7. Sync the NameNode metadata
Copy the NameNode metadata from 001 to machine 002
The metadata lives under dfs.namenode.name.dir and dfs.namenode.edits.dir; in addition, the shared storage (dfs.namenode.shared.edits.dir) must contain all of the NameNode's metadata
scp -r /home/hadoop/data/dfs ruozedata002:/home/hadoop/data/
8. Initialize ZKFC (001)
hdfs zkfc -formatZK
9. Start the HDFS cluster
Just run start-dfs.sh on 001
10. Start the YARN cluster
Start YARN and the history server
jps shows that, unlike HDFS, starting the YARN cluster does not bring up the standby ResourceManager along with it; it has to be started by hand on machine 002, as shown in the figure below
The YARN web UIs (001 is active, 002 standby): http://ruozedata001:8088 and http://ruozedata002:8088/cluster/cluster
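Assuming the Hadoop sbin scripts are on the PATH, the start sequence described above can be sketched as:

```
# On ruozedata001: start the YARN daemons on the hosts listed in slaves,
# plus the MapReduce history server
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver

# On ruozedata002: the standby ResourceManager does not start automatically
yarn-daemon.sh start resourcemanager
```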
Note
One small gotcha: on a minimal CentOS 7.2 install in a VM, after the active NameNode went down, the standby could not complete the state transition; below is the ZKFC log captured while testing, as shown in the figure:
Searching suggested an OpenSSH version incompatibility; the machine's OpenSSH version was 6.6.1
Upgrading to 7.4 didn't help either; what finally fixed it was installing the psmisc package with yum install psmisc -y
Below is the log of the standby NameNode taking over:
2021-01-18 16:17:53,499 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at ruozedata001/172.31.134.48:8020 standby (unable to connect)
java.net.ConnectException: Call From ruozedata002/172.31.134.47 to ruozedata001:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor10.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1508)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:231)
at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)
at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)
at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:520)
at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:511)
at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:60)
at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:897)
at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:909)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:808)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:417)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:648)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:744)
at org.apache.hadoop.ipc.Client$Connection.access$3000(Client.java:396)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1557)
at org.apache.hadoop.ipc.Client.call(Client.java:1480)
... 14 more
2021-01-18 16:17:53,499 INFO org.apache.hadoop.ha.NodeFencer: ====== Beginning Service Fencing Process... ======
2021-01-18 16:17:53,499 INFO org.apache.hadoop.ha.NodeFencer: Trying method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2021-01-18 16:17:53,500 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connecting to ruozedata001...
2021-01-18 16:17:53,500 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to ruozedata001 port 22
2021-01-18 16:17:53,501 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
2021-01-18 16:17:53,507 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_6.6.1
2021-01-18 16:17:53,507 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
2021-01-18 16:17:53,507 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
2021-01-18 16:17:53,510 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
2021-01-18 16:17:53,510 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
2021-01-18 16:17:53,510 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
2021-01-18 16:17:53,510 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
2021-01-18 16:17:53,510 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
2021-01-18 16:17:53,510 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
2021-01-18 16:17:53,510 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
2021-01-18 16:17:53,510 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-md5 none
2021-01-18 16:17:53,510 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-md5 none
2021-01-18 16:17:53,513 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
2021-01-18 16:17:53,513 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
2021-01-18 16:17:53,518 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
2021-01-18 16:17:53,518 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'ruozedata001' (RSA) to the list of known hosts.
2021-01-18 16:17:53,519 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
2021-01-18 16:17:53,519 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
2021-01-18 16:17:53,520 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
2021-01-18 16:17:53,520 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
2021-01-18 16:17:53,520 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
2021-01-18 16:17:53,520 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
2021-01-18 16:17:53,522 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
2021-01-18 16:17:53,522 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method: publickey
2021-01-18 16:17:53,611 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentication succeeded (publickey).
2021-01-18 16:17:53,611 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Connected to ruozedata001
2021-01-18 16:17:53,611 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Looking for process running on port 8020
2021-01-18 16:17:53,678 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Indeterminate response from trying to kill service. Verifying whether it is running using nc...
2021-01-18 16:17:53,737 WARN org.apache.hadoop.ha.SshFenceByTcpPort: nc -z ruozedata001 8020 via ssh: bash: nc: command not found
2021-01-18 16:17:53,738 INFO org.apache.hadoop.ha.SshFenceByTcpPort: Verified that the service is down.
2021-01-18 16:17:53,739 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from ruozedata001 port 22
2021-01-18 16:17:53,739 INFO org.apache.hadoop.ha.NodeFencer: ====== Fencing successful by method org.apache.hadoop.ha.SshFenceByTcpPort(null) ======
2021-01-18 16:17:53,739 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /hadoop-ha/ruozeclusterg10/ActiveBreadCrumb to indicate that the local node is the most recent active...
2021-01-18 16:17:53,739 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed
2021-01-18 16:17:53,741 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at ruozedata002/172.31.134.47:8020 active...
2021-01-18 16:17:54,305 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at ruozedata002/172.31.134.47:8020 to active state