1. Host and Service Planning
1.1 Host Plan
Host | IP | Hostname | CPU | Memory | User | Password |
---|---|---|---|---|---|---|
hadoop181 | 192.168.207.181 | hadoop181 | 4 CORE | 8G | hadoop | hadoop |
hadoop182 | 192.168.207.182 | hadoop182 | 4 CORE | 8G | hadoop | hadoop |
hadoop183 | 192.168.207.183 | hadoop183 | 4 CORE | 8G | hadoop | hadoop |
1.2 Service Plan
Service | hadoop181 | hadoop182 | hadoop183 |
---|---|---|---|
DataNode | √ | √ | √ |
JournalNode | √ | √ | √ |
ZooKeeper | √ | √ | √ |
ZKFC | √ | √ | √ |
ResourceManager | √ | √ | √ |
NodeManager | √ | √ | √ |
NameNode | √ | √ | √ |
HistoryServer | √ | | |
2. Installation
Before installing YARN HA, an HDFS HA environment must already be deployed; see the companion HDFS HA setup walkthrough.
2.1 Edit yarn-site.xml
(1) Edit the yarn-site.xml file with vim
[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
(2) Enable ResourceManager HA
<!-- Enable ResourceManager HA -->
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
</property>
<!-- Declare the cluster id of the HA ResourceManagers -->
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarncluster</value>
</property>
<!-- List the logical ids of the RMs -->
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2,rm3</value>
</property>
(3) Configure rm1
<!-- Hostname of rm1 -->
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop181</value>
</property>
<!-- Web UI address of rm1 -->
<property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop181:8088</value>
</property>
<!-- Internal communication (client RPC) address of rm1 -->
<property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>hadoop181:8032</value>
</property>
<!-- Address ApplicationMasters use to request resources from rm1 -->
<property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>hadoop181:8030</value>
</property>
<!-- Address NodeManagers connect to on rm1 -->
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>hadoop181:8031</value>
</property>
(4) Configure rm2
<!-- Hostname of rm2 -->
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop182</value>
</property>
<!-- Web UI address of rm2 -->
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop182:8088</value>
</property>
<!-- Internal communication (client RPC) address of rm2 -->
<property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>hadoop182:8032</value>
</property>
<!-- Address ApplicationMasters use to request resources from rm2 -->
<property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>hadoop182:8030</value>
</property>
<!-- Address NodeManagers connect to on rm2 -->
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>hadoop182:8031</value>
</property>
(5) Configure rm3
<!-- Hostname of rm3 -->
<property>
    <name>yarn.resourcemanager.hostname.rm3</name>
    <value>hadoop183</value>
</property>
<!-- Web UI address of rm3 -->
<property>
    <name>yarn.resourcemanager.webapp.address.rm3</name>
    <value>hadoop183:8088</value>
</property>
<!-- Internal communication (client RPC) address of rm3 -->
<property>
    <name>yarn.resourcemanager.address.rm3</name>
    <value>hadoop183:8032</value>
</property>
<!-- Address ApplicationMasters use to request resources from rm3 -->
<property>
    <name>yarn.resourcemanager.scheduler.address.rm3</name>
    <value>hadoop183:8030</value>
</property>
<!-- Address NodeManagers connect to on rm3 -->
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm3</name>
    <value>hadoop183:8031</value>
</property>
(6) Specify the ZooKeeper quorum address
<!-- Address of the ZooKeeper quorum -->
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop181:2181,hadoop182:2181,hadoop183:2181</value>
</property>
(7) Configure the ZK state store
<!-- Enable automatic recovery; the ZKRMStateStore below only takes effect when this is true -->
<property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
</property>
<!-- Store ResourceManager state in the ZooKeeper cluster -->
<property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
(8) Configure environment variable inheritance
<!-- Environment variables inherited by containers -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
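Before moving on, it can help to confirm that none of the HA-related keys above were missed. This is just a sketch, not part of the original walkthrough; `check_yarn_ha_keys` is a hypothetical helper, and the config path in the usage comment assumes the standard `$HADOOP_HOME` layout.

```shell
#!/bin/bash
# Sketch: report which HA-related keys are missing from a yarn-site.xml.
# check_yarn_ha_keys is a hypothetical helper, not a stock Hadoop tool.
check_yarn_ha_keys() {
    local conf="$1"
    local key
    for key in yarn.resourcemanager.ha.enabled \
               yarn.resourcemanager.cluster-id \
               yarn.resourcemanager.ha.rm-ids \
               yarn.resourcemanager.zk-address \
               yarn.resourcemanager.recovery.enabled \
               yarn.resourcemanager.store.class; do
        # each property must appear as <name>key</name> somewhere in the file
        grep -q "<name>$key</name>" "$conf" || echo "missing: $key"
    done
}
# usage: check_yarn_ha_keys "$HADOOP_HOME/etc/hadoop/yarn-site.xml"
```

An empty output means every key is present; it does not validate the values, only that the properties exist.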
2.2 Notes
(1) All of the properties above go into the yarn-site.xml configuration file.
(2) Any previously configured single ResourceManager address must be deleted from the file:
<!-- Address of the single YARN ResourceManager; this whole block must be removed -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop182</value>
</property>
2.3 Distribute the file
[hadoop@hadoop181 ~]$ xsync $HADOOP_HOME/etc/hadoop/yarn-site.xml
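Note that `xsync` is a custom distribution script, not a stock Hadoop command. If you do not have it, a minimal equivalent can be sketched with `rsync`; the target hosts and the `hadoop` user below are assumptions taken from the host plan in section 1.1.

```shell
#!/bin/bash
# Minimal xsync-style sketch: print one rsync command per target host.
# Hosts match the plan in section 1.1; adjust TARGETS for your cluster.
TARGETS="hadoop182 hadoop183"
sync_cmds() {
    local path="$1" host
    for host in $TARGETS; do
        # same absolute path on every node, same hadoop user
        echo "rsync -av $path hadoop@$host:$path"
    done
}
# Inspect the commands first, then pipe to bash to actually distribute:
#   sync_cmds "$HADOOP_HOME/etc/hadoop/yarn-site.xml" | bash
sync_cmds "/tmp/yarn-site.xml"
```

Printing the commands before running them makes it easy to verify the paths; the real `xsync` scripts typically loop over all hosts in the same way.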
3. Starting the Services
3.1 Start HDFS first
[hadoop@hadoop181 ~]$ xssh jps -l
[DEBUG] 1 command is :jps -l
[DEBUG] ssh to hadoop181 to execute commands [ jps -l]
7057 org.apache.hadoop.hdfs.server.namenode.NameNode
11969 sun.tools.jps.Jps
7186 org.apache.hadoop.hdfs.server.datanode.DataNode
10409 org.apache.zookeeper.server.quorum.QuorumPeerMain
7437 org.apache.hadoop.hdfs.qjournal.server.JournalNode
[DEBUG] ssh to hadoop182 to execute commands [ jps -l]
7044 org.apache.zookeeper.server.quorum.QuorumPeerMain
8581 sun.tools.jps.Jps
5112 org.apache.hadoop.hdfs.server.datanode.DataNode
5225 org.apache.hadoop.hdfs.qjournal.server.JournalNode
5020 org.apache.hadoop.hdfs.server.namenode.NameNode
[DEBUG] ssh to hadoop183 to execute commands [ jps -l]
5168 org.apache.hadoop.hdfs.qjournal.server.JournalNode
4963 org.apache.hadoop.hdfs.server.namenode.NameNode
8515 sun.tools.jps.Jps
6987 org.apache.zookeeper.server.quorum.QuorumPeerMain
5055 org.apache.hadoop.hdfs.server.datanode.DataNode
3.2 Start YARN
# Start YARN; this can be run on any node
[hadoop@hadoop181 ~]$ start-yarn.sh
3.3 Monitor the startup status
(1) Check with jps
# Check the running processes
[hadoop@hadoop181 ~]$ xssh jps -l
[DEBUG] 1 command is :jps -l
[DEBUG] ssh to hadoop181 to execute commands [ jps -l]
7057 org.apache.hadoop.hdfs.server.namenode.NameNode
7186 org.apache.hadoop.hdfs.server.datanode.DataNode
10757 org.apache.spark.deploy.worker.Worker
14870 sun.tools.jps.Jps
14472 org.apache.hadoop.yarn.server.nodemanager.NodeManager
10409 org.apache.zookeeper.server.quorum.QuorumPeerMain
14347 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
7437 org.apache.hadoop.hdfs.qjournal.server.JournalNode
9246 org.apache.spark.deploy.history.HistoryServer
[DEBUG] ssh to hadoop182 to execute commands [ jps -l]
9952 org.apache.hadoop.yarn.server.nodemanager.NodeManager
10291 sun.tools.jps.Jps
7044 org.apache.zookeeper.server.quorum.QuorumPeerMain
7252 org.apache.spark.deploy.worker.Worker
7364 org.apache.spark.deploy.master.Master
5112 org.apache.hadoop.hdfs.server.datanode.DataNode
5225 org.apache.hadoop.hdfs.qjournal.server.JournalNode
9867 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
5020 org.apache.hadoop.hdfs.server.namenode.NameNode
[DEBUG] ssh to hadoop183 to execute commands [ jps -l]
5168 org.apache.hadoop.hdfs.qjournal.server.JournalNode
10224 sun.tools.jps.Jps
4963 org.apache.hadoop.hdfs.server.namenode.NameNode
7188 org.apache.spark.deploy.worker.Worker
7301 org.apache.spark.deploy.master.Master
9801 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
6987 org.apache.zookeeper.server.quorum.QuorumPeerMain
9886 org.apache.hadoop.yarn.server.nodemanager.NodeManager
5055 org.apache.hadoop.hdfs.server.datanode.DataNode
[hadoop@hadoop181 hadoop]$
(2) Check in a web browser
http://hadoop181:8088/cluster
http://hadoop182:8088/cluster
http://hadoop183:8088/cluster
(3) Check which ResourceManager is active
[hadoop@hadoop181 hadoop]$ yarn rmadmin -getServiceState rm1
standby
[hadoop@hadoop181 hadoop]$ yarn rmadmin -getServiceState rm2
active
[hadoop@hadoop181 hadoop]$ yarn rmadmin -getServiceState rm3
standby
[hadoop@hadoop181 hadoop]$
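The three per-id checks above can be wrapped in a single loop. This is only a sketch: `find_active_rm` is a hypothetical helper, and the `RMADMIN` override exists purely so the function can be exercised without a live cluster.

```shell
#!/bin/bash
# Sketch: print the rm-id whose service state is "active".
# RMADMIN defaults to the real yarn CLI; override it for dry runs.
RMADMIN="${RMADMIN:-yarn rmadmin}"
find_active_rm() {
    local id state
    for id in rm1 rm2 rm3; do
        state=$($RMADMIN -getServiceState "$id" 2>/dev/null)
        if [ "$state" = "active" ]; then
            echo "$id"
        fi
    done
}
# usage on the cluster: find_active_rm
```

With the cluster state shown above, this would print the single active id; exactly one RM should ever be active at a time.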
(4) Kill the active ResourceManager and see what happens
[hadoop@hadoop181 hadoop]$ ssh hadoop@hadoop182 "kill -9 10564"
(5) Check the service states again after rm2 is killed
[hadoop@hadoop181 hadoop]$ yarn rmadmin -getServiceState rm1
active
[hadoop@hadoop181 hadoop]$ yarn rmadmin -getServiceState rm3
standby
[hadoop@hadoop181 hadoop]$ yarn rmadmin -getServiceState rm2
2020-09-04 11:55:25,237 INFO ipc.Client: Retrying connect to server: hadoop182/192.168.207.182:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From hadoop181/192.168.207.181 to hadoop182:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[hadoop@hadoop181 hadoop]$
Done; with that, our YARN HA environment is fully set up.