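Hadoop introduced ResourceManager HA in version 2.4, so the ResourceManager is no longer a single point of failure.
Official documentation: ResourceManager High Availability
Building on the installation and configuration from earlier, the two hosts master and master2 serve as the ResourceManager HA pair.
Following the documentation, configure yarn-site.xml as follows: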
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>master</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>master2</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
</configuration>
Run the following command on each of the two hosts to start the ResourceManager. Afterwards, both machines show a ResourceManager process.
yarn-daemon.sh start resourcemanager
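The start step can also be driven from a single machine. The following is a minimal sketch, not part of the original setup: it assumes passwordless SSH to both hosts and that `yarn-daemon.sh` is on the PATH of the remote non-interactive shell. The function name `start_rms` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: start a ResourceManager on every host passed as an argument.
# Assumptions: passwordless SSH works, and yarn-daemon.sh is on the remote PATH.
start_rms() {
    for host in "$@"; do
        echo "starting ResourceManager on $host"
        # Stop at the first host where the start command fails.
        ssh "$host" 'yarn-daemon.sh start resourcemanager' || return 1
    done
}

# Usage (host names taken from yarn-site.xml above):
#   start_rms master master2
```

If the Hadoop scripts are not on the remote PATH, the full path to `yarn-daemon.sh` would have to be substituted in the quoted command.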
Open http://master:8088/ and http://master2:8088/ to see which ResourceManager is active and which is standby. To test failover, kill the active ResourceManager process.
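The active/standby state can also be checked from the command line with `yarn rmadmin -getServiceState`. The sketch below wraps that command in a hypothetical helper, `find_active_rm`, which prints the ID of the first ResourceManager that reports itself as active. It assumes the IDs rm1 and rm2 from the configuration above, and that the command prints just `active` or `standby` on standard output.

```shell
#!/bin/sh
# Hypothetical helper: print the ID of the first ResourceManager reporting "active".
# Arguments are RM IDs as listed in yarn.resourcemanager.ha.rm-ids (rm1, rm2 here).
find_active_rm() {
    for rm_id in "$@"; do
        # Errors (for example, an RM that is down) are ignored; the state is then empty.
        state=$(yarn rmadmin -getServiceState "$rm_id" 2>/dev/null)
        if [ "$state" = "active" ]; then
            echo "$rm_id"
            return 0
        fi
    done
    echo "no active ResourceManager found" >&2
    return 1
}

# Usage:
#   find_active_rm rm1 rm2
```

Running it before and after killing the active ResourceManager is one way to confirm that the standby took over.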
For reference, some commands for switching manually:
hdfs haadmin -transitionToActive <serviceId>
hdfs haadmin -transitionToStandby <serviceId>
yarn rmadmin -transitionToActive <serviceId>
yarn rmadmin -transitionToStandby <serviceId>
hdfs haadmin -transitionToActive --forcemanual nn1