ResourceManager HA

Latest recommended article published on 2023-04-14 18:05:19

lm709409753

Category: Hadoop

Article link: https://blog.csdn.net/lm709409753/article/details/54601852

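This guide introduces High Availability for YARN's ResourceManager, including how to configure and use it. The ResourceManager tracks cluster resources and schedules applications. Before Hadoop 2.4, the ResourceManager was a single point of failure. The HA feature removes this problem with an Active/Standby ResourceManager pair. When the Active RM fails, a Standby RM takes over through manual or automatic failover. Clients, ApplicationMasters and NodeManagers poll the RMs in round-robin fashion until they find the Active RM. Recovery of the ResourceManager's state depends on the ResourceManager restart feature, and ZKRMStateStore is recommended to keep an HA cluster safe.

Summary generated by CSDN using AI.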

1. Official documentation: ResourceManager HA

Introduction

This guide provides an overview of High Availability of YARN's ResourceManager, and details how to configure and use this feature. The ResourceManager (RM) is responsible for tracking the resources in a cluster, and scheduling applications (e.g., MapReduce jobs). Prior to Hadoop 2.4, the ResourceManager was the single point of failure in a YARN cluster. The High Availability feature adds redundancy in the form of an Active/Standby ResourceManager pair to remove this otherwise single point of failure.

Architecture

Overview of ResourceManager High Availability

RM Failover

ResourceManager HA is realized through an Active/Standby architecture - at any point in time, one of the RMs is Active, and one or more RMs are in Standby mode waiting to take over should anything happen to the Active. The trigger to transition to Active comes either from the admin (through the CLI) or from the integrated failover controller when automatic failover is enabled.

Manual transitions and failover

When automatic failover is not enabled, admins have to manually transition one of the RMs to Active. To fail over from one RM to the other, they are expected to first transition the Active RM to Standby and then transition a Standby RM to Active. All this can be done using the "yarn rmadmin" CLI.
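As a rough sketch of the two-step manual failover described above, the following Python helper builds the "yarn rmadmin" invocations in the required order. The RM ids rm1 and rm2 are assumptions (the real ids are whatever the cluster's yarn-site.xml assigns), and the helper name is hypothetical. It only constructs the command lines and does not execute them.

```python
def manual_failover_commands(active_id, standby_id):
    """Return the yarn rmadmin commands for a manual failover, in order.

    The current Active is demoted first and the chosen Standby is promoted
    second, which avoids a window in which two RMs are Active.
    """
    if active_id == standby_id:
        raise ValueError("active and standby RM ids must differ")
    return [
        ["yarn", "rmadmin", "-transitionToStandby", active_id],
        ["yarn", "rmadmin", "-transitionToActive", standby_id],
        # Optional check afterwards: -getServiceState reports the RM's HA state.
        ["yarn", "rmadmin", "-getServiceState", standby_id],
    ]


if __name__ == "__main__":
    # rm1 / rm2 are placeholder ids used only for illustration.
    for argv in manual_failover_commands("rm1", "rm2"):
        print(" ".join(argv))
```

On a real cluster each list could be passed to subprocess.run, checking the exit code of the first command before issuing the second.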

Automatic failover

The RMs have an option to embed the ZooKeeper-based ActiveStandbyElector to decide which RM should be the Active. When the Active goes down or becomes unresponsive, another RM is automatically elected to be the Active, which then takes over. Note that there is no need to run a separate ZKFC daemon, as is the case for HDFS, because the ActiveStandbyElector embedded in the RMs acts as both failure detector and leader elector.
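The election idea can be illustrated with a toy, in-memory model. This is a sketch under stated assumptions, not the real ActiveStandbyElector: there is no ZooKeeper involved, the class and method names are hypothetical, and the "first remaining candidate wins" rule is chosen only to make the example deterministic. It shows a single lock whose holder is the Active, and a Standby taking the lock once the holder disappears.

```python
class ToyElector:
    """In-memory stand-in for a ZooKeeper-style lock (illustration only)."""

    def __init__(self):
        self.lock_holder = None  # id of the RM currently holding the lock
        self.candidates = []     # RMs taking part in the election, in join order

    def join(self, rm_id):
        """An RM joins the election; it becomes Active if the lock is free."""
        if rm_id not in self.candidates:
            self.candidates.append(rm_id)
        self._elect()

    def session_lost(self, rm_id):
        """Model an RM going down or becoming unresponsive."""
        if rm_id in self.candidates:
            self.candidates.remove(rm_id)
        if self.lock_holder == rm_id:
            self.lock_holder = None  # the lock is freed when its holder vanishes
        self._elect()

    def _elect(self):
        # Assumption for determinism: the earliest remaining candidate wins.
        if self.lock_holder is None and self.candidates:
            self.lock_holder = self.candidates[0]

    def state(self, rm_id):
        return "active" if rm_id == self.lock_holder else "standby"


if __name__ == "__main__":
    elector = ToyElector()
    elector.join("rm1")
    elector.join("rm2")
    print(elector.state("rm1"), elector.state("rm2"))
    elector.session_lost("rm1")  # the Active fails
    print(elector.state("rm2"))
```

Because detection of the lost holder and the choice of a successor happen in the same component, the model also hints at why no separate failover-controller process is needed.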

Client, ApplicationMaster and NodeManager on RM failover

When there are multiple RMs, the configuration (yarn-site.xml) used by clients and nodes is expected to list all the RMs. Clients, ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the RMs in a round-robin fashion until they hit the Active RM. If the Active goes down, they resume the round-robin polling until they hit the "new" Active. This default retry logic is implemented as org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider. You can override the logic by implementing org.apache.hadoop.yarn.client.RMFailoverProxyProvider and setting the value of yarn.client.failover-proxy-provider to the class name.
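The round-robin retry behaviour can be sketched in a few lines of Python. This is a simplified illustration, not the logic of ConfiguredRMFailoverProxyProvider itself: ToyRM, StandbyError and call_active are hypothetical names, each call restarts from the first configured RM, and there is no sleep between attempts, whereas a real client would typically remember the last working RM and back off between retries.

```python
import itertools


class StandbyError(Exception):
    """Raised by the toy RM when it is reachable but not the Active."""


class ToyRM:
    """Minimal stand-in for a ResourceManager endpoint."""

    def __init__(self, rm_id, active=False, up=True):
        self.rm_id = rm_id
        self.active = active
        self.up = up

    def submit(self, request):
        if not self.up:
            raise ConnectionError(f"{self.rm_id} is unreachable")
        if not self.active:
            raise StandbyError(f"{self.rm_id} is in Standby")
        return f"{self.rm_id} handled {request}"


def call_active(rms, request, max_attempts=None):
    """Try the configured RMs in round-robin order until the Active answers."""
    if max_attempts is None:
        max_attempts = 2 * len(rms)  # two full passes over the list
    for _, rm in zip(range(max_attempts), itertools.cycle(rms)):
        try:
            return rm.submit(request)
        except (ConnectionError, StandbyError):
            continue  # move on to the next RM in the list
    raise RuntimeError("no Active RM found")


if __name__ == "__main__":
    rms = [ToyRM("rm1", active=True), ToyRM("rm2")]
    print(call_active(rms, "app-1"))
    # Simulate a failover: rm1 goes down and rm2 becomes the Active.
    rms[0].up = False
    rms[1].active = True
    print(call_active(rms, "app-1"))
```

The caller never needs to know which RM is Active. It only needs the full list of RMs, which is why the configuration is expected to name all of them.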