How Hadoop and YARN Failover Works for High Availability
HA-Related Configuration Parameters
Configuration Properties | Description |
---|---|
hadoop.zk.address | Address of the ZK-quorum. Used both for the state-store and embedded leader-election. |
yarn.resourcemanager.ha.enabled | Enable RM HA. |
yarn.resourcemanager.ha.rm-ids | List of logical IDs for the RMs. e.g., “rm1,rm2”. |
yarn.resourcemanager.hostname.rm-id | For each rm-id, specify the hostname the RM corresponds to. Alternately, one could set each of the RM’s service addresses. |
yarn.resourcemanager.address.rm-id | For each rm-id, specify host:port for clients to submit jobs. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
yarn.resourcemanager.scheduler.address.rm-id | For each rm-id, specify scheduler host:port for ApplicationMasters to obtain resources. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
yarn.resourcemanager.resource-tracker.address.rm-id | For each rm-id, specify host:port for NodeManagers to connect. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
yarn.resourcemanager.admin.address.rm-id | For each rm-id, specify host:port for administrative commands. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
yarn.resourcemanager.webapp.address.rm-id | For each rm-id, specify the host:port that the RM web application corresponds to. You do not need this if you set yarn.http.policy to HTTPS_ONLY. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
yarn.resourcemanager.webapp.https.address.rm-id | For each rm-id, specify the host:port that the RM HTTPS web application corresponds to. You do not need this if you set yarn.http.policy to HTTP_ONLY. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id. |
yarn.resourcemanager.ha.id | Identifies the RM in the ensemble. This is optional; however, if set, admins have to ensure that all the RMs have their own IDs in the config. |
yarn.resourcemanager.ha.automatic-failover.enabled | Enable automatic failover. By default, it is enabled only when HA is enabled. |
yarn.resourcemanager.ha.automatic-failover.embedded | Use embedded leader-elector to pick the Active RM, when automatic failover is enabled. By default, it is enabled only when HA is enabled. |
yarn.resourcemanager.cluster-id | Identifies the cluster. Used by the elector to ensure an RM doesn’t take over as Active for another cluster. |
yarn.client.failover-proxy-provider | The class to be used by Clients, AMs and NMs to failover to the Active RM. |
yarn.client.failover-no-ha-proxy-provider | The class to be used by Clients, AMs and NMs to failover to the Active RM, when not running in HA mode. |
yarn.client.failover-max-attempts | The max number of times FailoverProxyProvider should attempt failover. |
yarn.client.failover-sleep-base-ms | The sleep base (in milliseconds) to be used for calculating the exponential delay between failovers. |
yarn.client.failover-sleep-max-ms | The maximum sleep time (in milliseconds) between failovers. |
yarn.client.failover-retries | The number of retries per attempt to connect to a ResourceManager. |
yarn.client.failover-retries-on-socket-timeouts | The number of retries per attempt to connect to a ResourceManager on socket timeouts. |
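
To make the rm-id suffix convention in the table concrete, here is a minimal Java sketch of how a client-facing address could be resolved for each RM. It does not use the Hadoop Configuration API: the class RmAddressResolver, its methods, the sample host names, and the plain Map standing in for yarn-site.xml are all assumptions for illustration. The fallback port 8032 is assumed to be the default port of yarn.resourcemanager.address.

```java
import java.util.HashMap;
import java.util.Map;

final class RmAddressResolver {
    // Assumed default client RPC port for yarn.resourcemanager.address.
    static final int DEFAULT_RM_PORT = 8032;

    /** Returns the host:port clients would use for the given rm-id. */
    static String clientAddress(Map<String, String> conf, String rmId) {
        // An explicit per-RM address overrides the per-RM hostname.
        String explicit = conf.get("yarn.resourcemanager.address." + rmId);
        if (explicit != null) {
            return explicit;
        }
        String host = conf.get("yarn.resourcemanager.hostname." + rmId);
        if (host == null) {
            throw new IllegalArgumentException("no address or hostname for " + rmId);
        }
        return host + ":" + DEFAULT_RM_PORT;
    }

    /** Hypothetical HA settings; host names are placeholders. */
    static Map<String, String> sampleConf() {
        Map<String, String> conf = new HashMap<>();
        conf.put("yarn.resourcemanager.ha.enabled", "true");
        conf.put("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
        conf.put("yarn.resourcemanager.cluster-id", "cluster1");
        conf.put("hadoop.zk.address", "zk1:2181,zk2:2181,zk3:2181");
        conf.put("yarn.resourcemanager.hostname.rm1", "master1.example.com");
        conf.put("yarn.resourcemanager.hostname.rm2", "master2.example.com");
        // rm2 also sets an explicit address, which wins over its hostname.
        conf.put("yarn.resourcemanager.address.rm2", "master2.example.com:18032");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = sampleConf();
        for (String rmId : conf.get("yarn.resourcemanager.ha.rm-ids").split(",")) {
            System.out.println(rmId + " -> " + clientAddress(conf, rmId));
        }
    }
}
```

In this sketch rm1 falls back to its hostname plus the assumed default port, while rm2 uses the explicitly configured address.
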
How Failover Works
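Calls to the YARN and Hadoop client APIs normally go through a proxy layer. Whenever a call on the proxy fails, the proxy either retries it or fails over to another node. These proxies are created through RetryProxy#create; for example, RMProxy, ServerProxy, and NameNodeProxies all use it to create their client proxies.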
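
The proxy idea can be illustrated with plain JDK dynamic proxies. The sketch below is not Hadoop's RetryProxy implementation: Greeter, create, and flaky are hypothetical names, and the handler only retries against a single target, with no failover and no delay between attempts.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

final class RetryProxySketch {
    /** Hypothetical client protocol, standing in for a real RPC interface. */
    interface Greeter {
        String greet(String name) throws Exception;
    }

    /** Wraps target so that each failed call is retried up to maxRetries times. */
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> iface, T target, int maxRetries) {
        InvocationHandler handler = (proxy, method, args) -> {
            int attempt = 0;
            while (true) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    if (attempt++ >= maxRetries) {
                        throw e.getCause(); // out of retries: surface the real error
                    }
                }
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    /** A target that fails the first `failures` calls, then succeeds. */
    static Greeter flaky(int failures) {
        int[] remaining = {failures};
        return name -> {
            if (remaining[0]-- > 0) {
                throw new IOException("connection refused");
            }
            return "hello " + name;
        };
    }
}
```

A caller only sees the Greeter interface; whether a call succeeded on the first attempt or after several retries is hidden inside the handler, which is the same separation the Hadoop client proxies rely on.
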
The steps below walk through RMProxy, which YarnClient uses (method names are from Hadoop 2 unless marked otherwise):
- Create the RMProxy: RMProxy#createRMProxy
- Create the FailoverProxyProvider, which switches to another node when a failure occurs. The default is ConfiguredRMFailoverProxyProvider, and the class is set by yarn.client.failover-proxy-provider: RMProxy#createRMFailoverProxyProvider
- Create the retry policy: RMProxy#createRetryPolicy
  The relevant settings are yarn.resourcemanager.connect.max-wait.ms, yarn.resourcemanager.connect.retry-interval.ms, yarn.client.failover-sleep-base-ms, yarn.client.failover-sleep-max-ms, and yarn.client.failover-max-attempts.
  If yarn.client.failover-max-attempts is not set, the maximum number of failover attempts is yarn.resourcemanager.connect.max-wait.ms / yarn.client.failover-sleep-base-ms.
  If yarn.client.failover-max-attempts is 0, no failover takes place.
- Initialize the HA configuration from yarn.resourcemanager.ha.rm-ids and yarn.resourcemanager.address.rm-id: ConfiguredRMFailoverProxyProvider#init
- Invoke the method through the proxy layer: RetryInvocationHandler#invoke (Hadoop 3: RetryInvocationHandler.Call#invokeOnce)
- Decide whether to retry: RetryPolicy#shouldRetry
- Switch between the active and standby RM when a method call throws an exception: ConfiguredRMFailoverProxyProvider#performFailover (Hadoop 3: RetryInvocationHandler.ProxyDescriptor#failover)
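
The retry-policy step can be sketched as plain arithmetic. This is a simplified model under stated assumptions, not the code in RMProxy or RetryPolicies: the class FailoverRetryMath and its methods are hypothetical, -1 is used here as the sentinel for "not configured", the divisor for the derived attempt limit is assumed to be the failover sleep base (which, like the sleep maximum, is assumed to fall back to yarn.resourcemanager.connect.retry-interval.ms when unset), and the jitter range is an assumption about how the exponential delay is randomized.

```java
import java.util.concurrent.ThreadLocalRandom;

final class FailoverRetryMath {
    /**
     * Resolves the failover attempt limit.
     * configuredMaxAttempts == -1 means "not configured" in this sketch.
     */
    static int maxFailoverAttempts(int configuredMaxAttempts,
                                   long connectMaxWaitMs,
                                   long failoverSleepBaseMs) {
        if (configuredMaxAttempts != -1) {
            return configuredMaxAttempts; // explicit value wins, including 0
        }
        // Assumption: derive the limit from the total wait budget.
        return (int) (connectMaxWaitMs / failoverSleepBaseMs);
    }

    /** A limit of 0 means the first failure is already over budget. */
    static boolean shouldFailover(int failoversSoFar, int maxAttempts) {
        return failoversSoFar < maxAttempts;
    }

    /** Exponential delay without jitter: base * 2^failovers, capped at maxMs. */
    static long cappedDelayMs(long baseMs, long maxMs, int failovers) {
        int shift = Math.min(failovers, 30); // keep the shift well inside long range
        return Math.min(baseMs * (1L << shift), maxMs);
    }

    /** Capped delay scaled by a random factor in [0.5, 1.5) (assumed jitter). */
    static long jitteredDelayMs(long baseMs, long maxMs, int failovers) {
        double factor = ThreadLocalRandom.current().nextDouble() + 0.5;
        return (long) (cappedDelayMs(baseMs, maxMs, failovers) * factor);
    }

    public static void main(String[] args) {
        // Example values: 15 minutes total wait, 30 seconds sleep base.
        int limit = maxFailoverAttempts(-1, 900_000L, 30_000L);
        System.out.println("derived failover limit: " + limit);
        System.out.println("failover allowed with limit 0: " + shouldFailover(0, 0));
    }
}
```

With a 900000 ms wait budget and a 30000 ms sleep base, the derived limit in this model is 30 failovers; with an explicit limit of 0, shouldFailover rejects the very first failure, which matches the "no failover" case described in the list.
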
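
The invoke, shouldRetry, and performFailover steps fit together roughly as in the following sketch. It is a simplified stand-in, not the Hadoop classes themselves: RoundRobinFailover, Provider, invoke, and submit are hypothetical names, the provider simply cycles through the configured rm-ids in order, and any RuntimeException is treated as a reason to fail over.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class RoundRobinFailover {
    /** Cycles through the configured rm-ids, like a simplified failover proxy provider. */
    static final class Provider {
        private final List<String> rmIds;
        private int current = 0;

        Provider(List<String> rmIds) {
            this.rmIds = rmIds;
        }

        String currentRmId() {
            return rmIds.get(current);
        }

        /** Moves to the next rm-id, wrapping around at the end of the list. */
        void performFailover() {
            current = (current + 1) % rmIds.size();
        }
    }

    /** Runs call against the current RM; on failure, fails over until the budget is spent. */
    static <R> R invoke(Provider provider, Function<String, R> call, int maxFailovers) {
        int failovers = 0;
        while (true) {
            try {
                return call.apply(provider.currentRmId());
            } catch (RuntimeException e) {
                if (failovers >= maxFailovers) {
                    throw e; // budget exhausted: give up with the last error
                }
                failovers++;
                provider.performFailover();
            }
        }
    }

    /** Hypothetical RPC: only rm2 is active, rm1 rejects the call as standby. */
    static String submit(String rmId) {
        if (!rmId.equals("rm2")) {
            throw new IllegalStateException(rmId + " is standby");
        }
        return "accepted by " + rmId;
    }

    public static void main(String[] args) {
        Provider provider = new Provider(Arrays.asList("rm1", "rm2"));
        System.out.println(invoke(provider, RoundRobinFailover::submit, 1));
    }
}
```

Because the provider keeps its position after a successful failover, later calls in this sketch go straight to the RM that last answered instead of starting again from the first rm-id.
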