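YARN High Availability
YARN supports both manual and automatic failover. For manual failover, an administrator runs the yarn rmadmin command. Automatic failover relies on ZooKeeper and does not require running a separate, standalone ZKFC.
The official documentation describes automatic failover as follows: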
The RMs have an option to embed the Zookeeper-based ActiveStandbyElector to decide which RM should be the Active. When the Active goes down or becomes unresponsive, another RM is automatically elected to be the Active which then takes over. Note that, there is no need to run a separate ZKFC daemon as is the case for HDFS because ActiveStandbyElector embedded in RMs acts as a failure detector and a leader elector instead of a separate ZKFC daemon.
The yarn-site configuration and the detailed steps will be filled in later…
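For orientation, here is a minimal sketch of the yarn-site properties that automatic failover with the embedded elector typically involves. The property names come from the Hadoop ResourceManager HA documentation. The cluster id, RM ids, hostnames, and ZooKeeper quorum are hypothetical placeholders, and render_yarn_site is a helper written for this example only. Newer Hadoop releases document hadoop.zk.address in place of yarn.resourcemanager.zk-address, so check the documentation for your version.

```python
from xml.sax.saxutils import escape

# Assumption: every value below is a placeholder; only the property names are real.
HA_PROPERTIES = {
    "yarn.resourcemanager.ha.enabled": "true",
    "yarn.resourcemanager.cluster-id": "cluster1",
    "yarn.resourcemanager.ha.rm-ids": "rm1,rm2",
    "yarn.resourcemanager.hostname.rm1": "master1.example.com",
    "yarn.resourcemanager.hostname.rm2": "master2.example.com",
    "yarn.resourcemanager.zk-address": "zk1.example.com:2181,zk2.example.com:2181",
    "yarn.resourcemanager.ha.automatic-failover.enabled": "true",
    # Use the ActiveStandbyElector embedded in the RM (no separate ZKFC).
    "yarn.resourcemanager.ha.automatic-failover.embedded": "true",
}


def render_yarn_site(properties):
    """Render a dict of properties in the <configuration> layout of yarn-site.xml."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_yarn_site(HA_PROPERTIES))
```

With a configuration along these lines in place, yarn rmadmin -getServiceState rm1 reports whether rm1 is currently active or standby.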
Official documentation
RM Restart
RM restart was delivered in two phases, in Hadoop 2.4 and Hadoop 2.6.
- In 2.4 (Phase 1), the RM persists application state to a state store while it runs and loads it back during recovery, so that after a restart it can resume the applications that were running before. Running containers are not preserved in this phase; the applications are started again from the saved state.
- 2.6 (Phase 2) builds on 2.4 and focuses on reconstructing the RM's running state:
"Beyond all the groundwork that has been done in Phase 1 to ensure the persistency of application state and reload that state on recovery, Phase 2 primarily focuses on re-constructing the entire running state of YARN cluster, the majority of which is the state of the central scheduler inside RM which keeps track of all containers’ life-cycle, applications’ headroom and resource requests, queues’ resource usage etc." It follows that from Hadoop 2.6 onward, with work-preserving recovery enabled, an RM restart does not kill any running application.
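The Phase 1 idea of persisting application state and reloading it on recovery can be illustrated with a toy model. This is only a sketch under stated assumptions: FileStateStore and ToyResourceManager are hypothetical classes written for this example and are not YARN APIs. In a real cluster the store is an RMStateStore implementation selected through yarn.resourcemanager.store.class. The Phase 2 step of rebuilding scheduler state is not modeled here.

```python
import json
import os
import tempfile


class FileStateStore:
    """Toy stand-in for an RM state store: a single JSON file on disk."""

    def __init__(self, path):
        self.path = path

    def save(self, apps):
        with open(self.path, "w") as f:
            json.dump(apps, f)

    def load(self):
        # A fresh store has nothing to recover.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)


class ToyResourceManager:
    """Hypothetical RM that persists application state on every change."""

    def __init__(self, store):
        self.store = store
        # Recovery: reload whatever a previous instance persisted.
        self.apps = store.load()

    def submit(self, app_id, queue):
        self.apps[app_id] = {"queue": queue, "state": "RUNNING"}
        self.store.save(self.apps)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "rm-state.json")
    rm = ToyResourceManager(FileStateStore(path))
    rm.submit("application_0001", "default")
    del rm  # simulate the RM going down
    restarted = ToyResourceManager(FileStateStore(path))
    return restarted.apps


if __name__ == "__main__":
    print(demo())
```

The restarted instance sees the application only because its state was written to the store before the failure, which is the groundwork that Phase 2 builds on.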