YARN: Resource Scheduler
YARN is a resource-scheduling platform that provides server compute resources to computation frameworks. It acts like a distributed operating system, on which frameworks such as MapReduce run as applications.
Basic Architecture of YARN
YARN consists mainly of the ResourceManager, NodeManager, ApplicationMaster, and Container.
How YARN Works
1. A client submits an application to YARN.
2. The ResourceManager allocates a Container for it and contacts the corresponding NodeManager, which starts the ApplicationMaster inside that Container.
3. The ApplicationMaster registers with the ResourceManager, then repeats steps 4-7 until the job completes.
4. The ApplicationMaster polls the ResourceManager over RPC to request and receive resources.
5. Once resources are granted, the ApplicationMaster contacts the corresponding NodeManagers and asks them to launch tasks.
6. After setting up the runtime environment, each NodeManager launches its task via a script.
7. Tasks report their progress to the ApplicationMaster over RPC.
8. When all tasks are complete, the ApplicationMaster asks the ResourceManager to unregister it.
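The polling loop in steps 4-7 can be sketched as a toy simulation. This is pure Python for illustration only; all class and method names here are invented and do not correspond to the real YARN RPC interfaces:

```python
# Toy model of the ApplicationMaster's allocate loop (steps 4-7).
# Invented names; the real protocol is YARN's ApplicationMasterProtocol.

class ResourceManager:
    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, requested):
        """AM polls this each heartbeat; the RM grants what it can (step 4)."""
        granted = min(requested, self.free)
        self.free -= granted
        return granted

class ApplicationMaster:
    def __init__(self, rm, needed):
        self.rm = rm
        self.needed = needed

    def run(self):
        launched = 0
        # Step 4: keep polling the RM until all containers are granted.
        while launched < self.needed:
            granted = self.rm.allocate(self.needed - launched)
            # Steps 5-6: for each granted container, a NodeManager would
            # set up the environment and launch the task via a script.
            launched += granted
            if granted == 0:
                break  # toy model: capacity is static, so stop polling
        # Step 8: all tasks done; the AM would now unregister from the RM.
        return launched

rm = ResourceManager(total_containers=10)
am = ApplicationMaster(rm, needed=4)
print(am.run())  # 4
```

In the real system the AM's heartbeat both requests new containers and receives task progress (step 7); the toy model collapses that into a single call.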
Resource Schedulers
Task Speculative Execution
1. A job finishes only when its slowest task finishes.
A job consists of a number of Map tasks and Reduce tasks. Because of aging hardware, software bugs, and so on, some tasks may run extremely slowly.
Consider: 99% of the Map tasks have finished, but a few stragglers make barely any progress and never seem to complete. What can be done?
2. The speculative execution mechanism
Detect lagging tasks, e.g. a task running far slower than the average task speed. For each straggler, launch a backup task and run both copies in parallel; whichever copy finishes first has its result used.
Prerequisites:
1. Each task can have at most one backup task.
2. At least 5% of the current job's tasks must already be complete.
3. The speculative-execution parameters must be enabled.
They are on by default in mapred-site.xml:
<property>
  <name>mapreduce.map.speculative</name>
  <value>true</value>
  <description>If true, then multiple instances of some map tasks may be executed in parallel.</description>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>true</value>
  <description>If true, then multiple instances of some reduce tasks may be executed in parallel.</description>
</property>
Cases where speculative execution should NOT be enabled:
1. The load is heavily skewed across tasks.
2. Tasks with side effects, for example tasks that write to a database (a backup copy would duplicate the writes).
Algorithm:
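The selection heuristic can be sketched as follows. This is a simplified sketch, not Hadoop's actual implementation: for each running task, extrapolate its end time from current progress; a backup started now is assumed to finish after the average run time of similar tasks; speculate on the task whose backup saves the most time. Function names and the input format here are invented for illustration:

```python
def estimated_end_time(start_time, progress, now):
    """Extrapolate a running task's end time from its current progress.

    (elapsed / progress) is the projected total run time of the task.
    """
    return (now - start_time) / progress + start_time

def pick_speculation_candidate(tasks, avg_run_time, now):
    """Return the id of the task whose backup saves the most time, or None.

    tasks maps task id -> (start_time, progress in (0, 1]).
    A backup launched now is assumed to finish at now + avg_run_time.
    """
    backup_end = now + avg_run_time
    best, best_saving = None, 0.0
    for task_id, (start, progress) in tasks.items():
        saving = estimated_end_time(start, progress, now) - backup_end
        if saving > best_saving:  # only speculate if the backup wins
            best, best_saving = task_id, saving
    return best

# Example: "m2" is a straggler, only 10% done after 90 seconds,
# while "m1" is 90% done; the average task takes about 100 seconds.
now = 100.0
tasks = {"m1": (10.0, 0.9), "m2": (10.0, 0.1)}
print(pick_speculation_candidate(tasks, avg_run_time=100.0, now=now))  # m2
```

Note how the heuristic naturally respects the "at most one backup" rule if speculated tasks are removed from the candidate set, and why it needs some completed tasks first: avg_run_time is meaningless before any task has finished.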
YARN HA Configuration (after HDFS HA is set up)
Official documentation:
http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
How It Works
Configuration
[yyx@hadoop01 hadoop-2.7.2]$ vim etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Declare the addresses of the two ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop02</value>
  </property>
  <!-- Specify the ZooKeeper quorum addresses -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
  <!-- Enable automatic recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Store the ResourceManager state in the ZooKeeper cluster -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
</configuration>
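As a quick sanity check before distributing the file, the HA-related keys can be validated with a short script. This is a sketch: the helper name is invented, and it only checks the keys used in the config above:

```python
import xml.etree.ElementTree as ET

# HA keys this setup requires, with their expected values.
REQUIRED = {
    "yarn.resourcemanager.ha.enabled": "true",
    "yarn.resourcemanager.recovery.enabled": "true",
    "yarn.resourcemanager.store.class":
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
}

def check_yarn_site(xml_text):
    """Parse yarn-site.xml text; return (properties, missing-or-wrong keys)."""
    props = {
        p.findtext("name"): p.findtext("value")
        for p in ET.fromstring(xml_text).iter("property")
    }
    missing = {k: v for k, v in REQUIRED.items() if props.get(k) != v}
    # Every declared RM id must have a matching hostname property.
    for rm in props.get("yarn.resourcemanager.ha.rm-ids", "").split(","):
        if f"yarn.resourcemanager.hostname.{rm}" not in props:
            missing[f"yarn.resourcemanager.hostname.{rm}"] = "<hostname>"
    return props, missing

# Minimal inline sample mirroring the config above (normally you would
# read etc/hadoop/yarn-site.xml from disk instead).
sample = """<configuration>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>hadoop01</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>hadoop02</value></property>
  <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value></property>
</configuration>"""
props, missing = check_yarn_site(sample)
print(missing)  # {} -- all required HA keys are present
```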
Distribute the file to the other two machines.
Start ZooKeeper on all three nodes.
Start the cluster:
[yyx@hadoop01 hadoop-2.7.2]$ sbin/start-dfs.sh
Starting namenodes on [hadoop01 hadoop02]
hadoop01: starting namenode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-namenode-hadoop01.out
hadoop02: starting namenode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-namenode-hadoop02.out
hadoop01: starting datanode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-datanode-hadoop01.out
hadoop03: starting datanode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-datanode-hadoop03.out
hadoop02: starting datanode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-datanode-hadoop02.out
Starting journal nodes [hadoop01 hadoop02 hadoop03]
hadoop01: starting journalnode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-journalnode-hadoop01.out
hadoop02: starting journalnode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-journalnode-hadoop02.out
hadoop03: starting journalnode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-journalnode-hadoop03.out
Starting ZK Failover Controllers on NN hosts [hadoop01 hadoop02]
hadoop01: starting zkfc, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-zkfc-hadoop01.out
hadoop02: starting zkfc, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-zkfc-hadoop02.out
Start YARN
Even with YARN HA configured, start-yarn.sh only starts the ResourceManager on the local node; the second ResourceManager must be started manually:
[yyx@hadoop01 hadoop-2.7.2]$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/module/ha/hadoop-2.7.2/logs/yarn-yyx-resourcemanager-hadoop01.out
hadoop01: starting nodemanager, logging to /opt/module/ha/hadoop-2.7.2/logs/yarn-yyx-nodemanager-hadoop01.out
hadoop02: starting nodemanager, logging to /opt/module/ha/hadoop-2.7.2/logs/yarn-yyx-nodemanager-hadoop02.out
hadoop03: starting nodemanager, logging to /opt/module/ha/hadoop-2.7.2/logs/yarn-yyx-nodemanager-hadoop03.out
[yyx@hadoop02 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager
Verification:
(Method 1)
Browsing to hadoop02's ResourceManager web UI automatically redirects to hadoop01, the active ResourceManager.
(Method 2)
[yyx@hadoop02 hadoop-2.7.2]$ bin/yarn rmadmin -getServiceState rm1
active
[yyx@hadoop02 hadoop-2.7.2]$ bin/yarn rmadmin -getServiceState rm2
standby
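The active/standby states shown above come from leader election through ZooKeeper: whichever RM wins election is active, and if it fails, the standby wins re-election and takes over. A toy model of this behavior (pure Python, invented names, not the real ZooKeeper-based elector):

```python
class RMCluster:
    """Toy model of ZooKeeper-style active/standby election between RMs."""

    def __init__(self, rm_ids):
        self.rm_ids = list(rm_ids)
        self.alive = set(rm_ids)
        self.active = self.rm_ids[0]  # first RM to win the election

    def get_service_state(self, rm):
        """Analogous to: yarn rmadmin -getServiceState <rm-id>"""
        if rm not in self.alive:
            return "down"
        return "active" if rm == self.active else "standby"

    def kill(self, rm):
        """Simulate an RM failing; a surviving standby wins re-election."""
        self.alive.discard(rm)
        if rm == self.active:
            self.active = next((r for r in self.rm_ids if r in self.alive), None)

cluster = RMCluster(["rm1", "rm2"])
print(cluster.get_service_state("rm1"), cluster.get_service_state("rm2"))
# active standby
cluster.kill("rm1")
print(cluster.get_service_state("rm2"))  # active
```

To observe the real failover, kill the active ResourceManager process and re-run the rmadmin commands above: rm2 should report active.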