Yarn

Yarn: the resource scheduler

YARN is a resource scheduling platform responsible for providing server computing resources to computation programs. It acts like a distributed operating system platform, while computation frameworks such as MapReduce are the applications running on top of it.

Basic architecture of Yarn

YARN consists mainly of the ResourceManager, NodeManager, ApplicationMaster, and Container components.
(Figure: YARN architecture)

How Yarn works

(Figure: Yarn job submission workflow)
1.The client submits an application to Yarn.
2.The ResourceManager allocates a Container for it and communicates with the corresponding NodeManager, asking it to launch the ApplicationMaster inside that Container.
3.The ApplicationMaster registers with the ResourceManager; steps 4-7 then repeat until the application finishes.
4.The ApplicationMaster polls the ResourceManager over RPC to request and receive resources.
5.Once resources are granted, the ApplicationMaster communicates with the corresponding NodeManagers, asking them to launch tasks.
6.After setting up the environment, the NodeManager launches each task via a script.
7.Each task reports its progress and status back to the ApplicationMaster over RPC.
8.When the application completes, the ApplicationMaster deregisters from the ResourceManager and shuts down.
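As a toy illustration of the request/allocate loop in steps 4-7 (plain Python, not the Hadoop API; every name below is invented for the sketch):

```python
import random

def allocate(pending):
    # Stand-in for the ResourceManager's allocate() heartbeat:
    # grant anywhere from 0 to all of the requested containers.
    return random.randint(0, pending)

def application_master(total_tasks, seed=42):
    # Toy ApplicationMaster: keep polling (step 4) until every task
    # has been given a container and has finished (steps 5-7).
    random.seed(seed)
    pending, finished, rounds = total_tasks, 0, 0
    while finished < total_tasks:
        granted = allocate(pending)   # step 4: poll the RM for resources
        pending -= granted
        finished += granted           # steps 5-7: launch tasks, collect results
        rounds += 1
    return rounds                     # step 8: deregister once all tasks finish
```

In the real system each poll is an `ApplicationMasterProtocol.allocate()` heartbeat and task completion is asynchronous; the sketch only shows the control flow.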

Resource schedulers

(Figures: Yarn scheduler types)

Speculative task execution

1.A job's completion time is determined by its slowest task.
A job consists of a number of Map tasks and Reduce tasks. Because of hardware aging, software bugs, and so on, some tasks may run extremely slowly.
Question: 99% of the Map tasks have finished, but a handful of Map stragglers make very little progress and never complete. What can be done?
2.The speculative execution mechanism
Detect straggler tasks, i.e. tasks running far slower than the average task. Launch a backup task for each straggler and run the two in parallel; whichever finishes first wins, and its output is used.
Preconditions:
1.Each task may have at most one backup task.
2.The fraction of the job's tasks already completed must be at least 0.05 (5%).
3.Configuration parameters for speculative execution
They are enabled by default in mapred-site.xml:

<property>
    <name>mapreduce.map.speculative</name>
    <value>true</value>
    <description>If true, then multiple instances of some map tasks may be executed in parallel.</description>
</property>

<property>
    <name>mapreduce.reduce.speculative</name>
    <value>true</value>
    <description>If true, then multiple instances of some reduce tasks may be executed in parallel.</description>
</property>

Cases where speculative execution should not be enabled:
1.Severe load skew between tasks.
2.Tasks with side effects, for example tasks that write to a database.
Algorithm:
(Figure: speculative execution time estimates)
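The standard estimate works like this: extrapolate a running task's finish time from its progress, compare it with the finish time a backup task started now would have (assumed to be the average task run time), and speculate on the task with the largest expected saving. A Python sketch (the function names are invented for illustration, not Hadoop's):

```python
def estimated_end_time(start_time, now, progress):
    # Extrapolate a running task's finish time from its current
    # progress (progress is a fraction in (0, 1]).
    return start_time + (now - start_time) / progress

def backup_end_time(now, average_run_time):
    # A backup task launched now is assumed to take the average run time.
    return now + average_run_time

def pick_speculative_task(tasks, now, average_run_time):
    # tasks: iterable of (task_id, start_time, progress).
    # Speculate on the task with the largest positive expected saving,
    # or return None if no backup would help.
    best_id, best_saving = None, 0.0
    for task_id, start_time, progress in tasks:
        saving = (estimated_end_time(start_time, now, progress)
                  - backup_end_time(now, average_run_time))
        if saving > best_saving:
            best_id, best_saving = task_id, saving
    return best_id
```

For example, at `now = 100` with an average run time of 20, a task started at 0 that is only 50% done is extrapolated to finish at 200, while a backup would finish at 120, so it is worth speculating on.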

Yarn HA configuration (after setting up HDFS HA)

Official documentation:
http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

How it works

(Figure: Yarn HA working mechanism)

Configuration

[yyx@hadoop01 hadoop-2.7.2]$ vim etc/hadoop/yarn-site.xml 

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- Declare the addresses of the two ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster-yarn1</value>
    </property>

    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop01</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop02</value>
    </property>

    <!-- Specify the ZooKeeper quorum addresses -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>

    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <!-- Store the ResourceManager's state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>

</configuration>
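Two further properties are often added inside the same `<configuration>` block to pin each ResourceManager's web UI address explicitly (a fragment assuming the same rm1/rm2 hosts as above; 8088 is the Hadoop 2.7.x default port, so this is optional if the defaults suffice):

```xml
<!-- Optional: explicit web UI address for each ResourceManager -->
<property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop01:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop02:8088</value>
</property>
```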

Distribute the file to the other two virtual machines.
Start ZooKeeper on all three nodes.
Start the new cluster:

[yyx@hadoop01 hadoop-2.7.2]$ sbin/start-dfs.sh
Starting namenodes on [hadoop01 hadoop02]
hadoop01: starting namenode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-namenode-hadoop01.out
hadoop02: starting namenode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-namenode-hadoop02.out
hadoop01: starting datanode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-datanode-hadoop01.out
hadoop03: starting datanode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-datanode-hadoop03.out
hadoop02: starting datanode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-datanode-hadoop02.out
Starting journal nodes [hadoop01 hadoop02 hadoop03]
hadoop01: starting journalnode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-journalnode-hadoop01.out
hadoop02: starting journalnode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-journalnode-hadoop02.out
hadoop03: starting journalnode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-journalnode-hadoop03.out
Starting ZK Failover Controllers on NN hosts [hadoop01 hadoop02]
hadoop01: starting zkfc, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-zkfc-hadoop01.out
hadoop02: starting zkfc, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-yyx-zkfc-hadoop02.out

Start Yarn
Although Yarn HA is configured, start-yarn.sh only starts the ResourceManager on the local node, so the second ResourceManager must be started manually on hadoop02:

[yyx@hadoop01 hadoop-2.7.2]$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/module/ha/hadoop-2.7.2/logs/yarn-yyx-resourcemanager-hadoop01.out
hadoop01: starting nodemanager, logging to /opt/module/ha/hadoop-2.7.2/logs/yarn-yyx-nodemanager-hadoop01.out
hadoop02: starting nodemanager, logging to /opt/module/ha/hadoop-2.7.2/logs/yarn-yyx-nodemanager-hadoop02.out
hadoop03: starting nodemanager, logging to /opt/module/ha/hadoop-2.7.2/logs/yarn-yyx-nodemanager-hadoop03.out
[yyx@hadoop02 hadoop-2.7.2]$ sbin/yarn-daemon.sh start resourcemanager

Verification:
(Method 1)
(Figure: Yarn web UI served from hadoop01)
If you browse to hadoop02 instead, the page automatically redirects to hadoop01, the active ResourceManager.
(Method 2)

[yyx@hadoop02 hadoop-2.7.2]$ bin/yarn rmadmin -getServiceState rm1
active
[yyx@hadoop02 hadoop-2.7.2]$ bin/yarn rmadmin -getServiceState rm2
standby