The YARN Resource Scheduling Framework

Overview of YARN
YARN is short for Yet Another Resource Negotiator.
It is a general-purpose resource management framework
that provides unified resource management and scheduling for the applications built on top of it.

The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job or a DAG of jobs.

The ResourceManager and the NodeManager form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The NodeManager is the per-machine framework agent who is responsible for containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the ResourceManager/Scheduler.

The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.

YARN Architecture in Detail
The four main roles: the Client, the ResourceManager (global resource management), the NodeManager (per-node management), and the ApplicationMaster (one per application).

Client: submits applications to the RM, kills applications, and so on.
ApplicationMaster (a minimal sketch of this interaction follows the list below):
one AM per application;
the AM requests resources from the RM to launch the corresponding tasks on the NMs;
splits the input data;
requests a resource (a container) from the RM for each task;
communicates with the NodeManagers;
monitors the tasks.

NodeManager: does the actual work;
sends heartbeat messages to the RM,
reporting the execution status of its tasks;
receives requests from the RM to launch tasks;
handles tasks coming from the AM.

ResourceManager: only one RM serves the cluster at any given moment; it is responsible for resource-related startup and monitoring.

Container: the abstraction in which tasks run; it bundles resources such as memory and CPU, and every task runs inside a container.
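
To make the AM-RM interaction above concrete, here is a minimal, hedged sketch using Hadoop's AMRMClient API. This code is not from the original project; the host name, container size, and priority are illustrative assumptions.

import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmSketch {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // 1. Register this AM with the RM (host/port/tracking URL are placeholders).
        rmClient.registerApplicationMaster("am-host", 0, "");

        // 2. Ask the RM for one container: 1024 MB of memory and 1 vcore.
        Resource capability = Resource.newInstance(1024, 1);
        rmClient.addContainerRequest(
                new ContainerRequest(capability, null, null, Priority.newInstance(0)));

        // 3. Heartbeat: the RM's reply carries any containers it has granted;
        //    a real AM would then launch its tasks in them via NMClient.
        rmClient.allocate(0.0f);

        // 4. Unregister once the application is done.
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
        rmClient.stop();
    }
}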
The ResourceManager has two main components: Scheduler and ApplicationsManager.

The Scheduler is responsible for allocating resources to the various running applications, subject to familiar constraints of capacities, queues etc.

The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application. Also, it offers no guarantees about restarting failed tasks, whether they fail due to application errors or hardware failures. The Scheduler performs its scheduling function based on the resource requirements of the applications; it does so based on the abstract notion of a resource Container, which incorporates elements such as memory, CPU, disk, and network.

The Scheduler has a pluggable policy that is responsible for partitioning the cluster resources among the various queues, applications, etc. The current schedulers, such as the CapacityScheduler and the FairScheduler, are examples of such plug-ins.
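
As an illustration (not part of the original post), the scheduler plug-in is selected in yarn-site.xml via the standard yarn.resourcemanager.scheduler.class property; for example, to use the FairScheduler:

<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>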
The ApplicationsManager is responsible for accepting job-submissions, negotiating the first container for executing the application specific ApplicationMaster and provides the service for restarting the ApplicationMaster container on failure. The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.
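
As a hedged, minimal sketch of the submission handshake that the ApplicationsManager accepts (this uses the standard YarnClient API; the application name is a placeholder, and the omitted launch details are flagged in comments):

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the ApplicationsManager for a new application id.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("demo-app");
        // A real client must also set the AM's ContainerLaunchContext and
        // resource capability on ctx before submitting.
        yarnClient.submitApplication(ctx);
    }
}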

MapReduce in hadoop-2.x maintains API compatibility with previous stable release (hadoop-1.x). This means that all MapReduce jobs should still run unchanged on top of YARN with just a recompile.

YARN supports the notion of resource reservation via the ReservationSystem, a component that allows users to specify a profile of resources over time and temporal constraints (e.g., deadlines), and to reserve resources to ensure the predictable execution of important jobs. The ReservationSystem tracks resources over time, performs admission control for reservations, and dynamically instructs the underlying scheduler to ensure that the reservation is fulfilled.
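
As an aside not covered by the original post: in stock Apache releases the ReservationSystem is off by default and is toggled in yarn-site.xml; treat the property name below as an assumption to verify against the yarn-default.xml of your Hadoop version:

<property>
    <name>yarn.resourcemanager.reservation-system.enable</name>
    <value>true</value>
</property>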

In order to scale YARN beyond a few thousand nodes, YARN supports the notion of Federation via the YARN Federation feature. Federation makes it possible to transparently wire together multiple YARN (sub-)clusters and make them appear as a single massive cluster. This can be used to achieve larger scale, and/or to allow multiple independent clusters to be used together for very large jobs, or for tenants who have capacity across all of them.
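
Again as an aside (Federation ships with Hadoop 2.9+/3.x, not with the 2.6 build used below), the feature is gated by a yarn-site.xml switch; check your version's documentation before relying on it:

<property>
    <name>yarn.federation.enabled</name>
    <value>true</value>
</property>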

YARN Execution Flow
1. The Client submits a job to the RM.
2. The RM asks one of the NMs to launch a container.
3. The AM (and, later, the Map and Reduce tasks) runs inside that container.
4. The AM registers with the RM, so the client can query the job's progress directly.
5. The AM asks two other nodes to launch containers.
6. Those two nodes run the tasks inside their containers.
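
Once a job is running, its status can also be checked from the command line with the standard YARN CLI (the application id below is illustrative, and the last command requires log aggregation to be enabled):

yarn application -list
yarn application -status application_1234567890123_0001
yarn logs -applicationId application_1234567890123_0001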

Single-Node YARN Deployment
Go to the Hadoop configuration directory /home/hadoop/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop and configure two files:
mapred-site.xml and yarn-site.xml.

Add the following block to mapred-site.xml:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

Add the following block to yarn-site.xml:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

Then go to the sbin/ directory and run start-yarn.sh to start YARN; stop-yarn.sh shuts it down.
Once the new processes are up, you can open http://ip:8088/ to view the YARN web UI.
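
Roughly, jps should now list the two YARN daemons in addition to whatever HDFS daemons are already running (the PIDs here are illustrative):

$ jps
2864 ResourceManager
2953 NodeManager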
Submitting the bundled example to YARN
Run the example jar that ships with Hadoop from this directory:
/home/hadoop/app/hadoop-2.6.0-cdh5.15.1/share/hadoop/mapreduce
as follows (pi's two arguments are the number of map tasks and the number of samples per map):
hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.15.1.jar pi 2 3
If the job fails with an error at this point,
add the following to yarn-site.xml and retry:

<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/hadoop/app/tmp/nm-local-dir</value>
</property>

Submitting the traffic-statistics job to YARN
1. First, change the local traffic project's AccessLocalApp so that the input and output paths can be passed in as arguments when submitting to YARN (a hedged sketch of the change follows).
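A minimal sketch of what the modified driver might look like; the mapper/reducer wiring is project-specific and is therefore only hinted at in comments. The point of the change is reading args[0]/args[1] instead of hard-coded paths:

package com.imooc.bigdata.hadoop.mr.access;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AccessLocalApp {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(AccessLocalApp.class);

        // The project's own mapper/reducer and key/value classes go here:
        // job.setMapperClass(...); job.setReducerClass(...);
        // job.setMapOutputKeyClass(...); job.setOutputValueClass(...);

        // The key change: read the paths from the command line instead of
        // hard-coding them, so they can be supplied at YARN submit time.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}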
2. Go to the project's root directory and package it. The packaging command is: mvn clean package -DskipTests
3. Upload the freshly built jar and the traffic data file to the server; the data file must also be uploaded to HDFS.
sudo scp /Users/zhaoxinbo/IdeaProjects/untitled/hadooptrainv2/hadooptrainv2/target/hadoop-train-v2-1.0.jar hadoop@192.168.1.200:~/lib/

sudo scp /Users/zhaoxinbo/IdeaProjects/untitled/hadooptrainv2/hadooptrainv2/input/access.log hadoop@192.168.1.200:~/data/

Create a new HDFS directory access/input:
hadoop fs -mkdir /access/input
Upload the access.log file into that directory:
hadoop fs -put access.log /access/input/

4. Submit to YARN for execution. The general form is:
hadoop jar <packaged jar> <package name plus main class> <arguments>
hadoop jar ~/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.access.AccessLocalApp /access/input/access.log /access/output
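
When the job finishes, the results can be inspected with the usual HDFS commands (the part file name below assumes a single reducer):

hadoop fs -ls /access/output
hadoop fs -cat /access/output/part-r-00000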
