[Hadoop] Study Notes (Part 8)

擅长开发Bug的Mr.NaCl

Article link: https://blog.csdn.net/lushixuan12345/article/details/126696488

4. MapReduce

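4.1 Yarn Overview

4.1.1 Yarn

Yarn is a resource scheduling platform that provides server compute resources to computing programs. It acts like a distributed operating system, while computing programs such as MapReduce act like applications running on top of that operating system.

Yarn stands for Yet Another Resource Negotiator.

Yarn is a general-purpose resource management system and scheduling platform that gives upper-layer applications unified resource management and scheduling. Its introduction greatly improved cluster utilization, unified resource management, and data sharing.

Although Yarn is part of Hadoop, it can run more than MapReduce: Tez, HBase, Spark, Flink, and in principle any kind of computing program. Yarn does not care what the program does; it only manages resources (memory and CPU).

4.1.2 Yarn Architecture

Yarn consists of components such as ResourceManager, NodeManager, ApplicationMaster, and Container.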

4.2 How Yarn Works

4.2.1 Yarn Workflow

(Figure: Yarn workflow diagram; image not available.)

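4.2.2 Schedulers and Scheduling Algorithms

The scheduler (Resource Scheduler) is a component inside the ResourceManager. When multiple jobs are submitted, the scheduler decides how they are scheduled.

Ideally, every request an application makes would be granted by Yarn immediately. In practice, resources are limited and the cluster is often busy, so the Yarn scheduler's job is to allocate resources to applications according to defined policies.

In Yarn, the Scheduler is responsible for allocating resources to applications, and it is one of the core components of the ResourceManager.
The Scheduler is dedicated purely to scheduling; it does not track application status.

There is generally no single "best" scheduling policy; the right choice depends on the actual workload, so Yarn provides several schedulers and configurable policies.

Hadoop has three main job schedulers: the FIFO Scheduler (first in, first out), the Capacity Scheduler, and the Fair Scheduler.

The default resource scheduler in Apache Hadoop 3.x is the Capacity Scheduler. Yarn in CDH uses the Fair Scheduler by default.

The scheduler is configured in yarn-site.xml: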

<property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <!-- The Capacity Scheduler is used by default -->
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

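4.2.2.1 FIFO Scheduler

The FIFO (First In, First Out) scheduler uses a single queue and serves jobs in the order they were submitted. Scheduling ignores priority and scope, which suits small, lightly loaded clusters. On a large shared cluster it is inefficient and causes problems.

The FIFO Scheduler has one global queue, named default by default. The scheduler takes all resource information of the current cluster and applies it to this global queue.

Advantages: no configuration needed, first come first served, simple to operate.

Disadvantages: a job's priority never rises, so high-priority jobs also have to wait; it is not suitable for shared clusters.

FIFO is the original scheduler implementation of the JobTracker in Hadoop 1.x, and it was kept in Yarn. Big-data workloads in production are highly concurrent, so this scheduler is rarely used.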
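
The first-come, first-served behaviour described above can be illustrated with a small Python sketch. It is a toy model, not Yarn's implementation: the function name `fifo_schedule`, the job names, and the memory figures are all hypothetical, and the model assumes a job starts only when the job ahead of it has started.

```python
from collections import deque

def fifo_schedule(jobs, total_mb):
    """Start jobs strictly in submission order (toy model of a single FIFO queue).

    jobs: list of (name, needed_mb) in submission order.
    Returns the names of the jobs that can start right now.
    """
    queue = deque(jobs)
    free_mb = total_mb
    started = []
    # The job at the head of the queue blocks everything behind it
    # when it does not fit into the remaining memory.
    while queue and queue[0][1] <= free_mb:
        name, needed_mb = queue.popleft()
        free_mb -= needed_mb
        started.append(name)
    return started

# "tiny" would fit, but it has to wait behind "huge".
print(fifo_schedule([("big", 6144), ("huge", 4096), ("tiny", 512)], 8192))
```

In this model a small job queued behind a large one waits even when enough memory is free for it, which is the drawback the text attributes to FIFO on shared clusters.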

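4.2.2.2 Capacity Scheduler

The Capacity Scheduler is a multi-user scheduler developed by Yahoo, and it is the default scheduling policy in Apache Hadoop 3.x.

It allows multiple organizations to share the whole cluster, with each organization getting a portion of the cluster's computing capacity. Each organization is given a dedicated queue, and each queue is assigned a share of cluster resources, so the cluster can serve multiple organizations through multiple queues.

"Capacity" can be understood as a set of resource queues that the user allocates. A queue can be subdivided further, so multiple members of an organization can share that queue's resources. Within a queue, resources are scheduled with a first-in, first-out (FIFO) policy.

The Capacity Scheduler divides resources by queue. Put simply, each queue has its own resources, and both the queue structure and the resources are configurable.

Advantages:

Hierarchical Queues: hierarchical management makes it easier to allocate and limit resource usage sensibly
Capacity Guarantees: each queue can be assigned a percentage of resources, so no single queue takes over the whole cluster
Security: each queue has strict access control. Users can submit jobs only to their own queues and cannot modify or access jobs in other queues
Elasticity: idle resources can be allocated to any queue; when several queues compete, resources are balanced by weight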

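Features:

Multiple queues: each queue can be configured with a certain amount of resources, and each queue uses a FIFO scheduling policy.
Although each queue uses FIFO, if the first job needs only a small share of the resources, the second job can run at the same time. For example, if queueA has 8 GB of memory and job11 uses only 2 GB, job12 can run concurrently
Capacity guarantee: administrators can set a guaranteed minimum and an upper limit of resources for each queue
Flexibility: if a queue has spare resources, they can be lent temporarily to other queues that need them; once a new application is submitted to the lending queue, the borrowed resources are returned to it
Multi-tenancy: supports multiple users sharing the cluster and multiple applications running at the same time. To stop one user's jobs from monopolizing a queue, the scheduler limits the amount of resources used by jobs from the same user.

4.2.2.3 Fair Scheduler

The Fair Scheduler is a multi-user scheduler developed by Facebook. It offers another way for Yarn applications to share the resources of a large cluster fairly, so that over time all applications receive, on average, an equal share of resources.

The Fair Scheduler's design goal is to allocate resources fairly to all applications (what counts as fair is defined through parameters).

Fair scheduling can work across multiple queues and allows resource sharing and preemption.

Advantages:

Hierarchical queues: queues can be arranged in a hierarchy to divide resources, and weights can be configured to share the cluster in specific proportions
User- or group-based queue mapping: queues can be assigned according to the user name or group of the submitter. If a job specifies a queue, it is submitted to that queue
Resource preemption: depending on the application's configuration, preempting and allocating resources can be graceful or forced. Preemption is disabled by default
Guaranteed minimum share: a minimum resource amount can be set per queue, so the guaranteed minimum share is allocated to the queue and users can always start jobs. When a queue cannot reach its minimum, it can preempt from other queues; when a queue does not use all of its resources, other queues can use them. This is useful for making sure certain users, groups, or production applications get enough resources
Resource sharing: when an application is running and other queues have no jobs, it can use those queues' resources. When applications in the other queues need resources, the borrowed resources are released. All applications are allocated resources from the resource queues
No default limit on how many applications each queue and user can run at the same time: limits on concurrently running applications per queue and per user can be configured. Such a limit does not cause submissions to fail; applications beyond the limit wait in the queue.

With the Fair Scheduler's default configuration, if a queue has 4 jobs running and a 5th job arrives, the 5 jobs split the queue's resources evenly.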

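Similarities with the Capacity Scheduler:

Multiple queues: supports multiple queues and multiple jobs
Capacity guarantee: administrators can set a guaranteed minimum and an upper limit of resources for each queue
Flexibility: if a queue has spare resources, they can be lent temporarily to queues that need them; once a new application is submitted to the lending queue, the borrowed resources are returned to it
Multi-tenancy: supports multiple users sharing the cluster and multiple applications running at the same time; to stop one user's jobs from monopolizing a queue, the scheduler limits the amount of resources used by jobs from the same user

The Fair Scheduler and the Capacity Scheduler differ in their core scheduling policy:

Capacity Scheduler: prefers the queue with the lowest resource utilization
Fair Scheduler: prefers the one with the largest resource deficit ratio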
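
The two selection rules can be contrasted in a short Python sketch. This is only an illustration under assumptions: utilization is taken as used / guaranteed resources, the deficit ratio as (fair share - used) / fair share, and the function names and numbers are hypothetical rather than anything from the Yarn code base.

```python
def pick_capacity_queue(queues):
    """Capacity-style choice: the queue with the lowest utilization.

    queues: {name: (used_mb, guaranteed_mb)}
    """
    return min(queues, key=lambda q: queues[q][0] / queues[q][1])

def pick_fair_target(entries):
    """Fair-style choice: the entry with the largest deficit ratio.

    entries: {name: (used_mb, fair_share_mb)}
    """
    return max(entries, key=lambda e: (entries[e][1] - entries[e][0]) / entries[e][1])

# default is 50% utilized, hive only 12.5%, so hive is served first.
print(pick_capacity_queue({"default": (2048, 4096), "hive": (1024, 8192)}))
# job5 holds far less than its fair share, so it is served first.
print(pick_fair_target({"job1": (3000, 2000), "job5": (200, 2000)}))
```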

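They also differ in the resource allocation policies that can be set per queue:

Capacity Scheduler: FIFO, DRF
Fair Scheduler: FIFO, FAIR, DRF

Why a deficit arises:

Suppose a queue has 4 jobs running and a 5th job arrives. Ideally, with the Fair Scheduler, the 5th job would immediately share the queue's resources equally with the other 4.

In practice, job 5 first receives only a small amount of resources and has to wait for the other jobs to release some. The difference between the resources the 5th job should receive and what it actually has is the deficit.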

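The Fair Scheduler's design goal is for all jobs to receive a fair share of resources over time. The scheduler allocates resources first to the jobs with the largest deficit.

Resource allocation policies of the Fair Scheduler:

FIFO policy: if a queue in the Fair Scheduler uses FIFO as its allocation policy, it behaves like the Capacity Scheduler
Fair policy (default)
DRF policy

Fair policy:

The Fair policy multiplexes resources based on the max-min fairness algorithm, and by default each queue uses it internally. This means that if two applications run in a queue at the same time, each gets 1/2 of the resources; if three run at the same time, each gets 1/3.
The allocation flow is the same as in the Capacity Scheduler: select a queue, then a job, then a container. Each of these three steps allocates resources according to the fair policy.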
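
Max-min fairness itself can be sketched in a few lines of Python. This is a generic textbook version (repeatedly split what is left equally among applications that still want more), not the Fair Scheduler's code; `max_min_fair`, the capacity, and the demands are hypothetical.

```python
def max_min_fair(capacity, demands):
    """Split capacity so no app gets more than it asks for, and whatever is
    left over is shared equally among the apps that still want more."""
    alloc = {name: 0.0 for name in demands}
    unsatisfied = set(demands)
    remaining = float(capacity)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        for name in sorted(unsatisfied):
            give = min(share, demands[name] - alloc[name])
            alloc[name] += give
            remaining -= give
        # Apps whose demand is met drop out of the next round.
        unsatisfied = {n for n in unsatisfied if demands[n] - alloc[n] > 1e-9}
    return alloc

# Three apps that all want more than is available each get 1/3.
print(max_min_fair(12, {"a": 100, "b": 100, "c": 100}))
# An app with a small demand is fully served; the rest is split evenly.
print(max_min_fair(12, {"a": 2, "b": 8, "c": 8}))
```

When every application wants more than its equal share, the result is the 1/2 or 1/3 split mentioned above; the second call shows how unused share from a small application is redistributed.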

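Actual minimum share: minShare = Min(resource demand, configured minimum resources)

Whether starved: isNeedy = resource usage < minShare

Resource allocation ratio: minShareRatio = resource usage / Max(minShare, 1)

Resource usage to weight ratio: useToWeightRatio = resource usage / weight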
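
A minimal Python sketch of how these four quantities could be combined into an ordering follows. The ordering rule used here is an assumption on my part (starved entries first, ordered by minShareRatio; the others ordered by useToWeightRatio); the class `Schedulable`, the helper names, and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Schedulable:
    name: str
    usage: float           # resources currently in use
    demand: float          # resources requested
    configured_min: float  # configured minimum resources
    weight: float = 1.0

def sort_key(s):
    min_share = min(s.demand, s.configured_min)
    is_needy = s.usage < min_share
    min_share_ratio = s.usage / max(min_share, 1)
    use_to_weight_ratio = s.usage / s.weight
    # Assumed rule: starved entries come first, ordered by minShareRatio;
    # everyone else follows, ordered by useToWeightRatio.
    if is_needy:
        return (0, min_share_ratio, s.name)
    return (1, use_to_weight_ratio, s.name)

def allocation_order(schedulables):
    return [s.name for s in sorted(schedulables, key=sort_key)]

apps = [
    Schedulable("a", usage=1024, demand=4096, configured_min=2048),
    Schedulable("b", usage=512, demand=4096, configured_min=2048),
    Schedulable("c", usage=3072, demand=4096, configured_min=2048),
    Schedulable("d", usage=4096, demand=8192, configured_min=2048, weight=2.0),
]
print(allocation_order(apps))
```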

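DRF policy (Dominant Resource Fairness):
Other schedulers allocate by a single criterion, for example memory only (the Yarn default). In many cases, though, allocation has to account for CPU, memory, network bandwidth, and other resources together, which makes it hard to decide the ratio of resources two applications should receive.
Suppose application A in the cluster needs 2 CPUs and 300 GB of memory, and application B needs 6 CPUs and 100 GB of memory. Measured against the cluster's total capacity, A is memory-dominant and B is CPU-dominant. In this situation the DRF policy can be used to limit each application by the share of its own dominant resource.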
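
Which resource dominates depends on the cluster totals, which the example does not give. The Python sketch below assumes a hypothetical cluster of 100 CPUs and 10240 GB of memory purely for illustration; `dominant_resource` is likewise a made-up helper name.

```python
# Hypothetical cluster totals; the text above does not state them.
CLUSTER = {"cpu": 100, "memory_gb": 10240}

def dominant_resource(request, cluster=CLUSTER):
    """Return (resource name, share) for the resource the request uses most,
    measured as a fraction of the cluster total."""
    shares = {res: amount / cluster[res] for res, amount in request.items()}
    name = max(shares, key=shares.get)
    return name, shares[name]

app_a = {"cpu": 2, "memory_gb": 300}   # 2% of CPU, about 2.9% of memory
app_b = {"cpu": 6, "memory_gb": 100}   # 6% of CPU, about 1% of memory
print(dominant_resource(app_a))
print(dominant_resource(app_b))
```

Under these assumed totals, A's dominant resource is memory and B's is CPU, and DRF would compare the two applications by those dominant shares.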

4.3 Common Yarn Commands

Besides the web page at http://hadoop103:8088 (ResourceManager), Yarn status can also be queried from the command line.

4.3.1 yarn application: Viewing Applications

List all applications:

# Same as the All Applications list shown on the web page
yarn application -list

Filter by application state:

# yarn application -list -appStates <ALL | NEW | NEW_SAVING | SUBMITTED | ACCEPTED | RUNNING | FINISHED | FAILED | KILLED>
yarn application -list -appStates FINISHED

Kill an application:

# yarn application -kill <application_id>
yarn application -kill application_1653269087987_0001

4.3.2 yarn logs: Viewing Logs

Query application logs:

# yarn logs -applicationId <application_id>
yarn logs -applicationId application_1653269087987_0001

Query container logs:

# yarn logs -applicationId <application_id> -containerId <container_id>
yarn logs -applicationId application_1653269087987_0001 -containerId container_1653269087987_0001_01_000001

4.3.3 yarn applicationattempt: Viewing Application Attempts

This shows the state of a job while it runs, including the job's AppAttemptID and the ContainerID of the AppMaster.

List all attempts of an application:

# yarn applicationattempt -list <application_id>
yarn applicationattempt -list application_1653269087987_0001

Print the status of an ApplicationAttempt:

# yarn applicationattempt -status <AppAttempt_id>
yarn applicationattempt -status appattempt_1653269087987_0001_000001

4.3.4 yarn container: Viewing Containers

Container state is visible only while the job is running; once the job has finished, its Containers can no longer be viewed.

List all Containers:

# yarn container -list <AppAttempt_id>

Print the status of a Container:

# yarn container -status <container_id>
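
The example IDs used in the commands above share a visible pattern: the same cluster timestamp and application sequence number appear in the application, attempt, and container IDs. A small Python sketch, assuming exactly the five-part container ID form shown in this article (IDs with an extra epoch segment are not handled), derives the other two IDs from a container ID. The function name `ids_from_container` is hypothetical.

```python
def ids_from_container(container_id):
    """Derive the application ID and attempt ID from a container ID of the
    form container_<clusterTimestamp>_<appSeq>_<attemptSeq>_<containerSeq>."""
    prefix, cluster_ts, app_seq, attempt_seq, _container_seq = container_id.split("_")
    if prefix != "container":
        raise ValueError(f"not a container ID: {container_id}")
    application_id = f"application_{cluster_ts}_{app_seq}"
    # Attempt numbers are zero-padded to six digits in the attempt ID.
    attempt_id = f"appattempt_{cluster_ts}_{app_seq}_{int(attempt_seq):06d}"
    return application_id, attempt_id

print(ids_from_container("container_1653269087987_0001_01_000001"))
```

This can save a lookup when only a container ID from a log line is at hand and the matching `yarn logs -applicationId` argument is needed.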

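4.3.5 yarn node: Viewing Node Status

yarn node -list -all

4.3.6 yarn rmadmin: Updating Configuration

Load queue configuration (re-reads the configuration file, so parameters can be reloaded without downtime):

yarn rmadmin -refreshQueues

4.3.7 yarn queue: Viewing Queues

Print queue information:

# yarn queue -status <queue_name>
# Both the Capacity Scheduler and the Fair Scheduler have a default queue whose information can be viewed. More queues can be added as needed
yarn queue -status default

The scheduler's queue status can also be viewed on the web page: Scheduler -> Application Queues -> the name of the queue to inspect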

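4.4 Core Yarn Configuration Parameters for Production

4.4.1 ResourceManager Settings

yarn.resourcemanager.scheduler.class: the scheduler to use. Apache Hadoop defaults to the Capacity Scheduler; CDH defaults to the Fair Scheduler
yarn.resourcemanager.scheduler.client.thread-count: number of threads the ResourceManager uses to handle scheduler requests, default 50

4.4.2 NodeManager Settings

CPU settings:

yarn.nodemanager.resource.detect-hardware-capabilities: whether Yarn detects the hardware and configures itself, default false
yarn.nodemanager.resource.count-logical-processors-as-cores: whether logical processors (hardware threads) count as CPU cores, default false
yarn.nodemanager.resource.pcores-vcores-multiplier: multiplier from physical cores to virtual cores, default 1. For example, with 4 cores and 8 threads this should be set to 2
yarn.nodemanager.resource.cpu-vcores: number of CPU cores available to the NodeManager, default 8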
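
How these CPU settings interact when hardware detection is enabled can be shown with a short Python sketch. It follows the "number of CPUs * multiplier" rule quoted in the property description later in this article; truncating the product to an integer is my assumption, and `detected_vcores` is a hypothetical helper name.

```python
def detected_vcores(physical_cores, logical_processors,
                    count_logical_as_cores, multiplier):
    """Estimate the vcores a NodeManager would report when cpu-vcores is -1
    and hardware detection is enabled (illustrative only)."""
    cpus = logical_processors if count_logical_as_cores else physical_cores
    # Truncation to an integer is an assumption of this sketch.
    return int(cpus * multiplier)

# 4 cores / 8 threads, counting physical cores with a multiplier of 2.
print(detected_vcores(4, 8, count_logical_as_cores=False, multiplier=2))
# 4 cores / 4 threads with the default multiplier.
print(detected_vcores(4, 4, count_logical_as_cores=False, multiplier=1.0))
```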

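Memory settings:

yarn.nodemanager.resource.memory-mb: memory available to the NodeManager, default 8 GB. Configure either this or system-reserved-memory-mb, not both
yarn.nodemanager.resource.system-reserved-memory-mb: how much memory the NodeManager reserves for the system. Configure either this or memory-mb, not both
yarn.nodemanager.pmem-check-enabled: whether to enforce physical memory limits on containers, enabled by default. A job that exceeds the limit is killed
yarn.nodemanager.vmem-check-enabled: whether to enforce virtual memory limits on containers, enabled by default
yarn.nodemanager.vmem-pmem-ratio: ratio of virtual memory to physical memory, default 2.1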
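
The virtual memory limit that follows from the ratio is simple arithmetic, sketched here in Python. The function names and the sample usage figures are hypothetical; the only inputs taken from the text are the 2.1 default ratio and the idea that a container's limit is its physical allocation times that ratio.

```python
def vmem_limit_mb(container_mb, vmem_pmem_ratio=2.1):
    """Virtual memory a container may use before the vmem check trips."""
    return container_mb * vmem_pmem_ratio

def exceeds_vmem(used_vmem_mb, container_mb, vmem_pmem_ratio=2.1):
    return used_vmem_mb > vmem_limit_mb(container_mb, vmem_pmem_ratio)

# A 1024 MB container gets roughly 2150 MB of virtual memory.
print(vmem_limit_mb(1024))
print(exceeds_vmem(2500, 1024))
```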

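4.4.3 Container Settings

Memory settings:

yarn.scheduler.minimum-allocation-mb: minimum container memory, default 1 GB
yarn.scheduler.maximum-allocation-mb: maximum container memory, default 8 GB

CPU settings:

yarn.scheduler.minimum-allocation-vcores: minimum number of CPU cores per container, default 1
yarn.scheduler.maximum-allocation-vcores: maximum number of CPU cores per container, default 4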
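
The effect of the minimum and maximum on a memory request can be sketched in Python. This mirrors only what the property descriptions quoted later in this article say (requests below the minimum are raised to it, requests above the maximum are rejected); the exception class and function name are hypothetical stand-ins, and any further rounding a real scheduler applies is not modelled.

```python
class InvalidResourceRequest(Exception):
    """Stand-in for the error raised when a request exceeds the maximum."""

def normalize_memory_request(requested_mb, min_mb=1024, max_mb=8192):
    if requested_mb > max_mb:
        raise InvalidResourceRequest(
            f"requested {requested_mb} MB exceeds the maximum of {max_mb} MB")
    # Requests below the minimum are raised to the minimum.
    return max(requested_mb, min_mb)

print(normalize_memory_request(512))    # raised to the 1024 MB minimum
print(normalize_memory_request(2048))   # within range, unchanged
```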

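4.5 Configuration Examples

4.5.1 Yarn Configuration

Example: count the occurrences of each word in 1 GB of data.
Servers: 3 machines, each with 4 GB of memory and a 4-core, 4-thread CPU
Analysis:
1 GB / 128 MB (split size) = 8 MapTasks;
the word counts can be written to a single result file, so 1 ReduceTask;
word counting needs only one MapReduce job, so 1 MRAppMaster;
in total 8 + 1 + 1 = 10 Containers are needed.
On average each NodeManager runs 10 / 3 ≈ 3 tasks (distributed as 4 / 3 / 3)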
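
The arithmetic of this analysis can be written out as a small Python sketch. The function name `plan_containers` is hypothetical, and the even spread across nodes is just the simplest way to reproduce the 4 / 3 / 3 split; it is not a statement about where Yarn would actually place the containers.

```python
import math

def plan_containers(input_mb, split_mb, reduce_tasks, nodes):
    """Count the containers a job needs and spread them evenly over the nodes."""
    map_tasks = math.ceil(input_mb / split_mb)
    total = map_tasks + reduce_tasks + 1  # +1 for the MRAppMaster
    base, extra = divmod(total, nodes)
    per_node = [base + 1 if i < extra else base for i in range(nodes)]
    return map_tasks, total, per_node

# 1 GB of input, 128 MB splits, 1 ReduceTask, 3 NodeManagers.
print(plan_containers(1024, 128, 1, 3))
```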

Modify yarn-site.xml:

<property>
    <!-- Choose the scheduler, e.g. the Capacity Scheduler -->
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<property>
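    <!-- Number of threads the ResourceManager uses to handle scheduler requests, default 50 -->
    <!-- If more than 50 jobs are submitted this can be raised, but not beyond 3 servers * 4 threads = 12 threads (leaving room for other programs, in practice no more than 8) -->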
    <description>Number of threads to handle scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.client.thread-count</name>
    <value>8</value>
</property>

<property>
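    <!-- Whether Yarn detects the hardware automatically, default false -->
    <!-- If the node runs many other programs, configure it manually -->
    <!-- If the node runs nothing else, automatic detection can be used -->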
    <description>Enable auto-detection of node capabilities such as memory and CPU.</description>
    <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
    <value>false</value>
</property>

<property>
    <!-- Whether logical processors count as CPU cores; default false, i.e. physical cores are used -->
    <description>Flag to determine if logical processors(such as
        hyperthreads) should be counted as cores. Only applicable on Linux
        when yarn.nodemanager.resource.cpu-vcores is set to -1 and
        yarn.nodemanager.resource.detect-hardware-capabilities is true.
    </description>
    <name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
    <value>false</value>
</property>

<property>
    <!-- Multiplier from physical cores to virtual cores, default 1.0 -->
    <!-- Our servers have 4 cores and 4 threads, so the ratio of threads to cores is 1.0 -->
    <description>Multiplier to determine how to convert physical cores to
        vcores. This value is used if yarn.nodemanager.resource.cpu-vcores
        is set to -1(which implies auto-calculate vcores) and
        yarn.nodemanager.resource.detect-hardware-capabilities is set to true. The
        number of vcores will be calculated as
        number of CPUs * multiplier.
    </description>
    <name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
    <value>1.0</value>
</property>

<property>
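    <!-- Memory available to the NodeManager. The default is -1: 8 GB when hardware detection is off, calculated automatically when it is on -->
    <!-- Our servers have 4 GB, so set this to 4 GB -->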
    <description>Amount of physical memory, in MB, that can be allocated 
        for containers. If set to -1 and
        yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
        automatically calculated(in case of Windows and Linux).
        In other cases, the default is 8192MB.
    </description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
</property>

<property>
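    <!-- Number of CPU cores for the NodeManager. The default is -1: 8 when hardware detection is off, calculated automatically when it is on -->
    <!-- Our servers have only 4 cores and 4 threads -->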
    <description>Number of vcores that can be allocated
        for containers. This is used by the RM scheduler when allocating
        resources for containers. This is not used to limit the number of
        CPUs used by YARN containers. If it is set to -1 and
        yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
        automatically determined from the hardware in case of Windows and Linux.
        In other cases, number of vcores is 8 by default.</description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
</property>

<property>
    <!-- Minimum container memory, default 1 GB -->
    <description>The minimum allocation for every container request at the RM
        in MBs. Memory requests lower than this will be set to the value of this
        property. Additionally, a node manager that is configured to have less memory
        than this value will be shut down by the resource manager.</description>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
</property>

<property>
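    <!-- Maximum container memory, default 8 GB -->
    <!-- Our servers have only 4 GB and, per the analysis above, each runs about 3 containers, so the maximum container memory can be lowered to 2 GB -->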
    <description>The maximum allocation for every container request at the RM
        in MBs. Memory requests higher than this will throw an
        InvalidResourceRequestException.</description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
</property>

<property>
    <!-- Minimum number of CPU cores per container, default 1 -->
    <description>The minimum allocation for every container request at the RM
        in terms of virtual CPU cores. Requests lower than this will be set to the
        value of this property. Additionally, a node manager that is configured to
        have fewer virtual cores than this value will be shut down by the resource
        manager.</description>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
</property>

<property>
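    <!-- Maximum number of CPU cores per container, default 4 -->
    <!-- Our servers have 4 cores and, per the analysis above, each runs about 3 containers, so the maximum is set to 2 cores -->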
    <description>The maximum allocation for every container request at the RM
        in terms of virtual CPU cores. Requests higher than this will throw an
        InvalidResourceRequestException.</description>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
</property>

<property>
    <!-- Virtual memory check, enabled by default -->
    <!-- With CentOS 7 + JDK 8, disabling this check is recommended -->
    <description>Whether virtual memory limits will be enforced for
        containers.</description>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

<property>
    <!-- Ratio of virtual to physical memory (the limit used by the virtual memory check), default 2.1 -->
    <description>Ratio between virtual memory to physical memory when
        setting memory limits for containers. Container allocations are
        expressed in terms of physical memory, and virtual memory usage
        is allowed to exceed this allocation by this ratio.
    </description>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>

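4.5.2 Capacity Scheduler Multi-Queue Submission Example

The Capacity Scheduler is typically used in small and medium-sized companies.

4.5.2.1 Multi-Queue Configuration

Configuring multiple scheduler queues for production:

By default the scheduler has only one queue, default, which does not meet production needs
Queues can be split by framework, for example a hive queue, a spark queue, a flink queue, with each framework's jobs going to its own queue (companies rarely do this)
Queues can be split by business module, for example a login queue, a shopping cart queue, an order queue, a department 1 queue, a department 2 queue

Benefits of multiple queues:

Prevents code such as runaway recursion or an infinite loop from exhausting all resources
Allows job degradation: in critical periods, the queues of important jobs can be guaranteed enough resources

Example: the default queue gets 40% of total memory with a maximum capacity of 60% of total resources (when short, it can borrow from other queues until it holds at most 60%); the hive queue gets 60% of total memory with a maximum capacity of 80% (when short, it can borrow until it holds at most 80%). Queue priorities are configured as well.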
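
The percentages in this example translate into concrete amounts once a cluster size is fixed. The Python sketch below assumes the 3 x 4096 MB cluster from section 4.5.1 and also checks that sibling capacities add up to 100, which the Capacity Scheduler requires for percentage-based queues; `size_queues` and the dictionary layout are hypothetical.

```python
# (capacity %, maximum-capacity %) per queue, taken from the example above.
QUEUES = {"default": (40, 60), "hive": (60, 80)}

def size_queues(queues, cluster_mb):
    """Turn percentage settings into guaranteed, ceiling, and borrowable MB."""
    total = sum(capacity for capacity, _ in queues.values())
    if total != 100:
        raise ValueError(f"sibling capacities must add up to 100, got {total}")
    report = {}
    for name, (capacity, max_capacity) in queues.items():
        if max_capacity < capacity:
            raise ValueError(f"{name}: maximum-capacity is below capacity")
        report[name] = {
            "guaranteed_mb": cluster_mb * capacity / 100,
            "ceiling_mb": cluster_mb * max_capacity / 100,
            "borrowable_mb": cluster_mb * (max_capacity - capacity) / 100,
        }
    return report

# Assumed cluster: 3 NodeManagers with 4096 MB each.
for queue, sizes in size_queues(QUEUES, 3 * 4096).items():
    print(queue, sizes)
```

Both queues can borrow up to another 20% of the cluster beyond their guarantee, which is the gap between capacity and maximum capacity in each case.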

The multi-queue Capacity Scheduler is configured in $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml.

Modify capacity-scheduler.xml as follows:

<!-- Leave the settings before yarn.scheduler.capacity.root.queues at their defaults -->

<property>
    <!-- Queues under the Capacity Scheduler root; the default value is default -->
    <!-- Set to default,hive to add a hive queue -->
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,hive</value>
    <description>
        The queues at the this level (root is the root queue).
    </description>
</property>

<property>
    <!-- Memory capacity of the default queue under root, default 100% -->
    <!-- Set to 40% as required above -->
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
    <description>Default queue target capacity.</description>
</property>

<property>
    <!-- Memory capacity of the hive queue; the queue does not exist by default and must be added -->
    <!-- Set to 60% as required above -->
    <name>yarn.scheduler.capacity.root.hive.capacity</name>
    <value>60</value>
    <description>Hive queue target capacity.</description>
</property>

<property>
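    <!-- Maximum share of resources a single user may use in the default queue; default 1 (one user may use all of the default queue's resources) -->
    <!-- Adjust as needed to stop one user's infinite loop or similar from exhausting the whole queue -->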
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>
        Default queue user limit a percentage from 0.0 to 1.0.
    </description>
</property>

<property>
    <!-- Maximum share of resources a single user may use in the hive queue. The queue does not exist by default, so add this yourself -->
    <name>yarn.scheduler.capacity.root.hive.user-limit-factor</name>
    <value>1</value>
    <description>
        Hive queue user limit a percentage from 0.0 to 1.0.
    </description>
</property>

<property>
    <!-- Maximum capacity the default queue may occupy, default 100% -->
    <!-- Set to 60% as required above (the default queue's capacity is 40%, so it can borrow up to another 20% from other queues) -->
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>60</value>
    <description>
        The maximum capacity of the default queue. 
    </description>
</property>
<property>
    <!-- Maximum capacity the hive queue may occupy; the queue does not exist by default, so add this yourself -->
    <name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
    <value>80</value>
    <description>
        The maximum capacity of the hive queue. 
    </description>
</property>

<property>
    <!-- State of the default queue; the default value RUNNING means started, no change needed -->
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>
        The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
</property>

<property>
    <!-- State of the hive queue; not present by default, so add this yourself -->
    <name>yarn.scheduler.capacity.root.hive.state</name>
    <value>RUNNING</value>
    <description>
        The state of the hive queue. State can be one of RUNNING or STOPPED.
    </description>
</property>

<property>
    <!-- ACL for submitting applications to the default queue; default * (all users may submit to the queue), no change needed -->
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description>
        The ACL of who can submit jobs to the default queue.
    </description>
</property>

<property>
    <!-- ACL for submitting applications to the hive queue; the queue does not exist by default, so add this yourself -->
    <name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
    <value>*</value>
    <description>
        The ACL of who can submit jobs to the hive queue.
    </description>
</property>

<property>
    <!-- ACL for administering the default queue; default * (all users may perform operations such as kill on jobs in the queue), no change needed -->
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description>
        The ACL of who can administer jobs on the default queue.
    </description>
</property>

<property>
    <!-- ACL for administering the hive queue; the queue does not exist by default, so add this yourself -->
    <name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name>
    <value>*</value>
    <description>
        The ACL of who can administer jobs on the hive queue.
    </description>
</property>

<property>
    <!-- ACL for setting application priority in the default queue; default * (all users may set priorities in the queue), no change needed -->
    <name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name>
    <value>*</value>
    <description>
        The ACL of who can submit applications with configured priority.
        For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
    </description>
</property>

<property>
    <!-- ACL for setting application priority in the hive queue; the queue does not exist by default, so add this yourself -->
    <name>yarn.scheduler.capacity.root.hive.acl_application_max_priority</name>
    <value>*</value>
    <description>
        The ACL of who can submit applications with configured priority.
        For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
    </description>
</property>

<property>
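    <!-- Maximum lifetime an application in the default queue may specify -->
    <!-- If an application specifies a lifetime, it cannot exceed this value for applications submitted to this queue -->
    <!-- Setting an application's lifetime: yarn application -appId <app_id> -updateLifetime <Timeout>  -->
    <!-- An application that runs longer than its lifetime is killed -->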
    <name>yarn.scheduler.capacity.root.default.maximum-application-lifetime
    </name>
    <value>-1</value>
    <description>
        Maximum lifetime of an application which is submitted to a queue
        in seconds. Any value less than or equal to zero will be considered as
        disabled.
        This will be a hard time limit for all applications in this
        queue. If positive value is configured then any application submitted
        to this queue will be killed after exceeds the configured lifetime.
        User can also specify lifetime per application basis in
        application submission context. But user lifetime will be
        overridden if it exceeds queue maximum lifetime. It is point-in-time
        configuration.
        Note : Configuring too low value will result in killing application
        sooner. This feature is applicable only for leaf queue.
    </description>
</property>

<property>
    <!-- The hive queue does not exist by default, so add this yourself -->
    <name>yarn.scheduler.capacity.root.hive.maximum-application-lifetime
    </name>
    <value>-1</value>
    <description>
        Maximum lifetime of an application which is submitted to a queue
        in seconds. Any value less than or equal to zero will be considered as
        disabled.
        This will be a hard time limit for all applications in this
        queue. If positive value is configured then any application submitted
        to this queue will be killed after exceeds the configured lifetime.
        User can also specify lifetime per application basis in
        application submission context. But user lifetime will be
        overridden if it exceeds queue maximum lifetime. It is point-in-time
        configuration.
        Note : Configuring too low value will result in killing application
        sooner. This feature is applicable only for leaf queue.
    </description>
</property>

<property>
    <!-- default queue: if no lifetime is specified for an application, default-application-lifetime is used -->
    <name>yarn.scheduler.capacity.root.default.default-application-lifetime
    </name>
    <value>-1</value>
    <description>
        Default lifetime of an application which is submitted to a queue
        in seconds. Any value less than or equal to zero will be considered as
        disabled.
        If the user has not submitted application with lifetime value then this
        value will be taken. It is point-in-time configuration.
        Note : Default lifetime can't exceed maximum lifetime. This feature is
        applicable only for leaf queue.
    </description>
</property>

<property>
    <!-- The hive queue does not exist by default, so add this yourself -->
    <name>yarn.scheduler.capacity.root.hive.default-application-lifetime
    </name>
    <value>-1</value>
    <description>
        Default lifetime of an application which is submitted to a queue
        in seconds. Any value less than or equal to zero will be considered as
        disabled.
        If the user has not submitted application with lifetime value then this
        value will be taken. It is point-in-time configuration.
        Note : Default lifetime can't exceed maximum lifetime. This feature is
        applicable only for leaf queue.
    </description>
</property>

<!-- The remaining settings are unrelated to the Capacity Scheduler root queues; leave them at their defaults -->

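Start the wordcount program and specify the target queue:

# Submit to the hive queue
# -D changes a parameter value at run time
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.3.jar wordcount -D mapreduce.job.queuename=hive /input /output

If you run your own Java program, you can also specify the queue through the Configuration object:

// Submit to the hive queue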
configuration.set("mapreduce.job.queuename", "hive");

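4.5.2.2 Job Priority Configuration

The Capacity Scheduler supports job priorities: when resources are tight, higher-priority jobs get resources first.

By default, Yarn limits the priority of every job to 0. To use job priorities, this limit has to be raised.

Modify yarn-site.xml and add the parameter: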

<property>
    <!-- Maximum job priority in Yarn, default 0 -->
    <!-- A value of 5 allows six priority levels, 0/1/2/3/4/5; the larger the number, the higher the priority -->
    <name>yarn.cluster.max-application-priority</name>
    <value>5</value>
</property>

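When the cluster is short of resources and jobs are queuing, adjusting a job's priority lets it run first:

# Specify the priority when the job is started
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.3.jar wordcount -D mapreduce.job.priority=5 /input /output

# The priority of a running job can also be changed from the command line
yarn application -appId <app_id> -updatePriority 5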
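
The effect of the cluster-wide maximum on ordering can be illustrated with a Python sketch. It assumes that a requested priority above the maximum is capped to the maximum and that jobs with equal priority keep their submission order; both are stated here as assumptions, and the function names and job list are hypothetical.

```python
def effective_priority(requested, cluster_max=5):
    """Cap a requested priority at the cluster-wide maximum (assumed behaviour)."""
    return min(requested, cluster_max)

def order_pending(apps, cluster_max=5):
    """apps: list of (name, requested_priority) in submission order.

    Higher effective priority first; ties keep submission order."""
    keyed = [(-effective_priority(priority, cluster_max), index, name)
             for index, (name, priority) in enumerate(apps)]
    return [name for _, _, name in sorted(keyed)]

pending = [("a", 0), ("b", 5), ("c", 3), ("d", 9)]
print(order_pending(pending, cluster_max=5))
# With the default maximum of 0, every job ends up at priority 0.
print(order_pending(pending, cluster_max=0))
```

With the maximum left at 0, the ordering falls back to plain submission order, which is why the limit has to be raised before priorities have any effect.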

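4.5.3 Fair Scheduler Example

Requirement: besides the default queue, create two more queues, test and tengyer (named after the users' groups). The goal: if a user specifies a queue at submission, the job runs in that queue; if no queue is specified, jobs submitted by user test run in root.group.test and jobs submitted by user tengyer run in root.group.tengyer (where group is the user's group).

Configuring the Fair Scheduler involves two files: yarn-site.xml and the Fair Scheduler queue allocation file fair-scheduler.xml (the file name can be customized).

The Fair Scheduler is widely used in medium and large companies.

Modify yarn-site.xml as follows: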

<property>
    <!-- Use the Fair Scheduler -->
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<property>
    <!-- Path of the Fair Scheduler queue allocation file -->
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/opt/module/hadoop-3.2.3/etc/hadoop/fair-scheduler.xml</value>
</property>

<property>
    <!-- Disable resource preemption between queues -->
    <name>yarn.scheduler.fair.preemption</name>
    <value>false</value>
</property>

Configure fair-scheduler.xml:

<?xml version="1.0"?>
<allocations>
    <!-- Default maximum share of a queue's resources that Application Masters may use, 0 to 1; companies typically use 0.1 -->
    <queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
    <!-- Default maximum resources for a single queue (test, tengyer, default) -->
    <queueMaxResourcesDefault>4096mb,4vcores</queueMaxResourcesDefault>
    
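    <!-- Add a queue named test -->
    <queue name="test">
        <!-- Minimum resources of the queue -->
        <minResources>2048mb,2vcores</minResources>
        <!-- Maximum resources of the queue -->
        <maxResources>4096mb,4vcores</maxResources>
        <!-- Maximum number of applications running at the same time in the queue; default 50, set according to thread count -->
        <maxRunningApps>4</maxRunningApps>
        <!-- Maximum share of the queue's resources that Application Masters may use -->
        <maxAMShare>0.5</maxAMShare>
        <!-- Resource weight of the queue, default 1.0 -->
        <weight>1.0</weight>
        <!-- Resource allocation policy inside the queue -->
        <schedulingPolicy>fair</schedulingPolicy>
    </queue>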
    
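    <!-- Add a queue named tengyer -->
    <!-- With type set to parent, it becomes a parent queue -->
    <queue name="tengyer" type="parent">
        <!-- Minimum resources of the queue -->
        <minResources>2048mb,2vcores</minResources>
        <!-- Maximum resources of the queue -->
        <maxResources>4096mb,4vcores</maxResources>
        <!-- Maximum number of applications running at the same time in the queue; default 50, set according to thread count -->
        <maxRunningApps>4</maxRunningApps>
        <!-- maxAMShare applies only to leaf queues, not parent queues, so it must not be set here; otherwise the ResourceManager fails to start. -->
        <!--<maxAMShare>0.5</maxAMShare>-->
        <!-- Resource weight of the queue, default 1.0 -->
        <weight>1.0</weight>
        <!-- Resource allocation policy inside the queue -->
        <schedulingPolicy>fair</schedulingPolicy>
    </queue>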
    
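    <!-- Queue placement policy: several rules can be configured; they are tried in order from the first until one matches -->
    <queuePlacementPolicy>
        <!-- If no queue was specified at submission, move on to the next rule; create="false" means a specified queue that does not exist is not created automatically -->
        <rule name="specified" create="false" />
        <!-- Submit to root.group.username. If root.group does not exist, it is not created automatically; if root.group.username does not exist, it is created automatically -->
        <rule name="nestedUserQueue" create="true">
            <rule name="primaryGroup" create="false" />
        </rule>
        <!-- The last rule must be reject or default. reject means that if none of the earlier rules matched, no queue is created and the submission fails; default means the job is submitted to the default queue -->
        <!-- Alternatively, use the default rule: name="default" queue="name of a default queue" -->
        <rule name="reject" />        
    </queuePlacementPolicy>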
</allocations>

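According to the configured placement policy, a job that specifies a queue runs in that queue. If the specified queue does not exist, it is not created: <rule name="specified" create="false" />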

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.3.jar wordcount -D mapreduce.job.queuename=root.test /input /output

According to the policy configured above, if no queue is specified, the job goes to the queue named after the user, i.e. root.tengyer.tengyer (root.group.username): <rule name="nestedUserQueue" create="true"><rule name="primaryGroup" create="false" /></rule>
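
The three placement rules can be walked through in a small Python sketch. It is a simplified model of the policy configured above, not the Fair Scheduler's implementation: it assumes that a rule which cannot place the job (for example a specified queue that does not exist with create="false") simply hands over to the next rule, and `place_app` and the queue sets are hypothetical.

```python
LEAF_QUEUES = {"root.default", "root.test"}
PARENT_QUEUES = {"root.tengyer"}

def place_app(requested_queue, user, primary_group,
              leaf_queues=LEAF_QUEUES, parent_queues=PARENT_QUEUES):
    """Return the queue a job lands in, or None if it is rejected."""
    # Rule 1: "specified", create="false".
    if requested_queue:
        name = requested_queue
        if not name.startswith("root."):
            name = f"root.{name}"
        if name in leaf_queues:
            return name
    # Rule 2: "nestedUserQueue" (create="true") under "primaryGroup" (create="false"):
    # the group queue must already exist, the user queue is created on demand.
    parent = f"root.{primary_group}"
    if parent in parent_queues:
        return f"{parent}.{user}"
    # Rule 3: "reject".
    return None

print(place_app("root.test", "tengyer", "tengyer"))  # explicitly specified queue
print(place_app(None, "tengyer", "tengyer"))         # falls through to the user queue
print(place_app(None, "alice", "staff"))             # no matching group queue: rejected
```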