Default configuration of the YARN scheduler [capacity-scheduler.xml]

1. Three common schedulers

1.1 FIFO scheduler

  • first-in first-out scheduler
  • FIFO Scheduler
  • Jobs that enter the queue later must wait until earlier jobs have left it
Configurable:
1. The maximum resource share per user, to prevent a single user from monopolizing the resources
2. Which users are allowed to submit applications to the queue

1.2 Capacity Scheduler

  • Capacity Scheduler
  • Equivalent to several FIFO Schedulers side by side
  • Jobs in different queues can run in parallel (e.g., 3 queues allow 3 jobs to run in parallel)
  • Jobs in the same queue cannot run in parallel
Configurable:
1. Default capacity: each queue is guaranteed a percentage of the cluster resources
(e.g., queue a 40%, queue b 60%)
2. Maximum capacity: the upper bound on the percentage of resources a queue may occupy
(e.g., queue a's maximum is 70%: once it exceeds its own 40%, it may borrow queue b's idle resources, up to another 30%)

Ways to divide queues:
By business line (more common): ordering, payment, logistics, …
By technology: Hive, Spark, Flink, …

1.3 Fair Scheduler

  • Fair Scheduler
  • Similar to the Capacity Scheduler in supporting multiple queues; the difference is that its leaf queues are not FIFO
  • Within a single leaf queue, all jobs can run concurrently;
    resources are allocated according to time scale, priority, resource deficit, …

Each job obtains a fair share of resources over time,
computed with the max-min fairness algorithm.

2. Capacity Scheduler: multi-queue configuration

1. Edit the configuration file

vim $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml

2. Change the comma-separated list of leaf queues under the root queue (here a queue named hive is added):

<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,hive</value>
    <description>The leaf queues under the root queue</description>
</property>

3. Lower the capacity of the default queue (the capacities of sibling queues at each level must sum to 100):

<property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
    <description>Capacity of the default queue under root</description>
</property>

4. Configure the new queue:

<property>
    <name>yarn.scheduler.capacity.root.hive.capacity</name>
    <value>60</value>
    <description>Capacity of the hive queue under root</description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.user-limit-factor</name>
    <value>1</value>
    <description>Multiple of the queue capacity a single user may consume (1 = at most the queue's configured capacity; prevents one user from monopolizing the queue)</description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-capacity</name>
    <value>80</value>
    <description>Maximum capacity of this queue (it may borrow up to 80-60=20 beyond its own share)</description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.state</name>
    <value>RUNNING</value>
    <description>State of this queue (RUNNING or STOPPED)</description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_submit_applications</name>
    <value>*</value>
    <description>ACL: which users may submit applications to this queue</description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_administer_queue</name>
    <value>*</value>
    <description>ACL: which users may administer applications on this queue</description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.acl_application_max_priority</name>
    <value>*</value>
    <description>ACL: which users may submit applications with a configured priority</description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.maximum-application-lifetime</name>
    <value>-1</value>
    <description>Maximum lifetime, in seconds, of an application submitted to this queue (-1 = unlimited)</description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.hive.default-application-lifetime</name>
    <value>-1</value>
    <description>Default lifetime, in seconds, of an application submitted to this queue (-1 = unlimited; must not exceed the maximum lifetime)</description>
</property>

5. Distribute the configuration to the other nodes (rsync.py is a custom cluster-sync script):

rsync.py $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml
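After distributing the file, the Capacity Scheduler can reload it without restarting YARN. A minimal sketch (requires a running ResourceManager):

```shell
# Reload capacity-scheduler.xml into the running ResourceManager.
# Queues can be added or resized this way, but not deleted
# (an unwanted queue can only be set to STOPPED).
yarn rmadmin -refreshQueues
```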

6. Check the ResourceManager web UI on port 8088; the new queue should appear on the Scheduler page.

7. Submit to a specified queue

  • In Java, via org.apache.hadoop.conf.Configuration:
configuration.set("mapreduce.job.queuename", "hive");
  • In Hive:
set mapreduce.job.queuename=hive;

(mapreduce.job.queuename is the current property name; the old mapred.job.queue.name is deprecated but still accepted.)
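For command-line submissions, the queue can likewise be passed as a -D property. A hedged sketch using the stock MapReduce example jar (the jar path varies by installation, and /input and /output are hypothetical HDFS paths):

```shell
# Submit the wordcount example to the hive queue;
# mapreduce.job.queuename selects the target leaf queue.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount -D mapreduce.job.queuename=hive /input /output
```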

3. Vocabulary

  • FIFO: first-in first-out
  • ACL: Access Control List
  • schedule /ˈskedʒuːl/: n. plan; timetable; v. to schedule
  • scheduler /ˈskedʒuːlər/: scheduler; dispatcher
  • capacity /kəˈpæsəti/: capacity; capability
  • application /ˌæplɪˈkeɪʃn/: application; application program
  • priority /praɪˈɔːrəti/: priority
  • property /ˈprɑːpərti/: property; attribute

4. Default configuration [capacity-scheduler.xml]

<!-- Maximum number of applications (pending and running) the capacity scheduler will accept -->
<property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>Maximum number of applications that can be pending and running.</description>
</property>

<!-- Maximum share of cluster resources that may be used to run Application Masters -->
<property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
    <description>
Maximum percent of resources in the cluster which can be used to run 
application masters i.e. controls number of concurrent running applications.
    </description>
</property>

<!-- The resource calculator the capacity scheduler uses to compare resources. The default
compares memory only; DominantResourceCalculator can be chosen instead to compare across
multiple dimensions (CPU as well as memory, etc.) -->
<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    <description>
The ResourceCalculator implementation to be used to compare Resources in the scheduler.
The default i.e. DefaultResourceCalculator only uses Memory while
DominantResourceCalculator uses dominant-resource to compare 
multi-dimensional resources such as Memory, CPU etc.
    </description>
</property>

<!-- Leaf queue names under the root queue (a single default queue by default; multiple queues may be configured) -->
<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default</value>
    <description>The queues at the this level (root is the root queue).</description>
</property>

<!-- Capacity of the default queue under root -->
<property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>100</value>
    <description>Default queue target capacity.</description>
</property>

<!-- Upper limit on the share of this queue's resources a single user may occupy (prevents one user from monopolizing the queue) -->
<property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>Default queue user limit a percentage from 0.0 to 1.0.</description>
</property>

<!-- Maximum capacity of this queue -->
<property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
</property>

<!-- State of this queue (RUNNING or STOPPED) -->
<property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
</property>

<!-- ACL: which users may submit applications to this queue -->
<property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
</property>

<!-- ACL: which users may administer applications on this queue -->
<property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description>The ACL of who can administer jobs on the default queue.</description>
</property>

<property>
    <name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name>
    <value>*</value>
    <description>
The ACL of who can submit applications with configured priority.
For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
    </description>
</property>

<!-- Maximum lifetime of applications submitted to this queue (-1 = unlimited) -->
<property>
    <name>yarn.scheduler.capacity.root.default.maximum-application-lifetime</name>
    <value>-1</value>
    <description>
Maximum lifetime of an application which is submitted to a queue
in seconds. Any value less than or equal to zero will be considered as disabled.
This will be a hard time limit for all applications in this
queue. If positive value is configured then any application submitted
to this queue will be killed after exceeds the configured lifetime.
User can also specify lifetime per application basis in
application submission context. But user lifetime will be
overridden if it exceeds queue maximum lifetime. It is point-in-time
configuration.
Note : Configuring too low value will result in killing application
sooner. This feature is applicable only for leaf queue.
    </description>
</property>

<!-- Default lifetime of applications submitted to this queue (-1 = unlimited; must not exceed the maximum lifetime) -->
<property>
    <name>yarn.scheduler.capacity.root.default.default-application-lifetime</name>
    <value>-1</value>
    <description>
Default lifetime of an application which is submitted to a queue
in seconds. Any value less than or equal to zero will be considered as
disabled.
If the user has not submitted application with lifetime value then this
value will be taken. It is point-in-time configuration.
Note : Default lifetime can't exceed maximum lifetime. This feature is
applicable only for leaf queue.
    </description>
</property>

<property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
Number of missed scheduling opportunities after which the CapacityScheduler 
attempts to schedule rack-local containers.
When setting this parameter, the size of the cluster should be taken into account.
We use 40 as the default value, which is approximately the number of nodes in one rack.
Note, if this value is -1, the locality constraint in the container request
will be ignored, which disables the delay scheduling.
    </description>
</property>

<property>
    <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
    <value>-1</value>
    <description>
Number of additional missed scheduling opportunities over the node-locality-delay
ones, after which the CapacityScheduler attempts to schedule off-switch containers,
instead of rack-local ones.
Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will
attempt rack-local assignments after 40 missed opportunities, and off-switch assignments
after 40+20=60 missed opportunities.
When setting this parameter, the size of the cluster should be taken into account.
We use -1 as the default value, which disables this feature. In this case, the number
of missed opportunities for assigning off-switch containers is calculated based on
the number of containers and unique locations specified in the resource request,
as well as the size of the cluster.
    </description>
</property>

<property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value></value>
    <description>
A list of mappings that will be used to assign jobs to queues
The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
Typically this list will be used to map users to queues,
for example, u:%user:%user maps all users to queues with the same name
as the user.
    </description>
</property>
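To make the mapping syntax above concrete, here is a hedged example value (the user and group names are hypothetical; mappings are evaluated in order and the first match wins):

```xml
<property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value>u:alice:hive,g:analysts:hive,u:%user:default</value>
    <description>User alice and the analysts group map to the hive queue; everyone else falls through to default</description>
</property>
```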

<property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
If a queue mapping is present, will it override the value specified
by the user? This can be used by administrators to place jobs in queues
that are different than the one specified by the user.
The default is false.
    </description>
</property>

<property>
    <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
    <value>1</value>
    <description>
Controls the number of OFF_SWITCH assignments allowed
during a node's heartbeat. Increasing this value can improve
scheduling rate for OFF_SWITCH containers. Lower values reduce
"clumping" of applications on particular nodes. The default is 1.
Legal values are 1-MAX_INT. This config is refreshable.
    </description>
</property>

<property>
    <name>yarn.scheduler.capacity.application.fail-fast</name>
    <value>false</value>
    <description>
Whether RM should fail during recovery if previous applications'
queue is no longer valid.
    </description>
</property>