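I. FIFO Scheduler
Single-queue mode: all Jobs wait in one queue and run in submission order.
II. Capacity Scheduler
1. Architecture diagram
2. Characteristics
(1) Works with multiple queues;
(2) When a queue runs short of resources (it needs more than its guaranteed capacity but has not yet reached its maximum capacity), it can borrow resources from other queues, subject to: borrowed resources + guaranteed capacity <= maximum capacity;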
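(3) When a new Job arrives in a queue that has lent out resources and that queue runs short as a result, it reclaims what it lent. If other queues are still computing on those resources, the containers running there are killed so the resources can be taken back (this requires preemption to be enabled on the ResourceManager; otherwise the queue waits for the borrowed containers to finish).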
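The borrowing rule in (2) comes down to simple arithmetic. The following is a minimal sketch of that rule only, not YARN's actual implementation: the class and method names are hypothetical, and every value is a percentage of the parent queue's resources.

```java
/**
 * Illustrative sketch of the borrowing rule: guaranteed + borrowed <= maximum.
 * All names are hypothetical; this is not code from YARN.
 */
final class QueueBorrowSketch {

    /** Largest share (in percent) a queue may borrow on top of its guaranteed capacity. */
    static double maxBorrowable(double capacityPct, double maxCapacityPct) {
        if (maxCapacityPct < capacityPct) {
            throw new IllegalArgumentException("maximum capacity must be >= guaranteed capacity");
        }
        return maxCapacityPct - capacityPct;
    }

    /**
     * Share actually granted to a borrowing queue: limited by what it asks for,
     * by its own headroom, and by what the other queues currently leave idle.
     */
    static double grantBorrow(double requestedPct, double capacityPct,
                              double maxCapacityPct, double idleElsewherePct) {
        double headroom = maxBorrowable(capacityPct, maxCapacityPct);
        double granted = Math.min(requestedPct, Math.min(headroom, idleElsewherePct));
        return Math.max(0.0, granted);
    }

    public static void main(String[] args) {
        // A queue with capacity 40 and maximum capacity 60 has a headroom of 20.
        System.out.println(maxBorrowable(40, 60));
        // It asks for 30 while 50 is idle elsewhere: the headroom is the binding limit.
        System.out.println(grantBorrow(30, 40, 60, 50));
    }
}
```

The sketch leaves out point (3): how lent resources are taken back depends on whether preemption is enabled.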
3. Configuration steps
(1) Go to the directory that holds the configuration files and open the capacity scheduler configuration
cd $HADOOP_HOME/etc/hadoop
vim capacity-scheduler.xml
(2) Modify the default configuration
<!--
Global setting: the maximum number of Jobs the YARN scheduler can hold at once, pending and running combined
-->
<property>
<name>yarn.scheduler.capacity.maximum-applications</name>
<value>10000</value>
<description>
Maximum number of applications that can be pending and running.
</description>
</property>
<!--
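Global setting: the maximum fraction of resources that ApplicationMasters may use; it controls how many ApplicationMasters can run in parallel (that is, how many Jobs can be submitted and run concurrently)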
-->
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.1</value>
<description>
Maximum percent of resources in the cluster which can be used to run
application masters i.e. controls number of concurrent running
applications.
</description>
</property>
<!--
Global setting: the implementation class used to compare the resources of Containers
DefaultResourceCalculator: considers memory only, not CPU
DominantResourceCalculator: considers both memory and CPU
-->
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
<description>
The ResourceCalculator implementation to be used to compare
Resources in the scheduler.
The default i.e. DefaultResourceCalculator only uses Memory while
DominantResourceCalculator uses dominant-resource to compare
multi-dimensional resources such as Memory, CPU etc.
</description>
</property>
<!--
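Global setting: the queues under the root queue, separated by commas (root is the top-level queue)
Note: a custom queue must be listed here before it can be configured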
-->
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,hello</value>
<description>
The queues at this level (root is the root queue).
</description>
</property>
<!-- Default queue: configuration of the default queue -->
<!-- ********************************default********************************************** -->
<!--
Guaranteed capacity of the default queue under root, as a percentage of root
-->
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>40</value>
<description>Default queue target capacity.</description>
</property>
<!--
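How much of the default queue under root a single submitting user may use, given as a multiple of the queue's guaranteed capacity (a float). 1 means one user can take at most the queue's full guaranteed capacity. The stock description below calls it a percentage, but the value is a multiplier.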
-->
<property>
<name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
<value>1</value>
<description>
Default queue user limit a percentage from 0.0 to 1.0.
</description>
</property>
<!--
Maximum capacity of the default queue under root, as a percentage of root
-->
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>60</value>
<description>
The maximum capacity of the default queue.
</description>
</property>
<!--
Whether the default queue under root is enabled
RUNNING -> enabled
STOPPED -> disabled
-->
<property>
<name>yarn.scheduler.capacity.root.default.state</name>
<value>RUNNING</value>
<description>
The state of the default queue. State can be one of RUNNING or STOPPED.
</description>
</property>
<!--
Which users may submit Jobs to the default queue under root
-->
<property>
<name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
<value>*</value>
<description>
The ACL of who can submit jobs to the default queue.
</description>
</property>
<!--
Which users may administer (kill, ...) Jobs submitted to the default queue under root
-->
<property>
<name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
<value>*</value>
<description>
The ACL of who can administer jobs on the default queue.
</description>
</property>
<!--
Which users may set a priority when submitting Jobs to the default queue under root
-->
<property>
<name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name>
<value>*</value>
<description>
The ACL of who can submit applications with configured priority.
For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
</description>
</property>
<!--
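Maximum lifetime, in seconds, of a Job in the default queue under root. A Job that exceeds it is killed, and a value <= 0 disables the limit (a hard timeout, loosely comparable to the time the NameNode waits before declaring a DataNode unavailable).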
-->
<property>
<name>yarn.scheduler.capacity.root.default.maximum-application-lifetime</name>
<value>-1</value>
<description>
Maximum lifetime of an application which is submitted to a queue
in seconds. Any value less than or equal to zero will be considered as
disabled.
This will be a hard time limit for all applications in this
queue. If positive value is configured then any application submitted
to this queue will be killed after exceeds the configured lifetime.
User can also specify lifetime per application basis in
application submission context. But user lifetime will be
overridden if it exceeds queue maximum lifetime. It is point-in-time
configuration.
Note : Configuring too low value will result in killing application
sooner. This feature is applicable only for leaf queue.
</description>
</property>
<!--
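Default lifetime, in seconds, of a Job in the default queue under root. It applies to Jobs submitted
without a lifetime of their own, and such a Job is killed once it has run longer than this value.
It cannot exceed yarn.scheduler.capacity.root.default.maximum-application-lifetime, and a value <= 0 disables it.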
-->
<property>
<name>yarn.scheduler.capacity.root.default.default-application-lifetime</name>
<value>-1</value>
<description>
Default lifetime of an application which is submitted to a queue
in seconds. Any value less than or equal to zero will be considered as
disabled.
If the user has not submitted application with lifetime value then this
value will be taken. It is point-in-time configuration.
Note : Default lifetime can't exceed maximum lifetime. This feature is
applicable only for leaf queue.
</description>
</property>
<!-- ********************************default********************************************** -->
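Several of the settings above are easier to reason about as small calculations. The sketch below is a simplified model built from the property descriptions, not YARN's scheduler code. The class name, the method names, and the sample cluster size (100 GB of memory, 1536 MB per ApplicationMaster) are assumptions made for illustration; YARN applies the ApplicationMaster limit per queue, and the real computation has more inputs.

```java
/**
 * Simplified model of three capacity-scheduler settings.
 * All names and sample values are hypothetical; this is not code from YARN.
 */
final class QueueLimitSketch {

    /** Rough upper bound on ApplicationMasters that fit into the AM share of the cluster. */
    static long maxConcurrentAms(long clusterMemoryMb, double maxAmResourcePercent,
                                 long amContainerMb) {
        if (amContainerMb <= 0) {
            throw new IllegalArgumentException("amContainerMb must be positive");
        }
        return (long) Math.floor(clusterMemoryMb * maxAmResourcePercent / amContainerMb);
    }

    /** Largest share (percent of the parent) a single user can hold in a queue. */
    static double singleUserLimitPct(double capacityPct, double userLimitFactor,
                                     double maxCapacityPct) {
        // user-limit-factor multiplies the guaranteed capacity; maximum-capacity still caps it.
        return Math.min(capacityPct * userLimitFactor, maxCapacityPct);
    }

    /** Effective lifetime of an application in seconds; -1 means no limit. */
    static long effectiveLifetimeSeconds(long requested, long queueDefault, long queueMax) {
        // Without a lifetime of its own, the application takes the queue default.
        long lifetime = requested > 0 ? requested : queueDefault;
        // A positive queue maximum is a hard limit for every application in the queue.
        if (queueMax > 0 && (lifetime <= 0 || lifetime > queueMax)) {
            lifetime = queueMax;
        }
        return lifetime > 0 ? lifetime : -1;
    }

    public static void main(String[] args) {
        System.out.println(maxConcurrentAms(102_400, 0.1, 1_536));
        System.out.println(singleUserLimitPct(40, 1, 60));
        System.out.println(effectiveLifetimeSeconds(-1, -1, -1));
    }
}
```

With the values configured above (both lifetimes set to -1), the model reports no lifetime limit, and a single user of the default queue is capped at that queue's guaranteed 40 percent.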
(3) Add the configuration for the custom hello queue
<!-- ********************************your queue define************************************ -->
<!--
Guaranteed capacity of the hello queue under root, as a percentage of root
-->
<property>
<name>yarn.scheduler.capacity.root.hello.capacity</name>
<value>60</value>
</property>
<!--
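How much of the hello queue under root a single submitting user may use, given as a multiple of the queue's guaranteed capacity (a float). 1 means one user can take at most the queue's full guaranteed capacity.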
-->
<property>
<name>yarn.scheduler.capacity.root.hello.user-limit-factor</name>
<value>1</value>
</property>
<!--
Maximum capacity of the hello queue under root, as a percentage of root
-->
<property>
<name>yarn.scheduler.capacity.root.hello.maximum-capacity</name>
<value>80</value>
</property>
<!--
Whether the hello queue under root is enabled
RUNNING -> enabled
STOPPED -> disabled
-->
<property>
<name>yarn.scheduler.capacity.root.hello.state</name>
<value>RUNNING</value>
</property>
<!--
Which users may submit Jobs to the hello queue under root
-->
<property>
<name>yarn.scheduler.capacity.root.hello.acl_submit_applications</name>
<value>*</value>
</property>
<!--
Which users may administer (kill, ...) Jobs submitted to the hello queue under root
-->
<property>
<name>yarn.scheduler.capacity.root.hello.acl_administer_queue</name>
<value>*</value>
</property>
<!--
Which users may set a priority when submitting Jobs to the hello queue under root
-->
<property>
<name>yarn.scheduler.capacity.root.hello.acl_application_max_priority</name>
<value>*</value>
</property>
<!--
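Maximum lifetime, in seconds, of a Job in the hello queue under root. A Job that exceeds it is killed, and a value <= 0 disables the limit (a hard timeout, loosely comparable to the time the NameNode waits before declaring a DataNode unavailable).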
-->
<property>
<name>yarn.scheduler.capacity.root.hello.maximum-application-lifetime</name>
<value>-1</value>
</property>
<!--
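Default lifetime, in seconds, of a Job in the hello queue under root. It applies to Jobs submitted
without a lifetime of their own, and such a Job is killed once it has run longer than this value.
It cannot exceed yarn.scheduler.capacity.root.hello.maximum-application-lifetime, and a value <= 0 disables it.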
-->
<property>
<name>yarn.scheduler.capacity.root.hello.default-application-lifetime</name>
<value>-1</value>
</property>
<!-- ********************************your queue define************************************ -->
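Before distributing the file, it can help to sanity-check the numbers: the guaranteed capacities of sibling queues under the same parent have to add up to 100, and each queue's maximum capacity should not be lower than its guaranteed capacity. The following is a minimal sketch of such a check. The class and method names are hypothetical, and it takes the values as plain arrays instead of parsing capacity-scheduler.xml.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative consistency check for sibling queues under one parent.
 * Names are hypothetical; this is not code from YARN.
 */
final class QueueConfigCheck {

    /** Returns a list of problems found; an empty list means the values are consistent. */
    static List<String> validate(String[] names, double[] capacityPct, double[] maxCapacityPct) {
        if (names.length != capacityPct.length || names.length != maxCapacityPct.length) {
            throw new IllegalArgumentException("all arrays must have the same length");
        }
        List<String> problems = new ArrayList<>();
        double sum = 0.0;
        for (int i = 0; i < names.length; i++) {
            sum += capacityPct[i];
            if (maxCapacityPct[i] < capacityPct[i]) {
                problems.add(names[i] + ": maximum-capacity is below capacity");
            }
            if (maxCapacityPct[i] > 100.0) {
                problems.add(names[i] + ": maximum-capacity exceeds 100");
            }
        }
        // Sibling capacities must account for the whole parent queue.
        if (Math.abs(sum - 100.0) > 1e-6) {
            problems.add("sibling capacities sum to " + sum + ", expected 100");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Values taken from the default and hello queues configured above.
        List<String> problems = validate(
                new String[] {"default", "hello"},
                new double[] {40, 60},
                new double[] {60, 80});
        System.out.println(problems.isEmpty() ? "configuration looks consistent" : problems.toString());
    }
}
```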
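(4) Distribute the configuration file to the other nodes (afterwards, restart YARN or run yarn rmadmin -refreshQueues so that the ResourceManager loads the new queue definitions)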
xrsync.sh capacity-scheduler.xml
(5) In the MR driver, submit the Job to the hello queue
conf.set("mapreduce.job.queuename","hello");