[Hadoop] A Brief Overview of the YARN Scheduler

I. Overview

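  1. Scheduler responsibilities
  • The Scheduler allocates resources to running applications, subject to constraints such as capacities and queues.
  • It is a pure scheduler: it does not monitor or track application status.
  • It offers no guarantee of restarting tasks that fail, whether from application failure or hardware failure.
  • It schedules based on the resource requirements of the applications.
  • It works with an abstract notion of a resource Container.
    • Original wording: "it does so based on the abstract notion of a resource Container which incorporates elements such as memory, cpu, disk, network etc."
    • Resources (memory, CPU, disk, network, etc.) are allocated in units of this abstract Container.
  2. The following scheduler types are available: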
  • Capacity Scheduler
    • Capacity-based scheduler
    • https://hadoop.apache.org/docs/r3.3.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
    • org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
  • Fair Scheduler
    • Weight-based scheduler
    • https://hadoop.apache.org/docs/r3.3.0/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
    • org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
  • Fifo Scheduler
    • First-in, first-out scheduler; rarely used in practice
    • org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler

II. Capacity Scheduler

  1. Description
  • Capacity-based scheduling: each queue is assigned a percentage of its parent's resources
  • The CapacityScheduler is designed to run Hadoop applications as a shared, multi-tenant cluster in an operator-friendly manner while maximizing the throughput and the utilization of the cluster
  2. Configuration example
  • etc/hadoop/yarn-site.xml
<!-- Configure YARN to use the CapacityScheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
  • etc/hadoop/capacity-scheduler.xml
<!-- yarn.scheduler.capacity.<queue-path>.queues -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>a,b</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.a.queues</name>
  <value>a1,a2</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.b.queues</name>
  <value>b1,b2,b3</value>
</property>


<!-- Queue Properties -->
<!-- Resource Allocation -->
<!-- yarn.scheduler.capacity.<queue-path>.capacity -->
<property>
  <name>yarn.scheduler.capacity.root.a.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.capacity</name>
  <value>40</value>
</property>
<!-- yarn.scheduler.capacity.<queue-path>.maximum-capacity ; setting this value to -1 sets maximum capacity to 100% -->
<property>
  <name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
  <value>-1</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.maximum-capacity</name>
  <value>70</value>
</property>
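
The example above declares the child queues a1, a2, b1, b2 and b3 but sets capacities only for root.a and root.b. In percentage mode, CapacityScheduler expects the capacities of the queues at each level to add up to 100, so the child queues need values as well. A minimal sketch of that missing part, to be added to the same capacity-scheduler.xml; the specific percentages are illustrative assumptions, not values from the original configuration:

```xml
<!-- Children of root.a: 50 + 50 = 100 (assumed split) -->
<property>
  <name>yarn.scheduler.capacity.root.a.a1.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.a2.capacity</name>
  <value>50</value>
</property>

<!-- Children of root.b: 30 + 30 + 40 = 100 (assumed split) -->
<property>
  <name>yarn.scheduler.capacity.root.b.b1.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.b2.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.b3.capacity</name>
  <value>40</value>
</property>
```

Each value is relative to the parent queue. With this split, a1 is guaranteed 50% of root.a, which works out to 30% of the cluster (60% × 50%).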

  3. Apply configuration changes
yarn rmadmin -refreshQueues
  4. Web UI
  • View the queues in the YARN ResourceManager Web UI
ResourceManager: http://rm_host:port/ (the default HTTP port is 8088)

III. Fair Scheduler

  1. Description
  • Weight-based scheduling
  • Fair scheduling is a method of assigning resources to applications such that all apps get, on average, an equal share of resources over time
  2. Configuration example
  • etc/hadoop/yarn-site.xml
<!-- Scheduler Config -->
<!-- Configure YARN to use the FairScheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<!-- FairScheduler Properties -->
<!-- FairScheduler allocation file; defaults to fair-scheduler.xml -->
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>fair-scheduler.xml</value>
</property>
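<!-- Use the submitting user's name as the default queue name when no queue is specified; default: true. Ignored when the allocation file defines a queuePlacementPolicy -->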
<property>
  <name>yarn.scheduler.fair.user-as-default-queue</name>
  <value>true</value>
</property>
<!-- Whether to enable preemption; default: false -->
<property>
  <name>yarn.scheduler.fair.preemption</name>
  <value>true</value>
</property>
<!-- Cluster utilization threshold above which preemption kicks in, a float between 0 and 1; default: 0.8f -->
<property>
  <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
  <value>0.9f</value>
</property>
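<!-- Whether queues not declared in the allocation file may be created at submission time; if false, such apps go to the default queue instead; default: true. Ignored when the allocation file defines a queuePlacementPolicy -->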
<property>
  <name>yarn.scheduler.fair.allow-undeclared-pools</name>
  <value>false</value>
</property>
  • allocation file (default: fair-scheduler.xml; a relative path is looked up on the classpath, which typically includes the Hadoop conf directory)
<?xml version="1.0"?>
<allocations>
  <!-- Queue declarations and settings -->
  
  <queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
  <queueMaxResourcesDefault>3072mb,2vcores</queueMaxResourcesDefault>

  <!-- root queue hierarchy -->
  <queue name="root">
    <minResources>1024mb,1vcores</minResources>
    <maxResources>3072mb,2vcores</maxResources>
    <maxRunningApps>20</maxRunningApps>
    <maxAMShare>0.1</maxAMShare>
    <weight>10.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    
    <!-- root.default: the default queue -->
    <queue name="default">
      <maxRunningApps>10</maxRunningApps>
      <aclSubmitApps>*</aclSubmitApps>
      <weight>2.0</weight>
      <minResources>1024mb,1vcores</minResources>
      <maxResources>3072mb,1vcores</maxResources>
    </queue>
    
    <!-- root.visitor: queue for guest users -->
    <queue name="visitor">
      <maxRunningApps>5</maxRunningApps>
      <weight>1.0</weight>
      <minResources>512mb,1vcores</minResources>
      <maxResources>2024mb,1vcores</maxResources>
    </queue>
  </queue>

  <!-- Per-user settings -->
  <user name="root">
    <maxRunningApps>20</maxRunningApps>
  </user>
  <userMaxAppsDefault>5</userMaxAppsDefault>

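  <!-- Queue placement policy: rules are tried in order until one matches -->
  <queuePlacementPolicy>
    <!-- Place the app in the queue it requested; create="false" skips to the next rule if that queue does not exist -->
    <rule name="specified" create="false" />
    <!-- Place the app in a queue named after the submitting user's primary group; create="false" skips to the next rule if that queue does not exist -->
    <rule name="primaryGroup" create="false" />
    <!-- Fallback: place the app in the default queue -->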
    <rule name="default" queue="root.default"/>
  </queuePlacementPolicy>
</allocations>
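
The yarn-site.xml example turns on yarn.scheduler.fair.preemption, but as far as the FairScheduler documentation describes it, a queue only preempts once it has a finite preemption timeout, and the default timeout is effectively unlimited. A minimal sketch of the allocation-file elements that would activate preemption, to be merged into the existing allocations element; the 60-second timeout and 0.5 threshold are assumed example values, not settings from the original file:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Assumed value: seconds a queue may stay below its fair-share threshold before it tries to preempt -->
  <defaultFairSharePreemptionTimeout>60</defaultFairSharePreemptionTimeout>
  <!-- Assumed value: preempt when the queue holds less than 0.5 x its fair share after the timeout -->
  <defaultFairSharePreemptionThreshold>0.5</defaultFairSharePreemptionThreshold>
</allocations>
```

These defaults apply to the root queue and are inherited by child queues unless a queue sets its own fairSharePreemptionTimeout or fairSharePreemptionThreshold.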
  3. Apply configuration changes
yarn rmadmin -refreshQueues
  4. Web UI
  • View the queues in the YARN ResourceManager Web UI
ResourceManager: http://rm_host:port/ (the default HTTP port is 8088)