Basics: Spark Job Scheduling

Latest recommended article published 2022-04-12 13:51:00

hallao0



official link
This article is essentially a translation of the official Spark documentation.

Scheduling Across Applications

When a user submits an application, the cluster manager allocates resources to it.

Static partitioning

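Each application is given a maximum amount of resources it can use and holds onto them for its whole duration. This approach is used by: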
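1. Standalone
2. Mesos (coarse-grained mode)
3. YARN
   - Memory and CPU scheduling and isolation in YARN
   - Common YARN issues
   - Enabling the Label Scheduler in YARN
   - YARN capacity scheduler
   - YARN capacity scheduler: concepts and configuration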
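In practice the cap is set with ordinary Spark properties. Below is a minimal sketch, assuming the application is launched with spark-submit; the helper `static_cap_args` is hypothetical and only builds `--conf` flags. The property names are real Spark settings: `spark.cores.max` caps total cores in standalone and coarse-grained Mesos mode, `spark.mesos.coarse` selects coarse-grained mode, on YARN the cap is `spark.executor.instances` times `spark.executor.cores`, and `spark.executor.memory` sets memory per executor in every mode.

```python
def static_cap_args(manager, memory, max_cores=None, executors=None, cores_per_executor=None):
    """Build spark-submit --conf flags that fix an application's resource cap.

    Hypothetical helper for illustration: only the property names come from Spark.
    """
    # Memory per executor is capped the same way under every cluster manager.
    props = {"spark.executor.memory": memory}
    if manager in ("standalone", "mesos"):
        # Total number of cores the application may hold across the cluster.
        props["spark.cores.max"] = str(max_cores)
        if manager == "mesos":
            # Coarse-grained mode gives the static partitioning described above.
            props["spark.mesos.coarse"] = "true"
    elif manager == "yarn":
        # On YARN the cap is the executor count times cores per executor.
        props["spark.executor.instances"] = str(executors)
        props["spark.executor.cores"] = str(cores_per_executor)
    else:
        raise ValueError(f"unknown cluster manager: {manager}")
    args = []
    for key, value in props.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

For example, `static_cap_args("standalone", "2g", max_cores=4)` returns `["--conf", "spark.executor.memory=2g", "--conf", "spark.cores.max=4"]`, which can be passed to spark-submit ahead of the application jar.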

Dynamic partitioning

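Mesos can also share CPU cores dynamically (fine-grained mode). Each application still has a fixed, independent memory allocation (set by spark.executor.memory), but when Application 1 is not running tasks on a machine, other applications can run tasks on its idle cores.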
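The difference between the two modes can be seen in a toy model. This is a simplified illustration, not Mesos's actual resource-offer mechanism: every task is assumed to need one core, `running_tasks` is a hypothetical function, and under dynamic sharing the cores an application leaves idle are lent to applications that still have pending tasks, in name order.

```python
def running_tasks(allocation, demand, dynamic):
    """Toy model: how many one-core tasks each application runs at once.

    allocation maps app -> cores reserved for it; demand maps app -> pending tasks.
    """
    # Every application first runs tasks on its own cores.
    running = {app: min(demand.get(app, 0), cores) for app, cores in allocation.items()}
    if dynamic:
        # Cores left idle by applications without enough work become shareable...
        idle = sum(cores - running[app] for app, cores in allocation.items())
        # ...and are lent to applications that still have pending tasks.
        for app in sorted(allocation):
            extra = min(idle, demand.get(app, 0) - running[app])
            running[app] += extra
            idle -= extra
    return running
```

With 4 cores each for app1 (no work) and app2 (8 pending tasks), static partitioning lets app2 run only 4 tasks, while dynamic sharing lets it run all 8 on app1's idle cores.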

Scheduling Within an Application

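Within a single application (SparkContext), multiple jobs can run simultaneously if they are submitted from separate threads. This layer actually covers two levels of scheduling: job scheduling and task scheduling.
1. Job scheduling. An application contains multiple jobs, and each job is created by a Spark action.
   Actions include reduce, count, collect, first, take, saveAsTextFile, and so on.
2. Task scheduling: see "Task scheduling source code analysis" and "Scheduling policies".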
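The rule that each action creates a job (and transformations create none) can be illustrated without Spark. A minimal sketch: `ToyRDD` is a hypothetical class that records map/filter lazily and bumps a job counter whenever an action runs. Real Spark is more nuanced; `take`, for instance, may launch several jobs because it scans partitions incrementally.

```python
class ToyRDD:
    """Toy stand-in for an RDD: transformations are lazy, actions submit a job.

    Hypothetical class for illustration; it does not use Spark.
    """
    jobs_submitted = 0  # one "job" is counted per action call

    def __init__(self, source, ops=()):
        self.source = source
        self.ops = ops  # recorded lineage of (kind, fn) pairs

    # Transformations: return a new ToyRDD and compute nothing.
    def map(self, fn):
        return ToyRDD(self.source, self.ops + (("map", fn),))

    def filter(self, fn):
        return ToyRDD(self.source, self.ops + (("filter", fn),))

    def _run_job(self):
        # An action triggers one job that replays the whole lineage.
        ToyRDD.jobs_submitted += 1
        data = list(self.source)
        for kind, fn in self.ops:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    # Actions: each one runs exactly one job in this toy model.
    def collect(self):
        return self._run_job()

    def count(self):
        return len(self._run_job())

    def take(self, n):
        return self._run_job()[:n]
```

Building `ToyRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)` leaves `jobs_submitted` at 0; calling `count()` and then `collect()` raises it to 2.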

Scheduling modes: FIFO and FAIR

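FIFO is the default: jobs run one after another. The first job gets priority on all available resources while its stages have tasks to launch, then the second job gets priority, and so on, so a large job at the head of the queue can delay later jobs significantly. FAIR sharing instead assigns tasks between jobs in a round-robin fashion, so all jobs get a roughly equal share of cluster resources. It is enabled by setting spark.scheduler.mode to FAIR.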
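A toy comparison of the two modes, assuming one free slot per step and one-slot tasks. `launch_order` is hypothetical and ignores stages, so it only shows the ordering idea: under FIFO a short job submitted behind a long one waits for all of the long job's tasks, while round-robin lets it start almost immediately.

```python
def launch_order(jobs, mode):
    """Toy model: the order in which concurrent jobs get free task slots.

    jobs is a list of (job_name, num_tasks) in submission order; each step
    launches one task. Stages are ignored.
    """
    order = []
    if mode == "FIFO":
        # The earlier job takes every slot until it has no tasks left to launch.
        for name, num_tasks in jobs:
            order += [name] * num_tasks
    elif mode == "FAIR":
        # Round-robin: each pass launches one task from every job with work left.
        pending = dict(jobs)
        while any(pending.values()):
            for name, _ in jobs:
                if pending[name] > 0:
                    order.append(name)
                    pending[name] -= 1
    else:
        raise ValueError(f"unknown mode: {mode}")
    return order
```

For `[("long", 4), ("short", 1)]`, FIFO launches the short job's task last (fifth), while FAIR returns `["long", "short", "long", "long", "long"]`.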
FAIR: modeled after the Hadoop Fair Scheduler. Jobs can be grouped into pools, and each pool has three properties:
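1. schedulingMode: FIFO or FAIR, which controls whether jobs within the pool queue up behind each other (the default) or share the pool's resources fairly.
2. weight: the pool's share of the cluster relative to other pools. The default is 1; a pool with weight 2 gets twice as many resources as an active pool with weight 1.
3. minShare: a minimum share (e.g., a number of CPU cores) for each pool. The fair scheduler always tries to meet every active pool's minShare before distributing the remaining resources by weight, so each pool gets some resources regardless of its weight.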
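To see how weight and minShare interact, here is a sketch of the pool ordering. `fair_key` follows the comparison in Spark's FairSchedulingAlgorithm (SchedulingAlgorithm.scala) as I understand it: pools below their minShare come first, ranked by running/minShare; the rest are ranked by running/weight; ties break on the pool name. `Pool` and `assign_cores` are hypothetical, handing out one core at a time is a simplification of how the scheduler really offers resources, and weight is assumed to be at least 1.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    weight: int = 1
    min_share: int = 0
    running: int = 0  # tasks currently running in this pool
    pending: int = 0  # tasks waiting for a core


def fair_key(pool):
    """Sort key mirroring the ordering in Spark's FairSchedulingAlgorithm."""
    needy = pool.running < pool.min_share
    if needy:
        # Below its minShare: rank by how far it is from meeting it.
        ratio = pool.running / max(pool.min_share, 1)
    else:
        # minShare met: rank by running tasks per unit of weight.
        ratio = pool.running / pool.weight
    return (0 if needy else 1, ratio, pool.name)


def assign_cores(pools, free_cores):
    """Hand out free cores one at a time to the pool that sorts first."""
    for _ in range(free_cores):
        candidates = [p for p in pools if p.pending > 0]
        if not candidates:
            break
        best = min(candidates, key=fair_key)
        best.running += 1
        best.pending -= 1
    return {p.name: p.running for p in pools}
```

With the same values as the production (weight 1, minShare 2) and test (weight 2, minShare 3) pools in the XML example and plenty of pending tasks, the first 5 cores go 2/3 to satisfy the min shares, and 9 cores end up split 3/6, matching the 1:2 weights.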

Configuration:

The pool properties can be set by creating an XML file, similar to conf/fairscheduler.xml.template, and setting a spark.scheduler.allocation.file property in your SparkConf.

conf.set("spark.scheduler.allocation.file", "/path/to/file")

The format of the XML file is simply a <pool> element for each pool, with different elements within it for the various settings. For example:

<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>

A full example is also available in conf/fairscheduler.xml.template. Note that any pools not configured in the XML file will simply get default values for all settings (scheduling mode FIFO, weight 1, and minShare 0).
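The default rule above can be expressed as a small parser. A minimal sketch using Python's standard `xml.etree.ElementTree`; `load_pools` and `pool_settings` are hypothetical helpers, not Spark's own parser, and filling in a setting omitted inside a configured pool is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Defaults for any pool the file does not configure (see the note above).
DEFAULTS = {"schedulingMode": "FIFO", "weight": 1, "minShare": 0}


def load_pools(xml_text):
    """Parse fairscheduler.xml-style text into {pool name: settings dict}."""
    root = ET.fromstring(xml_text)
    pools = {}
    for pool in root.findall("pool"):
        # Assumption of this sketch: a setting omitted inside a pool also gets its default.
        mode = pool.findtext("schedulingMode", DEFAULTS["schedulingMode"]).strip()
        weight = int(pool.findtext("weight", str(DEFAULTS["weight"])))
        min_share = int(pool.findtext("minShare", str(DEFAULTS["minShare"])))
        pools[pool.get("name")] = {"schedulingMode": mode, "weight": weight, "minShare": min_share}
    return pools


def pool_settings(pools, name):
    """Return a pool's settings; pools missing from the file get the defaults."""
    return pools.get(name, dict(DEFAULTS))
```

Parsing the example file gives production = FAIR/1/2 and test = FIFO/2/3, while a pool the file does not mention (say, a hypothetical "adhoc") gets FIFO/1/0. In a real application, a thread picks its pool with sc.setLocalProperty("spark.scheduler.pool", "production").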