official link
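This article is largely a translation of the official documentation.
Scheduling Across Applications
After a user submits a request, the cluster manager allocates the resources.
Static partitioning
Each application is assigned a maximum amount of resources that it may use.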
1. Standalone
2. Mesos
3. YARN
Memory and CPU scheduling and isolation in YARN
Common YARN issues
Enabling the Label Scheduler in YARN
YARN capacity scheduler
YARN capacity scheduler: concepts and configuration
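As a rough illustration of static partitioning, the sketch below builds a spark-submit command line that caps an application's resources in standalone mode through the spark.cores.max and spark.executor.memory properties. The helper name static_partition_command and the application file my_app.py are hypothetical, chosen only for this example; the values 4 and "2g" are arbitrary.

```python
import shlex


def static_partition_command(app, max_cores, executor_memory):
    """Build a spark-submit argument list with fixed resource caps.

    Hypothetical helper for illustration; only the two property names
    and the --conf flag come from Spark itself.
    """
    conf = {
        "spark.cores.max": str(max_cores),          # upper bound on total cores
        "spark.executor.memory": executor_memory,   # memory per executor
    }
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


cmd = static_partition_command("my_app.py", 4, "2g")
print(shlex.join(cmd))
# spark-submit --conf spark.cores.max=4 --conf spark.executor.memory=2g my_app.py
```

The same properties could instead be set on the SparkConf inside the application; the command-line form is used here only because it keeps the sketch free of a Spark dependency.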
Dynamic partitioning
Mesos can share CPU resources dynamically: when Application1 is not running tasks, other applications can use its idle CPU cores.
Scheduling Within an Application
This layer actually covers two levels of scheduling: job scheduling and task scheduling.
1. Job scheduling. An application contains multiple jobs, and each job is triggered by a Spark action.
Actions include reduce, count, collect, first, take, saveAsTextFile, and others.
2. Task scheduling: Task scheduling source code walkthrough
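Scheduling policies
The default scheduling policy is FIFO: jobs run one after another in submission order.
FAIR:
Hadoop Fair Scheduler
Each pool supports three properties:
1. Mode: FIFO or FAIR, the scheduling mode used for jobs within the pool.
2. Weight: the pool's weight, which controls its share of resources relative to other pools.
3. MinShare: a minimum share (for example, a number of CPU cores) given to each pool, so that every pool is guaranteed some resources regardless of priority.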
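To show how weight and minShare might interact, here is a simplified model of fair ordering between pools. It is a sketch under stated assumptions, not Spark's actual implementation: pools that are still below their minShare are served first, and the remaining pools are ordered by running tasks divided by weight. The Pool class, the fair_sort_key function, the "adhoc" pool, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    weight: int = 1
    min_share: int = 0
    running_tasks: int = 0


def fair_sort_key(pool):
    """Sort key for a toy fair ordering (assumed rules, see lead-in)."""
    needy = pool.running_tasks < pool.min_share
    if needy:
        # Below its guaranteed minimum: rank by how far it is from min_share.
        return (0, pool.running_tasks / max(pool.min_share, 1), pool.name)
    # Minimum satisfied: rank by load relative to weight.
    return (1, pool.running_tasks / pool.weight, pool.name)


pools = [
    Pool("production", weight=1, min_share=2, running_tasks=3),
    Pool("test", weight=2, min_share=3, running_tasks=1),
    Pool("adhoc", weight=1, min_share=0, running_tasks=1),
]

# Pools earlier in this list would be offered free resources first.
order = [p.name for p in sorted(pools, key=fair_sort_key)]
print(order)  # ['test', 'adhoc', 'production']
```

In this model "test" comes first because it runs fewer tasks than its minShare, while "production" comes last because it already has the highest load relative to its weight.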
Configuration:
The pool properties can be set by creating an XML file, similar to conf/fairscheduler.xml.template, and setting a spark.scheduler.allocation.file property in your SparkConf.
conf.set("spark.scheduler.allocation.file", "/path/to/file")
The format of the XML file is simply a <pool> element for each pool, with different elements within it for the various settings. For example:
<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>
A full example is also available in conf/fairscheduler.xml.template. Note that any pools not configured in the XML file will simply get default values for all settings (scheduling mode FIFO, weight 1, and minShare 0).
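The default values described above can be illustrated with a short sketch that reads an allocation file and falls back to FIFO, weight 1, and minShare 0 for any pool that is not listed. This only models the documented behavior using the Python standard library; it is not how Spark itself loads the file. The function names load_pools and pool_settings and the pool name "adhoc" are hypothetical, and the per-element fallback inside a listed pool is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Defaults stated in the text: scheduling mode FIFO, weight 1, minShare 0.
DEFAULTS = {"schedulingMode": "FIFO", "weight": 1, "minShare": 0}

ALLOCATIONS_XML = """<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="test">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>
"""


def load_pools(xml_text):
    """Parse <pool> elements into a dict of settings keyed by pool name."""
    root = ET.fromstring(xml_text)
    pools = {}
    for pool in root.findall("pool"):
        settings = dict(DEFAULTS)
        mode = pool.findtext("schedulingMode")
        if mode is not None:
            settings["schedulingMode"] = mode.strip().upper()
        for key in ("weight", "minShare"):
            text = pool.findtext(key)
            if text is not None:  # assumption: a missing element keeps its default
                settings[key] = int(text)
        pools[pool.get("name")] = settings
    return pools


def pool_settings(pools, name):
    """Return settings for a pool, or the defaults if it is not configured."""
    return pools.get(name, dict(DEFAULTS))


pools = load_pools(ALLOCATIONS_XML)
print(pool_settings(pools, "production"))
print(pool_settings(pools, "adhoc"))  # not in the file, so defaults apply
```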