TaskScheduler is abstract; Spark currently ships only one implementation, TaskSchedulerImpl, which is created during SparkContext initialization.
private[spark] class TaskSchedulerImpl(
    val sc: SparkContext,
    val maxTaskFailures: Int,
    isLocal: Boolean = false)
  extends TaskScheduler with Logging
TaskScheduler is effectively a proxy in front of SchedulerBackend: it handles the generic scheduling logic itself, such as the scheduling order between different jobs and re-submitting slow-running tasks on idle nodes (speculation).
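This division of labor can be sketched as follows. The sketch below is illustrative only: the interfaces are drastically simplified (Spark's real TaskScheduler and SchedulerBackend traits carry many more methods), and SimpleTaskScheduler is a hypothetical name, not a Spark class.

```scala
// Simplified sketch of the TaskScheduler / SchedulerBackend split.
// The backend talks to the cluster manager; the scheduler owns generic policy.

// Minimal stand-in for Spark's SchedulerBackend trait.
trait SchedulerBackend {
  def reviveOffers(): Unit // ask for resource offers to run pending tasks
}

// Hypothetical scheduler that delegates resource requests to its backend.
class SimpleTaskScheduler {
  private var backend: SchedulerBackend = _
  private val pendingTasks = scala.collection.mutable.Queue[String]()

  // Mirrors TaskSchedulerImpl.initialize: the backend is injected later.
  def initialize(b: SchedulerBackend): Unit = { backend = b }

  def submitTasks(tasks: Seq[String]): Unit = {
    pendingTasks ++= tasks
    backend.reviveOffers() // delegate the actual resource request
  }

  def nextTask(): Option[String] =
    if (pendingTasks.isEmpty) None else Some(pendingTasks.dequeue())
}
```

The point of the pattern is that the scheduler never talks to the cluster manager directly; swapping backends (Standalone, Mesos, YARN, local) leaves the scheduling policy untouched.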
// SparkContext calls TaskSchedulerImpl.initialize, passing in the SchedulerBackend
def initialize(backend: SchedulerBackend) {
  this.backend = backend
  // temporarily set rootPool name to empty
  rootPool = new Pool("", schedulingMode, 0, 0)
  schedulableBuilder = {
    schedulingMode match {
      case SchedulingMode.FIFO =>
        new FIFOSchedulableBuilder(rootPool)
      case SchedulingMode.FAIR =>
        new FairSchedulableBuilder(rootPool, conf)
    }
  }
  schedulableBuilder.buildPools()
}
Pool is used to schedule TaskSetManagers. In fact both Pool and TaskSetManager extend the Schedulable trait, so a Pool can contain TaskSetManagers as well as other Pools.
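This is a classic composite pattern, which can be sketched as follows. Note the class names below (PoolLike, TaskSetManagerLike) and the leaves method are simplifications invented for illustration; Spark's real Schedulable trait exposes a richer interface.

```scala
// Sketch of the Schedulable composite: a Pool can hold TaskSetManagers
// or nested Pools. Names and methods here are simplified stand-ins.
trait Schedulable {
  def name: String
  def leaves: Seq[String] // names of all task-set leaves below this node
}

// Leaf node, standing in for Spark's TaskSetManager.
class TaskSetManagerLike(val name: String) extends Schedulable {
  def leaves: Seq[String] = Seq(name)
}

// Inner node, standing in for Spark's Pool: holds any Schedulable.
class PoolLike(val name: String) extends Schedulable {
  private val children = scala.collection.mutable.ArrayBuffer[Schedulable]()
  def addSchedulable(s: Schedulable): Unit = children += s
  def leaves: Seq[String] = children.flatMap(_.leaves).toSeq
}
```

Because both node types share one trait, the scheduler can walk an arbitrarily nested pool hierarchy uniformly.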
Spark uses FIFO (First In, First Out) scheduling by default; a FAIR mode is also available. FIFO mode has a single pool, while FAIR mode supports multiple pools. Pools themselves also come in FIFO and FAIR flavors, built by FIFOSchedulableBuilder and FairSchedulableBuilder respectively.
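The FIFO ordering itself is simple: entries are compared first by priority (which in Spark is the job id) and then by stage id. A minimal sketch of that comparison, with a hypothetical Entry case class in place of Spark's Schedulable fields:

```scala
// Sketch of FIFO ordering: lower priority value (earlier job) wins,
// ties broken by stage id. Entry is an illustrative stand-in for the
// priority/stageId fields Spark's FIFO scheduling algorithm compares.
case class Entry(priority: Int, stageId: Int)

def fifoLessThan(a: Entry, b: Entry): Boolean = {
  val byPriority = a.priority - b.priority
  if (byPriority != 0) byPriority < 0 else a.stageId < b.stageId
}
```

Sorting a runqueue with this comparator yields strict submission order across jobs, then across stages within a job.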
SparkContext decides which SchedulerBackend to use based on the master parameter. Taking Spark Standalone mode as an example, the backend used is SparkDeploySchedulerBackend, which extends the CoarseGrainedSchedulerBackend parent class.
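The dispatch on the master URL can be sketched roughly as follows. This is a toy illustration only: the real logic lives in SparkContext.createTaskScheduler, covers more cluster managers and local variants, and constructs actual backend objects rather than returning names.

```scala
// Illustrative sketch: map a master URL prefix to a backend choice.
// Returns class names as strings purely for demonstration.
def backendFor(master: String): String = master match {
  case m if m.startsWith("spark://") => "SparkDeploySchedulerBackend" // Standalone
  case m if m.startsWith("mesos://") => "MesosSchedulerBackend"
  case m if m.startsWith("local")    => "LocalBackend"
  case other =>
    throw new IllegalArgumentException(s"Unrecognized master URL: $other")
}
```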
private[spark] class SparkDeploySchedulerBackend(
    scheduler: TaskSchedulerImpl,
    sc: SparkContext,
    masters: Array[String])
  extends CoarseGrainedSchedulerBackend(scheduler, sc.env.rpcEnv)
  with AppClientListener
  with Logging {
// SparkContext calls TaskSchedulerImpl.start
override def start() {
  backend.start()
  // If speculation is enabled, start a thread that re-submits
  // slow-running tasks on idle resources
  if (!isLocal && conf.getBoolean("spark.speculation", false)) {
    logInfo("Starting speculative execution thread")
    sc.env.actorSystem.scheduler.schedule(SPECULATION_INTERVAL_MS milliseconds,
        SPECULATION_INTERVAL_MS milliseconds) {
      Utils.tryOrStopSparkContext(sc) { checkSpeculatableTasks() }
    }(sc.env.actorSystem.dispatcher)
  }
}
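The speculation criterion that checkSpeculatableTasks applies can be sketched as follows. The function below is a simplified stand-in, assuming the documented defaults spark.speculation.quantile = 0.75 and spark.speculation.multiplier = 1.5: only once enough tasks in a stage have finished does Spark compare each still-running task against the median finished runtime.

```scala
// Sketch of the speculation check: once at least `quantile` of the stage's
// tasks have finished, any running task slower than multiplier * median
// runtime becomes a candidate for re-launch on an idle executor.
// Simplified stand-in for TaskSetManager.checkSpeculatableTasks.
def speculatable(
    finishedRuntimes: Seq[Double],
    runningRuntimes: Seq[Double],
    quantile: Double = 0.75,
    multiplier: Double = 1.5): Seq[Double] = {
  val total = finishedRuntimes.size + runningRuntimes.size
  if (finishedRuntimes.size < quantile * total) Seq.empty // not enough data yet
  else {
    val sorted = finishedRuntimes.sorted
    val median = sorted(sorted.size / 2)
    runningRuntimes.filter(_ > multiplier * median) // the stragglers
  }
}
```

This is why speculation only helps near the tail of a stage: early on, the quantile guard suppresses it entirely.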
The SparkDeploySchedulerBackend.start method initializes an AppClient object, which is mainly used for Akka communication between the Driver and the Master, e.g. registering the Spark Application.
// SparkDeploySchedulerBackend.start()
override def start() {
  super.start()
  ...
  val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
  val javaOpts = sparkJavaOpts ++ extraJavaOpts
  // Wrap CoarseGrainedExecutorBackend into a Command
  val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
    args, sc.executorEnvs, classPathEntries ++ testingClassPath,
    libraryPathEntries, javaOpts)
  val appUIAddress = sc.ui.map(_.appUIAddress).getOrElse("")
  val coresPerExecutor = conf.getOption("spark.executor.cores").map(_.toInt)
  val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory,
    command, appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor)
  // Initialize the AppClient object
  client = new AppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
  client.start()