TaskScheduler is one of the key members of SparkContext. It is responsible for submitting tasks and for requesting the cluster manager to schedule them; it can also be regarded as the client side of task scheduling.
SparkContext, line 522, creates the TaskScheduler:
val (sched, ts) = SparkContext.createTaskScheduler(this, master)
SparkContext, line 2592, holds the concrete implementation, createTaskScheduler:
private def createTaskScheduler(
    sc: SparkContext,
    master: String): (SchedulerBackend, TaskScheduler) = {
  import SparkMasterRegex._
  // When running locally, don't try to re-execute tasks on failure.
  val MAX_LOCAL_TASK_FAILURES = 1

  master match {
    case "local" =>
      val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
      val backend = new LocalBackend(sc.getConf, scheduler, 1)
      scheduler.initialize(backend)
      (backend, scheduler)
    // ... cases for the other master URL patterns follow
createTaskScheduler behaves differently depending on the master URL; this article takes local mode as the example. For "local" it creates a TaskSchedulerImpl and a LocalBackend.
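The dispatch on the master URL can be illustrated with a small standalone sketch. The object, method, and regex names below are invented for illustration; Spark's real patterns live in SparkMasterRegex, and each real branch also constructs the matching scheduler/backend pair:

```scala
// Hypothetical sketch of how master URLs such as "local", "local[4]" and
// "local[*]" can be mapped to a local worker-thread count.
object MasterUrlSketch {
  private val LocalN = """local\[([0-9]+|\*)\]""".r

  def threadCount(master: String): Int = master match {
    case "local"     => 1                                        // single worker thread
    case LocalN("*") => Runtime.getRuntime.availableProcessors() // one thread per core
    case LocalN(n)   => n.toInt                                  // exactly n threads
    case other       => throw new IllegalArgumentException(s"Unsupported master URL: $other")
  }
}
```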
The construction code, TaskSchedulerImpl line 102:
var dagScheduler: DAGScheduler = null
var backend: SchedulerBackend = null
val mapOutputTracker = SparkEnv.get.mapOutputTracker
var schedulableBuilder: SchedulableBuilder = null
var rootPool: Pool = null
// default scheduler is FIFO
private val schedulingModeConf = conf.get("spark.scheduler.mode", "FIFO")
val schedulingMode: SchedulingMode = try {
  SchedulingMode.withName(schedulingModeConf.toUpperCase)
} catch {
  case e: java.util.NoSuchElementException =>
    throw new SparkException(s"Unrecognized spark.scheduler.mode: $schedulingModeConf")
}
// This is a var so that we can reset it for testing purposes.
private[spark] var taskResultGetter = new TaskResultGetter(sc.env, this)
Analysis: (1) It reads configuration such as the scheduling mode (FIFO or FAIR).
(2) It creates a TaskResultGetter, which uses a thread pool to process the task execution results sent back by the Executors on the Workers.
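The thread-pool idea behind TaskResultGetter can be sketched as follows. This is a hypothetical, simplified stand-in (all names invented), not Spark's actual class, which additionally handles deserialization and failure cases:

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Sketch of the TaskResultGetter idea: results arriving from executors are
// handled on a dedicated thread pool so the scheduler's event loop never
// blocks on result processing.
class ResultGetterSketch(poolSize: Int) {
  private val pool = Executors.newFixedThreadPool(poolSize)

  // Hand a finished task's result to the pool for asynchronous processing.
  def enqueueSuccessfulTask(taskId: Long)(handle: Long => Unit): Unit =
    pool.execute(new Runnable { def run(): Unit = handle(taskId) })

  // Drain the pool and wait for all queued results to be processed.
  def stop(): Unit = { pool.shutdown(); pool.awaitTermination(5, TimeUnit.SECONDS) }
}
```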
TaskSchedulerImpl supports two scheduling modes, but the actual dispatching of tasks always falls to a concrete SchedulerBackend implementation.
SparkContext, line 2603, creates the LocalBackend:
val backend = new LocalBackend(sc.getConf, scheduler, 1)
A method of LocalBackend worth noting, line 123:
override def start() {
  val rpcEnv = SparkEnv.get.rpcEnv
  val executorEndpoint = new LocalEndpoint(rpcEnv, userClassPath, scheduler, this, totalCores)
  localEndpoint = rpcEnv.setupEndpoint("LocalBackendEndpoint", executorEndpoint)
  listenerBus.post(SparkListenerExecutorAdded(
    System.currentTimeMillis,
    executorEndpoint.localExecutorId,
    new ExecutorInfo(executorEndpoint.localExecutorHostname, totalCores, Map.empty)))
  launcherBackend.setAppId(appId)
  launcherBackend.setState(SparkAppHandle.State.RUNNING)
}
Analysis: start() creates a LocalEndpoint, which shows that LocalBackend communicates by exchanging messages through the LocalEndpoint.
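This message-passing style can be sketched with a small stand-in. The message and class names below are invented for illustration (Spark's real LocalEndpoint handles messages such as reviving offers and task status updates inside its RPC receive loop):

```scala
// Hypothetical sketch of an endpoint reacting to backend messages.
sealed trait BackendMessage
case object ReviveOffers extends BackendMessage
case class StatusUpdate(taskId: Long, finished: Boolean) extends BackendMessage

class EndpointSketch {
  // Dispatch on the incoming message, as an RPC endpoint's receive would.
  def receive(msg: BackendMessage): String = msg match {
    case ReviveOffers                => "offering free cores to the scheduler"
    case StatusUpdate(id, true)      => s"task $id finished; revive offers"
    case StatusUpdate(id, false)     => s"task $id still running"
  }
}
```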
With TaskSchedulerImpl and LocalBackend created, initialization follows.
SparkContext, line 2616, calls the initialization method:
scheduler.initialize(backend)
This invokes TaskSchedulerImpl, line 126:
def initialize(backend: SchedulerBackend) {
  // Keep a reference to the (here, Local) backend
  this.backend = backend
  // temporarily set rootPool name to empty; this creates the scheduling queue
  rootPool = new Pool("", schedulingMode, 0, 0)
  // Pick the scheduling strategy that will manage the queue
  schedulableBuilder = {
    schedulingMode match {
      case SchedulingMode.FIFO =>
        new FIFOSchedulableBuilder(rootPool)
      case SchedulingMode.FAIR =>
        new FairSchedulableBuilder(rootPool, conf)
    }
  }
  schedulableBuilder.buildPools()
}
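The strategy selection in initialize() can be sketched in isolation. The class and method names below are invented stand-ins; Spark's real builders populate the rootPool, the FAIR one reading pool definitions from fairscheduler.xml:

```scala
// Hypothetical sketch: the scheduling-mode string selects a strategy object
// that knows how to set up the scheduling pools.
trait SchedulableBuilderSketch { def buildPools(): String }

class FifoBuilderSketch extends SchedulableBuilderSketch {
  def buildPools(): String = "FIFO: single root pool, tasks run in submission order"
}
class FairBuilderSketch extends SchedulableBuilderSketch {
  def buildPools(): String = "FAIR: pools with weights and minimum shares"
}

object BuilderSelection {
  def builderFor(mode: String): SchedulableBuilderSketch = mode.toUpperCase match {
    case "FIFO" => new FifoBuilderSketch
    case "FAIR" => new FairBuilderSketch
    case m      => throw new IllegalArgumentException(s"Unrecognized scheduler mode: $m")
  }
}
```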
With that, the TaskScheduler is fully created.