This article analyzes the chain of function calls Spark uses to dispatch tasks, tracing the complete path bottom-up: from the thread inside an Executor that actually runs a task, all the way up to the DAGScheduler submitting a stage via submitWaitingStages(). Since the final computation is carried out by threads in the Executor, we start from a task's execution inside a thread and work our way back up.
A task ultimately runs in the Executor's thread pool: the task is wrapped in a TaskRunner, recorded in the running-task map, and handed to the thread pool, whose execute call carries out the actual computation:
def launchTask(context: ExecutorBackend, taskId: Long, attemptNumber: Int,
    taskName: String, serializedTask: ByteBuffer) {
  // Wrap the task in a Runnable, register it, then hand it to the pool
  val tr = new TaskRunner(context, taskId = taskId, attemptNumber = attemptNumber,
    taskName, serializedTask)
  runningTasks.put(taskId, tr)
  threadPool.execute(tr)
}
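The pattern above can be sketched in a self-contained way: wrap the work in a Runnable, record it in a concurrent running-task map, and submit it to a thread pool. This is a minimal illustrative model, not Spark's code; names like MiniTaskRunner and the launchTask signature here are hypothetical.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors}

object LaunchTaskSketch {
  // Stand-in for Spark's TaskRunner: a Runnable that deregisters
  // itself from the running-task map once the body has finished.
  class MiniTaskRunner(taskId: Long, body: () => Unit) extends Runnable {
    override def run(): Unit = {
      body()                      // the actual computation
      runningTasks.remove(taskId) // task is no longer running
    }
  }

  val runningTasks = new ConcurrentHashMap[Long, MiniTaskRunner]()
  val threadPool = Executors.newFixedThreadPool(4)

  def launchTask(taskId: Long)(body: () => Unit): Unit = {
    val tr = new MiniTaskRunner(taskId, body)
    runningTasks.put(taskId, tr) // mirrors runningTasks.put in Executor
    threadPool.execute(tr)       // a pool thread will call tr.run()
  }
}
```

The map lets the Executor later find and act on in-flight tasks (e.g. to kill one), which is why registration happens before the task is handed to the pool.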
This happens inside Executor's launchTask; in cluster mode, executor.launchTask is invoked by CoarseGrainedExecutorBackend when it receives a LaunchTask message sent over from the Driver:
case LaunchTask(data) =>
  if (executor == null) {
    logError("Received LaunchTask command but executor was null")
    System.exit(1)
  } else {
    // Deserialize the TaskDescription shipped from the Driver
    val ser = env.closureSerializer.newInstance()
    val taskDesc = ser.deserialize[TaskDescription](data.value)
    logInfo("Got assigned task " + taskDesc.taskId)
    executor.launchTask(this, taskId = taskDesc.taskId,
      attemptNumber = taskDesc.attemptNumber, taskDesc.name, taskDesc.serializedTask)
  }
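The ser.deserialize[TaskDescription](data.value) step turns the bytes carried by the LaunchTask message back into a task description. A minimal stand-in for that round trip, using plain Java serialization over a ByteBuffer (Spark's closureSerializer is configurable; MiniTaskDescription and these helpers are hypothetical):

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
  ObjectInputStream, ObjectOutputStream}
import java.nio.ByteBuffer

object SerializerSketch {
  // Illustrative stand-in for Spark's TaskDescription
  case class MiniTaskDescription(taskId: Long, attemptNumber: Int, name: String)

  // Serialize any object into a ByteBuffer, as done before sending LaunchTask
  def serialize(obj: AnyRef): ByteBuffer = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    ByteBuffer.wrap(bytes.toByteArray)
  }

  // Reverse the process on the Executor side
  def deserialize[T](buf: ByteBuffer): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(buf.array()))
    in.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val desc = MiniTaskDescription(42L, 0, "task 42.0")
    val wire = serialize(desc)                        // bytes inside LaunchTask(data)
    val back = deserialize[MiniTaskDescription](wire) // what the Executor recovers
    assert(back == desc)
  }
}
```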
The LaunchTask message, in turn, is sent by the launchTasks function of CoarseGrainedSchedulerBackend on the Driver side. Note that CoarseGrainedSchedulerBackend and CoarseGrainedExecutorBackend run on the Driver and on the Executor, respectively:
def launchTasks(tasks: Seq[Seq[TaskDescription]]) {
  for (task <- tasks.flatten) {
    // ... serialize the task into serializedTask ...
    val executorData = executorDataMap(task.executorId)
    executorData.freeCores -= scheduler.CPUS_PER_TASK
    executorData.executorActor ! LaunchTask(new SerializableBuffer(serializedTask))
    // ...
  }
}
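Per task, the Driver does two things: it reserves CPU cores in its local bookkeeping for the target Executor, then sends the serialized task to that Executor's endpoint. An illustrative model of just that bookkeeping (not Spark source; a queue stands in for the actor receiving `!`, and all names here are hypothetical):

```scala
import scala.collection.mutable

object DriverSideSketch {
  // Stand-in for Spark's ExecutorData: host, remaining cores, and an
  // inbox modeling the executor endpoint that receives LaunchTask
  case class MiniExecutorData(host: String, var freeCores: Int,
                              inbox: mutable.Queue[String])

  val CPUS_PER_TASK = 1
  val executorDataMap = mutable.Map(
    "exec-1" -> MiniExecutorData("host-a", freeCores = 4, mutable.Queue.empty))

  def launchTask(executorId: String, serializedTask: String): Unit = {
    val executorData = executorDataMap(executorId)
    executorData.freeCores -= CPUS_PER_TASK    // reserve cores on the Driver side
    executorData.inbox.enqueue(serializedTask) // stand-in for executorActor ! LaunchTask(...)
  }
}
```

Decrementing freeCores before the message is sent keeps the Driver's view of cluster capacity consistent, so subsequent resource offers do not over-commit the same cores.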
CoarseGrainedSchedulerBackend's launchTasks is called from makeOffers. makeOffers is overloaded with two definitions; here we consider the one that makes offers on all Executors. The variant that targets a single Executor is only called when that Executor's state changes, e.g. when one of its tasks finishes and the freed cores can run other tasks:
// Make fake resource offers on all executors
def makeOffers() {
  launchTasks(scheduler.resourceOffers(executorDataMap.map { case (id, executorData) =>
    new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
  }.toSeq))
}
// Make fake resource offers on just one executor
def makeOffers(executorId: String) {
  val executorData =