Overview
In the previous posts we covered Driver startup and registration, as well as Application registration. The next step is the execution of Tasks.
1. Executing the user's code
Spark 任务调度之Register App described how the Driver initializes the SparkContext object and registers the application. Once SparkContext initialization is complete, the user's code is executed. We again take SparkPi as the example, shown below.
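Since the original screenshot is not reproduced here, the following is a minimal SparkPi-style sketch, adapted from the standard Spark example (the exact code shipped with your Spark version may differ slightly). The point is that the reduce at the end is the action that triggers job submission:

```scala
import scala.math.random
import org.apache.spark.sql.SparkSession

object SparkPi {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("Spark Pi").getOrCreate()
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt
    // map is lazy; the reduce below is the action that ends up calling SparkContext.runJob
    val count = spark.sparkContext.parallelize(1 until n, slices).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y <= 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}
```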
As shown above, SparkPi invokes RDD.reduce, and reduce calls SparkContext.runJob to submit the job. SparkContext.runJob then calls DAGScheduler.runJob, as shown below: at line 2217 you can see runJob delegating to DAGScheduler.runJob, which generates the Tasks and submits them. Next, let's look at how the Tasks are generated.
2. DAGScheduler generates Tasks
In DAGScheduler, stages are generated from the RDD's Dependencies, and the Task type is determined by the stage type:
Stage type | Task type |
---|---|
ShuffleMapStage | ShuffleMapTask |
ResultStage | ResultTask |
There are two kinds of stages, ShuffleMapStage and ResultStage; the corresponding Task types, ShuffleMapTask and ResultTask, are generated according to the stage type, and finally TaskScheduler is called to submit the tasks, as sketched below.
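As a rough illustration, here is a self-contained toy model of how the stage type determines the task type. This is not the Spark source: the real ShuffleMapTask and ResultTask constructors used in DAGScheduler.submitMissingTasks take many more arguments (serialized task binary, locality preferences, properties, and so on).

```scala
// Toy model only: minimal stand-ins for Spark's Stage/Task hierarchy.
sealed trait Stage { def id: Int; def missingPartitions: Seq[Int] }
case class ShuffleMapStage(id: Int, missingPartitions: Seq[Int]) extends Stage
case class ResultStage(id: Int, missingPartitions: Seq[Int]) extends Stage

sealed trait Task { def stageId: Int; def partitionId: Int }
// A ShuffleMapTask writes shuffle output that child stages read.
case class ShuffleMapTask(stageId: Int, partitionId: Int) extends Task
// A ResultTask computes the final result of the action for one partition.
case class ResultTask(stageId: Int, partitionId: Int) extends Task

// One task per missing partition, with the task type chosen by the stage type.
def tasksFor(stage: Stage): Seq[Task] = stage match {
  case s: ShuffleMapStage => s.missingPartitions.map(p => ShuffleMapTask(s.id, p))
  case s: ResultStage     => s.missingPartitions.map(p => ResultTask(s.id, p))
}
```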
When all of a Stage's parent stages are available, its Tasks can be executed; this is done in submitMissingTasks, shown in the appendix.
Here we only follow the overall flow; the details of DAGScheduler are covered separately, see Spark DAG之SubmitTask.
3. TaskSchedulerImpl submits Tasks
Let's look at TaskSchedulerImpl. TaskScheduler uses a TaskSetManager to manage each TaskSet; its submitTasks method ultimately calls CoarseGrainedSchedulerBackend's launchTasks method to send the tasks to the Executors, as shown below.
Line 234: backend.reviveOffers() sends a ReviveOffers message, which is handled by DriverEndpoint.receive(); that in turn calls DriverEndpoint.makeOffers() and then DriverEndpoint.launchTasks() (see the appendix).
executorDataMap holds the connection information of every Executor; for how Executors get registered into executorDataMap, see Spark 任务调度之创建Executor.
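For orientation, here is a sketch of the per-executor bookkeeping kept in executorDataMap, limited to the fields that makeOffers() and launchTasks() in the appendix actually use. The real ExecutorData class in CoarseGrainedSchedulerBackend has additional fields and varies across Spark versions.

```scala
// Sketch only: the fields of ExecutorData that the scheduling code in the appendix relies on.
class ExecutorData(
    val executorEndpoint: RpcEndpointRef, // RPC handle used to send LaunchTask to the executor
    val executorAddress: RpcAddress,      // host:port of the executor's RPC endpoint
    val executorHost: String,             // host the executor runs on
    var freeCores: Int,                   // currently free cores; decremented when a task is launched
    val totalCores: Int)                  // total cores registered by the executor
```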
The Executor that runs a given Task is decided when the DAG creates that Task; a lock is held at that point so that an Executor cannot die while Task execution locations are being assigned. That decision is exactly the Task.executorId seen at line 320. See section 2.2, determining a Task's execution location, in Spark DAG之SubmitTask.
4. Executor receives Tasks
The CoarseGrainedExecutorBackend process on the Worker node receives the task sent by the Driver and hands it to the Executor object, as shown below.
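Roughly, the receiving side looks like the following (based on the Spark 2.x source of CoarseGrainedExecutorBackend.receive; details differ between versions), mirroring the TaskDescription.encode call made by launchTasks in the appendix:

```scala
// The Driver sent a LaunchTask message carrying a serialized TaskDescription.
case LaunchTask(data) =>
  if (executor == null) {
    exitExecutor(1, "Received LaunchTask command but executor was null")
  } else {
    // Decode the TaskDescription that the driver encoded in launchTasks
    val taskDesc = TaskDescription.decode(data.value)
    logInfo("Got assigned task " + taskDesc.taskId)
    executor.launchTask(this, taskDesc)
  }
```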
The Executor's launchTask method wraps the received TaskDescription in a TaskRunner object; TaskRunner implements Runnable, and the Executor schedules TaskRunners on its thread pool threadPool.
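Condensed, launchTask looks roughly like this (based on the Spark 2.x Executor source; signatures may differ in other versions):

```scala
// Wrap the task in a TaskRunner (a Runnable), track it, and hand it to the thread pool.
def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
  val tr = new TaskRunner(context, taskDescription)
  runningTasks.put(taskDescription.taskId, tr)
  threadPool.execute(tr)
}
```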
For how the Executor itself is created, see Spark 任务调度之创建Executor.
This completes the flow from the RDD action to the Executor object receiving the task.
Summary
We walked through the flow from an RDD action to the Executor receiving the task, omitting the DAG-related parts, which will be covered separately. The overall flow is roughly as follows.
Appendix
---------------RDD.scala-------------
// Reduces the elements of this RDD using the specified commutative and associative binary operator.
def reduce(f: (T, T) => T): T = withScope {
  val cleanF = sc.clean(f)
  val reducePartition: Iterator[T] => Option[T] = iter => {
    if (iter.hasNext) {
      Some(iter.reduceLeft(cleanF))
    } else {
      None
    }
  }
  var jobResult: Option[T] = None
  val mergeResult = (index: Int, taskResult: Option[T]) => {
    if (taskResult.isDefined) {
      jobResult = jobResult match {
        case Some(value) => Some(f(value, taskResult.get))
        case None => taskResult
      }
    }
  }
  // Submit the job to Spark
  sc.runJob(this, reducePartition, mergeResult)
  // Get the final result out of our Option, or throw an exception if the RDD was empty
  jobResult.getOrElse(throw new UnsupportedOperationException("empty collection"))
}
----------------SparkContext.runJob---------------
// Run a job on all partitions in an RDD and pass the results to a handler function.
def runJob[T, U: ClassTag](
    rdd: RDD[T],
    processPartition: Iterator[T] => U,
    resultHandler: (Int, U) => Unit)
{
  val processFunc = (context: TaskContext, iter: Iterator[T]) => processPartition(iter)
  runJob[T, U](rdd, processFunc, 0 until rdd.partitions.length, resultHandler)
}
/**
 * Run a function on a given set of partitions in an RDD and pass the results to the given
 * handler function. This is the main entry point for all actions in Spark.
 *
 * @param rdd target RDD to run tasks on
 * @param func a function to run on each partition of the RDD
 * @param partitions set of partitions to run on; some jobs may not want to compute on all
 *   partitions of the target RDD, e.g. for operations like `first()`
 * @param resultHandler callback to pass each result to
 */
def runJob[T, U: ClassTag](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    resultHandler: (Int, U) => Unit): Unit = {
  if (stopped.get()) {
    throw new IllegalStateException("SparkContext has been shutdown")
  }
  val callSite = getCallSite
  val cleanedFunc = clean(func)
  logInfo("Starting job: " + callSite.shortForm)
  if (conf.getBoolean("spark.logLineage", false)) {
    logInfo("RDD's recursive dependencies:\n" + rdd.toDebugString)
  }
  // Call DAGScheduler to generate the Tasks and submit them
  dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get)
  progressBar.foreach(_.finishAll())
  rdd.doCheckpoint()
}
--------------------DAGScheduler.scala submitMissingTasks()-----------------
if (tasks.size > 0) {
  logInfo(s"Submitting ${tasks.size} missing tasks from $stage (${stage.rdd}) (first 15 " +
    s"tasks are for partitions ${tasks.take(15).map(_.partitionId)})")
  taskScheduler.submitTasks(new TaskSet(
    tasks.toArray, stage.id, stage.latestInfo.attemptNumber, jobId, properties))
} else {
  // Because we posted SparkListenerStageSubmitted earlier, we should mark
  // the stage as completed here in case there are no tasks to run
  markStageAsFinished(stage, None)
  stage match {
    case stage: ShuffleMapStage =>
      logDebug(s"Stage ${stage} is actually done; " +
        s"(available: ${stage.isAvailable}," +
        s"available outputs: ${stage.numAvailableOutputs}," +
        s"partitions: ${stage.numPartitions})")
      markMapStageJobsAsFinished(stage)
    case stage: ResultStage =>
      logDebug(s"Stage ${stage} is actually done; (partitions: ${stage.numPartitions})")
  }
  submitWaitingChildStages(stage)
}
------------TaskSchedulerImpl.submitTasks()--------------------------
override def submitTasks(taskSet: TaskSet) {
  val tasks = taskSet.tasks
  logInfo("Adding task set " + taskSet.id + " with " + tasks.length + " tasks")
  this.synchronized {
    val manager = createTaskSetManager(taskSet, maxTaskFailures)
    val stage = taskSet.stageId
    val stageTaskSets =
      taskSetsByStageIdAndAttempt.getOrElseUpdate(stage, new HashMap[Int, TaskSetManager])
    // Mark all the existing TaskSetManagers of this stage as zombie, as we are adding a new one.
    // This is necessary to handle a corner case. Say a stage has 10 partitions and two
    // TaskSetManagers: TSM1 (zombie) and TSM2 (active). TSM1 has a running task for partition 10
    // and it completes. TSM2 finishes the tasks for partitions 1-9 and thinks it is still active
    // because partition 10 is not done yet. However, DAGScheduler gets task completion events
    // for all 10 partitions and thinks the stage is finished. If it is a shuffle stage and it
    // somehow has missing map outputs, DAGScheduler will resubmit it and create a TSM3 for it.
    // Since a stage cannot have more than one active TaskSetManager, we must mark TSM2 as
    // zombie (which it actually is).
    stageTaskSets.foreach { case (_, ts) =>
      ts.isZombie = true
    }
    stageTaskSets(taskSet.stageAttemptId) = manager
    schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)
    hasReceivedTask = true
  }
  backend.reviveOffers()
}
--------------------------CoarseGrainedSchedulerBackend.reviveOffers()------------------
override def reviveOffers() {
  driverEndpoint.send(ReviveOffers)
}
![在这里插入图片描述](https://img-blog.csdnimg.cn/20190915111929492.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3ByZV90ZW5kZXI=,size_16,color_FFFFFF,t_70)
-------DriverEndpoint.receive(), makeOffers(), launchTasks()-----------
case ReviveOffers =>
  makeOffers()
----------------------------------
// Make fake resource offers on all executors
private def makeOffers() {
  // Take the lock to make sure no executor is killed while a task is being launched on it
  val taskDescs = withLock {
    // Filter out executors under killing
    val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
    val workOffers = activeExecutors.map {
      case (id, executorData) =>
        new WorkerOffer(id, executorData.executorHost, executorData.freeCores,
          Some(executorData.executorAddress.hostPort))
    }.toIndexedSeq
    scheduler.resourceOffers(workOffers)
  }
  if (!taskDescs.isEmpty) {
    launchTasks(taskDescs)
  }
}
---------------
// Launch tasks returned by a set of resource offers
private def launchTasks(tasks: Seq[Seq[TaskDescription]]) {
  for (task <- tasks.flatten) {
    val serializedTask = TaskDescription.encode(task)
    if (serializedTask.limit() >= maxRpcMessageSize) {
      Option(scheduler.taskIdToTaskSetManager.get(task.taskId)).foreach { taskSetMgr =>
        try {
          var msg = "Serialized task %s:%d was %d bytes, which exceeds max allowed: " +
            "spark.rpc.message.maxSize (%d bytes). Consider increasing " +
            "spark.rpc.message.maxSize or using broadcast variables for large values."
          msg = msg.format(task.taskId, task.index, serializedTask.limit(), maxRpcMessageSize)
          taskSetMgr.abort(msg)
        } catch {
          case e: Exception => logError("Exception in error callback", e)
        }
      }
    } else {
      val executorData = executorDataMap(task.executorId)
      executorData.freeCores -= scheduler.CPUS_PER_TASK
      executorData.executorEndpoint.send(LaunchTask(new SerializableBuffer(serializedTask)))
    }
  }
}