How Stages Are Divided
Stage: each job is divided into a number of stages; every stage has its own degree of parallelism, and stages depend on one another. Stage boundaries are determined by the RDD lineage, i.e. by the wide/narrow dependencies between RDDs: RDDs connected by a narrow dependency fall into the same stage, while a wide dependency starts a new stage. When a job is submitted, Spark's DAGScheduler works backwards from the final RDD and derives the job's stages from these wide and narrow dependencies.
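As a concrete illustration (not from the original article), the following minimal word-count job is cut into exactly two stages: flatMap and map create narrow dependencies and stay in one stage, while reduceByKey creates a ShuffleDependency and therefore starts a new one. The input path is a placeholder; toDebugString prints the lineage, and the indentation change marks the shuffle boundary:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("stage-demo").setMaster("local[*]"))

val counts = sc.textFile("input.txt")   // placeholder input path
  .flatMap(_.split(" "))                // narrow dependency: same stage
  .map(word => (word, 1))               // narrow dependency: same stage
  .reduceByKey(_ + _)                   // wide (shuffle) dependency: new stage

// The lineage printout indents at the shuffle boundary between the two stages
println(counts.toDebugString)
counts.collect()                        // an action triggers the DAGScheduler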
Wide vs. Narrow Dependencies in Detail
Enter the submitStage method. submitStage submits a stage, and the stages submitted first are those with no missing parent stages. From submitStage in DAGScheduler.scala:
/** Submits stage, but first recursively submits any missing parents. */
// Starting from the final RDD's stage, recursively find and submit its parent stages
private def submitStage(stage: Stage) {
  val jobId = activeJobForStage(stage)
  if (jobId.isDefined) {
    if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
      val missing = getMissingParentStages(stage).sortBy(_.id) // find the missing parent stages
      if (missing.isEmpty) { // no missing parents: this stage is ready
        submitMissingTasks(stage, jobId.get) // submit this stage's tasks
      } else {
        for (parent <- missing) {
          submitStage(parent) // recursively submit each missing parent first
        }
        waitingStages += stage // park the current stage until its parents finish
      }
    }
  } else {
    abortStage(stage, "No active job for stage " + stage.id, None)
  }
}
If the current stage turns out to have no missing parent stages, its tasks are submitted directly.
As the figure above shows, the dependency between RDD G and RDD F is a wide dependency, so RDD F and RDD G are placed in different stages, whereas the dependency between RDD G and RDD B is a narrow dependency, so RDD B and RDD G are placed in the same stage. Through this recursive traversal, every RDD is assigned to a stage.
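To make the recursion concrete, here is a self-contained sketch, not Spark's actual code, of how walking the lineage backwards and cutting at each ShuffleDependency yields the stages; buildStages and its return shape are invented for illustration:

import org.apache.spark.ShuffleDependency
import org.apache.spark.rdd.RDD

// Sketch: a "stage" is just the set of RDDs reachable via narrow dependencies
def buildStages(finalRdd: RDD[_]): List[Set[RDD[_]]] = {
  // Returns (RDDs in this stage, parent RDDs sitting behind a shuffle boundary)
  def walk(rdd: RDD[_]): (Set[RDD[_]], List[RDD[_]]) =
    rdd.dependencies.foldLeft((Set[RDD[_]](rdd), List.empty[RDD[_]])) {
      case ((members, parents), dep) => dep match {
        case s: ShuffleDependency[_, _, _] =>
          (members, s.rdd :: parents)    // wide dependency: cut a stage here
        case narrow =>
          val (m, p) = walk(narrow.rdd)  // narrow dependency: same stage
          (members ++ m, p ::: parents)
      }
    }

  val (members, shuffleParents) = walk(finalRdd)
  // Parent stages come first, mirroring submitStage's parents-before-children order
  shuffleParents.flatMap(buildStages) :+ members
}

Unlike the real DAGScheduler, this sketch does not deduplicate a parent stage shared by several children; Spark caches ShuffleMapStages by shuffle id to avoid recomputing them.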
ShuffleDependency: a wide dependency; it implies a shuffle.
NarrowDependency: a narrow dependency.
ResultStage: the final stage, produced from the finalRDD.
ShuffleMapStage: any stage other than a ResultStage; it writes shuffle output for its child stages.
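You can check which kind of dependency a transformation creates by inspecting rdd.dependencies directly (a quick sketch, reusing the sc from the earlier snippet):

import org.apache.spark.{OneToOneDependency, ShuffleDependency}

val mapped  = sc.parallelize(1 to 10).map(_ * 2)
val reduced = mapped.map(x => (x % 3, x)).reduceByKey(_ + _)

// map creates a NarrowDependency (here, a OneToOneDependency)
println(mapped.dependencies.head.isInstanceOf[OneToOneDependency[_]])        // true
// reduceByKey creates a ShuffleDependency, i.e. a wide dependency
println(reduced.dependencies.head.isInstanceOf[ShuffleDependency[_, _, _]])  // true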
The main responsibilities of the DAGScheduler:
1. Accept the jobs submitted by the user.
2. Split each job into stages and keep track of which stages have been materialized; the tasks generated within a stage are handed to the TaskScheduler as a TaskSet.
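Both responsibilities can be observed from outside by registering a SparkListener: onStageSubmitted fires each time the DAGScheduler hands a stage's TaskSet to the TaskScheduler (a small sketch, reusing sc from above):

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted, SparkListenerStageSubmitted}

sc.addSparkListener(new SparkListener {
  override def onStageSubmitted(event: SparkListenerStageSubmitted): Unit =
    println(s"stage ${event.stageInfo.stageId} submitted with ${event.stageInfo.numTasks} tasks")
  override def onStageCompleted(event: SparkListenerStageCompleted): Unit =
    println(s"stage ${event.stageInfo.stageId} completed")
})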
TaskSet and TaskScheduler in Detail
Continuing from submitStage, we step into the submitMissingTasks method. It splits the stage into tasks, one per missing partition, wraps them into a TaskSet, and submits that to the TaskScheduler:
private def submitMissingTasks(stage: Stage, jobId: Int) {
  // Figure out which partitions of this stage still need to be computed
  val partitionsToCompute: Seq[Int] = stage.findMissingPartitions()
  ...
  // Mark the current stage as running
  runningStages += stage
  ...
  // Compute each task's preferred locations from its partition's data locality
  val taskIdToLocations: Map[Int, Seq[TaskLocation]] = try {
    stage match {
      case s: ShuffleMapStage =>
        partitionsToCompute.map { id => (id, getPreferredLocs(stage.rdd, id)) }.toMap
      case s: ResultStage =>
        partitionsToCompute.map { id =>
          val p = s.partitions(id)
          (id, getPreferredLocs(stage.rdd, p))
        }.toMap
    }
  } catch {
    case NonFatal(e) =>
      stage.makeNewStageAttempt(partitionsToCompute.size)
      listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties))
      abortStage(stage, s"Task creation failed: $e\n${Utils.exceptionString(e)}", Some(e))
      runningStages -= stage
      return
  }
  ...
  // Wrap each missing partition into a Task
  // (`partitions` here is stage.rdd.partitions, defined in code elided above)
  val tasks: Seq[Task[_]] = try {
    val serializedTaskMetrics = closureSerializer.serialize(stage.latestInfo.taskMetrics).array()
    stage match {
      case stage: ShuffleMapStage =>
        stage.pendingPartitions.clear()
        partitionsToCompute.map { id =>
          val locs = taskIdToLocations(id)
          val part = partitions(id)
          stage.pendingPartitions += id
          new ShuffleMapTask(stage.id, stage.latestInfo.attemptNumber,
            taskBinary, part, locs, properties, serializedTaskMetrics, Option(jobId),
            Option(sc.applicationId), sc.applicationAttemptId, stage.rdd.isBarrier())
        }
      case stage: ResultStage =>
        partitionsToCompute.map { id =>
          val p: Int = stage.partitions(id)
          val part = partitions(p)
          val locs = taskIdToLocations(id)
          new ResultTask(stage.id, stage.latestInfo.attemptNumber,
            taskBinary, part, locs, id, properties, serializedTaskMetrics,
            Option(jobId), Option(sc.applicationId), sc.applicationAttemptId,
            stage.rdd.isBarrier())
        }
    }
  } catch {
    case NonFatal(e) =>
      abortStage(stage, s"Task creation failed: $e\n${Utils.exceptionString(e)}", Some(e))
      runningStages -= stage
      return
  }
  ...
  // Submit the task set to the TaskScheduler
  if (tasks.size > 0) {
    taskScheduler.submitTasks(new TaskSet(
      tasks.toArray, stage.id, stage.latestInfo.attemptNumber, jobId, properties))
  } else {
    // No tasks to run: mark the stage as finished right away
    markStageAsFinished(stage, None)
    stage match {
      case stage: ShuffleMapStage =>
        logDebug(s"Stage ${stage} is actually done; " +
          s"(available: ${stage.isAvailable}," +
          s"available outputs: ${stage.numAvailableOutputs}," +
          s"partitions: ${stage.numPartitions})")
        markMapStageJobsAsFinished(stage)
      case stage: ResultStage =>
        logDebug(s"Stage ${stage} is actually done; (partitions: ${stage.numPartitions})")
    }
    // Submit any child stages that were waiting on this one
    submitWaitingChildStages(stage)
  }
}
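The locality that getPreferredLocs computes for each task is the same information the public API exposes as RDD.preferredLocations; a quick way to inspect it per partition (input path is a placeholder, sc as above):

val lines = sc.textFile("input.txt")   // placeholder input path
lines.partitions.foreach { part =>
  // For an HDFS-backed RDD these are the hosts holding the block replicas
  println(s"partition ${part.index} prefers: ${lines.preferredLocations(part).mkString(", ")}")
}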
ShuffleMapTask: the task type created for a ShuffleMapStage; it computes one partition, writes the output to the shuffle system, and returns a MapStatus recording where the blocks live.
ResultTask: the task type created for a ResultStage; it computes one partition and applies the job's result function to it directly.
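A minimal sketch (invented types, not Spark's real classes) of the contract difference between the two: a ShuffleMapTask computes one partition and reports where its shuffle output lives, while a ResultTask computes one partition and returns the value of the job's result function:

// Invented sketch types; Spark's real Task hierarchy is considerably richer
trait SketchTask[T] { def run(records: Iterator[Any]): T }

// ShuffleMapTask-like: write shuffle output, return a status describing the blocks
case class SketchMapStatus(host: String, bytesPerReducer: Array[Long])
class SketchShuffleMapTask extends SketchTask[SketchMapStatus] {
  def run(records: Iterator[Any]): SketchMapStatus = {
    // ... here the records would be partitioned and written to shuffle files ...
    SketchMapStatus("host-1", Array(records.size.toLong))
  }
}

// ResultTask-like: apply the job's result function (e.g. the one behind collect)
class SketchResultTask[U](func: Iterator[Any] => U) extends SketchTask[U] {
  def run(records: Iterator[Any]): U = func(records)
}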