The previous section covered the execution flow triggered by calling an RDD's collect method: a series of overloaded SparkContext.runJob methods are invoked first, which in turn call DAGScheduler.runJob. That method mainly calls submitJob to submit a job to the job scheduler. DAGScheduler.handleJobSubmitted then handles the submitted job, and one of its most important steps is creating the finalStage and dividing the DAG into Stages.
First, a few related concepts:
In Spark, every RDD (except the root RDDs at the top of the lineage) has dependencies on its parent RDDs. These dependencies fall into two categories: NarrowDependency and ShuffleDependency.
1) Partition: a data partition; the data of an RDD is divided into a number of partitions.
2) NarrowDependency: a narrow dependency, meaning each partition of the child RDD depends on a fixed set of partitions of the parent RDD.
3) ShuffleDependency: a wide (shuffle) dependency, meaning each partition of the child RDD may depend on all partitions of the parent RDD.
4) Stage: a Spark job consists of multiple Stages; each Stage is a set of independent tasks that perform the same computation. The DAGScheduler cuts the DAG into Stages at the points where a shuffle occurs. There are two kinds of Stage: a shuffle map stage, whose task output serves as input to other Stages, and a result stage, whose tasks directly execute the action. A minimal example of how a shuffle splits a job into Stages is sketched right after this list.
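As an illustration (my own example, not taken from the Spark source), the small word-count job below contains exactly one ShuffleDependency, introduced by reduceByKey, so the DAGScheduler cuts it into one shuffle map stage and one result stage. The object name and input path are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

object StageSplitExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("stage-split-example").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // flatMap and map only create NarrowDependencies, so they stay in the same Stage
    val words = sc.textFile("README.md").flatMap(_.split(" "))   // "README.md" is a placeholder input
    val pairs = words.map(w => (w, 1))

    // reduceByKey introduces a ShuffleDependency: everything before it becomes a
    // shuffle map stage, and the final collect() runs in the result stage
    val counts = pairs.reduceByKey(_ + _)
    counts.collect().foreach(println)

    sc.stop()
  }
}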
DAGScheduler.handleJobSubmitted first uses the newStage method to create the finalStage; the most downstream Stage is called the finalStage. The newStage method proceeds as follows:
1) Obtain all parent Stages.
2) Create the current Stage and update stageIdToStage.
3) Update the mapping between this Stage (and its ancestor Stages) and the jobId.
The newStage source code is shown in Listing 1:
private def newStage(
    rdd: RDD[_],
    numTasks: Int,
    shuffleDep: Option[ShuffleDependency[_, _, _]],
    jobId: Int,
    callSite: CallSite): Stage = {
  // Get all parent Stages
  val parentStages = getParentStages(rdd, jobId)
  // Create the current Stage and update stageIdToStage
  val id = nextStageId.getAndIncrement()
  val stage = new Stage(id, rdd, numTasks, shuffleDep, parentStages, jobId, callSite)
  stageIdToStage(id) = stage
  // Update the mapping between this Stage (and its ancestor Stages) and the jobId
  updateJobIdStageIdMaps(jobId, stage)
  stage
}
Listing 1: newStage source code
The getParentStages method called in newStage gets or creates all parent Stages of the given RDD; this is where the Stage division actually happens. getParentStages traverses the given RDD and the RDDs it reaches through narrow dependencies until it encounters a ShuffleDependency, at which point it calls getShuffleMapStage to obtain the parent Stage. In other words, getting or creating the parent Stages is done by getShuffleMapStage; getParentStages is only responsible for locating the ShuffleDependency boundaries. Its source code is shown in Listing 2.
private def getParentStages(rdd: RDD[_], jobId: Int): List[Stage] = {
  val parents = new HashSet[Stage]      // all parent Stages found so far
  val visited = new HashSet[RDD[_]]     // RDDs that have already been visited
  // We are manually maintaining a stack here to prevent StackOverflowError
  // caused by recursively visiting
  val waitingForVisit = new Stack[RDD[_]]  // RDDs waiting to be visited
  def visit(r: RDD[_]) {
    if (!visited(r)) {
      visited += r
      // Kind of ugly: need to register RDDs with the cache here since
      // we can't do it in its constructor because # of partitions is unknown
      for (dep <- r.dependencies) {
        dep match {
          case shufDep: ShuffleDependency[_, _, _] =>
            // For a ShuffleDependency, shufDep.rdd is NOT pushed onto waitingForVisit,
            // so the traversal stops at this boundary; obtaining the parent Stage is
            // delegated to getShuffleMapStage
            parents += getShuffleMapStage(shufDep, jobId)
          case _ =>
            waitingForVisit.push(dep.rdd)
        }
      }
    }
  }
  waitingForVisit.push(rdd)
  while (!waitingForVisit.isEmpty) {
    visit(waitingForVisit.pop())
  }
  parents.toList
}
Listing 2: getParentStages source code
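To make the traversal pattern concrete, the following hypothetical helper (my own sketch, not part of Spark) walks an RDD's lineage with an explicit stack and stops at shuffle boundaries, in the same spirit as getParentStages. It only uses the public dependencies API; the object and method names are assumptions.

import scala.collection.mutable
import org.apache.spark.ShuffleDependency
import org.apache.spark.rdd.RDD

object LineageInspector {
  // Collect the ShuffleDependencies reachable from `rdd` without crossing another shuffle,
  // mirroring how getParentStages stops its traversal at each shuffle boundary.
  def shuffleBoundaries(rdd: RDD[_]): List[ShuffleDependency[_, _, _]] = {
    val found   = mutable.ArrayBuffer.empty[ShuffleDependency[_, _, _]]
    val visited = mutable.HashSet.empty[RDD[_]]
    val stack   = mutable.Stack[RDD[_]](rdd)
    while (stack.nonEmpty) {
      val r = stack.pop()
      if (visited.add(r)) {
        r.dependencies.foreach {
          case shuf: ShuffleDependency[_, _, _] => found += shuf          // stop here, as getParentStages does
          case narrow                           => stack.push(narrow.rdd) // keep walking narrow dependencies
        }
      }
    }
    found.toList
  }
}

For the word-count example above, calling LineageInspector.shuffleBoundaries(counts) would return the single ShuffleDependency introduced by reduceByKey.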
The getShuffleMapStage method called from getParentStages gets or creates a Stage. It proceeds as follows:
1) If a Stage for this ShuffleDependency is already registered in shuffleToMapStage, return that Stage directly.
2) Otherwise, call registerShuffleDependencies to register the ancestor shuffle dependencies, then create and register the current Stage and return it.
The source code of getShuffleMapStage and registerShuffleDependencies is shown in Listing 3.
private def getShuffleMapStage(shuffleDep: ShuffleDependency[_, _, _], jobId: Int): Stage = {
  shuffleToMapStage.get(shuffleDep.shuffleId) match {
    case Some(stage) => stage
    case None =>
      // We are going to register ancestor shuffle dependencies
      registerShuffleDependencies(shuffleDep, jobId)
      // Then register current shuffleDep
      val stage = newOrUsedStage(
        shuffleDep.rdd, shuffleDep.rdd.partitions.size, shuffleDep, jobId,
        shuffleDep.rdd.creationSite)
      shuffleToMapStage(shuffleDep.shuffleId) = stage
      stage
  }
}

private def registerShuffleDependencies(shuffleDep: ShuffleDependency[_, _, _], jobId: Int) = {
  val parentsWithNoMapStage = getAncestorShuffleDependencies(shuffleDep.rdd)
  while (!parentsWithNoMapStage.isEmpty) {
    val currentShufDep = parentsWithNoMapStage.pop()
    val stage = newOrUsedStage(
      currentShufDep.rdd, currentShufDep.rdd.partitions.size, currentShufDep, jobId,
      currentShufDep.rdd.creationSite)
    shuffleToMapStage(currentShufDep.shuffleId) = stage
  }
}
Listing 3: getShuffleMapStage and registerShuffleDependencies source code
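As a rough illustration (my own example, not from the Spark source), a lineage with two chained shuffles shows why ancestors must be registered first: when Stage creation reaches the ShuffleDependency introduced by groupByKey, the earlier shuffle from reduceByKey may not yet be present in shuffleToMapStage, so registerShuffleDependencies registers it before the current one. The object name is a placeholder.

import org.apache.spark.{SparkConf, SparkContext}

object AncestorShuffleExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ancestor-shuffle").setMaster("local[*]"))

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // First shuffle: reduceByKey introduces ShuffleDependency #1
    val summed = pairs.reduceByKey(_ + _)

    // Second shuffle: groupByKey on the re-keyed RDD introduces ShuffleDependency #2,
    // whose ancestor (ShuffleDependency #1) must be registered before it
    val regrouped = summed.map { case (k, v) => (v % 2, k) }.groupByKey()

    // collect() submits a job with three Stages: two shuffle map stages and one result stage
    regrouped.collect().foreach(println)
    sc.stop()
  }
}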
registerShuffleDependencies calls getAncestorShuffleDependencies to find, among all the ancestors the RDD depends on, the ShuffleDependencies that do not yet have a registered Stage, and then creates a Stage for each of them via newOrUsedStage. The getAncestorShuffleDependencies source code is shown in Listing 4.
private def getAncestorShuffleDependencies(rdd: RDD[_]): Stack[ShuffleDependency[_, _, _]] = {
  val parents = new Stack[ShuffleDependency[_, _, _]]
  val visited = new HashSet[RDD[_]]
  // We are manually maintaining a stack here to prevent StackOverflowError
  // caused by recursively visiting
  val waitingForVisit = new Stack[RDD[_]]
  def visit(r: RDD[_]) {
    if (!visited(r)) {
      visited += r
      for (dep <- r.dependencies) {
        dep match {
          case shufDep: ShuffleDependency[_, _, _] =>
            // If this ShuffleDependency is not yet registered in shuffleToMapStage,
            // push it onto parents
            if (!shuffleToMapStage.contains(shufDep.shuffleId)) {
              parents.push(shufDep)
            }
            waitingForVisit.push(shufDep.rdd)
          case _ =>
            waitingForVisit.push(dep.rdd)
        }
      }
    }
  }
  waitingForVisit.push(rdd)
  while (!waitingForVisit.isEmpty) {
    visit(waitingForVisit.pop())
  }
  parents
}
Listing 4: getAncestorShuffleDependencies source code
The second step in newStage is creating the Stage itself. Stages were introduced above, so here we focus on how a Stage is constructed. During construction, StageInfo.fromStage is called. It first calls getNarrowAncestors to collect all RDDs that the current RDD depends on, directly or indirectly, through NarrowDependencies. The returned Seq[RDD[_]] is mapped through RDDInfo.fromRdd to produce an RDDInfo for each ancestor. Then RDDInfo.fromRdd is called on the current RDD itself, and the result is merged with the ancestor RDDInfos into rddInfos. Finally, the StageInfo is created. The fromStage, getNarrowAncestors and fromRdd methods are shown in Listing 5.
def fromStage(stage: Stage, numTasks: Option[Int] = None): StageInfo = {
  val ancestorRddInfos = stage.rdd.getNarrowAncestors.map(RDDInfo.fromRdd)
  val rddInfos = Seq(RDDInfo.fromRdd(stage.rdd)) ++ ancestorRddInfos
  new StageInfo(
    stage.id,
    stage.attemptId,
    stage.name,
    numTasks.getOrElse(stage.numTasks),
    rddInfos,
    stage.details)
}

private[spark] def getNarrowAncestors: Seq[RDD[_]] = {
  val ancestors = new mutable.HashSet[RDD[_]]
  def visit(rdd: RDD[_]) {
    val narrowDependencies = rdd.dependencies.filter(_.isInstanceOf[NarrowDependency[_]])
    val narrowParents = narrowDependencies.map(_.rdd)
    val narrowParentsNotVisited = narrowParents.filterNot(ancestors.contains)
    narrowParentsNotVisited.foreach { parent =>
      ancestors.add(parent)
      visit(parent)
    }
  }
  visit(this)
  // In case there is a cycle, do not include the root itself
  ancestors.filterNot(_ == this).toSeq
}

private[spark] object RDDInfo {
  def fromRdd(rdd: RDD[_]): RDDInfo = {
    val rddName = Option(rdd.name).getOrElse(rdd.id.toString)
    new RDDInfo(rdd.id, rddName, rdd.partitions.size, rdd.getStorageLevel)
  }
}
Listing 5: fromStage, getNarrowAncestors and fromRdd source code
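The StageInfo assembled here eventually reaches user code through the listener bus and the web UI. A minimal sketch, assuming an already-created SparkContext named sc and using only the public SparkListener API, prints the RDDInfos attached to each completed Stage; the listener class name is my own.

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Hypothetical listener: prints the RDD lineage info recorded in each StageInfo
class StageInfoPrinter extends SparkListener {
  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
    val info = event.stageInfo
    val rdds = info.rddInfos.map(r => s"${r.id}:${r.name}").mkString(", ")
    println(s"Stage ${info.stageId} '${info.name}' used RDDs [$rdds]")
  }
}

// sc.addSparkListener(new StageInfoPrinter())  // register before running any jobs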
Finally, newStage calls updateJobIdStageIdMaps to update the mapping between the jobId and the Stages. Its implementation is shown in Listing 6.
private def updateJobIdStageIdMaps(jobId: Int, stage: Stage) {
  def updateJobIdStageIdMapsList(stages: List[Stage]) {
    if (stages.nonEmpty) {
      val s = stages.head
      s.jobIds += jobId
      jobIdToStageIds.getOrElseUpdate(jobId, new HashSet[Int]()) += s.id
      val parents: List[Stage] = getParentStages(s.rdd, jobId)
      val parentsWithoutThisJobId = parents.filter { ! _.jobIds.contains(jobId) }
      updateJobIdStageIdMapsList(parentsWithoutThisJobId ++ stages.tail)
    }
  }
  updateJobIdStageIdMapsList(List(stage))
}
Listing 6: updateJobIdStageIdMaps source code
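The jobId-to-stageIds mapping maintained here can also be observed from the outside, since SparkListenerJobStart carries the ids of all Stages belonging to a submitted job. A minimal sketch (the listener class name is my own, and sc is assumed to exist):

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Hypothetical listener: logs which Stage ids were associated with each submitted job
class JobStagePrinter extends SparkListener {
  override def onJobStart(event: SparkListenerJobStart): Unit = {
    println(s"Job ${event.jobId} -> stages ${event.stageIds.mkString(", ")}")
  }
}

// sc.addSparkListener(new JobStagePrinter())  // register before running any jobs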