Continuing from dagScheduler.handleJobSubmitted in the previous section, start with this key line:
finalStage = newStage(finalRDD, partitions.size, None, jobId, callSite)
This creates a Stage from the last RDD. Here is the core newStage method:
private def newStage(
    rdd: RDD[_],
    numTasks: Int,
    shuffleDep: Option[ShuffleDependency[_, _, _]],
    jobId: Int,
    callSite: CallSite)
  : Stage =
{
  // Get the parent stages of the current rdd
  val parentStages = getParentStages(rdd, jobId)
  val id = nextStageId.getAndIncrement()
  // Create a stage for the current rdd
  val stage = new Stage(id, rdd, numTasks, shuffleDep, parentStages, jobId, callSite)
  // stageId -> Stage
  stageIdToStage(id) = stage
  updateJobIdStageIdMaps(jobId, stage)
  stage
}
Start with getParentStages. Most of the code below exists to compute the set of parent stages of the current Stage:
private def getParentStages(rdd: RDD[_], jobId: Int): List[Stage] = {
  // Parent stages of this rdd
  val parents = new HashSet[Stage]
  // RDDs that have already been visited
  val visited = new HashSet[RDD[_]]
  // Stack of RDDs still waiting to be visited
  val waitingForVisit = new Stack[RDD[_]]
  def visit(r: RDD[_]) {
    if (!visited(r)) {
      visited += r
      for (dep <- r.dependencies) {
        dep match {
          case shufDep: ShuffleDependency[_, _, _] =>
            // shufDep.rdd is the parent rdd on the map side of the shuffle
            parents += getShuffleMapStage(shufDep, jobId)
          case _ =>
            waitingForVisit.push(dep.rdd)
        }
      }
    }
  }
  waitingForVisit.push(rdd)
  while (!waitingForVisit.isEmpty) {
    visit(waitingForVisit.pop())
  }
  parents.toList
}
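Take the earlier WordCount example: its last RDD is a MapPartitionsRDD (created by saveAsTextFile), which we call finalRDD. getParentStages runs as follows:
(1) Push finalRDD onto the stack.
(2) If the stack waitingForVisit is not empty, pop the top element and pass it to visit.
(3) In visit, if the rdd has not been visited yet, add it to the visited set and iterate over its dependencies. If a dependency is narrow, get the parent rdd through it and push that rdd onto waitingForVisit, so it is visited in a later iteration of the while loop. If a dependency is a shuffle dependency (as produced by reduceByKey), call getShuffleMapStage to obtain the parent Stage (a shuffle map stage). Here is getShuffleMapStage in detail: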
private def getShuffleMapStage(shuffleDep: ShuffleDependency[_, _, _], jobId: Int): Stage = {
  shuffleToMapStage.get(shuffleDep.shuffleId) match {
    case Some(stage) => stage
    case None =>
      // We are going to register ancestor shuffle dependencies
      registerShuffleDependencies(shuffleDep, jobId)
      // Then register current shuffleDep
      val stage =
        newOrUsedStage(
          shuffleDep.rdd, shuffleDep.rdd.partitions.size, shuffleDep, jobId,
          shuffleDep.rdd.creationSite)
      shuffleToMapStage(shuffleDep.shuffleId) = stage
      stage
  }
}
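It first looks up the stage in the shuffleToMapStage map. If the stage already exists, it is returned directly. Otherwise the ancestor shuffle dependencies are registered first, and then the current shuffle dependency. Next, the newOrUsedStage method: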
private def newOrUsedStage(
    rdd: RDD[_],
    numTasks: Int,
    shuffleDep: ShuffleDependency[_, _, _],
    jobId: Int,
    callSite: CallSite)
  : Stage =
{
  val stage = newStage(rdd, numTasks, Some(shuffleDep), jobId, callSite)
  if (mapOutputTracker.containsShuffle(shuffleDep.shuffleId)) {
    val serLocs = mapOutputTracker.getSerializedMapOutputStatuses(shuffleDep.shuffleId)
    val locs = MapOutputTracker.deserializeMapStatuses(serLocs)
    for (i <- 0 until locs.size) {
      stage.outputLocs(i) = Option(locs(i)).toList // locs(i) will be null if missing
    }
    stage.numAvailableOutputs = locs.count(_ != null)
  } else {
    logInfo("Registering RDD " + rdd.id + " (" + rdd.getCreationSite + ")")
    mapOutputTracker.registerShuffle(shuffleDep.shuffleId, rdd.partitions.size)
  }
  stage
}
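The first line is the key: val stage = newStage calls the earlier newStage again, which forms a recursion (newStage -> getParentStages -> getShuffleMapStage -> newOrUsedStage -> newStage). The recursion ends once an rdd has no shuffle dependency left among its ancestors. The official illustration is used here to walk through the recursion:
<1> As the figure shows, G is finalRDD, and Stage3 is the finalStage produced by the first call to newStage. rdd_G is pushed onto the stack, then popped and passed to visit.
<2> The loop for (dep <- r.dependencies) finds two dependencies. One is the narrow dependency G->B, so rdd_B is pushed onto the stack directly. The other is the shuffle dependency G->F, which calls getShuffleMapStage; that in turn calls newOrUsedStage, which calls newStage again with shuffleDep.rdd (rdd_F), recursing one level down on rdd_F. The result is Stage2, which is added to parents.
<3> Next the top element rdd_B is popped and passed to visit, and the process is similar. In the end stageIdToStage holds three stageId-to-stage mappings, and each stage holds its parent stages, so the stage dependency graph is complete.
In terms of the numbered procedure for getParentStages, the pop-and-visit in <3> is step (4): repeat step (2) until the stack is empty.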
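The recursion can be sketched on a toy graph. This is a minimal model rather than Spark code: Node, Dep, Narrow, Shuffle, Stage and Splitter are hypothetical stand-ins, and every edge other than G->B (narrow) and G->F (shuffle) is an assumed reading of the figure. Stage ids here start at 0 and follow creation order, so they do not match the Stage1/Stage2/Stage3 labels.

```scala
import scala.collection.mutable

object StageSplitSketch {
  // Hypothetical stand-ins for Spark's Dependency, RDD and Stage.
  sealed trait Dep { def parent: Node }
  final case class Narrow(parent: Node) extends Dep
  final case class Shuffle(parent: Node, shuffleId: Int) extends Dep
  final class Node(val name: String, val deps: List[Dep])
  final class Stage(val id: Int, val rdd: Node, val parents: List[Stage])

  final class Splitter {
    private var nextStageId = 0
    private val shuffleToMapStage = mutable.Map.empty[Int, Stage]
    val stageIdToStage = mutable.Map.empty[Int, Stage]

    def newStage(rdd: Node): Stage = {
      val parents = getParentStages(rdd) // may recurse back into newStage
      val id = nextStageId
      nextStageId += 1
      val stage = new Stage(id, rdd, parents)
      stageIdToStage(id) = stage
      stage
    }

    // Reuse the stage registered for a shuffle, or create and register it.
    private def getShuffleMapStage(dep: Shuffle): Stage =
      shuffleToMapStage.get(dep.shuffleId) match {
        case Some(stage) => stage
        case None =>
          val stage = newStage(dep.parent)
          shuffleToMapStage(dep.shuffleId) = stage
          stage
      }

    // Stack-based traversal that stops at every shuffle boundary.
    private def getParentStages(rdd: Node): List[Stage] = {
      val parents = mutable.LinkedHashSet.empty[Stage]
      val visited = mutable.HashSet.empty[Node]
      var waitingForVisit = List(rdd)
      while (waitingForVisit.nonEmpty) {
        val r = waitingForVisit.head
        waitingForVisit = waitingForVisit.tail
        if (visited.add(r)) {
          r.deps.foreach {
            case s: Shuffle => parents += getShuffleMapStage(s)
            case Narrow(p)  => waitingForVisit = p :: waitingForVisit
          }
        }
      }
      parents.toList
    }
  }

  // Assumed shape of the figure: A -shuffle-> B, C -> D, (D, E) -> F,
  // and G depends narrowly on B and through a shuffle on F.
  val a = new Node("A", Nil)
  val b = new Node("B", List(Shuffle(a, 0)))
  val c = new Node("C", Nil)
  val d = new Node("D", List(Narrow(c)))
  val e = new Node("E", Nil)
  val f = new Node("F", List(Narrow(d), Narrow(e)))
  val g = new Node("G", List(Narrow(b), Shuffle(f, 1)))
}
```

Calling new StageSplitSketch.Splitter().newStage(StageSplitSketch.g) on this graph registers three stages. The final stage has two parents, whose rdd fields are F and A. A stage created for a shuffle records the map-side rdd (shuffleDep.rdd in the real code), which is why the parents are keyed by A and F rather than B.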
Once the stages have been split, and when running on a cluster, dagScheduler.handleJobSubmitted submits finalStage by calling submitStage(finalStage). Here is submitStage:
/** Submits stage, but first recursively submits any missing parents. */
private def submitStage(stage: Stage) {
  val jobId = activeJobForStage(stage)
  if (jobId.isDefined) {
    logDebug("submitStage(" + stage + ")")
    if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
      // Parent stages that have not been submitted yet
      val missing = getMissingParentStages(stage).sortBy(_.id)
      logDebug("missing: " + missing)
      if (missing == Nil) {
        logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
        // No missing parents: submit the tasks of this stage
        submitMissingTasks(stage, jobId.get)
      } else {
        // Recursively submit the parent stages first
        for (parent <- missing) {
          submitStage(parent)
        }
        waitingStages += stage
      }
    }
  } else {
    abortStage(stage, "No active job for stage " + stage.id)
  }
}
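The parents-first order that submitStage produces can be seen in a small model. This is a sketch under assumptions, not the scheduler itself: Stage, Scheduler, available and launched are hypothetical names, and a parent counts as missing simply when it is absent from the available set.

```scala
import scala.collection.mutable

object SubmitOrderSketch {
  // Hypothetical minimal stage: an id plus its parent stages.
  final case class Stage(id: Int, parents: List[Stage])

  final class Scheduler {
    val waitingStages = mutable.LinkedHashSet.empty[Stage]
    val runningStages = mutable.LinkedHashSet.empty[Stage]
    // Stages whose output is assumed to be already computed.
    val available = mutable.HashSet.empty[Stage]
    // Stage ids in the order their tasks would be launched.
    val launched = mutable.ListBuffer.empty[Int]

    private def getMissingParentStages(stage: Stage): List[Stage] =
      stage.parents.filterNot(available.contains)

    def submitStage(stage: Stage): Unit =
      if (!waitingStages(stage) && !runningStages(stage)) {
        val missing = getMissingParentStages(stage).sortBy(_.id)
        if (missing.isEmpty) {
          // Stand-in for submitMissingTasks
          runningStages += stage
          launched += stage.id
        } else {
          missing.foreach(submitStage)
          waitingStages += stage
        }
      }
  }
}
```

With two parentless stages 0 and 1 and a stage 2 that depends on both, submitStage on stage 2 launches stages 0 and 1 and leaves stage 2 in waitingStages. How a waiting stage gets resubmitted once its parents finish is not modeled here.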
Next, the source of submitMissingTasks:
/** Called when stage's parents are available and we can now do its task. */
private def submitMissingTasks(stage: Stage, jobId: Int) {
  // Only the code of interest is kept; taskBinary and properties are
  // defined in the omitted part
  // First figure out the indexes of partition ids to compute.
  val partitionsToCompute: Seq[Int] = {
    if (stage.isShuffleMap) {
      (0 until stage.numPartitions).filter(id => stage.outputLocs(id) == Nil)
    } else {
      val job = stage.resultOfJob.get
      (0 until job.numPartitions).filter(id => !job.finished(id))
    }
  }
  runningStages += stage
  // Build the collection of tasks
  val tasks: Seq[Task[_]] = if (stage.isShuffleMap) {
    partitionsToCompute.map { id =>
      val locs = getPreferredLocs(stage.rdd, id)
      val part = stage.rdd.partitions(id)
      new ShuffleMapTask(stage.id, taskBinary, part, locs)
    }
  } else {
    val job = stage.resultOfJob.get
    partitionsToCompute.map { id =>
      val p: Int = job.partitions(id)
      val part = stage.rdd.partitions(p)
      val locs = getPreferredLocs(stage.rdd, p)
      new ResultTask(stage.id, taskBinary, part, locs, id)
    }
  }
  if (tasks.size > 0) {
    stage.pendingTasks ++= tasks
    // Wrap the tasks in a TaskSet and submit it to the TaskScheduler
    taskScheduler.submitTasks(
      new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.jobId, properties))
    stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
  } else {
    markStageAsFinished(stage, None)
  }
}
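There are two kinds of Task: ShuffleMapTask and ResultTask. A ShuffleMapTask is comparable to a Hadoop map task: it computes one partition of its stage and writes the shuffle output that downstream stages later fetch. A ResultTask runs in the final stage and applies the job's function to one partition, for example writing the data to another storage medium. Once the tasks have been collected, they are wrapped in a TaskSet and submitted to the TaskScheduler.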
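The way submitMissingTasks picks partitions can be reduced to two filters. The sketch below is an illustration only: the case classes are hypothetical stand-ins that keep just the stage id and partition indexes, and host names stand in for map output locations.

```scala
object MissingTasksSketch {
  // Hypothetical stand-ins for the two task types.
  sealed trait Task { def stageId: Int; def partition: Int }
  final case class ShuffleMapTask(stageId: Int, partition: Int) extends Task
  final case class ResultTask(stageId: Int, partition: Int, outputId: Int) extends Task

  // Shuffle map stage: outputLocs(i) is empty while partition i has no
  // map output yet, so only those partitions get a task.
  def shuffleMapTasks(stageId: Int, outputLocs: Vector[List[String]]): Seq[Task] =
    outputLocs.indices
      .filter(i => outputLocs(i).isEmpty)
      .map(i => ShuffleMapTask(stageId, i))

  // Result stage: jobPartitions(id) is the rdd partition that output slot
  // id of the job reads; finished(id) marks slots already computed.
  def resultTasks(
      stageId: Int,
      jobPartitions: Vector[Int],
      finished: Vector[Boolean]): Seq[Task] =
    jobPartitions.indices
      .filter(id => !finished(id))
      .map(id => ResultTask(stageId, jobPartitions(id), id))
}
```

If a shuffle map stage has three partitions and only partition 0 has an output location, shuffleMapTasks yields tasks for partitions 1 and 2. The same filtering means a resubmitted stage only recomputes what is still missing.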