Spark 1.3, from creation to submission: 9) Stage division and submission, source code analysis

Continuing from the previous section's dagScheduler.handleJobSubmitted method, let's start with this core line:

finalStage = newStage(finalRDD, partitions.size, None, jobId, callSite)
This creates a Stage from the last RDD. Let's look at the core newStage method:
  private def newStage(
      rdd: RDD[_],
      numTasks: Int,
      shuffleDep: Option[ShuffleDependency[_, _, _]],
      jobId: Int,
      callSite: CallSite)
    : Stage =
  {
    // get the set of parent stages of the current rdd
    val parentStages = getParentStages(rdd, jobId)
    val id = nextStageId.getAndIncrement()
    // create a stage for the current rdd
    val stage = new Stage(id, rdd, numTasks, shuffleDep, parentStages, jobId, callSite)
    // record the stageId -> Stage mapping
    stageIdToStage(id) = stage
    updateJobIdStageIdMaps(jobId, stage)
    stage
  }
First, let's look at getParentStages; most of the code below computes the set of parent stages for the current stage:
  private def getParentStages(rdd: RDD[_], jobId: Int): List[Stage] = {
    // holds the parent stages of this rdd
    val parents = new HashSet[Stage]
    // marks whether an rdd has already been visited
    val visited = new HashSet[RDD[_]]
    // stack of rdds waiting to be visited
    val waitingForVisit = new Stack[RDD[_]]
    def visit(r: RDD[_]) {
      if (!visited(r)) {
        visited += r
        for (dep <- r.dependencies) {
          dep match {
            case shufDep: ShuffleDependency[_, _, _] =>
              parents += getShuffleMapStage(shufDep, jobId) // shuffleDep.rdd is the parent rdd
            case _ =>
              waitingForVisit.push(dep.rdd)
          }
        }
      }
    }
    waitingForVisit.push(rdd)
    while (!waitingForVisit.isEmpty) {
      visit(waitingForVisit.pop())
    }
    parents.toList
  }
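To make the traversal concrete, here is a minimal, self-contained sketch of the same stack-based walk over a toy dependency graph (ToyRdd, Dep, and parentShuffleRdds are hypothetical names, not Spark API): narrow dependencies are followed within the current stage, while every shuffle dependency marks a parent-stage boundary.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for RDD and Dependency: an rdd is just an id plus
// its dependencies, each flagged as narrow or shuffle.
case class ToyRdd(id: Int, deps: List[Dep] = Nil)
case class Dep(rdd: ToyRdd, isShuffle: Boolean)

// Same shape as getParentStages: walk narrow deps with an explicit stack,
// and record a parent-stage boundary for every shuffle dep encountered.
def parentShuffleRdds(finalRdd: ToyRdd): Set[Int] = {
  val parents = mutable.Set[Int]()
  val visited = mutable.Set[Int]()
  val waitingForVisit = mutable.Stack[ToyRdd](finalRdd)
  while (waitingForVisit.nonEmpty) {
    val r = waitingForVisit.pop()
    if (!visited(r.id)) {
      visited += r.id
      r.deps.foreach {
        case Dep(parent, true)  => parents += parent.id         // shuffle: new parent stage
        case Dep(parent, false) => waitingForVisit.push(parent) // narrow: same stage
      }
    }
  }
  parents.toSet
}

// WordCount-like lineage: only reduceByKey introduces a shuffle.
val hadoopRdd = ToyRdd(0)
val words     = ToyRdd(1, List(Dep(hadoopRdd, isShuffle = false))) // flatMap
val pairs     = ToyRdd(2, List(Dep(words, isShuffle = false)))     // map
val counts    = ToyRdd(3, List(Dep(pairs, isShuffle = true)))      // reduceByKey
val finalRdd  = ToyRdd(4, List(Dep(counts, isShuffle = false)))    // saveAsTextFile
```

On this WordCount-like lineage only the reduceByKey edge is a shuffle, so the final RDD sees exactly one parent-stage boundary.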
Taking the earlier WordCount example, its last RDD is a MapPartitionsRDD (from the saveAsTextFile method); call it finalRDD. getParentStages runs as follows:

(1) Push finalRDD onto the stack.

(2) While the waitingForVisit stack is non-empty, pop the top element and pass it to the visit method.

(3) In visit, if the RDD has not been visited yet, add it to the visited set, then iterate over its dependencies. For a narrow dependency, get the parent RDD from the dependency and push it onto waitingForVisit, to be visited later by the while loop. For a shuffle dependency (e.g. the one introduced by reduceByKey), call getShuffleMapStage to obtain the parent stage (a shuffle map stage). Let's look at getShuffleMapStage in detail:

  private def getShuffleMapStage(shuffleDep: ShuffleDependency[_, _, _], jobId: Int): Stage = {
    shuffleToMapStage.get(shuffleDep.shuffleId) match {
      case Some(stage) => stage
      case None =>
        // We are going to register ancestor shuffle dependencies
        registerShuffleDependencies(shuffleDep, jobId)
        // Then register current shuffleDep
        val stage =
          newOrUsedStage(
            shuffleDep.rdd, shuffleDep.rdd.partitions.size, shuffleDep, jobId,
            shuffleDep.rdd.creationSite)
        shuffleToMapStage(shuffleDep.shuffleId) = stage
 
        stage
    }
  }
First, the stage is looked up by shuffleId in the shuffleToMapStage map; if it already exists, it is returned directly. If not, the ancestor shuffle dependencies are registered first, and then the current shuffle dependency. Next, let's look at the newOrUsedStage method:
  private def newOrUsedStage(
      rdd: RDD[_],
      numTasks: Int,
      shuffleDep: ShuffleDependency[_, _, _],
      jobId: Int,
      callSite: CallSite)
    : Stage =
  {
    val stage = newStage(rdd, numTasks, Some(shuffleDep), jobId, callSite)
    if (mapOutputTracker.containsShuffle(shuffleDep.shuffleId)) {
      val serLocs = mapOutputTracker.getSerializedMapOutputStatuses(shuffleDep.shuffleId)
      val locs = MapOutputTracker.deserializeMapStatuses(serLocs)
      for (i <- 0 until locs.size) {
        stage.outputLocs(i) = Option(locs(i)).toList   // locs(i) will be null if missing
      }
      stage.numAvailableOutputs = locs.count(_ != null)
    } else {
      logInfo("Registering RDD " + rdd.id + " (" + rdd.getCreationSite + ")")
      mapOutputTracker.registerShuffle(shuffleDep.shuffleId, rdd.partitions.size)
    }
    stage
  }

The first line is the key: val stage = newStage calls the earlier newStage again, forming a recursion that only bottoms out when an RDD has no parent dependencies. The official diagram below illustrates the recursive steps:

[Figure: the official stage-division diagram, a DAG of RDDs A-G cut into Stage 1, Stage 2, and Stage 3 at shuffle boundaries]
<1> As shown in the figure, G is the finalRDD, and Stage3 is the finalStage obtained by the first call to newStage. rdd_G is pushed onto the stack, then popped and passed to visit.

<2> The loop for (dep <- r.dependencies) finds two dependencies. One is the narrow dependency G->B, so rdd_B is simply pushed onto the stack. The other is the shuffle dependency G->F, which triggers getShuffleMapStage; that in turn calls newOrUsedStage, which calls newStage again on shuffleDep.rdd (rdd_F), recursing one level deeper. This yields Stage2, which is added to the parents set.

<3> Next, the top element rdd_B is popped and visited; the process is similar to the above. In the end, stageIdToStage holds three stageId -> stage mappings, each stage records its parent stages, and the stage dependency graph is complete.

<4> Steps <2> and <3> repeat until the stack is empty.
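The recursion described in steps <1>-<4> can be replayed with a small self-contained model (Node, ToyStage, buildStage, and parentStages are hypothetical names): buildStage mirrors newStage, and the shuffleToStage map memoizes one stage per shuffle dependency, just as shuffleToMapStage does.

```scala
import scala.collection.mutable

// Hypothetical model: a node keeps its narrow and shuffle parents separately.
case class Node(name: String, narrow: List[Node] = Nil, shuffle: List[Node] = Nil)
case class ToyStage(id: Int, lastRdd: String, parents: List[ToyStage])

val shuffleToStage = mutable.Map[String, ToyStage]() // like shuffleToMapStage
var nextId = 0                                       // like nextStageId

// Mirrors newStage: compute parents first, then create the stage.
def buildStage(rdd: Node): ToyStage = {
  val parents = parentStages(rdd)
  val id = nextId; nextId += 1
  ToyStage(id, rdd.name, parents)
}

// Mirrors getParentStages: DFS through narrow deps; each shuffle dep
// yields a (memoized) parent stage, recursing via buildStage.
def parentStages(rdd: Node): List[ToyStage] = {
  val parents = mutable.ListBuffer[ToyStage]()
  val visited = mutable.Set[String]()
  val stack = mutable.Stack[Node](rdd)
  while (stack.nonEmpty) {
    val r = stack.pop()
    if (!visited(r.name)) {
      visited += r.name
      r.shuffle.foreach { p =>
        parents += shuffleToStage.getOrElseUpdate(p.name, buildStage(p))
      }
      r.narrow.foreach(stack.push)
    }
  }
  parents.toList
}

// The graph from the figure: A -shuffle-> B -narrow-> G; C,D,E feed F
// through narrow deps; F -shuffle-> G.
val a = Node("A")
val b = Node("B", shuffle = List(a))
val c = Node("C")
val d = Node("D", narrow = List(c))
val e = Node("E")
val f = Node("F", narrow = List(d, e))
val g = Node("G", narrow = List(b), shuffle = List(f))

val stage3 = buildStage(g) // the finalStage
```

Running buildStage(g) creates exactly three stages, and the final stage's parents are the stages ending at F and A, matching the diagram.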


After the stages have been divided, in a cluster environment dagScheduler.handleJobSubmitted submits the finalStage via submitStage(finalStage). Let's look at the submitStage method:

  /** Submits stage, but first recursively submits any missing parents. */
  private def submitStage(stage: Stage) {
    val jobId = activeJobForStage(stage)
    if (jobId.isDefined) {
      logDebug("submitStage(" + stage + ")")
      if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
        // get the parent stages that have not yet completed
        val missing = getMissingParentStages(stage).sortBy(_.id)
        logDebug("missing: " + missing)
        if (missing == Nil) {
          logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
          // no missing parents: submit this stage's tasks
          submitMissingTasks(stage, jobId.get)
        } else {
          // recursively submit the parent stages first
          for (parent <- missing) {
            submitStage(parent)
          }
          waitingStages += stage
        }
      }
    } else {
      abortStage(stage, "No active job for stage " + stage.id)
    }
  }
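The parents-first submission order that submitStage enforces can be sketched with a toy model (SimpleStage and submitToyStage are hypothetical names; for simplicity a stage here counts as complete as soon as it is submitted, whereas the real scheduler waits for its tasks to finish):

```scala
import scala.collection.mutable

case class SimpleStage(id: Int, parents: List[SimpleStage] = Nil)

val submitted = mutable.ListBuffer[Int]() // order in which stages reach the task scheduler
val waiting   = mutable.Set[Int]()        // stages parked until their parents finish

def submitToyStage(stage: SimpleStage): Unit = {
  if (!submitted.contains(stage.id) && !waiting(stage.id)) {
    // like getMissingParentStages: parents that have not produced output yet
    val missing = stage.parents.filterNot(p => submitted.contains(p.id)).sortBy(_.id)
    if (missing.isEmpty) {
      submitted += stage.id           // submitMissingTasks(stage)
    } else {
      missing.foreach(submitToyStage) // recursively submit the parents first
      waiting += stage.id             // park until a parent completes
    }
  }
}

val s0 = SimpleStage(0)
val s1 = SimpleStage(1)
val s2 = SimpleStage(2, List(s1, s0))
submitToyStage(s2) // the parents are submitted (in id order); s2 is parked
waiting -= s2.id   // a parent completing re-triggers the waiting child
submitToyStage(s2)
```

The child stage only reaches the task scheduler after both of its parents, reproducing the bottom-up order of the real scheduler.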
Next, let's look at the source of submitMissingTasks:
 /** Called when stage's parents are available and we can now do its task. */
  private def submitMissingTasks(stage: Stage, jobId: Int) {
    // only the code we care about is kept
    // First figure out the indexes of partition ids to compute.
    val partitionsToCompute: Seq[Int] = {
      if (stage.isShuffleMap) {
        (0 until stage.numPartitions).filter(id => stage.outputLocs(id) == Nil)
      } else {
        val job = stage.resultOfJob.get
        (0 until job.numPartitions).filter(id => !job.finished(id))
      }
    }
    runningStages += stage
    // build the collection of tasks
    val tasks: Seq[Task[_]] = if (stage.isShuffleMap) {
      partitionsToCompute.map { id =>
        val locs = getPreferredLocs(stage.rdd, id)
        val part = stage.rdd.partitions(id)
        new ShuffleMapTask(stage.id, taskBinary, part, locs)
      }
    } else {
      val job = stage.resultOfJob.get
      partitionsToCompute.map { id =>
        val p: Int = job.partitions(id)
        val part = stage.rdd.partitions(p)
        val locs = getPreferredLocs(stage.rdd, p)
        new ResultTask(stage.id, taskBinary, part, locs, id)
      }
    }
    if (tasks.size > 0) {
      stage.pendingTasks ++= tasks
      // wrap the tasks into a TaskSet and submit it to the TaskScheduler
      taskScheduler.submitTasks(
        new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.jobId, properties))
      stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
    } else {
      markStageAsFinished(stage, None)
    }
  }
There are two kinds of Task: ShuffleMapTask and ResultTask. A ShuffleMapTask plays a role similar to Hadoop's map task: it computes its partition and writes the output that downstream stages fetch during the shuffle. A ResultTask runs the job's result function on its partition, for example writing the data out to external storage in the saveAsTextFile case. Once the tasks collection is built, it is wrapped into a TaskSet and submitted to the TaskScheduler.
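The partition-filtering logic of partitionsToCompute can be condensed into a toy helper (taskNames is a hypothetical name; real tasks also carry the serialized taskBinary, the Partition object, and locality preferences): a shuffle-map stage gets one ShuffleMapTask per partition with no map output yet, while a result stage gets one ResultTask per unfinished partition.

```scala
// Hypothetical condensation of partitionsToCompute + task construction:
// skip partitions already done, and pick the task type from the stage kind.
def taskNames(isShuffleMap: Boolean, numPartitions: Int, done: Set[Int]): Seq[String] = {
  val todo = (0 until numPartitions).filterNot(done) // partitions still to compute
  val kind = if (isShuffleMap) "ShuffleMapTask" else "ResultTask"
  todo.map(p => s"$kind(partition=$p)")
}
```

For example, a 4-partition shuffle-map stage whose partition 1 already has map output would only resubmit tasks for partitions 0, 2, and 3, which is how the scheduler avoids recomputing available shuffle outputs after a partial failure.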

