Spark Scheduling Mechanism: 3) DAG Scheduling

Javis486

Article link: https://blog.csdn.net/jiangpeng59/article/details/53191137

1. Introduction to the DAG Scheduler

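DAG stands for Directed Acyclic Graph. Spark records the dependencies between RDDs, and these dependencies are directed: they always point from a child RDD to its parent RDD. (The arrows we usually see in diagrams show the direction of data flow rather than the direction of dependency, and the two are exactly opposite.) This directedness gives RDD computation a clear stage-by-stage character, so the resulting computation chain can be split into multiple stages, where a later stage depends on the earlier stages having completed. Because the data inside an RDD is immutable, the directed graph formed by the dependencies between stages naturally contains no cycles.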

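The purpose of DAG scheduling is to split a job into stages, build a DAG from their dependencies, then go inside each stage and divide it into tasks that can be computed in parallel, and finally hand all the tasks of a stage over to the task scheduler, which finishes the work.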
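To make the point about edge direction concrete, here is a minimal sketch in plain Scala (no Spark required). The lineage names and the `dataFlowEdges` helper are hypothetical and exist only for illustration: dependencies are stored from child to parents, and reversing every edge yields the data-flow arrows usually drawn in diagrams.

```scala
object DependencyDirectionSketch {
  // Hypothetical toy lineage: each RDD name maps to the parents it depends on (child -> parents).
  val dependencies: Map[String, List[String]] = Map(
    "textFile" -> Nil,
    "mapped"   -> List("textFile"),
    "reduced"  -> List("mapped")
  )

  // Reversing every dependency edge gives the data-flow edges (parent -> child).
  def dataFlowEdges(deps: Map[String, List[String]]): Set[(String, String)] =
    deps.toList.flatMap { case (child, parents) =>
      parents.map(parent => (parent, child))
    }.toSet
}
```

Calling `DependencyDirectionSketch.dataFlowEdges(DependencyDirectionSketch.dependencies)` returns the two edges `textFile -> mapped` and `mapped -> reduced`.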

2. Communication Mechanism of DAG Scheduling

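DAG scheduling is implemented by the DAGScheduler class, which is instantiated inside SparkContext. While it is being instantiated, DAGScheduler creates an event-processing object (called eventProcessActor here; the code excerpts later in this article refer to it as eventProcessLoop) that is used to send and handle scheduling events such as submitting a job, monitoring a job, and task completion. The messages and the methods that handle them are found in the doOnReceive method of the DAGSchedulerEventProcessLoop class.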

private def doOnReceive(event: DAGSchedulerEvent): Unit = event match {
  case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
    dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)
  // some other code...
}
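The post-then-dispatch pattern described above can be sketched with a blocking queue and pattern matching. This is a simplified illustration, not Spark's implementation: all `Toy*` names are hypothetical, the events carry far fewer fields than the real DAGSchedulerEvent classes, and the loop here drains on the calling thread for simplicity.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable.ListBuffer

object EventLoopSketch {
  // Hypothetical, simplified events used only in this sketch.
  sealed trait ToyEvent
  final case class ToyJobSubmitted(jobId: Int) extends ToyEvent
  final case class ToyJobCancelled(jobId: Int) extends ToyEvent
  case object ToyStop extends ToyEvent

  final class ToyEventLoop {
    private val queue = new LinkedBlockingQueue[ToyEvent]()
    private val handled = ListBuffer.empty[String]

    // Producers only enqueue; they never handle the event themselves.
    def post(event: ToyEvent): Unit = queue.put(event)

    // Takes events in FIFO order and dispatches on their type until ToyStop arrives.
    def runUntilStopped(): List[String] = {
      var running = true
      while (running) {
        queue.take() match {
          case ToyJobSubmitted(id) => handled += s"submitted $id"
          case ToyJobCancelled(id) => handled += s"cancelled $id"
          case ToyStop             => running = false
        }
      }
      handled.toList
    }
  }
}
```

The design point is that callers only post events, while a single consumer handles them one at a time, so the handlers do not need to be written for concurrent access.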

3. How a count Job Is Processed

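Below we trace the execution of count, a commonly used Spark action, to work through the whole DAG scheduling process. The count method is implemented in the abstract class RDD as follows: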

def count(): Long = sc.runJob(this, Utils.getIteratorSize _).sum

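count calls the runJob method of SparkContext. runJob returns an array holding the number of records in each partition (an Array[Long], since Utils.getIteratorSize returns a Long), so calling sum on it gives the total number of records in the RDD.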
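The "one size per partition, then sum" idea can be reproduced without Spark. In the sketch below, `iteratorSize` and `runJobLocally` are hypothetical stand-ins for Utils.getIteratorSize and runJob, and partitions are modelled as plain in-memory sequences.

```scala
import scala.reflect.ClassTag

object CountSketch {
  // Stand-in for Utils.getIteratorSize: counts the elements of one partition's iterator.
  def iteratorSize[T](it: Iterator[T]): Long = {
    var n = 0L
    while (it.hasNext) {
      it.next()
      n += 1L
    }
    n
  }

  // Stand-in for runJob: applies func to every partition and returns one result per partition.
  def runJobLocally[T, U: ClassTag](partitions: Seq[Seq[T]], func: Iterator[T] => U): Array[U] =
    partitions.map(p => func(p.iterator)).toArray

  // Mirrors the shape of RDD.count: per-partition sizes, then a sum.
  def count[T](partitions: Seq[Seq[T]]): Long =
    runJobLocally(partitions, (it: Iterator[T]) => iteratorSize(it)).sum
}
```

For example, `CountSketch.count(Seq(Seq(1, 2, 3), Seq.empty[Int], Seq(4, 5)))` sums the per-partition sizes 3, 0 and 2.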

SparkContext has several overloads of runJob, but all of them eventually call DAGScheduler's runJob. The following is the runJob method of the SparkContext class:

 def runJob[T, U: ClassTag](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      resultHandler: (Int, U) => Unit): Unit = {
    if (stopped.get()) {
      throw new IllegalStateException("SparkContext has been shutdown")
    }
    val callSite = getCallSite
    val cleanedFunc = clean(func)
    logInfo("Starting job: " + callSite.shortForm)
    if (conf.getBoolean("spark.logLineage", false)) {
      logInfo("RDD's recursive dependencies:\n" + rdd.toDebugString)
    }
    dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get)
    progressBar.foreach(_.finishAll())
    rdd.doCheckpoint()
  }

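SparkContext.runJob first obtains the call site, which is used later for logging and debugging. It then cleans the closure of func so that the function can be serialized, and calls DAGScheduler.runJob to hand the job over to the DAG scheduler. (A closure can be thought of as a function that is able to read the local variables of its enclosing function.)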
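As a small illustration of what a closure is, consider the sketch below. The name `makeScaler` is hypothetical; the returned function keeps reading `factor`, a parameter of the enclosing method, after that method has returned. Any value captured this way has to travel with the function when it is serialized, which is why the closure is cleaned before the job is submitted.

```scala
object ClosureSketch {
  // The returned function captures `factor` from the enclosing method.
  def makeScaler(factor: Int): Int => Int =
    (x: Int) => x * factor
}
```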

From here, execution moves from the SparkContext class into the DAGScheduler class:

def runJob[T, U](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      callSite: CallSite,
      resultHandler: (Int, U) => Unit,
      properties: Properties): Unit = {
    val start = System.nanoTime
    val waiter = submitJob(rdd, func, partitions, callSite, resultHandler, properties)
    waiter.awaitResult() match {
      case JobSucceeded =>
        logInfo("Job %d finished: %s, took %f s".format
          (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
      case JobFailed(exception: Exception) =>
        logInfo("Job %d failed: %s, took %f s".format
          (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
        // SPARK-8644: Include user stack trace in exceptions coming from DAGScheduler.
        val callerStackTrace = Thread.currentThread().getStackTrace.tail
        exception.setStackTrace(exception.getStackTrace ++ callerStackTrace)
        throw exception
    }
  }

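runJob in turn calls submitJob to submit the job. submitJob returns waiter, an instance of the JobWaiter class, which serves two main purposes:

1. Cancelling a job by sending a JobCancelled request message through eventProcessLoop.

2. Blocking the thread that called DAGScheduler.runJob until the submitted job has finished.

Internally, JobWaiter listens for the completion of every task and counts how many tasks have finished. After each task completes, it calls the callback resultHandler to process that task's result. When the job finishes or fails, the blocking ends and the job result is returned to runJob.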

The following are several methods of the JobWaiter class:

  override def taskSucceeded(index: Int, result: Any): Unit = synchronized {
    if (_jobFinished) {
      throw new UnsupportedOperationException("taskSucceeded() called on a finished JobWaiter")
    }
    resultHandler(index, result.asInstanceOf[T])
    finishedTasks += 1
    if (finishedTasks == totalTasks) {
      _jobFinished = true
      jobResult = JobSucceeded
      this.notifyAll()
    }
  }

  override def jobFailed(exception: Exception): Unit = synchronized {
    _jobFinished = true
    jobResult = JobFailed(exception)
    this.notifyAll()
  }

  def awaitResult(): JobResult = synchronized {
    while (!_jobFinished) {
      this.wait()
    }
    return jobResult
  }
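To see the blocking behaviour end to end, here is a minimal, self-contained sketch built on the same wait/notifyAll pattern. `ToyWaiter` and `countWithWaiter` are hypothetical names; failure handling and cancellation are left out, and a single background thread plays the role of the cluster reporting task results.

```scala
object JobWaiterSketch {
  // Hypothetical simplified waiter: counts finished tasks and wakes the waiting thread
  // once every task has reported its result.
  final class ToyWaiter[T](totalTasks: Int, resultHandler: (Int, T) => Unit) {
    private var finishedTasks = 0
    // A job with zero tasks is treated as finished from the start.
    private var jobFinished = totalTasks == 0

    def taskSucceeded(index: Int, result: T): Unit = synchronized {
      resultHandler(index, result)
      finishedTasks += 1
      if (finishedTasks == totalTasks) {
        jobFinished = true
        notifyAll()
      }
    }

    // Blocks the caller until the last task has reported.
    def awaitResult(): Unit = synchronized {
      while (!jobFinished) wait()
    }
  }

  // Reports one result per partition from a background thread, then sums the sizes.
  def countWithWaiter(partitions: Seq[Seq[Int]]): Long = {
    val sizes = new Array[Long](partitions.size)
    val waiter = new ToyWaiter[Long](partitions.size, (i, n) => { sizes(i) = n })
    val worker = new Thread(() =>
      partitions.zipWithIndex.foreach { case (p, i) =>
        waiter.taskSucceeded(i, p.size.toLong)
      }
    )
    worker.start()
    waiter.awaitResult()
    sizes.sum
  }
}
```

Because the result handler runs inside the same synchronized block that later wakes the waiting thread, the caller sees every written size once `awaitResult` returns.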

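Next, look at submitJob, shown below. It first makes sure that the requested partition indices fall within the valid range, and then assigns an id to the job. If the job covers zero partitions, it immediately returns a JobWaiter whose awaitResult reports success as soon as it is called. Otherwise it creates a new JobWaiter, posts a JobSubmitted event through eventProcessLoop, and then returns the JobWaiter to runJob.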

def submitJob[T, U](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      callSite: CallSite,
      resultHandler: (Int, U) => Unit,
      properties: Properties): JobWaiter[U] = {
    // Check to make sure we are not launching a task on a partition that does not exist.
    val maxPartitions = rdd.partitions.length
    partitions.find(p => p >= maxPartitions || p < 0).foreach { p =>
      throw new IllegalArgumentException(
        "Attempting to access a non-existent partition: " + p + ". " +
          "Total number of partitions: " + maxPartitions)
    }

    val jobId = nextJobId.getAndIncrement()
    if (partitions.size == 0) {
      // Return immediately if the job is running 0 tasks
      return new JobWaiter[U](this, jobId, 0, resultHandler)
    }

    assert(partitions.size > 0)
    val func2 = func.asInstanceOf[(TaskContext, Iterator[_]) => _]
    val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler)
    eventProcessLoop.post(JobSubmitted(
      jobId, rdd, func2, partitions.toArray, callSite, waiter,
      SerializationUtils.clone(properties)))
    waiter
  }
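The partition check at the top of submitJob can be isolated into a small pure function. The sketch below is an illustration only: `validate` is a hypothetical helper that returns an Either instead of throwing, so that both outcomes are easy to inspect.

```scala
object PartitionCheckSketch {
  // Left carries an error message for the first out-of-range index;
  // Right carries the number of tasks the job would run.
  def validate(partitions: Seq[Int], maxPartitions: Int): Either[String, Int] =
    partitions.find(p => p >= maxPartitions || p < 0) match {
      case Some(p) =>
        Left(s"Attempting to access a non-existent partition: $p. " +
          s"Total number of partitions: $maxPartitions")
      case None =>
        Right(partitions.size)
    }
}
```

A result of `Right(0)` corresponds to the early-return branch in submitJob, where no task is launched at all.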

As described earlier, the JobSubmitted message is received and handled by the doOnReceive method of DAGSchedulerEventProcessLoop, which calls DAGScheduler.handleJobSubmitted to process it. The code of handleJobSubmitted is as follows:

 private[scheduler] def handleJobSubmitted(jobId: Int,
      finalRDD: RDD[_],
      func: (TaskContext, Iterator[_]) => _,
      partitions: Array[Int],
      callSite: CallSite,
      listener: JobListener,
      properties: Properties) {
    var finalStage: ResultStage = null
    try {
      // New stage creation may throw an exception if, for example, jobs are run on a
      // HadoopRDD whose underlying HDFS files have been deleted.
      finalStage = newResultStage(finalRDD, func, partitions, jobId, callSite)
    } catch {
      // omitted: code that handles the exception
    }
    val job = new ActiveJob(jobId, finalStage, callSite, listener, properties)
    clearCacheLocs()
    // omitted: logInfo calls
    val jobSubmissionTime = clock.getTimeMillis()
    jobIdToActiveJob(jobId) = job
    activeJobs += job
    finalStage.setActiveJob(job)
    val stageIds = jobIdToStageIds(jobId).toArray
    val stageInfos = stageIds.flatMap(id => stageIdToStage.get(id).map(_.latestInfo))
    listenerBus.post(
      SparkListenerJobStart(job.jobId, jobSubmissionTime, stageInfos, properties))
    submitStage(finalStage)
    submitWaitingStages()
  }

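The first thing handleJobSubmitted does is call newResultStage to divide the job into stages, producing the variable finalStage, which represents the final stage. finalStage stores information about the final stage itself and may also hold its parent stages, which in turn hold their own parents, so finalStage in effect already contains the DAG we are after. (For details, see the discussion of stage division.)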

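Once the stages have been divided, the program turns the current job into an active job (ActiveJob). The main difference between an active job and an ordinary job is that the former holds the result of stage division (finalStage). This differs slightly from Spark 1.4: there is no runLocally method for actions that consist of a single stage, and everything goes through submitStage.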

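submitStage checks whether the given stage has parent stages that have not yet been executed. If it does, it calls submitStage(parent) so that the parents run first, and puts the stage itself into the waitingStages queue to be picked up later. If all preceding stages have already finished, it calls submitMissingTasks directly to run the tasks of the current stage.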

  /** Submits stage, but first recursively submits any missing parents. */
  private def submitStage(stage: Stage) {
    val jobId = activeJobForStage(stage)
    if (jobId.isDefined) {
      logDebug("submitStage(" + stage + ")")
      if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
        val missing = getMissingParentStages(stage).sortBy(_.id)
        logDebug("missing: " + missing)
        if (missing.isEmpty) {
          logInfo("Submitting " + stage + " (" + stage.rdd + "), which has no missing parents")
          submitMissingTasks(stage, jobId.get)
        } else {
          for (parent <- missing) {
            submitStage(parent)
          }
          waitingStages += stage
        }
      }
    } else {
      abortStage(stage, "No active job for stage " + stage.id, None)
    }
  }
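The recursion in submitStage is easier to follow on a toy model. The sketch below is a simplified simulation under stated assumptions, not Spark's code: `ToyStage` and `ToyScheduler` are hypothetical, a "running" stage completes as soon as it is dequeued, and every completion resubmits the waiting stages (standing in for what submitWaitingStages does in the real scheduler).

```scala
import scala.collection.mutable

object SubmitStageSketch {
  // Hypothetical toy stage: parents are the stages whose output this stage needs.
  final case class ToyStage(id: Int, parents: List[ToyStage])

  final class ToyScheduler {
    private val waiting = mutable.LinkedHashSet.empty[ToyStage]
    private val running = mutable.Queue.empty[ToyStage]
    private val finished = mutable.LinkedHashSet.empty[ToyStage]

    private def missingParents(stage: ToyStage): List[ToyStage] =
      stage.parents.filterNot(p => finished.contains(p)).sortBy(_.id)

    // Same shape as submitStage: run now if no parent is missing,
    // otherwise submit the parents first and park this stage.
    def submitStage(stage: ToyStage): Unit = {
      if (!waiting.contains(stage) && !running.contains(stage) && !finished.contains(stage)) {
        val missing = missingParents(stage)
        if (missing.isEmpty) {
          running += stage
        } else {
          missing.foreach(submitStage)
          waiting += stage
        }
      }
    }

    // Simulates completion events and returns stage ids in the order they ran.
    def runToCompletion(finalStage: ToyStage): List[Int] = {
      submitStage(finalStage)
      val order = mutable.ListBuffer.empty[Int]
      while (running.nonEmpty) {
        val done = running.dequeue()
        finished += done
        order += done.id
        val retry = waiting.toList.sortBy(_.id)
        waiting.clear()
        retry.foreach(submitStage)
      }
      order.toList
    }
  }
}
```

In this model a stage never runs before all of its parents have finished, which is the guarantee the waitingStages queue provides.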

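submitMissingTasks is responsible for dividing a stage into multiple tasks and handing them to the cluster for execution. For details, see the chapter on task scheduling.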