Spark Source Code Reading Notes: Task Scheduling (Part 1)

This post mainly discusses the objects involved in Spark task scheduling.

The objects of Spark task scheduling, from largest to smallest, are the application (Application), the job (Job), the stage (Stage), and the task (Task). A Spark program written by the user is an application, and an application can be submitted to a cluster (YARN, Mesos, or Spark's standalone cluster) to run. While it runs, a Spark application can execute multiple Spark jobs; a job is split into multiple stages, and each stage is a set of tasks that perform the same computation.

Application

An application is a Spark program submitted by the user, for example the WordCount program below:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    val rdd = sc.textFile("data.txt") // your data path
    // flatMap/map/reduceByKey are transformations; collect is the action that triggers the job
    val wordCount = rdd.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _).collect()
    sc.stop()
  }
}

Job

A Spark application can submit multiple Spark jobs. A job is triggered by an RDD action, such as collect in the example above. Transformations only convert one RDD into another and do not trigger a job: in the example above, map and flatMap each turn the parent RDD into a MapPartitionsRDD, while reduceByKey usually turns the parent RDD into a ShuffledRDD (in special cases it produces a MapPartitionsRDD instead, avoiding the shuffle). For a detailed introduction to RDDs, see the companion post Spark源码阅读笔记(RDD).
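
The following sketch illustrates the distinction, reusing rdd from the WordCount example above (the intermediate variable names are illustrative):

// Transformations are lazy: each line only builds a new RDD, no job runs yet.
val words  = rdd.flatMap(_.split("\\s+"))  // MapPartitionsRDD
val pairs  = words.map((_, 1))             // MapPartitionsRDD
val counts = pairs.reduceByKey(_ + _)      // usually ShuffledRDD
// Actions trigger job submission: each of the two lines below runs one job.
val result = counts.collect()
val total  = counts.count()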

The collect operator of RDD:

/**
   * Return an array that contains all of the elements in this RDD.
   */
  def collect(): Array[T] = {
    val results = sc.runJob(this, (iter: Iterator[T]) => iter.toArray)
    Array.concat(results: _*)
  }

An RDD action triggers job submission by calling one of SparkContext's runJob methods, and all of SparkContext's job-submission methods ultimately call either submitJob or runApproximateJob on DAGScheduler. submitJob submits the job without blocking and returns a JobWaiter; the caller can invoke the JobWaiter's awaitResult method to wait for the job to finish and obtain the result, or its cancel method to cancel the running job. runApproximateJob submits the job and blocks for a given timeout; when the timeout expires the caller gets an approximate result, or the complete result if the job has already finished.
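
As a sketch of the public path into submitJob (the input path and partition indices are illustrative; this runJob overload is from Spark 1.x, where allowLocal is still a parameter, and the example assumes the file yields at least two partitions):

// sc.runJob is the public entry point; internally it reaches DAGScheduler.submitJob
// and blocks on the returned JobWaiter until the job finishes.
val lineLengthSums: Array[Int] = sc.runJob(
  sc.textFile("data.txt"),                            // input RDD
  (iter: Iterator[String]) => iter.map(_.length).sum, // func applied to each partition
  Seq(0, 1),                                          // indices of the partitions to process
  allowLocal = false)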

The submitJob method:

/**
   * Submit a job to the job scheduler and get a JobWaiter object back. The JobWaiter object
   * can be used to block until the job finishes executing or can be used to cancel the job.
   */
  def submitJob[T, U](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      callSite: CallSite,
      allowLocal: Boolean,
      resultHandler: (Int, U) => Unit,
      properties: Properties): JobWaiter[U] = {
    // Check to make sure we are not launching a task on a partition that does not exist.
    val maxPartitions = rdd.partitions.length
    partitions.find(p => p >= maxPartitions || p < 0).foreach { p =>
      throw new IllegalArgumentException(
        "Attempting to access a non-existent partition: " + p + ". " +
          "Total number of partitions: " + maxPartitions)
    }

    val jobId = nextJobId.getAndIncrement()
    if (partitions.size == 0) {
      return new JobWaiter[U](this, jobId, 0, resultHandler)
    }

    assert(partitions.size > 0)
    val func2 = func.asInstanceOf[(TaskContext, Iterator[_]) => _]
    val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler)
    eventProcessLoop.post(JobSubmitted(
      jobId, rdd, func2, partitions.toArray, allowLocal, callSite, waiter, properties))
    waiter
  }

Parameters:

  • rdd (RDD[T]): the RDD the job runs on
  • func ((TaskContext, Iterator[T]) => U): the function applied to each partition of the RDD
  • partitions (Seq[Int]): indices of the partitions to process
  • callSite (CallSite): the call site in user code that invoked the Spark API
  • allowLocal (Boolean): whether the job is allowed to run locally on the driver
  • resultHandler ((Int, U) => Unit): the result-handling function
  • properties (Properties): context properties attached to the job

Return value:

  • JobWaiter[U]: an object for monitoring the job's execution

The runApproximateJob method:

def runApproximateJob[T, U, R](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      evaluator: ApproximateEvaluator[U, R],
      callSite: CallSite,
      timeout: Long,
      properties: Properties): PartialResult[R] = {
    val listener = new ApproximateActionListener(rdd, func, evaluator, timeout)
    val func2 = func.asInstanceOf[(TaskContext, Iterator[_]) => _]
    val partitions = (0 until rdd.partitions.size).toArray
    val jobId = nextJobId.getAndIncrement()
    eventProcessLoop.post(JobSubmitted(
      jobId, rdd, func2, partitions, allowLocal = false, callSite, listener, properties))
    listener.awaitResult()    // Will throw an exception if the job fails
  }

Parameters:

  • rdd (RDD[T]): the RDD the job runs on
  • func ((TaskContext, Iterator[T]) => U): the function applied to each partition of the RDD
  • evaluator (ApproximateEvaluator[U, R]): the implementation that computes and evaluates the approximate result
  • callSite (CallSite): the call site in user code that invoked the Spark API
  • timeout (Long): the timeout in milliseconds
  • properties (Properties): context properties attached to the job

Return value:

  • PartialResult[R]: the partial result
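
For example, the user-facing countApprox action is built on runApproximateJob; a minimal usage sketch, assuming rdd is the RDD from the WordCount example and an arbitrary timeout of one second:

import org.apache.spark.partial.{BoundedDouble, PartialResult}

// countApprox submits a count job through runApproximateJob and returns within
// the timeout, possibly with only a partial (extrapolated) answer.
val partial: PartialResult[BoundedDouble] = rdd.countApprox(timeout = 1000L)
println(partial.initialValue) // the approximate count, with confidence bounds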

Several concepts matter when a job is submitted: the function applied to the RDD (func: (TaskContext, Iterator[T]) => U), the job listener (JobListener), the result-handling function (resultHandler: (Int, U) => Unit), and the approximate evaluator (evaluator: ApproximateEvaluator[U, R]).

The function func applied to the RDD

An RDD is made up of many partitions (Partition), and computing a partition yields a corresponding Iterator[T]. Parallel computation over an RDD therefore amounts to processing the Iterator[T] produced by each partition and then aggregating the results. func processes the Iterator[T] generated by each partition of the final RDD and produces a value of type U, so there are as many U values as there are partitions. func is usually derived from an RDD action; for example, the func generated by collect is (iter: Iterator[T]) => iter.toArray.
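
To make this concrete, here is a minimal sketch that re-expresses collect through runJob (myCollect is a hypothetical helper, not part of Spark):

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// func turns each partition's Iterator[T] into an Array[T], producing one U value
// (here Array[T]) per partition; the driver then concatenates them.
def myCollect[T: ClassTag](rdd: RDD[T]): Array[T] = {
  val perPartition: Array[Array[T]] =
    rdd.sparkContext.runJob(rdd, (iter: Iterator[T]) => iter.toArray)
  Array.concat(perPartition: _*)
}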

The result handler resultHandler

Aggregates the per-partition results produced by func; this function is invoked from within the JobListener.
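
SparkContext also offers a runJob overload that takes a resultHandler directly; a minimal sketch, assuming rdd is the RDD[String] from the WordCount example (the per-partition counting logic is illustrative):

// The resultHandler receives (partitionIndex, U) once per finished partition,
// here storing each partition's element count into an array on the driver.
val counts = new Array[Long](rdd.partitions.length)
sc.runJob(
  rdd,
  (iter: Iterator[String]) => iter.size.toLong,       // func: one Long per partition
  (index: Int, count: Long) => counts(index) = count) // resultHandler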

The job listener JobListener: when a task completes successfully, DAGScheduler calls the listener's taskSucceeded method; if the job fails, it calls jobFailed.

  • The interface:
/**
 * Interface used to listen for job completion or failure events after submitting a job to the
 * DAGScheduler. The listener is notified each time a task succeeds, as well as if the whole
 * job fails (and no further taskSucceeded events will happen).
 */
private[spark] trait JobListener {
  def taskSucceeded(index: Int, result: Any)
  def jobFailed(exception: Exception)
}
  • There are two implementations, JobWaiter and ApproximateActionListener. JobWaiter is implemented as follows:
/**
 * An object that waits for a DAGScheduler job to complete. As tasks finish, it passes their
 * results to the given handler function.
 */
private[spark] class JobWaiter[T](
    dagScheduler: DAGScheduler,
    val jobId: Int,
    totalTasks: Int,
    resultHandler: (Int, T) => Unit)
  extends JobListener {

  private var finishedTasks = 0

  // Is the job as a whole finished (succeeded or failed)?
  @volatile
  private var _jobFinished = totalTasks == 0

  def jobFinished = _jobFinished

  // If the job is finished, this will be its result. In the case of 0 task jobs (e.g. zero
  // partition RDDs), we set the jobResult directly to JobSucceeded.
  private var jobResult: JobResult = if (jobFinished) JobSucceeded else null

  /**
   * Sends a signal to the DAGScheduler to cancel the job. The cancellation itself is handled
   * asynchronously. After the low level scheduler cancels all the tasks belonging to this job, it
   * will fail this job with a SparkException.
   */
  def cancel() {
    dagScheduler.cancelJob(jobId)
  }

  override def taskSucceeded(index: Int, result: Any): Unit = synchronized {
    if (_jobFinished) {
      throw new UnsupportedOperationException("taskSucceeded() called on a finished JobWaiter")
    }
    resultHandler(index, result.asInstanceOf[T])
    finishedTasks += 1
    if (finishedTasks == totalTasks) {
      _jobFinished = true
      jobResult = JobSucceeded
      this.notifyAll()
    }
  }

  override def jobFailed(exception: Exception): Unit = synchronized {
    _jobFinished = true
    jobResult = JobFailed(exception)
    this.notifyAll()
  }

  def awaitResult(): JobResult = synchronized {
    while (!_jobFinished) {
      this.wait()
    }
    return jobResult
  }
}
  • The ApproximateActionListener code:
/**
 * A JobListener for an approximate single-result action, such as count() or non-parallel reduce().
 * This listener waits up to timeout milliseconds and will return a partial answer even if the
 * complete answer is not available by then.
 *
 * This class assumes that the action is performed on an entire RDD[T] via a function that computes
 * a result of type U for each partition, and that the action returns a partial or complete result
 * of type R. Note that the type R must *include* any error bars on it (e.g. see BoundedInt).
 */
private[spark] class ApproximateActionListener[T, U, R](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    evaluator: ApproximateEvaluator[U, R],
    timeout: Long)
  extends JobListener {

  val startTime = System.currentTimeMillis()
  val totalTasks = rdd.partitions.size
  var finishedTasks = 0
  var failure: Option[Exception] = None             // Set if the job has failed (permanently)
  var resultObject: Option[PartialResult[R]] = None // Set if we've already returned a PartialResult

  override def taskSucceeded(index: Int, result: Any) {
    synchronized {
      evaluator.merge(index, result.asInstanceOf[U])
      finishedTasks += 1
      if (finishedTasks == totalTasks) {
        // If we had already returned a PartialResult, set its final value
        resultObject.foreach(r => r.setFinalValue(evaluator.currentResult()))
        // Notify any waiting thread that may have called awaitResult
        this.notifyAll()
      }
    }
  }

  override def jobFailed(exception: Exception) {
    synchronized {
      failure = Some(exception)
      this.notifyAll()
    }
  }

  /**
   * Waits for up to timeout milliseconds since the listener was created and then returns a
   * PartialResult with the result so far. This may be complete if the whole job is done.
   */
  def awaitResult(): PartialResult[R] = synchronized {
    val finishTime = startTime + timeout
    while (true) {
      val time = System.currentTimeMillis()
      if (failure.isDefined) {
        throw failure.get
      } else if (finishedTasks == totalTasks) {
        return new PartialResult(evaluator.currentResult(), true)
      } else if (time >= finishTime) {
        resultObject = Some(new PartialResult(evaluator.currentResult(), false))
        return resultObject.get
      } else {
        this.wait(finishTime - time)
      }
    }
    // Should never be reached, but required to keep the compiler happy
    return null
  }
}
  • The approximate evaluator ApproximateEvaluator (a sketch implementation follows after the code):
/**
 * An object that computes a function incrementally by merging in results of type U from multiple
 * tasks. Allows partial evaluation at any point by calling currentResult().
 */
private[spark] trait ApproximateEvaluator[U, R] {
  def merge(outputId: Int, taskResult: U): Unit
  def currentResult(): R
}
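
As a minimal sketch of how this interface can be implemented (SimpleCountEvaluator is a hypothetical class; Spark's own CountEvaluator additionally computes confidence bounds), the evaluator below sums per-partition counts and, if queried before all tasks have finished, extrapolates from the partitions merged so far:

// Merges per-partition counts; currentResult() scales the running sum by the
// fraction of partitions merged so far, yielding an approximate total count.
private[spark] class SimpleCountEvaluator(totalOutputs: Int)
  extends ApproximateEvaluator[Long, Double] {

  private var outputsMerged = 0
  private var sum: Long = 0L

  override def merge(outputId: Int, taskResult: Long): Unit = {
    outputsMerged += 1
    sum += taskResult
  }

  override def currentResult(): Double =
    if (outputsMerged == 0) 0.0
    else sum.toDouble * totalOutputs / outputsMerged
}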