Spark-Core Source Code Study Notes 5: Task Launch and Series Recap

Spark-Core Source Code Study Notes

This series records a review of the Spark source code. The goal is to lay out the mechanism and flow by which Spark distributes and runs a program, tracing the key pieces of source code so that we understand not only what happens but why; side branches are only described in words and not drilled into, to keep the main thread clear.
At the end of the previous article we arrived at the Executor's launchTask method. In this article we step into that method, finish tracing the task launch flow, and look at how the final result is handled.

TaskRunner

Picking up where the last article left off, we go straight into the construction of TaskRunner and its overridden run method:

def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
  // Wrap the TaskDescription and the ExecutorBackend (CoarseGrainedExecutorBackend by default)
  // into a TaskRunner, which implements Runnable
  val tr = new TaskRunner(context, taskDescription)
  // Register it in runningTasks, the ConcurrentHashMap tracking all tasks running on this executor
  runningTasks.put(taskDescription.taskId, tr)
  // Hand the TaskRunner to the thread pool for execution
  threadPool.execute(tr)
}
class TaskRunner(
    execBackend: ExecutorBackend,
    private val taskDescription: TaskDescription)
  extends Runnable {
  val taskId = taskDescription.taskId
  val threadName = s"Executor task launch worker for task $taskId"
  private val taskName = taskDescription.name
  /** Whether this task has been finished. */
  @GuardedBy("TaskRunner.this")
  private var finished = false

  override def run(): Unit = {
    ... // run() starts with a fair amount of setup code, omitted here; only the main path is shown
    // Report RUNNING to the ExecutorBackend (CoarseGrainedExecutorBackend by default)
    execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER)
    try {
      // Deserialize the task itself
      task = ser.deserialize[Task[Any]](
        taskDescription.serializedTask, Thread.currentThread.getContextClassLoader)
      // Run the actual task and measure its runtime.
      // Utils.tryWithSafeFinally[T](block: => T)(finallyBlock: => Unit): T
      // is Scala sugar for a try/finally whose finally block is exception-safe
      val value = Utils.tryWithSafeFinally {
        // Call task.run(); this is where the task actually starts executing
        val res = task.run(
          taskAttemptId = taskId,
          attemptNumber = taskDescription.attemptNumber,
          metricsSystem = env.metricsSystem)
        threwException = false
        res
      } {
        // Effectively a wrapped finally block: releases the task's locks and memory
      }
      // Serialize the value returned by the task
      val valueBytes = resultSer.serialize(value)
      ...
      // Note: accumulator updates must be collected after TaskMetrics is updated
      // Collect the accumulator updates from the task
      val accumUpdates = task.collectAccumulatorUpdates()
      // Wrap the result bytes and the accumulator updates together
      val directResult = new DirectTaskResult(valueBytes, accumUpdates)
      val serializedDirectResult = ser.serialize(directResult)
      val resultSize = serializedDirectResult.limit()

      // directSend = sending directly back to the driver
      val serializedResult: ByteBuffer = {
        // maxResultSize defaults to 1g and can be changed via spark.driver.maxResultSize
        if (maxResultSize > 0 && resultSize > maxResultSize) {
          /** IndirectTaskResult: A reference to a DirectTaskResult that has been stored in the worker's BlockManager. */
          // Over the limit: the result is dropped and only a reference is sent back
          ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize))
        } else if (resultSize > maxDirectResultSize) {
          // Larger than the direct-result limit: store it in the BlockManager and send back a reference
          val blockId = TaskResultBlockId(taskId)
          env.blockManager.putBytes(
            blockId,
            new ChunkedByteBuffer(serializedDirectResult.duplicate()),
            StorageLevel.MEMORY_AND_DISK_SER)
          ser.serialize(new IndirectTaskResult[Any](blockId, resultSize))
        } else {
          // Small enough: send the serialized DirectTaskResult straight back
          serializedDirectResult
        }
      }
      /* Set the finished flag to true and clear the current thread's interrupt status */
      setTaskFinishedAndClearInterruptStatus()
      // Call statusUpdate again, this time reporting FINISHED along with the serialized result
      execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)
    } catch {
      ... // exception handling (reporting a FAILED state back to the driver) omitted
    } finally {
      runningTasks.remove(taskId)
    }
  }
}
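For reference, the two thresholds driving the three-way branch above come from Spark configuration: maxResultSize corresponds to spark.driver.maxResultSize (1g by default, 0 disables the check), and maxDirectResultSize is derived from spark.task.maxDirectResultSize capped by the RPC message size limit. Below is a minimal, illustrative sketch of setting these keys; the values are examples, so verify the defaults against your Spark version.

import org.apache.spark.SparkConf

// Illustrative sketch only: the keys are real Spark settings, the values are examples.
object ResultSizeConfSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      // Cap on the total serialized result size the driver will accept (0 disables the check).
      // Results above this are dropped; only an IndirectTaskResult reference is sent back.
      .set("spark.driver.maxResultSize", "1g")
      // Results above this size (but under maxResultSize) are written to the executor's
      // BlockManager, and the driver fetches them via the IndirectTaskResult reference.
      .set("spark.task.maxDirectResultSize", "1m")
    println(conf.get("spark.driver.maxResultSize"))
  }
}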

We only need to focus on the task.run() and statusUpdate methods. Let's look at statusUpdate first:

override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer) {
  val msg = StatusUpdate(executorId, taskId, state, data)
  driver match {
    // In effect this just sends the wrapped StatusUpdate message to the driver
    case Some(driverRef) => driverRef.send(msg)
    case None => logWarning(s"Drop $msg because has not yet connected to driver")
  }
}
// On the driver side, in CoarseGrainedSchedulerBackend (its DriverEndpoint)
override def receive: PartialFunction[Any, Unit] = {
  case StatusUpdate(executorId, taskId, state, data) =>
    // This calls TaskSchedulerImpl's statusUpdate method.
    // It updates the task-state bookkeeping; internally a TaskResultGetter processes
    // the returned computation result (not expanded further here)
    scheduler.statusUpdate(taskId, state, data.value)
    if (TaskState.isFinished(state)) {
      executorDataMap.get(executorId) match {
        case Some(executorInfo) =>
          // Give the freed CPU cores back to the executor's bookkeeping
          executorInfo.freeCores += scheduler.CPUS_PER_TASK
          // The finished task has released its resources, so this executor
          // can be offered for scheduling again
          makeOffers(executorId)
        case None =>
          // Ignoring the update since we don't know about the executor.
          logWarning(s"Ignored task status update ($taskId state $state) " +
            s"from unknown executor with ID $executorId")
      }
    }
}
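The comment above notes that TaskSchedulerImpl hands the serialized result to a TaskResultGetter without expanding on it. As a rough, self-contained sketch of the pattern that component follows (the classes and names below are simplified stand-ins, not Spark's actual ones): a DirectTaskResult already carries the value bytes, while an IndirectTaskResult only carries a block id that must first be fetched from the executor's BlockManager.

import java.nio.ByteBuffer

// Simplified stand-ins for the driver-side result handling pattern (not Spark's classes).
object ResultHandlingSketch {
  sealed trait TaskResultSketch
  final case class DirectResult(valueBytes: ByteBuffer) extends TaskResultSketch
  final case class IndirectResult(blockId: String, size: Long) extends TaskResultSketch

  // Stand-in for fetching the stored bytes from the remote BlockManager.
  def fetchFromBlockManager(blockId: String): ByteBuffer =
    ByteBuffer.wrap(Array.fill[Byte](4)(0.toByte))

  // Small results arrive inline; large ones are fetched by reference, then deserialized.
  def resultBytes(result: TaskResultSketch): ByteBuffer = result match {
    case DirectResult(bytes)        => bytes
    case IndirectResult(blockId, _) => fetchFromBlockManager(blockId)
  }

  def main(args: Array[String]): Unit = {
    println(resultBytes(IndirectResult("taskresult_42", 2048L)).limit()) // 4
  }
}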

Next, the task.run() method:

  /**
   * Called by [[org.apache.spark.executor.Executor]] to run this task.
   * @return the result of the task along with updates of Accumulators.
   */
  final def run(
      taskAttemptId: Long,
      attemptNumber: Int,
      metricsSystem: MetricsSystem): T = {
    // Wrap up the task's context information
    val context = new TaskContextImpl(
      stageId,
      stageAttemptId, // stageAttemptId and stageAttemptNumber are semantically equal
      partitionId,
      taskAttemptId,
      attemptNumber,
      taskMemoryManager,
      localProperties,
      metricsSystem,
      metrics)
    try {
      // Final dispatch: runTask has a different implementation per task type
      runTask(context)
    } catch {...}
  }

Finally, the runTask method of the concrete Task subclass is invoked; ShuffleMapTask and ResultTask provide different implementations:

// ShuffleMapTask
override def runTask(context: TaskContext): MapStatus = {
  // Deserialize the RDD using the broadcast variable.
  val threadMXBean = ManagementFactory.getThreadMXBean
  val deserializeStartTimeNs = System.nanoTime()
  val deserializeStartCpuTime = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
    threadMXBean.getCurrentThreadCpuTime
  } else 0L
  val ser = SparkEnv.get.closureSerializer.newInstance()
  // Deserialization yields the rdd and its ShuffleDependency
  val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
    ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
  _executorDeserializeTimeNs = System.nanoTime() - deserializeStartTimeNs
  _executorDeserializeCpuTime = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
    threadMXBean.getCurrentThreadCpuTime - deserializeStartCpuTime
  } else 0L

  dep.shuffleWriterProcessor.write(rdd, dep, partitionId, context, partition)
}
// ResultTask
override def runTask(context: TaskContext): U = {
  // ...
  // The preamble is the same as above; only the following two lines differ.
  // Here deserialization yields the rdd and the user function func instead of a dependency
  val (rdd, func) = ser.deserialize[(RDD[T], (TaskContext, Iterator[T]) => U)](
    ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
  func(context, rdd.iterator(partition, context))
}
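To make func concrete: for the count action used as the running example in this series, RDD.count is built on SparkContext.runJob with a function that simply counts each partition's iterator (Utils.getIteratorSize in the Spark source), and the driver sums the per-partition counts. A minimal stand-alone sketch of what the deserialized func boils down to:

// Stand-alone sketch of the kind of func a ResultTask deserializes for count().
// countIterator mirrors what Utils.getIteratorSize does in the Spark source.
object ResultTaskFuncSketch {
  def countIterator[T](iter: Iterator[T]): Long = {
    var count = 0L
    while (iter.hasNext) { iter.next(); count += 1 }
    count
  }

  def main(args: Array[String]): Unit = {
    // On an executor, runTask calls: func(context, rdd.iterator(partition, context)).
    // For count(), that func is essentially countIterator applied to the partition's iterator:
    val partitionData = Iterator(1, 2, 3, 4, 5)
    println(countIterator(partitionData)) // 5; the driver then sums the per-partition counts
  }
}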

Taking ShuffleMapTask as the example, look at the comments on the write method:

/**
   * The write process for particular partition, it controls the life circle of [[ShuffleWriter]]
   * get from [[ShuffleManager]] and triggers rdd compute, finally return the [[MapStatus]] for
   * this task.
   */
  def write(...): MapStatus = {
    var writer: ShuffleWriter[Any, Any] = null
    try {
      val manager = SparkEnv.get.shuffleManager
      writer = manager.getWriter[Any, Any](
        dep.shuffleHandle,
        partitionId,
        context,
        createMetricsReporter(context))
      /** Write a sequence of records to this task's output */
      writer.write(
        rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
      writer.stop(success = true).get
    } catch {...}
  }
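The writer obtained from the ShuffleManager exposes a small contract: write the partition's records, then stop and hand back the MapStatus that becomes this task's result. Roughly (the Sketch-suffixed names below are stand-ins, so check the exact signatures in your Spark version):

// Rough shape of the ShuffleWriter contract used above (stand-in types, not Spark's).
abstract class ShuffleWriterSketch[K, V] {
  /** Write a sequence of records to this task's output. */
  def write(records: Iterator[Product2[K, V]]): Unit

  /** Close the writer; on success, return the MapStatus describing the map output. */
  def stop(success: Boolean): Option[MapStatusSketch]
}

// Stand-in for MapStatus: where the map output lives and the (compressed) size of each
// reduce partition's block, which the reducers use later to fetch their shuffle data.
final case class MapStatusSketch(location: String, sizesByReduceId: Array[Long])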

Recap of the Core Flow

To close out the series, here is a recap of the flow from stage splitting to task execution (a minimal driver program illustrating the whole chain follows the list):

1. An action operator triggers the job; taking count as the example, it internally calls dagScheduler's runJob method.
2. runJob calls submitJob, which instantiates and returns a JobWaiter and posts a JobSubmitted event message onto the event loop queue.
3. When eventProcessLoop receives the message, it calls handleJobSubmitted and stage splitting begins.

4. An attempt is made to create the ResultStage directly from the final RDD.
4.1 Based on the RDD's wide (shuffle) dependencies, the trailing stage is carved out, in preparation for returning an instantiated ResultStage.
4.2 Instantiation starts from the last RDD and keeps walking the dependencies of its parent RDDs, checking whether they were persisted before (cache, materialization, or checkpoint), until a persisted RDD or the first RDD is reached; stages are then created from front to back until the ResultStage is produced.
4.3 How wide vs. narrow dependencies are decided: inside each operator, the current RDD's partitioner is compared with the parent RDD's partitioner; if they are equal the transformation becomes a MapPartitionsRDD and no shuffle is produced, otherwise a ShuffledRDD is generated. This information is recorded inside each RDD, including the data location of every partition, and is used later when tasks are carved out.
4.4 With finalStage in hand, after updating and wrapping a few attributes, we enter the job-submission entry point submitStage(finalStage).
4.5 Internally, all parent stages must be instantiated first, so getMissingAncestorShuffleDependencies loops over and instantiates the ShuffleMapStages; in this process stages are split from back to front and then instantiated from front to back.

5. Once stage instantiation is finished, we enter the task-submission part: submitMissingTasks(stage: Stage, jobId: Int).
5.1 The task best-location algorithm is invoked with the partitionId and this stage's RDD:
a. check whether the RDD was persisted in memory, on disk, or off-heap; b. check whether it was checkpointed; c. look it up in the BlockManager; d. walk the RDD's narrow dependencies, recursively calling getPreferredLocsInternal for each partition, i.e. starting from the first partition of the first narrow dependency, appending each partition's best location to a sequence, and finally returning the sequence of best locations for all partitions.
5.2 ShuffleMapTasks or ResultTasks are generated.
5.3 The tasks are wrapped into a TaskSet and handed to the taskScheduler for submission to the executors.
5.4 A TaskSetManager is created and added to the schedulableQueue of the scheduling pool, and eventually CoarseGrainedSchedulerBackend.reviveOffers() is called.
5.5 This triggers the ReviveOffers message on the DriverEndpoint, which internally calls makeOffers().
5.6 The metadata of each filtered executor is wrapped into a WorkerOffer and handed to TaskSchedulerImpl.
5.7 scheduler.resourceOffers assigns an executor to each task according to its locality level and returns the wrapped TaskDescriptions.
5.8 Back in CoarseGrainedSchedulerBackend, the TaskDescriptions are dispatched to each executor.
5.9 The driver sends a LaunchTask event message through each executorEndpoint reference, with the serialized TaskDescription wrapped inside.
5.10 The corresponding executor receives and matches the message in receive, wraps the TaskDescription into a Runnable (TaskRunner), and runs it on the thread pool, which internally calls task.run.
5.11 task.run in turn calls runTask(context), for which ShuffleMapTask and ResultTask have different implementations.
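Tying the recap together, here is a minimal driver program (local mode, illustrative names and values) that exercises the whole chain: reduceByKey introduces a shuffle, so the job is split into a ShuffleMapStage running ShuffleMapTasks and a ResultStage running ResultTasks, and count() is the action that kicks off dagScheduler.runJob.

import org.apache.spark.{SparkConf, SparkContext}

object RecapExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("recap-example").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val distinctKeys = sc.parallelize(Seq("a" -> 1, "b" -> 2, "a" -> 3), numSlices = 2)
      .reduceByKey(_ + _) // wide dependency -> ShuffledRDD -> stage boundary
      .count()            // action -> JobSubmitted -> stages -> TaskSets -> TaskRunner -> task.run

    println(distinctKeys) // 2
    sc.stop()
  }
}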

References:

Apache Spark 2.3 source code
