Principle
Executor:
def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
// Instantiate a TaskRunner to execute this task
val tr = new TaskRunner(context, taskDescription)
// Add the task to the map of currently running tasks
runningTasks.put(taskDescription.taskId, tr)
/**
 * This is where the task is ultimately run:
 * the worker's thread pool executes org.apache.spark.executor.Executor.TaskRunner.run(),
 * an inner class defined in this same file.
 */
threadPool.execute(tr)
}
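As an aside, the threadPool above is a cached pool of daemon worker threads (Spark builds it through a helper along the lines of ThreadUtils.newDaemonCachedThreadPool("Executor task launch worker"); treat that exact name as an assumption from the Spark 2.x source). A minimal plain-JDK sketch of the same idea:

import java.util.concurrent.{Executors, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

// Daemon threads so the JVM can exit even while idle workers remain;
// a cached pool grows and shrinks with the number of concurrently running tasks.
val taskWorkerFactory: ThreadFactory = new ThreadFactory {
  private val counter = new AtomicInteger(0)
  override def newThread(r: Runnable): Thread = {
    val t = new Thread(r, s"Executor task launch worker-${counter.getAndIncrement()}")
    t.setDaemon(true)
    t
  }
}
val threadPool = Executors.newCachedThreadPool(taskWorkerFactory)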
/**
 * This method is the main entry point for executing a task on the Executor. After the task
 * finishes, the Executor notifies the Driver by sending a StatusUpdate message that sets the
 * task's state to TaskState.FINISHED.
 *
 * While running the task, the Executor wraps the computed result in
 * org.apache.spark.scheduler.DirectTaskResult. When the result is shipped back to the Driver,
 * the strategy depends on its size: large results are stored in
 * org.apache.spark.storage.BlockManager keyed by taskId, while small results are sent back
 * directly. Historically the transfer went through Akka, so the directly returnable size was
 * bounded by spark.akka.frameSize (default 128, in MB, i.e. the largest message Akka could
 * carry was 128 MB). Akka also reserved some space (about 200 KB) for other data, so a result
 * smaller than 128 MB - 200 KB could be returned directly; otherwise, as long as it did not
 * exceed spark.driver.maxResultSize (default 1g), it was handed over via the BlockManager.
 * Details are covered in the Executor module. The full decision is:
 * (1) result larger than 1 GB: dropped;
 * (2) result at most 1 GB but larger than 128 MB - 200 KB: recorded in the BlockManager,
 *     keyed by its taskId, along with other metadata;
 * (3) result smaller than 128 MB - 200 KB: returned directly.
 * */
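To make that three-way decision concrete, here is a minimal, self-contained sketch of the size check (the constants and the helper name are illustrative, not the exact Spark fields; the real values come from spark.driver.maxResultSize and the RPC frame size minus a reserved buffer):

// Illustrative thresholds only.
val maxResultSize: Long       = 1L << 30                      // 1 GB (spark.driver.maxResultSize default)
val maxDirectResultSize: Long = (128L << 20) - (200L << 10)   // 128 MB - 200 KB

def resultRoute(resultSize: Long): String =
  if (maxResultSize > 0 && resultSize > maxResultSize) "drop (too large, report size only)"
  else if (resultSize > maxDirectResultSize)           "indirect (store in BlockManager, send block id)"
  else                                                 "direct (send bytes with the status update)"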
override def run(): Unit = {
threadId = Thread.currentThread.getId
Thread.currentThread.setName(threadName)
// Returns the managed bean for the JVM's thread system.
val threadMXBean = ManagementFactory.getThreadMXBean
// Create a memory manager for this task => manages the memory allocated by a single task.
val taskMemoryManager = new TaskMemoryManager(env.memoryManager, taskId)
// Record the deserialization start time
val deserializeStartTime = System.currentTimeMillis()
// If the JVM supports CPU-time measurement for the current thread, record the CPU time at which deserialization starts
val deserializeStartCpuTime = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
threadMXBean.getCurrentThreadCpuTime
} else 0L
// The ClassLoader needed when loading concrete classes
Thread.currentThread.setContextClassLoader(replClassLoader)
// Create the closure serializer
val ser = env.closureSerializer.newInstance()
logInfo(s"Running $taskName (TID $taskId)")
// Update the task's state and start executing the task.
// In yarn-client mode this calls CoarseGrainedExecutorBackend.statusUpdate,
// setting the task's state to RUNNING.
// ExecutorBackend#statusUpdate sends a message to the Driver reporting the current state;
// in local mode, for example, the call goes to LocalSchedulerBackend.statusUpdate.
execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER)
// Record the start time and GC information
var taskStart: Long = 0
var taskStartCpu: Long = 0
startGCTime = computeTotalGcTime()
try {
// Must be set before updateDependencies() is called, in case fetching dependencies
// requires access to properties contained within (e.g. for access control).
Executor.taskDeserializationProps.set(taskDescription.properties)
// Download any dependencies (files/jars) the task is missing
updateDependencies(taskDescription.addedFiles, taskDescription.addedJars)
// Deserialize the Task
task = ser.deserialize[Task[Any]](
taskDescription.serializedTask, Thread.currentThread.getContextClassLoader)
task.localProperties = taskDescription.properties
// Set the TaskMemoryManager the task will use at runtime
task.setTaskMemoryManager(taskMemoryManager)
// If this task has been killed before we deserialized it, let's quit now. Otherwise,
// continue executing the task.
// In other words: was this task already marked as killed? For example, you submit a job,
// realize right away it is wrong and hit Ctrl+C; the task may be killed before it ever runs.
val killReason = reasonIfKilled
// If the task has been killed, throw an exception right away
if (killReason.isDefined) {
// Throw an exception rather than returning, because returning within a try{} block
// causes a NonLocalReturnControl exception to be thrown. The NonLocalReturnControl
// exception will be caught by the catch block, leading to an incorrect ExceptionFailure
// for the task.
throw new TaskKilledException(killReason.get)
}
logDebug("Task " + taskId + "'s epoch is " + task.epoch)
env.mapOutputTracker.updateEpoch(task.epoch)
// Run the actual task and measure its runtime.
taskStart = System.currentTimeMillis()
taskStartCpu = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
threadMXBean.getCurrentThreadCpuTime
} else 0L
var threwException = true
val value = try {
/** Call Task.run to start running the task */
val res = task.run(
taskAttemptId = taskId,
attemptNumber = taskDescription.attemptNumber,
metricsSystem = env.metricsSystem)
threwException = false
res
} finally {
// Release all allocated memory and pages, and check for memory leaks
val releasedLocks = env.blockManager.releaseAllLocksForTask(taskId)
val freedMemory = taskMemoryManager.cleanUpAllAllocatedMemory()
if (freedMemory > 0 && !threwException) {
val errMsg = s"Managed memory leak detected; size = $freedMemory bytes, TID = $taskId"
if (conf.getBoolean("spark.unsafe.exceptionOnMemoryLeak", false)) {
throw new SparkException(errMsg)
} else {
logWarning(errMsg)
}
}
if (releasedLocks.nonEmpty && !threwException) {
val errMsg =
s"${releasedLocks.size} block locks were not released by TID = $taskId:\n" +
releasedLocks.mkString("[", ", ", "]")
if (conf.getBoolean("spark.storage.exceptionOnPinLeak", false)) {
throw new SparkException(errMsg)
} else {
logInfo(errMsg)
}
}
}
task.context.fetchFailed.foreach { fetchFailure =>
// uh-oh. it appears the user code has caught the fetch-failure without throwing any
// other exceptions. Its *possible* this is what the user meant to do (though highly
// unlikely). So we will log an error and keep going.
logError(s"TID ${taskId} completed successfully though internally it encountered " +
s"unrecoverable fetch failures! Most likely this means user code is incorrectly " +
s"swallowing Spark's internal ${classOf[FetchFailedException]}", fetchFailure)
}
// Record the task finish time
val taskFinish = System.currentTimeMillis()
val taskFinishCpu = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
threadMXBean.getCurrentThreadCpuTime
} else 0L
// If the task has been killed, let's fail it.
task.context.killTaskIfInterrupted()
// Otherwise, serialize the task's execution result
val resultSer = env.serializer.newInstance()
val beforeSerialization = System.currentTimeMillis()
val valueBytes = resultSer.serialize(value)
val afterSerialization = System.currentTimeMillis()
// Record the relevant metrics
// Deserialization happens in two parts: first, we deserialize a Task object, which
// includes the Partition. Second, Task.run() deserializes the RDD and function to be run.
task.metrics.setExecutorDeserializeTime(
(taskStart - deserializeStartTime) + task.executorDeserializeTime)
task.metrics.setExecutorDeserializeCpuTime(
(taskStartCpu - deserializeStartCpuTime) + task.executorDeserializeCpuTime)
// We need to subtract Task.run()'s deserialization time to avoid double-counting
task.metrics.setExecutorRunTime((taskFinish - taskStart) - task.executorDeserializeTime)
task.metrics.setExecutorCpuTime(
(taskFinishCpu - taskStartCpu) - task.executorDeserializeCpuTime)
task.metrics.setJvmGCTime(computeTotalGcTime() - startGCTime)
task.metrics.setResultSerializationTime(afterSerialization - beforeSerialization)
// Note: accumulator updates must be collected after TaskMetrics is updated
val accumUpdates = task.collectAccumulatorUpdates()
// Build the DirectTaskResult that can be returned straight to the Driver,
// and serialize the task's result into it
// TODO: do not serialize value twice
val directResult = new DirectTaskResult(valueBytes, accumUpdates)
val serializedDirectResult = ser.serialize(directResult)
val resultSize = serializedDirectResult.limit
// directSend = sending directly back to the driver
// If the serialized result exceeds spark.driver.maxResultSize, it is dropped outright
val serializedResult: ByteBuffer = {
// Decide how to ship the result based on its size
if (maxResultSize > 0 && resultSize > maxResultSize) {
// Larger than the 1 GB limit: drop the result and only report its size
logWarning(s"Finished $taskName (TID $taskId). Result is larger than maxResultSize " +
s"(${Utils.bytesToString(resultSize)} > ${Utils.bytesToString(maxResultSize)}), " +
s"dropping it.")
ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize))
// If the serialized result is within maxResultSize but larger than the direct-send
// threshold (historically spark.akka.frameSize minus ~200 KB), return it via the BlockManager
} else if (resultSize > maxDirectResultSize) {
// The result exceeds the direct-send threshold, so store it in the BlockManager
val blockId = TaskResultBlockId(taskId)
env.blockManager.putBytes(
blockId,
new ChunkedByteBuffer(serializedDirectResult.duplicate()),
StorageLevel.MEMORY_AND_DISK_SER)
logInfo(
s"Finished $taskName (TID $taskId). $resultSize bytes result sent via BlockManager)")
// Return an IndirectTaskResult that points the Driver at the stored block
ser.serialize(new IndirectTaskResult[Any](blockId, resultSize))
} else {
// The result is small enough (below the direct-send threshold, historically
// spark.akka.frameSize - 200 KB) to be sent straight back to the Driver
logInfo(s"Finished $taskName (TID $taskId). $resultSize bytes result sent to driver")
serializedDirectResult
}
}
setTaskFinishedAndClearInterruptStatus()
/**
 * Update the task's state to FINISHED and notify the Driver that the task is done,
 * via ExecutorBackend.statusUpdate() => org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate()
 */
execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)
} catch {
case t: Throwable if hasFetchFailure && !Utils.isFatalError(t) =>
val reason = task.context.fetchFailed.get.toTaskFailedReason
if (!t.isInstanceOf[FetchFailedException]) {
// there was a fetch failure in the task, but some user code wrapped that exception
// and threw something else. Regardless, we treat it as a fetch failure.
val fetchFailedCls = classOf[FetchFailedException].getName
logWarning(s"TID ${taskId} encountered a ${fetchFailedCls} and " +
s"failed, but the ${fetchFailedCls} was hidden by another " +
s"exception. Spark is handling this like a fetch failure and ignoring the " +
s"other exception: $t")
}
setTaskFinishedAndClearInterruptStatus()
execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
case t: TaskKilledException =>
logInfo(s"Executor killed $taskName (TID $taskId), reason: ${t.reason}")
setTaskFinishedAndClearInterruptStatus()
execBackend.statusUpdate(taskId, TaskState.KILLED, ser.serialize(TaskKilled(t.reason)))
case _: InterruptedException | NonFatal(_) if
task != null && task.reasonIfKilled.isDefined =>
val killReason = task.reasonIfKilled.getOrElse("unknown reason")
logInfo(s"Executor interrupted and killed $taskName (TID $taskId), reason: $killReason")
setTaskFinishedAndClearInterruptStatus()
execBackend.statusUpdate(
taskId, TaskState.KILLED, ser.serialize(TaskKilled(killReason)))
case CausedBy(cDE: CommitDeniedException) =>
val reason = cDE.toTaskFailedReason
setTaskFinishedAndClearInterruptStatus()
execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
case t: Throwable =>
// Attempt to exit cleanly by informing the driver of our failure.
// If anything goes wrong (or this was a fatal exception), we will delegate to
// the default uncaught exception handler, which will terminate the Executor.
logError(s"Exception in $taskName (TID $taskId)", t)
// Collect latest accumulator values to report back to the driver
val accums: Seq[AccumulatorV2[_, _]] =
if (task != null) {
task.metrics.setExecutorRunTime(System.currentTimeMillis() - taskStart)
task.metrics.setJvmGCTime(computeTotalGcTime() - startGCTime)
task.collectAccumulatorUpdates(taskFailed = true)
} else {
Seq.empty
}
val accUpdates = accums.map(acc => acc.toInfo(Some(acc.value), None))
val serializedTaskEndReason = {
try {
ser.serialize(new ExceptionFailure(t, accUpdates).withAccums(accums))
} catch {
case _: NotSerializableException =>
// t is not serializable so just send the stacktrace
ser.serialize(new ExceptionFailure(t, accUpdates, false).withAccums(accums))
}
}
setTaskFinishedAndClearInterruptStatus()
execBackend.statusUpdate(taskId, TaskState.FAILED, serializedTaskEndReason)
// Don't forcibly exit unless the exception was inherently fatal, to avoid
// stopping other tasks unnecessarily.
if (Utils.isFatalError(t)) {
uncaughtExceptionHandler.uncaughtException(Thread.currentThread(), t)
}
} finally {
// Remove the task from the map of running tasks
runningTasks.remove(taskId)
}
}
private def hasFetchFailure: Boolean = {
task != null && task.context != null && task.context.fetchFailed.isDefined
}
}
/**
 * The logic is straightforward and can be summarized as follows:
 * 1. Create a task context instance, a TaskContextImpl named context, which mainly carries:
 *    the stageId of the task's stage, the partitionId of the task's data partition, the
 *    taskAttemptId, the attempt number attemptNumber, the task memory manager
 *    taskMemoryManager, the metrics system metricsSystem, the internal accumulators
 *    internalAccumulators, and the runningLocally flag (false);
 * 2. Store context in TaskContext's taskContext variable, which is a ThreadLocal[TaskContext];
 * 3. Set metrics information on the context, such as the host name localHostName and the
 *    internal accumulators internalAccumulators;
 * 4. Set the task thread to the current thread;
 * 5. If the task needs to be killed, call kill() without interrupting the thread;
 * 6. Call runTask() with the context to execute the task, and collect accumulators via the
 *    context's collectAccumulators() method;
 * 7. Finally, mark the task as completed on the context, release the memory this thread used
 *    for unrolling blocks, and clear the task context.
 */
final def run(
taskAttemptId: Long,
attemptNumber: Int,
metricsSystem: MetricsSystem): T = {
SparkEnv.get.blockManager.registerTask(taskAttemptId)
// Create a task context instance: a TaskContextImpl named context
context = new TaskContextImpl(
stageId,
partitionId,
taskAttemptId,
attemptNumber,
taskMemoryManager,
localProperties,
metricsSystem,
metrics)
// Store context in TaskContext's taskContext variable,
// which is a ThreadLocal[TaskContext]
TaskContext.setTaskContext(context)
// The task thread is the current thread
taskThread = Thread.currentThread()
if (_reasonIfKilled != null) {
// If the task needs to be killed, call kill() without interrupting the thread
kill(interruptThread = false, _reasonIfKilled)
}
// CallerContext records the app/job/stage/task identity into the Hadoop caller context
// (so the task can show up in, e.g., HDFS audit logs)
new CallerContext(
"TASK",
SparkEnv.get.conf.get(APP_CALLER_CONTEXT),
appId,
appAttemptId,
jobId,
Option(stageId),
Option(stageAttemptId),
Option(taskAttemptId),
Option(attemptNumber)).setCurrentContext()
try {
// Call runTask() with the task context to execute the task; accumulators are collected via the context's collectAccumulators()
runTask(context)
} catch {
case e: Throwable =>
// Catch all errors; run task failure callbacks, and rethrow the exception.
try {
context.markTaskFailed(e)
} catch {
case t: Throwable =>
e.addSuppressed(t)
}
// Mark the task as completed on the context
context.markTaskCompleted(Some(e))
throw e
} finally {
try {
// Call the task completion callbacks. If "markTaskCompleted" is called twice, the second
// one is no-op.
context.markTaskCompleted(None)
} finally {
try {
Utils.tryLogNonFatalError {
// Release memory used by this thread for unrolling blocks
SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(MemoryMode.ON_HEAP)
SparkEnv.get.blockManager.memoryStore.releaseUnrollMemoryForThisTask(
MemoryMode.OFF_HEAP)
// Notify any tasks waiting for execution memory to be freed to wake up and try to
// acquire memory again. This makes impossible the scenario where a task sleeps forever
// because there are no other tasks left to notify it. Since this is safe to do but may
// not be strictly necessary, we should revisit whether we can remove this in the
// future.
val memoryManager = SparkEnv.get.memoryManager
memoryManager.synchronized { memoryManager.notifyAll() }
}
} finally {
// Though we unset the ThreadLocal here, the context member variable itself is still
// queried directly in the TaskRunner to check for FetchFailedExceptions.
// Unset the TaskContext
TaskContext.unset()
}
}
}
}
runTask(context: TaskContext) is an abstract method implemented by ShuffleMapTask and ResultTask.
A ShuffleMapTask splits the elements of an RDD partition into multiple buckets,
based on the partitioner specified in the ShuffleDependency (HashPartitioner by default). A small reminder of how that partitioner works follows below.
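Here is a minimal, Spark-free sketch of the hash-and-mod logic, equivalent in spirit to HashPartitioner.getPartition (the helper name is ours, not Spark's):

// Fold negative hash codes into the non-negative range, mirroring Spark's Utils.nonNegativeMod.
def bucketFor(key: Any, numPartitions: Int): Int = {
  val raw = if (key == null) 0 else key.hashCode % numPartitions
  if (raw < 0) raw + numPartitions else raw
}

// Example: with 4 reduce-side buckets, the record's key alone decides which bucket it lands in.
val bucket = bucketFor("spark", 4)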
/**
 * The main logic has only two steps:
 * 1. Deserialize the RDD and the ShuffleDependency from the broadcast variable:
 *    1.1 record the deserialization start time deserializeStartTime;
 *    1.2 obtain the closure serializer ser from SparkEnv;
 *    1.3 call ser.deserialize() on taskBinary to recover the rdd and the dep (ShuffleDependency);
 *    1.4 compute the executor's deserialization time _executorDeserializeTime.
 * 2. Write the data out through the shuffleManager's writer:
 *    2.1 obtain the shuffleManager from SparkEnv;
 *    2.2 obtain a shuffle writer via shuffleManager.getWriter(); the partitionId identifies the
 *        partition of the current RDD this task processes, i.e. the write operates per partition;
 *    2.3 call rdd.iterator() on that partition and feed the result into writer.write();
 *    2.4 stop the writer and return the status.
 */
override def runTask(context: TaskContext): MapStatus = {
// Deserialize the RDD using the broadcast variable.
val threadMXBean = ManagementFactory.getThreadMXBean
// Deserialization start time.
// Here the data the task needs (the RDD and its dependency) is deserialized.
// The key question: how does the task get hold of this RDD at all?
// The tasks of a stage run in parallel on many executors, possibly on different machines,
// yet they all process the same RDD.
// The answer: the serialized RDD is shipped as a broadcast variable (taskBinary),
// and it is recovered from that broadcast here.
val deserializeStartTime = System.currentTimeMillis()
val deserializeStartCpuTime = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
threadMXBean.getCurrentThreadCpuTime
} else 0L
// Obtain the closure serializer
val ser = SparkEnv.get.closureSerializer.newInstance()
// Call the closure serializer's deserialize() to recover the RDD and the ShuffleDependency from taskBinary
val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
// Compute how long the executor spent on deserialization
_executorDeserializeTime = System.currentTimeMillis() - deserializeStartTime
_executorDeserializeCpuTime = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
threadMXBean.getCurrentThreadCpuTime - deserializeStartCpuTime
} else 0L
var writer: ShuffleWriter[Any, Any] = null
try {
// Obtain the shuffleManager
val manager = SparkEnv.get.shuffleManager
// Obtain the shuffle writer via shuffleManager.getWriter();
// the partitionId passed in identifies the partition of the current RDD,
// i.e. the write operates on a single partition
writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
// For this RDD partition, call rdd.iterator() and feed the resulting records
// into writer.write()
writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
// Stop the writer and return its status
writer.stop(success = true).get
} catch {
case e: Exception =>
try {
if (writer != null) {
writer.stop(success = false)
}
} catch {
case e: Exception =>
log.debug("Could not stop writer", e)
}
throw e
}
}
// The most important line is the writer.write(...) call above:
// it first calls rdd.iterator(), passing in the partition this task has to process.
// The core logic therefore lives in rdd.iterator(): that is where our own operators and
// functions are executed against this particular partition of the RDD.
// Once our operators/functions have run, the resulting records are partitioned by the
// ShuffleWriter (using the dependency's partitioner, HashPartitioner by default) and written
// into the bucket for each target partition:
//   writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
// Finally a MapStatus is returned. MapStatus records where the data computed by this
// ShuffleMapTask is stored, which is essentially BlockManager-related information.
// BlockManager is Spark's low-level component for managing memory, cached data and disk data;
// it will be analyzed in depth after the shuffle discussion.
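For a feel of what a MapStatus carries: it essentially pairs the executor's BlockManager identity with the number of bytes written for each reduce partition. A conceptual stand-in only (the real MapStatus and BlockManagerId are private[spark] classes):

// Where the map output lives, and how large each reduce partition's slice is.
case class SimpleMapStatus(blockManagerLocation: String, bytesPerReducer: Array[Long])

// After writing the shuffle file, the writer knows each partition segment's length:
val partitionLengths = Array(1024L, 0L, 4096L)            // e.g. 3 reduce partitions
val status = SimpleMapStatus("executor-1 BlockManager", partitionLengths)
// Reducers later use this to decide which executor to fetch from and how much to expect.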
/**
* Internal method to this RDD; will read from cache if applicable, or otherwise compute it.
* This should ''not'' be called by users directly, but is available for implementors of custom
* subclasses of RDD.
*
 * MappedRDD's iterator method is in fact the iterator method inherited from RDD. When a
 * partition task runs for the first time, nothing is cached yet, so computeOrReadCheckpoint
 * is called.
 * A note on iterator()'s fault tolerance: if one partition task fails while the others
 * succeed, the DAG can be rescheduled; the failed partition task recovers its state from the
 * checkpoint, while the partitions that succeeded have already cached their results in the
 * storage system, so the CacheManager's getOrCompute path simply fetches them without
 * re-executing anything.
*/
final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
if (storageLevel != StorageLevel.NONE) {
// Storage level is not NONE: check the cache first, and compute only if nothing is cached
getOrCompute(split, context)
} else {
// If a checkpoint exists, read the result from it; otherwise compute directly
computeOrReadCheckpoint(split, context)
}
}
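A quick usage note: which branch iterator() takes is decided by the RDD's storage level, i.e. by whether persist()/cache() was called. A small, hedged example (the path is illustrative; sc is assumed to be an existing SparkContext):

import org.apache.spark.storage.StorageLevel

// storageLevel is NONE here, so iterator() goes straight to computeOrReadCheckpoint.
val words = sc.textFile("hdfs://namenode:8020/input").flatMap(_.split(" "))

// After persist(), storageLevel is no longer NONE, so iterator() takes the getOrCompute path
// and consults the BlockManager before recomputing.
val cached = words.persist(StorageLevel.MEMORY_AND_DISK)
cached.count()   // first action: partitions are computed and cached
cached.count()   // second action: partitions are read from the cache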
/**
* Gets or computes an RDD partition. Used by RDD.iterator() when an RDD is cached.
*
 * During iterative task computation, when the storage level indicates caching, this
 * getOrCompute path is taken (historically via the CacheManager).
 *
 * Processing logic:
 * 1. Fetch the Block from the storage system.
 * 2. If the Block is obtained, wrap it in an InterruptibleIterator and return it. If the Block
 *    is not cached yet, recompute it or read it from the checkpoint, write the data into the
 *    cache via putInBlockManager, then wrap it in an InterruptibleIterator and return it.
*
*/
private[spark] def getOrCompute(partition: Partition, context: TaskContext): Iterator[T] = {
// Build the BlockId for this RDD partition
val blockId = RDDBlockId(id, partition.index)
var readCachedBlock = true
// This method is called on executors, so we need call SparkEnv.get instead of sc.env.
SparkEnv.get.blockManager.getOrElseUpdate(blockId, storageLevel, elementClassTag, () => {
readCachedBlock = false
// If a checkpoint exists, read the intermediate result from it; otherwise call compute() to keep computing
computeOrReadCheckpoint(partition, context)
}) match {
// Left: the BlockManager found the block in its cache (or successfully stored the freshly computed one)
case Left(blockResult) =>
if (readCachedBlock) {
val existingMetrics = context.taskMetrics().inputMetrics
existingMetrics.incBytesRead(blockResult.bytes)
new InterruptibleIterator[T](context, blockResult.data.asInstanceOf[Iterator[T]]) {
override def next(): T = {
existingMetrics.incRecordsRead(1)
delegate.next()
}
}
} else {
new InterruptibleIterator(context, blockResult.data.asInstanceOf[Iterator[T]])
}
case Right(iter) =>
new InterruptibleIterator(context, iter.asInstanceOf[Iterator[T]])
}
}
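The Left/Right match above follows the contract of BlockManager.getOrElseUpdate: Left means the values can be read back from the storage system (they were already cached, or were just computed and successfully stored); Right means the computed values could not be cached (for example they did not fit in memory and the storage level forbids spilling to disk), so the raw iterator is handed back uncached. A tiny, self-contained analogue of that contract (the types and the size check are stand-ins, not Spark's):

import scala.collection.mutable

sealed trait CacheOutcome[+T]
case class FromStore[T](values: Seq[T])   extends CacheOutcome[T]   // ~ Left(BlockResult)
case class Uncachable[T](it: Iterator[T]) extends CacheOutcome[T]   // ~ Right(iterator)

def getOrElseUpdate[T](store: mutable.Map[String, Seq[T]],
                       key: String,
                       compute: () => Iterator[T]): CacheOutcome[T] =
  store.get(key) match {
    case Some(values) => FromStore(values)            // already cached: read back
    case None =>
      val values = compute().toSeq
      if (values.length <= 1000) {                    // pretend the block "fits"
        store(key) = values; FromStore(values)        // stored, then served from the store
      } else {
        Uncachable(values.iterator)                   // could not cache: return the iterator
      }
  }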
private[spark] def computeOrReadCheckpoint(split: Partition, context: TaskContext): Iterator[T] =
{
// Whether this RDD has been checkpointed and materialized, either reliably or locally
if (isCheckpointedAndMaterialized) {
firstParent[T].iterator(split, context)
} else {
// The RDD's compute, e.g. MappedRDD's compute()
compute(split, context)
}
}
This is where it gets interesting. What does compute actually do?
It applies, to one particular partition of the RDD, the operators and functions we defined on that RDD.
Where are those operators and functions? We don't seem to see them here.
The f below can be understood as our own operator/function, wrapped by Spark with some additional internal logic.
By the time execution reaches this point, the user-defined computation is being run against the RDD partition.
In MapPartitionsRDD:
override def compute(split: Partition, context: TaskContext): Iterator[U] =
f(context, split.index, firstParent[T].iterator(split, context))
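Where does that f come from? Each transformation builds a MapPartitionsRDD whose f wraps the user function; in the Spark 2.x source, map essentially does new MapPartitionsRDD(this, (context, pid, iter) => iter.map(cleanF)) (paraphrased from memory, so treat the details as an assumption). A tiny, Spark-free analogue of the wrapping:

// `userF` plays the role of the function passed to map(); `partitionF` plays the role of `f`,
// the per-partition wrapper that compute() invokes with the parent partition's iterator.
val userF: Int => Int = _ * 2
val partitionF: (Int, Iterator[Int]) => Iterator[Int] =
  (partitionIndex, parentIter) => parentIter.map(userF)

// "compute" for partition 0: feed the parent's iterator through the wrapped function.
val result = partitionF(0, Iterator(1, 2, 3)).toList   // List(2, 4, 6)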
Back in the TaskRunner's run() method:
/**
 * Update the task's state to FINISHED and notify the Driver that the task is done,
 * via ExecutorBackend.statusUpdate() => org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate()
 */
execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)
/**
 * Executor status-change event
 *
 * @param taskId
 * @param state
 * @param data
 */
override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer) {
val msg = StatusUpdate(executorId, taskId, state, data)
driver match {
/**
 * Send the StatusUpdate message to the DriverEndpoint.
 * In standalone mode it is handled by
 * org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.DriverEndpoint.receive
 */
case Some(driverRef) => driverRef.send(msg)
case None => logWarning(s"Drop $msg because has not yet connected to driver")
}
}
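On the driver side, DriverEndpoint.receive does roughly two things with a StatusUpdate: forward it to the TaskScheduler, and, if the task state is terminal, give the task's cores back to the executor and offer them to pending tasks. The sketch below is a simplified, self-contained rendering of that flow, not the verbatim Spark source:

import scala.collection.mutable

// Minimal stand-ins for the driver-side bookkeeping.
case class StatusUpdateMsg(executorId: String, taskId: Long, finished: Boolean, coresPerTask: Int)

def onStatusUpdate(msg: StatusUpdateMsg,
                   freeCores: mutable.Map[String, Int],
                   schedulerStatusUpdate: Long => Unit,
                   makeOffers: String => Unit): Unit = {
  schedulerStatusUpdate(msg.taskId)              // ~ scheduler.statusUpdate(taskId, state, data)
  if (msg.finished) {                            // ~ TaskState.isFinished(state)
    // Give the cores back to the executor that just finished the task...
    freeCores(msg.executorId) = freeCores.getOrElse(msg.executorId, 0) + msg.coresPerTask
    // ...and immediately try to schedule more tasks on it.
    makeOffers(msg.executorId)
  }
}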