Spark Source Code - Job Submission Flow - 8 - DAGScheduler Task Splitting

1. Overview

1. The main method of the user class is the entry point of the application written by the developer; the subsequent computation logic is defined starting from that main method. Defining the application's computation logic begins with instantiating a SparkContext object. The SparkContext class defines a _dagScheduler field, which is initialized (by instantiating a DAGScheduler object) while the SparkContext object itself is being instantiated.

2. After the application is submitted with the spark-submit command, the driver thread executes the main method of the user class; that is, the driver thread is the one that instantiates the DAGScheduler object.

2. DAGScheduler Instantiation

1. During DAGScheduler instantiation, an event loop processor is created.

2. The event loop processor maintains a blocking queue of events, and its run method defines the entry point for event handling.

3. At the end of instantiation, the event loop processor is started, which executes its run method.

4. With the run method as the entry point, events are taken from the processor's blocking queue and processed.

5. A job submission event is handed to the DAGScheduler's handleJobSubmitted method.

6. In handleJobSubmitted, a ResultStage object finalStage is built from the event, then a job is built and bound to finalStage.

7. Tracing upward from finalStage, the entire computation chain of the action is split into stages (a new stage is created at every wide dependency).

8. The resulting chain of stage dependencies is processed from top to bottom: tasks are created per partition, one task per partition and one TaskSet per stage, and each TaskSet is sent to the taskScheduler for further scheduling.

Note:

Since the queue maintained by the event processor is a blocking queue, when the event-processing thread tries to take an event from an empty queue, the call blocks and the thread cannot proceed until a new event is put into the queue and taken out by the thread; while it is blocked the thread is not shut down (the small sketch below illustrates this behavior).

Figure: spark源码-任务提交流程之-DAGScheduler实例化.png (DAGScheduler instantiation flow)
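
A minimal sketch (not Spark source, just the same LinkedBlockingDeque pattern that EventLoop uses) illustrating this blocking behavior: the worker thread parks inside take() while the queue is empty and resumes as soon as an event is put into the queue.

import java.util.concurrent.LinkedBlockingDeque

object BlockingLoopDemo {
  def main(args: Array[String]): Unit = {
    val queue = new LinkedBlockingDeque[String]()

    // daemon worker thread, mirroring EventLoop's eventThread
    val worker = new Thread("demo-event-loop") {
      setDaemon(true)
      override def run(): Unit = {
        while (true) {
          val event = queue.take() // blocks here while the queue is empty; the thread is not shut down
          println(s"processed event: $event")
        }
      }
    }
    worker.start()

    queue.put("JobSubmitted") // wakes the worker; it processes the event and then blocks again
    Thread.sleep(100)         // give the daemon thread a moment before the JVM exits
  }
}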

2.1. Entry Point

During SparkContext instantiation, a DAGScheduler object is instantiated and assigned to SparkContext's _dagScheduler field, completing the initialization of _dagScheduler.

class SparkContext(config: SparkConf) extends Logging {
  
  // ... other code ...
    
  // DAG (directed acyclic graph) scheduler
  @volatile private var _dagScheduler: DAGScheduler = _
 
  // ... other code ...
    
  try {
    // ... other code ...
    
    // initialize dagScheduler
    _dagScheduler = new DAGScheduler(this)
    
    // ... other code ...
    
  } catch {
    // ... other code ...
  }
}

2.2. DAGScheduler Instantiation Logic

2.2.1. Initializing Properties

1. Various mapping caches are initialized: jobIdToStageIds, stageIdToStage, shuffleIdToMapStage, jobIdToActiveJob, cacheLocs, failedEpoch; all of them are backed by HashMap.

2. Several stage sets are initialized: stages whose parent stages have not finished, stages currently running, and failed stages; all of them are backed by HashSet.

3. The message scheduling thread pool is initialized.

4. The event loop processor is created: an event-processing thread (which blocks on an empty queue).

5. The dagScheduler object is registered with the taskScheduler object.

6. The event loop processor is started and executes its run method: it takes pending events from the blocking queue and processes them via onReceive.

private[spark] class DAGScheduler(
    private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    listenerBus: LiveListenerBus,
    mapOutputTracker: MapOutputTrackerMaster,
    blockManagerMaster: BlockManagerMaster,
    env: SparkEnv,
    clock: Clock = new SystemClock())
  extends Logging {

  def this(sc: SparkContext, taskScheduler: TaskScheduler) = {
    this(
      sc,
      taskScheduler,
      sc.listenerBus,
      sc.env.mapOutputTracker.asInstanceOf[MapOutputTrackerMaster],
      sc.env.blockManager.master,
      sc.env)
  }

  def this(sc: SparkContext) = this(sc, sc.taskScheduler)

  private[spark] val metricsSource: DAGSchedulerSource = new DAGSchedulerSource(this)

  // counters for job ids and stage ids
  private[scheduler] val nextJobId = new AtomicInteger(0)
  private[scheduler] def numTotalJobs: Int = nextJobId.get()
  private val nextStageId = new AtomicInteger(0)

  // mapping: jobId ---> ids of the stages belonging to that job
  private[scheduler] val jobIdToStageIds = new HashMap[Int, HashSet[Int]]

  // stage cache: stageId ---> Stage
  private[scheduler] val stageIdToStage = new HashMap[Int, Stage]

  // mapping: shuffleId ---> the ShuffleMapStage that produces data for that shuffle
  // only includes stages of currently running jobs (once the job finishes, the mapping is
  // removed and the only record of the shuffle data lives in the MapOutputTracker)
  private[scheduler] val shuffleIdToMapStage = new HashMap[Int, ShuffleMapStage]

  // mapping: jobId ---> active job
  private[scheduler] val jobIdToActiveJob = new HashMap[Int, ActiveJob]

  // stage set: stages whose parent stages have not finished yet
  private[scheduler] val waitingStages = new HashSet[Stage]

  // stage set: stages that are currently running
  private[scheduler] val runningStages = new HashSet[Stage]

  // stage set: stages that failed
  private[scheduler] val failedStages = new HashSet[Stage]

  // job cache: active jobs
  private[scheduler] val activeJobs = new HashSet[ActiveJob]

  // rdd id ---> cached partition locations
  // IndexedSeq[Seq[TaskLocation]]: for each partition of the rdd, the locations where it is cached
  private val cacheLocs = new HashMap[Int, IndexedSeq[Seq[TaskLocation]]]

 
  // mapping: failed executor ---> MapOutputTracker epoch number at which the failure was detected
  // when a node failure is detected, the failure info is cached here
  // and is used when deciding whether tasks need to be resubmitted
  private val failedEpoch = new HashMap[String, Long]

  // authority that decides whether tasks may commit their output to HDFS
  private [scheduler] val outputCommitCoordinator = env.outputCommitCoordinator

  // closure serializer
  // DAGScheduler runs in a single thread, so reusing one instance is thread safe
  private val closureSerializer = SparkEnv.get.closureSerializer.newInstance()

  private val disallowStageRetryForTest = sc.getConf.getBoolean("spark.test.noStageRetry", false)
  private[scheduler] val unRegisterOutputOnHostOnFetchFailure =
    sc.getConf.get(config.UNREGISTER_OUTPUT_ON_HOST_ON_FETCH_FAILURE)
  private[scheduler] val maxConsecutiveStageAttempts =
    sc.getConf.getInt("spark.stage.maxConsecutiveAttempts",
      DAGScheduler.DEFAULT_MAX_CONSECUTIVE_STAGE_ATTEMPTS)
  private[scheduler] val barrierJobIdToNumTasksCheckFailures = new ConcurrentHashMap[Int, Int]
  private val timeIntervalNumTasksCheck = sc.getConf
    .get(config.BARRIER_MAX_CONCURRENT_TASKS_CHECK_INTERVAL)
  private val maxFailureNumTasksCheck = sc.getConf
    .get(config.BARRIER_MAX_CONCURRENT_TASKS_CHECK_MAX_FAILURES)

  // message scheduling thread pool
  private val messageScheduler =
    ThreadUtils.newDaemonSingleThreadScheduledExecutor("dag-scheduler-message")

  // event loop processor: an event-processing thread (blocks while its queue is empty)
  private[spark] val eventProcessLoop = new DAGSchedulerEventProcessLoop(this)
  // register this dagScheduler with the taskScheduler
  taskScheduler.setDAGScheduler(this)

  // start the event loop processor
  eventProcessLoop.start()
}

2.2.2. Event Loop Instantiation Logic

1. A blocking queue of DAG scheduling events is defined.

2. An event-processing thread is defined, which:

takes the head event from the blocking queue;

calls the onReceive method to process the event.

// DAGSchedulerEventProcessLoop is a subclass of EventLoop
private[scheduler] class DAGSchedulerEventProcessLoop(dagScheduler: DAGScheduler)
  extends EventLoop[DAGSchedulerEvent]("dag-scheduler-event-loop") with Logging {
	
  // timer that tracks how long the DAGScheduler event loop spends processing messages
  private[this] val timer = dagScheduler.metricsSource.messageProcessingTimer
}

private[spark] abstract class EventLoop[E](name: String) extends Logging {

  // blocking queue of DAG scheduling events
  private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()

  private val stopped = new AtomicBoolean(false)

  // event-processing thread
  private[spark] val eventThread = new Thread(name) {
    setDaemon(true)

    override def run(): Unit = {
      try {
        while (!stopped.get) {
          // take the head event from the blocking queue (blocks while the queue is empty)
          val event = eventQueue.take()
          try {
            // process the event via onReceive
            onReceive(event)
          } catch {
            case NonFatal(e) =>
              try {
                onError(e)
              } catch {
                case NonFatal(e) => logError("Unexpected error in " + name, e)
              }
          }
        }
      } catch {
        case ie: InterruptedException => // exit even if eventQueue is not empty
        case NonFatal(e) => logError("Unexpected error in " + name, e)
      }
    }

  }
}

2.2.3. Starting the Event Loop Thread

2.2.3.1. Thread Execution Logic

The run method of the event loop thread is executed: it takes pending events from the blocking queue and hands each event to the dagScheduler for processing.

private[spark] class DAGScheduler(
    private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    listenerBus: LiveListenerBus,
    mapOutputTracker: MapOutputTrackerMaster,
    blockManagerMaster: BlockManagerMaster,
    env: SparkEnv,
    clock: Clock = new SystemClock())
  extends Logging {

  // start the event loop processor
  eventProcessLoop.start()
}

private[spark] abstract class EventLoop[E](name: String) extends Logging {
  
  def start(): Unit = {
    if (stopped.get) {
      throw new IllegalStateException(name + " has already been stopped")
    }
    // Call onStart before starting the event thread to make sure it happens before onReceive
    onStart()
    // start the event-processing thread
    eventThread.start()
  }
  
  // blocking queue of DAG scheduling events
  private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()

  // event-processing thread
  private[spark] val eventThread = new Thread(name) {
    setDaemon(true)

    override def run(): Unit = {
      try {
        while (!stopped.get) {
          // take the head event from the blocking queue (blocks while the queue is empty)
          val event = eventQueue.take()
          try {
            // process the event via onReceive
            onReceive(event)
          } catch {
            case NonFatal(e) =>
              try {
                onError(e)
              } catch {
                case NonFatal(e) => logError("Unexpected error in " + name, e)
              }
          }
        }
      } catch {
        case ie: InterruptedException => // exit even if eventQueue is not empty
        case NonFatal(e) => logError("Unexpected error in " + name, e)
      }
    }

  }
}

private[scheduler] class DAGSchedulerEventProcessLoop(dagScheduler: DAGScheduler)
  extends EventLoop[DAGSchedulerEvent]("dag-scheduler-event-loop") with Logging {
	override def onReceive(event: DAGSchedulerEvent): Unit = {
    val timerContext = timer.time()
    try {
      // delegate to doOnReceive
      doOnReceive(event)
    } finally {
      timerContext.stop()
    }
  }

  private def doOnReceive(event: DAGSchedulerEvent): Unit = event match {
    // job submission event
    case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
      // hand the event off to dagScheduler
      dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)

    // other events ...
  }    
}

2.2.3.2. Event Handling by dagScheduler
2.2.3.2.1. handleJobSubmitted: Finding finalStage from the Event Info

1. Build finalStage from finalRDD.

2. Build an active job and cache it.

3. Bind the job to the stage.

4. Call submitStage to start processing from the last stage.

private[spark] class DAGScheduler(
    private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    listenerBus: LiveListenerBus,
    mapOutputTracker: MapOutputTrackerMaster,
    blockManagerMaster: BlockManagerMaster,
    env: SparkEnv,
    clock: Clock = new SystemClock())
  extends Logging {
    
	private[scheduler] def handleJobSubmitted(jobId: Int,
      finalRDD: RDD[_],
      func: (TaskContext, Iterator[_]) => _,
      partitions: Array[Int],
      callSite: CallSite,
      listener: JobListener,
      properties: Properties) {
    var finalStage: ResultStage = null
    try {
      // build finalStage from finalRDD: finalRDD is the last rdd of the action, finalStage the last stage
      // finalStage is an instance of ResultStage, the stage that produces the final result;
      // all other stages are instances of ShuffleMapStage
      finalStage = createResultStage(finalRDD, func, partitions, jobId, callSite)
    } catch {
      // error handling ...
    }
    
    // Job submitted, clear internal data.
    barrierJobIdToNumTasksCheckFailures.remove(jobId)

    // build an active job
    val job = new ActiveJob(jobId, finalStage, callSite, listener, properties)
    // clear the rdd id ---> partition locations cache
    clearCacheLocs()

    // cache the job
    jobIdToActiveJob(jobId) = job
    activeJobs += job

    // bind the job to the stage
    finalStage.setActiveJob(job)
    val stageIds = jobIdToStageIds(jobId).toArray
    val stageInfos = stageIds.flatMap(id => stageIdToStage.get(id).map(_.latestInfo))
    
    // post the stage info to the listener bus
    listenerBus.post(
      SparkListenerJobStart(job.jobId, jobSubmissionTime, stageInfos, properties))
    
    // start processing from the last stage
    submitStage(finalStage)
  }    
}
2.2.3.2.2. submitStage: Stage Processing

1. Get the job id bound to the stage; only stages bound to a job are processed.

2. Split the stage: get the list of the current stage's parent stages.

3. If the current stage has no parent stage, call submitMissingTasks to submit its task set.

4. If the current stage has parent stages (1 to n of them), iterate over the parent list and recursively call submitStage on each parent, tracing upward until a stage with no parent is reached; that stage is then handled as in step 3.

private[spark] class DAGScheduler(
    private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    listenerBus: LiveListenerBus,
    mapOutputTracker: MapOutputTrackerMaster,
    blockManagerMaster: BlockManagerMaster,
    env: SparkEnv,
    clock: Clock = new SystemClock())
  extends Logging {
	  
  private def submitStage(stage: Stage) {
    // get the job id bound to this stage
    val jobId = activeJobForStage(stage)
    if (jobId.isDefined) {
      // require: the stage is not waiting on parents, not currently running, and not failed
      if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
        // split the stage: get the list of missing parent stages
        val missing = getMissingParentStages(stage).sortBy(_.id)
        logDebug("missing: " + missing)
        
        if (missing.isEmpty) { // the current stage has no missing parent stage
          // submit its task set
          submitMissingTasks(stage, jobId.get)
        } else { // the current stage depends on parent stages
          for (parent <- missing) { // several parent stages may exist; recurse into each one
            // multi-level wide dependencies are split level by level through recursion
            submitStage(parent)
          }
          // mark the current stage as waiting for its parent stages to be processed
          waitingStages += stage
        }
      }
    } else {
      abortStage(stage, "No active job for stage " + stage.id, None)
    }
  }
}
2.2.3.2.2.1. getMissingParentStages: Splitting Stages and Finding Parent Stages

1. An ArrayStack holds the RDDs that still need to be traversed upward. Each iteration pops the current RDD from the top of the stack; after it is processed, if the current RDD's dependency is a narrow dependency, its parent RDD is pushed back onto the top of the stack. In that case the stack is non-empty on the next iteration and the newly pushed RDD is traversed next, which is what allows the traversal to walk up the RDD chain within a stage.

2. Termination: when the current RDD's dependency is a wide dependency, a mapStage is built and the parent RDD is not pushed onto the stack; once the stack is empty, the traversal stops.

3. Stage split condition: whenever an RDD's dependency is a wide dependency, a mapStage is built, which is exactly where a stage boundary is drawn.

4. After the traversal has split out parent stages at the wide dependencies, those parent stages are returned (see the sketch after the source below).

private[spark] class DAGScheduler(
    private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    listenerBus: LiveListenerBus,
    mapOutputTracker: MapOutputTrackerMaster,
    blockManagerMaster: BlockManagerMaster,
    env: SparkEnv,
    clock: Clock = new SystemClock())
  extends Logging {
	  
  private def getMissingParentStages(stage: Stage): List[Stage] = {
    val missing = new HashSet[Stage]
    val visited = new HashSet[RDD[_]]
    
    // stack of RDDs waiting to be visited for stage splitting
    val waitingForVisit = new ArrayStack[RDD[_]]
    // split the rdd chain into stages
    def visit(rdd: RDD[_]) {
      if (!visited(rdd)) { // this rdd has not been visited yet
        // mark the rdd as visited
        visited += rdd
        val rddHasUncachedPartitions = getCacheLocs(rdd).contains(Nil)
        if (rddHasUncachedPartitions) {
          // iterate over all dependencies of the current rdd
          for (dep <- rdd.dependencies) {
            dep match {
              // wide dependency: get or create the ShuffleMapStage instance mapStage
              case shufDep: ShuffleDependency[_, _, _] =>
                val mapStage = getOrCreateShuffleMapStage(shufDep, stage.firstJobId)
                if (!mapStage.isAvailable) {
                  missing += mapStage
                }
              // narrow dependency: push the parent rdd onto the top of the stack
              case narrowDep: NarrowDependency[_] =>
                waitingForVisit.push(narrowDep.rdd)
            }
          }
        }
      }
    }
    // push the last rdd of the stage onto the stack
    waitingForVisit.push(stage.rdd)
    // traverse all rdds of the stage from back to front
    while (waitingForVisit.nonEmpty) {
      // pop the top element and visit it ===> this is what walks the rdd chain upward
      visit(waitingForVisit.pop())
    }
    missing.toList
  }
}
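
An illustrative sketch (assumed local master and in-memory input, not Spark source) of where this splitting lands for a word-count lineage: flatMap and map are narrow dependencies and stay in one stage, while reduceByKey introduces a ShuffleDependency, so getMissingParentStages returns one parent ShuffleMapStage for the final ResultStage.

import org.apache.spark.{SparkConf, SparkContext}

object StageSplitDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("stage-split-demo").setMaster("local[*]"))

    val counts = sc.parallelize(Seq("a b", "a c"))
      .flatMap(_.split(" "))  // narrow dependency: pipelined into the same stage
      .map((_, 1))            // narrow dependency: same stage
      .reduceByKey(_ + _)     // ShuffleDependency: a parent ShuffleMapStage is created here

    println(counts.toDebugString) // the lineage printout shows the shuffle boundary
    counts.collect()              // 1 ShuffleMapStage + 1 ResultStage => 2 stages in total
    sc.stop()
  }
}
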
2.2.3.2.3. submitMissingTasks: Submitting the Task Set

1. Get the ids of the stage's partitions.

2. Map each partition id to its preferred locations.

3. Serialize the stage and broadcast it.

4. Wrap the stage into tasks by partition, one task per partition, and collect all tasks into a task collection.

5. Build a TaskSet from the task collection and send the TaskSet to the taskScheduler for further scheduling.

private[spark] class DAGScheduler(
    private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    listenerBus: LiveListenerBus,
    mapOutputTracker: MapOutputTrackerMaster,
    blockManagerMaster: BlockManagerMaster,
    env: SparkEnv,
    clock: Clock = new SystemClock())
  extends Logging {
	  
  private def submitMissingTasks(stage: Stage, jobId: Int) {
    logDebug("submitMissingTasks(" + stage + ")")

    // get the ids of the partitions that still need to be computed for this stage
    val partitionsToCompute: Seq[Int] = stage.findMissingPartitions()

    // get the stage's properties from the job: scheduling pool, job group, description, etc.
    val properties = jobIdToActiveJob(jobId).properties

    // mark the current stage as running
    runningStages += stage
    
    // initialize the stage's output-commit state
    stage match {
      case s: ShuffleMapStage =>
        outputCommitCoordinator.stageStart(stage = s.id, maxPartitionId = s.numPartitions - 1)
      case s: ResultStage =>
        outputCommitCoordinator.stageStart(
          stage = s.id, maxPartitionId = s.rdd.partitions.length - 1)
    }
    
    // mapping: partition id ---> preferred task locations
    val taskIdToLocations: Map[Int, Seq[TaskLocation]] = try {
      stage match {
        case s: ShuffleMapStage =>
          partitionsToCompute.map { id => (id, getPreferredLocs(stage.rdd, id))}.toMap
        case s: ResultStage =>
          partitionsToCompute.map { id =>
            val p = s.partitions(id)
            (id, getPreferredLocs(stage.rdd, p))
          }.toMap
      }
    } catch {
      case NonFatal(e) =>
        stage.makeNewStageAttempt(partitionsToCompute.size)
        listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties))
        abortStage(stage, s"Task creation failed: $e\n${Utils.exceptionString(e)}", Some(e))
        runningStages -= stage
        return
    }

    stage.makeNewStageAttempt(partitionsToCompute.size, taskIdToLocations.values.toSeq)

    if (partitionsToCompute.nonEmpty) {
      stage.latestInfo.submissionTime = Some(clock.getTimeMillis())
    }
    listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties))
 
    // serialize the stage and broadcast it
    var taskBinary: Broadcast[Array[Byte]] = null
    var partitions: Array[Partition] = null
    try {
      var taskBinaryBytes: Array[Byte] = null
        
      RDDCheckpointData.synchronized {
        taskBinaryBytes = stage match {
          case stage: ShuffleMapStage =>
            JavaUtils.bufferToArray(
              closureSerializer.serialize((stage.rdd, stage.shuffleDep): AnyRef))
          case stage: ResultStage =>
            JavaUtils.bufferToArray(closureSerializer.serialize((stage.rdd, stage.func): AnyRef))
        }

        partitions = stage.rdd.partitions
      }

      taskBinary = sc.broadcast(taskBinaryBytes)
    } catch {
      // In the case of a failure during serialization, abort the stage.
      case e: NotSerializableException =>
        abortStage(stage, "Task not serializable: " + e.toString, Some(e))
        runningStages -= stage

        // Abort execution
        return
      case e: Throwable =>
        abortStage(stage, s"Task serialization failed: $e\n${Utils.exceptionString(e)}", Some(e))
        runningStages -= stage

        // Abort execution
        return
    }

    // wrap the stage into a collection of tasks, one task per partition
    val tasks: Seq[Task[_]] = try {
      val serializedTaskMetrics = closureSerializer.serialize(stage.latestInfo.taskMetrics).array()
      stage match {
        // a ShuffleMapStage builds ShuffleMapTasks
        case stage: ShuffleMapStage =>
          stage.pendingPartitions.clear()
          partitionsToCompute.map { id =>
            val locs = taskIdToLocations(id)
            val part = partitions(id)
            stage.pendingPartitions += id
            new ShuffleMapTask(stage.id, stage.latestInfo.attemptNumber,
              taskBinary, part, locs, properties, serializedTaskMetrics, Option(jobId),
              Option(sc.applicationId), sc.applicationAttemptId, stage.rdd.isBarrier())
          }

        // a ResultStage builds ResultTasks
        case stage: ResultStage =>
          partitionsToCompute.map { id =>
            val p: Int = stage.partitions(id)
            val part = partitions(p)
            val locs = taskIdToLocations(id)
            new ResultTask(stage.id, stage.latestInfo.attemptNumber,
              taskBinary, part, locs, id, properties, serializedTaskMetrics,
              Option(jobId), Option(sc.applicationId), sc.applicationAttemptId,
              stage.rdd.isBarrier())
          }
      }
    } catch {
      case NonFatal(e) =>
        abortStage(stage, s"Task creation failed: $e\n${Utils.exceptionString(e)}", Some(e))
        runningStages -= stage
        return
    }

    if (tasks.size > 0) {
      // wrap the tasks into a TaskSet and submit it to the taskScheduler
      taskScheduler.submitTasks(new TaskSet(
        tasks.toArray, stage.id, stage.latestInfo.attemptNumber, jobId, properties))
    } else {
      // Because we posted SparkListenerStageSubmitted earlier, we should mark
      // the stage as completed here in case there are no tasks to run
      markStageAsFinished(stage, None)

      stage match {
        case stage: ShuffleMapStage =>
          logDebug(s"Stage ${stage} is actually done; " +
              s"(available: ${stage.isAvailable}," +
              s"available outputs: ${stage.numAvailableOutputs}," +
              s"partitions: ${stage.numPartitions})")
          markMapStageJobsAsFinished(stage)
        case stage : ResultStage =>
          logDebug(s"Stage ${stage} is actually done; (partitions: ${stage.numPartitions})")
      }
      submitWaitingChildStages(stage)
    }
  }
}

3. DAGScheduler Job Submission

3.1. Common Action Operators: the Entry Point to DAGScheduler

1. In Spark, a series of computation steps is defined by composing transformation operators; this composed logic does not execute immediately, and the computation only runs once an action operator is invoked.
2. Every action operator contains a call such as sc.runJob(); this is where the action generates a job and the computation is carried out. Each action operator encountered generates one job (see the sketch after the table below).

Action | Meaning
reduce(func) | Aggregate the elements of the dataset using a function func (which takes two arguments and returns one). The function should be commutative and associative so that it can be computed correctly in parallel.
collect() | Return all the elements of the dataset as an array at the driver program. This is usually useful after a filter or other operation that returns a sufficiently small subset of the data.
count() | Return the number of elements in the dataset.
first() | Return the first element of the dataset (similar to take(1)).
take(n) | Return an array with the first n elements of the dataset.
takeSample(withReplacement, num, [seed]) | Return an array with a random sample of num elements of the dataset, with or without replacement, optionally pre-specifying a random number generator seed.
takeOrdered(n, [ordering]) | Return the first n elements of the RDD using either their natural order or a custom comparator.
saveAsTextFile(path) | Write the elements of the dataset as a text file (or set of text files) in a given directory in the local filesystem, HDFS or any other Hadoop-supported file system. Spark will call toString on each element to convert it to a line of text in the file.
saveAsSequenceFile(path) (Java and Scala) | Write the elements of the dataset as a Hadoop SequenceFile in a given path in the local filesystem, HDFS or any other Hadoop-supported file system. This is available on RDDs of key-value pairs that implement Hadoop's Writable interface. In Scala, it is also available on types that are implicitly convertible to Writable (Spark includes conversions for basic types like Int, Double, String, etc).
saveAsObjectFile(path) (Java and Scala) | Write the elements of the dataset in a simple format using Java serialization, which can then be loaded using SparkContext.objectFile().
countByKey() | Only available on RDDs of type (K, V). Returns a hashmap of (K, Int) pairs with the count of each key.
foreach(func) | Run a function func on each element of the dataset. This is usually done for side effects such as updating an Accumulator or interacting with external storage systems. Note: modifying variables other than Accumulators outside of the foreach() may result in undefined behavior. See Understanding closures for more details.
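
A hedged driver-program sketch (assumed local master and in-memory data): each action below ends up calling sc.runJob(), so this small application submits two separate jobs to the DAGScheduler, while the transformations before them produce no job on their own.

import org.apache.spark.{SparkConf, SparkContext}

object ActionDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("action-demo").setMaster("local[*]"))

    val wordCounts = sc.parallelize(Seq("a", "b", "a", "c"))
      .map(w => (w, 1))
      .reduceByKey(_ + _)               // transformations only: nothing runs yet

    val distinct = wordCounts.count()   // action #1 -> sc.runJob -> job #0
    val pairs    = wordCounts.collect() // action #2 -> sc.runJob -> job #1

    println(s"distinct words: $distinct, pairs: ${pairs.mkString(", ")}")
    sc.stop()
  }
}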

3.2. SparkContext#runJob()

In SparkContext#runJob(), the job is submitted to the DAG scheduler; once it completes, all stages are marked as finished (clearing the progress bar), and finally the RDD is checkpointed.

class SparkContext(config: SparkConf) extends Logging {
  
  def runJob[T, U: ClassTag](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      resultHandler: (Int, U) => Unit): Unit = {
    if (stopped.get()) {
      throw new IllegalStateException("SparkContext has been shutdown")
    }
    val callSite = getCallSite
    val cleanedFunc = clean(func)
    logInfo("Starting job: " + callSite.shortForm)
    if (conf.getBoolean("spark.logLineage", false)) {
      logInfo("RDD's recursive dependencies:\n" + rdd.toDebugString)
    }
    
    // submit the job to the DAG scheduler
    dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get)
    // mark all stages as finished and clear the progress bar if it is shown,
    // so that the progress output does not interleave with the job output
    progressBar.foreach(_.finishAll())
    // checkpoint the result: save checkpoint data / materialize storage
    rdd.doCheckpoint()
  }
}

3.3. DAGScheduler#runJob()

1. The computation is wrapped into a job submission event, which is posted to the event loop processor and stored in the processor's blocking event queue.
2. The event loop processor later takes the event from the blocking queue and processes it; see [2.2.3. Starting the Event Loop Thread].

private[spark] class DAGScheduler(
    private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    listenerBus: LiveListenerBus,
    mapOutputTracker: MapOutputTrackerMaster,
    blockManagerMaster: BlockManagerMaster,
    env: SparkEnv,
    clock: Clock = new SystemClock())
  extends Logging {
    
  def runJob[T, U](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      callSite: CallSite,
      resultHandler: (Int, U) => Unit,
      properties: Properties): Unit = {
    val start = System.nanoTime
    
    // submit the job
    val waiter = submitJob(rdd, func, partitions, callSite, resultHandler, properties)
    ThreadUtils.awaitReady(waiter.completionFuture, Duration.Inf)
    
    // handle the job completion result
    waiter.completionFuture.value.get match {
      case scala.util.Success(_) =>
        logInfo("Job %d finished: %s, took %f s".format
          (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
      case scala.util.Failure(exception) =>
        logInfo("Job %d failed: %s, took %f s".format
          (waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
        // SPARK-8644: Include user stack trace in exceptions coming from DAGScheduler.
        val callerStackTrace = Thread.currentThread().getStackTrace.tail
        exception.setStackTrace(exception.getStackTrace ++ callerStackTrace)
        throw exception
    }
  } 
    
  def submitJob[T, U](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      callSite: CallSite,
      resultHandler: (Int, U) => Unit,
      properties: Properties): JobWaiter[U] = {
    // make sure the partitions to run on actually exist
    val maxPartitions = rdd.partitions.length
    partitions.find(p => p >= maxPartitions || p < 0).foreach { p =>
      throw new IllegalArgumentException(
        "Attempting to access a non-existent partition: " + p + ". " +
          "Total number of partitions: " + maxPartitions)
    }

    // allocate the job id
    val jobId = nextJobId.getAndIncrement()
    if (partitions.size == 0) {
      // Return immediately if the job is running 0 tasks
      return new JobWaiter[U](this, jobId, 0, resultHandler)
    }

    assert(partitions.size > 0)
    // the function that wraps the computation logic
    val func2 = func.asInstanceOf[(TaskContext, Iterator[_]) => _]
    // an object that waits for the DAGScheduler job to finish; as tasks complete, it passes their results to the given handler function
    val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler)
    // wrap a JobSubmitted event and post it to the event loop processor
    eventProcessLoop.post(JobSubmitted(
      jobId, rdd, func2, partitions.toArray, callSite, waiter,
      SerializationUtils.clone(properties)))
    waiter
  }
}

private[spark] abstract class EventLoop[E](name: String) extends Logging {
  // blocking event queue
  private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()
  // put the event into the blocking event queue
  def post(event: E): Unit = {
    eventQueue.put(event)
  }
}

4. Summary

1. Overall, the DAGScheduler's capabilities are defined when the DAGScheduler is instantiated, and they are exercised when the application's computation logic calls an action operator.
2. When the user class defines its computation logic and instantiates SparkContext, the DAGScheduler is instantiated as well, and with it its capabilities: an event loop processor is defined and started, a thread that processes events from a blocking event queue; once started, it processes events whenever the queue has any, and blocks waiting for newly added events when the queue is empty.
3. When an action operator is invoked in the computation logic, the DAGScheduler wraps the job into a job submission event (JobSubmitted) and posts it to the event loop processor's blocking queue, where it waits to be processed by the event loop thread.
4. The event loop processor's handling logic: split the job into stages according to wide vs. narrow dependencies (a new stage at every wide dependency), wrap each stage into a TaskSet according to its partitions, and finally push each TaskSet to the taskScheduler for further processing.

