Spark 2.3 source code analysis: the submitTasks flow

TaskSchedulerImpl

Overview

Different cluster types correspond to different SchedulerBackend implementations: YarnSchedulerBackend, StandaloneSchedulerBackend, LocalSchedulerBackend, and so on. TaskSchedulerImpl handles the logic that is common to all of them, such as deciding the scheduling order among tasks.

A client must call TaskSchedulerImpl's initialize() and start() methods before it can call submitTasks() to submit a TaskSet.
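As a rough illustration of that ordering, the sketch below mirrors what SparkContext itself does when it creates the scheduler; StandaloneSchedulerBackend and masterUrls are just one possible combination, and these classes are Spark-internal, so this is an illustrative sketch rather than code an application would write directly:

// Illustrative sketch: SparkContext performs the equivalent of these steps internally.
val scheduler = new TaskSchedulerImpl(sc)                                  // sc: an existing SparkContext
val backend   = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)  // masterUrls: Array("spark://...")

scheduler.initialize(backend)   // bind the backend and build the schedulableBuilder (FIFO or FAIR)
scheduler.start()               // start the backend; executors begin to register
// Only after this point does the DAGScheduler call scheduler.submitTasks(taskSet).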

TaskSchedulerImpl#submitTasks() method

Submits all the tasks of one TaskSet (one stage) for execution.

The flow is as follows:

1. First, create a TaskSetManager. A TaskSetManager manages a single TaskSet inside TaskSchedulerImpl: it tracks each task, retries failed tasks until the maximum number of task retries is reached, and performs locality-aware scheduling of the tasks through delay scheduling.

2. Add the TaskSetManager to schedulableBuilder. schedulableBuilder is of type SchedulableBuilder, a trait with two implementations, FIFOSchedulableBuilder and FairSchedulableBuilder, with FIFO as the default. schedulableBuilder is an important member of TaskSchedulerImpl: according to the scheduling policy, it determines the order in which TaskSetManagers are scheduled (see the initialize() sketch after this list).

3. Call the SchedulerBackend's reviveOffers() method to schedule the tasks, i.e. to decide which Executor each task will actually run on.
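For step 2, the choice between the two builders is made earlier, when initialize() binds the backend; slightly simplified from TaskSchedulerImpl#initialize() in the 2.x sources:

def initialize(backend: SchedulerBackend) {
    this.backend = backend
    schedulableBuilder = {
      schedulingMode match {
        case SchedulingMode.FIFO =>
          new FIFOSchedulableBuilder(rootPool)          // default
        case SchedulingMode.FAIR =>
          new FairSchedulableBuilder(rootPool, conf)    // spark.scheduler.mode=FAIR
        case _ =>
          throw new IllegalArgumentException(s"Unsupported scheduling mode: $schedulingMode")
      }
    }
    schedulableBuilder.buildPools()
  }

With that in place, the submitTasks() source itself reads: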

override def submitTasks(taskSet: TaskSet) {
    // Get the array of tasks in this TaskSet
    val tasks = taskSet.tasks
    logInfo("Adding task set " + taskSet.id + " with " + tasks.length + " tasks")
    this.synchronized {
      // Create the TaskSetManager, which will manage the lifecycle of every task in this TaskSet
      val manager = createTaskSetManager(taskSet, maxTaskFailures)
      // Get the stageId of this TaskSet
      val stage = taskSet.stageId
      val stageTaskSets =
        taskSetsByStageIdAndAttempt.getOrElseUpdate(stage, new HashMap[Int, TaskSetManager])

      // Mark all the existing TaskSetManagers of this stage as zombie, as we are adding a new one.
      // This is necessary to handle a corner case. Let's say a stage has 10 partitions and has 2
      // TaskSetManagers: TSM1(zombie) and TSM2(active). TSM1 has a running task for partition 10
      // and it completes. TSM2 finishes tasks for partition 1-9, and thinks he is still active
      // because partition 10 is not completed yet. However, DAGScheduler gets task completion
      // events for all the 10 partitions and thinks the stage is finished. If it's a shuffle stage
      // and somehow it has missing map outputs, then DAGScheduler will resubmit it and create a
      // TSM3 for it. As a stage can't have more than one active task set managers, we must mark
      // TSM2 as zombie (it actually is).
      stageTaskSets.foreach { case (_, ts) =>
        ts.isZombie = true
      }
      stageTaskSets(taskSet.stageAttemptId) = manager
      // Add the TaskSetManager to the schedulableBuilder
      schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)

      if (!isLocal && !hasReceivedTask) {
        starvationTimer.scheduleAtFixedRate(new TimerTask() {
          override def run() {
            if (!hasLaunchedTask) {
              logWarning("Initial job has not accepted any resources; " +
                "check your cluster UI to ensure that workers are registered " +
                "and have sufficient resources")
            } else {
              this.cancel()
            }
          }
        }, STARVATION_TIMEOUT_MS, STARVATION_TIMEOUT_MS)
      }
      hasReceivedTask = true
    }
    // SchedulerBackend#reviveOffers()
    backend.reviveOffers()
  }
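The maxTaskFailures passed to createTaskSetManager() and the scheduling mode that schedulableBuilder follows are both ordinary configuration values; a small illustrative snippet (the values shown are examples, not recommendations):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.task.maxFailures", "4")    // becomes maxTaskFailures; the default is 4
  .set("spark.scheduler.mode", "FAIR")   // FIFO (default) or FAIR; determines the schedulableBuilder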

Typical log output when running a Spark (on YARN) job:

[dag-scheduler-event-loop] INFO cluster.YarnScheduler:58: Adding task set 0.0 with 6 tasks
[dispatcher-event-loop-11] INFO scheduler.TaskSetManager:58: Starting task 0.0 in stage 0.0 (TID 0, hadoop5, partition 0,PROCESS_LOCAL, 9549 bytes)
[dispatcher-event-loop-11] INFO scheduler.TaskSetManager:58: Starting task 1.0 in stage 0.0 (TID 1, hadoop4, partition 1,PROCESS_LOCAL, 5974 bytes)
[dispatcher-event-loop-11] INFO scheduler.TaskSetManager:58: Starting task 2.0 in stage 0.0 (TID 2, hadoop9, partition 2,PROCESS_LOCAL, 4826 bytes)
[dispatcher-event-loop-11] INFO scheduler.TaskSetManager:58: Starting task 3.0 in stage 0.0 (TID 3, hadoop5, partition 3,PROCESS_LOCAL, 6937 bytes)
[dispatcher-event-loop-11] INFO scheduler.TaskSetManager:58: Starting task 4.0 in stage 0.0 (TID 4, hadoop4, partition 4,PROCESS_LOCAL, 5587 bytes)
[dispatcher-event-loop-11] INFO scheduler.TaskSetManager:58: Starting task 5.0 in stage 0.0 (TID 5, hadoop9, partition 5,PROCESS_LOCAL, 6734 bytes)
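The PROCESS_LOCAL entries in this log are the outcome of the delay scheduling mentioned in step 1: the TaskSetManager starts at the most local level available and only falls back to a less local level after a configurable wait. The relevant settings (illustrative values; 3s is the documented default for spark.locality.wait):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.locality.wait", "3s")          // base wait before falling back to a less local level
  .set("spark.locality.wait.process", "3s")  // override for the PROCESS_LOCAL level
  .set("spark.locality.wait.node", "3s")     // override for the NODE_LOCAL level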

CoarseGrainedSchedulerBackend#reviveOffers() method

This method sends a ReviveOffers message to the driverEndpoint:

override def reviveOffers() {
    driverEndpoint.send(ReviveOffers)
  }

After the driverEndpoint receives the ReviveOffers message, it calls the makeOffers() method:

case ReviveOffers =>
        makeOffers()

DriverEndpoint#makeOffers() method

About executorDataMap in the code below: when the client registers the Application with the Master, the Master allocates and launches Executors for the Application; these Executors then register themselves with CoarseGrainedSchedulerBackend, and their registration information is stored in the executorDataMap data structure.
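Each value in executorDataMap is an ExecutorData entry. A trimmed-down sketch of the fields that makeOffers() reads, with the remaining fields omitted (abbreviated from the 2.x sources):

private[cluster] class ExecutorData(
    val executorEndpoint: RpcEndpointRef,  // RPC handle later used to send LaunchTask to the executor
    val executorAddress: RpcAddress,       // executorAddress.hostPort goes into the WorkerOffer
    val executorHost: String,              // host name, used for locality decisions
    var freeCores: Int,                    // decreases as tasks are launched on this executor
    val totalCores: Int
    // ... remaining fields omitted ...
)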

// Make fake resource offers on all executors
    private def makeOffers() {
      // Make sure no executor is killed while some task is launching on it
      val taskDescs = withLock {
        // Filter out executors under killing
        // Filter out executors that are being killed
        val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
        val workOffers = activeExecutors.map {
          case (id, executorData) =>
            // Wrap each Executor as a WorkerOffer object
            new WorkerOffer(id, executorData.executorHost, executorData.freeCores,
              Some(executorData.executorAddress.hostPort))
        }.toIndexedSeq
        scheduler.resourceOffers(workOffers)
      }
      if (!taskDescs.isEmpty) {
        launchTasks(taskDescs)
      }
    }

Next article: source code analysis of the launchTask flow.

Reference: Spark 源码解析:彻底理解TaskScheduler的任务提交和task最佳位置算法
