Spark Job Flow: A Source-Code Walkthrough

1. Detailed flow:

1.    In a user-written Spark application, only invoking an action operator submits a Job to the ClusterManager; transformations are lazy and submit nothing.

2.    Action operator -> sc.runJob() -> runJob(rdd, func, partitions) // the three parameters are the RDD, the processing function, and the partitions -> runJob[T, U](rdd, func, partitions, (index, res) => results(index) = res) // processes each partition of the RDD; the last parameter is the resultHandler
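As a minimal runnable illustration (the app name, master URL, and data below are arbitrary): map only builds lineage, and it is the action (reduce here, which calls sc.runJob internally) that submits the Job:

import org.apache.spark.{SparkConf, SparkContext}

object JobTriggerDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("job-trigger-demo").setMaster("local[2]"))
    val rdd = sc.parallelize(1 to 100).map(_ * 2) // transformation: no Job submitted yet
    val sum = rdd.reduce(_ + _)                   // action: calls sc.runJob under the hood
    println(sum)
    sc.stop()
  }
}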

3.    dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get) // the parameters are the RDD, the cleaned function, the partitions, the call site, the result handler, and the local properties

4.    val waiter = submitJob(rdd, func, partitions, callSite, resultHandler, properties) // submit the Job; the parameters are the RDD, the operator function, the partitions, the call site, the result handler, and the local properties

5.    val jobId = nextJobId.getAndIncrement() // allocate a monotonically increasing Job ID from an atomic counter (this is ID allocation within the DAGScheduler, not a registration with the resource manager)

6.    eventProcessLoop.post(JobSubmitted(jobId, rdd, func2, partitions.toArray, callSite, waiter, SerializationUtils.clone(properties))) // post a JobSubmitted event to the embedded eventProcessLoop; the event carries the jobId, the RDD, func2, the partition array, the call site, the JobWaiter, and a cloned copy of the properties

That is, the Job's submission information is appended by the eventProcessLoop to its event queue.

7.    DAGSchedulerEventProcessLoop.onReceive(event: DAGSchedulerEvent) // receives the DAGSchedulerEvent -> doOnReceive(event) -> case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>

dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) // on receiving the JobSubmitted event, DAGSchedulerEventProcessLoop dispatches to dagScheduler.handleJobSubmitted()
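The event loop itself is a small producer-consumer structure. A heavily abridged sketch of org.apache.spark.util.EventLoop (stop and error handling elided; stopped and name are fields of the enclosing class) shows why post() returns immediately while handling runs on a daemon thread:

import java.util.concurrent.{BlockingQueue, LinkedBlockingDeque}

// Abridged from org.apache.spark.util.EventLoop.
private val eventQueue: BlockingQueue[E] = new LinkedBlockingDeque[E]()

def post(event: E): Unit = {
  eventQueue.put(event)              // producer side: enqueue and return
}

private val eventThread = new Thread(name) {
  setDaemon(true)
  override def run(): Unit = {
    while (!stopped.get) {
      val event = eventQueue.take()  // consumer side: block until an event arrives
      onReceive(event)               // DAGSchedulerEventProcessLoop overrides this
    }
  }
}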

8.    finalStage = createResultStage(finalRDD, func, partitions, jobId, callSite) // create a ResultStage (the final stage) for the Job; the parameters are the final RDD, the operator function, the partitions, the jobId, and the call site -> val parents = getOrCreateParentStages(rdd, jobId) creates the parent Stages of the final RDD, cutting the DAG at wide (shuffle) dependencies -> val stage = new ResultStage(id, rdd, func, partitions, parents, jobId, callSite) builds the ResultStage from its parent Stages and Stage ID -> submitStage(finalStage) // submit the finalStage
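The stage-boundary logic is compact. Abridged from the Spark 2.x DAGScheduler, one parent Stage is created per ShuffleDependency directly upstream of the final RDD, which is exactly the "cut at wide dependencies" rule:

// Abridged from Spark 2.x DAGScheduler.
private def getOrCreateParentStages(rdd: RDD[_], firstJobId: Int): List[Stage] = {
  // one ShuffleMapStage per shuffle dependency reachable from this RDD
  getShuffleDependencies(rdd).map { shuffleDep =>
    getOrCreateShuffleMapStage(shuffleDep, firstJobId)
  }.toList
}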

9.    if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) checks that the stage is not already a waiting, running, or failed stage -> val missing = getMissingParentStages(stage).sortBy(_.id) finds the Stage's missing parent Stages, sorted by Stage ID -> submitMissingTasks(stage, jobId.get) submits the stage's missing tasks if it has no missing parents; otherwise for (parent <- missing) { submitStage(parent) } recursively submits the missing parent Stages (see the sketch below)
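Putting step 9 together, an abridged version of DAGScheduler.submitStage from Spark 2.x reads:

// Abridged from Spark 2.x DAGScheduler.submitStage.
private def submitStage(stage: Stage) {
  val jobId = activeJobForStage(stage)
  if (jobId.isDefined) {
    if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
      val missing = getMissingParentStages(stage).sortBy(_.id)
      if (missing.isEmpty) {
        submitMissingTasks(stage, jobId.get) // all parents available: run this stage now
      } else {
        for (parent <- missing) {
          submitStage(parent)                // recurse into missing parents first
        }
        waitingStages += stage               // park this stage until its parents finish
      }
    }
  } else {
    abortStage(stage, "No active job for stage " + stage.id, None)
  }
}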

10.  DAGScheduler.submitMissingTasks(stage: Stage, jobId: Int) -> val partitionsToCompute: Seq[Int] = stage.findMissingPartitions() // find the indices of the stage's missing partitions, then branch on whether the stage is a ShuffleMapStage or a ResultStage -> stage match {

        case s:ShuffleMapStage =>

                   partitionsToCompute.map { id => (id, getPreferredLocs(stage.rdd, id)) }.toMap // compute each missing partition's preferred locations: they may come from the cache or from a checkpoint, or, if the RDD's dependency is narrow, from the first partition of the first narrow dependency (see the getPreferredLocsInternal sketch after this step)

       case s: ResultStage =>

         partitionsToCompute.map { id =>

           val p = s.partitions(id) // same location lookup as above

           (id, getPreferredLocs(stage.rdd, p))

         }.toMap

}->

case stage:ShuffleMapStage =>

          partitionsToCompute.map { id =>

            val locs = taskIdToLocations(id)

            val part = stage.rdd.partitions(id)

            new ShuffleMapTask(stage.id, stage.latestInfo.attemptId,

              taskBinary, part, locs, stage.latestInfo.taskMetrics, properties, Option(jobId), // build a ShuffleMapTask from the partition's location and contents, tagging the job it belongs to

              Option(sc.applicationId), sc.applicationAttemptId)

          }

case stage:ResultStage =>

          partitionsToCompute.map { id =>

            val p: Int = stage.partitions(id)

            val part = stage.rdd.partitions(p)

            val locs = taskIdToLocations(id)

            new ResultTask(stage.id, stage.latestInfo.attemptId,

              taskBinary, part, locs, id, properties, stage.latestInfo.taskMetrics, // create a ResultTask

              Option(jobId), Option(sc.applicationId), sc.applicationAttemptId)

          }

 After these two steps, the stage has been expanded into tasks (stage -> tasks).
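The location lookup referenced in step 10 is implemented by getPreferredLocsInternal. Abridged from the Spark 2.x DAGScheduler, the order of preference is: cached blocks, then the RDD's own preferredLocations (which covers checkpointed data), then a recursive walk into narrow dependencies:

// Abridged from Spark 2.x DAGScheduler.getPreferredLocsInternal.
private def getPreferredLocsInternal(
    rdd: RDD[_],
    partition: Int,
    visited: HashSet[(RDD[_], Int)]): Seq[TaskLocation] = {
  if (!visited.add((rdd, partition))) {
    return Nil                                 // already visited: avoid re-walking the DAG
  }
  val cached = getCacheLocs(rdd)(partition)    // 1. locations of cached blocks
  if (cached.nonEmpty) {
    return cached
  }
  val rddPrefs = rdd.preferredLocations(rdd.partitions(partition)).toList
  if (rddPrefs.nonEmpty) {
    return rddPrefs.map(TaskLocation(_))       // 2. the RDD's own preference (incl. checkpoint)
  }
  rdd.dependencies.foreach {
    case n: NarrowDependency[_] =>             // 3. recurse into narrow parents
      for (inPart <- n.getParents(partition)) {
        val locs = getPreferredLocsInternal(n.rdd, inPart, visited)
        if (locs != Nil) {
          return locs
        }
      }
    case _ =>
  }
  Nil
}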

11.  taskScheduler.submitTasks(new TaskSet(

        tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties)) // TaskScheduler.submitTasks submits the TaskSet (in practice TaskSchedulerImpl.submitTasks does the work: it wraps the TaskSet in a TaskSetManager -> schedulableBuilder.addTaskSetManager adds it to the scheduling pool -> backend.reviveOffers()). A backend matching the deploy mode was created earlier; when running locally this is a LocalBackend, whose endpoint's receive() handles the ReviveOffers message by calling reviveOffers():

def reviveOffers() {

    // describe the local executor's resources as a single WorkerOffer
    val offers = IndexedSeq(new WorkerOffer(localExecutorId, localExecutorHostname, freeCores))

    for (task <- scheduler.resourceOffers(offers).flatten) { // the TaskScheduler matches tasks to executors via resourceOffers(offers)

      freeCores -= scheduler.CPUS_PER_TASK

      executor.launchTask(executorBackend, taskId = task.taskId, attemptNumber = task.attemptNumber, // the executor launches each assigned task

        task.name, task.serializedTask)

    }

  }

12.  Executor.launchTask(

     context: ExecutorBackend,

     taskId: Long,

     attemptNumber: Int,

     taskName: String,

     serializedTask: ByteBuffer): Unit = {

   val tr = new TaskRunner(context, taskId = taskId, attemptNumber = attemptNumber, taskName, // wrap the task in a TaskRunner

     serializedTask)

   runningTasks.put(taskId, tr) // record the TaskId -> TaskRunner pair in a concurrent hash map that marks running tasks

   threadPool.execute(tr) // hand off to the thread pool, which invokes TaskRunner.run()

  }
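What the pool thread then executes is TaskRunner.run(). A heavily abridged sketch of its happy path under Spark 2.x (dependency download, metrics, and error handling all elided):

// Heavily abridged sketch of Executor.TaskRunner.run() (Spark 2.x); happy path only.
override def run(): Unit = {
  val ser = env.closureSerializer.newInstance()
  execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER) // tell the driver we started
  task = ser.deserialize[Task[Any]](serializedTask,
    Thread.currentThread.getContextClassLoader)                          // rebuild the Task object
  val value = task.run(taskId, attemptNumber, metricsSystem)             // calls runTask() on the concrete Task
  val valueBytes = ser.serialize(value)                                  // serialize the result
  execBackend.statusUpdate(taskId, TaskState.FINISHED, valueBytes)       // report completion to the driver
}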


2. Simplified flow:

1) The user writes the application and launches the Driver process, which starts the Driver-side SecurityManager, RpcEnv, and so on.

2) The Driver process creates the SparkContext (sc).

3) sc creates the DAGScheduler and the TaskScheduler.

Creating the DAGScheduler essentially means creating its eventProcessLoop;

creating the TaskScheduler essentially means:

private def createTaskScheduler(
    sc: SparkContext,
    master: String,
    deployMode: String): (SchedulerBackend, TaskScheduler) = {
  master match {
    case "local" =>
      val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
      val backend = new LocalSchedulerBackend(sc.getConf, scheduler, 1)
      scheduler.initialize(backend)
      (backend, scheduler)
    case SPARK_REGEX(sparkUrl) =>
      val scheduler = new TaskSchedulerImpl(sc)
      val masterUrls = sparkUrl.split(",").map("spark://" + _)
      val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
      scheduler.initialize(backend)
      (backend, scheduler)
    // ... other deploy modes elided
  }
}

def initialize(backend: SchedulerBackend) { // defined on TaskSchedulerImpl
  this.backend = backend
  // temporarily set rootPool name to empty
  rootPool = new Pool("", schedulingMode, 0, 0)
  schedulableBuilder = {
    schedulingMode match {
      case SchedulingMode.FIFO =>
        new FIFOSchedulableBuilder(rootPool)
      case SchedulingMode.FAIR =>
        new FairSchedulableBuilder(rootPool, conf)
      case _ =>
        throw new IllegalArgumentException(s"Unsupported spark.scheduler.mode: $schedulingMode")
    }
  }
  schedulableBuilder.buildPools()
}

That is: a TaskSchedulerImpl is created first; then, depending on the master URL, a LocalSchedulerBackend or StandaloneSchedulerBackend is created; scheduler.initialize(backend) then wires in the backend, creating the rootPool (scheduling pool) and, according to the scheduling mode, either a FIFOSchedulableBuilder (tasks from smaller JobIds are scheduled first, and within the same JobId tasks from smaller StageIds first) or a FairSchedulableBuilder (each pool is first given resources up to its minShare, and the remainder is distributed in proportion to each pool's weight); finally schedulableBuilder.buildPools() is called.
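For reference, the scheduling mode is selected through the spark.scheduler.mode configuration key (FIFO is the default); a minimal sketch, with an arbitrary app name:

import org.apache.spark.SparkConf

// Minimal sketch: select the FAIR scheduler (the default mode is FIFO).
// Pool definitions can additionally be supplied via spark.scheduler.allocation.file.
val conf = new SparkConf()
  .setAppName("fair-scheduling-demo")
  .set("spark.scheduler.mode", "FAIR")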

Starting the TaskScheduler (TaskScheduler.start()) is in essence starting the backend (backend.start()), whose core job is creating the DriverEndpoint that waits for Executors to connect.

4) As soon as an action operator is hit, the Spark job flow begins; every action operator boils down to sc.runJob(rdd, func, partitions) // the parameters are the RDD, the operator function, and the partitions

5) DAGScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get) // adds the call site, the result handler, and the local properties

6) val waiter = DAGScheduler.submitJob(rdd, func, partitions, callSite, resultHandler, properties) // create the JobWaiter

7) val jobId = nextJobId.getAndIncrement() // allocate the JobId

8) DAGSchedulerEventProcessLoop.post(JobSubmitted(jobId, rdd, func2, partitions.toArray, callSite, waiter, SerializationUtils.clone(properties))) // build a JobSubmitted event from the waiter and jobId and append it to the DAGSchedulerEventProcessLoop's queue

9) DAGSchedulerEventProcessLoop.onReceive(event: DAGSchedulerEvent) // on receiving the JobSubmitted event, pattern matching dispatches to DAGScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) // handle the job-submission event

10) val finalStage = DAGScheduler.createResultStage(finalRDD, func, partitions, jobId, callSite) // create the finalStage from the final RDD, then submitStage(finalStage) // submit the finalStage

11) The DAGScheduler checks that the submitted Stage is not a waiting, running, or failed Stage; val missing = getMissingParentStages(stage).sortBy(_.id) // find the Stage's missing parent Stages; if there are none, submitMissingTasks(stage) is called, otherwise submitStage(parent) is called recursively for every missing parent Stage

12) val partitionsToCompute: Seq[Int] = stage.findMissingPartitions() // find the stage's missing partitions. If the Stage is a ShuffleMapStage, partitionsToCompute.map { id => (id, getPreferredLocs(stage.rdd, id)) }.toMap finds where each partition lives (checked in order: the cache, the RDD's preferred locations / checkpoint, then the first partition of the first narrow dependency); a ResultStage is handled analogously. Then, for a ShuffleMapStage, new ShuffleMapTask(stage.id, stage.latestInfo.attemptId, taskBinary, part, locs, stage.latestInfo.taskMetrics, properties, Option(jobId)) creates one ShuffleMapTask per partition; ResultTasks are created analogously.

13) TaskScheduler.submitTasks(new TaskSet(tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties)) // wrap the tasks in a TaskSet and create a TaskSetManager to manage it

That is, after the TaskSet is wrapped in a TaskSetManager and added to the scheduling pool via schedulableBuilder.addTaskSetManager(), backend.reviveOffers() is called, which in practice is DriverEndpoint.send(ReviveOffers) sending the ReviveOffers message.

14) backend.reviveOffers() // different deploy modes create different backends; in local mode, for instance, a LocalBackend is created. On receiving the ReviveOffers message, the backend assigns a suitable Executor to each Task and calls executor.launchTask(...)

In the distributed case: Netty's Dispatcher thread receives the ReviveOffers message and calls CoarseGrainedSchedulerBackend.makeOffers(). CoarseGrainedSchedulerBackend keeps Executor state in executorDataMap; from it, the live executorData entries are filtered out first, then a WorkerOffer object is built from each executorData.executorHost and executorData.freeCores, and finally launchTasks(scheduler.resourceOffers(workOffers)) is called. resourceOffers returns a collection of TaskDescription objects, each pairing a scheduled Task with its Executor; the executorId stored in each TaskDescription is used to look up the corresponding executorData, whose executorEndpoint RPC reference then send(LaunchTask)s the serialized Task to that Executor.
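Abridged from the Spark 2.x CoarseGrainedSchedulerBackend (DriverEndpoint), makeOffers reads roughly:

// Abridged from Spark 2.x CoarseGrainedSchedulerBackend.DriverEndpoint.
private def makeOffers() {
  // keep only executors that are alive (not pending removal)
  val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
  // advertise each live executor's host and free cores as a WorkerOffer
  val workOffers = activeExecutors.map { case (id, executorData) =>
    new WorkerOffer(id, executorData.executorHost, executorData.freeCores)
  }.toIndexedSeq
  // let the TaskScheduler pick task-to-executor assignments, then launch them
  launchTasks(scheduler.resourceOffers(workOffers))
}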

15) Executor.launchTask(...) takes three steps: it creates a TaskRunner for the task, puts the TaskRunner into a concurrent hash map (marking the task as running), and finally threadPool.execute(tr) schedules it on the thread pool, which internally ends up running TaskRunner.run().

The subsequent call chain is:

TaskRunner.run() -> Task.run() -> ShuffleMapTask/ResultTask.runTask() -> RDD.iterator() -> RDD.computeOrReadCheckpoint() -> RDD.compute()
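The tail of that chain is visible directly in RDD.scala. Abridged from Spark 2.x, iterator() either serves the partition through the block manager (if persisted) or computes it, reading from a checkpoint when one exists:

// Abridged from org.apache.spark.rdd.RDD (Spark 2.x).
final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
  if (storageLevel != StorageLevel.NONE) {
    getOrCompute(split, context)            // persisted: go through the block manager
  } else {
    computeOrReadCheckpoint(split, context) // else compute, or read the checkpoint
  }
}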

