Source-Code Analysis of How a Spark Application Runs on a Spark Cluster

1. Introduction to Spark's DAG Engine  

A DAG (Directed Acyclic Graph) describes the ordering dependencies between tasks. In Spark, the DAGScheduler is responsible for the logical scheduling of tasks: it splits a job into multiple batches of interdependent tasks grouped into stages, and defines the scheduling logic for them. Its main characteristics are as follows:

  • The DAG is directed and acyclic, so there are no circular dependencies
  • Stages with no dependency on each other can be scheduled in parallel
  • Lineage-based task recovery is supported
  • Spark can persist intermediate results in memory to speed up later computation (a minimal sketch follows this list)
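
To make this concrete, here is a minimal, hedged sketch (the object name and sample data are illustrative, not from this article): the reduceByKey below is a shuffle operator, so the DAGScheduler cuts the lineage there into two stages; cache() keeps the shuffled result in memory, and a lost cached partition can be rebuilt from the lineage.

import org.apache.spark.{SparkConf, SparkContext}

object DagEngineSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dag-sketch").setMaster("local[*]"))

    val words  = sc.parallelize(Seq("a b", "b c", "a c")).flatMap(_.split(" "))
    val counts = words.map((_, 1)).reduceByKey(_ + _) // shuffle operator => stage boundary

    counts.cache()                           // persist the intermediate result in memory
    println(counts.count())                  // first action: runs both stages
    println(counts.filter(_._2 > 1).count()) // second action: reuses the cached result,
                                             // recomputing lost partitions from lineage if needed
    sc.stop()
  }
}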

2. Key Components of the Spark Runtime

2.1 DAG Scheduler

How it works:

When the DAGScheduler receives a job, it splits it into multiple stages. It walks the RDD operator chain from the end of the job backwards looking for shuffle operators; each time it finds one, it cuts the chain there, and the chain of RDDs found so far becomes a stage, which is pushed onto a stack (last in, first out). The DAGScheduler later pops each stage off this stack and submits it to the TaskScheduler.

The DAGScheduler maintains two job queues, waiting jobs and active jobs, as well as waiting stages, active stages, and failed stages, together with their mappings back to jobs (a stage-splitting sketch follows below).
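
To see the back-to-front cutting in practice, here is a hedged, self-contained sketch (names and data are made up): the two shuffle operators produce three stages, and RDD.toDebugString prints the lineage, whose indentation reflects the shuffle boundaries at which the DAGScheduler cuts the chain.

import org.apache.spark.{SparkConf, SparkContext}

object StageSplitSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("stage-split-sketch").setMaster("local[*]"))

    val counts = sc.parallelize(Seq("a b", "b a", "c a"))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _) // shuffle #1 -> cut here
      .sortByKey()        // shuffle #2 -> cut here

    println(counts.toDebugString) // the indented lineage shows the ShuffledRDD boundaries
    counts.collect()              // triggers the job; the DAGScheduler submits the three stages in dependency order
    sc.stop()
  }
}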

2.2 TaskScheduler

How it works:

The TaskScheduler's main job is to match tasks with the resources available in the cluster. Internally it maintains a task queue holding the tasks waiting to run.

TaskScheduler itself is only an interface; Spark ships just one implementation, TaskSchedulerImpl, so in theory the task-scheduling policy can be customized (a small configuration sketch follows).
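
As a small, hedged illustration of adjusting the scheduling policy without replacing TaskSchedulerImpl (the object name here is made up), the built-in spark.scheduler.mode setting switches how TaskSchedulerImpl queues task sets, from the default FIFO to FAIR:

import org.apache.spark.{SparkConf, SparkContext}

object FairSchedulingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("fair-scheduling-sketch")
      .setMaster("local[*]")
      .set("spark.scheduler.mode", "FAIR") // FAIR lets concurrently submitted jobs share executors more evenly

    val sc = new SparkContext(conf)
    // ... submit jobs from multiple threads here ...
    sc.stop()
  }
}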

2.3 SchedulerBackend & ExecutorBackend

Both are communication components, responsible for the traffic between the driver and the executors: the SchedulerBackend lives in the Driver, while the ExecutorBackend lives in each Executor.

2.4 SparkContext 

SparkContext is the runtime context object of a Spark application and contains many essential components:

  • SparkContext is the single entry point from user code into the Spark cluster; it can be used to create RDDs, accumulators (Accumulator), and broadcast variables (Broadcast Variable) in the cluster (see the sketch after this list)
  • While it is being instantiated, SparkContext initializes the DAGScheduler, the TaskScheduler, and the SchedulerBackend
  • SparkContext asks the DAGScheduler to divide the whole job into several smaller stages (Stage), and the TaskScheduler then dispatches each stage's tasks (Task) to suitable executors. In addition, the SchedulerBackend manages the compute resources (Executors) that the cluster has allocated to the current application.
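
The following minimal sketch (object name and data are illustrative, not from this article) shows SparkContext acting as that entry point, creating an RDD, an accumulator, and a broadcast variable:

import org.apache.spark.{SparkConf, SparkContext}

object SparkContextSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sc-sketch").setMaster("local[*]"))

    val rdd     = sc.parallelize(1 to 5)                    // RDD
    val missing = sc.longAccumulator("missing-keys")        // accumulator
    val lookup  = sc.broadcast(Map(1 -> "one", 2 -> "two")) // broadcast variable

    rdd.foreach { n =>
      if (!lookup.value.contains(n)) missing.add(1) // executors read the broadcast and update the accumulator
    }
    println(s"keys missing from the lookup table: ${missing.value}")
    sc.stop()
  }
}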

2.5 Summary

The three most important components in the Spark driver are the DAGScheduler, the TaskScheduler, and the SchedulerBackend.

Their relationship can be understood like this: if Spark were a company, the DAGScheduler would be the chief architect, who breaks a large project into phases and keeps track of overall progress; the SchedulerBackend would be the HR and administration manager, who keeps track of the company's staff and facilities; and the TaskScheduler would be the operations manager, who breaks the architect's phases into concrete subtasks, requests the necessary people and resources from the HR manager, and hands each piece of work to a suitable person.

3. Message Flow When Submitting a Spark Application

  1. spark-submit submits the user code and starts a client, which creates a ClientEndpoint; the ClientEndpoint sends a RequestSubmitDriver message to the master
  2. On receiving RequestSubmitDriver, the master sends a LaunchDriver message to a worker to start the driver
  3. The worker receives LaunchDriver and starts the driver; once the driver is up, it returns a DriverStateChanged message to the master (the second-to-last message of the submission handshake)
  4. On receiving DriverStateChanged, the master builds a SubmitDriverResponse message and sends it to the client, which completes the submission handshake
  5. The driver then initializes a ClientEndpoint (inside StandaloneAppClient) responsible for talking to the master; in its onStart method it sends a RegisterApplication message to the master
  6. On receiving RegisterApplication, the master does its bookkeeping and returns a RegisteredApplication message to the driver running on its worker
  7. Based on the same RegisterApplication message, the master also builds LaunchExecutor messages to start executors
  8. Once an executor has started, it registers itself with the driver by sending a RegisterExecutor message to the driver's DriverEndpoint
  9. On receiving RegisterExecutor, the driver's DriverEndpoint does its bookkeeping and replies RegisteredExecutor to the executor's communication endpoint
  10. The executor reports its state to the worker's endpoint with an ExecutorStateChanged message
  11. The worker forwards the ExecutorStateChanged message to the master
  12. The master then sends the information about the usable executors to the driver's ClientEndpoint
  13. Once started, the driver's DriverEndpoint sends LaunchTask messages to the executors to launch tasks (a compact schematic of this whole flow follows this list)
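
As a compact reference, the handshake above can be modeled by the plain-Scala schematic below; this is not Spark's internal RPC API, the strings simply mirror the real component and message names:

object SubmissionFlow {
  final case class Hop(from: String, to: String, message: String)

  // One entry per step of the submission handshake described above.
  val handshake: Seq[Hop] = Seq(
    Hop("ClientEndpoint (spark-submit)", "Master",         "RequestSubmitDriver"),
    Hop("Master",                        "Worker",         "LaunchDriver"),
    Hop("Worker",                        "Master",         "DriverStateChanged"),
    Hop("Master",                        "ClientEndpoint", "SubmitDriverResponse"),
    Hop("Driver (StandaloneAppClient)",  "Master",         "RegisterApplication"),
    Hop("Master",                        "Driver",         "RegisteredApplication"),
    Hop("Master",                        "Worker",         "LaunchExecutor"),
    Hop("Executor",                      "DriverEndpoint", "RegisterExecutor"),
    Hop("DriverEndpoint",                "Executor",       "RegisteredExecutor"),
    Hop("Executor",                      "Worker",         "ExecutorStateChanged"),
    Hop("Worker",                        "Master",         "ExecutorStateChanged"),
    Hop("DriverEndpoint",                "Executor",       "LaunchTask")
  )

  def main(args: Array[String]): Unit =
    handshake.foreach(h => println(s"${h.from} -> ${h.to} : ${h.message}"))
}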

4. Source-Code Walkthrough of a Spark Application

4.1 spark-submit sends a RequestSubmitDriver message to the master

  • An application is submitted through the spark-submit script; the core is the SparkSubmit class, which starts the Client, whose central component is the ClientEndpoint.
  • In the ClientEndpoint class, focus on the onStart method: during initialization it wraps up a RequestSubmitDriver message and sends it to the master.
/**
   1. Execution starts from SparkSubmit's main method
*/
object SparkSubmit extends CommandLineUtils with Logging {

// entry point: the main method

 override def main(args: Array[String]): Unit = {
//create a SparkSubmit instance
    val submit = new SparkSubmit() {
      self =>

      //override the parent class's doSubmit method
      override def doSubmit(args: Array[String]): Unit = {
        try {
          super.doSubmit(args)
        } catch {
          case e: SparkUserAppException =>
            exitFn(e.exitCode)
        }
      }

    }
    //invoke doSubmit
    submit.doSubmit(args)
  }

}


//step into doSubmit
def doSubmit(args: Array[String]): Unit = {
    //parse the command-line arguments into an arguments object
    val appArgs = parseArguments(args)
    if (appArgs.verbose) {
      logInfo(appArgs.toString)
    }
    //pattern-match on the requested action
    appArgs.action match {
     //SUBMIT: submit the application
      case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
     //KILL: kill a running driver
      case SparkSubmitAction.KILL => kill(appArgs)
      case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
      case SparkSubmitAction.PRINT_VERSION => printVersion()
    }
  }
}

//step into submit(appArgs, uninitLog)
  private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {

     //decide, based on the spark-submit arguments, how to prepare the application's environment
      val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)   
}

//step into prepareSubmitEnvironment
private[deploy] def prepareSubmitEnvironment(

//take yarn-cluster mode as the example here
 if (isYarnCluster) {
   //in yarn-cluster mode the submit class is YARN_CLUSTER_SUBMIT_CLASS,
   //i.e. org.apache.spark.deploy.yarn.YarnClusterApplication
      childMainClass = YARN_CLUSTER_SUBMIT_CLASS
      if (args.isPython) {
        childArgs += ("--primary-py-file", args.primaryResource)
        childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
      } else if (args.isR) {
        val mainFile = new Path(args.primaryResource).getName
        childArgs += ("--primary-r-file", mainFile)
        childArgs += ("--class", "org.apache.spark.deploy.RRunner")
      } else {
        if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
          childArgs += ("--jar", args.primaryResource)
        }
//for a regular application, add the user's main class to childArgs
        childArgs += ("--class", args.mainClass)
      }
      if (args.childArgs != null) {
        args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
      }
    }

  //return the four values
 (childArgs, childClasspath, sparkConf, childMainClass)

}
//back in submit, look at the doRunMain method below
  private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
def doRunMain(): Unit = {
//the key call is runMain, invoked with the values produced by prepareSubmitEnvironment
         runMain(childArgs, childClasspath, sparkConf, childMainClass, args.verbose)
}

}
//runMain eventually starts the RPC client
  private def runMain(
      childArgs: Seq[String],
      childClasspath: Seq[String],
      sparkConf: SparkConf,
      childMainClass: String,
      verbose: Boolean): Unit = {
app.start(childArgs.toArray, sparkConf)
}

//step into the Client class's start method
 override def start(args: Array[String], conf: SparkConf): Unit = {
    val driverArgs = new ClientArguments(args)

    //create the RpcEnv, the runtime environment for the communication endpoints
    val rpcEnv =
      RpcEnv.create("driverClient", Utils.localHostName(), 0, conf, new SecurityManager(conf))
    //create the ClientEndpoint that talks to the master
    rpcEnv.setupEndpoint("client", new ClientEndpoint(rpcEnv, driverArgs, masterEndpoints, conf))

    rpcEnv.awaitTermination()
  }


//step into the ClientEndpoint class; focus on onStart and the receive methods
private class ClientEndpoint(
    override val rpcEnv: RpcEnv,
    driverArgs: ClientArguments,
    masterEndpoints: Seq[RpcEndpointRef],
    conf: SparkConf)
  extends ThreadSafeRpcEndpoint with Logging {
//onStart runs once, when the endpoint is initialized
  override def onStart(): Unit = {
 //build a DriverDescription holding the driver's details: the jar URL, required CPU cores, memory, etc.
     val driverDescription = new DriverDescription(
          driverArgs.jarUrl,
          driverArgs.memory,
          driverArgs.cores,
          driverArgs.supervise,
          command)
   //send RequestSubmitDriver (wrapping the DriverDescription) to the master and wait for a SubmitDriverResponse
        asyncSendToMasterAndForwardReply[SubmitDriverResponse](
          RequestSubmitDriver(driverDescription))
}
}

4.2 The Master receives RequestSubmitDriver and sends LaunchDriver to a worker

  • Go to the receiveAndReply method of the Master class and find the handling of RequestSubmitDriver
  • The master takes the driver information carried by the RequestSubmitDriver message and wraps it into a DriverInfo
  • The DriverInfo is added to the waitingDrivers array, which holds all drivers waiting to be started
  • The schedule() method called from receiveAndReply does two things: 1. find a suitable worker to start each driver in waitingDrivers; 2. find suitable executors to start for each app in waitingApps (explained later)
  • To find a worker for a waiting driver, the master randomly shuffles the alive workers and then walks through them, comparing each worker's free CPU and memory against what the driver requires; if a worker fits, the driver is launched there by sending a LaunchDriver message to that worker, carrying both the worker and the driver information.
private[deploy] class Master(
    override val rpcEnv: RpcEnv,
    address: RpcAddress,
    webUiPort: Int,
    val securityMgr: SecurityManager,
    val conf: SparkConf)
  extends ThreadSafeRpcEndpoint with Logging with LeaderElectable {

//focus on the RequestSubmitDriver handling inside receiveAndReply

 override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {

 case RequestSubmitDriver(description) =>
    
      //create a DriverInfo object
        val driver: DriverInfo = createDriver(description)
     //add the DriverInfo to the persistence engine
        persistenceEngine.addDriver(driver)
//add the DriverInfo to waitingDrivers, the array of drivers waiting to be started
        waitingDrivers += driver
        drivers.add(driver)
   // schedule() dispatches drivers and executors
        schedule()
    //reply to RequestSubmitDriver with a SubmitDriverResponse (normally the last step of the exchange)
        context.reply(SubmitDriverResponse(self, true, Some(driver.id),
          s"Driver successfully submitted as ${driver.id}"))
      }


}

// step into schedule()
  private def schedule(): Unit = {
    //shuffledAliveWorkers: randomly shuffle all alive workers
    val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  //count the usable workers
    val numWorkersAlive = shuffledAliveWorkers.size
    var curPos = 0
//iterate over the drivers waiting in waitingDrivers
    for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
 
      var launched = false
      var numWorkersVisited = 0
   //loop while not every alive worker has been visited and the driver has not been launched yet
      while (numWorkersVisited < numWorkersAlive && !launched) {
//pick the next worker
        val worker = shuffledAliveWorkers(curPos)
        numWorkersVisited += 1
 //if the worker has enough free memory and cores to start the driver
        if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
       //send LaunchDriver to this worker to start the driver
          launchDriver(worker, driver)
     //remove the driver from the waiting list
          waitingDrivers -= driver
          launched = true
        }
        curPos = (curPos + 1) % numWorkersAlive
      }
    }
  //start executors on workers, explained in detail below
    startExecutorsOnWorkers()
  }

}


 //send LaunchDriver to the chosen worker
  private def launchDriver(worker: WorkerInfo, driver: DriverInfo) {
    logInfo("Launching driver " + driver.id + " on worker " + worker.id)
    worker.addDriver(driver)
    driver.worker = Some(worker)
//send the LaunchDriver message to the worker's endpoint
    worker.endpoint.send(LaunchDriver(driver.id, driver.desc))
    driver.state = DriverState.RUNNING
  }

4.3 The Worker receives LaunchDriver and starts the Driver

  • When a worker receives a LaunchDriver message, it has to start a driver JVM process; the core entry point is the LaunchDriver handler
  • After receiving LaunchDriver, the worker wraps the driver information into a DriverRunner and calls its start method to start the driver
  • In DriverRunner's start method, which starts the driver, two calls matter:
    • prepareAndRunDriver() launches the driver process
    • worker.send(DriverStateChanged(...)) reports the driver's state back to the worker
  • Then look at how the DriverStateChanged message is handled in the worker: the worker forwards it to the master via sendToMaster(driverStateChanged)
private[deploy] class Worker() extends ThreadSafeRpcEndpoint with Logging {
  //cores already in use
  var coresUsed = 0
  //memory already in use
  var memoryUsed = 0
   
   case LaunchDriver(driverId, driverDesc) =>
      logInfo(s"Asked to launch driver $driverId")
    //wrap the driver information into a DriverRunner
      val driver = new DriverRunner(
        conf,
        driverId,
        workDir,
        sparkHome,
        driverDesc.copy(command = Worker.maybeUpdateSSLSettings(driverDesc.command, conf)),
        self,
        workerUri,
        securityMgr)
      drivers(driverId) = driver
  
     //start the driver
      driver.start()
     //record the memory and cores now in use
      coresUsed += driverDesc.cores
      memoryUsed += driverDesc.mem


}

//step into DriverRunner's start method
private[deploy] class DriverRunner( conf: SparkConf,
    val driverId: String,
    val workDir: File,
    val sparkHome: File,
    val driverDesc: DriverDescription,
    val worker: RpcEndpointRef,
    val workerUrl: String,
    val securityManager: SecurityManager)
  extends Logging {


private[worker] def start() = {
    new Thread("DriverRunner for " + driverId) {
      override def run() {
        var shutdownHook: AnyRef = null
        try {
            //add a shutdown hook that kills the driver process
          shutdownHook = ShutdownHookManager.addShutdownHook { () =>
            logInfo(s"Worker shutting down, killing driver $driverId")
            kill()
          }

          // prepare everything needed, then launch the driver
          val exitCode = prepareAndRunDriver()

        // send DriverStateChanged to the worker to report the driver's final state
        worker.send(DriverStateChanged(driverId, finalState.get, finalException))
      }
    }.start()
  }

 //step into prepareAndRunDriver
  private[worker] def prepareAndRunDriver(): Int = {
   
    // TODO: If we add ability to submit multiple jars they should also be added here
    val builder = CommandUtils.buildProcessBuilder(driverDesc.command, securityManager,
      driverDesc.mem, sparkHome.getAbsolutePath, substituteVariables)
    //the method that actually runs the driver
    runDriver(builder, driverDir, driverDesc.supervise)
  }

}


// step into runDriver
 private def runDriver(builder: ProcessBuilder, baseDir: File, supervise: Boolean): Int = {
   // run the driver command (with retry)
    runCommandWithRetry(ProcessBuilderLike(builder), initialize, supervise)
  }
// step into runCommandWithRetry
  private[worker] def runCommandWithRetry(
      command: ProcessBuilderLike, initialize: Process => Unit, supervise: Boolean): Int = {
    var exitCode = -1
    // Time to wait between submission retries.
    var waitSeconds = 1
    // A run of this many seconds resets the exponential back-off.
    val successfulRunDuration = 5
    var keepTrying = !killed

    while (keepTrying) {
      logInfo("Launch Command: " + command.command.mkString("\"", "\" \"", "\""))

      synchronized {
        if (killed) { return exitCode }
     //start the driver process
        process = Some(command.start())
        initialize(process.get)
      }

      val processStart = clock.getTimeMillis()
      exitCode = process.get.waitFor()

   
        // ... (supervise / retry back-off handling omitted) ...
      }
    exitCode
  }
}

//how the worker handles the DriverStateChanged message
    case driverStateChanged @ DriverStateChanged(driverId, state, exception) =>
      handleDriverStateChanged(driverStateChanged)


//step into handleDriverStateChanged
private[worker] def handleDriverStateChanged(driverStateChanged: DriverStateChanged): Unit = {
    val driverId = driverStateChanged.driverId
    val exception = driverStateChanged.exception
    val state = driverStateChanged.state
    //forward the DriverStateChanged message to the master
    sendToMaster(driverStateChanged)
    val driver = drivers.remove(driverId).get
    finishedDrivers(driverId) = driver
    trimFinishedDriversIfNecessary()
    memoryUsed -= driver.driverDesc.mem
    coresUsed -= driver.driverDesc.cores
  }

4.4 Starting the Driver: executing DriverWrapper's main method

  • Start the driver and bring up its RPC endpoint
  • Obtain the main class of the user application via reflection, then run the user's own Spark code
//continuing from the driver launch command above: process = Some(command.start())
//the class actually launched is DriverWrapper
object DriverWrapper extends Logging {
  def main(args: Array[String]) {
    args.toList match {
     //assign a few local variables
      case workerUrl :: userJar :: mainClass :: extraArgs =>
        val conf = new SparkConf()
        val host: String = Utils.localHostName()
        val port: Int = sys.props.getOrElse("spark.driver.port", "0").toInt



        //create the RpcEnv
        val rpcEnv = RpcEnv.create("Driver", host, port, conf, new SecurityManager(conf))
        logInfo(s"Driver address: ${rpcEnv.address}")
        //create an RpcEndpoint (the WorkerWatcher)
        rpcEnv.setupEndpoint("workerWatcher", new WorkerWatcher(rpcEnv, workerUrl))


        //launch the user application's main class via reflection
        val clazz = Utils.classForName(mainClass)
        val mainMethod = clazz.getMethod("main", classOf[Array[String]])
        mainMethod.invoke(null, extraArgs.toArray[String])

        rpcEnv.shutdown()
    }
  }

4.5 Initializing SparkContext

  • The first thing the user code does is initialize a SparkContext
  • Of the SparkContext initialization, only the DAGScheduler, TaskScheduler, and SchedulerBackend are covered here
  • First the TaskScheduler is created; the entry point is SparkContext.createTaskScheduler, which mainly creates the TaskSchedulerImpl and the StandaloneSchedulerBackend
  • Next the DAGScheduler is initialized; the key part is the creation of the DAGSchedulerEventProcessLoop event processor
  • _taskScheduler.start()
    • backend.start() mainly starts the DriverEndpoint and the ClientEndpoint
      • the DriverEndpoint communicates with the executors running on the workers; in its onStart method it starts a timer task that ultimately calls launchTasks to send tasks to suitable executors
      • the ClientEndpoint communicates with the master; in its onStart method it registers with the master by sending a RegisterApplication message, which leads to executors being started
//the core entry point
new SparkContext(sparkconf)


class SparkContext(config: SparkConf) extends Logging {
//the SparkConf field
private var _conf: SparkConf = _
private var _schedulerBackend: SchedulerBackend = _
private var _taskScheduler: TaskScheduler = _
private var _dagScheduler: DAGScheduler = _

val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
    _schedulerBackend = sched
    _taskScheduler = ts
   //create the DAGScheduler
    _dagScheduler = new DAGScheduler(this)


 _taskScheduler.start()


}

//step into createTaskScheduler to see how the TaskScheduler and SchedulerBackend objects are created
private def createTaskScheduler(
      sc: SparkContext,
      master: String,
      deployMode: String): (SchedulerBackend, TaskScheduler) = {
    import SparkMasterRegex._

// match on the master URL; here we take the standalone Spark cluster case
    master match {

      case SPARK_REGEX(sparkUrl) =>
//create the TaskScheduler
        val scheduler = new TaskSchedulerImpl(sc)
        val masterUrls = sparkUrl.split(",").map("spark://" + _)
//create the SchedulerBackend
        val backend = new StandaloneSchedulerBackend(scheduler, sc, masterUrls)
        scheduler.initialize(backend)
        (backend, scheduler)
  }
  }


//step into _dagScheduler = new DAGScheduler(this); focus on DAGSchedulerEventProcessLoop
private[spark] class DAGScheduler(
private[scheduler] val sc: SparkContext,
    private[scheduler] val taskScheduler: TaskScheduler,
    listenerBus: LiveListenerBus,
    mapOutputTracker: MapOutputTrackerMaster,
    blockManagerMaster: BlockManagerMaster,
    env: SparkEnv,
    clock: Clock = new SystemClock())
  extends Logging {

  // create the DAGSchedulerEventProcessLoop event processor, an asynchronous event-driven loop
  private[spark] val eventProcessLoop = new DAGSchedulerEventProcessLoop(this)
  taskScheduler.setDAGScheduler(this)

  //start the event loop
  eventProcessLoop.start()

}


//analyze _taskScheduler.start()
 override def start() {
   //start the SchedulerBackend
    backend.start()

    if (!isLocal && conf.getBoolean("spark.speculation", false)) {
      logInfo("Starting speculative execution thread")
      speculationScheduler.scheduleWithFixedDelay(new Runnable {
        override def run(): Unit = Utils.tryOrStopSparkContext(sc) {
          checkSpeculatableTasks()
        }
      }, SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
    }
  }


//analyze backend.start(): this is StandaloneSchedulerBackend's start method
override def start() {
    super.start()

    val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
    val javaOpts = sparkJavaOpts ++ extraJavaOpts
    val command = Command("org.apache.spark.executor.CoarseGrainedExecutorBackend",
      args, sc.executorEnvs, classPathEntries ++ testingClassPath, libraryPathEntries, javaOpts)


    val appDesc = ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
      webUrl, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
//create the StandaloneAppClient
    client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
    client.start()
    launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
    waitForRegistration()
    launcherBackend.setState(SparkAppHandle.State.RUNNING)
  }

// the start method of StandaloneSchedulerBackend's parent class, CoarseGrainedSchedulerBackend, called via super.start()
 override def start() {
 //copy the spark.* configuration entries into properties
    val properties = new ArrayBuffer[(String, String)]
    for ((key, value) <- scheduler.sc.conf.getAll) {
      if (key.startsWith("spark.")) {
        properties += ((key, value))
      }
    }

    // create the DriverEndpoint, which communicates with the executors on the workers
    driverEndpoint = createDriverEndpointRef(properties)
  }


//step into client.start()
  def start() {
    // create the ClientEndpoint, which talks to the master
    endpoint.set(rpcEnv.setupEndpoint("AppClient", new ClientEndpoint(rpcEnv)))
  }


//step into the ClientEndpoint's onStart method to see the initialization
private[spark] class StandaloneAppClient(
    rpcEnv: RpcEnv,
    masterUrls: Array[String],
    appDescription: ApplicationDescription,
    listener: StandaloneAppClientListener,
    conf: SparkConf)
  extends Logging {

   override def onStart(): Unit = {
      try {
         //register with the master
        registerWithMaster(1)
      } catch {
        case e: Exception =>
          logWarning("Failed to connect to master", e)
          markDisconnected()
          stop()
      }
    }

// step into registerWithMaster(1)
private def registerWithMaster(nthRetry: Int) {
 //try to register with every master
      registerMasterFutures.set(tryRegisterAllMasters())
     
    }
// step into tryRegisterAllMasters
private def tryRegisterAllMasters(): Array[JFuture[_]] = {
      for (masterAddress <- masterRpcAddresses) yield {
        registerMasterThreadPool.submit(new Runnable {
          override def run(): Unit = try {
            if (registered.get) {
              return
            }
           //obtain an RpcEndpointRef to the master
            val masterRef = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME)
           //send RegisterApplication to the master, carrying the application description
            masterRef.send(RegisterApplication(appDescription, self))
          } catch {
            case ie: InterruptedException => // Cancelled
            case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
          }
        })
      }
    }

}

// the Master's handling of the RegisterApplication message
 case RegisterApplication(description, driver) =>
      if (state == RecoveryState.STANDBY) {
        // ignore, don't send response
      } else {
        logInfo("Registering app " + description.name)
        //wrap the application information into an ApplicationInfo
        val app: ApplicationInfo = createApplication(description, driver)
       //add the pending application to the waitingApps array
        registerApplication(app)
      
        persistenceEngine.addApplication(app)
        // send RegisteredApplication back to the driver
        driver.send(RegisteredApplication(app.id, self))
        schedule()
      }

// how StandaloneAppClient (the driver side talking to the master) handles RegisteredApplication
 override def receive: PartialFunction[Any, Unit] = {
      case RegisteredApplication(appId_, masterRef) =>
        //record the connection to the master
        appId.set(appId_)
        registered.set(true)
        master = Some(masterRef)
        listener.connected(appId.get)
}

// back in the master's RegisterApplication handling, schedule() is called; focus on startExecutorsOnWorkers()
// this method finds suitable workers for the waiting apps and launches executors on them
 private def startExecutorsOnWorkers(): Unit = {
 //iterate over the pending applications in waitingApps
    for (app <- waitingApps) {
      val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
      // when the app still needs at least one executor's worth of cores
      if (app.coresLeft >= coresPerExecutor) {

       //find usable workers and sort them by free cores, in descending order
        val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
          .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
            worker.coresFree >= coresPerExecutor)
          .sortBy(_.coresFree).reverse

      //decide how many cores to allocate on each worker; see scheduleExecutorsOnWorkers
        val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

        // Now that we've decided how many cores to allocate on each worker, let's allocate them
        for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
          allocateWorkerResourceToExecutors(
            app, assignedCores(pos), app.desc.coresPerExecutor, usableWorkers(pos))
        }
      }
    }
  }

// step into scheduleExecutorsOnWorkers
// the return value is the number of cores assigned on each worker
private def scheduleExecutorsOnWorkers(
      app: ApplicationInfo,
      usableWorkers: Array[WorkerInfo],
      spreadOutApps: Boolean): Array[Int] = {
    val coresPerExecutor = app.desc.coresPerExecutor
    val minCoresPerExecutor = coresPerExecutor.getOrElse(1)
    val oneExecutorPerWorker = coresPerExecutor.isEmpty
    val memoryPerExecutor = app.desc.memoryPerExecutorMB
    val numUsable = usableWorkers.length
    val assignedCores = new Array[Int](numUsable) // Number of cores to give to each worker
    val assignedExecutors = new Array[Int](numUsable) // Number of new executors on each worker
    var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

    /** Return whether the specified worker can launch an executor for this app. */
    def canLaunchExecutor(pos: Int): Boolean = {
      val keepScheduling = coresToAssign >= minCoresPerExecutor
      val enoughCores = usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor

      val launchingNewExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0
      if (launchingNewExecutor) {
        val assignedMemory = assignedExecutors(pos) * memoryPerExecutor
        val enoughMemory = usableWorkers(pos).memoryFree - assignedMemory >= memoryPerExecutor
        val underLimit = assignedExecutors.sum + app.executors.size < app.executorLimit
        keepScheduling && enoughCores && enoughMemory && underLimit
      } else {
        keepScheduling && enoughCores
      }
    }
    var freeWorkers = (0 until numUsable).filter(canLaunchExecutor)
    while (freeWorkers.nonEmpty) {
      freeWorkers.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunchExecutor(pos)) {
          coresToAssign -= minCoresPerExecutor
          assignedCores(pos) += minCoresPerExecutor

          // If we are launching one executor per worker, then every iteration assigns 1 core
          // to the executor. Otherwise, every iteration assigns cores to a new executor.
          if (oneExecutorPerWorker) {
            assignedExecutors(pos) = 1
          } else {
            assignedExecutors(pos) += 1
          }

          if (spreadOutApps) {
            keepScheduling = false
          }
        }
      }
      freeWorkers = freeWorkers.filter(canLaunchExecutor)
    }
    assignedCores
  }

// step into allocateWorkerResourceToExecutors
// essentially: send LaunchExecutor messages to the chosen worker and mark the app as RUNNING
  private def allocateWorkerResourceToExecutors(
      app: ApplicationInfo,
      assignedCores: Int,
      coresPerExecutor: Option[Int],
      worker: WorkerInfo): Unit = {
    val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
    val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
    for (i <- 1 to numExecutors) {
      val exec = app.addExecutor(worker, coresToAssign)
// send a LaunchExecutor message to the worker
      launchExecutor(worker, exec)
      app.state = ApplicationState.RUNNING
    }
  }


// step into launchExecutor
 private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
    logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
    worker.addExecutor(exec)
   //send LaunchExecutor to the worker to start the executor
    worker.endpoint.send(LaunchExecutor(masterUrl,
      exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
//send an ExecutorAdded message to the driver
    exec.application.driver.send(
      ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
  }

// the Worker's handling of the LaunchExecutor message
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
... 
//build an ExecutorRunner holding the app's information and configuration
 val manager = new ExecutorRunner(
            appId,
            execId,
            appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
            cores_,
            memory_,
            self,
            workerId,
            host,
            webUi.boundPort,
            publicAddress,
            sparkHome,
            executorDir,
            workerUri,
            conf,
            appLocalDirs, ExecutorState.RUNNING)
          executors(appId + "/" + execId) = manager

       // start the executor
          manager.start()
          coresUsed += cores_
          memoryUsed += memory_
   //send an ExecutorStateChanged message to the master
          sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
        } catch {
          case e: Exception =>
            logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
            if (executors.contains(appId + "/" + execId)) {
              executors(appId + "/" + execId).kill()
              executors -= appId + "/" + execId
            }
            sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
              Some(e.toString), None))
        }
      }

//step into manager.start()
  private[worker] def start() {
 // create a worker thread that launches the executor
    workerThread = new Thread("ExecutorRunner for " + fullId) {
      override def run() {
        // the method that actually launches the executor
        fetchAndRunExecutor()
      }
    }
    workerThread.start()
    // Shutdown hook that kills actors on shutdown.
    shutdownHook = ShutdownHookManager.addShutdownHook { () =>
      // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
      // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
      if (state == ExecutorState.RUNNING) {
        state = ExecutorState.FAILED
      }
      killProcess(Some("Worker shutting down")) }
  }

// step into fetchAndRunExecutor
private def fetchAndRunExecutor() {
    val builder = CommandUtils.buildProcessBuilder(subsCommand, new SecurityManager(conf),
        memory, sparkHome.getAbsolutePath, substituteVariables)
//start the executor process
process = builder.start()
//wait for the process to exit
 val exitCode = process.waitFor()
// afterwards, send an ExecutorStateChanged message to the worker
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))                

}


//the Worker's handling of the ExecutorStateChanged message
   case executorStateChanged @ ExecutorStateChanged(appId, execId, state, message, exitStatus) =>
      handleExecutorStateChanged(executorStateChanged)

  private[worker] def handleExecutorStateChanged(executorStateChanged: ExecutorStateChanged):
    Unit = {
// forward the ExecutorStateChanged message to the master
    sendToMaster(executorStateChanged)
    val state = executorStateChanged.state
}

//the Master's handling of ExecutorStateChanged
case ExecutorStateChanged(appId, execId, state, message, exitStatus) =>
...
//send an ExecutorUpdated message to the driver
 exec.application.driver.send(ExecutorUpdated(execId, state, message, exitStatus, false))
}


//StandaloneAppClient's handling of the ExecutorUpdated message

      case ExecutorUpdated(id, state, message, exitStatus, workerLost) =>
        val fullId = appId + "/" + id
        val messageText = message.map(s => " (" + s + ")").getOrElse("")
        logInfo("Executor updated: %s is now %s%s".format(fullId, state, messageText))
        if (ExecutorState.isFinished(state)) {
          listener.executorRemoved(fullId, message.getOrElse(""), exitStatus, workerLost)
        }



//next, the initialization of the DriverEndpoint
// step into the onStart method of the DriverEndpoint inside CoarseGrainedSchedulerBackend
   override def onStart() {
      // Periodically revive offers to allow delay scheduling to work
      val reviveIntervalMs = conf.getTimeAsMs("spark.scheduler.revive.interval", "1s")
      //schedule a periodic task; note the send call whose argument is the ReviveOffers message
      reviveThread.scheduleAtFixedRate(new Runnable {
        override def run(): Unit = Utils.tryLogNonFatalError {
       //the endpoint sends ReviveOffers to itself
          Option(self).foreach(_.send(ReviveOffers))
        }
      }, 0, reviveIntervalMs, TimeUnit.MILLISECONDS)
    }


    case ReviveOffers =>
        makeOffers()  
// step into makeOffers, whose main job is to dispatch tasks to the chosen executors
    private def makeOffers() {
      // Make sure no executor is killed while some task is launching on it
      val taskDescs = withLock {
        // Filter out executors under killing
        val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
        val workOffers = activeExecutors.map {
          case (id, executorData) =>
            new WorkerOffer(id, executorData.executorHost, executorData.freeCores,
              Some(executorData.executorAddress.hostPort))
        }.toIndexedSeq
        scheduler.resourceOffers(workOffers)
      }
      if (!taskDescs.isEmpty) {
      //send the tasks (as task descriptions) to the executors on the workers
        launchTasks(taskDescs)
      }
    }

// step into launchTasks
    private def launchTasks(tasks: Seq[Seq[TaskDescription]]) {
...
//the core line: send a LaunchTask message to the executor side
   executorData.executorEndpoint.send(LaunchTask(new SerializableBuffer(serializedTask)))
}


// CoarseGrainedExecutorBackend's handling of the LaunchTask message
    case LaunchTask(data) =>
      if (executor == null) {
        exitExecutor(1, "Received LaunchTask command but executor was null")
      } else {
//decode the task description
        val taskDesc = TaskDescription.decode(data.value)
        logInfo("Got assigned task " + taskDesc.taskId)
 //run the task
        executor.launchTask(this, taskDesc)
      }


//step into launchTask: tasks are executed on a thread pool (a newCachedThreadPool)
  def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
    val tr = new TaskRunner(context, taskDescription)
    runningTasks.put(taskDescription.taskId, tr)
    threadPool.execute(tr)
  }
