Spark: Launching Executors

pre_tender

Category: Spark

Article link: https://blog.csdn.net/pre_tender/article/details/100739621

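Overview

In the previous articles we covered the Master's registration mechanism, saw how Applications, Drivers, and Workers register with the Master, and walked through the Driver launch process in the source code. There we learned that schedule() calls launchDriver() to start the Driver on a chosen Worker. A Worker, however, plays a role similar to a project manager: it does not execute tasks itself but hands them to the Executors running on it. That is why schedule() ends with a call to startExecutorsOnWorkers(), which plans and launches the Executors on the Workers.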

This article walks through the Executor launch process.

1. Launch flow: the Master side

1.1 The Master runs startExecutorsOnWorkers()

(The source is listed in the appendix under Master.scala : startExecutorsOnWorkers().)
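The first thing startExecutorsOnWorkers() does for each waiting application is pick the Workers that can host at least one Executor. The following is a minimal sketch of that filtering step, not Spark's own code: Worker is a hypothetical, simplified stand-in for WorkerInfo, and usableWorkers is an illustrative helper name.

```scala
object UsableWorkersDemo {
  // Hypothetical, simplified stand-in for Spark's WorkerInfo.
  final case class Worker(id: String, alive: Boolean, coresFree: Int, memoryFreeMB: Int)

  /** Keeps the alive Workers that can fit one Executor, most free cores first. */
  def usableWorkers(
      workers: Seq[Worker],
      coresPerExecutor: Int,
      memoryPerExecutorMB: Int): Seq[Worker] =
    workers
      .filter(_.alive)
      .filter(w => w.memoryFreeMB >= memoryPerExecutorMB && w.coresFree >= coresPerExecutor)
      .sortBy(w => -w.coresFree) // descending by free cores
}
```

Sorting by free cores in descending order means the least loaded Workers are considered first when cores are handed out.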

1.2 The Master decides how many cores each Worker gets

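The key function here is scheduleExecutorsOnWorkers(), which is called from startExecutorsOnWorkers() and decides how many cores to assign on each Worker.
The official description in the source reads as follows: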

Schedule executors to be launched on the workers.
Returns an array containing the number of cores assigned to each Worker.

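There are two modes of launching executors.

The first tries to spread an application's executors across as many Workers as possible.
The second does the opposite (it launches them on as few Workers as possible).

The former is usually better for data locality and is the default.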

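The number of cores assigned to each Executor is configurable:

If it is set explicitly, multiple Executors from the same Application may be launched on the same Worker, provided the Worker has enough cores and memory.
Otherwise, each Executor grabs all the cores available on the Worker by default, in which case only one Executor per Application can be launched on each Worker during a single schedule iteration.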

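Note that when "spark.executor.cores" is not set, we may still end up with multiple Executors from the same Application on the same Worker.
Suppose appA and appB each have one Executor running on worker1 and appA.coresLeft > 0. appB then finishes and releases all its cores on worker1. In the next schedule iteration, appA launches a new Executor that grabs all the free cores on worker1, so appA now has multiple Executors running on worker1.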

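It is important to allocate coresPerExecutor cores on a Worker at a time (rather than one core at a time).
Consider the following example: the cluster has 4 Workers with 16 cores each.
* The user requests 3 Executors (spark.cores.max=48, spark.executor.cores=16). If cores were allocated one at a time, the 48 cores would be spread evenly over the 4 Workers, 12 cores (48/4) on each.
* Since 12 < 16, no Worker would have been assigned enough cores for even one 16-core Executor, so no Executor would launch. Allocating 16 cores at a time instead places one full Executor on each of 3 Workers.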
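The arithmetic of this example can be checked with a small simulation. This is an illustrative sketch only: spread and executors are hypothetical helper names, and the round-robin loop is a simplification of the real scheduler that ignores memory and executor limits.

```scala
object CoreAllocationDemo {
  /**
   * Hands out `total` cores round-robin over the Workers, `step` cores at a
   * time, never exceeding a Worker's free cores.
   * Returns the number of cores assigned on each Worker.
   */
  def spread(total: Int, freeCores: Array[Int], step: Int): Array[Int] = {
    val assigned = new Array[Int](freeCores.length)
    var remaining = total
    var progressed = true
    while (remaining >= step && progressed) {
      progressed = false
      for (i <- freeCores.indices) {
        if (remaining >= step && freeCores(i) - assigned(i) >= step) {
          assigned(i) += step
          remaining -= step
          progressed = true
        }
      }
    }
    assigned
  }

  /** Number of whole Executors that fit into the cores assigned on each Worker. */
  def executors(assigned: Array[Int], coresPerExecutor: Int): Int =
    assigned.map(_ / coresPerExecutor).sum
}
```

With 4 Workers of 16 cores each and 48 cores requested, a step of 1 yields 12 cores on every Worker and therefore zero 16-core Executors, while a step of 16 yields 16, 16, 16, 0 and therefore three Executors.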

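Definitions and variables:
(see the appendix, Master.scala : scheduleExecutorsOnWorkers())
An inner method, canLaunchExecutor(), decides whether an Executor can be launched on a given Worker.

The scheduling logic itself follows in the same listing.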

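As the listing shows, the function finally returns the computed array assignedCores, which holds the number of cores assigned on each Worker.
Next, resources are allocated on each Worker according to these numbers, and then the Executors are launched.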
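To see how the two modes differ, the scheduling loop can be reduced to its core. The sketch below is a simplification under stated assumptions: it always uses an explicit coresPerExecutor, and it ignores memory and the executor limit that the real canLaunchExecutor() also checks. The name assignCores is illustrative.

```scala
object ScheduleSketch {
  /** Returns the number of cores assigned on each Worker. */
  def assignCores(
      coresLeft: Int,
      coresFree: Array[Int],
      coresPerExecutor: Int,
      spreadOut: Boolean): Array[Int] = {
    val assigned = new Array[Int](coresFree.length)
    var coresToAssign = math.min(coresLeft, coresFree.sum)

    // Simplified check: only cores are considered here.
    def canLaunch(pos: Int): Boolean =
      coresToAssign >= coresPerExecutor &&
        coresFree(pos) - assigned(pos) >= coresPerExecutor

    var free = coresFree.indices.filter(canLaunch)
    while (free.nonEmpty) {
      free.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunch(pos)) {
          coresToAssign -= coresPerExecutor
          assigned(pos) += coresPerExecutor
          // When spreading out, move on to the next Worker after one Executor.
          if (spreadOut) keepScheduling = false
        }
      }
      free = free.filter(canLaunch)
    }
    assigned
  }
}
```

For an application that still needs 8 cores, with 2 cores per Executor and three Workers of 8 free cores each, spreading out gives 4, 2, 2 while the consolidating mode gives 8, 0, 0.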

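1.3 The Master allocates Worker resources to Executors

(The source is listed in the appendix under Master.scala : allocateWorkerResourceToExecutors().)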
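The split performed by allocateWorkerResourceToExecutors() comes down to two expressions on an Option. The following sketch isolates them; split is a hypothetical helper name, and the real method goes on to create and launch each Executor.

```scala
object ExecutorSplitDemo {
  /** Returns (number of Executors, cores per Executor) for one Worker. */
  def split(assignedCores: Int, coresPerExecutor: Option[Int]): (Int, Int) = {
    // With an explicit setting, divide the cores evenly and drop any remainder.
    val numExecutors = coresPerExecutor.map(assignedCores / _).getOrElse(1)
    // Without one, a single Executor takes every assigned core.
    val coresEach = coresPerExecutor.getOrElse(assignedCores)
    (numExecutors, coresEach)
  }
}
```

For example, 12 assigned cores with 4 cores per Executor give three Executors of 4 cores, whereas 12 assigned cores with no setting give one Executor of 12 cores.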

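1.4 The Master sends the LaunchExecutor message

(The source is listed in the appendix under Master.scala : launchExecutor().)

Launch on the Worker: as with launchDriver, the Executor is started on a Worker, so the Master first sends a LaunchExecutor message to the Worker. (Arguments: masterUrl, ExecutorDesc.application.id, ExecutorDesc.id, ExecutorDesc.application.desc, ExecutorDesc.cores, ExecutorDesc.memory)
Notify the Driver to add the Executor: the Driver needs to know which Executors are running its Application, so the Master also sends it an ExecutorAdded message.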

2. Launch flow: the Worker side

2.1 The Worker receives the message

Main logic of receive:

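Create the Executor's working directory, executorDir, under workDir.
Create the set of local directories for the Executor, appLocalDirs. Their paths are passed to the Executor through the SPARK_EXECUTOR_DIRS environment variable, and the Worker deletes them when the Application finishes. Judging from the code, the two kinds of directory differ as follows: executorDir (workDir/appId/execId) is the working directory of a single Executor process and receives its stdout and stderr files, whereas appLocalDirs are created under the local root directories, serve as local scratch space, and are shared by all Executors of the same Application on this Worker.
Add appLocalDirs to the appDirectories HashMap, keyed as <appId, appLocalDirs>.
Create an ExecutorRunner, passing in the task-related and resource-related information.
Call ExecutorRunner.start():
- Create a Thread whose run() calls fetchAndRunExecutor()
- Call Thread.start() to start the thread
- Register a shutdown hook through ShutdownHookManager to handle termination
Notify the Master with ExecutorStateChanged.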
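The per-application bookkeeping of local directories in the steps above can be sketched in isolation. This is an assumption-laden illustration, not the Worker's code: the Worker class here is hypothetical, createDirs stands in for the directory creation done through Utils, and getOrElseUpdate replaces the getOrElse plus assignment used in the original.

```scala
import scala.collection.mutable

object AppDirsSketch {
  /** Hypothetical, stripped-down Worker that only tracks local directories. */
  final class Worker(createDirs: String => Seq[String]) {
    // appId -> local directories shared by all Executors of that Application
    private val appDirectories = mutable.HashMap.empty[String, Seq[String]]

    /** Creates the directories on first use and reuses them afterwards. */
    def localDirsFor(appId: String): Seq[String] =
      appDirectories.getOrElseUpdate(appId, createDirs(appId))

    /** Forgets the directories of a finished Application and returns them for cleanup. */
    def applicationFinished(appId: String): Option[Seq[String]] =
      appDirectories.remove(appId)
  }
}
```

A second Executor of the same Application on this Worker receives the directories created for the first one, which is why the map is keyed by appId rather than by Executor.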

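The code, with non-essential parts simplified, is listed in the appendix under Worker.scala:receive().

This handler mostly does preparation work. The most important part is creating the ExecutorRunner and calling its start() method, which we look at next.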

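2.2 The Worker launches the Executor

As described above, once the launch information has been wrapped in an ExecutorRunner, the Worker starts it.

1. Call ExecutorRunner.start() to launch the Executor:

(Much of the code is omitted here; see the appendix for the full version.)

1.1 Start a Thread
The Thread's run() calls fetchAndRunExecutor(), which downloads and runs the Executor described in the AppDesc. A shutdown hook is also created to handle termination.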

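2. Run fetchAndRunExecutor()

2.1 Read the Executor's launch parameters from appDesc and build a ProcessBuilder from them

2.2 Set the ProcessBuilder's working directory

2.3 Add the web UI information and set the log URLs

2.4 Call ProcessBuilder.start() to start the process

2.5 & 2.6 Redirect the process's output and handle its exit.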
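Steps 2.2 to 2.5 map onto the standard java.lang.ProcessBuilder API. The sketch below shows only the configuration part and does not start a process. It is an illustration under assumptions: configure is a hypothetical helper, the command is supplied by the caller, and redirectOutput/redirectError are used in place of the FileAppender that Spark attaches to the process streams.

```scala
import java.io.File

object ProcessBuilderSketch {
  /** Builds a ProcessBuilder for an Executor-like child process without starting it. */
  def configure(
      command: Seq[String],
      executorDir: File,
      appLocalDirs: Seq[String]): ProcessBuilder = {
    val args = new java.util.ArrayList[String]()
    command.foreach(c => args.add(c))
    val builder = new ProcessBuilder(args)

    // Working directory of the child process.
    builder.directory(executorDir)
    // Local directories are handed over through an environment variable.
    builder.environment().put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
    // Assumption: plain file redirection instead of Spark's FileAppender.
    builder.redirectOutput(new File(executorDir, "stdout"))
    builder.redirectError(new File(executorDir, "stderr"))
    builder
  }
}
```

A caller would then invoke start() on the returned builder and waitFor() on the resulting Process to obtain the exit code, as fetchAndRunExecutor() does.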

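3. The ProcessBuilder runs a Linux command that starts CoarseGrainedExecutorBackend

The command is a java invocation (the command line itself is not reproduced here) that calls the main method of CoarseGrainedExecutorBackend. The main method processes the command-line arguments, creates an RpcEnv, and registers an ExecutorRpcEndpoint for CoarseGrainedExecutorBackend.
In addition, the main method will: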

Create a CoarseGrainedExecutorBackend (ExecutorBackend) object.
Register this Executor with CoarseGrainedSchedulerBackend (DriverBackend).
Create and start the Executor after receiving the registration reply.

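For reasons of length, CoarseGrainedExecutorBackend is covered in detail in a separate article; see Spark Task Scheduling: Starting CoarseGrainedExecutorBackend.

Since CoarseGrainedExecutorBackend registers the Executor with the Driver, let us look next at how the Driver receives the registration message and performs the registration.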

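3. Launch flow: the Driver side

In section 2.3, "new CoarseGrainedExecutorBackend: onStart()", of Spark Task Scheduling: Starting CoarseGrainedExecutorBackend, we noted that
onStart() registers with the Driver by sending it ask(RegisterExecutor).
Let us now see how the Driver receives and handles the RegisterExecutor message. (The underlying communication is RPC-based; if you are unfamiliar with RPC, see the RPC Overview article.)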

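3.1 Receive the message and check whether the Executor is usable

This step decides whether the Executor may be registered with the Driver, that is, whether it can be used to run Tasks.
The Driver rejects the registration if the executor ID is already present in executorDataMap or if the host is on the scheduler's node blacklist (see the appendix, CoarseGrainedSchedulerBackend.scala).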

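3.2 Perform the registration and record the Executor's information

Continuing in the else branch of the check above, registration takes two steps:

Add the Executor's information to the corresponding collections and update the counters accordingly.
Create an ExecutorData and put it into executorDataMap as <executorId, ExecutorData>.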
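The check-then-register logic of sections 3.1 and 3.2 can be condensed into a few lines. This is a minimal sketch, not the CoarseGrainedSchedulerBackend code: Driver, Reply, Registered, and RegisterFailed are hypothetical names, replies are returned as values instead of being sent over RPC, and only the core count is stored per Executor.

```scala
import scala.collection.mutable

object RegistrationSketch {
  sealed trait Reply
  case object Registered extends Reply
  final case class RegisterFailed(reason: String) extends Reply

  /** Hypothetical driver-side registry. */
  final class Driver(blacklistedHosts: Set[String]) {
    private val executorCores = mutable.HashMap.empty[String, Int]
    private var totalCores = 0

    def register(executorId: String, hostname: String, cores: Int): Reply = synchronized {
      if (executorCores.contains(executorId)) {
        // 3.1: the same ID must not be registered twice.
        RegisterFailed("Duplicate executor ID: " + executorId)
      } else if (blacklistedHosts.contains(hostname)) {
        // 3.1: Executors on blacklisted hosts are rejected immediately.
        RegisterFailed("Executor is blacklisted: " + executorId)
      } else {
        // 3.2: record the Executor and update the counters.
        executorCores.put(executorId, cores)
        totalCores += cores
        Registered
      }
    }

    def totalCoreCount: Int = synchronized(totalCores)
  }
}
```

Only a successful registration changes the Driver's state; both rejection paths leave the counters untouched.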

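3.3 Notify the Executor that it is registered

The Driver sends a RegisteredExecutor message to the ExecutorRpcEndpoint (which was created from a CoarseGrainedExecutorBackend) to tell it that registration succeeded.
In the ExecutorRpcEndpoint, CoarseGrainedExecutorBackend handles the RegisteredExecutor message, mainly by creating a new Executor. From then on the Executor can serve the Driver.
Only at this point is the Executor actually created. For the creation itself, see section 2.4, "Handling the RegisteredExecutor reply", in Spark Task Scheduling: Starting CoarseGrainedExecutorBackend.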

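3.4 Post the Executor to the listener bus

The Driver posts a SparkListenerExecutorAdded event to listenerBus.
At this point the Executor has been launched and registered, and the Driver can use the registered Executors to run Tasks and carry out the computation.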

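3.5 Start scheduling Tasks

With registration complete, the Driver calls makeOffers() to offer the available resources to pending Tasks (see the end of the Driver-side listing in the appendix).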

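Summary

This article follows on from the earlier one about Driver launch (once the Driver is up, Executors must be launched to run Tasks). As shown there, the Master's schedule() calls startExecutorsOnWorkers() right after launchDriver() has run. That call is the starting point of this article.

This article covered the whole Launch Executor process, beginning with startExecutorsOnWorkers(). The components involved are:

Master : handles resource scheduling and plans how Executors are distributed.
Worker : launches the Executors.
Driver : Executors register with the Driver, which then assigns Tasks to run on them.

Next, we will look at how Tasks are submitted to Executors and executed.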

Acknowledgements

Appendix

Master side:

-------------------------Master.scala : startExecutorsOnWorkers()-----------
   /**
   * Schedule and launch Executors on the Workers.
   */
  private def startExecutorsOnWorkers(): Unit = {
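    // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first
    // application in the queue, then the second application, and so on.

    // Take one app at a time from waitingApps and launch Executors for it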
    for (app <- waitingApps) {
      val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
      
      //  private[master] def coresLeft: Int = requestedCores - coresGranted
      // If fewer cores remain than coresPerExecutor, the remaining cores are not allocated
      if (app.coresLeft >= coresPerExecutor) {
        
        // Collect the Workers that have enough resources to launch one Executor
        val usableWorkers = workers.toArray
          .filter(_.state == WorkerState.ALIVE)
          .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
            worker.coresFree >= coresPerExecutor)
          .sortBy(_.coresFree)
          .reverse
        
        // Decide how many cores to assign on each Worker; the result is an array.
        val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

        // Now that the number of cores per Worker has been decided, allocate them.
        for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
          allocateWorkerResourceToExecutors(
            app, assignedCores(pos), app.desc.coresPerExecutor, usableWorkers(pos))
        }
      }
    }
  }
-------------------------Master.scala : scheduleExecutorsOnWorkers()----------
  private def scheduleExecutorsOnWorkers(
      app: ApplicationInfo,
      usableWorkers: Array[WorkerInfo],
      spreadOutApps: Boolean): Array[Int] = {
    val coresPerExecutor = app.desc.coresPerExecutor
    val minCoresPerExecutor = coresPerExecutor.getOrElse(1)
    val oneExecutorPerWorker = coresPerExecutor.isEmpty
    val memoryPerExecutor = app.desc.memoryPerExecutorMB
    val numUsable = usableWorkers.length
    val assignedCores = new Array[Int](numUsable) // Number of cores assigned to each Worker
    val assignedExecutors = new Array[Int](numUsable) // Number of new Executors to launch on each Worker
    
    var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

    /** Return whether the specified Worker can launch an Executor for this app. */
    def canLaunchExecutor(pos: Int): Boolean = {
      val keepScheduling = coresToAssign >= minCoresPerExecutor
      val enoughCores = usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor

      // If multiple Executors per Worker are allowed, we can always launch a new Executor.
      // Otherwise, if this Worker already has an Executor, we only give it more cores.
      val launchingNewExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0
      if (launchingNewExecutor) {
        val assignedMemory = assignedExecutors(pos) * memoryPerExecutor
        val enoughMemory = usableWorkers(pos).memoryFree - assignedMemory >= memoryPerExecutor
        val underLimit = assignedExecutors.sum + app.executors.size < app.executorLimit
        keepScheduling && enoughCores && enoughMemory && underLimit
      } else {
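        // When adding cores to an existing Executor, there is no need to check the
        // memory and executor limits. If the app still needs cores and the Worker has
        // some, return true so that the Executor on this Worker gets more cores.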
        keepScheduling && enoughCores
      }
    }

    // Keep launching Executors until no Worker can accommodate any more of them,
    // or until we have reached this application's limits
    var freeWorkers = (0 until numUsable).filter(canLaunchExecutor)
    while (freeWorkers.nonEmpty) {
      freeWorkers.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunchExecutor(pos)) {
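          // Cores are always handed out in units of the minimum an Executor needs, so once
          // we know this Worker can take an Executor, both the cores still to assign and
          // the cores already assigned change by that amount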
          coresToAssign -= minCoresPerExecutor
          assignedCores(pos) += minCoresPerExecutor

          // If we are launching one Executor per Worker, every iteration assigns 1 core
          // to that Executor. Otherwise, every iteration assigns cores to a new Executor.
          if (oneExecutorPerWorker) {
            assignedExecutors(pos) = 1
          } else {
            assignedExecutors(pos) += 1
          }

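          // Spreading out an Application means spreading its Executors across as many
          // Workers as possible. If we are not spreading out, we keep scheduling Executors
          // on this Worker until all of its resources are used.
          // Otherwise, we just move on to the next Worker.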
          if (spreadOutApps) {
            keepScheduling = false
          }
        }
      }
      freeWorkers = freeWorkers.filter(canLaunchExecutor)
    }
    assignedCores
  }
------------Master.scala : allocateWorkerResourceToExecutors()---------------

  // Allocate a Worker's resources to one or more Executors
  private def allocateWorkerResourceToExecutors(
      app: ApplicationInfo,
      assignedCores: Int,
      coresPerExecutor: Option[Int],
      worker: WorkerInfo): Unit = {
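    // If the number of cores per Executor is specified, we divide the cores assigned to
    // this Worker evenly among the Executors, with no remainder.
    // Otherwise, we launch a single Executor that grabs all the assigned cores on this Worker.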
    val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
    val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
    for (i <- 1 to numExecutors) {
      val exec = app.addExecutor(worker, coresToAssign)
      launchExecutor(worker, exec)
      app.state = ApplicationState.RUNNING
    }
  }
------------Master.scala : launchExecutor()----------------
  private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
    logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
    worker.addExecutor(exec)
    worker.endpoint.send(LaunchExecutor(masterUrl,
      exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
    exec.application.driver.send(
      ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
  }

Worker side:


----------------------------Worker.scala:receive()-----------
    case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
      if (masterUrl != activeMasterUrl) {
        logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
      } else {
        try {
          logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))

          // 1. Create the Executor's working directory
          val executorDir = new File(workDir, appId + "/" + execId)
          if (!executorDir.mkdirs()) {
            throw new IOException("Failed to create directory " + executorDir)
          }

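          // 2. Create the local directories for the Executor (appLocalDirs).
          //    Their paths are passed to the Executor through the SPARK_EXECUTOR_DIRS
          //    environment variable, and the Worker deletes them when the Application
          //    finishes. Unlike executorDir, they are created once per Application and reused.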
          val appLocalDirs = appDirectories.getOrElse(appId, {
            val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
            val dirs = localRootDirs.flatMap { dir =>
              try {
                val appDir = Utils.createDirectory(dir, namePrefix = "executor")
                Utils.chmod700(appDir)
                Some(appDir.getAbsolutePath())
              } catch {
                case e: IOException =>
                  logWarning(s"${e.getMessage}. Ignoring this directory.")
                  None
              }
            }.toSeq
            if (dirs.isEmpty) {
              throw new IOException("No subfolder can be created in " +
                s"${localRootDirs.mkString(",")}.")
            }
            dirs
          })
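          // 3. Record appLocalDirs in appDirectories under this appId
          appDirectories(appId) = appLocalDirs
          // 4. Create an ExecutorRunner, passing in the task-related and resource-related information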
          val manager = new ExecutorRunner(
            appId,
            execId,
            appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
            cores_,
            memory_,
            self,
            workerId,
            host,
            webUi.boundPort,
            publicAddress,
            sparkHome,
            executorDir,
            workerUri,
            conf,
            appLocalDirs, ExecutorState.RUNNING)
          executors(appId + "/" + execId) = manager
          manager.start()
          coresUsed += cores_
          memoryUsed += memory_
          sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
        } catch {
          case e: Exception =>
            logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
            if (executors.contains(appId + "/" + execId)) {
              executors(appId + "/" + execId).kill()
              executors -= appId + "/" + execId
            }
            sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
              Some(e.toString), None))
        }
      }



---------------------ExecutorRunner.scala:start()-------------

  private[worker] def start() {
    workerThread = new Thread("ExecutorRunner for " + fullId) {
      override def run() { fetchAndRunExecutor() }
    }
    workerThread.start()
    // Shutdown hook that kills actors on shutdown.
    shutdownHook = ShutdownHookManager.addShutdownHook { () =>
      // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
      // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
      if (state == ExecutorState.RUNNING) {
        state = ExecutorState.FAILED
      }
      killProcess(Some("Worker shutting down")) }
  }
---------------------ExecutorRunner.scala:fetchAndRunExecutor()-------------
  /**
   * Download and run the executor described in our ApplicationDescription
   */
  private def fetchAndRunExecutor() {
    try {
      // Launch the process
      // 1. Read the Executor's launch parameters from appDesc and build a ProcessBuilder from them
      val subsOpts = appDesc.command.javaOpts.map {
        Utils.substituteAppNExecIds(_, appId, execId.toString)
      }
      val subsCommand = appDesc.command.copy(javaOpts = subsOpts)
      val builder = CommandUtils.buildProcessBuilder(subsCommand, new SecurityManager(conf),
        memory, sparkHome.getAbsolutePath, substituteVariables)
        
      // formattedCommand is only used for log output; it is not part of the core logic
      val command = builder.command()
      val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
      logInfo(s"Launch command: $formattedCommand")
      
      // 2. Set the ProcessBuilder's directories (executorDir, appLocalDirs)
      builder.directory(executorDir)
      // The directory list is joined with the platform's path separator
      builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))

      // In case we are running this from within the Spark shell, avoid creating a "scala"
      // parent process for the executor command
      builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")

      // 3. Add the web UI information and set the log URLs
      val baseUrl =
        if (conf.getBoolean("spark.ui.reverseProxy", false)) {
          s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
        } else {
          s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
        }
      builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
      builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")
      
      // 4. Call ProcessBuilder.start() to start the process
      process = builder.start()
      val header = "Spark Executor Command: %s\n%s\n\n".format(
        formattedCommand, "=" * 40)

      // 5. Redirect stdout and stderr to files (under the executorDir set above)
      val stdout = new File(executorDir, "stdout")
      stdoutAppender = FileAppender(process.getInputStream, stdout, conf)

      val stderr = new File(executorDir, "stderr")
      Files.write(header, stderr, StandardCharsets.UTF_8)
      stderrAppender = FileAppender(process.getErrorStream, stderr, conf)

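      // 6. Wait for it to exit; the executor may exit with code 0 (when the driver
      //    instructs it to shut down) or with a nonzero exit code.
      //    Once it has exited, report the state change to the owning Worker.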
      val exitCode = process.waitFor()
      state = ExecutorState.EXITED
      val message = "Command exited with code " + exitCode
      worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
    } catch {
      case interrupted: InterruptedException =>
        logInfo("Runner thread for executor " + fullId + " interrupted")
        state = ExecutorState.KILLED
        killProcess(None)
      case e: Exception =>
        logError("Error running executor", e)
        state = ExecutorState.FAILED
        killProcess(Some(e.toString))
    }
  }
}

Driver side:

----------------------package org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.scala-----------------
    override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {

      case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
        if (executorDataMap.contains(executorId)) {
          executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
          context.reply(true)
        } else if (scheduler.nodeBlacklist.contains(hostname)) {
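          // If the cluster manager gives us an executor on a blacklisted node (because it
          // had already started allocating those resources before we informed it of our
          // blacklist, or because it ignored our blacklist), we reject that executor immediately.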
          logInfo(s"Rejecting $executorId as it has been blacklisted.")
          executorRef.send(RegisterExecutorFailed(s"Executor is blacklisted: $executorId"))
          context.reply(true)
        } else {
          // If the executor's rpc env is not listening for incoming connections, `hostPort`
          // will be null, and the client connection should be used to contact the executor.
          val executorAddress = if (executorRef.address != null) {
              executorRef.address
            } else {
              context.senderAddress
            }
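          // Reaching this point means the executorId is not yet registered in
          // executorDataMap and the host is not on the scheduler's blacklist
          // 1. Add the Executor's information to the corresponding collections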
          logInfo(s"Registered executor $executorRef ($executorAddress) with ID $executorId")
          addressToExecutorId(executorAddress) = executorId
          totalCoreCount.addAndGet(cores)
          totalRegisteredExecutors.addAndGet(1)
          
          // 2. Create an ExecutorData and put it into executorDataMap as <executorId, ExecutorData>
          val data = new ExecutorData(executorRef, executorAddress, hostname,
            cores, cores, logUrls)
          // This must be synchronized because variables mutated in this block are read
          // when requesting executors
          CoarseGrainedSchedulerBackend.this.synchronized {
            executorDataMap.put(executorId, data)
            if (currentExecutorIdCounter < executorId.toInt) {
              currentExecutorIdCounter = executorId.toInt
            }
            if (numPendingExecutors > 0) {
              numPendingExecutors -= 1
              logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
            }
          }
          
          // 3. Notify the Executor (CoarseGrainedExecutorBackend) that it is registered.
          executorRef.send(RegisteredExecutor)
          // Note: some tests expect the reply to come after we put the executor in the map
          context.reply(true)
          
          // 4. Post the Executor-added event to the listener bus
          listenerBus.post(
            SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
            
          // 5. Start scheduling Tasks: make resource offers
          makeOffers()
        }
}