Launching a Spark Executor

Latest recommended article published 2023-03-18 08:41:21

Author: 啥都不会的硕士


Column: Spark. Tags: spark, Executor launch, launchExecutor, Spark, big data

Article link: https://blog.csdn.net/fengshaungme/article/details/87346783


1. Introduction

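The previous post covered Application registration. Once registration completes, Executors must be launched on the appropriate Workers to run the tasks dispatched to them, so this post looks at how an Executor is launched. The source code is from Spark 2.4.0.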

2. Code Walkthrough

Once a Worker, Driver, or Application has been registered, the Master calls its schedule() method:

private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) {
    return
  }
  // Drivers take strict precedence over executors
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
    // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
    // start from the last worker that was assigned a driver, and continue onwards until we have
    // explored all alive workers.
    var launched = false
    var numWorkersVisited = 0
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        launchDriver(worker, driver)
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  // Launch executors on the workers
  startExecutorsOnWorkers()
}
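The driver-placement loop in schedule() is a round-robin walk over a shuffled list of ALIVE Workers, with the position carried over from one driver to the next. Below is a minimal standalone sketch of that loop. It assumes hypothetical simplified types (WorkerSlot, DriverReq) in place of Spark's WorkerInfo and DriverInfo, assumes the caller has already shuffled the Workers, and models launchDriver only as reserving the driver's memory and cores on the chosen Worker.

```scala
// Hypothetical simplified types: only the fields the placement loop reads.
final class WorkerSlot(val id: String, var memoryFree: Int, var coresFree: Int)
final case class DriverReq(id: String, mem: Int, cores: Int)

object DriverPlacementSketch {
  /** Round-robin placement; `aliveWorkers` is assumed to be shuffled already. */
  def placeDrivers(aliveWorkers: IndexedSeq[WorkerSlot],
                   waiting: Seq[DriverReq]): Map[String, String] = {
    val numWorkersAlive = aliveWorkers.size
    var curPos = 0 // shared across drivers, so each driver starts where the last one stopped
    val placed = Map.newBuilder[String, String]
    for (driver <- waiting) {
      var launched = false
      var numWorkersVisited = 0
      while (numWorkersVisited < numWorkersAlive && !launched) {
        val worker = aliveWorkers(curPos)
        numWorkersVisited += 1
        if (worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
          // Stand-in for launchDriver: reserve the driver's resources on this worker
          worker.memoryFree -= driver.mem
          worker.coresFree -= driver.cores
          placed += (driver.id -> worker.id)
          launched = true
        }
        curPos = (curPos + 1) % numWorkersAlive
      }
    }
    placed.result()
  }
}
```

A driver that fits on no Worker is simply left out of the result, which corresponds to it staying in waitingDrivers until the next schedule() call.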

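We covered the schedule() method above in an earlier post: it mainly places waiting Drivers on suitable Workers. The focus of this post is its last line, startExecutorsOnWorkers(), so let's step into that method: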

private def startExecutorsOnWorkers(): Unit = {
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
  for (app <- waitingApps) {
    // Iterate over the waiting Applications; cores per executor comes from the ApplicationDescription (default 1)
    val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
    // If the cores left is less than the coresPerExecutor,the cores left will not be allocated
    // Only continue if the cores the app still needs are at least one executor's worth
    if (app.coresLeft >= coresPerExecutor) {
      // Filter out workers that don't have enough resources to launch an executor
      // Keep ALIVE workers whose free memory and free cores can fit one executor, most free cores first
      val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
        .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
          worker.coresFree >= coresPerExecutor)
        .sortBy(_.coresFree).reverse
      // Compute how many cores each usable worker will give this app
      val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

      // Now that we've decided how many cores to allocate on each worker, let's allocate them
      // With the per-worker core counts decided, allocate them on the corresponding workers
      for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
        allocateWorkerResourceToExecutors(
          app, assignedCores(pos), app.desc.coresPerExecutor, usableWorkers(pos))
      }
    }
  }
}

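The code above does the following:
1. Iterates over the queue of waiting Applications (FIFO).
2. Selects the Workers that can host an executor for the app.
3. Computes how many cores each selected Worker should give.
4. Allocates those resources.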
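Steps 2 and 3 hinge on which Workers count as usable. Here is a small sketch of that filter in isolation, assuming a hypothetical WorkerView case class in place of WorkerInfo: only ALIVE Workers with enough free memory for one executor and at least coresPerExecutor (default 1) free cores are kept, ordered by free cores, most first.

```scala
// Hypothetical view of a worker as the Master sees it.
final case class WorkerView(id: String, alive: Boolean, memoryFree: Int, coresFree: Int)

object UsableWorkersSketch {
  /** Workers that can fit at least one executor, most free cores first. */
  def usableWorkers(workers: Seq[WorkerView],
                    memoryPerExecutorMB: Int,
                    coresPerExecutor: Option[Int]): Array[WorkerView] = {
    val minCores = coresPerExecutor.getOrElse(1)
    workers.toArray
      .filter(_.alive)
      .filter(w => w.memoryFree >= memoryPerExecutorMB && w.coresFree >= minCores)
      .sortBy(_.coresFree)
      .reverse
  }
}
```

Sorting by free cores in descending order means the Workers with the most spare capacity are considered first by the scheduling loop.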
Next, let's see how the cores for each Worker are computed, in scheduleExecutorsOnWorkers:

private def scheduleExecutorsOnWorkers(
    app: ApplicationInfo,
    usableWorkers: Array[WorkerInfo],
    spreadOutApps: Boolean): Array[Int] = {
  // Cores per executor, if the application specified a value
  val coresPerExecutor = app.desc.coresPerExecutor
  // Minimum cores per executor (1 if unspecified)
  val minCoresPerExecutor = coresPerExecutor.getOrElse(1)
  // With no cores-per-executor setting, launch at most one executor per worker
  val oneExecutorPerWorker = coresPerExecutor.isEmpty
  // Memory per executor, in MB
  val memoryPerExecutor = app.desc.memoryPerExecutorMB
  // Number of usable workers
  val numUsable = usableWorkers.length
  // Cores assigned to each worker so far
  val assignedCores = new Array[Int](numUsable) // Number of cores to give to each worker
  // Executors assigned to each worker so far
  val assignedExecutors = new Array[Int](numUsable) // Number of new executors on each worker
  // Total cores to assign: what the app still needs, capped by the workers' free cores
  var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

  /** Return whether the specified worker can launch an executor for this app. */
  def canLaunchExecutor(pos: Int): Boolean = {
    // Keep going only while at least one executor's minimum cores remain to be assigned
    val keepScheduling = coresToAssign >= minCoresPerExecutor
    // Does this worker still have enough unassigned cores for one executor's minimum?
    val enoughCores = usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor

    // If we allow multiple executors per worker, then we can always launch new executors.
    // Otherwise, if there is already an executor on this worker, just give it more cores.
    val launchingNewExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0
    if (launchingNewExecutor) {
      // Memory already claimed on this worker by the executors assigned here
      val assignedMemory = assignedExecutors(pos) * memoryPerExecutor
      // Is there enough memory left for one more executor?
      val enoughMemory = usableWorkers(pos).memoryFree - assignedMemory >= memoryPerExecutor
      // Would one more executor keep the application under its executor limit?
      val underLimit = assignedExecutors.sum + app.executors.size < app.executorLimit
      keepScheduling && enoughCores && enoughMemory && underLimit
    } else {
      // We're adding cores to an existing executor, so no need
      // to check memory and executor limits
      keepScheduling && enoughCores
    }
  }

  // Keep launching executors until no more workers can accommodate any
  // more executors, or if we have reached this application's limits
  // Start with the workers that can host at least one executor
  var freeWorkers = (0 until numUsable).filter(canLaunchExecutor)
  while (freeWorkers.nonEmpty) {
    freeWorkers.foreach { pos =>
      var keepScheduling = true
      while (keepScheduling && canLaunchExecutor(pos)) {
        // Assign one executor's minimum cores to this worker
        coresToAssign -= minCoresPerExecutor
        assignedCores(pos) += minCoresPerExecutor

        // If we are launching one executor per worker, then every iteration assigns 1 core
        // to the executor. Otherwise, every iteration assigns cores to a new executor.
        // With one executor per worker, the worker's executor count is simply set to 1;
        // otherwise each iteration adds one more executor on this worker
        if (oneExecutorPerWorker) {
          assignedExecutors(pos) = 1
        } else {
          assignedExecutors(pos) += 1
        }

        // Spreading out an application means spreading out its executors across as
        // many workers as possible. If we are not spreading out, then we should keep
        // scheduling executors on this worker until we use all of its resources.
        // Otherwise, just move on to the next worker.
        if (spreadOutApps) {
          keepScheduling = false
        }
      }
    }
    freeWorkers = freeWorkers.filter(canLaunchExecutor)
  }
  assignedCores
}

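The method above does the following:
1. Derives the scheduling parameters: cores per executor, whether only one executor per Worker is allowed, memory per executor, and so on.
2. Assigns cores to the eligible Workers, either spreading the app's executors across as many Workers as possible (spreadOutApps) or filling each Worker before moving on to the next.
3. Returns an array with the number of cores assigned to each Worker.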
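To make the two placement strategies concrete, here is a self-contained sketch of the same loop. It is a simplified port under stated assumptions: a hypothetical FreeWorker case class stands in for WorkerInfo, and the ApplicationInfo fields it needs (coresLeft, executorLimit, the number of existing executors) are passed in as plain parameters.

```scala
// Hypothetical stand-in for WorkerInfo: only the fields the algorithm reads.
final case class FreeWorker(coresFree: Int, memoryFreeMB: Int)

object ExecutorPlacementSketch {
  /** Returns the number of cores to give the app on each usable worker. */
  def scheduleExecutorsOnWorkers(
      coresLeft: Int,
      coresPerExecutor: Option[Int],
      memoryPerExecutorMB: Int,
      executorLimit: Int,
      existingExecutors: Int,
      usableWorkers: IndexedSeq[FreeWorker],
      spreadOutApps: Boolean): Array[Int] = {
    val minCoresPerExecutor = coresPerExecutor.getOrElse(1)
    val oneExecutorPerWorker = coresPerExecutor.isEmpty
    val numUsable = usableWorkers.length
    val assignedCores = new Array[Int](numUsable)
    val assignedExecutors = new Array[Int](numUsable)
    var coresToAssign = math.min(coresLeft, usableWorkers.map(_.coresFree).sum)

    def canLaunchExecutor(pos: Int): Boolean = {
      val keepScheduling = coresToAssign >= minCoresPerExecutor
      val enoughCores = usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor
      val launchingNewExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0
      if (launchingNewExecutor) {
        // A new executor also needs memory and must respect the executor limit
        val assignedMemory = assignedExecutors(pos) * memoryPerExecutorMB
        val enoughMemory = usableWorkers(pos).memoryFreeMB - assignedMemory >= memoryPerExecutorMB
        val underLimit = assignedExecutors.sum + existingExecutors < executorLimit
        keepScheduling && enoughCores && enoughMemory && underLimit
      } else {
        // Growing the single executor on this worker only needs more cores
        keepScheduling && enoughCores
      }
    }

    var freeWorkers = (0 until numUsable).filter(canLaunchExecutor)
    while (freeWorkers.nonEmpty) {
      freeWorkers.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunchExecutor(pos)) {
          coresToAssign -= minCoresPerExecutor
          assignedCores(pos) += minCoresPerExecutor
          if (oneExecutorPerWorker) assignedExecutors(pos) = 1
          else assignedExecutors(pos) += 1
          // Spread out: one step per worker per round; otherwise fill this worker first
          if (spreadOutApps) keepScheduling = false
        }
      }
      freeWorkers = freeWorkers.filter(canLaunchExecutor)
    }
    assignedCores
  }
}
```

With three Workers of 4 free cores each and an app that needs 6 cores and sets no cores per executor, spreading out yields 2/2/2 while filling yields 4/2/0. With an executor limit of 2, spreading out yields 3/3/0, because the third Worker would need a new executor and the limit forbids it, while the first two only grow their existing executors.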
With the resources for each Worker computed, go back to startExecutorsOnWorkers and look at the method that actually performs the allocation, allocateWorkerResourceToExecutors:

private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {
  // If the number of cores per executor is specified, we divide the cores assigned
  // to this worker evenly among the executors with no remainder.
  // Otherwise, we launch a single executor that grabs all the assignedCores on this worker.
  // Number of executors to launch on this worker
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  // Cores given to each of those executors
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    // Record a new executor for the application via addExecutor
    val exec = app.addExecutor(worker, coresToAssign)
    // Launch the executor
    launchExecutor(worker, exec)
    // Mark the application as RUNNING
    app.state = ApplicationState.RUNNING
  }
}

This method:
1. Computes how many executors to launch on this Worker.
2. Determines how many cores each of those executors gets.
3. For each executor, calls app.addExecutor to record it on the Application.
4. Calls launchExecutor to launch it.
5. Sets the Application's state to RUNNING.
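The arithmetic in steps 1 and 2 can be isolated into a small function. This sketch (the name splitCores is hypothetical) returns the core count of each executor that would be launched on one Worker: with a cores-per-executor setting, the assigned cores are divided into executors of that size; without it, a single executor takes all of them.

```scala
object ExecutorSplitSketch {
  /** Core count of each executor to launch on one worker. */
  def splitCores(assignedCores: Int, coresPerExecutor: Option[Int]): Seq[Int] = {
    // Same two expressions as allocateWorkerResourceToExecutors
    val numExecutors = coresPerExecutor.map(assignedCores / _).getOrElse(1)
    val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
    Seq.fill(numExecutors)(coresToAssign)
  }
}
```

The integer division does not drop cores here, because scheduleExecutorsOnWorkers assigns cores to a Worker in multiples of minCoresPerExecutor, which equals coresPerExecutor whenever it is set.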
Next, let's see how an Executor is launched, in launchExecutor:

private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  // Record the executor on the Master's view of the worker
  worker.addExecutor(exec)
  // Tell the worker to launch the executor
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  // Tell the driver that an executor has been added
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}

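launchExecutor first records the executor on the WorkerInfo, then sends a LaunchExecutor message to the Worker, and finally sends an ExecutorAdded message to tell the driver that the executor has been added.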
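The order of those three steps can be modeled with an in-memory outbox. The sketch below uses hypothetical stand-ins (Endpoint, WorkerRecord, LaunchExecutorMsg, ExecutorAddedMsg) rather than Spark's RPC classes; it only illustrates the order in which the Master updates its bookkeeping and then notifies the Worker and the driver.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for the RPC messages used by Master.launchExecutor.
sealed trait Msg
final case class LaunchExecutorMsg(appId: String, execId: Int, cores: Int, memoryMB: Int) extends Msg
final case class ExecutorAddedMsg(execId: Int, workerId: String, cores: Int, memoryMB: Int) extends Msg

// Hypothetical endpoint that just records what it is sent.
final class Endpoint(val name: String) {
  val inbox: ArrayBuffer[Msg] = ArrayBuffer.empty[Msg]
  def send(msg: Msg): Unit = { inbox += msg }
}

// Hypothetical Master-side view of a worker.
final class WorkerRecord(val id: String, val endpoint: Endpoint) {
  val executors: ArrayBuffer[String] = ArrayBuffer.empty[String]
}

object LaunchExecutorSketch {
  def launchExecutor(appId: String, execId: Int, cores: Int, memoryMB: Int,
                     worker: WorkerRecord, driver: Endpoint): Unit = {
    // 1. Record the executor on the Master's view of the worker
    worker.executors += s"$appId/$execId"
    // 2. Ask the worker to launch it
    worker.endpoint.send(LaunchExecutorMsg(appId, execId, cores, memoryMB))
    // 3. Tell the driver an executor was added
    driver.send(ExecutorAddedMsg(execId, worker.id, cores, memoryMB))
  }
}
```

Note that the driver is told about the executor as soon as the request goes out; whether the process actually starts is reported back later through ExecutorStateChanged.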
Now let's see how the Worker handles the LaunchExecutor message:

case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  // Ignore the request unless it comes from the currently active Master
  if (masterUrl != activeMasterUrl) {
    logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
  } else {
    try {
      logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))

      // Create the executor's working directory
      val executorDir = new File(workDir, appId + "/" + execId)
      if (!executorDir.mkdirs()) {
        throw new IOException("Failed to create directory " + executorDir)
      }

      // Create local dirs for the executor. These are passed to the executor via the
      // SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
      // application finishes.
      // Create (or reuse) the application's local directories
      val appLocalDirs = appDirectories.getOrElse(appId, {
        val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
        val dirs = localRootDirs.flatMap { dir =>
          try {
            val appDir = Utils.createDirectory(dir, namePrefix = "executor")
            Utils.chmod700(appDir)
            Some(appDir.getAbsolutePath())
          } catch {
            case e: IOException =>
              logWarning(s"${e.getMessage}. Ignoring this directory.")
              None
          }
        }.toSeq
        if (dirs.isEmpty) {
          throw new IOException("No subfolder can be created in " +
            s"${localRootDirs.mkString(",")}.")
        }
        dirs
      })
      appDirectories(appId) = appLocalDirs
      // Wrap the received launch information in an ExecutorRunner
      val manager = new ExecutorRunner(
        appId,
        execId,
        appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
        cores_,
        memory_,
        self,
        workerId,
        host,
        webUi.boundPort,
        publicAddress,
        sparkHome,
        executorDir,
        workerUri,
        conf,
        appLocalDirs, ExecutorState.RUNNING)
      executors(appId + "/" + execId) = manager
      // Start the executor
      manager.start()
      // Account for the cores and memory now in use
      coresUsed += cores_
      memoryUsed += memory_
      // Report the executor's state change to the Master
      sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
    } catch {
      case e: Exception =>
        logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
        if (executors.contains(appId + "/" + execId)) {
          executors(appId + "/" + execId).kill()
          executors -= appId + "/" + execId
        }
        sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
          Some(e.toString), None))
    }
  }

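The Worker handler does the following:
1. Checks that the message came from the currently active Master.
2. Creates the executor's working directory and the application's local directories.
3. Wraps the information sent by the Master in an ExecutorRunner, which manages the execution of one executor process.
4. Calls ExecutorRunner.start().
5. Updates the Worker's used cores and memory.
6. Sends the Master an ExecutorStateChanged message reporting the executor's current state.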
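The control flow of this handler, leaving out the ExecutorRunner and the local-directory logic, can be sketched as follows. WorkerLaunchSketch and its string states are hypothetical simplifications. One detail the sketch reproduces: File.mkdirs() returns false when the directory already exists, so a repeated launch for the same appId/execId throws and is reported as FAILED.

```scala
import java.io.{File, IOException}

// Hypothetical, simplified Worker-side state for the LaunchExecutor handler.
final class WorkerLaunchSketch(activeMasterUrl: String, workDir: File) {
  var coresUsed = 0
  var memoryUsed = 0

  /** Returns the state reported to the Master, or None if the request is ignored. */
  def onLaunchExecutor(masterUrl: String, appId: String, execId: Int,
                       cores: Int, memoryMB: Int): Option[String] = {
    if (masterUrl != activeMasterUrl) {
      None // request from a Master that is not the active one: ignore it
    } else {
      try {
        val executorDir = new File(workDir, appId + "/" + execId)
        // mkdirs() returns false if the directory already exists or cannot be created
        if (!executorDir.mkdirs()) {
          throw new IOException("Failed to create directory " + executorDir)
        }
        // ExecutorRunner creation and start() would go here
        coresUsed += cores
        memoryUsed += memoryMB
        Some("RUNNING")
      } catch {
        case e: Exception => Some("FAILED: " + e)
      }
    }
  }
}
```

Because the counters are only updated after the directory is created, a failed launch leaves coresUsed and memoryUsed unchanged.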
Here is the start() method:

private[worker] def start() {
  // Create the thread that will fetch and run the executor
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    override def run() { fetchAndRunExecutor() }
  }
  // Start the thread
  workerThread.start()
  // Shutdown hook that kills actors on shutdown.
  shutdownHook = ShutdownHookManager.addShutdownHook { () =>
    // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
    // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
    if (state == ExecutorState.RUNNING) {
      // Shutting down while still RUNNING: mark the executor FAILED before killing the process
      state = ExecutorState.FAILED
    }
    killProcess(Some("Worker shutting down")) }
}
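start() does not run the executor on the calling thread: it hands fetchAndRunExecutor to a dedicated, named thread and registers a shutdown hook. Below is a minimal sketch of that pattern, assuming a hypothetical RunnerSketch class whose work function returns an exit code. Spark registers its hook through its own ShutdownHookManager; the sketch swaps in the plain JVM Runtime.addShutdownHook.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical simplified runner; state names mirror ExecutorState.
final class RunnerSketch(fullId: String, work: () => Int) {
  val state = new AtomicReference[String]("RUNNING")
  @volatile var exitCode: Option[Int] = None
  private var workerThread: Thread = null

  def start(): Thread = {
    workerThread = new Thread("ExecutorRunner for " + fullId) {
      override def run(): Unit = {
        exitCode = Some(work()) // stand-in for fetchAndRunExecutor
        state.set("EXITED")
      }
    }
    workerThread.start()
    Runtime.getRuntime.addShutdownHook(new Thread("shutdown-" + fullId) {
      override def run(): Unit = {
        // If the JVM stops while the executor is still RUNNING, record it as FAILED
        state.compareAndSet("RUNNING", "FAILED")
      }
    })
    workerThread
  }
}
```

Running the work on its own thread lets the Worker's message loop return immediately after start(), which is why the handler can report RUNNING to the Master right away.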

Next, the fetchAndRunExecutor method:

private def fetchAndRunExecutor() {
  try {
    // Launch the process
    val subsOpts = appDesc.command.javaOpts.map {
      Utils.substituteAppNExecIds(_, appId, execId.toString)
    }
    val subsCommand = appDesc.command.copy(javaOpts = subsOpts)
    // Build the ProcessBuilder for the executor command
    val builder = CommandUtils.buildProcessBuilder(subsCommand, new SecurityManager(conf),
      memory, sparkHome.getAbsolutePath, substituteVariables)
    val command = builder.command()
    val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
    logInfo(s"Launch command: $formattedCommand")
    // Run the process in the executor's working directory
    builder.directory(executorDir)
    // Set environment variables
    builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
    // In case we are running this from within the Spark Shell, avoid creating a "scala"
    // parent process for the executor command
    builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")

    // Add webUI log urls
    val baseUrl =
      if (conf.getBoolean("spark.ui.reverseProxy", false)) {
        s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
      } else {
        s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
      }
    builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
    builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")
    // Start the process
    process = builder.start()
    val header = "Spark Executor Command: %s\n%s\n\n".format(
      formattedCommand, "=" * 40)

    // Redirect its stdout and stderr to files
    // Redirect the process's stdout to a file
    val stdout = new File(executorDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
    // Redirect the process's stderr to a file, after writing the command header
    val stderr = new File(executorDir, "stderr")
    Files.write(header, stderr, StandardCharsets.UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)

    // Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown)
    // or with nonzero exit code
    val exitCode = process.waitFor()
    // The process has exited: mark the executor EXITED
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    // Notify the worker of the state change
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
  } catch {
    case interrupted: InterruptedException =>
      logInfo("Runner thread for executor " + fullId + " interrupted")
      state = ExecutorState.KILLED
      killProcess(None)
    case e: Exception =>
      logError("Error running executor", e)
      // Mark the executor FAILED and kill the process
      state = ExecutorState.FAILED
      killProcess(Some(e.toString))
  }
}
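The log-URL part of fetchAndRunExecutor depends only on a few settings, which makes it easy to pull out. This sketch (the LogUrlSketch name is hypothetical) rebuilds the two URL shapes from the code above: a relative /proxy/... path when spark.ui.reverseProxy is enabled, and an absolute http://host:port/... URL otherwise; the two environment variables append stdout or stderr to that base.

```scala
object LogUrlSketch {
  /** Base log URL; the caller appends "stdout" or "stderr". */
  def logUrlBase(reverseProxy: Boolean, workerId: String, publicAddress: String,
                 webUiPort: Int, appId: String, execId: Int): String =
    if (reverseProxy) s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
    else s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="

  /** The two environment variables passed to the executor process. */
  def logEnv(base: String): Map[String, String] =
    Map("SPARK_LOG_URL_STDERR" -> s"${base}stderr", "SPARK_LOG_URL_STDOUT" -> s"${base}stdout")
}
```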

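fetchAndRunExecutor does the following:
1. Creates a ProcessBuilder, used to run a command or script locally.
2. Sets the ProcessBuilder's working directory to executorDir, the executor working directory created by the Worker.
3. Sets the ProcessBuilder's environment variables.
4. Starts the ProcessBuilder, spawning the executor process.
5. Redirects the process's stdout to a file.
6. Redirects the process's stderr to a file.
7. Waits for the executor process to exit and reads its exit code, marks the executor EXITED, and sends the Worker an ExecutorStateChanged message.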
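Steps 1 through 7 follow the standard java.lang.ProcessBuilder pattern. Here is a minimal sketch of it, assuming a hypothetical runExecutorCommand helper and, for the usage example, a POSIX sh. Spark copies the process streams with its own FileAppender; this sketch swaps in ProcessBuilder's built-in file redirects, writing the same command header to the stderr file first and appending to it.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

object ProcessLaunchSketch {
  /** Runs `command` in `executorDir` with extra env vars; returns the exit code. */
  def runExecutorCommand(command: Seq[String], executorDir: File,
                         env: Map[String, String]): Int = {
    val builder = new ProcessBuilder(command: _*)
    builder.directory(executorDir)                                     // working directory
    env.foreach { case (k, v) => builder.environment().put(k, v) }     // environment
    val stdout = new File(executorDir, "stdout")
    val stderr = new File(executorDir, "stderr")
    val formatted = command.mkString("\"", "\" \"", "\"")
    val header = "Spark Executor Command: %s\n%s\n\n".format(formatted, "=" * 40)
    Files.write(stderr.toPath, header.getBytes(StandardCharsets.UTF_8))
    builder.redirectOutput(stdout)                                     // stdout -> file
    builder.redirectError(ProcessBuilder.Redirect.appendTo(stderr))    // stderr -> file, after header
    val process = builder.start()
    process.waitFor()                                                  // block until exit
  }
}
```

Returning the exit code keeps the sketch close to step 7: the caller, as ExecutorRunner does, decides how to report the EXITED state.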
That completes the Executor launch flow.
