Big Data: Spark Standalone Cluster Scheduling (Part 2): How Executors Are Created and Allocated Resources

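This article walks through task scheduling in Spark Standalone mode: how the Driver submits an application to the Master, how the Master assigns work to Worker nodes based on available resources, and how Worker nodes launch Executors to run tasks.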


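Overall architecture of Standalone mode

A Spark Standalone cluster has three roles: Client, Master, and Worker. The figure below shows the flow when a Client submits an application:

The full flow: the Driver submits the application to the Master; the Master allocates Executors on Workers according to the application's parameters; each Worker receives its allocation and launches the Executor as a child process.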

How the Master allocates Executors

After the Master receives the RegisterApplication message from the Client, it starts allocating and scheduling Worker resources.

1. Find usable Workers

val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
        .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
          worker.coresFree >= coresPerExecutor.getOrElse(1))
        .sortBy(_.coresFree).reverse

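From the list of Workers, keep only those that are ALIVE and satisfy both conditions:

A. Free memory is at least the memory required by one Executor

B. Free cores are at least the cores required by one Executor

The result is sorted by free cores in descending order, so the Worker with the most free cores (the idlest one) is offered Executors first.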
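The filter above can be tried outside Spark with a small stand-in model. This is a minimal sketch, not Spark's code: the `Worker` case class, the `sample` data, and the `usableWorkers` helper are hypothetical names introduced for illustration, and only the filter and sort logic mirrors the excerpt.

```scala
object UsableWorkersDemo {
  // Hypothetical, simplified stand-in for Spark's WorkerInfo.
  final case class Worker(id: String, alive: Boolean, memoryFreeMB: Int, coresFree: Int)

  // Same shape as the Master's filter: alive, enough memory, enough cores,
  // then sorted so that the worker with the most free cores comes first.
  def usableWorkers(
      workers: Seq[Worker],
      memoryPerExecutorMB: Int,
      coresPerExecutor: Option[Int]): Seq[Worker] = {
    workers
      .filter(_.alive)
      .filter(w => w.memoryFreeMB >= memoryPerExecutorMB &&
        w.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree)
      .reverse
  }

  // Made-up sample cluster.
  val sample: Seq[Worker] = Seq(
    Worker("w1", alive = true, memoryFreeMB = 4096, coresFree = 2),
    Worker("w2", alive = true, memoryFreeMB = 512, coresFree = 8),    // too little memory
    Worker("w3", alive = false, memoryFreeMB = 8192, coresFree = 16), // not alive
    Worker("w4", alive = true, memoryFreeMB = 2048, coresFree = 6))

  def main(args: Array[String]): Unit = {
    // Each executor needs 1024 MB and 2 cores.
    println(usableWorkers(sample, 1024, Some(2)).map(_.id).mkString(", "))
  }
}
```

With this sample data only w4 and w1 pass the filter, and w4 comes first because it has more free cores.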

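2. Allocate Executors

How do Executors relate to cores? Roughly as a process relates to its threads. So when allocating a new Executor, the Master must check not only the cores but also whether there is enough memory.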

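The controlling parameters

a. Cores per Executor (executor-cores)

When the number of cores per Executor is not set:

The scheduler treats the per-Executor core count as 1, so cores are handed out one at a time
Only one Executor is placed on each Worker (in this case that single Executor keeps taking cores, up to the maximum the Worker can offer)

b. Memory required per Executor

c. Cores required by the application (total-executor-cores)

This is the maximum number of cores a running application may use. If it is not set, the default core count is used.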

How does the Master decide whether an Executor can be placed on a Worker?

def canLaunchExecutor(pos: Int): Boolean = {
      val keepScheduling = coresToAssign >= minCoresPerExecutor
      val enoughCores = usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor

      // If we allow multiple executors per worker, then we can always launch new executors.
      // Otherwise, if there is already an executor on this worker, just give it more cores.
      val launchingNewExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0
      if (launchingNewExecutor) {
        val assignedMemory = assignedExecutors(pos) * memoryPerExecutor
        val enoughMemory = usableWorkers(pos).memoryFree - assignedMemory >= memoryPerExecutor
        val underLimit = assignedExecutors.sum + app.executors.size < app.executorLimit
        keepScheduling && enoughCores && enoughMemory && underLimit
      } else {
        // We're adding cores to an existing executor, so no need
        // to check memory and executor limits
        keepScheduling && enoughCores
      }
    }

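The application still needs at least one Executor's worth of cores (coresToAssign >= minCoresPerExecutor)
The Worker's remaining cores, after subtracting what was already assigned in this round, are at least one Executor's worth
If a new Executor would be created on the Worker, the Worker must also have enough free memory for it, and the total number of Executors must stay below the application's Executor limit (app.executorLimit)

Example: with the setting

executor-cores=5

if a Worker has only 4 free cores, this Executor cannot be placed on that Worker.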
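The example can be checked with a stripped-down version of the test. This is a sketch under simplifying assumptions: `canLaunchNewExecutor` is a hypothetical helper that covers only the "new Executor" branch and ignores cores and memory already assigned in the current round as well as the Executor limit.

```scala
object CanLaunchDemo {
  // Simplified check for launching one *new* executor on a worker.
  // Not Spark's method: no per-round bookkeeping and no executor limit.
  def canLaunchNewExecutor(
      coresFree: Int,
      memoryFreeMB: Int,
      coresPerExecutor: Int,
      memoryPerExecutorMB: Int,
      coresToAssign: Int): Boolean = {
    val keepScheduling = coresToAssign >= coresPerExecutor // the app still wants a full executor
    val enoughCores = coresFree >= coresPerExecutor        // the worker can supply the cores
    val enoughMemory = memoryFreeMB >= memoryPerExecutorMB // the worker can supply the memory
    keepScheduling && enoughCores && enoughMemory
  }

  def main(args: Array[String]): Unit = {
    // executor-cores=5 on a worker with 4 free cores: rejected despite plenty of memory.
    println(canLaunchNewExecutor(coresFree = 4, memoryFreeMB = 8192,
      coresPerExecutor = 5, memoryPerExecutorMB = 1024, coresToAssign = 10))
    // The same request on a worker with 5 free cores: accepted.
    println(canLaunchNewExecutor(coresFree = 5, memoryFreeMB = 8192,
      coresPerExecutor = 5, memoryPerExecutorMB = 1024, coresToAssign = 10))
  }
}
```

Note that the 4 free cores are not handed out as a smaller Executor; the Worker is simply skipped for this application.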

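How Executors are balanced across Workers

When spreadOutApps is enabled, Spark walks round-robin over the list of usable Workers (step 1 above described how that list is built). On each pass it assigns one Executor's worth of cores (that is, one Executor) to each Worker, and it repeats until all cores the application needs have been assigned or no Worker can take more.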

Master.scala

  private def scheduleExecutorsOnWorkers(
      app: ApplicationInfo,
      usableWorkers: Array[WorkerInfo],
      spreadOutApps: Boolean): Array[Int] = {
    val coresPerExecutor = app.desc.coresPerExecutor
    val minCoresPerExecutor = coresPerExecutor.getOrElse(1)
    val oneExecutorPerWorker = coresPerExecutor.isEmpty
    val memoryPerExecutor = app.desc.memoryPerExecutorMB
    val numUsable = usableWorkers.length
    val assignedCores = new Array[Int](numUsable) // Number of cores to give to each worker
    val assignedExecutors = new Array[Int](numUsable) // Number of new executors on each worker
    var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)
    // ...

    // Keep launching executors until no more workers can accommodate any
    // more executors, or if we have reached this application's limits
    var freeWorkers = (0 until numUsable).filter(canLaunchExecutor)
    while (freeWorkers.nonEmpty) {
      freeWorkers.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunchExecutor(pos)) {
          coresToAssign -= minCoresPerExecutor
          assignedCores(pos) += minCoresPerExecutor

          // If we are launching one executor per worker, then every iteration assigns 1 core
          // to the executor. Otherwise, every iteration assigns cores to a new executor.
          if (oneExecutorPerWorker) {
            assignedExecutors(pos) = 1
          } else {
            assignedExecutors(pos) += 1
          }

          // Spreading out an application means spreading out its executors across as
          // many workers as possible. If we are not spreading out, then we should keep
          // scheduling executors on this worker until we use all of its resources.
          // Otherwise, just move on to the next worker.
          if (spreadOutApps) {
            keepScheduling = false
          }
        }
      }
      freeWorkers = freeWorkers.filter(canLaunchExecutor)
    }
    assignedCores
  }
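To see how spreadOutApps changes the result, the loop can be replayed on plain arrays. The following is a simplified sketch, not the Master's implementation: `assignCores` is a hypothetical function that looks at cores only, assumes coresPerExecutor is set, and leaves out memory and the Executor limit.

```scala
object SpreadOutDemo {
  // Cores-only replay of the scheduling loop (memory and executor limits are ignored).
  def assignCores(
      coresFree: Array[Int],
      coresPerExecutor: Int,
      appCoresLeft: Int,
      spreadOut: Boolean): Array[Int] = {
    val assigned = new Array[Int](coresFree.length)
    var coresToAssign = math.min(appCoresLeft, coresFree.sum)

    // A worker can take another executor if the app still needs one
    // and the worker has enough unassigned cores left.
    def canLaunch(pos: Int): Boolean =
      coresToAssign >= coresPerExecutor &&
        coresFree(pos) - assigned(pos) >= coresPerExecutor

    var free = coresFree.indices.filter(canLaunch)
    while (free.nonEmpty) {
      free.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunch(pos)) {
          coresToAssign -= coresPerExecutor
          assigned(pos) += coresPerExecutor
          // Spreading out: one executor per worker per pass, then move on.
          if (spreadOut) keepScheduling = false
        }
      }
      free = free.filter(canLaunch)
    }
    assigned
  }

  def main(args: Array[String]): Unit = {
    val workers = Array(8, 8, 8) // free cores on three workers
    println(assignCores(workers, 2, 12, spreadOut = true).mkString(","))
    println(assignCores(workers, 2, 12, spreadOut = false).mkString(","))
  }
}
```

For three Workers with 8 free cores each, 2 cores per Executor, and 12 cores requested, spreading out yields 4, 4, 4, while packing yields 8, 4, 0.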

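assignedCores is the list of cores assigned to each Worker. Why does the function not return the number of Executors instead?

Recall the cores-per-Executor parameter above. If it is not configured, each Worker starts only one Executor. A list of Executor counts would then always be {1, 1, ...}, which says nothing about how many cores each Worker was given.

Going the other way is easy: when coresPerExecutor is set, the number of Executors to create on a Worker is that Worker's assignedCores / coresPerExecutor.

3. Requesting resources on the Worker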

  private def allocateWorkerResourceToExecutors(
      app: ApplicationInfo,
      assignedCores: Int,
      coresPerExecutor: Option[Int],
      worker: WorkerInfo): Unit = {
    // If the number of cores per executor is specified, we divide the cores assigned
    // to this worker evenly among the executors with no remainder.
    // Otherwise, we launch a single executor that grabs all the assignedCores on this worker.
    val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
    val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
    for (i <- 1 to numExecutors) {
      val exec = app.addExecutor(worker, coresToAssign)
      launchExecutor(worker, exec)
      app.state = ApplicationState.RUNNING
    }
  }
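The two lines that compute numExecutors and coresToAssign can be isolated into a tiny function to see both cases. This is an illustrative sketch; `split` is a hypothetical helper, not part of Spark.

```scala
object ExecutorsPerWorkerDemo {
  // Returns (number of executors, cores per executor) for one worker,
  // using the same arithmetic as the excerpt above.
  def split(assignedCores: Int, coresPerExecutor: Option[Int]): (Int, Int) = {
    val numExecutors = coresPerExecutor.map(assignedCores / _).getOrElse(1)
    val coresEach = coresPerExecutor.getOrElse(assignedCores)
    (numExecutors, coresEach)
  }

  def main(args: Array[String]): Unit = {
    println(split(6, Some(2))) // cores per executor set: several small executors
    println(split(6, None))    // not set: one executor that takes all assigned cores
  }
}
```

With 6 cores assigned to a Worker, setting 2 cores per Executor gives 3 Executors of 2 cores each, and leaving it unset gives a single Executor with all 6 cores.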

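Following the allocation from part 2, the Master creates each Executor (which gives it an ID), sends a LaunchExecutor message to the Worker for each Executor in turn, and also reports an ExecutorAdded message to the Driver.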

  private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
    logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
    worker.addExecutor(exec)
    worker.endpoint.send(LaunchExecutor(masterUrl,
      exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
    exec.application.driver.send(
      ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
  }

How the Driver handles the ExecutorAdded message

      case ExecutorAdded(id: Int, workerId: String, hostPort: String, cores: Int, memory: Int) =>
        val fullId = appId + "/" + id
        logInfo("Executor added: %s on %s (%s) with %d cores".format(fullId, workerId, hostPort,
          cores))
        listener.executorAdded(fullId, workerId, hostPort, cores, memory)

The listener's handler simply writes a log entry:

  override def executorAdded(fullId: String, workerId: String, hostPort: String, cores: Int,
    memory: Int) {
    logInfo("Granted executor ID %s on hostPort %s with %d cores, %s RAM".format(
      fullId, hostPort, cores, Utils.megabytesToString(memory)))
  }

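4. Managing resource requests

Although we keep talking about Executors, cores are what really matter. Whether a request on a Worker succeeds is decided by its free cores and memory: if a Worker's free cores are fewer than one Executor needs, that Worker is skipped.

The Master keeps an array of the applications that are still alive: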

private val waitingApps = new ArrayBuffer[ApplicationInfo]

In the startExecutorsOnWorkers function:

private def startExecutorsOnWorkers(): Unit = {
    // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
    // in the queue, then the second app, etc.
    for (app <- waitingApps if app.coresLeft > 0) {
.....
}
}

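Every application that has not finished stays in the waiting queue and is removed only when it finishes.

An application whose cores have not been fully allocated (for example, it set a total core count but the cluster had resources for only part of it) is offered resources again first, because it sits near the front of the waitingApps queue. Later applications follow FIFO order: they get resources only after the applications ahead of them have been served.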
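The FIFO behaviour can be illustrated with a toy model that tracks cores only. This is a rough sketch under strong simplifications: the `App` class and `schedule` function are hypothetical, the cluster is treated as one pool of free cores, and per-Worker placement, memory, and Executor granularity are ignored.

```scala
import scala.collection.mutable.ArrayBuffer

object FifoDemo {
  // Hypothetical, simplified application record: it only tracks cores.
  final class App(val name: String, val maxCores: Int) {
    var coresGranted: Int = 0
    def coresLeft: Int = maxCores - coresGranted
  }

  // One scheduling pass: walk the waiting apps in FIFO order and hand out free cores.
  // Returns the number of cores that are still free after the pass.
  def schedule(waitingApps: ArrayBuffer[App], freeCores: Int): Int = {
    var free = freeCores
    for (app <- waitingApps if app.coresLeft > 0) {
      val grant = math.min(app.coresLeft, free)
      app.coresGranted += grant
      free -= grant
    }
    free
  }

  def main(args: Array[String]): Unit = {
    val first = new App("first", maxCores = 10)
    val second = new App("second", maxCores = 4)
    val waitingApps = ArrayBuffer(first, second)

    schedule(waitingApps, freeCores = 6) // only 6 cores are free at first
    println(s"${first.coresGranted} ${second.coresGranted}")

    schedule(waitingApps, freeCores = 6) // later, 6 more cores are released
    println(s"${first.coresGranted} ${second.coresGranted}")
  }
}
```

In the first pass the first application takes all 6 cores and the second gets none. When 6 more cores are released, the first application is topped up to its 10 cores before the second receives the remaining 2.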

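Note:

This does not mean that an application with unallocated cores does not start running. The smallest unit of an application is the Executor. As the code above shows, as soon as a Worker can host an Executor, the Master sends that Worker a LaunchExecutor message; it does not wait until the full amount in the following setting has been allocated: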

total-executor-cores=10

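Spark's approach when resources are short is to give the application some of its Executors first, so that the job can start running. When Worker resources are released later, the Master keeps requesting Executors from Workers for that application until either the requested resources are fully allocated or the application finishes. The state and resource usage of every Worker is kept on the Master, which does the global scheduling.

Setting total-executor-cores too high is risky

The Master keeps allocating Worker resources to the application until it reaches that maximum core count, which can leave little for the applications queued behind it.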
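The effect follows from the line `math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)` in scheduleExecutorsOnWorkers. A minimal sketch of that upper bound is shown here; `coresToAssign` is a hypothetical helper and the numbers are made up. The cores actually granted can be lower, since memory and the Executor limit also apply.

```scala
object GreedyCapDemo {
  // Upper bound on the cores one scheduling pass may give to an application.
  def coresToAssign(appCoresLeft: Int, workerCoresFree: Seq[Int]): Int =
    math.min(appCoresLeft, workerCoresFree.sum)

  def main(args: Array[String]): Unit = {
    val free = Seq(8, 8, 8) // 24 free cores in total
    // A modest cap leaves 14 cores for the applications behind it.
    println(free.sum - coresToAssign(10, free))
    // A very large cap lets the application claim every free core.
    println(free.sum - coresToAssign(1000, free))
  }
}
```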

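What happens next:

After a Worker receives the LaunchExecutor message, it starts the Executor as a child process. The Executor then sends a RegisterExecutor message to the application's Driver to report that the allocated Executor has started.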