Spark Source Code Analysis Series (Part 4: Resource Allocation and Executor Startup)

Recap

In the previous article we briefly walked through how an application registers with the Master. In this one we look at one of Spark's core designs: how resources are allocated.

Master Handles the Registration Message

When the Master receives a message, its receive() method pattern-matches on the message type; the RegisterApplication case below is where application registration is handled.

case RegisterApplication(description, driver) =>
      // First check whether this Master node is currently usable (not in STANDBY state)
      if (state == RecoveryState.STANDBY) {
        // ignore, don't send response
      } else {
        logInfo("Registering app " + description.name)
        val app = createApplication(description, driver)
        registerApplication(app)
        logInfo("Registered app " + description.name + " with ID " + app.id)
        // persistenceEngine persists application state so failed jobs can be recovered and re-run
        persistenceEngine.addApplication(app)
        // Send a registration-complete message back to the client (driver)
        driver.send(RegisteredApplication(app.id, self))
        // Start resource scheduling
        schedule()
      }

Let's look at the logic of schedule() in detail.

private def schedule(): Unit = {
    if (state != RecoveryState.ALIVE) {
      return
    }
    // Collect all ALIVE workers and shuffle their order
    val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
    val numWorkersAlive = shuffledAliveWorkers.size
    var curPos = 0
    // waitingDrivers is a list: multiple drivers may be waiting. Allocate resources to them in turn.
    for (driver <- waitingDrivers.toList) { 
    // Iterate over a copy of waitingDrivers, assigning workers to each waiting driver
    // in a round-robin fashion, until all alive workers have been considered.
      var launched = false
      var numWorkersVisited = 0
      while (numWorkersVisited < numWorkersAlive && !launched) {
        val worker = shuffledAliveWorkers(curPos)
        numWorkersVisited += 1
        // If this worker cannot satisfy the driver's memory and core requirements,
        // move on to the next one; once a suitable worker is found, launch the driver on it.
        if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
          // Send a launch message to the chosen worker
          launchDriver(worker, driver)
          waitingDrivers -= driver
          launched = true
        }
        curPos = (curPos + 1) % numWorkersAlive
      }
    }
    startExecutorsOnWorkers()
  }

Resource Allocation

Each application has its own driver; once started, the driver requests resources from the Master and later sends tasks to the executors. Next we look at startExecutorsOnWorkers(), which iterates over all waiting applications and allocates resources to them one by one.

private def startExecutorsOnWorkers(): Unit = {
    // Right now this is a very simple FIFO scheduler
    for (app <- waitingApps if app.coresLeft > 0) {
      val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
      // Filter out workers that do not have enough resources to launch an executor
      val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
        .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
          worker.coresFree >= coresPerExecutor.getOrElse(1))
        .sortBy(_.coresFree).reverse
        // Depending on spreadOutApps, cores are either spread across workers (the default) or packed onto as few workers as possible
      val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)

      // Now that the number of cores to assign is known, perform the actual allocation
      for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
        allocateWorkerResourceToExecutors(
          app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
      }
    }
  }

The scheduleExecutorsOnWorkers method is fairly long, so it is not listed in full here. Its while loop keeps scanning the usable workers: each time a worker can take one more executor, the cores for one executor are subtracted from the cores still waiting to be assigned and added to that worker's slot in the assigned-cores array, which is index-aligned with the usable-workers array. If multiple executors per worker are allowed, every such allocation round adds a new executor on that worker; otherwise the worker keeps a single executor that simply accumulates the extra cores. When spreadOutApps is true, the loop moves on to the next worker after each allocation round; when it is false, it keeps allocating on the same worker until that worker can no longer fit an executor. The helper canLaunchExecutor() answers whether a given worker can still launch one more executor for this app. A simplified sketch of this loop is shown below.
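
To make that description concrete, here is a minimal, self-contained sketch of the core loop. The names WorkerFree and scheduleSketch, and the numbers in main, are made up for illustration only; the real scheduleExecutorsOnWorkers works on WorkerInfo/ApplicationInfo, is driven by canLaunchExecutor(), and also respects the application's executor limit, but the spread-out vs. pack-first behaviour is the same.

object SchedulingSketch {
  // Simplified stand-in for WorkerInfo: only the fields the algorithm needs.
  case class WorkerFree(coresFree: Int, memoryFreeMB: Int)

  // Returns how many cores to assign on each worker, mirroring scheduleExecutorsOnWorkers.
  def scheduleSketch(
      coresWanted: Int,
      memoryPerExecutorMB: Int,
      coresPerExecutor: Option[Int],
      workers: Array[WorkerFree],
      spreadOut: Boolean): Array[Int] = {
    val minCores = coresPerExecutor.getOrElse(1)
    val oneExecutorPerWorker = coresPerExecutor.isEmpty
    val assignedCores = new Array[Int](workers.length)
    val assignedExecutors = new Array[Int](workers.length)
    var coresToAssign = math.min(coresWanted, workers.map(_.coresFree).sum)

    // Can this worker still take one more executor (or more cores) for the app?
    def canLaunch(pos: Int): Boolean = {
      val enoughCores = coresToAssign >= minCores &&
        workers(pos).coresFree - assignedCores(pos) >= minCores
      // Adding cores to an existing single executor needs no extra memory.
      val launchingNewExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0
      if (launchingNewExecutor) {
        enoughCores &&
          workers(pos).memoryFreeMB - assignedExecutors(pos) * memoryPerExecutorMB >= memoryPerExecutorMB
      } else {
        enoughCores
      }
    }

    var freeWorkers: Seq[Int] = workers.indices.filter(canLaunch)
    while (freeWorkers.nonEmpty) {
      freeWorkers.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunch(pos)) {
          coresToAssign -= minCores
          assignedCores(pos) += minCores
          // One executor per worker: keep piling cores onto that single executor;
          // otherwise each round adds a brand-new executor on this worker.
          if (oneExecutorPerWorker) assignedExecutors(pos) = 1
          else assignedExecutors(pos) += 1
          // spreadOut: move to the next worker after one round; otherwise fill this worker first.
          if (spreadOut) keepScheduling = false
        }
      }
      freeWorkers = freeWorkers.filter(canLaunch)
    }
    assignedCores
  }

  def main(args: Array[String]): Unit = {
    // Three workers, each with 8 free cores and 8 GB free memory (made-up numbers).
    val workers = Array(WorkerFree(8, 8192), WorkerFree(8, 8192), WorkerFree(8, 8192))
    // The app wants 12 cores, with 2 cores and 2 GB per executor.
    println(scheduleSketch(12, 2048, Some(2), workers, spreadOut = true).mkString(", "))
    // spreadOut = true  -> 4, 4, 4 : cores spread evenly across the workers
    println(scheduleSketch(12, 2048, Some(2), workers, spreadOut = false).mkString(", "))
    // spreadOut = false -> 8, 4, 0 : the first worker is filled up before moving on
  }
}

Spreading executors across the cluster is the default because it tends to give better data locality, while packing them onto fewer workers can be preferable for CPU-intensive jobs with little data.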

// spreadOutApps can be changed via the spark.deploy.spreadOut property in the Master's configuration
  private val spreadOutApps = conf.getBoolean("spark.deploy.spreadOut", true)

Next, allocateWorkerResourceToExecutors first creates the executor descriptor and adds it to the application's executor collection, then sends launch messages to the corresponding worker and driver.

private def allocateWorkerResourceToExecutors(
      app: ApplicationInfo,
      assignedCores: Int,
      coresPerExecutor: Option[Int],
      worker: WorkerInfo): Unit = {
    val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
    val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
    for (i <- 1 to numExecutors) {
      // Create a new ExecutorDesc and add it to the application's executors map
      val exec = app.addExecutor(worker, coresToAssign)
      // Send the launch messages
      launchExecutor(worker, exec)
      app.state = ApplicationState.RUNNING
    }
  }
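
As a quick sanity check on the numExecutors / coresToAssign arithmetic above, here is a tiny standalone snippet. The value 8 is made up for illustration; coresPerExecutor comes from the application's spark.executor.cores setting.

object ExecutorCountSketch {
  def main(args: Array[String]): Unit = {
    val assignedCores = 8  // cores this worker was granted for the app (made-up value)

    // spark.executor.cores = 2  =>  8 / 2 = 4 executors with 2 cores each
    val withLimit: Option[Int] = Some(2)
    assert(withLimit.map(assignedCores / _).getOrElse(1) == 4)
    assert(withLimit.getOrElse(assignedCores) == 2)

    // spark.executor.cores unset  =>  a single executor that takes all 8 assigned cores
    val noLimit: Option[Int] = None
    assert(noLimit.map(assignedCores / _).getOrElse(1) == 1)
    assert(noLimit.getOrElse(assignedCores) == 8)
  }
}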

launchExecutor() sends an executor-launch message to the worker and an ExecutorAdded message to the driver. At this point the Master-side executor resource-allocation step is complete.

private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
    logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
    worker.addExecutor(exec)
    worker.endpoint.send(LaunchExecutor(masterUrl,
      exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
    exec.application.driver.send(
      ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
  }

Worker Launches the Executor

Now let's briefly look at what the worker does when it receives this message. First, find where the Worker class handles the LaunchExecutor message.

case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>

The handling logic is fairly long, so only the key parts are shown here.

		...
		// First create an ExecutorRunner; its constructor parameters carry the key configuration and are largely self-explanatory from their names
		val manager = new ExecutorRunner(
            appId,
            execId,
            appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
            cores_,
            memory_,
            self,
            workerId,
            host,
            webUi.boundPort,
            publicAddress,
            sparkHome,
            executorDir,
            workerUri,
            conf,
            appLocalDirs, ExecutorState.RUNNING)
          // Register the ExecutorRunner in the executors map, keyed by "appId/execId"
          executors(appId + "/" + execId) = manager
          // Start this executor
          manager.start()
          coresUsed += cores_
          memoryUsed += memory_
          // Send a state-change message to the Master
          sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))

Next let's look at start(), which contains the details of launching the executor.

private[worker] def start() {
	// First create a dedicated thread
    workerThread = new Thread("ExecutorRunner for " + fullId) {
      // fetchAndRunExecutor contains the actual launch logic
      override def run() { fetchAndRunExecutor() }
    }
    workerThread.start()
    // Shutdown hook that kills actors on shutdown.
    shutdownHook = ShutdownHookManager.addShutdownHook { () =>
      // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
      // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
      if (state == ExecutorState.RUNNING) {
        state = ExecutorState.FAILED
      }
      killProcess(Some("Worker shutting down")) }
  }

Why start a new thread for this? To avoid blocking: a single worker can run multiple executors concurrently. The real launch logic lives in fetchAndRunExecutor(), where builder.start() spawns a new process.

try {
      // Launch the process
      val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf),
        memory, sparkHome.getAbsolutePath, substituteVariables)
      val command = builder.command()
      val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
      logInfo(s"Launch command: $formattedCommand")

      builder.directory(executorDir)
      builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
      // In case we are running this from within the Spark Shell, avoid creating a "scala"
      // parent process for the executor command
      builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")

      // Add webUI log urls
      val baseUrl =
        if (conf.getBoolean("spark.ui.reverseProxy", false)) {
          s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
        } else {
          s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
        }
      builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
      builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")

      // Use the ProcessBuilder to spawn the executor as a new OS process
      process = builder.start()
      val header = "Spark Executor Command: %s\n%s\n\n".format(
        formattedCommand, "=" * 40)

      // Redirect its stdout and stderr to files
      val stdout = new File(executorDir, "stdout")
      stdoutAppender = FileAppender(process.getInputStream, stdout, conf)

      val stderr = new File(executorDir, "stderr")
      Files.write(header, stderr, StandardCharsets.UTF_8)
      stderrAppender = FileAppender(process.getErrorStream, stderr, conf)

      // Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown)
      // or with nonzero exit code
      val exitCode = process.waitFor()
      state = ExecutorState.EXITED
      val message = "Command exited with code " + exitCode
      worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
    }

That wraps up Spark's resource allocation and the worker-side executor launch; corrections to any inaccuracies are welcome. In the next article we will look at the RDD execution flow.
