Spark Launch Executor

[Figure: Executor launch flow diagram]


The ClientEndpoint sends a RegisterApplication request, and the Master replies with a RegisteredApplication message indicating that registration succeeded; at this point application registration is complete. The next step is launching the Executors, and schedule() is the entry point for that.
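For context, the client side of that handshake looks roughly like the following (a condensed sketch of the RegisteredApplication case in StandaloneAppClient.ClientEndpoint's receive; field types and retry logic vary by Spark version and are elided):

override def receive: PartialFunction[Any, Unit] = {
  case RegisteredApplication(appId_, masterRef) =>
    // The Master accepted the application: record the app id and the master's
    // ref, then notify the listener that we are connected
    appId.set(appId_)
    registered.set(true)
    master = Some(masterRef)
    listener.connected(appId.get)
}

schedule() itself looks like this: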

private def schedule(): Unit = {
  if (state != RecoveryState.ALIVE) {
    return
  }
  // Randomly shuffle all alive workers to avoid launching too many drivers on a
  // single worker. Note that a worker registers with the master when it starts;
  // once registered, the master holds an RpcEndpointRef for communicating with it.
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
    var launched = false
    var numWorkersVisited = 0
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        launchDriver(worker, driver)
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  // Launch executors on the workers
  startExecutorsOnWorkers()
}

The last line of schedule() calls startExecutorsOnWorkers(), the concrete implementation that launches the Executors. The call chain is:
startExecutorsOnWorkers() -> allocateWorkerResourceToExecutors() -> launchExecutor(). The focus here is launchExecutor(); the intermediate calls can be skipped for now (a condensed sketch of them follows for reference).
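The two intermediate steps, paraphrased below from the Master source; details such as the spread-out placement policy inside scheduleExecutorsOnWorkers are elided:

private def startExecutorsOnWorkers(): Unit = {
  // Simple FIFO scheduling: try to satisfy each waiting app in turn
  for (app <- waitingApps if app.coresLeft > 0) {
    val coresPerExecutor = app.desc.coresPerExecutor
    // Keep only alive workers with enough free memory and cores for one executor
    val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
      .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
        worker.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree).reverse
    // Decide how many cores each usable worker contributes to this app
    val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)
    for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
      allocateWorkerResourceToExecutors(
        app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
    }
  }
}

private def allocateWorkerResourceToExecutors(
    app: ApplicationInfo,
    assignedCores: Int,
    coresPerExecutor: Option[Int],
    worker: WorkerInfo): Unit = {
  // One executor per coresPerExecutor cores, or a single executor that takes
  // all of the cores assigned on this worker
  val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1)
  val coresToAssign = coresPerExecutor.getOrElse(assignedCores)
  for (i <- 1 to numExecutors) {
    val exec = app.addExecutor(worker, coresToAssign)
    launchExecutor(worker, exec)
    app.state = ApplicationState.RUNNING
  }
}

launchExecutor() then dispatches each executor to its chosen worker: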

private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  worker.addExecutor(exec)
  // Send a LaunchExecutor message to the worker
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  // Notify the driver (the ClientEndpoint) that an executor has been added
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}

Inside launchExecutor, the master first sends the request with worker.endpoint.send(LaunchExecutor). When the worker receives it, the first thing it does is create a working directory for the executor:

val executorDir = new File(workDir, appId + "/" + execId)
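In the actual Worker source this creation is checked; if mkdirs fails, the launch is aborted (condensed):

if (!executorDir.mkdirs()) {
  // Could not create the working directory: give up on this executor
  throw new IOException("Failed to create directory " + executorDir)
}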

It then creates an ExecutorRunner and calls its start() method, sending ExecutorStateChanged messages to both the worker and the Master:

val manager = new ExecutorRunner(
  appId,
  execId,
  appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
  cores_,
  memory_,
  self,
  workerId,
  host,
  webUi.boundPort,
  publicAddress,
  sparkHome,
  executorDir,
  workerUri,
  conf,
  appLocalDirs, ExecutorState.RUNNING)
executors(appId + "/" + execId) = manager
// Start the executor; inside start(), an ExecutorStateChanged message
// is sent to the worker
manager.start()
coresUsed += cores_
memoryUsed += memory_
// Send an ExecutorStateChanged message to the master
sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))

First, look at ExecutorRunner.start(). The code is simple: it constructs a thread that internally calls fetchAndRunExecutor():

workerThread = new Thread("ExecutorRunner for " + fullId) {
  override def run() { fetchAndRunExecutor() }
}
workerThread.start()

The fetchAndRunExecutor method mainly uses ProcessBuilder to assemble the launch command (in standalone mode this is a java command running org.apache.spark.executor.CoarseGrainedExecutorBackend) and start the Executor as a separate process:

val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf),
  memory, sparkHome.getAbsolutePath, substituteVariables)
process = builder.start()
// ... later, once the executor process exits, its final state (with exit code)
// is reported back to the worker:
worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))

When the Executor launches, ExecutorStateChanged messages are sent to both the worker and the master. The worker, upon receiving the message, simply forwards it on to the master; see the handleExecutorStateChanged method in Worker:

sendToMaster(executorStateChanged)
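A condensed view of that method (the bookkeeping for finished executors is abbreviated):

private[worker] def handleExecutorStateChanged(
    executorStateChanged: ExecutorStateChanged): Unit = {
  // Forward the state change straight to the master
  sendToMaster(executorStateChanged)
  val state = executorStateChanged.state
  if (ExecutorState.isFinished(state)) {
    // Move the executor to the finished list and release its cores and memory
    val fullId = executorStateChanged.appId + "/" + executorStateChanged.execId
    executors.get(fullId).foreach { executor =>
      executors -= fullId
      finishedExecutors(fullId) = executor
      coresUsed -= executor.cores
      memoryUsed -= executor.memory
    }
  }
}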

The Executor's messages all eventually reach the Master. After receiving one, the Master sends an ExecutorUpdated message to the Driver, which here is the ClientEndpoint.

When the ClientEndpoint receives the message, it logs the state information and, based on the Executor's state, decides whether the Executor needs to be removed.
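That handler, condensed from StandaloneAppClient.ClientEndpoint, looks roughly like this:

case ExecutorUpdated(id, state, message, exitStatus) =>
  val fullId = appId + "/" + id
  val messageText = message.map(s => " (" + s + ")").getOrElse("")
  // Log the executor's new state
  logInfo("Executor updated: %s is now %s%s".format(fullId, state, messageText))
  // If the executor reached a terminal state, tell the listener to remove it
  if (ExecutorState.isFinished(state)) {
    listener.executorRemoved(fullId, message.getOrElse(""), exitStatus)
  }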

At the same time, the master decides, based on the Executor's state, whether the Executor needs to be removed, and finally calls schedule() again. The finer code details can be traced with the help of the flow diagram above.
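On the Master side, the ExecutorStateChanged handler that drives all of this can be sketched as follows (bookkeeping and logging trimmed):

case ExecutorStateChanged(appId, execId, state, message, exitStatus) =>
  val execOption = idToApp.get(appId).flatMap(app => app.executors.get(execId))
  execOption.foreach { exec =>
    exec.state = state
    // Forward the update to the driver (the ClientEndpoint)
    exec.application.driver.send(ExecutorUpdated(execId, state, message, exitStatus))
    if (ExecutorState.isFinished(state)) {
      // Remove the finished executor from the app and worker, then reschedule
      exec.application.removeExecutor(exec)
      exec.worker.removeExecutor(exec)
      schedule()
    }
  }

This final schedule() call closes the loop, letting the Master reuse the freed resources for any waiting drivers and applications.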
