Starting Spark's CoarseGrainedExecutorBackend

1 Introduction

CoarseGrainedExecutorBackend is the backend process that hosts an Executor: it creates the Executor and keeps it running. While a Spark application is running, CoarseGrainedExecutorBackend exists as a separate JVM process; run the jps command on a worker node during the run and you will see it listed.

2 Code walkthrough

Let's now walk through how CoarseGrainedExecutorBackend starts up. The source version used here is spark-2.4.0.
In the previous post on Executor startup, we saw that the Worker's ExecutorRunner calls fetchAndRunExecutor, which builds a ProcessBuilder command and launches CoarseGrainedExecutorBackend as a new process.
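
For reference, fetchAndRunExecutor spawns the new JVM through the standard java.lang.ProcessBuilder API. Below is a minimal sketch of that idea; the classpath, driver URL, and argument values are illustrative placeholders, not the exact command the Worker builds:

import java.io.File
import scala.collection.JavaConverters._

object LaunchBackendSketch {
  def main(args: Array[String]): Unit = {
    // Illustrative command; the real Worker also adds JVM options and the
    // full classpath before the --driver-url/--executor-id/... arguments.
    val command = Seq(
      "java", "-cp", "/path/to/spark/jars/*",
      "org.apache.spark.executor.CoarseGrainedExecutorBackend",
      "--driver-url", "spark://CoarseGrainedScheduler@driver-host:7077",
      "--executor-id", "1",
      "--hostname", "worker-host",
      "--cores", "4",
      "--app-id", "app-20190101000000-0000")

    val builder = new ProcessBuilder(command.asJava)
    builder.directory(new File("/tmp"))   // executor working directory
    builder.redirectErrorStream(true)     // merge stderr into stdout
    val process = builder.start()         // spawn the backend JVM
    println(s"launched, alive = ${process.isAlive}")
  }
}
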
First, let's go into the CoarseGrainedExecutorBackend class:

private[spark] class CoarseGrainedExecutorBackend(
    override val rpcEnv: RpcEnv,
    driverUrl: String,
    executorId: String,
    hostname: String,
    cores: Int,
    userClassPath: Seq[URL],
    env: SparkEnv)
  extends ThreadSafeRpcEndpoint with ExecutorBackend with Logging {

  private[this] val stopping = new AtomicBoolean(false)
  var executor: Executor = null
  @volatile var driver: Option[RpcEndpointRef] = None

  // If this CoarseGrainedExecutorBackend is changed to support multiple threads, then this may need
  // to be changed so that we don't share the serializer instance across threads
  private[this] val ser: SerializerInstance = env.closureSerializer.newInstance()

From this we can see:
1. CoarseGrainedExecutorBackend extends ThreadSafeRpcEndpoint and mixes in ExecutorBackend, so it is an RpcEndpoint that can talk to other endpoints, and it follows the usual endpoint lifecycle onStart -> receive* -> stop (a toy sketch of this pattern follows below).
2. It maintains two fields, executor and driver: executor runs the tasks, and driver is the RpcEndpointRef used to communicate with the Driver.
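
To make the lifecycle concrete, here is a self-contained toy version of the pattern. It is plain Scala, not Spark's actual RpcEndpoint API; it only illustrates that onStart runs once at registration, receive handles messages one at a time, and onStop runs at shutdown:

// A toy endpoint mimicking the onStart -> receive* -> onStop lifecycle.
// This is NOT Spark's RpcEndpoint API, just an illustration of the pattern.
trait ToyEndpoint {
  def onStart(): Unit = {}
  def receive: PartialFunction[Any, Unit]
  def onStop(): Unit = {}
}

class ToyBackend extends ToyEndpoint {
  override def onStart(): Unit = println("connecting to driver ...")

  override def receive: PartialFunction[Any, Unit] = {
    case "RegisteredExecutor" => println("registered, creating Executor")
    case other                => println(s"unexpected message: $other")
  }

  override def onStop(): Unit = println("shutting down")
}

object ToyRpcEnv {
  def main(args: Array[String]): Unit = {
    val endpoint = new ToyBackend
    endpoint.onStart()                                                // called once on registration
    Seq("RegisteredExecutor", "LaunchTask").foreach(endpoint.receive) // receive*
    endpoint.onStop()                                                 // called once on shutdown
  }
}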

Next, the main method:

def main(args: Array[String]) {
  // Declare the variables parsed from the command line
  var driverUrl: String = null
  var executorId: String = null
  var hostname: String = null
  var cores: Int = 0
  var appId: String = null
  var workerUrl: Option[String] = None
  val userClassPath = new mutable.ListBuffer[URL]()
  // Turn args into a List so it can be pattern-matched
  var argv = args.toList
  while (!argv.isEmpty) {
    // Match on the head of the remaining argument list
    argv match {
      case ("--driver-url") :: value :: tail =>
        driverUrl = value
        argv = tail
      case ("--executor-id") :: value :: tail =>
        executorId = value
        argv = tail
      case ("--hostname") :: value :: tail =>
        hostname = value
        argv = tail
      case ("--cores") :: value :: tail =>
        cores = value.toInt
        argv = tail
      case ("--app-id") :: value :: tail =>
        appId = value
        argv = tail
      case ("--worker-url") :: value :: tail =>
        // Worker url is used in spark standalone mode to enforce fate-sharing with worker
        workerUrl = Some(value)
        argv = tail
      case ("--user-class-path") :: value :: tail =>
        userClassPath += new URL(value)
        argv = tail
      case Nil =>
      case tail =>
        // scalastyle:off println
        System.err.println(s"Unrecognized options: ${tail.mkString(" ")}")
        // scalastyle:on println
        printUsageAndExit()
    }
  }

  if (driverUrl == null || executorId == null || hostname == null || cores <= 0 ||
    appId == null) {
    printUsageAndExit()
  }
  // Call run with the parsed arguments
  run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
  System.exit(0)
}
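
The argument loop above uses the common Scala idiom of peeling "--flag value" pairs off the head of a List with pattern matching. Here is a tiny runnable version of the same idiom, with made-up flags and values:

object ArgParseSketch {
  def main(args: Array[String]): Unit = {
    var driverUrl: String = null
    var cores: Int = 0

    // Same idiom as above: consume "--flag value" pairs from the head of the list.
    var argv = List("--driver-url", "spark://driver:7077", "--cores", "4")
    while (argv.nonEmpty) {
      argv match {
        case "--driver-url" :: value :: tail => driverUrl = value; argv = tail
        case "--cores" :: value :: tail      => cores = value.toInt; argv = tail
        case Nil                             =>
        case tail =>
          System.err.println(s"Unrecognized options: ${tail.mkString(" ")}")
          argv = Nil
      }
    }
    println(s"driverUrl=$driverUrl cores=$cores")
  }
}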

Now let's look at the run method:

private def run(
    driverUrl: String,
    executorId: String,
    hostname: String,
    cores: Int,
    appId: String,
    workerUrl: Option[String],
    userClassPath: Seq[URL]) {

  Utils.initDaemon(log)

  SparkHadoopUtil.get.runAsSparkUser { () =>
    // Debug code
    Utils.checkHost(hostname)

    // Bootstrap to fetch the driver's Spark properties.
    val executorConf = new SparkConf
    // Create a temporary RpcEnv used only to fetch the driver's properties
    val fetcher = RpcEnv.create(
      "driverPropsFetcher",
      hostname,
      -1,
      executorConf,
      new SecurityManager(executorConf),
      clientMode = true)
    // Obtain a reference to the driver endpoint
    val driver = fetcher.setupEndpointRefByURI(driverUrl)
    // Ask the driver for the application's Spark properties (RetrieveSparkAppConfig)
    val cfg = driver.askSync[SparkAppConfig](RetrieveSparkAppConfig)
    val props = cfg.sparkProperties ++ Seq[(String, String)](("spark.app.id", appId))
    // Shut down the temporary fetcher RpcEnv
    fetcher.shutdown()

    // Create SparkEnv using properties we fetched from the driver.
    val driverConf = new SparkConf()
    for ((key, value) <- props) {
      // this is required for SSL in standalone mode
      if (SparkConf.isExecutorStartupConf(key)) {
        driverConf.setIfMissing(key, value)
      } else {
        // Copy the fetched property into driverConf
        driverConf.set(key, value)
      }
    }

    cfg.hadoopDelegationCreds.foreach { tokens =>
      SparkHadoopUtil.get.addDelegationTokens(tokens, driverConf)
    }
    // Create the executor-side SparkEnv
    val env = SparkEnv.createExecutorEnv(
      driverConf, executorId, hostname, cores, cfg.ioEncryptionKey, isLocal = false)
    // Create the CoarseGrainedExecutorBackend instance and register it with its own rpcEnv
    env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
      env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
    // Create a WorkerWatcher that shuts this CoarseGrainedExecutorBackend down if the worker fails
    workerUrl.foreach { url =>
      env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
    }
    // Block until the RpcEnv terminates
    env.rpcEnv.awaitTermination()
  }
}

The run method does the following:
1. Create an RpcEnv.
2. Fetch the Spark configuration from the driver and merge it into driverConf (see the SparkConf sketch after this list).
3. Create the executor-side SparkEnv.
4. Create the CoarseGrainedExecutorBackend instance and register it with the executor's own rpcEnv.
5. Optionally register a WorkerWatcher, so the backend shuts down if the Worker fails.
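
Step 2 relies on two SparkConf methods with different overwrite semantics: setIfMissing keeps a value that is already present (used for the executor-startup keys such as the SSL settings mentioned in the source comment), while set always takes the driver's value. A small sketch using the public SparkConf API, with illustrative keys:

import org.apache.spark.SparkConf

object ConfMergeSketch {
  def main(args: Array[String]): Unit = {
    // "false" skips loading defaults from system properties.
    val driverConf = new SparkConf(false)
      .set("spark.ssl.enabled", "true")                     // pretend this was set at executor startup

    // Properties fetched from the driver:
    driverConf.setIfMissing("spark.ssl.enabled", "false")   // kept: existing "true" wins
    driverConf.set("spark.executor.memory", "2g")           // always added/overwritten

    println(driverConf.get("spark.ssl.enabled"))             // true
    println(driverConf.get("spark.executor.memory"))         // 2g
  }
}
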
When the CoarseGrainedExecutorBackend is registered with the rpcEnv, its onStart method is invoked. Let's look at it:

override def onStart() {
  logInfo("Connecting to driver: " + driverUrl)
  rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    driver = Some(ref)
    // Send a RegisterExecutor message to the driver
    ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
  }(ThreadUtils.sameThread).onComplete {
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    case Success(msg) =>
      // Always receive `true`. Just ignore it
    case Failure(e) =>
      exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
  }(ThreadUtils.sameThread)
}
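
Registration is fully asynchronous: asyncSetupEndpointRefByURI and ask both return Futures, which are chained with flatMap and finished with onComplete on the calling thread (ThreadUtils.sameThread). Below is a self-contained sketch of the same pattern using plain scala.concurrent Futures; the endpoint lookup and the ask are simulated stubs:

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object AsyncRegisterSketch {
  // Stand-in for ThreadUtils.sameThread: run callbacks on the completing thread.
  implicit val sameThread: ExecutionContext = new ExecutionContext {
    def execute(runnable: Runnable): Unit = runnable.run()
    def reportFailure(cause: Throwable): Unit = cause.printStackTrace()
  }

  def main(args: Array[String]): Unit = {
    // Simulated endpoint lookup and RegisterExecutor ask.
    def lookupDriver(): Future[String] = Future.successful("driverRef")
    def ask(ref: String): Future[Boolean] = Future.successful(true)

    val registration = lookupDriver().flatMap { ref =>
      ask(ref)                          // send RegisterExecutor once we have the driver ref
    }

    registration.onComplete {
      case Success(_) => println("registered")           // the driver always replies true
      case Failure(e) => println(s"cannot register: $e") // the real code calls exitExecutor here
    }

    Await.ready(registration, 5.seconds)
  }
}
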
The RegisterExecutor message is handled on the driver side by CoarseGrainedSchedulerBackend, in its receiveAndReply handler:
case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
  // Check whether an executor with this ID is already registered
  if (executorDataMap.contains(executorId)) {
    // If so, tell the executor that registration failed
    executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
    context.reply(true)
  // Check whether the node is blacklisted
  } else if (scheduler.nodeBlacklist.contains(hostname)) {
    // If the cluster manager gives us an executor on a blacklisted node (because it
    // already started allocating those resources before we informed it of our blacklist,
    // or if it ignored our blacklist), then we reject that executor immediately.
    // If blacklisted, reject the registration
    logInfo(s"Rejecting $executorId as it has been blacklisted.")
    executorRef.send(RegisterExecutorFailed(s"Executor is blacklisted: $executorId"))
    context.reply(true)
  } else {
    // If the executor's rpc env is not listening for incoming connections, `hostPort`
    // will be null, and the client connection should be used to contact the executor.
    val executorAddress = if (executorRef.address != null) {
        executorRef.address
      } else {
        context.senderAddress
      }
    logInfo(s"Registered executor $executorRef ($executorAddress) with ID $executorId")
    // Record the executor id under its rpc address
    addressToExecutorId(executorAddress) = executorId
    // Update the total core count
    totalCoreCount.addAndGet(cores)
    // Update the total number of registered executors
    totalRegisteredExecutors.addAndGet(1)
    // Build the ExecutorData for this executor
    val data = new ExecutorData(executorRef, executorAddress, hostname,
      cores, cores, logUrls)
    // This must be synchronized because variables mutated
    // in this block are read when requesting executors
    CoarseGrainedSchedulerBackend.this.synchronized {
      executorDataMap.put(executorId, data)
      if (currentExecutorIdCounter < executorId.toInt) {
        currentExecutorIdCounter = executorId.toInt
      }
      if (numPendingExecutors > 0) {
        numPendingExecutors -= 1
        logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
      }
    }
    // Tell the CoarseGrainedExecutorBackend that registration succeeded
    executorRef.send(RegisteredExecutor)
    // Note: some tests expect the reply to come after we put the executor in the map
    context.reply(true)
    // Post a SparkListenerExecutorAdded event to the listenerBus
    listenerBus.post(
      SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
    // Call makeOffers to offer the new resources to pending tasks
    makeOffers()
  }

This handler does the following:
1. Check whether the executor is already registered (duplicate registration).
2. Add the executor id and its rpc address to the addressToExecutorId map.
3. Update totalCoreCount.
4. Update the total number of registered executors.
5. Build an ExecutorData object.
6. Put the ExecutorData into the executorDataMap cache, keyed by executor id.
7. Update currentExecutorIdCounter and numPendingExecutors (see the bookkeeping sketch after this list).
8. Send RegisteredExecutor back to the CoarseGrainedExecutorBackend.
9. Post a SparkListenerExecutorAdded event to the listenerBus.
10. Call makeOffers to allocate resources to tasks.
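
Steps 3, 4, 6 and 7 mix two kinds of synchronization: the counters are AtomicIntegers, while executorDataMap and currentExecutorIdCounter are mutated inside one synchronized block because they are read together when requesting executors. A simplified, self-contained sketch of that bookkeeping follows; the field names mirror the real ones, but ExecutorInfo is a reduced stand-in for Spark's ExecutorData:

import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

// Reduced stand-in for Spark's ExecutorData: just the fields used above.
case class ExecutorInfo(address: String, host: String, totalCores: Int)

object SchedulerBookkeepingSketch {
  private val totalCoreCount = new AtomicInteger(0)
  private val totalRegisteredExecutors = new AtomicInteger(0)
  private val addressToExecutorId = mutable.HashMap[String, String]()
  private val executorDataMap = mutable.HashMap[String, ExecutorInfo]()
  private var currentExecutorIdCounter = 0

  def registerExecutor(executorId: String, address: String, host: String, cores: Int): Unit = {
    addressToExecutorId(address) = executorId          // step 2
    totalCoreCount.addAndGet(cores)                    // step 3
    totalRegisteredExecutors.addAndGet(1)              // step 4
    val data = ExecutorInfo(address, host, cores)      // step 5

    // Steps 6 and 7: mutated together, so guarded by one lock,
    // mirroring the synchronized block in CoarseGrainedSchedulerBackend.
    this.synchronized {
      executorDataMap.put(executorId, data)
      if (currentExecutorIdCounter < executorId.toInt) {
        currentExecutorIdCounter = executorId.toInt
      }
    }
  }

  def main(args: Array[String]): Unit = {
    registerExecutor("1", "spark-client://executor-1", "worker-host", 4)
    println(s"cores=${totalCoreCount.get()} executors=${totalRegisteredExecutors.get()}")
  }
}
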
Back on the CoarseGrainedExecutorBackend side:

case RegisteredExecutor =>
  logInfo("Successfully registered with driver")
  try {
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
  } catch {
    case NonFatal(e) =>
      exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
  }
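
Note the catch clause: NonFatal is the standard-library extractor that matches ordinary exceptions but deliberately lets fatal errors (VirtualMachineError, InterruptedException, and so on) propagate. A quick illustration:

import scala.util.control.NonFatal

object NonFatalSketch {
  def main(args: Array[String]): Unit = {
    def risky(): Unit = throw new IllegalStateException("boom")

    try {
      risky()
    } catch {
      // Matches ordinary exceptions, but lets VirtualMachineError,
      // InterruptedException, etc. propagate.
      case NonFatal(e) => println(s"Unable to create executor due to ${e.getMessage}")
    }
  }
}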

After receiving the RegisteredExecutor message, the backend creates the Executor. That completes the startup flow of CoarseGrainedExecutorBackend.
