Spark Source Code Analysis: The CoarseGrainedExecutorBackend Run Flow (Executor)

HaiwiSong

Category: Big Data: Spark. Tags: spark, ExecutorBackend, CoarseGrained, executor, DriverEndpoint

Article link: https://blog.csdn.net/oTengYue/article/details/105596352

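This post continues from Spark Source Code Analysis: The AM-Side Run Flow (Driver), which covered how the Driver runs on the AM side. At the end of that post the AM submitted requests to Yarn for Executor containers; the request context parameters are shown in the figure below:
[Figure: context parameters of the Executor container request]
Yarn allocates and launches Executor containers the same way it allocates and launches the Driver container (that flow is analyzed in Spark Source Code Analysis: The Job Submission Flow (Client)). Next, look at the launch_container.sh that starts the Executor: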

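As the two figures above show, the entry class after the container starts is org.apache.spark.executor.CoarseGrainedExecutorBackend. From --driver-url spark://CoarseGrainedScheduler@node0:43195 we can tell that the Driver registered its service under the name CoarseGrainedScheduler. The class behind that service is DriverEndpoint, an inner class of org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.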
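To make the structure of the driver URL concrete, here is a minimal sketch that splits it into endpoint name, host, and port with java.net.URI. EndpointAddress and DriverUrlDemo are hypothetical names used only for illustration; they are not Spark classes, and Spark's own parsing may differ in its details.

```scala
import java.net.URI

// Hypothetical holder for illustration only; not part of Spark's API.
final case class EndpointAddress(name: String, host: String, port: Int)

object DriverUrlDemo {
  // Split "spark://<endpointName>@<host>:<port>" into its three parts.
  def parse(sparkUrl: String): EndpointAddress = {
    val uri = new URI(sparkUrl)
    require(uri.getScheme == "spark", s"unexpected scheme in $sparkUrl")
    require(uri.getUserInfo != null && uri.getHost != null && uri.getPort >= 0,
      s"invalid driver URL: $sparkUrl")
    // The part before '@' is exposed by java.net.URI as the "user info".
    EndpointAddress(uri.getUserInfo, uri.getHost, uri.getPort)
  }

  def main(args: Array[String]): Unit = {
    // prints EndpointAddress(CoarseGrainedScheduler,node0,43195)
    println(parse("spark://CoarseGrainedScheduler@node0:43195"))
  }
}
```

The name before the @ sign identifies the service inside the Driver's RPC environment, while host and port identify the RPC environment itself.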
We now start the analysis from the main function of the CoarseGrainedExecutorBackend companion object (the source is again based on Spark 2.4):

  def main(args: Array[String]) {
    var driverUrl: String = null
    var executorId: String = null
    var hostname: String = null
    var cores: Int = 0
    var appId: String = null
    var workerUrl: Option[String] = None
    val userClassPath = new mutable.ListBuffer[URL]()

    var argv = args.toList
    while (!argv.isEmpty) {
      argv match {
        case ("--driver-url") :: value :: tail =>
          driverUrl = value
          argv = tail
        case ("--executor-id") :: value :: tail =>
          executorId = value
          argv = tail
        case ("--hostname") :: value :: tail =>
          hostname = value
          argv = tail
        case ("--cores") :: value :: tail =>
          cores = value.toInt
          argv = tail
        case ("--app-id") :: value :: tail =>
          appId = value
          argv = tail
        case ("--worker-url") :: value :: tail =>
          // Worker url is used in spark standalone mode to enforce fate-sharing with worker
          workerUrl = Some(value)
          argv = tail
        case ("--user-class-path") :: value :: tail =>
          userClassPath += new URL(value)
          argv = tail
        case Nil =>
        case tail =>
          // scalastyle:off println
          System.err.println(s"Unrecognized options: ${tail.mkString(" ")}")
          // scalastyle:on println
          printUsageAndExit()
      }
    }

    if (hostname == null) {
      hostname = Utils.localHostName()
      log.info(s"Executor hostname is not provided, will use '$hostname' to advertise itself")
    }

    if (driverUrl == null || executorId == null || cores <= 0 || appId == null) {
      printUsageAndExit()
    }

    run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
    System.exit(0)
  }
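The option loop above consumes a flag and its value from the head of the list on each iteration. The same technique can be sketched as a small tail-recursive function that returns a result instead of mutating local variables. ExecutorArgs and ArgParseDemo are hypothetical names, and only three of the flags are modeled, so this is an illustration of the pattern rather than a replacement for the Spark code.

```scala
import scala.annotation.tailrec

// Hypothetical, trimmed-down options holder; the real main uses local vars.
final case class ExecutorArgs(
    driverUrl: Option[String] = None,
    executorId: Option[String] = None,
    cores: Int = 0)

object ArgParseDemo {
  // Peel "--flag value" pairs off the head of the list until it is empty.
  @tailrec
  def parse(
      argv: List[String],
      acc: ExecutorArgs = ExecutorArgs()): Either[String, ExecutorArgs] =
    argv match {
      case "--driver-url" :: value :: tail =>
        parse(tail, acc.copy(driverUrl = Some(value)))
      case "--executor-id" :: value :: tail =>
        parse(tail, acc.copy(executorId = Some(value)))
      case "--cores" :: value :: tail =>
        parse(tail, acc.copy(cores = value.toInt))
      case Nil =>
        Right(acc)
      case rest =>
        // Anything that matches no known flag ends parsing with an error.
        Left(s"Unrecognized options: ${rest.mkString(" ")}")
    }

  def main(args: Array[String]): Unit = {
    println(parse(List(
      "--driver-url", "spark://CoarseGrainedScheduler@node0:43195",
      "--executor-id", "1",
      "--cores", "2")))
    println(parse(List("--bogus")))
  }
}
```

In the Spark code the fall-through case calls printUsageAndExit(), which ends the process, so the while loop never spins on an argument it cannot consume.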

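main reads the command-line arguments, parses and validates them, and finally calls the run method of the companion object (see the code comments for details), shown below: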


  private def run(
      driverUrl: String,
      executorId: String,
      hostname: String,
      cores: Int,
      appId: String,
      workerUrl: Option[String],
      userClassPath: Seq[URL]) {

    Utils.initDaemon(log)

    SparkHadoopUtil.get.runAsSparkUser { () =>
      // Debug code
      Utils.checkHost(hostname)

      // Bootstrap to fetch the driver's Spark properties.
      // First create an RpcEnv named fetcher, used only to pull the Spark configuration from the driver; it is shut down once that is done
      val executorConf = new SparkConf
      val fetcher = RpcEnv.create(
        "driverPropsFetcher",
        hostname,
        -1,
        executorConf,
        new SecurityManager(executorConf),
        clientMode = true)
      // Obtain a reference to the Driver endpoint from the --driver-url argument (driverUrl is e.g. spark://CoarseGrainedScheduler@node0:43195)
      val driver = fetcher.setupEndpointRefByURI(driverUrl)
      // Send RetrieveSparkAppConfig to the Driver to pull the Spark configuration
      val cfg = driver.askSync[SparkAppConfig](RetrieveSparkAppConfig)
      val props = cfg.sparkProperties ++ Seq[(String, String)](("spark.app.id", appId))
      fetcher.shutdown()

      // Create SparkEnv using properties we fetched from the driver.
      // Build a SparkConf from the configuration pulled from the driver
      val driverConf = new SparkConf()
      for ((key, value) <- props) {
        // this is required for SSL in standalone mode
        if (SparkConf.isExecutorStartupConf(key)) {
          driverConf.setIfMissing(key, value)
        } else {
          driverConf.set(key, value)
        }
      }

      cfg.hadoopDelegationCreds.foreach { tokens =>
        SparkHadoopUtil.get.addDelegationTokens(tokens, driverConf)
      }

      // Create the Executor's SparkEnv
      val env = SparkEnv.createExecutorEnv(
        driverConf, executorId, hostname, cores, cfg.ioEncryptionKey, isLocal = false)

      // Create the CoarseGrainedExecutorBackend instance and register it in the rpcEnv of the Executor's own SparkEnv
      env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
        env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
      // Create a WorkerWatcher, which shuts down CoarseGrainedExecutorBackend when the worker fails (only effective in standalone mode)
      workerUrl.foreach { url =>
        env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
      }
      // Block until the rpcEnv terminates
      env.rpcEnv.awaitTermination()
    }
  }

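The key steps here are val env = SparkEnv.createExecutorEnv(...) and env.rpcEnv.setupEndpoint(...), both of which involve the design of Spark's Rpc framework (analyzed in detail in another post, Spark Source Code Analysis: The Rpc Framework). The former, SparkEnv.createExecutorEnv(), initializes and registers several services, for example MapOutputTracker, BlockManagerMaster, and OutputCommitCoordinator. The latter, env.rpcEnv.setupEndpoint(), separately registers the Executor service. (I have not yet worked out why the two are registered separately, but it does not affect the code analysis.) Once registration is done, the executor can exchange messages with the driver normally.
So how does that exchange begin?
As Spark Source Code Analysis: The Rpc Framework explains, registering through setupEndpoint registers an EndpointData with the Dispatcher. Each EndpointData maintains an Inbox for incoming messages, and the Inbox puts an OnStart message in as its first message when it is instantiated. When the message loop later consumes the Inbox, the first message it processes therefore invokes the service's onStart() method. In other words, registering any service first triggers that service's own onStart method. The following shows this for the registration of the CoarseGrainedExecutorBackend service and the resulting call to its onStart() method: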
[Figure: registering CoarseGrainedExecutorBackend triggers its onStart() method]
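The OnStart-first behavior described above can be modeled in a few lines. This is a deliberately simplified, single-threaded sketch: the Inbox, Endpoint, and InboxDemo types here are hypothetical stand-ins and not Spark's real classes, which also handle concurrency, errors, and shutdown.

```scala
import scala.collection.mutable

// Simplified, hypothetical model of the mechanism; not Spark's real classes.
sealed trait InboxMessage
case object OnStart extends InboxMessage
final case class RpcMessage(content: Any) extends InboxMessage

trait Endpoint {
  def onStart(): Unit
  def receive(msg: Any): Unit
}

final class Inbox(endpoint: Endpoint) {
  private val messages = mutable.Queue[InboxMessage]()
  // OnStart is enqueued at construction time, so it is always processed first.
  messages.enqueue(OnStart)

  def post(msg: InboxMessage): Unit = messages.enqueue(msg)

  // Drain the queue in FIFO order, dispatching each message to the endpoint.
  def process(): Unit =
    while (messages.nonEmpty) {
      messages.dequeue() match {
        case OnStart => endpoint.onStart()
        case RpcMessage(content) => endpoint.receive(content)
      }
    }
}

object InboxDemo {
  // Records the order in which the endpoint callbacks fire.
  def run(): List[String] = {
    val log = mutable.ListBuffer[String]()
    val inbox = new Inbox(new Endpoint {
      def onStart(): Unit = log += "onStart"
      def receive(msg: Any): Unit = log += s"receive($msg)"
    })
    inbox.post(RpcMessage("RegisteredExecutor"))
    inbox.process()
    log.toList
  }

  // prints List(onStart, receive(RegisteredExecutor))
  def main(args: Array[String]): Unit = println(run())
}
```

Even though the RegisteredExecutor message is posted before any processing happens, onStart still runs first because OnStart was already at the head of the queue.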

In its onStart() method, CoarseGrainedExecutorBackend sends a RegisterExecutor message to the Driver (that is, to DriverEndpoint, the inner class of CoarseGrainedSchedulerBackend), as shown in the figure below:

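When DriverEndpoint receives the RegisterExecutor message, it checks whether the executor is already registered. If it is a duplicate registration, DriverEndpoint replies right away without registering it again. Otherwise it creates an ExecutorData, keeps it in memory, and sends a RegisteredExecutor message to CoarseGrainedExecutorBackend. On receiving that message, CoarseGrainedExecutorBackend creates the Executor instance, which starts the Executor's own work such as sending periodic heartbeats. From this analysis we can see that CoarseGrainedExecutorBackend is a JVM process that acts as the guardian process of the Executor and is responsible for creating and maintaining it. CoarseGrainedExecutorBackend and Executor map one to one, and a single Worker can launch multiple CoarseGrainedExecutorBackend processes.
This completes the Executor's registration with the Driver. From this point on, the Executor can receive the various messages the Driver sends.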
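The driver-side decision can be sketched as a small in-memory model. DriverModel and RegistrationDemo are hypothetical names, and ExecutorData is reduced to two fields. RegisterExecutorFailed is my recollection of the failure message used in Spark 2.4, so treat that name as an assumption; the real DriverEndpoint also handles blacklisting, listener events, and resource offers, none of which is modeled here.

```scala
import scala.collection.mutable

// Simplified, hypothetical model; message names follow the article,
// everything else is illustrative and not Spark's real implementation.
sealed trait Reply
case object RegisteredExecutor extends Reply
// Assumed name of the failure reply; see the note above the code.
final case class RegisterExecutorFailed(reason: String) extends Reply

final case class ExecutorData(hostname: String, totalCores: Int)

final class DriverModel {
  // In-memory registry of known executors, keyed by executor id.
  private val executorDataMap = mutable.HashMap[String, ExecutorData]()

  def registerExecutor(executorId: String, hostname: String, cores: Int): Reply =
    if (executorDataMap.contains(executorId)) {
      // Duplicate registration: answer immediately and leave the registry unchanged.
      RegisterExecutorFailed(s"Duplicate executor ID: $executorId")
    } else {
      executorDataMap(executorId) = ExecutorData(hostname, cores)
      RegisteredExecutor
    }

  def registeredCount: Int = executorDataMap.size
}

object RegistrationDemo {
  def main(args: Array[String]): Unit = {
    val driver = new DriverModel
    println(driver.registerExecutor("1", "node1", 2)) // prints RegisteredExecutor
    println(driver.registerExecutor("1", "node1", 2)) // prints RegisterExecutorFailed(Duplicate executor ID: 1)
  }
}
```

Keying the registry by executor id is what makes the duplicate check a single map lookup.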