Spark: source code walkthrough of the ApplicationMaster class and ApplicationMaster object (the Spark AppMaster) and of executor startup


The author has also uploaded the annotated code to the Gitee (码云) platform to help readers follow and understand it.

Object ApplicationMaster

This is the companion object of ApplicationMaster. When YARN launches the AppMaster, execution starts at the main method of Object ApplicationMaster.
First, let's look at the arguments passed to main at startup:
Seq(userClass) ++ userJar ++ primaryPyFile ++ primaryRFile ++ userArgs ++ Seq(--properties-file, hdfs___spark_conf__.properties)

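userClass is --class com.yyb.larn.main
userJar is --jar hdfs://xx/yy.jar
primaryPyFile is --primary-py-file None, the user's Python file
primaryRFile is --primary-r-file None, the user's R file
userArgs is --arg, the arguments of the user's own job
--properties-file points to a file shipped in a zip archive on HDFS:
hdfs://user/userName/.sparkStaging/appID/spark_conf/hdfs___spark_conf__.properties. It contains the --conf parameters the user passed to spark-submit. By the time the AppMaster runs, this file is already a local file in the working directory on this node.
YARN is responsible for distributing and downloading the whole __spark_conf__.zip archive.
The content of the __spark_conf__.properties file is shown below: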

spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.8.4-1.cdh5.8.4.p0.5/lib/hadoop/lib/native
spark.yarn.jars=local\:/opt/cloudera/parcels/SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658/lib/spark2/jars/*
spark.hadoop.mapreduce.application.classpath=
spark.sql.hive.metastore.jars=${env\:HADOOP_COMMON_HOME}/../hive/lib/*\:${env\:HADOOP_COMMON_HOME}/client/*
spark.executor.memory=10g
spark.yarn.cache.types=FILE,FILE,FILE
spark.master=yarn
spark.driver.memory=4g
spark.hadoop.yarn.application.classpath=
spark.authenticate=false
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.8.4-1.cdh5.8.4.p0.5/lib/hadoop/lib/native
spark.sql.catalogImplementation=hive
spark.submit.deployMode=cluster
spark.dynamicAllocation.enabled=true
spark.sql.hive.metastore.version=1.1.0
spark.app.name=com.saic.portrait.dw.to_dw.statistical.profileAggr.ToDWFactProfileAggr_up5_Job
spark.eventLog.enabled=true
spark.shuffle.service.port=7337
spark.yarn.dist.jars=hdfs\://nameservice1/user/center/script/jars/mongo-java-driver-3.4.2.jar,hdfs\://nameservice1/user/center/script/jars/amqp-client-4.2.0.jar
spark.yarn.cache.visibilities=PUBLIC,PUBLIC,PUBLIC
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.yarn.cache.timestamps=1571901062837,1571901062692,1571901061930
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.yarn.cache.filenames=hdfs\://nameservice1/user/center/script/jars/personportrait-1.0-SNAPSHOT.jar\#__app__.jar,hdfs\://nameservice1/user/center/script/jars/mongo-java-driver-3.4.2.jar\#mongo-java-driver-3.4.2.jar,hdfs\://nameservice1/user/center/script/jars/amqp-client-4.2.0.jar\#amqp-client-4.2.0.jar
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.cache.sizes=1504481,1673643,491199
spark.yarn.cache.confArchive=hdfs\://nameservice1/user/center/.sparkStaging/application_1566690562041_9626/__spark_conf__.zip
spark.eventLog.dir=hdfs\://nameservice1/user/spark/spark2ApplicationHistory
spark.executor.instances=5
spark.ui.killEnabled=true
spark.yarn.historyServer.address=http\://njtest-cdh5-nn02.nj\:18089
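
The backslashes before `:` and `#` in the listing above are consistent with the java.util.Properties file format, which escapes those characters when a file is written and removes the escapes when it is read. The following is a minimal sketch of reading such text back into a map. The object name `ConfFileSketch` and the helper `parse` are made up for illustration and are not part of Spark.

```scala
import java.io.StringReader
import java.util.Properties
import scala.collection.JavaConverters._

// Illustrative helper, not Spark code.
object ConfFileSketch {
  // Parse .properties text into an immutable Map. java.util.Properties
  // resolves backslash escapes while loading, so "\:" is read back as ":".
  def parse(text: String): Map[String, String] = {
    val props = new Properties()
    props.load(new StringReader(text))
    props.stringPropertyNames().asScala.map(k => k -> props.getProperty(k)).toMap
  }

  def main(args: Array[String]): Unit = {
    // Two entries copied from the listing above.
    val sample =
      """spark.master=yarn
        |spark.eventLog.dir=hdfs\://nameservice1/user/spark/spark2ApplicationHistory
        |""".stripMargin
    parse(sample).foreach { case (k, v) => println(s"$k -> $v") }
  }
}
```

Reading the file this way yields the plain `hdfs://...` values, which is the form in which they end up in the SparkConf.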

Now let's look at the main method:

def main(args: Array[String]): Unit = {
    SignalUtils.registerLogger(log)
    // Parse the arguments passed in; the parser extracts them by pattern matching
    val amArgs = new ApplicationMasterArguments(args)
    // See the Class ApplicationMaster section
    master = new ApplicationMaster(amArgs)
    // See the walkthrough of master's run method
    System.exit(master.run())
  }
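
To make the "extract arguments by pattern matching" step concrete, here is a small sketch of that style of parser. It is not the real ApplicationMasterArguments: the case class `AmArgs`, the object name and the reduced set of flags are assumptions chosen for illustration.

```scala
// Illustrative sketch, not the Spark implementation.
object AmArgsSketch {
  // Hypothetical, simplified holder for the parsed values.
  final case class AmArgs(
      userClass: Option[String] = None,
      userJar: Option[String] = None,
      propertiesFile: Option[String] = None,
      userArgs: Vector[String] = Vector.empty)

  // Consume the argument list two tokens at a time: a flag and its value.
  @annotation.tailrec
  def parse(args: List[String], acc: AmArgs = AmArgs()): AmArgs = args match {
    case "--class" :: value :: tail =>
      parse(tail, acc.copy(userClass = Some(value)))
    case "--jar" :: value :: tail =>
      parse(tail, acc.copy(userJar = Some(value)))
    case "--arg" :: value :: tail =>
      parse(tail, acc.copy(userArgs = acc.userArgs :+ value))
    case "--properties-file" :: value :: tail =>
      parse(tail, acc.copy(propertiesFile = Some(value)))
    case Nil =>
      acc
    case other =>
      throw new IllegalArgumentException(s"Unknown or incomplete argument: ${other.head}")
  }
}
```

Repeated `--arg` flags accumulate in order, which matches how userArgs is passed to the user's main method later on.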

Class ApplicationMaster

This class is the companion class of the ApplicationMaster object.
First, let's look at the constructor body of this class:

private val isClusterMode = args.userClass != null
private val sparkConf = new SparkConf()
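//--properties-file is normally not null, so this sparkConf holds the user's settings and the related config
//See the entries of the __spark_conf__.properties file above; note that it is already a local file at this point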
  if (args.propertiesFile != null) {
    Utils.getPropertiesFromFile(args.propertiesFile).foreach { case (k, v) =>
      sparkConf.set(k, v)
    }
  }
private val securityMgr = new SecurityManager(sparkConf)
sparkConf.getAll.foreach { case (k, v) =>
	// Copy every sparkConf entry into the JVM system properties, so that the
	// user's job can later read its config from the system properties.
	// SparkConf loads spark.* system properties by default.
	// See my other post: https://blog.csdn.net/u010374412/article/details/103038530
	// Could properties clash if several drivers run on the same host at the same time?
	// No. Each driver runs in its own JVM, and java.lang.System properties exist once per JVM; they are shared only inside that JVM.
    sys.props(k) = v
  }
 private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
private val ugi = {}
// Connect to the ResourceManager so that resources can be requested
private val client = doAsUser { new YarnRMClient() }
// Maximum number of executor failures tolerated
private val maxNumExecutorFailures = {
    val effectiveNumExecutors =
      if (Utils.isDynamicAllocationEnabled(sparkConf)) {
        sparkConf.get(DYN_ALLOCATION_MAX_EXECUTORS)
      } else {
        sparkConf.get(EXECUTOR_INSTANCES).getOrElse(0)
      }
    // By default, effectiveNumExecutors is Int.MaxValue if dynamic allocation is enabled. We need
    // avoid the integer overflow here.
    val defaultMaxNumExecutorFailures = math.max(3,
      if (effectiveNumExecutors > Int.MaxValue / 2) Int.MaxValue else (2 * effectiveNumExecutors))

    sparkConf.get(MAX_EXECUTOR_FAILURES).getOrElse(defaultMaxNumExecutorFailures)
  }
@volatile private var exitCode = 0
  @volatile private var unregistered = false
  @volatile private var finished = false
  @volatile private var finalStatus = getDefaultFinalStatus
  @volatile private var finalMsg: String = ""
  @volatile private var userClassThread: Thread = _

  @volatile private var reporterThread: Thread = _
  @volatile private var allocator: YarnAllocator = _

  // A flag to check whether user has initialized spark context
  @volatile private var registered = false
private val userClassLoader = {}
private val allocatorLock = new Object()
private val heartbeatInterval ={}
private val initialAllocationInterval = {}
private var nextAllocationInterval = initialAllocationInterval
private var rpcEnv: RpcEnv = null
private val sparkContextPromise = Promise[SparkContext]()
private var credentialRenewer: AMCredentialRenewer = _
private val localResources = {}
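
The default computed in maxNumExecutorFailures above can be checked in isolation. The sketch below copies only the arithmetic from the listing; the object and function names are made up for illustration.

```scala
// Illustrative sketch of the default used when MAX_EXECUTOR_FAILURES is not set.
object ExecutorFailureSketch {
  // Twice the effective executor count, at least 3, and guarded against
  // Int overflow when dynamic allocation leaves the count at Int.MaxValue.
  def defaultMaxFailures(effectiveNumExecutors: Int): Int =
    math.max(3,
      if (effectiveNumExecutors > Int.MaxValue / 2) Int.MaxValue
      else 2 * effectiveNumExecutors)
}
```

With spark.executor.instances=5 and dynamic allocation disabled, the default is 10. With dynamic allocation enabled and no explicit maximum, the effective count is Int.MaxValue and so is the default.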

The run method:

final def run(): Int = {
    doAsUser {
    // Delegates to the runImpl method
      runImpl()
    }
    exitCode
  }

The runImpl method:

private def runImpl(): Unit = {
    try {// Get the application attempt ID of this container
      val appAttemptId = client.getAttemptId()

      var attemptID: Option[String] = None

      if (isClusterMode) {
        // Set the web ui port to be ephemeral for yarn so we don't conflict with
        // other spark processes running on the same box
        System.setProperty("spark.ui.port", "0")

        // Set the master and deploy mode property to match the requested mode.
        System.setProperty("spark.master", "yarn")
        System.setProperty("spark.submit.deployMode", "cluster")

        // Set this internal configuration if it is running on cluster mode, this
        // configuration will be checked in SparkContext to avoid misuse of yarn cluster mode.
        System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())

        attemptID = Option(appAttemptId.getAttemptId.toString)
      }

      new CallerContext(
        "APPMASTER", sparkConf.get(APP_CALLER_CONTEXT),
        Option(appAttemptId.getApplicationId.toString), attemptID).setCurrentContext()

      logInfo("ApplicationAttemptId: " + appAttemptId)

      // This shutdown hook should run *after* the SparkContext is shut down.
      val priority = ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY - 1
      ShutdownHookManager.addShutdownHook(priority) { () =>
        val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf)
        val isLastAttempt = client.getAttemptId().getAttemptId() >= maxAppAttempts

        if (!finished) {
          // The default state of ApplicationMaster is failed if it is invoked by shut down hook.
          // This behavior is different compared to 1.x version.
          // If user application is exited ahead of time by calling System.exit(N), here mark
          // this application as failed with EXIT_EARLY. For a good shutdown, user shouldn't call
          // System.exit(0) to terminate the application.
          finish(finalStatus,
            ApplicationMaster.EXIT_EARLY,
            "Shutdown hook called before final status was reported.")
        }

        if (!unregistered) {
          // we only want to unregister if we don't want the RM to retry
          if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) {
            unregister(finalStatus, finalMsg)
            cleanupStagingDir()
          }
        }
      }

      // If the credentials file config is present, we must periodically renew tokens. So create
      // a new AMDelegationTokenRenewer
      if (sparkConf.contains(CREDENTIALS_FILE_PATH)) {
        // Start a short-lived thread for AMCredentialRenewer, the only purpose is to set the
        // classloader so that main jar and secondary jars could be used by AMCredentialRenewer.
        val credentialRenewerThread = new Thread {
          setName("AMCredentialRenewerStarter")
          setContextClassLoader(userClassLoader)

          override def run(): Unit = {
            val credentialManager = new YARNHadoopDelegationTokenManager(
              sparkConf,
              yarnConf,
              conf => YarnSparkHadoopUtil.hadoopFSsToAccess(sparkConf, conf))

            val credentialRenewer =
              new AMCredentialRenewer(sparkConf, yarnConf, credentialManager)
            credentialRenewer.scheduleLoginFromKeytab()
          }
        }

        credentialRenewerThread.start()
        credentialRenewerThread.join()
      }

      if (isClusterMode) {
      // This is where the main work happens
        runDriver()
      } else {
        runExecutorLauncher()
      }
    } catch {
      case e: Exception =>
        // catch everything else if not specifically handled
        logError("Uncaught exception: ", e)
        finish(FinalApplicationStatus.FAILED,
          ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
          "Uncaught exception: " + e)
    }
  }

runDriver:

private def runDriver(): Unit = {
    addAmIpFilter(None)
    // Start the user application
    userClassThread = startUserApplication()

    // This a bit hacky, but we need to wait until the spark.driver.port property has
    // been set by the Thread executing the user class.
    logInfo("Waiting for spark context initialization...")
    val totalWaitTime = sparkConf.get(AM_MAX_WAIT_TIME)
    try {
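    // Wait for the user's job thread to finish initializing the SparkContext.
    // Once sc is initialized, the postStartHook function of YarnClusterScheduler (in yarn-cluster mode)
    // calls the sparkContextInitialized method of the ApplicationMaster object, which hands the
    // initialized sc to the sparkContextPromise of the ApplicationMaster instance.
    // That is how this thread obtains the sc created by the user thread. By default it waits at most 100 s.

// But when is the postStartHook of YarnClusterScheduler called?
// Near the end of the SparkContext constructor: after _schedulerBackend and _taskScheduler have been created and postApplicationStart() has run, the constructor calls _taskScheduler.postStartHook().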
      val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
        Duration(totalWaitTime, TimeUnit.MILLISECONDS))
      if (sc != null) {
      // Get the RPC environment used for communication
        rpcEnv = sc.env.rpcEnv
        val driverRef = createSchedulerRef(
          sc.getConf.get("spark.driver.host"),
          sc.getConf.get("spark.driver.port"))
        // Register the AppMaster
        // This call also requests executor resources. After sc is instantiated, the user thread blocks and waits for the request made here
        registerAM(sc.getConf, rpcEnv, driverRef, sc.ui.map(_.webUrl))
        registered = true
      } else {
        // Sanity check; should never happen in normal operation, since sc should only be null
        // if the user app did not create a SparkContext.
        throw new IllegalStateException("User did not initialize spark context!")
      }
      // Wake up the user thread that was blocked above
      resumeDriver()
      // Wait for the user thread to finish
      userClassThread.join()
    } catch {
      case e: SparkException if e.getCause().isInstanceOf[TimeoutException] =>
        logError(
          s"SparkContext did not initialize after waiting for $totalWaitTime ms. " +
           "Please check earlier log output for errors. Failing the application.")
        finish(FinalApplicationStatus.FAILED,
          ApplicationMaster.EXIT_SC_NOT_INITED,
          "Timed out waiting for SparkContext.")
    } finally {
      resumeDriver()
    }
  }
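
The wait on sparkContextPromise in runDriver is a handshake between two threads: one publishes a value through a Promise, the other waits on its Future with a timeout. Below is a self-contained sketch of that pattern using only the Scala standard library. The names `PromiseHandshakeSketch` and `awaitFromThread` and the string payload are illustrative assumptions; the real code passes a SparkContext and uses Spark's ThreadUtils.awaitResult.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// Illustrative sketch of the Promise-based handshake, not Spark code.
object PromiseHandshakeSketch {
  // Start a worker thread that completes the promise after delayMs, then
  // wait for the value for at most timeoutMs. Returns None on timeout.
  def awaitFromThread(delayMs: Long, timeoutMs: Long): Option[String] = {
    val promise = Promise[String]()
    val worker = new Thread(new Runnable {
      override def run(): Unit = {
        Thread.sleep(delayMs)
        // trySuccess is a no-op if the promise was already completed.
        promise.trySuccess("context-ready")
      }
    })
    worker.setDaemon(true)
    worker.start()
    try Some(Await.result(promise.future, timeoutMs.millis))
    catch { case _: TimeoutException => None }
  }
}
```

The timeout branch corresponds to the case in runDriver where the SparkContext is not initialized within the configured wait time and the application is failed.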

startUserApplication:

private def startUserApplication(): Thread = {
    logInfo("Starting the user application in a separate Thread")

    var userArgs = args.userArgs // the arguments of the user's own job
    // A Scala application does not enter this if branch
    if (args.primaryPyFile != null && args.primaryPyFile.endsWith(".py")) {
      // When running pyspark, the app is run using PythonRunner. The second argument is the list
      // of files to add to PYTHONPATH, which Client.scala already handles, so it's empty.
      userArgs = Seq(args.primaryPyFile, "") ++ userArgs
    }
    if (args.primaryRFile != null && args.primaryRFile.endsWith(".R")) {
      // TODO(davies): add R dependencies here
    }
	// Locate the main method
    val mainMethod = userClassLoader.loadClass(args.userClass)
      .getMethod("main", classOf[Array[String]])

    val userThread = new Thread {
      override def run() {
        try {
        // Pass in the arguments and invoke the method; from here on the user's own job is running
          mainMethod.invoke(null, userArgs.toArray)
          finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
          logDebug("Done running users class")
        } catch {
          case e: InvocationTargetException =>
            e.getCause match {
              case _: InterruptedException =>
                // Reporter thread can interrupt to stop user class
              case SparkUserAppException(exitCode) =>
                val msg = s"User application exited with status $exitCode"
                logError(msg)
                finish(FinalApplicationStatus.FAILED, exitCode, msg)
              case cause: Throwable =>
                logError("User class threw exception: " + cause, cause)
                finish(FinalApplicationStatus.FAILED,
                  ApplicationMaster.EXIT_EXCEPTION_USER_CLASS,
                  "User class threw exception: " + StringUtils.stringifyException(cause))
            }
            sparkContextPromise.tryFailure(e.getCause())
        } finally {
          // Notify the thread waiting for the SparkContext, in case the application did not
          // instantiate one. This will do nothing when the user code instantiates a SparkContext
          // (with the correct master), or when the user code throws an exception (due to the
          // tryFailure above).
          sparkContextPromise.trySuccess(null)
        }
      }
    }
    userThread.setContextClassLoader(userClassLoader)
    userThread.setName("Driver")
    userThread.start()
    userThread
  }
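
startUserApplication relies on two reflection details: a static method is invoked with a null receiver, and any exception thrown by the target arrives wrapped in an InvocationTargetException, so the real cause has to be unwrapped with getCause. The sketch below shows both, using Integer.parseInt as a stand-in for a user's main method. The object and function names are illustrative assumptions.

```scala
import java.lang.reflect.InvocationTargetException

// Illustrative sketch of reflective static invocation, not Spark code.
object ReflectiveCallSketch {
  // Look up the static method Integer.parseInt(String) and call it reflectively.
  // The first argument of invoke is the receiver, which is null for static methods.
  def parseReflectively(text: String): Either[Throwable, Int] = {
    val method = classOf[java.lang.Integer].getMethod("parseInt", classOf[String])
    try {
      Right(method.invoke(null, text).asInstanceOf[Int])
    } catch {
      // The exception thrown by the target method is the cause, not e itself.
      case e: InvocationTargetException => Left(e.getCause)
    }
  }
}
```

This is why the listing above matches on e.getCause rather than on the InvocationTargetException when deciding the final application status.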

registerAM

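This method contains the logic for requesting and launching executors. Let's look at this code in detail:

// This method runs in the driver process, on a non-user thread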
private def registerAM(
      _sparkConf: SparkConf,
      _rpcEnv: RpcEnv,
      driverRef: RpcEndpointRef,
      uiAddress: Option[String]) = {
    val appId = client.getAttemptId().getApplicationId().toString() // appID
    val attemptId = client.getAttemptId().getAttemptId().toString()
    val historyAddress = ApplicationMaster
      .getHistoryServerAddress(_sparkConf, yarnConf, appId, attemptId) // Spark History Server address
// Note: this URL is built from the driver's host and port, and the endpoint name is CoarseGrainedScheduler
// It is used later by the executors
    val driverUrl = RpcEndpointAddress(
      _sparkConf.get("spark.driver.host"),
      _sparkConf.get("spark.driver.port").toInt,
      CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString

    logInfo {
      val executorMemory = sparkConf.get(EXECUTOR_MEMORY).toInt
      val executorCores = sparkConf.get(EXECUTOR_CORES)
      val dummyRunner = new ExecutorRunnable(None, yarnConf, sparkConf, driverUrl, "<executorId>",
        "<hostname>", executorMemory, executorCores, appId, securityMgr, localResources)
      dummyRunner.launchContextDebugInfo()
    }
// Prepare to request executor resources; register returns a YarnAllocator object
    allocator = client.register(driverUrl,
      driverRef,
      yarnConf,
      _sparkConf,
      uiAddress,
      historyAddress,
      securityMgr,
      localResources)

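    // Install the YarnAM endpoint.
    // The driver uses this endpoint to send requests to the AM, for example to request or kill executors
    rpcEnv.setupEndpoint("YarnAM", new AMEndpoint(rpcEnv, driverRef))
   // Start requesting executor resources; see the next subsection for details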
    allocator.allocateResources()
    reporterThread = launchReporterThread()
  }
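
The driverUrl built in registerAM is what each executor later receives as its --driver-url argument. As a rough sketch, the string has the form `spark://<endpoint name>@<host>:<port>`. The helper below only illustrates that shape; the object name and the sample host and port are made up.

```scala
// Illustrative sketch of the driver URL shape, not the RpcEndpointAddress class.
object DriverUrlSketch {
  // The endpoint name defaults to the scheduler backend's name mentioned above.
  def driverUrl(
      host: String,
      port: Int,
      endpointName: String = "CoarseGrainedScheduler"): String =
    s"spark://$endpointName@$host:$port"

  def main(args: Array[String]): Unit = {
    // Hypothetical host and port, for illustration only.
    println(driverUrl("driver-host.example", 35000))
  }
}
```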

YarnAllocator

Methods:

// Start requesting executor resources
def allocateResources(): Unit = synchronized {
    updateResourceRequests() // the container requests are now up to date

    val progressIndicator = 0.1f
    // Poll the ResourceManager. This doubles as a heartbeat if there are no pending container
    // requests.
    val allocateResponse = amClient.allocate(progressIndicator)

    val allocatedContainers = allocateResponse.getAllocatedContainers()

    if (allocatedContainers.size > 0) {
      logDebug(("Allocated containers: %d. Current executor count: %d. " +
        "Launching executor count: %d. Cluster resources: %s.")
        .format(
          allocatedContainers.size,
          runningExecutors.size,
          numExecutorsStarting.get,
          allocateResponse.getAvailableResources))
      // Initialize and launch the executors
      handleAllocatedContainers(allocatedContainers.asScala)
    }

    val completedContainers = allocateResponse.getCompletedContainersStatuses()
    if (completedContainers.size > 0) {
      logDebug("Completed %d containers".format(completedContainers.size))
      processCompletedContainers(completedContainers.asScala)
      logDebug("Finished processing %d completed containers. Current running executor count: %d."
        .format(completedContainers.size, runningExecutors.size))
    }
  }


// Request containers so that the executor count reaches the target; nothing is launched yet
def updateResourceRequests(): Unit = {
    val pendingAllocate = getPendingAllocate
    val numPendingAllocate = pendingAllocate.size
    // Work out how many more executors need to be requested
    val missing = targetNumExecutors - numPendingAllocate -
      numExecutorsStarting.get - runningExecutors.size
    logDebug(s"Updating resource requests, target: $targetNumExecutors, " +
      s"pending: $numPendingAllocate, running: ${runningExecutors.size}, " +
      s"executorsStarting: ${numExecutorsStarting.get}")

    if (missing > 0) {// this is the branch usually taken
      logInfo(s"Will request $missing executor container(s), each with " +
        s"${resource.getVirtualCores} core(s) and " +
        s"${resource.getMemory} MB memory (including $memoryOverhead MB of overhead)")

      // Split the pending container request into three groups: locality matched list, locality
      // unmatched list and non-locality list. Take the locality matched container request into
      // consideration of container placement, treat as allocated containers.
      // For locality unmatched and locality free container requests, cancel these container
      // requests, since required locality preference has been changed, recalculating using
      // container placement strategy.
      val (localRequests, staleRequests, anyHostRequests) = splitPendingAllocationsByLocality(
        hostToLocalTaskCounts, pendingAllocate)

      // cancel "stale" requests for locations that are no longer needed
      staleRequests.foreach { stale =>
        amClient.removeContainerRequest(stale)
      }
      val cancelledContainers = staleRequests.size
      if (cancelledContainers > 0) {
        logInfo(s"Canceled $cancelledContainers container request(s) (locality no longer needed)")
      }

      // consider the number of new containers and cancelled stale containers available
      val availableContainers = missing + cancelledContainers

      // to maximize locality, include requests with no locality preference that can be cancelled
      val potentialContainers = availableContainers + anyHostRequests.size

      val containerLocalityPreferences = containerPlacementStrategy.localityOfRequestedContainers(
        potentialContainers, numLocalityAwareTasks, hostToLocalTaskCounts,
          allocatedHostToContainersMap, localRequests)

      val newLocalityRequests = new mutable.ArrayBuffer[ContainerRequest]
      containerLocalityPreferences.foreach {
        case ContainerLocalityPreferences(nodes, racks) if nodes != null =>
          newLocalityRequests += createContainerRequest(resource, nodes, racks)
        case _ =>
      }

      if (availableContainers >= newLocalityRequests.size) {
        // more containers are available than needed for locality, fill in requests for any host
        for (i <- 0 until (availableContainers - newLocalityRequests.size)) {
          newLocalityRequests += createContainerRequest(resource, null, null)
        }
      } else {
        val numToCancel = newLocalityRequests.size - availableContainers
        // cancel some requests without locality preferences to schedule more local containers
        anyHostRequests.slice(0, numToCancel).foreach { nonLocal =>
          amClient.removeContainerRequest(nonLocal)
        }
        if (numToCancel > 0) {
          logInfo(s"Canceled $numToCancel unlocalized container requests to resubmit with locality")
        }
      }

      newLocalityRequests.foreach { request =>
        amClient.addContainerRequest(request)
      }

      if (log.isInfoEnabled()) {
        val (localized, anyHost) = newLocalityRequests.partition(_.getNodes() != null)
        if (anyHost.nonEmpty) {
          logInfo(s"Submitted ${anyHost.size} unlocalized container requests.")
        }
        localized.foreach { request =>
          logInfo(s"Submitted container request for host ${hostStr(request)}.")
        }
      }
    } else if (numPendingAllocate > 0 && missing < 0) {
      val numToCancel = math.min(numPendingAllocate, -missing)
      logInfo(s"Canceling requests for $numToCancel executor container(s) to have a new desired " +
        s"total $targetNumExecutors executors.")

      val matchingRequests = amClient.getMatchingRequests(RM_REQUEST_PRIORITY, ANY_HOST, resource)
      if (!matchingRequests.isEmpty) {
        matchingRequests.iterator().next().asScala
          .take(numToCancel).foreach(amClient.removeContainerRequest)
      } else {
        logWarning("Expected to find pending requests, but found none.")
      }
    }
  }
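
The branching in updateResourceRequests depends on a single number, `missing`. The sketch below isolates that decision as a pure function, ignoring locality handling entirely. The `Action` types and the function name `decide` are illustrative assumptions.

```scala
// Illustrative sketch of the request/cancel decision, not the YarnAllocator code.
object ResourceRequestSketch {
  sealed trait Action
  final case class RequestMore(count: Int) extends Action
  final case class CancelPending(count: Int) extends Action
  case object NoChange extends Action

  def decide(target: Int, pending: Int, starting: Int, running: Int): Action = {
    // Executors still needed after counting those requested, starting and running.
    val missing = target - pending - starting - running
    if (missing > 0) RequestMore(missing)
    // Only pending requests can be cancelled, so cap the count at `pending`.
    else if (pending > 0 && missing < 0) CancelPending(math.min(pending, -missing))
    else NoChange
  }
}
```

For example, with a target of 5 and nothing pending, starting or running, all 5 containers are requested. If the target later drops to 2 while 4 requests are pending and 1 executor is running, 3 of the pending requests are cancelled.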


// Initialize and launch the executors
def handleAllocatedContainers(allocatedContainers: Seq[Container]): Unit = {
    val containersToUse = new ArrayBuffer[Container](allocatedContainers.size)

    // Match incoming requests by host
    val remainingAfterHostMatches = new ArrayBuffer[Container]
    for (allocatedContainer <- allocatedContainers) {
      matchContainerToRequest(allocatedContainer, allocatedContainer.getNodeId.getHost,
        containersToUse, remainingAfterHostMatches)
    }

    // Match remaining by rack
    val remainingAfterRackMatches = new ArrayBuffer[Container]
    for (allocatedContainer <- remainingAfterHostMatches) {
      val rack = resolver.resolve(conf, allocatedContainer.getNodeId.getHost)
      matchContainerToRequest(allocatedContainer, rack, containersToUse,
        remainingAfterRackMatches)
    }

    // Assign remaining that are neither node-local nor rack-local
    val remainingAfterOffRackMatches = new ArrayBuffer[Container]
    for (allocatedContainer <- remainingAfterRackMatches) {
      matchContainerToRequest(allocatedContainer, ANY_HOST, containersToUse,
        remainingAfterOffRackMatches)
    }

    if (!remainingAfterOffRackMatches.isEmpty) {
      logDebug(s"Releasing ${remainingAfterOffRackMatches.size} unneeded containers that were " +
        s"allocated to us")
      for (container <- remainingAfterOffRackMatches) {
        internalReleaseContainer(container)
      }
    }
	// Control passes on to this method
    runAllocatedContainers(containersToUse)

    logInfo("Received %d containers from YARN, launching executors on %d of them."
      .format(allocatedContainers.size, containersToUse.size))
  }
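
handleAllocatedContainers matches granted containers against outstanding requests in three passes: by host, then by rack, then against any host, and releases whatever is left. The following is a heavily simplified sketch of that idea. Containers are reduced to (id, host) pairs, requests to a count per location, and `rackOf` is a hypothetical stand-in for the rack resolver; none of these names come from Spark.

```scala
import scala.collection.mutable

// Illustrative sketch of host -> rack -> any-host matching, not the YarnAllocator code.
object LocalityMatchSketch {
  val AnyHost = "*"

  // pending: outstanding request count per location (host name, rack name or "*").
  // containers: (containerId, host) pairs granted by the resource manager.
  // Returns (container ids to use, container ids to release).
  def matchContainers(
      pending: Map[String, Int],
      containers: List[(String, String)],
      rackOf: String => String): (List[String], List[String]) = {
    val remaining = mutable.Map(pending.toSeq: _*).withDefaultValue(0)
    val toUse = mutable.ArrayBuffer[String]()

    // Consume one pending request at `location`; true if one was available.
    def take(location: String): Boolean =
      if (remaining(location) > 0) { remaining(location) -= 1; true } else false

    // One pass per locality level; unmatched containers move on to the next pass.
    def pass(
        in: List[(String, String)],
        location: ((String, String)) => String): List[(String, String)] =
      in.filterNot { c =>
        val matched = take(location(c))
        if (matched) toUse += c._1
        matched
      }

    val afterHost = pass(containers, _._2)
    val afterRack = pass(afterHost, c => rackOf(c._2))
    val afterAny = pass(afterRack, _ => AnyHost)
    (toUse.toList, afterAny.map(_._1))
  }
}
```

Containers that survive all three passes correspond to the ones the listing above hands to internalReleaseContainer.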

// Launch all executors
private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = {
    for (container <- containersToUse) {
      executorIdCounter += 1
      val executorHostname = container.getNodeId.getHost
      val containerId = container.getId
      val executorId = executorIdCounter.toString
      assert(container.getResource.getMemory >= resource.getMemory)
      logInfo(s"Launching container $containerId on host $executorHostname " +
        s"for executor with ID $executorId")

      def updateInternalState(): Unit = synchronized {
        runningExecutors.add(executorId)
        numExecutorsStarting.decrementAndGet()
        executorIdToContainer(executorId) = container
        containerIdToExecutorId(container.getId) = executorId

        val containerSet = allocatedHostToContainersMap.getOrElseUpdate(executorHostname,
          new HashSet[ContainerId])
        containerSet += containerId
        allocatedContainerToHostMap.put(containerId, executorHostname)
      }

      if (runningExecutors.size() < targetNumExecutors) {
        numExecutorsStarting.incrementAndGet()
        if (launchContainers) {
        // Run the ExecutorRunnable code in the ContainerLauncher thread pool, which has at most 25 threads by default
          launcherPool.execute(new Runnable {
            override def run(): Unit = {
              try {// Control now passes to the ExecutorRunnable class; see the next section
                new ExecutorRunnable(
                  Some(container),
                  conf,
                  sparkConf,
                  driverUrl,
                  executorId,
                  executorHostname,
                  executorMemory,
                  executorCores,
                  appAttemptId.getApplicationId.toString,
                  securityMgr,
                  localResources
                ).run()
                updateInternalState()
              } catch {
                case e: Throwable =>
                  numExecutorsStarting.decrementAndGet()
                  if (NonFatal(e)) {
                    logError(s"Failed to launch executor $executorId on container $containerId", e)
                    // Assigned container should be released immediately
                    // to avoid unnecessary resource occupation.
                    amClient.releaseAssignedContainer(containerId)
                  } else {
                    throw e
                  }
              }
            }
          })
        } else {
          // For test only
          updateInternalState()
        }
      } else {
        logInfo(("Skip launching executorRunnable as running executors count: %d " +
          "reached target executors count: %d.").format(
          runningExecutors.size, targetNumExecutors))
      }
    }
  }

ExecutorRunnable

This class is responsible for assembling the executor launch command.

// Entry point
def run(): Unit = {
    logDebug("Starting Executor Container")
    nmClient = NMClient.createNMClient()
    nmClient.init(conf)
    nmClient.start()
    // The main work happens in this method
    startContainer()
  }

def startContainer(): java.util.Map[String, ByteBuffer] = {
    val ctx = Records.newRecord(classOf[ContainerLaunchContext])
      .asInstanceOf[ContainerLaunchContext]
    val env = prepareEnvironment().asJava

    ctx.setLocalResources(localResources.asJava)
    ctx.setEnvironment(env)

    val credentials = UserGroupInformation.getCurrentUser().getCredentials()
    val dob = new DataOutputBuffer()
    credentials.writeTokenStorageToStream(dob)
    ctx.setTokens(ByteBuffer.wrap(dob.getData()))
	// Assemble the executor launch command
    val commands = prepareCommand()

    ctx.setCommands(commands.asJava)
    ctx.setApplicationACLs(
      YarnSparkHadoopUtil.getApplicationAclsForYarn(securityMgr).asJava)

    // If external shuffle service is enabled, register with the Yarn shuffle service already
    // started on the NodeManager and, if authentication is enabled, provide it with our secret
    // key for fetching shuffle files later
    if (sparkConf.get(SHUFFLE_SERVICE_ENABLED)) {
      val secretString = securityMgr.getSecretKey()
      val secretBytes =
        if (secretString != null) {
          // This conversion must match how the YarnShuffleService decodes our secret
          JavaUtils.stringToBytes(secretString)
        } else {
          // Authentication is not enabled, so just provide dummy metadata
          ByteBuffer.allocate(0)
        }
      ctx.setServiceData(Collections.singletonMap("spark_shuffle", secretBytes))
    }

    // Send the start request to the ContainerManager
    try {
    // Start this container
      nmClient.startContainer(container.get, ctx)
    } catch {
      case ex: Exception =>
        throw new SparkException(s"Exception while starting container ${container.get.getId}" +
          s" on host $hostname", ex)
    }
  }


// Assemble the executor launch command
private def prepareCommand(): List[String] = {
    // Extra options for the JVM
    val javaOpts = ListBuffer[String]()

    // Set the environment variable through a command prefix
    // to append to the existing value of the variable
    var prefixEnv: Option[String] = None

    // Set the JVM memory
    val executorMemoryString = executorMemory + "m"
    javaOpts += "-Xmx" + executorMemoryString

    // Set extra Java options for the executor, if defined
    sparkConf.get(EXECUTOR_JAVA_OPTIONS).foreach { opts =>
      javaOpts ++= Utils.splitCommandString(opts).map(YarnSparkHadoopUtil.escapeForShell)
    }
    sparkConf.get(EXECUTOR_LIBRARY_PATH).foreach { p =>
      prefixEnv = Some(Client.getClusterPath(sparkConf, Utils.libraryPathEnvPrefix(Seq(p))))
    }

    javaOpts += "-Djava.io.tmpdir=" +
      new Path(Environment.PWD.$$(), YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR)

    // Certain configs need to be passed here because they are needed before the Executor
    // registers with the Scheduler and transfers the spark configs. Since the Executor backend
    // uses RPC to connect to the scheduler, the RPC settings are needed as well as the
    // authentication settings.
    sparkConf.getAll
      .filter { case (k, v) => SparkConf.isExecutorStartupConf(k) }
      .foreach { case (k, v) => javaOpts += YarnSparkHadoopUtil.escapeForShell(s"-D$k=$v") }

    // Commenting it out for now - so that people can refer to the properties if required. Remove
    // it once cpuset version is pushed out.
    // The context is, default gc for server class machines end up using all cores to do gc - hence
    // if there are multiple containers in same node, spark gc effects all other containers
    // performance (which can also be other spark containers)
    // Instead of using this, rely on cpusets by YARN to enforce spark behaves 'properly' in
    // multi-tenant environments. Not sure how default java gc behaves if it is limited to subset
    // of cores on a node.
    /*
        else {
          // If no java_opts specified, default to using -XX:+CMSIncrementalMode
          // It might be possible that other modes/config is being done in
          // spark.executor.extraJavaOptions, so we don't want to mess with it.
          // In our expts, using (default) throughput collector has severe perf ramifications in
          // multi-tenant machines
          // The options are based on
          // http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html#0.0.0.%20When%20to%20Use
          // %20the%20Concurrent%20Low%20Pause%20Collector|outline
          javaOpts += "-XX:+UseConcMarkSweepGC"
          javaOpts += "-XX:+CMSIncrementalMode"
          javaOpts += "-XX:+CMSIncrementalPacing"
          javaOpts += "-XX:CMSIncrementalDutyCycleMin=0"
          javaOpts += "-XX:CMSIncrementalDutyCycle=10"
        }
    */

    // For log4j configuration to reference
    javaOpts += ("-Dspark.yarn.app.container.log.dir=" + ApplicationConstants.LOG_DIR_EXPANSION_VAR)

    val userClassPath = Client.getUserClasspath(sparkConf).flatMap { uri =>
      val absPath =
        if (new File(uri.getPath()).isAbsolute()) {
          Client.getClusterPath(sparkConf, uri.getPath())
        } else {
          Client.buildPath(Environment.PWD.$(), uri.getPath())
        }
      Seq("--user-class-path", "file:" + absPath)
    }.toSeq

    YarnSparkHadoopUtil.addOutOfMemoryErrorArgument(javaOpts)
    // The launch command
    val commands = prefixEnv ++
      Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
      javaOpts ++
      Seq("org.apache.spark.executor.CoarseGrainedExecutorBackend",
        "--driver-url", masterAddress,
        "--executor-id", executorId,
        "--hostname", hostname,
        "--cores", executorCores.toString,
        "--app-id", appId) ++
      userClassPath ++
      Seq(
        s"1>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stdout",
        s"2>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stderr")

    // TODO: it would be nicer to just make sure there are no null commands here
    commands.map(s => if (s == null) "null" else s).toList
  }
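
The way this launch command is put together can be tried outside YARN with a minimal sketch. `buildCommand`, the `{{JAVA_HOME}}` string and the `<LOG_DIR>` string are hypothetical stand-ins chosen for illustration (they are not Spark's API or YARN's exact expansion variables); only the shape of the sequence follows the code above.

```scala
object LaunchCommandSketch {
  // Hypothetical placeholders for values that YARN expands at container launch time.
  val javaHome = "{{JAVA_HOME}}"
  val logDir = "<LOG_DIR>"

  def buildCommand(
      javaOpts: Seq[String],
      driverUrl: String,
      executorId: String,
      hostname: String,
      cores: Int,
      appId: String,
      userClassPath: Seq[String]): List[String] = {
    val commands =
      Seq(javaHome + "/bin/java", "-server") ++
        javaOpts ++
        Seq("org.apache.spark.executor.CoarseGrainedExecutorBackend",
          "--driver-url", driverUrl,
          "--executor-id", executorId,
          "--hostname", hostname,
          "--cores", cores.toString,
          "--app-id", appId) ++
        // One "--user-class-path file:<path>" pair per user classpath entry.
        userClassPath.flatMap(p => Seq("--user-class-path", "file:" + p)) ++
        Seq(s"1>$logDir/stdout", s"2>$logDir/stderr")
    // Same defensive step as the original: null entries become the string "null".
    commands.map(s => if (s == null) "null" else s).toList
  }
}
```

The arguments after the class name are exactly the options that `CoarseGrainedExecutorBackend.main` parses on the executor side.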

object CoarseGrainedExecutorBackend & class CoarseGrainedExecutorBackend

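When the Spark ApplicationMaster has obtained a container and launches an executor, control arrives at the CoarseGrainedExecutorBackend companion class and companion object. This is the main class that runs in the executor process.
Its responsibilities include:

  1. Registering the executor with the driver.
  2. Handling a successful registration with the driver.
  3. Handling a failed registration with the driver.
  4. Running tasks assigned by the driver.
  5. Killing running tasks.
  6. Stopping the executor.
  7. Shutting down the executor.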
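
These responsibilities correspond to messages that the backend receives from the driver. A minimal sketch of that dispatch follows. The message types and the `Backend` class are simplified, hypothetical stand-ins (not Spark's real message classes), and "exit" is only recorded in a log instead of calling `System.exit`.

```scala
object DispatchSketch {
  // Simplified, hypothetical stand-ins for the driver-to-executor messages.
  sealed trait Msg
  case object RegisteredExecutor extends Msg
  final case class RegisterExecutorFailed(message: String) extends Msg
  final case class LaunchTask(taskId: Long) extends Msg
  final case class KillTask(taskId: Long) extends Msg
  case object StopExecutor extends Msg
  case object Shutdown extends Msg

  final class Backend {
    var executorReady = false
    var running = Set.empty[Long]
    var log = Vector.empty[String]

    // Same shape as an RPC endpoint's receive: a partial function over messages.
    val receive: PartialFunction[Any, Unit] = {
      case RegisteredExecutor =>
        executorReady = true
        log :+= "registered"
      case RegisterExecutorFailed(m) =>
        log :+= s"exit: registration failed: $m"
      case LaunchTask(id) if executorReady =>
        running += id
        log :+= s"launch $id"
      case LaunchTask(_) =>
        log :+= "exit: executor was null"
      case KillTask(id) =>
        running -= id
        log :+= s"kill $id"
      case StopExecutor =>
        // Simplification: the real code sends Shutdown to itself asynchronously.
        receive(Shutdown)
      case Shutdown =>
        log :+= "shutdown"
    }
  }
}
```

A task that arrives before registration has completed is rejected, which matches the `executor == null` checks in the real class.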

object CoarseGrainedExecutorBackend

Here is the code in detail:

private[spark] object CoarseGrainedExecutorBackend extends Logging {
  // Called after main has parsed the arguments; starts the executor
  private def run(
      driverUrl: String,
      executorId: String,
      hostname: String,
      cores: Int,
      appId: String,
      workerUrl: Option[String],
      userClassPath: Seq[URL]) {

    Utils.initDaemon(log)

    SparkHadoopUtil.get.runAsSparkUser { () =>
      // Debug code
      Utils.checkHost(hostname)

      // Bootstrap to fetch the driver's Spark properties.
      val executorConf = new SparkConf
      /**
       * The Netty-based RpcEnv is ready at this point,
       * but this fetcher exists only to fetch the driver's SparkConf;
       * it is shut down right afterwards.
       */
      val fetcher: RpcEnv = RpcEnv.create(
        "driverPropsFetcher",
        hostname, // hostname is the name of this machine
        -1,
        executorConf,
        new SecurityManager(executorConf),
        clientMode = true)
      // setupEndpointRefByURI checks whether the host in driverUrl has the RpcEndpointVerifier endpoint registered.
      // If the host address is healthy, that endpoint is always present, because NettyRpcEnv registers RpcEndpointVerifier first.
      val driver: RpcEndpointRef = fetcher.setupEndpointRefByURI(driverUrl)
      // Use this ref to talk to the endpoint that org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend registered on the driver.
      // The driver-side handling is in org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend, line 231.
      // Fetch the driver's properties whose keys start with "spark."; both the declared type and the reply type are SparkAppConfig.
      val cfg: SparkAppConfig = driver.askSync[SparkAppConfig](RetrieveSparkAppConfig)
      val props = cfg.sparkProperties ++ Seq[(String, String)](("spark.app.id", appId))
      fetcher.shutdown()

      // Only after that is the executor's own RpcEnv created
      // Create SparkEnv using properties we fetched from the driver.
      val driverConf = new SparkConf()
      for ((key, value) <- props) {
        // this is required for SSL in standalone mode
        if (SparkConf.isExecutorStartupConf(key)) {
          driverConf.setIfMissing(key, value)
        } else {
          driverConf.set(key, value)
        }
      }
      if (driverConf.contains("spark.yarn.credentials.file")) {
        logInfo("Will periodically update credentials from: " +
          driverConf.get("spark.yarn.credentials.file"))
        Utils.classForName("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil")
          .getMethod("startCredentialUpdater", classOf[SparkConf])
          .invoke(null, driverConf)
      }

      cfg.hadoopDelegationCreds.foreach { tokens =>
        SparkHadoopUtil.get.addDelegationTokens(tokens, driverConf)
      }

      // Create the communication environment (NettyRpcEnv)
      val env = SparkEnv.createExecutorEnv(
        driverConf, executorId, hostname, cores, cfg.ioEncryptionKey, isLocal = false)
      // Register the "Executor" endpoint in the executor's RpcEnv,
      // which starts the CoarseGrainedExecutorBackend endpoint.
      // The next thing to read is the CoarseGrainedExecutorBackend endpoint itself.
      env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(
        env.rpcEnv, driverUrl, executorId, hostname, cores, userClassPath, env))
      workerUrl.foreach { url =>
        env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
      }
      env.rpcEnv.awaitTermination()
      if (driverConf.contains("spark.yarn.credentials.file")) {
        Utils.classForName("org.apache.spark.deploy.yarn.YarnSparkHadoopUtil")
          .getMethod("stopCredentialUpdater")
          .invoke(null)
      }
    }
  }
  // Executor entry point in yarn-cluster mode
  def main(args: Array[String]) {
    var driverUrl: String = null  // passed in by the launch command assembled when the executor is started
    var executorId: String = null // usually an integer that increases with each executor launched
    var hostname: String = null // hostname of this machine
    var cores: Int = 0
    var appId: String = null
    var workerUrl: Option[String] = None
    val userClassPath = new mutable.ListBuffer[URL]()

    var argv = args.toList
    while (!argv.isEmpty) {
      argv match {
        case ("--driver-url") :: value :: tail =>
          driverUrl = value
          argv = tail
        case ("--executor-id") :: value :: tail =>
          executorId = value
          argv = tail
        case ("--hostname") :: value :: tail =>
          hostname = value
          argv = tail
        case ("--cores") :: value :: tail =>
          cores = value.toInt
          argv = tail
        case ("--app-id") :: value :: tail =>
          appId = value
          argv = tail
        case ("--worker-url") :: value :: tail =>
          // Worker url is used in spark standalone mode to enforce fate-sharing with worker
          workerUrl = Some(value)
          argv = tail
        case ("--user-class-path") :: value :: tail =>
          userClassPath += new URL(value)
          argv = tail
        case Nil =>
        case tail =>
          // scalastyle:off println
          System.err.println(s"Unrecognized options: ${tail.mkString(" ")}")
          // scalastyle:on println
          printUsageAndExit()
      }
    }
    // Check that the arguments are valid
    if (driverUrl == null || executorId == null || hostname == null || cores <= 0 ||
      appId == null) {
      printUsageAndExit() // invalid arguments: print the usage text and exit with exit code 1
    }
    // Hand over to the run method
    run(driverUrl, executorId, hostname, cores, appId, workerUrl, userClassPath)
    System.exit(0)
  }

  private def printUsageAndExit() = {
    // scalastyle:off println
    System.err.println(
      """
      |Usage: CoarseGrainedExecutorBackend [options]
      |
      | Options are:
      |   --driver-url <driverUrl>
      |   --executor-id <executorId>
      |   --hostname <hostname>
      |   --cores <cores>
      |   --app-id <appid>
      |   --worker-url <workerUrl>
      |   --user-class-path <url>
      |""".stripMargin)
    // scalastyle:on println
    System.exit(1)
  }

}
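
The argument handling in `main` can be exercised on its own with a minimal sketch. `ArgParseSketch`, `ExecutorArgs` and `parse` are hypothetical names introduced here. The sketch omits `--worker-url`, keeps user classpath entries as plain strings, and returns an `Either` instead of calling `System.exit`, so it is a simplification rather than Spark's implementation.

```scala
object ArgParseSketch {
  final case class ExecutorArgs(
      driverUrl: Option[String] = None,
      executorId: Option[String] = None,
      hostname: Option[String] = None,
      cores: Int = 0,
      appId: Option[String] = None,
      userClassPath: Vector[String] = Vector.empty) {
    // Same validity rule as main: required values present and cores > 0.
    def isValid: Boolean =
      driverUrl.nonEmpty && executorId.nonEmpty && hostname.nonEmpty &&
        appId.nonEmpty && cores > 0
  }

  // Consumes "--option value" pairs from the head of the list, like the loop in main.
  // Returns Left(message) for anything it does not recognize.
  @scala.annotation.tailrec
  def parse(
      argv: List[String],
      acc: ExecutorArgs = ExecutorArgs()): Either[String, ExecutorArgs] =
    argv match {
      case "--driver-url" :: value :: tail =>
        parse(tail, acc.copy(driverUrl = Some(value)))
      case "--executor-id" :: value :: tail =>
        parse(tail, acc.copy(executorId = Some(value)))
      case "--hostname" :: value :: tail =>
        parse(tail, acc.copy(hostname = Some(value)))
      case "--cores" :: value :: tail =>
        // Like the original, a non-numeric value throws NumberFormatException.
        parse(tail, acc.copy(cores = value.toInt))
      case "--app-id" :: value :: tail =>
        parse(tail, acc.copy(appId = Some(value)))
      case "--user-class-path" :: value :: tail =>
        parse(tail, acc.copy(userClassPath = acc.userClassPath :+ value))
      case Nil => Right(acc)
      case rest => Left(s"Unrecognized options: ${rest.mkString(" ")}")
    }
}
```

An option that is missing its value (for example a trailing `--cores`) falls through to the last case, just as it reaches the "Unrecognized options" branch in `main`.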

class CoarseGrainedExecutorBackend

Here is the code of this class:

/**
 * This companion class and companion object
 * make up the main class used when the driver launches an executor on YARN.
 * @param rpcEnv
 * @param driverUrl
 * @param executorId
 * @param hostname
 * @param cores
 * @param userClassPath
 * @param env
 */

private[spark] class CoarseGrainedExecutorBackend(
    override val rpcEnv: RpcEnv,
    driverUrl: String,
    executorId: String,
    hostname: String,
    cores: Int,
    userClassPath: Seq[URL],
    env: SparkEnv)
  extends ThreadSafeRpcEndpoint with ExecutorBackend with Logging { // note: this is an RPC endpoint

  private[this] val stopping = new AtomicBoolean(false)
  var executor: Executor = null
  @volatile var driver: Option[RpcEndpointRef] = None

  // If this CoarseGrainedExecutorBackend is changed to support multiple threads, then this may need
  // to be changed so that we don't share the serializer instance across threads
  private[this] val ser: SerializerInstance = env.closureSerializer.newInstance()

  // Initialization method
  override def onStart() {
    /**
     * val driverUrl = RpcEndpointAddress(
     * _sparkConf.get("spark.driver.host"),
     * _sparkConf.get("spark.driver.port").toInt,
     *       CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
     */
    // Note that driverUrl addresses the CoarseGrainedSchedulerBackend endpoint, so on the driver side the messages below
    // are handled by the endpoint that CoarseGrainedSchedulerBackend registered: its inner class DriverEndpoint.
    logInfo("Connecting to driver: " + driverUrl)
    // asyncSetupEndpointRefByURI returns a Future[RpcEndpointRef]
    rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
      // This is a very fast action so we can use "ThreadUtils.sameThread"
      driver = Some(ref)
      // Register this executor with the driver; the handling logic is in DriverEndpoint, the inner class of CoarseGrainedSchedulerBackend.
      // Note on serializing RegisterExecutor: it carries executorRef: RpcEndpointRef, so an RpcEndpointRef has to be serialized; here that is a NettyRpcEndpointRef.
      // See the definition of NettyRpcEndpointRef: only its endpointAddress parameter is serialized, and the class overrides readObject and writeObject.
      ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
    }(ThreadUtils.sameThread).onComplete {
      // This is a very fast action so we can use "ThreadUtils.sameThread"
      // Wait asynchronously for the reply
      case Success(msg) =>
        // Always receive `true`. Just ignore it
      // If registration fails, the executor exits with exit code 1. Passing notifyDriver = true
      // would also notify the driver; here it is set to false.
      case Failure(e) =>
        exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
    }(ThreadUtils.sameThread)
  }

  def extractLogUrls: Map[String, String] = {
    val prefix = "SPARK_LOG_URL_"
    sys.env.filterKeys(_.startsWith(prefix))
      .map(e => (e._1.substring(prefix.length).toLowerCase(Locale.ROOT), e._2))
  }
  // Handle one-way messages from the driver
  override def receive: PartialFunction[Any, Unit] = {
    // Handling after the driver has reported that registration succeeded
    case RegisteredExecutor =>
      logInfo("Successfully registered with driver")
      try {
        executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false) // instantiate the Executor
      } catch {
        case NonFatal(e) =>
          exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
      }

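    // Handles the failure message returned when registration with the driver fails. This executor exits;
    // with exitExecutor's default notifyDriver = true, the driver can also be told to remove the executor.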
    case RegisterExecutorFailed(message) =>
      exitExecutor(1, "Slave registration failed: " + message)

    // Run a task assigned by the driver
    case LaunchTask(data) =>
      if (executor == null) {
        exitExecutor(1, "Received LaunchTask command but executor was null")
      } else {
        val taskDesc = TaskDescription.decode(data.value)
        logInfo("Got assigned task " + taskDesc.taskId)
        executor.launchTask(this, taskDesc)
      }
    // Handle a kill-task request from the driver
    case KillTask(taskId, _, interruptThread, reason) =>
      if (executor == null) {
        exitExecutor(1, "Received KillTask command but executor was null")
      } else {
        executor.killTask(taskId, interruptThread, reason)
      }

    case StopExecutor =>
      stopping.set(true)
      logInfo("Driver commanded a shutdown")
      // Cannot shutdown here because an ack may need to be sent back to the caller. So send
      // a message to self to actually do the shutdown.
      self.send(Shutdown) // sends Shutdown to this endpoint itself (self), not to the driver; it is handled by the Shutdown case

    case Shutdown =>
      stopping.set(true)
      new Thread("CoarseGrainedExecutorBackend-stop-executor") {
        override def run(): Unit = {
          // executor.stop() will call `SparkEnv.stop()` which waits until RpcEnv stops totally.
          // However, if `executor.stop()` runs in some thread of RpcEnv, RpcEnv won't be able to
          // stop until `executor.stop()` returns, which becomes a dead-lock (See SPARK-14180).
          // Therefore, we put this line in a new thread.
          executor.stop()
        }
      }.start()

    case UpdateDelegationTokens(tokenBytes) =>
      logInfo(s"Received tokens of ${tokenBytes.length} bytes")
      SparkHadoopUtil.get.addDelegationTokens(tokenBytes, env.conf)
  }

  override def onDisconnected(remoteAddress: RpcAddress): Unit = {
    if (stopping.get()) {
      logInfo(s"Driver from $remoteAddress disconnected during shutdown")
    } else if (driver.exists(_.address == remoteAddress)) {
      exitExecutor(1, s"Driver $remoteAddress disassociated! Shutting down.", null,
        notifyDriver = false)
    } else {
      logWarning(s"An unknown ($remoteAddress) driver disconnected.")
    }
  }

  // Send a task status update to the driver
  override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer) {
    val msg = StatusUpdate(executorId, taskId, state, data)
    driver match {
      case Some(driverRef) => driverRef.send(msg)
      case None => logWarning(s"Drop $msg because has not yet connected to driver")
    }
  }

  /**
   * This function can be overloaded by other child classes to handle
   * executor exits differently. For e.g. when an executor goes down,
   * back-end may not want to take the parent process down.
   */
  protected def exitExecutor(code: Int,
                             reason: String,
                             throwable: Throwable = null,
                             notifyDriver: Boolean = true) = {
    val message = "Executor self-exiting due to : " + reason
    if (throwable != null) {
      logError(message, throwable)
    } else {
      logError(message)
    }

    if (notifyDriver && driver.nonEmpty) {
      driver.get.send(RemoveExecutor(executorId, new ExecutorLossReason(reason)))
    }

    System.exit(code)
  }
}
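
The registration in `onStart` chains an asynchronous endpoint lookup with an `ask` via `flatMap`, then handles success or failure of the combined `Future`. A minimal sketch of that pattern follows. `DriverRef`, `lookup` and `register` are hypothetical stand-ins for `RpcEndpointRef`, `asyncSetupEndpointRefByURI` and the body of `onStart`. Unlike the original, which reacts with `onComplete` and exits the process, the sketch uses `transform` to return the outcome as a string so that it can be inspected.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

object RegistrationSketch {
  // Hypothetical stand-in for an RpcEndpointRef whose ask returns a Future.
  final class DriverRef(accept: Boolean) {
    def ask(executorId: String): Future[Boolean] =
      if (accept) Future.successful(true)
      else Future.failed(new IllegalStateException(s"driver rejected $executorId"))
  }

  // Hypothetical stand-in for the asynchronous endpoint lookup by URL.
  def lookup(url: String, accept: Boolean): Future[DriverRef] =
    Future.successful(new DriverRef(accept))

  def register(url: String, executorId: String, accept: Boolean)(
      implicit ec: ExecutionContext): Future[String] =
    lookup(url, accept)
      // Step 1: once the ref is available, ask the driver to register this executor.
      .flatMap(ref => ref.ask(executorId))
      // Step 2: map both outcomes to a description of what the backend would do.
      .transform {
        case Success(_) => Success("registered")
        case Failure(e) =>
          Success(s"exit(1): Cannot register with driver: $url (${e.getMessage})")
      }
}
```

A failure in either step (lookup or ask) ends up in the same `Failure` branch, which is why the original needs only one error handler for the whole chain.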

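Summary

In Spark yarn-cluster mode, two threads run on the driver side:
1. The AppMaster thread
2. The user thread
The user thread is started by the AppMaster thread, and the two threads interact with each other.
After starting the user thread, the AppMaster thread blocks until the user thread has finished initializing SparkContext. That initialization settles values such as spark.driver.host and spark.driver.port (port 0 is passed in at startup, so the system assigns a random port). The AppMaster thread is then woken up and the user thread is blocked; the AppMaster thread completes registerAM and requests resources, then wakes the user thread so that it continues. Finally, the AppMaster thread waits for the user thread to finish.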
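
The handshake between the two threads can be illustrated with a minimal sketch. It uses two `CountDownLatch` objects purely as an illustration of the ordering described above; this is an assumption made for the example and not necessarily the mechanism Spark's ApplicationMaster uses. `HandshakeSketch` and the event strings are hypothetical.

```scala
import java.util.concurrent.CountDownLatch
import scala.collection.mutable.ListBuffer

object HandshakeSketch {
  def run(): List[String] = {
    val events = ListBuffer.empty[String]
    def record(e: String): Unit = events.synchronized { events += e }

    val contextReady = new CountDownLatch(1) // signalled by the user thread
    val resumeUser = new CountDownLatch(1)   // signalled by the AppMaster thread

    val userThread = new Thread("user-thread") {
      override def run(): Unit = {
        record("user: SparkContext initialized")
        contextReady.countDown() // wake the AppMaster thread
        resumeUser.await()       // block until resources have been requested
        record("user: job runs")
      }
    }

    // The calling thread plays the role of the AppMaster thread.
    userThread.start()
    contextReady.await()         // block until SparkContext exists
    record("am: registerAM and request resources")
    resumeUser.countDown()       // let the user thread continue
    userThread.join()            // wait for the user code to finish
    record("am: user thread finished")

    events.synchronized { events.toList }
  }
}
```

Because each side waits on a latch before proceeding, the four events always occur in the same order regardless of thread scheduling.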
The executor launch command is:

val commands = prefixEnv ++
      Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
      javaOpts ++
      Seq("org.apache.spark.executor.CoarseGrainedExecutorBackend",
        "--driver-url", masterAddress,
        "--executor-id", executorId,
        "--hostname", hostname,
        "--cores", executorCores.toString,
        "--app-id", appId) ++
      userClassPath ++
      Seq(
        s"1>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stdout",
        s"2>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stderr")