Spark Source Code: Task Submission Flow (3) - ApplicationMaster


1. Overview

As analyzed in [spark源码-任务提交流程之YarnClusterApplication], while YarnClusterApplication runs, the task submission flow wraps the AM context parameters and then submits the application to the ResourceManager (RM); the amClass parameter is set as part of that context.

In yarn-cluster mode, amClass is org.apache.spark.deploy.yarn.ApplicationMaster, so the AM launch command is essentially bin/java org.apache.spark.deploy.yarn.ApplicationMaster. After the application is submitted to the RM, the RM picks a NodeManager (NM) node and launches the AM on it.

The following sections analyze the AM startup.

2. main: the entry point

Full class path: org.apache.spark.deploy.yarn.ApplicationMaster.

After ApplicationMaster starts, it does three things:

1. Parse the arguments;

2. Instantiate the AM;

3. Call run.

object ApplicationMaster extends Logging {
  def main(args: Array[String]): Unit = {
    SignalUtils.registerLogger(log)
    // Parse the arguments
    val amArgs = new ApplicationMasterArguments(args)
    // Instantiate the AM
    master = new ApplicationMaster(amArgs)
    // Run the AM and exit with its return code
    System.exit(master.run())
  }
}

2.1. Parsing and wrapping the AM arguments

The arguments define the application entry point, jars, user arguments, properties file, and so on.

The companion object also defines the default number of executors as 2.

class ApplicationMasterArguments(val args: Array[String]) {
  // The application jar, plus any jars included via options; set by the --jar argument
  var userJar: String = null
  // The application's entry point (main class)
  var userClass: String = null
  var primaryPyFile: String = null
  var primaryRFile: String = null
  // Application arguments
  var userArgs: Seq[String] = Nil
  // File containing extra properties
  var propertiesFile: String = null

  parseArgs(args.toList)

  private def parseArgs(inputArgs: List[String]): Unit = {
    val userArgsBuffer = new ArrayBuffer[String]()

    var args = inputArgs

    while (!args.isEmpty) {
      // --num-workers, --worker-memory, and --worker-cores are deprecated since 1.0,
      // the properties with executor in their names are preferred.
      args match {
        case ("--jar") :: value :: tail =>
          userJar = value
          args = tail

        case ("--class") :: value :: tail =>
          userClass = value
          args = tail

        case ("--primary-py-file") :: value :: tail =>
          primaryPyFile = value
          args = tail

        case ("--primary-r-file") :: value :: tail =>
          primaryRFile = value
          args = tail

        case ("--arg") :: value :: tail =>
          userArgsBuffer += value
          args = tail

        case ("--properties-file") :: value :: tail =>
          propertiesFile = value
          args = tail

        case _ =>
          printUsageAndExit(1, args)
      }
    }

    if (primaryPyFile != null && primaryRFile != null) {
      // scalastyle:off println
      System.err.println("Cannot have primary-py-file and primary-r-file at the same time")
      // scalastyle:on println
      System.exit(-1)
    }

    userArgs = userArgsBuffer.toList
  }

  def printUsageAndExit(exitCode: Int, unknownParam: Any = null) {
  	//..............
  }
}
object ApplicationMasterArguments {
  val DEFAULT_NUMBER_EXECUTORS = 2
}
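
As a quick illustration (not from the Spark source; the class name, jar path, user argument and properties file below are made-up placeholders), this is roughly the kind of argument list the AM receives in yarn-cluster mode and how parseArgs fills in the fields:

val amArgs = new ApplicationMasterArguments(Array(
  "--class", "com.example.WordCount",          // hypothetical user main class
  "--jar", "hdfs:///apps/wordcount.jar",       // hypothetical application jar
  "--arg", "hdfs:///input/words.txt",          // forwarded to the user class's main method
  "--properties-file", "spark_conf.properties" // hypothetical properties file
))
assert(amArgs.userClass == "com.example.WordCount")
assert(amArgs.userArgs == List("hdfs:///input/words.txt"))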

2.2. Instantiating the AM

Instantiating the AM performs the following steps:

Instantiate sparkConf and load the parameters from the properties file into it;

Copy the sparkConf parameters into system properties;

Instantiate securityMgr;

Instantiate the RM client;

Load the list of localized files set up by the client.

private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends Logging {

  // TODO: Currently, task to container is computed once (TaskSetManager) - which need not be
  // optimal as more containers are available. Might need to handle this better.

  private val isClusterMode = args.userClass != null
  // Instantiate the Spark config
  private val sparkConf = new SparkConf()
  if (args.propertiesFile != null) {
    // Cache the parameters from the configured properties file into sparkConf (backed by a HashMap)
    Utils.getPropertiesFromFile(args.propertiesFile).foreach { case (k, v) =>
      sparkConf.set(k, v)
    }
  }

  // Instantiate securityMgr from sparkConf
  private val securityMgr = new SecurityManager(sparkConf)

  private var metricsSystem: Option[MetricsSystem] = None

  // Copy the sparkConf parameters into system properties
  sparkConf.getAll.foreach { case (k, v) =>
    sys.props(k) = v
  }

  // Build the YarnConfiguration from sparkConf
  private val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))

  // Instantiate the user class loader
  private val userClassLoader = {
    val classpath = Client.getUserClasspath(sparkConf)
    val urls = classpath.map { entry =>
      new URL("file:" + new File(entry.getPath()).getAbsolutePath())
    }

    if (isClusterMode) {
      if (Client.isUserClassPathFirst(sparkConf, isDriver = true)) {
        new ChildFirstURLClassLoader(urls, Utils.getContextOrSparkClassLoader)
      } else {
        new MutableURLClassLoader(urls, Utils.getContextOrSparkClassLoader)
      }
    } else {
      new MutableURLClassLoader(urls, Utils.getContextOrSparkClassLoader)
    }
  }

  // Delegation token renewer (only when a keytab is configured)
  private val credentialRenewer: Option[AMCredentialRenewer] = sparkConf.get(KEYTAB).map { _ =>
    new AMCredentialRenewer(sparkConf, yarnConf)
  }

  // Use the UGI user as the user that runs this ApplicationMaster
  private val ugi = credentialRenewer match {
    case Some(cr) =>
      // Set the context class loader so that the token renewer has access to jars distributed
      // by the user.
      val currentLoader = Thread.currentThread().getContextClassLoader()
      Thread.currentThread().setContextClassLoader(userClassLoader)
      try {
        cr.start()
      } finally {
        Thread.currentThread().setContextClassLoader(currentLoader)
      }

    case _ =>
      SparkHadoopUtil.get.createSparkUser()
  }

  // Instantiate the RM client
  private val client = doAsUser { new YarnRMClient() }

  // Defaults to twice the number of executors (twice the maximum executor count if dynamic
  // allocation is enabled), with a minimum of 3
  private val maxNumExecutorFailures = {
    val effectiveNumExecutors =
      if (Utils.isDynamicAllocationEnabled(sparkConf)) {
        sparkConf.get(DYN_ALLOCATION_MAX_EXECUTORS)
      } else {
        sparkConf.get(EXECUTOR_INSTANCES).getOrElse(0)
      }
    // By default, effectiveNumExecutors is Int.MaxValue if dynamic allocation is enabled. We need
    // avoid the integer overflow here.
    val defaultMaxNumExecutorFailures = math.max(3,
      if (effectiveNumExecutors > Int.MaxValue / 2) Int.MaxValue else (2 * effectiveNumExecutors))

    sparkConf.get(MAX_EXECUTOR_FAILURES).getOrElse(defaultMaxNumExecutorFailures)
  }

  @volatile private var exitCode = 0
  @volatile private var unregistered = false
  @volatile private var finished = false
  @volatile private var finalStatus = getDefaultFinalStatus
  @volatile private var finalMsg: String = ""
  @volatile private var userClassThread: Thread = _

  @volatile private var reporterThread: Thread = _
  @volatile private var allocator: YarnAllocator = _

  // A flag to check whether user has initialized spark context
  @volatile private var registered = false

  // Lock for controlling the allocator (heartbeat) thread.
  private val allocatorLock = new Object()

  // Heartbeat interval
  private val heartbeatInterval = {
    // Ensure that progress is sent before YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS elapses.
    val expiryInterval = yarnConf.getInt(YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS, 120000)
    math.max(0, math.min(expiryInterval / 2, sparkConf.get(RM_HEARTBEAT_INTERVAL)))
  }

  // Initial wait interval before allocator polling, to allow a faster ramp-up while executors are being requested
  private val initialAllocationInterval = math.min(heartbeatInterval,
    sparkConf.get(INITIAL_HEARTBEAT_INTERVAL))

  // Next wait interval before the allocator polls again
  private var nextAllocationInterval = initialAllocationInterval

  private var rpcEnv: RpcEnv = null

  // In cluster mode, used to tell the AM when the user's SparkContext has been initialized.
  private val sparkContextPromise = Promise[SparkContext]()

  // Load the list of localized files set up by the client. It is used when launching executors, and is loaded here so that these settings don't pollute the Web UI's environment page in cluster mode
  private val localResources = doAsUser {
    logInfo("Preparing Local resources")
    val resources = HashMap[String, LocalResource]()

    def setupDistributedCache(
        file: String,
        rtype: LocalResourceType,
        timestamp: String,
        size: String,
        vis: String): Unit = {
      val uri = new URI(file)
      val amJarRsrc = Records.newRecord(classOf[LocalResource])
      amJarRsrc.setType(rtype)
      amJarRsrc.setVisibility(LocalResourceVisibility.valueOf(vis))
      amJarRsrc.setResource(ConverterUtils.getYarnUrlFromURI(uri))
      amJarRsrc.setTimestamp(timestamp.toLong)
      amJarRsrc.setSize(size.toLong)

      val fileName = Option(uri.getFragment()).getOrElse(new Path(uri).getName())
      resources(fileName) = amJarRsrc
    }

    val distFiles = sparkConf.get(CACHED_FILES)
    val fileSizes = sparkConf.get(CACHED_FILES_SIZES)
    val timeStamps = sparkConf.get(CACHED_FILES_TIMESTAMPS)
    val visibilities = sparkConf.get(CACHED_FILES_VISIBILITIES)
    val resTypes = sparkConf.get(CACHED_FILES_TYPES)

    for (i <- 0 to distFiles.size - 1) {
      val resType = LocalResourceType.valueOf(resTypes(i))
      setupDistributedCache(distFiles(i), resType, timeStamps(i).toString, fileSizes(i).toString,
      visibilities(i))
    }

    // Distribute the conf archive to executors.
    sparkConf.get(CACHED_CONF_ARCHIVE).foreach { path =>
      val uri = new URI(path)
      val fs = FileSystem.get(uri, yarnConf)
      val status = fs.getFileStatus(new Path(uri))
      // SPARK-16080: Make sure to use the correct name for the destination when distributing the
      // conf archive to executors.
      val destUri = new URI(uri.getScheme(), uri.getRawSchemeSpecificPart(),
        Client.LOCALIZED_CONF_DIR)
      setupDistributedCache(destUri.toString(), LocalResourceType.ARCHIVE,
        status.getModificationTime().toString, status.getLen.toString,
        LocalResourceVisibility.PRIVATE.name())
    }

    // Clean up the configuration so it doesn't show up in the Web UI (since it's really noisy).
    CACHE_CONFIGS.foreach { e =>
      sparkConf.remove(e)
      sys.props.remove(e.key)
    }

    resources.toMap
  }
}
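
As a worked example of the maxNumExecutorFailures default above (the numbers are invented for illustration):

// Illustrative only: static allocation with spark.executor.instances = 10
val effectiveNumExecutors = 10
val defaultMaxNumExecutorFailures = math.max(3,
  if (effectiveNumExecutors > Int.MaxValue / 2) Int.MaxValue else 2 * effectiveNumExecutors)
// defaultMaxNumExecutorFailures == 20; with dynamic allocation and the default (unbounded)
// maximum executor count, effectiveNumExecutors is Int.MaxValue and the overflow guard keeps
// the result at Int.MaxValue.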

2.3. Executing the AM's run method

For cluster mode, run sets system properties, builds the Spark caller context, and then calls runDriver.

private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends Logging {
	final def run(): Int = {
    doAsUser {
      // Execute the actual implementation, runImpl
      runImpl()
    }
    exitCode
  }

  private def runImpl(): Unit = {
    try {
      val appAttemptId = client.getAttemptId()

      var attemptID: Option[String] = None

      // Set system properties for cluster mode
      if (isClusterMode) {
        // Set the web ui port to be ephemeral for yarn so we don't conflict with
        // other spark processes running on the same box
        System.setProperty("spark.ui.port", "0")

        // Set the master and deploy mode property to match the requested mode.
        System.setProperty("spark.master", "yarn")
        System.setProperty("spark.submit.deployMode", "cluster")

        // Set this internal configuration if it is running on cluster mode, this
        // configuration will be checked in SparkContext to avoid misuse of yarn cluster mode.
        System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())

        attemptID = Option(appAttemptId.getAttemptId.toString)
      }

      // Set up the Spark caller context on HDFS and YARN; the context is built from the arguments passed in
      new CallerContext(
        "APPMASTER", sparkConf.get(APP_CALLER_CONTEXT),
        Option(appAttemptId.getApplicationId.toString), attemptID).setCurrentContext()

      logInfo("ApplicationAttemptId: " + appAttemptId)

      // This shutdown hook should run *after* the SparkContext is shut down.
      val priority = ShutdownHookManager.SPARK_CONTEXT_SHUTDOWN_PRIORITY - 1
      ShutdownHookManager.addShutdownHook(priority) { () =>
        val maxAppAttempts = client.getMaxRegAttempts(sparkConf, yarnConf)
        val isLastAttempt = client.getAttemptId().getAttemptId() >= maxAppAttempts

        if (!finished) {
          // The default state of ApplicationMaster is failed if it is invoked by shut down hook.
          // This behavior is different compared to 1.x version.
          // If user application is exited ahead of time by calling System.exit(N), here mark
          // this application as failed with EXIT_EARLY. For a good shutdown, user shouldn't call
          // System.exit(0) to terminate the application.
          finish(finalStatus,
            ApplicationMaster.EXIT_EARLY,
            "Shutdown hook called before final status was reported.")
        }

        if (!unregistered) {
          // we only want to unregister if we don't want the RM to retry
          if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) {
            unregister(finalStatus, finalMsg)
            cleanupStagingDir()
          }
        }
      }

      if (isClusterMode) {
        // yarn-cluster mode: run the driver
        runDriver()
      } else {
        runExecutorLauncher()
      }
    } catch {
      case e: Exception =>
        // catch everything else if not specifically handled
        logError("Uncaught exception: ", e)
        finish(FinalApplicationStatus.FAILED,
          ApplicationMaster.EXIT_UNCAUGHT_EXCEPTION,
          "Uncaught exception: " + StringUtils.stringifyException(e))
    } finally {
      try {
        metricsSystem.foreach { ms =>
          ms.report()
          ms.stop()
        }
      } catch {
        case e: Exception =>
          logWarning("Exception during stopping of the metric system: ", e)
      }
    }
  }
  
  private def doAsUser[T](fn: => T): T = {
    ugi.doAs(new PrivilegedExceptionAction[T]() {
      override def run: T = fn
    })
  }
}

2.3.1. runDriver

private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends Logging {
  private def runDriver(): Unit = {
    addAmIpFilter(None)
    // Start the user application and return the thread, i.e. start the driver thread
    userClassThread = startUserApplication()

    // This a bit hacky, but we need to wait until the spark.driver.port property has
    // been set by the Thread executing the user class.
    logInfo("Waiting for spark context initialization...")
    val totalWaitTime = sparkConf.get(AM_MAX_WAIT_TIME)
    try {
      val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
        Duration(totalWaitTime, TimeUnit.MILLISECONDS))
      if (sc != null) {
        rpcEnv = sc.env.rpcEnv

        val userConf = sc.getConf
        val host = userConf.get("spark.driver.host")
        val port = userConf.get("spark.driver.port").toInt
        // Register the AM with the RM
        registerAM(host, port, userConf, sc.ui.map(_.webUrl))

        val driverRef = rpcEnv.setupEndpointRef(
          RpcAddress(host, port),
          YarnSchedulerBackend.ENDPOINT_NAME)
        // Request resources
        createAllocator(driverRef, userConf)
      } else {
        // Sanity check; should never happen in normal operation, since sc should only be null
        // if the user app did not create a SparkContext.
        throw new IllegalStateException("User did not initialize spark context!")
      }
      resumeDriver()
      userClassThread.join()
    } catch {
      case e: SparkException if e.getCause().isInstanceOf[TimeoutException] =>
        logError(
          s"SparkContext did not initialize after waiting for $totalWaitTime ms. " +
           "Please check earlier log output for errors. Failing the application.")
        finish(FinalApplicationStatus.FAILED,
          ApplicationMaster.EXIT_SC_NOT_INITED,
          "Timed out waiting for SparkContext.")
    } finally {
      resumeDriver()
    }
  }
}
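
runDriver blocks on sparkContextPromise, so something must complete it. In cluster mode this happens from inside the user's SparkContext: the YARN cluster scheduler's post-start hook calls back into the AM, roughly as sketched below (a simplified sketch of the corresponding ApplicationMaster helpers, not verbatim source):

  // Called (indirectly) during SparkContext initialization in cluster mode
  private def sparkContextInitialized(sc: SparkContext) = {
    sparkContextPromise.synchronized {
      // Wake up runDriver's awaitResult with the now-initialized SparkContext
      sparkContextPromise.success(sc)
      // Park the driver thread until runDriver has registered the AM and created the allocator
      sparkContextPromise.wait()
    }
  }

  private def resumeDriver(): Unit = {
    sparkContextPromise.synchronized {
      // Let the user class continue past SparkContext construction
      sparkContextPromise.notify()
    }
  }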
2.3.1.1. startUserApplication: starting the driver thread

"Driver" is simply the name of the thread that executes the user class code;

The user class is determined by the class path given via the spark-submit --class argument;

The user class's main method is obtained via reflection;

A thread is created to execute that main method;

The driver thread is a child thread of the AM process.

private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends Logging {
	private def startUserApplication(): Thread = {
    logInfo("Starting the user application in a separate Thread")

    // Resolve the user arguments
    var userArgs = args.userArgs
    if (args.primaryPyFile != null && args.primaryPyFile.endsWith(".py")) {
      // When running pyspark, the app is run using PythonRunner. The second argument is the list
      // of files to add to PYTHONPATH, which Client.scala already handles, so it's empty.
      userArgs = Seq(args.primaryPyFile, "") ++ userArgs
    }
    if (args.primaryRFile != null && args.primaryRFile.endsWith(".R")) {
      // TODO(davies): add R dependencies here
    }

    // Get the user class's main method via reflection
    val mainMethod = userClassLoader.loadClass(args.userClass)
      .getMethod("main", classOf[Array[String]])

    // Create a new thread
    val userThread = new Thread {
      override def run() {
        try {
          if (!Modifier.isStatic(mainMethod.getModifiers)) {
            logError(s"Could not find static main method in object ${args.userClass}")
            finish(FinalApplicationStatus.FAILED, ApplicationMaster.EXIT_EXCEPTION_USER_CLASS)
          } else {
            // Invoke the main method via reflection
            mainMethod.invoke(null, userArgs.toArray)
            finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
            logDebug("Done running user class")
          }
        } catch {
          case e: InvocationTargetException =>
            e.getCause match {
              case _: InterruptedException =>
                // Reporter thread can interrupt to stop user class
              case SparkUserAppException(exitCode) =>
                val msg = s"User application exited with status $exitCode"
                logError(msg)
                finish(FinalApplicationStatus.FAILED, exitCode, msg)
              case cause: Throwable =>
                logError("User class threw exception: " + cause, cause)
                finish(FinalApplicationStatus.FAILED,
                  ApplicationMaster.EXIT_EXCEPTION_USER_CLASS,
                  "User class threw exception: " + StringUtils.stringifyException(cause))
            }
            sparkContextPromise.tryFailure(e.getCause())
        } finally {
          // Notify the thread waiting for the SparkContext, in case the application did not
          // instantiate one. This will do nothing when the user code instantiates a SparkContext
          // (with the correct master), or when the user code throws an exception (due to the
          // tryFailure above).
          sparkContextPromise.trySuccess(null)
        }
      }
    }
    userThread.setContextClassLoader(userClassLoader)
    // Name the thread "Driver": the driver is a child thread of the AM process that executes the user class's main method
    userThread.setName("Driver")
    // Start the thread: its run method invokes the main method defined by the user class
    userThread.start()
    // Return the driver thread
    userThread
  }
}
2.3.1.2. Registering the AM with the RM

The AM registers itself with the RM over RPC.

Registration is complete once the AM's host, port and externally exposed tracking web URL have been sent to the RM and the RM's response has been received.

private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends Logging {
	private def registerAM(
      host: String,     // the driver's host
      port: Int,        // the driver's port
      _sparkConf: SparkConf,
      uiAddress: Option[String]): Unit = {
    val appId = client.getAttemptId().getApplicationId().toString()
    val attemptId = client.getAttemptId().getAttemptId().toString()
    val historyAddress = ApplicationMaster
      .getHistoryServerAddress(_sparkConf, yarnConf, appId, attemptId)

    // Call YarnRMClient's register method to perform the registration
    client.register(host, port, yarnConf, _sparkConf, uiAddress, historyAddress)
    registered = true
  }
}

private[spark] class YarnRMClient extends Logging {
  def register(
      driverHost: String,
      driverPort: Int,
      conf: YarnConfiguration,
      sparkConf: SparkConf,
      uiAddress: Option[String],
      uiHistoryAddress: String): Unit = {
    // Build and start the AMRMClient used for AM-RM communication
    amClient = AMRMClient.createAMRMClient()
    amClient.init(conf)
    amClient.start()
    this.uiHistoryAddress = uiHistoryAddress

    val trackingUrl = uiAddress.getOrElse {
      if (sparkConf.get(ALLOW_HISTORY_SERVER_TRACKING_URL)) uiHistoryAddress else ""
    }

    logInfo("Registering the ApplicationMaster")
    synchronized {
      // Register the AM
      amClient.registerApplicationMaster(driverHost, driverPort, trackingUrl)
      registered = true
    }
  }
}

// AMRMClientImpl is one implementation of AMRMClient
public class AMRMClientImpl<T extends ContainerRequest> extends AMRMClient<T> {
  // Register the AM: bind the AM's host, port and appTrackingUrl to the AMRMClient, preparing the registration environment
  public RegisterApplicationMasterResponse registerApplicationMaster(String appHostName, int appHostPort, String appTrackingUrl) throws YarnException, IOException {
        this.appHostName = appHostName;
        this.appHostPort = appHostPort;
        this.appTrackingUrl = appTrackingUrl;
        Preconditions.checkArgument(appHostName != null, "The host name should not be null");
        Preconditions.checkArgument(appHostPort >= -1, "Port number of the host should be any integers larger than or equal to -1");
        // Once everything is prepared, call the actual registration logic
        return this.registerApplicationMaster();
    }

    // The actual AM registration logic
    private RegisterApplicationMasterResponse registerApplicationMaster() throws YarnException, IOException {
        // Wrap the registration info into a request
        RegisterApplicationMasterRequest request = RegisterApplicationMasterRequest.newInstance(this.appHostName, this.appHostPort, this.appTrackingUrl);
        // Register via RPC and wrap the response
        RegisterApplicationMasterResponse response = this.rmClient.registerApplicationMaster(request);
        synchronized(this) {
            this.lastResponseId = 0;
            if (!response.getNMTokensFromPreviousAttempts().isEmpty()) {
                this.populateNMTokens(response.getNMTokensFromPreviousAttempts());
            }

            return response;
        }
    }
}
2.3.1.2.1. RegisterApplicationMasterRequest: wrapping the registration request

It mainly wraps the host, port and tracking web URL.

public abstract class RegisterApplicationMasterRequest {
  public static RegisterApplicationMasterRequest newInstance(String host, int port, String trackingUrl) {
        RegisterApplicationMasterRequest request = (RegisterApplicationMasterRequest)Records.newRecord(RegisterApplicationMasterRequest.class);
        // Host of the node where the ApplicationMaster is running
        request.setHost(host);
        // RPC port the ApplicationMaster exposes for this run
        request.setRpcPort(port);
        // Tracking web URL exposed by the ApplicationMaster; users can check the application's status through it
        request.setTrackingUrl(trackingUrl);
        return request;
    }
}
2.3.1.2.2. RegisterApplicationMasterResponse: the registration response

The important pieces of the response are the maximum resources a single container may request and the application access control lists.

public class RegisterApplicationMasterResponsePBImpl extends RegisterApplicationMasterResponse {
    Builder builder = null;
    boolean viaProto = false;
    // Maximum resources that a single requested container may occupy
    private Resource maximumResourceCapability;
    // Application access control lists
    private Map<ApplicationAccessType, String> applicationACLS = null;
    private List<Container> containersFromPreviousAttempts = null;
    private List<NMToken> nmTokens = null;
    private EnumSet<SchedulerResourceTypes> schedulerResourceTypes = null;
}
2.3.1.3. The AM requests resources from the RM
private[spark] class ApplicationMaster(args: ApplicationMasterArguments) extends Logging {
	private def createAllocator(driverRef: RpcEndpointRef, _sparkConf: SparkConf): Unit = {
    val appId = client.getAttemptId().getApplicationId().toString()
    val driverUrl = RpcEndpointAddress(driverRef.address.host, driverRef.address.port,
      CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString

    // Before we initialize the allocator, let's log the information about how executors will
    // be run up front, to avoid printing this out for every single executor being launched.
    // Use placeholders for information that changes such as executor IDs.
    logInfo {
      val executorMemory = _sparkConf.get(EXECUTOR_MEMORY).toInt
      val executorCores = _sparkConf.get(EXECUTOR_CORES)
      val dummyRunner = new ExecutorRunnable(None, yarnConf, _sparkConf, driverUrl, "<executorId>",
        "<hostname>", executorMemory, executorCores, appId, securityMgr, localResources)
      dummyRunner.launchContextDebugInfo()
    }

    // Create the allocator
    allocator = client.createAllocator(
      yarnConf,
      _sparkConf,
      driverUrl,
      driverRef,
      securityMgr,
      localResources)

    credentialRenewer.foreach(_.setDriverRef(driverRef))

    // Initialize the AM endpoint *after* the allocator has been initialized. This ensures
    // that when the driver sends an initial executor request (e.g. after an AM restart),
    // the allocator is ready to service requests.
    rpcEnv.setupEndpoint("YarnAM", new AMEndpoint(rpcEnv, driverRef))

    // The allocator requests and allocates resources
    allocator.allocateResources()
    val ms = MetricsSystem.createMetricsSystem("applicationMaster", sparkConf, securityMgr)
    val prefix = _sparkConf.get(YARN_METRICS_NAMESPACE).getOrElse(appId)
    ms.registerSource(new ApplicationMasterSource(prefix, allocator))
    // do not register static sources in this case as per SPARK-25277
    ms.start(false)
    metricsSystem = Some(ms)
    reporterThread = launchReporterThread()
  }
}
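
launchReporterThread is not expanded in this post. Roughly speaking (a simplified sketch, not verbatim source), it starts a daemon thread that keeps the RM heartbeat alive by calling allocator.allocateResources() in a loop, fails the application when too many executors have died, and sleeps between iterations according to the intervals computed in the constructor:

  // Simplified sketch of the reporter loop
  private def launchReporterThread(): Thread = {
    val t = new Thread {
      override def run(): Unit = {
        while (!finished) {
          if (allocator.getNumExecutorsFailed >= maxNumExecutorFailures) {
            finish(FinalApplicationStatus.FAILED,
              ApplicationMaster.EXIT_MAX_EXECUTOR_FAILURES,
              s"Max number of executor failures ($maxNumExecutorFailures) reached")
          } else {
            // Heartbeat with the RM and pick up newly allocated containers
            allocator.allocateResources()
          }
          allocatorLock.synchronized {
            // Sleep for the heartbeat/allocation interval; a new resource request from the
            // driver can notify this lock to wake the loop up earlier
            allocatorLock.wait(heartbeatInterval)
          }
        }
      }
    }
    t.setDaemon(true)
    t.setName("Reporter")
    t.start()
    t
  }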
2.3.1.3.1. createAllocator: creating the allocator
private[spark] class YarnRMClient extends Logging {
  def createAllocator(
      conf: YarnConfiguration,
      sparkConf: SparkConf,
      driverUrl: String,
      driverRef: RpcEndpointRef,
      securityMgr: SecurityManager,
      localResources: Map[String, LocalResource]): YarnAllocator = {
    require(registered, "Must register AM before creating allocator.")
    // Instantiate a YarnAllocator, which requests container resources from the RM
    new YarnAllocator(driverUrl, driverRef, conf, sparkConf, amClient, getAttemptId(), securityMgr,
      localResources, new SparkRackResolver())
  }
}
2.3.1.3.2. allocateResources: the allocator requests and allocates resources
private[yarn] class YarnAllocator(
    driverUrl: String,
    driverRef: RpcEndpointRef,
    conf: YarnConfiguration,
    sparkConf: SparkConf,
    amClient: AMRMClient[ContainerRequest],
    appAttemptId: ApplicationAttemptId,
    securityMgr: SecurityManager,
    localResources: Map[String, LocalResource],
    resolver: SparkRackResolver,
    clock: Clock = new SystemClock)
  extends Logging {
    
    def allocateResources(): Unit = synchronized {
    // Update the container request list: based on the number of currently running executors and
    // the total number of executors required, synchronize the number of containers requested
    // from the ResourceManager.
    updateResourceRequests()

    val progressIndicator = 0.1f
    // Request containers from the ResourceManager and get the allocation response
    val allocateResponse = amClient.allocate(progressIndicator)
    // Get the list of allocated containers
    val allocatedContainers = allocateResponse.getAllocatedContainers()
    // Track blacklisted nodes
    allocatorBlacklistTracker.setNumClusterNodes(allocateResponse.getNumClusterNodes)

    // If any containers were allocated, process them
    if (allocatedContainers.size > 0) {
      logDebug(("Allocated containers: %d. Current executor count: %d. " +
        "Launching executor count: %d. Cluster resources: %s.")
        .format(
          allocatedContainers.size,
          runningExecutors.size,
          numExecutorsStarting.get,
          allocateResponse.getAvailableResources))
      // Process the containers received from the ResourceManager and launch executors in them
      handleAllocatedContainers(allocatedContainers.asScala)
    }

    // Get the list of completed containers (which may include failed ones)
    val completedContainers = allocateResponse.getCompletedContainersStatuses()
    if (completedContainers.size > 0) {
      logDebug("Completed %d containers".format(completedContainers.size))
      // Process the completed containers
      processCompletedContainers(completedContainers.asScala)
      logDebug("Finished processing %d completed containers. Current running executor count: %d."
        .format(completedContainers.size, runningExecutors.size))
    }
  }
}
2.3.1.3.2.1. updateResourceRequests: updating the container requests

Based on the number of currently running executors and the total number of executors required, this method synchronizes the number of containers requested from the ResourceManager.

Using the per-host task counts, the pending container requests are split into three groups:

requests whose locality still matches (localRequests);

requests whose locality no longer matches (staleRequests);

requests with no locality preference (anyHostRequests).

The two groups other than the locality-matching one are cancelled and re-issued; locality is then recomputed according to the container placement strategy so as to maximize data-local task execution.

In short, the method cares about container-request locality: it raises it wherever possible and cancels requests whose locality can no longer be satisfied.

When the number of existing executors exceeds the number needed, the surplus container requests are removed from the pending requests.

private[yarn] class YarnAllocator(
    driverUrl: String,
    driverRef: RpcEndpointRef,
    conf: YarnConfiguration,
    sparkConf: SparkConf,
    amClient: AMRMClient[ContainerRequest],
    appAttemptId: ApplicationAttemptId,
    securityMgr: SecurityManager,
    localResources: Map[String, LocalResource],
    resolver: SparkRackResolver,
    clock: Clock = new SystemClock)
  extends Logging {
    
  def updateResourceRequests(): Unit = {
    // The sequence of pending container requests
    val pendingAllocate = getPendingAllocate
    val numPendingAllocate = pendingAllocate.size
    // Compute the number of missing executors
    val missing = targetNumExecutors - numPendingAllocate -
      numExecutorsStarting.get - runningExecutors.size
    logDebug(s"Updating resource requests, target: $targetNumExecutors, " +
      s"pending: $numPendingAllocate, running: ${runningExecutors.size}, " +
      s"executorsStarting: ${numExecutorsStarting.get}")

    // Split the pending container requests into three groups:
    //   localRequests:   requests whose locality still matches
    //   staleRequests:   requests whose locality no longer matches
    //   anyHostRequests: requests with no locality preference
    val (localRequests, staleRequests, anyHostRequests) = splitPendingAllocationsByLocality(
      hostToLocalTaskCounts, pendingAllocate)

    if (missing > 0) {
      logInfo(s"Will request $missing executor container(s), each with " +
        s"${resource.getVirtualCores} core(s) and " +
        s"${resource.getMemory} MB memory (including $memoryOverhead MB of overhead)")

      // Cancel the requests whose locality no longer matches
      staleRequests.foreach { stale =>
        amClient.removeContainerRequest(stale)
      }
      val cancelledContainers = staleRequests.size
      if (cancelledContainers > 0) {
        logInfo(s"Canceled $cancelledContainers container request(s) (locality no longer needed)")
      }

      // Compute the number of available container slots
      val availableContainers = missing + cancelledContainers

      // Compute the number of potential containers: include the any-host requests so that locality can be improved as much as possible
      val potentialContainers = availableContainers + anyHostRequests.size

      // Recompute each container's node locality and rack locality (other nodes in the same rack)
      val containerLocalityPreferences = containerPlacementStrategy.localityOfRequestedContainers(
        potentialContainers, numLocalityAwareTasks, hostToLocalTaskCounts,
          allocatedHostToContainersMap, localRequests)

      // Re-create the container requests according to the computed locality
      val newLocalityRequests = new mutable.ArrayBuffer[ContainerRequest]
      containerLocalityPreferences.foreach {
        case ContainerLocalityPreferences(nodes, racks) if nodes != null =>
          newLocalityRequests += createContainerRequest(resource, nodes, racks)
        case _ =>
      }

      // The currently available container slots can satisfy all of the new container requests
      if (availableContainers >= newLocalityRequests.size) {
        for (i <- 0 until (availableContainers - newLocalityRequests.size)) {
          newLocalityRequests += createContainerRequest(resource, null, null)
        }
      } else {
        // The available slots cannot satisfy all of the new container requests; the unsatisfied
        // requests would place containers on nodes in other racks, so they are cancelled here in
        // order to obtain better locality
        val numToCancel = newLocalityRequests.size - availableContainers
        anyHostRequests.slice(0, numToCancel).foreach { nonLocal =>
          amClient.removeContainerRequest(nonLocal)
        }
        if (numToCancel > 0) {
          logInfo(s"Canceled $numToCancel unlocalized container requests to resubmit with locality")
        }
      }

      // Re-add the container requests, i.e. hand them over to the RM
      newLocalityRequests.foreach { request =>
        amClient.addContainerRequest(request)
      }

      if (log.isInfoEnabled()) {
        val (localized, anyHost) = newLocalityRequests.partition(_.getNodes() != null)
        if (anyHost.nonEmpty) {
          logInfo(s"Submitted ${anyHost.size} unlocalized container requests.")
        }
        localized.foreach { request =>
          logInfo(s"Submitted container request for host ${hostStr(request)}.")
        }
      }
    } else if (numPendingAllocate > 0 && missing < 0) {
      // The pending + starting + running executor count exceeds the number of executors needed,
      // so cancel the surplus pending requests
      val numToCancel = math.min(numPendingAllocate, -missing)
      logInfo(s"Canceling requests for $numToCancel executor container(s) to have a new desired " +
        s"total $targetNumExecutors executors.")
      // cancel pending allocate requests by taking locality preference into account
      val cancelRequests = (staleRequests ++ anyHostRequests ++ localRequests).take(numToCancel)
      cancelRequests.foreach(amClient.removeContainerRequest)
    }
  }
}
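
A quick worked example of the bookkeeping above (all numbers invented):

// targetNumExecutors = 10, pending = 4, starting = 2, running = 3
val missing = 10 - 4 - 2 - 3   // = 1, so one more locality-aware container request is issued
// If running were 8 instead, missing would be -4 and numToCancel = math.min(4, 4) = 4 pending
// requests would be removed, taking stale requests first, then any-host, then local ones.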
2.3.1.3.2.2. handleAllocatedContainers: handling the allocated containers

Containers are matched and selected at three levels: same node, same rack, and any host (other racks);

Executors are then launched in the selected containers.

private[yarn] class YarnAllocator(
    driverUrl: String,
    driverRef: RpcEndpointRef,
    conf: YarnConfiguration,
    sparkConf: SparkConf,
    amClient: AMRMClient[ContainerRequest],
    appAttemptId: ApplicationAttemptId,
    securityMgr: SecurityManager,
    localResources: Map[String, LocalResource],
    resolver: SparkRackResolver,
    clock: Clock = new SystemClock)
  extends Logging {
  
  def handleAllocatedContainers(allocatedContainers: Seq[Container]): Unit = {
    // The list of containers to use
    val containersToUse = new ArrayBuffer[Container](allocatedContainers.size)

    // Match containers against the pending requests by host to select usable containers
    val remainingAfterHostMatches = new ArrayBuffer[Container]
    for (allocatedContainer <- allocatedContainers) {
      matchContainerToRequest(allocatedContainer, allocatedContainer.getNodeId.getHost,
        containersToUse, remainingAfterHostMatches)
    }

    // Match the remaining containers by rack to select usable containers; done in a separate thread
    val remainingAfterRackMatches = new ArrayBuffer[Container]
    if (remainingAfterHostMatches.nonEmpty) {
      var exception: Option[Throwable] = None
      val thread = new Thread("spark-rack-resolver") {
        override def run(): Unit = {
          try {
            for (allocatedContainer <- remainingAfterHostMatches) {
              val rack = resolver.resolve(conf, allocatedContainer.getNodeId.getHost)
              matchContainerToRequest(allocatedContainer, rack, containersToUse,
                remainingAfterRackMatches)
            }
          } catch {
            case e: Throwable =>
              exception = Some(e)
          }
        }
      }
      thread.setDaemon(true)
      thread.start()

      try {
        thread.join()
      } catch {
        case e: InterruptedException =>
          thread.interrupt()
          throw e
      }

      if (exception.isDefined) {
        throw exception.get
      }
    }

    // Match the containers that matched neither node nor rack against any-host requests
    val remainingAfterOffRackMatches = new ArrayBuffer[Container]
    for (allocatedContainer <- remainingAfterRackMatches) {
      matchContainerToRequest(allocatedContainer, ANY_HOST, containersToUse,
        remainingAfterOffRackMatches)
    }

    // Containers still unmatched after the node, rack and any-host passes are released internally
    if (!remainingAfterOffRackMatches.isEmpty) {
      logDebug(s"Releasing ${remainingAfterOffRackMatches.size} unneeded containers that were " +
        s"allocated to us")
      for (container <- remainingAfterOffRackMatches) {
        internalReleaseContainer(container)
      }
    }
    // Launch executors in the matched containers
    runAllocatedContainers(containersToUse)

    logInfo("Received %d containers from YARN, launching executors on %d of them."
      .format(allocatedContainers.size, containersToUse.size))
  }
}
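
runAllocatedContainers is not expanded in this post. In outline (a simplified sketch based on the ExecutorRunnable usage shown earlier, not verbatim source; executorMemory and executorCores stand for the per-executor settings held by YarnAllocator), each matched container is handed to a launcher thread pool, and an ExecutorRunnable builds the container launch context and asks that container's NodeManager to start a CoarseGrainedExecutorBackend process:

  // Simplified sketch of launching executors in the matched containers
  private def runAllocatedContainers(containersToUse: ArrayBuffer[Container]): Unit = {
    for (container <- containersToUse) {
      executorIdCounter += 1
      val executorId = executorIdCounter.toString
      numExecutorsStarting.incrementAndGet()
      launcherPool.execute(new Runnable {
        override def run(): Unit = {
          // Builds the ContainerLaunchContext and asks the NodeManager to start the
          // CoarseGrainedExecutorBackend JVM inside the allocated container
          new ExecutorRunnable(Some(container), conf, sparkConf, driverUrl, executorId,
            container.getNodeId.getHost, executorMemory, executorCores,
            appAttemptId.getApplicationId.toString, securityMgr, localResources).run()
        }
      })
    }
  }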

3. Execution flow

(Execution flow diagram from the original post; the image is no longer available.)

4. Summary

The AM does three things:

1. Create and start the driver thread, which executes the main method defined by the user class;

2. Register the AM with the RM;

3. Request resources from the RM, select usable containers according to the matching rules, and launch executors in the allocated containers.

The driver is a thread inside the AM process that executes the user class's main method; it is started right after it is defined, and its run method invokes the main method defined by the user class.

When the AM registers with the RM, it passes the AM's host, port and tracking web URL to the RM; the RM responds with the maximum resources a single container may request and the application access control lists.

Resource acquisition works as follows: the AM creates a resource allocator, which requests resources from the RM, filters the allocated containers, and launches executors inside them.

5. References

spark源码-任务提交流程之YarnClusterApplication

Spark内核之YARN Cluster模式源码详解(Submit详解)

yarn2.7源码分析之ApplicationMaster与ResourceManager.ApplicationMasterService的通信

Yarn的ApplicationMaster介绍

2,spark源码分析-ApplicationMaster启动

spark源码跟踪(十一)ApplicationMaster中的关键线程

Spark源码——Spark on YARN Container资源申请分配、Executor的启动

Yarn源码剖析(四)-- AM的注册与资源调度申请Container及启动

Spark源码——Spark on YARN Executor执行Task的过程
