spark-submit has three submission modes.
Let's look at one of them, yarn-cluster:
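As a reference point before diving into the source, here is a minimal sketch of issuing a yarn-cluster submission programmatically through the SparkLauncher API (jar path, class name, and arguments are hypothetical placeholders); it is equivalent to spark-submit --master yarn --deploy-mode cluster:

import org.apache.spark.launcher.SparkLauncher

val handle = new SparkLauncher()
  .setAppResource("/path/to/app.jar")   // hypothetical primary resource
  .setMainClass("com.example.Main")     // hypothetical user main class
  .setMaster("yarn")
  .setDeployMode("cluster")
  .addAppArgs("arg1", "arg2")
  .startApplication()                   // returns a SparkAppHandle to track the app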
1. When the SparkSubmit class starts, it first prepares the runtime environment via prepareSubmitEnvironment:
private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
// Resolve the environment: deploy mode, classpath, conf, and the main class to launch
val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)
...
prepareSubmitEnvironment returns childMainClass, the client class that is later loaded by reflection and started. The mapping is done in the code below:
// In standalone cluster mode, childMainClass is either RestSubmissionClientApp or ClientApp
if (args.isStandaloneCluster) {
if (args.useRest) {
childMainClass = REST_CLUSTER_SUBMIT_CLASS
childArgs += (args.primaryResource, args.mainClass)
} else {
// In legacy standalone cluster mode, use Client as a wrapper around the user class
childMainClass = STANDALONE_CLUSTER_SUBMIT_CLASS
if (args.supervise) { childArgs += "--supervise" }
Option(args.driverMemory).foreach { m => childArgs += ("--memory", m) }
Option(args.driverCores).foreach { c => childArgs += ("--cores", c) }
childArgs += "launch"
childArgs += (args.master, args.primaryResource, args.mainClass)
}
if (args.childArgs != null) {
childArgs ++= args.childArgs
}
}
...
// In yarn-cluster mode, childMainClass is org.apache.spark.deploy.yarn.YarnClusterApplication
if (isYarnCluster) {
childMainClass = YARN_CLUSTER_SUBMIT_CLASS
if (args.isPython) {
childArgs += ("--primary-py-file", args.primaryResource)
childArgs += ("--class", "org.apache.spark.deploy.PythonRunner")
} else if (args.isR) {
val mainFile = new Path(args.primaryResource).getName
childArgs += ("--primary-r-file", mainFile)
childArgs += ("--class", "org.apache.spark.deploy.RRunner")
} else {
if (args.primaryResource != SparkLauncher.NO_RESOURCE) {
childArgs += ("--jar", args.primaryResource)
}
childArgs += ("--class", args.mainClass)
}
if (args.childArgs != null) {
args.childArgs.foreach { arg => childArgs += ("--arg", arg) }
}
}
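For a plain JVM application, say one submitted with --class com.example.Main and app.jar (hypothetical names), the yarn-cluster branch above yields childArgs of the form:

// Illustrative result of the branch above (hypothetical values):
val childArgs = Seq(
  "--jar",   "app.jar",           // args.primaryResource
  "--class", "com.example.Main",  // args.mainClass
  "--arg",   "arg1",              // one "--arg" pair per user argument
  "--arg",   "arg2")

These flags are later parsed by ClientArguments inside the YARN Client.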
Once the client class is known, it is loaded via reflection and instantiated:
mainClass = Utils.classForName(childMainClass)
...
// app is an instance of the class resolved above
val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
mainClass.newInstance().asInstanceOf[SparkApplication]
} else {
// SPARK-4170
if (classOf[scala.App].isAssignableFrom(mainClass)) {
logWarning("Subclasses of scala.App may not work correctly. Use a main() method instead.")
}
new JavaMainApplication(mainClass)
}
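If the resolved class does not implement SparkApplication, it is wrapped in JavaMainApplication, which invokes the class's static main method reflectively. A minimal sketch of that wrapper pattern (simplified, not Spark's exact code):

import java.lang.reflect.Modifier

// Simplified stand-in for JavaMainApplication: reflectively call
// the wrapped class's static main(Array[String]).
class MainWrapper(klass: Class[_]) {
  def start(args: Array[String]): Unit = {
    val mainMethod = klass.getMethod("main", classOf[Array[String]])
    require(Modifier.isStatic(mainMethod.getModifiers), "main method must be static")
    mainMethod.invoke(null, args)
  }
}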
...
// Start the app: the subclass's overridden start() registers and launches the driver
try {
app.start(childArgs.toArray, sparkConf)
} catch {
case t: Throwable =>
throw findCause(t)
}
The previous post covered client.start in standalone cluster mode, i.e. how the driver is created and registered there. This post looks at the corresponding startup in yarn-cluster mode. The code is YarnClusterApplication:
private[spark] class YarnClusterApplication extends SparkApplication {
override def start(args: Array[String], conf: SparkConf): Unit = {
// SparkSubmit would use yarn cache to distribute files & jars in yarn mode,
// so remove them from sparkConf here for yarn mode.
conf.remove("spark.jars")
conf.remove("spark.files")
// Entry point: run()
new Client(new ClientArguments(args), conf).run()
}
}
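For reference, SparkApplication (in org.apache.spark.deploy) is a single-method trait; ClientApp, RestSubmissionClientApp, and YarnClusterApplication all implement it:

private[spark] trait SparkApplication {
  def start(args: Array[String], conf: SparkConf): Unit
}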
-->
def run(): Unit = {
// Submit the application to YARN
this.appId = submitApplication()
...
submitApplication() assembles the YARN submission context and performs the actual submission:
def submitApplication(): ApplicationId = {
var appId: ApplicationId = null
try {
launcherBackend.connect()
yarnClient.init(hadoopConf)
yarnClient.start()
logInfo("Requesting a new application from cluster with %d NodeManagers"
.format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
// Get a new application from our RM
// Request a new application; the response carries the application ID
val newApp = yarnClient.createApplication()
val newAppResponse = newApp.getNewApplicationResponse()
appId = newAppResponse.getApplicationId()
new CallerContext("CLIENT", sparkConf.get(APP_CALLER_CONTEXT),
Option(appId.toString)).setCurrentContext()
// Verify whether the cluster has enough resources for our AM
// Resource validation
verifyClusterResources(newAppResponse)
// Set up the appropriate contexts to launch our AM
// Build the ContainerLaunchContext holding the AM container's launch parameters
val containerContext = createContainerLaunchContext(newAppResponse)
// Build the ApplicationSubmissionContext that is finally submitted to YARN
val appContext = createApplicationSubmissionContext(newApp, containerContext)
// Finally, submit and monitor the application
logInfo(s"Submitting application $appId to ResourceManager")
// Submit to YARN; the RM will then launch the AM
yarnClient.submitApplication(appContext)
launcherBackend.setAppId(appId.toString)
reportLauncherState(SparkAppHandle.State.SUBMITTED)
appId
} catch {
case e: Throwable =>
if (appId != null) {
cleanupStagingDir(appId)
}
throw e
}
}
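To see what submitApplication is doing underneath, here is a minimal sketch of the raw Hadoop YarnClient flow that Spark's Client wraps (the application name and container command are placeholders; Spark's real ContainerLaunchContext carries the java command that starts the AM):

import java.util.Collections
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.hadoop.yarn.util.Records

val yarnClient = YarnClient.createYarnClient()
yarnClient.init(new YarnConfiguration())
yarnClient.start()

// Ask the RM for a new application; the response carries the app ID
val newApp = yarnClient.createApplication()
val appId  = newApp.getNewApplicationResponse.getApplicationId

// The AM container spec; Spark puts the java command launching the AM here
val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
amContainer.setCommands(Collections.singletonList("echo placeholder-am-command"))

val appContext = newApp.getApplicationSubmissionContext
appContext.setApplicationName("demo")      // hypothetical name
appContext.setAMContainerSpec(amContainer)

// Hand everything to the RM; from here YARN schedules and starts the AM
yarnClient.submitApplication(appContext)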
Now look at the AM startup class chosen inside val containerContext = createContainerLaunchContext(newAppResponse):
val amClass =
if (isClusterMode) {
Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
} else {
Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
}
ApplicationMaster above is the class launched as the AM in cluster mode. In client mode, ExecutorLauncher is used instead; it is a thin wrapper around ApplicationMaster that exists mainly so the two modes are easy to tell apart in tools like ps or jps.
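Inside createContainerLaunchContext, this class name is spliced into the java command that YARN runs in the AM container (compare the placeholder command in the sketch above). A simplified illustration (values hypothetical; the real code adds many more JVM options and arguments):

val amClass  = "org.apache.spark.deploy.yarn.ApplicationMaster"
val amArgs   = Seq("--class", "com.example.Main", "--jar", "app.jar")
// YARN expands {{JAVA_HOME}} and <LOG_DIR> inside the container
val commands = Seq("{{JAVA_HOME}}/bin/java", "-server", "-Xmx1024m", amClass) ++
  amArgs ++ Seq("1>", "<LOG_DIR>/stdout", "2>", "<LOG_DIR>/stderr")
// These strings end up in ContainerLaunchContext.setCommands(...)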