Spark Internals (Part 1): Submitting an Application to Yarn (Source Code Analysis)

This article walks through, in detail, how Spark submits an application to Yarn: starting from the launch script, it covers argument parsing, the submit operation, the run of YarnClusterApplication, the start of the ApplicationMaster and its resource requests, and finally the execution of the ExecutorBackend.

Submitting the job with the launch script

This actually starts a JVM process running SparkSubmit.

  • The script used to submit the application is shown below (if --deploy-mode is omitted, it defaults to client):
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
./examples/jars/spark-examples_2.12-2.4.5.jar \
10
  • Let's open the spark-submit file under the bin directory and see what it does:
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
  • It executes the bin/spark-class script, which ultimately builds a command of the following form:
exec ${JAVA_HOME}/bin/java org.apache.spark.deploy.SparkSubmit 
  • The class launched by bin/java becomes the corresponding JVM process, so let's look at the main method of SparkSubmit:
  override def main(args: Array[String]): Unit = {
    // Anonymous subclass of SparkSubmit that converts a SparkUserAppException
    // into the corresponding process exit code
    val submit = new SparkSubmit() {
      self =>
      override def doSubmit(args: Array[String]): Unit = {
        try {
          super.doSubmit(args)
        } catch {
          case e: SparkUserAppException =>
            exitFn(e.exitCode)
        }
      }
    }
    submit.doSubmit(args)
  }

Performing the submit operation

  • The code below is abridged; we only look at the key parts. Following submit.doSubmit(args) into super.doSubmit(args), we can see:
  def doSubmit(args: Array[String]): Unit = {
    val appArgs = parseArguments(args)
    // (abridged) the definition of uninitLog is elided here

    appArgs.action match {
      case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
      case SparkSubmitAction.KILL => kill(appArgs)
      case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
      case SparkSubmitAction.PRINT_VERSION => printVersion()
    }
  }
Parsing the arguments
  • Stepping into parseArguments(args), we can see that it returns an instance of SparkSubmitArguments:
protected def parseArguments(args: Array[String]): SparkSubmitArguments = {
    new SparkSubmitArguments(args)
}
  • In Scala the primary constructor is invoked when the instance is created, so the following code in the class body runs (a minimal sketch illustrating this follows the snippet below):
var master: String = null
var deployMode: String = null
var mainClass: String = null
var action: SparkSubmitAction = null

// Parse the list of spark-submit command-line options
parse(args.asJava)
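  • A minimal sketch (not Spark code; the Demo class is made up for illustration) showing that statements in a class body run as part of the primary constructor, just like parse(args.asJava) above:
// Statements in the class body execute on `new Demo(...)`,
// exactly as parse(args.asJava) executes when SparkSubmitArguments is constructed.
class Demo(args: Array[String]) {
  var master: String = null

  // runs as part of the primary constructor
  println(s"constructing with ${args.length} argument(s)")
}

object Demo {
  def main(args: Array[String]): Unit = {
    new Demo(Array("--master", "yarn"))   // prints: constructing with 2 argument(s)
  }
}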
  • The key point here is that parse(args.asJava) uses a regular expression to split each option into a key and a value, which are then handed to handle(name, value) (a simplified sketch of this idea follows the code below):
// SparkSubmitArguments.scala

  override protected def handle(opt: String, value: String): Boolean = {
    opt match {
    
      case MASTER =>
        master = value

      case CLASS =>
        mainClass = value

      case DEPLOY_MODE =>
        if (value != "client" && value != "cluster") {
          error("--deploy-mode must be either \"client\" or \"cluster\"")
        }
        deployMode = value
      // ... handling of the remaining options is elided
    }
    true   // simplified: the real method returns whether parsing should continue
  }
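  • The real parsing lives in the Java helper SparkSubmitOptionParser; the following is only a simplified, self-contained sketch of the idea (MiniParser and its hard-coded option strings are made up for illustration), showing how --key=value and --key value pairs can be matched and handed to handle:
// Simplified illustration of the idea behind parse(args.asJava):
// split "--key=value" with a regex, otherwise take the next token as the value,
// then delegate every (key, value) pair to handle().
object MiniParser {
  private val eqPattern = "(--[^=]+)=(.+)".r

  def handle(opt: String, value: String): Unit = opt match {
    case "--master"      => println(s"master     = $value")
    case "--deploy-mode" => println(s"deployMode = $value")
    case "--class"       => println(s"mainClass  = $value")
    case other           => println(s"unhandled  : $other = $value")
  }

  def parse(args: List[String]): Unit = args match {
    case eqPattern(key, value) :: rest                 => handle(key, value); parse(rest)
    case key :: value :: rest if key.startsWith("--")  => handle(key, value); parse(rest)
    case _                                             => // jar and app arguments are handled elsewhere
  }

  def main(args: Array[String]): Unit =
    parse(List("--master", "yarn", "--deploy-mode", "cluster",
               "--class", "org.apache.spark.examples.SparkPi"))
}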
  • As we can see, this method pattern-matches the command-line options:
--master yarn => master
--deploy-mode cluster => deployMode
--class SparkPi (or e.g. WordCount) => mainClass
Submitting
  • Since action = Option(action).getOrElse(SUBMIT), the action defaults to SUBMIT, so we step into submit(appArgs, uninitLog) (a tiny sketch of this defaulting follows the code below):
  private def submit(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {

    def doRunMain(): Unit = {
      if (args.proxyUser != null) {
        // (abridged) wrap runMain in a doAs call so that it runs as the proxy user
      } else {
        runMain(args, uninitLog)
      }
    }

    if (args.isStandaloneCluster && args.useRest) {
      // (abridged) standalone cluster mode may submit through a REST gateway instead
    } else {
      doRunMain()
    }
  }
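  • A tiny sketch (not Spark code; the ActionDefault object is made up) of the null-safe defaulting mentioned above, showing why the action ends up as SUBMIT when neither --kill nor --status is given:
object ActionDefault {
  def main(args: Array[String]): Unit = {
    var action: String = null                     // nothing like --kill / --status was passed
    action = Option(action).getOrElse("SUBMIT")   // Option(null) is None, so the default applies
    println(action)                               // prints: SUBMIT
  }
}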

Running the child class's main method with the submitted arguments

  • Since we are running in Yarn mode, execution enters doRunMain() and then runMain(args, uninitLog):
  private def runMain(args: SparkSubmitArguments, uninitLog: Boolean): Unit = {
    val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)

    // `loader` is the driver-side class loader built from sparkConf (its construction is omitted here)
    Thread.currentThread.setContextClassLoader(loader)

    for (jar <- childClasspath) {
      addJarToClasspath(jar, loader)
    }

    var mainClass: Class[_] = null

    mainClass = Utils.classForName(childMainClass)

    val app: SparkApplication = if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
      mainClass.newInstance().asInstanceOf[SparkApplication]
    } else {
      new JavaMainApplication(mainClass)
    }

    app.start(childArgs.toArray, sparkConf)
  }
Preparing the submit environment
  • The prepareSubmitEnvironment method is important, and so is what it returns. Starting from its return value (childArgs, childClasspath, sparkConf, childMainClass) and searching upwards for childMainClass, we can see:
cluster:
childMainClass = org.apache.spark.deploy.yarn.YarnClusterApplication

client:
childMainClass = mainClass

Here we mainly care about Yarn's cluster mode; a simplified sketch of this decision follows.
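  • A simplified, self-contained sketch of that decision (heavily abridged and made up for illustration; the real prepareSubmitEnvironment also builds childArgs, the classpath and sparkConf):
object ChildMainClassDemo {
  val YARN_CLUSTER_SUBMIT_CLASS = "org.apache.spark.deploy.yarn.YarnClusterApplication"

  // In yarn-cluster mode the child class is YarnClusterApplication;
  // in client mode the user's --class runs directly in this JVM.
  def childMainClass(master: String, deployMode: String, userMainClass: String): String =
    if (master == "yarn" && deployMode == "cluster") YARN_CLUSTER_SUBMIT_CLASS
    else userMainClass

  def main(args: Array[String]): Unit = {
    println(childMainClass("yarn", "cluster", "org.apache.spark.examples.SparkPi"))
    println(childMainClass("yarn", "client",  "org.apache.spark.examples.SparkPi"))
  }
}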

  • Set the context class loader, which the reflection calls below rely on:
Thread.currentThread.setContextClassLoader(loader)
  • Load the class by its fully qualified name:
mainClass = Utils.classForName(childMainClass)
  • Create an instance of the class via reflection and cast it:
val app: SparkApplication = mainClass.newInstance().asInstanceOf[SparkApplication]
  • Run the start method of childMainClass (a small self-contained sketch of this reflection pattern follows):
app.start(childArgs.toArray, sparkConf)
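  • A small self-contained sketch of that reflection pattern (not Spark code; MiniApplication, HelloApp and ReflectDemo are made up): load a class by name, instantiate it if it implements the expected interface, and otherwise wrap its static main method, mirroring what runMain does with SparkApplication and JavaMainApplication:
// Load a class by name, instantiate it reflectively if it implements MiniApplication,
// otherwise fall back to invoking its static main method via reflection.
trait MiniApplication {
  def start(args: Array[String]): Unit
}

class HelloApp extends MiniApplication {
  override def start(args: Array[String]): Unit =
    println(s"started with ${args.mkString(",")}")
}

object ReflectDemo {
  def main(args: Array[String]): Unit = {
    val mainClass = Class.forName("HelloApp")           // like Utils.classForName(childMainClass)

    val app: MiniApplication =
      if (classOf[MiniApplication].isAssignableFrom(mainClass)) {
        mainClass.newInstance().asInstanceOf[MiniApplication]
      } else {
        // fallback comparable to JavaMainApplication: invoke a static main(Array[String])
        new MiniApplication {
          override def start(args: Array[String]): Unit =
            mainClass.getMethod("main", classOf[Array[String]]).invoke(null, args)
        }
      }

    app.start(Array("10"))                              // like app.start(childArgs.toArray, sparkConf)
  }
}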
