Logic between job submission and execution:
Client side:
1. The spark-submit script submits the job, and SparkSubmit invokes the main method of the user-supplied class via reflection.
2. The user code executes new SparkContext (a minimal driver sketch follows this list), which:
2.1. Creates the actorSystem
2.2. Creates TaskSchedulerImpl, the class that distributes tasks
2.3. Creates SparkDeploySchedulerBackend, which schedules tasks
2.4. Creates DAGScheduler, which splits the job into tasks and starts a thread backed by a blocking task/event queue
2.5. Creates clientActor, which registers the application (the job jar) information with the Master
2.6. Creates driverActor
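For reference, the user class mentioned in steps 1-2 is just an ordinary object with a main method that builds a SparkContext. A minimal sketch, assuming a hypothetical class name, app name, and master URL (none of them come from the text above):

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical driver program: this is the class whose main method SparkSubmit invokes
    // via reflection; new SparkContext(conf) is what triggers steps 2.1-2.6 above.
    object MyApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("MyApp")                      // app info later registered with the Master
          .setMaster("spark://master-host:7077")    // placeholder standalone master URL
        val sc = new SparkContext(conf)
        try {
          val count = sc.parallelize(1 to 100).count()  // a trivial job so tasks actually run
          println(s"count = $count")
        } finally {
          sc.stop()
        }
      }
    }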
Master side:
1. After receiving the application's registration information, the Master saves it.
2. Through Worker registration and heartbeats, the Master knows how many resources the cluster has.
3. It compares the resources the application needs against what the cluster has and allocates resources accordingly, either spread out across workers or consolidated on a few of them (see the allocation sketch after this list).
4. It notifies the Workers to launch executors.
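A simplified sketch of the two placement strategies in step 3. The names (WorkerInfo, assignCores, spreadOut) and the structure are illustrative only; this is not the actual Master.schedule code, just the core idea of spread-out versus consolidated allocation:

    // Each worker advertises how many free cores it still has.
    case class WorkerInfo(id: String, var freeCores: Int)

    def assignCores(workers: Seq[WorkerInfo], coresNeeded: Int, spreadOut: Boolean): Map[String, Int] = {
      val usable = workers.filter(_.freeCores > 0).sortBy(-_.freeCores)
      val assigned = scala.collection.mutable.Map[String, Int]().withDefaultValue(0)
      var remaining = coresNeeded
      if (spreadOut) {
        // Spread out: hand out one core at a time, round-robin across the usable workers.
        var i = 0
        while (remaining > 0 && usable.exists(_.freeCores > 0)) {
          val w = usable(i % usable.length)
          if (w.freeCores > 0) {
            w.freeCores -= 1
            assigned(w.id) += 1
            remaining -= 1
          }
          i += 1
        }
      } else {
        // Consolidate: fill each worker completely before moving on to the next one.
        for (w <- usable if remaining > 0) {
          val take = math.min(w.freeCores, remaining)
          w.freeCores -= take
          assigned(w.id) += take
          remaining -= take
        }
      }
      assigned.toMap
    }

By default the standalone Master spreads applications out across workers (spark.deploy.spreadOut defaults to true).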
Worker side:
1. Launches the executor.
2. The executor communicates with driverActor, listening for and receiving tasks (a simplified sketch follows this list).
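To make the last two steps concrete, here is a heavily simplified sketch of the executor side: the executor registers with the driver, then runs each received task on a thread pool. The message case classes and the receive method are illustrative only, not Spark's actual actor protocol or class names:

    import java.util.concurrent.{Executors, ExecutorService}

    sealed trait DriverMessage
    case class RegisteredExecutor(executorId: String) extends DriverMessage
    case class LaunchTask(taskId: Long, serializedTask: Array[Byte]) extends DriverMessage

    // Hypothetical stand-in for the executor backend that talks to driverActor.
    class SketchExecutorBackend(cores: Int) {
      private val threadPool: ExecutorService = Executors.newFixedThreadPool(cores)

      // Called once per message received from the driver.
      def receive(msg: DriverMessage): Unit = msg match {
        case RegisteredExecutor(id) =>
          println(s"executor $id registered with the driver and is ready for tasks")
        case LaunchTask(taskId, bytes) =>
          // Each task is run on a worker thread from the executor's pool.
          threadPool.execute(new Runnable {
            override def run(): Unit = {
              println(s"running task $taskId (${bytes.length} serialized bytes)")
              // ... deserialize the task, run it, and send the result back to the driver ...
            }
          })
      }
    }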
Scheduling after the job is submitted to the cluster: spark-submit
The spark-submit script starts the SparkSubmit class and calls SparkSubmit's main method.
Let's take a look at SparkSubmit's main method:
def main(args: Array[String]): Unit = {
  val appArgs = new SparkSubmitArguments(args)
  if (appArgs.verbose) {
    printStream.println(appArgs)
  }
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
  }
}
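main only parses the arguments and dispatches on the requested action. For SUBMIT, the call chain submit -> doRunMain -> runMain ends in the reflective invocation of the user class's main method mentioned in client step 1. A simplified sketch of that final call (not the verbatim runMain source; the two placeholder values stand in for the childMainClass and childArgs produced by prepareSubmitEnvironment):

    val childMainClass = "com.example.MyApp"   // placeholder: the user class passed via --class
    val childArgs = Seq("arg1", "arg2")        // placeholder: the application's own arguments

    val mainClass = Class.forName(childMainClass, true, Thread.currentThread().getContextClassLoader)
    val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
    // Method.invoke takes one Object per parameter; main has a single String[] parameter.
    mainMethod.invoke(null, childArgs.toArray)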
The action matches SUBMIT, so let's look at the submit method:
private[spark] def submit(args: SparkSubmitArguments): Unit = {
  val (childArgs, childClasspath, sysProps, childMainClass) = prepareSubmitEnvironment(args)

  def doRunMain(): Unit = {
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
          }
        })
      } catch {
        case e: Exception =>
          // Hadoop's AuthorizationException suppresses the exception's stack trace, which
          // makes the message printed to the output by the JVM not very helpful. Instead,
          // detect exceptions with empty stack traces here, and treat them differently.
          if (e.getStackTrace().length == 0) {
            printStream.println(s"ERROR: ${e.getClass().getName()}: ${e.getMessage()}")
            exitFn()
          } else {
            throw e
          }
      }
    } else {
      runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
    }
  }

  // In standalone cluster mode, there are two submission gateways:
  //   (1) The traditional Akka gateway using o.a.s.deploy.Client as a wrapper
  //   (2) The new REST-based gateway introduced in Spark 1.3
  // The latter is the default behavior as of Spark 1.3, but Spark submit will fail over
  // to use the legacy gateway if the master endpoint turns out to be not a REST server.
  if (args.isStandaloneCluster && args.useRest) {
    try {
      printStream.println("Running Spark using the REST application submission protocol.")
      doRunMain()
    } catch {
      // Fail over to use the legacy submission gateway
      case e: SubmitRestConnectionException =>
        printWarning(s"Master endpoint ${args.master} was not a REST server. " +
          "Falling back to legacy submission gateway instead.")
        args.useRest = false