Reading the PySpark source: SparkSubmit (SparkSubmit.scala)

This series analyzes and studies the source code of Spark 2.2.1.

SparkSubmit.scala contains three objects and one class: SparkSubmitAction, SparkSubmit, SparkSubmitUtils, and OptionAssigner.
(1) First, let's look at SparkSubmitAction.
SparkSubmitAction is an enumeration visible only within the deploy package; it describes the kind of request a spark-submit invocation makes.
There are three actions: submit, kill, or request the status of an application, but the latter two are currently supported only in standalone and Mesos cluster modes.
The source is as follows:
private[deploy] object SparkSubmitAction extends Enumeration {
  type SparkSubmitAction = Value
  val SUBMIT, KILL, REQUEST_STATUS = Value
}
(2) SparkSubmitUtils is also an object. It is a helper for spark-submit, mainly providing utility methods used internally by SparkSubmit (for example, resolving the Maven coordinates passed with --packages through Ivy, which is why the file imports the org.apache.ivy packages).
(3) Finally, let's look at SparkSubmit, which is the most important object in the file. It begins by defining several groups of constants:
  // Cluster managers
  private val YARN = 1
  private val STANDALONE = 2
  private val MESOS = 4
  private val LOCAL = 8
  private val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
  
  // Deploy modes
  private val CLIENT = 1
  private val CLUSTER = 2
  private val ALL_DEPLOY_MODES = CLIENT | CLUSTER
  
  // Special primary resource names that represent shells rather than application jars.
  private val SPARK_SHELL = "spark-shell"
  private val PYSPARK_SHELL = "pyspark-shell"
  private val SPARKR_SHELL = "sparkr-shell"
  private val SPARKR_PACKAGE_ARCHIVE = "sparkr.zip"
  private val R_PACKAGE_ARCHIVE = "rpkg.zip"
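
  These values are bit flags: each cluster manager and deploy mode occupies its own bit, so a single Int can encode a set of them, and membership is tested with a bitwise AND. This is how the OptionAssigner class mentioned earlier marks which managers and modes a given option applies to. A minimal sketch of the idea (illustrative only, not code from the source):

  // Illustrative sketch: combine flags with |, test them with &.
  val supported = YARN | STANDALONE        // an option valid only for YARN and standalone
  def supports(clusterManager: Int): Boolean =
    (supported & clusterManager) != 0      // non-zero means the manager's bit is set

  supports(YARN)             // true
  supports(MESOS)            // false
  supports(ALL_CLUSTER_MGRS) // true: ALL_CLUSTER_MGRS has every bit set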
  
  Now let's look at the main() method:
   override def main(args: Array[String]): Unit = {
    val appArgs = new SparkSubmitArguments(args) // parse the command line into a SparkSubmitArguments object
    if (appArgs.verbose) { // in verbose mode, print the parsed arguments (sensitive values are redacted when printed)
      // scalastyle:off println
      printStream.println(appArgs)
      // scalastyle:on println
    }
    appArgs.action match {
      case SparkSubmitAction.SUBMIT => submit(appArgs) // submit the application
      case SparkSubmitAction.KILL => kill(appArgs) // kill an existing submission (standalone and Mesos cluster modes only)
      case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs) // request the status of a submission (standalone and Mesos cluster modes only)
    }
  }

From main() we can see that the work is dispatched by matching on the requested action: submit the application, kill it, or request its status, with the latter two currently supported only in standalone and Mesos cluster modes.
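  Where does appArgs.action come from? SparkSubmitArguments derives it from the command line: the --kill <submissionId> and --status <submissionId> options select KILL and REQUEST_STATUS respectively, while an ordinary submission defaults to SUBMIT. A rough sketch of that mapping (illustrative only, not the actual SparkSubmitArguments code):

  // Illustrative only: how the CLI options map onto SparkSubmitAction.
  def actionFor(killId: Option[String], statusId: Option[String]): SparkSubmitAction.Value =
    (killId, statusId) match {
      case (Some(_), None) => SparkSubmitAction.KILL            // --kill <submissionId>
      case (None, Some(_)) => SparkSubmitAction.REQUEST_STATUS  // --status <submissionId>
      case (None, None)    => SparkSubmitAction.SUBMIT          // a normal application submission
      case _ => throw new IllegalArgumentException("--kill and --status cannot be combined")
    }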
  Let's look at the submit operation first.
  Based on the arguments passed in from the CLI, submit sets up the appropriate classpath, system properties, and application arguments for the chosen cluster manager and deploy mode, and then runs the application's main method in that environment.
  The code is as follows:
   private def submit(args: SparkSubmitArguments): Unit = {
    val (childArgs, childClasspath, sysProps, childMainClass) = prepareSubmitEnvironment(args) // prepare the submission environment


    def doRunMain(): Unit = {
      if (args.proxyUser != null) {
        val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
          UserGroupInformation.getCurrentUser())
        try {
          proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
            override def run(): Unit = {
              runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
            }
          })
        } catch {
          case e: Exception =>
            if (e.getStackTrace().length == 0) {
              // scalastyle:off println
              printStream.println(s"ERROR: ${e.getClass().getName()}: ${e.getMessage()}")
              // scalastyle:on println
              exitFn(1)
            } else {
              throw e
            }
        }
      } else {
        runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
      }
    }

    if (args.isStandaloneCluster && args.useRest) {
      try {
        // scalastyle:off println
        printStream.println("Running Spark using the REST application submission protocol.")
        // scalastyle:on println
        doRunMain()
      } catch {
        // Fail over to use the legacy submission gateway
        case e: SubmitRestConnectionException =>
          printWarning(s"Master endpoint ${args.master} was not a REST server. " +
            "Falling back to legacy submission gateway instead.")
          args.useRest = false
          submit(args)
      }
    // In all other modes, just run the main class as prepared
    } else {
      doRunMain()
    }
  }
  
  From this we can see that submit calls doRunMain, and doRunMain in turn calls runMain, which finally invokes the application's main method.
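  One point worth noting: childMainClass is not always the user's own class. In client mode it is simply the application's main class, while in cluster modes it is typically a cluster-manager-specific client class (for example the YARN Client, or the RestSubmissionClient used for standalone cluster mode), which in turn launches the user code. As a purely illustrative example (not from the source), the tuple for a simple local/client-mode run might look roughly like this:

  // Illustrative values only -- in reality the tuple is built by prepareSubmitEnvironment.
  val (childArgs, childClasspath, sysProps, childMainClass) = (
    Seq("--input", "/data"),            // arguments forwarded to the child main()
    Seq("/path/to/app.jar"),            // entries added to the context class loader
    Map("spark.master" -> "local[*]"),  // properties applied via System.setProperty
    "com.example.MyApp"                 // class whose static main() runMain invokes
  )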
  The runMain method is as follows:
    private def runMain(
      childArgs: Seq[String],
      childClasspath: Seq[String],
      sysProps: Map[String, String],
      childMainClass: String,
      verbose: Boolean): Unit = {
    // scalastyle:off println
    if (verbose) {
      printStream.println(s"Main class:\n$childMainClass")
      printStream.println(s"Arguments:\n${childArgs.mkString("\n")}")
      // sysProps may contain sensitive information, so redact before printing
      printStream.println(s"System properties:\n${Utils.redact(sysProps).mkString("\n")}")
      printStream.println(s"Classpath elements:\n${childClasspath.mkString("\n")}")
      printStream.println("\n")
    }
    // scalastyle:on println


    val loader = // choose the class loader (child-first if spark.driver.userClassPathFirst is true)
      if (sysProps.getOrElse("spark.driver.userClassPathFirst", "false").toBoolean) {
        new ChildFirstURLClassLoader(new Array[URL](0),
          Thread.currentThread.getContextClassLoader)
      } else {
        new MutableURLClassLoader(new Array[URL](0),
          Thread.currentThread.getContextClassLoader)
      }
    Thread.currentThread.setContextClassLoader(loader)


    for (jar <- childClasspath) { // add each jar to the classpath
      addJarToClasspath(jar, loader)
    }


    for ((key, value) <- sysProps) { // set the system properties
      System.setProperty(key, value)
    }


    var mainClass: Class[_] = null


    try {
      mainClass = Utils.classForName(childMainClass) // load the main class
    } catch {
      case e: ClassNotFoundException =>
        e.printStackTrace(printStream)
        if (childMainClass.contains("thriftserver")) {
          // scalastyle:off println
          printStream.println(s"Failed to load main class $childMainClass.")
          printStream.println("You need to build Spark with -Phive and -Phive-thriftserver.")
          // scalastyle:on println
        }
        System.exit(CLASS_NOT_FOUND_EXIT_STATUS)
      case e: NoClassDefFoundError =>
        e.printStackTrace(printStream)
        if (e.getMessage.contains("org/apache/hadoop/hive")) {
          // scalastyle:off println
          printStream.println(s"Failed to load hive class.")
          printStream.println("You need to build Spark with -Phive and -Phive-thriftserver.")
          // scalastyle:on println
        }
        System.exit(CLASS_NOT_FOUND_EXIT_STATUS)
    }


    // SPARK-4170
    if (classOf[scala.App].isAssignableFrom(mainClass)) { // warn if the main class extends scala.App, which may not work correctly
      printWarning("Subclasses of scala.App may not work correctly. Use a main() method instead.")
    }

    val mainMethod = mainClass.getMethod("main", new Array[String](0).getClass) // look up main(Array[String]); the next check ensures it is static
    if (!Modifier.isStatic(mainMethod.getModifiers)) {
      throw new IllegalStateException("The main method in the given main class must be static")
    }

    @tailrec
    def findCause(t: Throwable): Throwable = t match {
      case e: UndeclaredThrowableException =>
        if (e.getCause() != null) findCause(e.getCause()) else e
      case e: InvocationTargetException =>
        if (e.getCause() != null) findCause(e.getCause()) else e
      case e: Throwable =>
        e
    }

    try {
      mainMethod.invoke(null, childArgs.toArray)
    } catch {
      case t: Throwable =>
        findCause(t) match {
          case SparkUserAppException(exitCode) =>
            System.exit(exitCode)


          case t: Throwable =>
            throw t
        }
    }
  }
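
 At its core, runMain is plain Java reflection: load the class by name, look up its static main(Array[String]) method, and invoke it with a null receiver. A stripped-down sketch of that pattern (class-loader setup and error handling omitted; the class name you pass in is up to you):

  import java.lang.reflect.Modifier

  // Minimal sketch of the reflection pattern runMain uses (illustrative only).
  def invokeMain(className: String, args: Array[String]): Unit = {
    val mainClass  = Class.forName(className)  // runMain uses Utils.classForName instead
    val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
    require(Modifier.isStatic(mainMethod.getModifiers),
      "The main method in the given main class must be static")
    mainMethod.invoke(null, args)              // null receiver because main is static
  }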

 Next, let's look at the kill operation:
 kill uses the submission ID and master URL passed in from the CLI to cancel an existing submission with an HTTP POST over the REST protocol. It only applies to standalone and Mesos cluster modes.
  private def kill(args: SparkSubmitArguments): Unit = {
    new RestSubmissionClient(args.master)
      .killSubmission(args.submissionToKill)
  }
  
 Finally, let's look at the requestStatus operation:
 It uses the submission ID and master URL passed in from the CLI to fetch the submission's status with an HTTP GET over the REST protocol. Again, only standalone and Mesos cluster modes are supported.
 private def requestStatus(args: SparkSubmitArguments): Unit = {
    new RestSubmissionClient(args.master)
      .requestSubmissionStatus(args.submissionToRequestStatusFor)
  }
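
 For reference, these two code paths are what back spark-submit's --kill <submissionId> and --status <submissionId> command-line options; --master should point at the cluster's REST endpoint (for a standalone master, the REST submission server listens on port 6066 by default).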

(4) Finally, the SparkSubmit.scala source is attached below as excerpted here (the English comments come from the source itself; any misreadings in the commentary above are my own, and corrections and other interpretations are welcome).

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.deploy

import java.io.{File, IOException}
import java.lang.reflect.{InvocationTargetException, Modifier, UndeclaredThrowableException}
import java.net.URL
import java.nio.file.Files
import java.security.PrivilegedExceptionAction
import java.text.ParseException

import scala.annotation.tailrec
import scala.collection.mutable.{ArrayBuffer, HashMap, Map}
import scala.util.Properties

import org.apache.commons.lang3.StringUtils
import org.apache.hadoop.conf.{Configuration => HadoopConfiguration}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation
import org.apache.ivy.Ivy
import org.apache.ivy.core.LogOptions
import org.apache.ivy.core.module.descriptor._
import org.apache.ivy.core.module.id.{ArtifactId, ModuleId, ModuleRevisionId}
import org.apache.ivy.core.report.ResolveReport
import org.apache.ivy.core.resolve.ResolveOptions
import org.apache.ivy.core.retrieve.RetrieveOptions
import org.apache.ivy.core.settings.IvySettings
import org.apache.ivy.plugins.matcher.GlobPatternMatcher
import org.apache.ivy.plugins.repository.file.FileRepository
import org.apache.ivy.plugins.resolver.{ChainResolver, FileSystemResolver, IBiblioResolver}

import org.apache.spark._
import org.apache.spark.api.r.RUtils
import org.apache.spark.deploy.rest._
import org.apache.spark.launcher.SparkLauncher
import org.apache.spark.util._

/**
 * Whether to submit, kill, or request the status of an application.
 * The latter two operations are currently supported only for standalone and Mesos cluster modes.
 */
private[deploy] object SparkSubmitAction extends Enumeration {
  type SparkSubmitAction = Value
  val SUBMIT, KILL, REQUEST_STATUS = Value
}

/**
 * Main gateway of launching a Spark application.
 *
 * This program handles setting up the classpath with relevant Spark dependencies and provides
 * a layer over the different cluster managers and deploy modes that Spark supports.
 */
object SparkSubmit extends CommandLineUtils {

  // Cluster managers
  private val YARN = 1
  private val STANDALONE = 2
  private val MESOS = 4
  private val LOCAL = 8
  private val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL

  // Deploy modes
  private val CLIENT = 1
  private val CLUSTER = 2
  private val ALL_DEPLOY_MODES = CLIENT | CLUSTER
