Whether you launch spark-shell, submit a jar via spark-submit, or run one of the other scripts such as those for the Master or Worker, everything eventually goes through spark-class, which invokes the launcher's main method to build the command to execute:
java -Xmx128m -cp ...jars org.apache.spark.launcher.Main "$@"
In other words, org.apache.spark.launcher.Main is invoked by spark-class and receives its arguments from it. This class is a utility for Spark's internal scripts, not the real execution entry point. It delegates to other classes to parse the arguments and build the command to run, and finally hands that command back to spark-class, where it is executed by `exec "${CMD[@]}"`.
It mainly looks at the submission type, spark-submit or spark-class (Master, Worker, HistoryServer, and so on), builds the corresponding command-builder object, SparkSubmitCommandBuilder or SparkClassCommandBuilder, and then constructs the command to execute through its buildCommand method.
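As a sketch of that dispatch, simplified from the launcher's Main.main (error handling and the usage-message fallback are omitted; the class and method names come from the Spark source, the surrounding fragment is illustrative):

```java
// Simplified sketch of the dispatch in Main.main (not the full source).
List<String> args = new ArrayList<>(Arrays.asList(argsArray));
String className = args.remove(0); // the first argument is the class to launch

AbstractCommandBuilder builder;
if (className.equals("org.apache.spark.deploy.SparkSubmit")) {
  // spark-submit / spark-shell: build the command for a Spark application
  builder = new SparkSubmitCommandBuilder(args);
} else {
  // spark-class: Master, Worker, HistoryServer, etc.
  builder = new SparkClassCommandBuilder(className, args);
}

Map<String, String> env = new HashMap<>();
List<String> cmd = builder.buildCommand(env); // the full java command to run
```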
A quick look at the SparkSubmit arguments at this point (Master and Worker are analyzed later):
| Mode | Arguments |
| --- | --- |
| spark-shell | org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell" |
| spark-submit | org.apache.spark.deploy.SparkSubmit --class com.idmapping.scala.WordCount … |
/**
* Command line interface for the Spark launcher. Used internally by Spark scripts.
*/
//A utility class used by Spark's internal scripts.
class Main {
/**
* Usage: Main [class] [class args]
* <p>
* There are two modes: spark-submit and spark-class.
* When some other class is submitted, this covers Master, Worker, HistoryServer, and so on.
* This CLI works in two different modes:
* <ul>
* <li>"spark-submit": if <i>class</i> is "org.apache.spark.deploy.SparkSubmit", the
* {@link SparkLauncher} class is used to launch a Spark application.</li>
* <li>"spark-class": if another class is provid