Whether you launch spark-shell, submit a jar via spark-submit, or run one of the other scripts such as those for the Master or Worker, everything eventually goes through spark-class, which invokes the launcher's main method to build the command to execute:
java -Xmx128m -cp ...jars org.apache.spark.launcher.Main "$@"
In other words, org.apache.spark.launcher.Main is invoked by spark-class and receives its arguments from it. This class is a utility for Spark's internal scripts, not the real execution entry point. It delegates to other classes to parse the arguments and build the command to run, and finally hands that command back to spark-class, where it is executed by `exec "${CMD[@]}"`.
It mainly looks at the submission type, spark-submit or spark-class (Master, Worker, HistoryServer, and so on), builds the corresponding command-builder object, SparkSubmitCommandBuilder or SparkClassCommandBuilder, and then constructs the command to execute through its buildCommand method.
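As a sketch of that dispatch, simplified from the launcher's Main.main (error handling and the usage-message fallback are omitted; the class and method names come from the Spark source, the surrounding fragment is illustrative):

```java
// Simplified sketch of the dispatch in Main.main (not the full source).
List<String> args = new ArrayList<>(Arrays.asList(argsArray));
String className = args.remove(0); // the first argument is the class to launch

AbstractCommandBuilder builder;
if (className.equals("org.apache.spark.deploy.SparkSubmit")) {
  // spark-submit / spark-shell: build the command for a Spark application
  builder = new SparkSubmitCommandBuilder(args);
} else {
  // spark-class: Master, Worker, HistoryServer, etc.
  builder = new SparkClassCommandBuilder(className, args);
}

Map<String, String> env = new HashMap<>();
List<String> cmd = builder.buildCommand(env); // the full java command to run
```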
A quick look at the SparkSubmit arguments at this point (Master and Worker are analyzed later):
| Mode | Arguments |
| --- | --- |
| spark-shell | org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main --name "Spark shell" |
| spark-submit | org.apache.spark.deploy.SparkSubmit --class com.idmapping.scala.WordCount … |
/**
* Command line interface for the Spark launcher. Used internally by Spark scripts.
*/
//A utility class used by Spark's internal scripts.
class Main {
/**
* Usage: Main [class] [class args]
* <p>
* There are two modes: spark-submit and spark-class.
* When some other class is submitted, this covers Master, Worker, HistoryServer, and so on.
* This CLI works in two different modes:
* <ul>
* <li>"spark-submit": if <i>class</i> is "org.apache.spark.deploy.SparkSubmit", the
* {@link SparkLauncher} class is used to launch a Spark application.</li>
* <li>"spark-class": if another class is provid