Spark Source Code Analysis: YARN Deployment Flow (SparkSubmit)
1. YARN Deployment Flow (SparkSubmit)
1.1 The spark-submit script
Look at the contents of the spark-submit script:
Source location: spark/bin/spark-submit
# $@: all arguments passed to the script
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"
Source location: spark/bin/spark-class
build_command() {
"$RUNNER" -Xmx128m $SPARK_LAUNCHER_OPTS -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"
printf "%d\0" $?
}
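build_command emits each token of the final command followed by a NUL byte, then appends the launcher's exit status via printf "%d\0" $?. spark-class reads that NUL-delimited stream back into an array, pops the exit code off the end, and execs the rest. A simplified sketch of the reader side (build_command_demo fakes the launcher's output; in the real script the tokens come from org.apache.spark.launcher.Main):

```shell
#!/usr/bin/env bash
# Fake launcher output: each command token NUL-terminated, then the exit code.
build_command_demo() {
  printf '%s\0' java -Xmx128m org.apache.spark.deploy.SparkSubmit --master yarn
  printf '%d\0' 0   # mirrors: printf "%d\0" $?
}

# Read the NUL-delimited stream into an array, as spark-class does.
CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <(build_command_demo)

# The last element is the launcher's exit code; the rest is the command to run.
LAST=$((${#CMD[@]} - 1))
LAUNCHER_EXIT_CODE=${CMD[$LAST]}
CMD=("${CMD[@]:0:$LAST}")

echo "exit=$LAUNCHER_EXIT_CODE cmd=${CMD[*]}"
```

NUL is used as the separator because it is the one byte that cannot appear inside a shell argument, so tokens with spaces or newlines survive the round trip.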
Source location: spark/launcher/src/main/java/org/apache/spark/launcher/Main.java
This is the command-line entry point for launching Spark applications, used internally by Spark's scripts.
/**
* Command line interface for the Spark launcher. Used internally by Spark scripts.
*/
class Main {
// Usage: Main [class] [class args]
// This CLI works in two different modes:
// 1. "spark-submit": if the class is org.apache.spark.deploy.SparkSubmit, the SparkLauncher class is used to launch a Spark application.
// 2. "spark-class": if another class is provided, an internal Spark class is run.
// This class works in tandem with the "bin/spark-class" script on Unix-like systems
public static void main(String[] argsArray) throws Exception {
checkArgument(argsArray.length > 0, "Not enough arguments: missing class name.");
List<String> args = new ArrayList<>(Arrays.asList(argsArray));
String className = args.remove(0);
// ... (printLaunchCommand, env and cmd are set up here)
// SparkSubmit branch
if (className.equals("org.apache.spark.deploy.SparkSubmit")) {
try {
AbstractCommandBuilder builder = new SparkSubmitCommandBuilder(args);
cmd = buildCommand(builder, env, printLaunchCommand);
} catch (IllegalArgumentException e) {
// ...
MainClassOptionParser parser = new MainClassOptionParser();
try {
parser.parse(args);
} catch (Exception ignored) {
// Ignore parsing exceptions.
}
// ...
help.add(parser.USAGE_ERROR);
AbstractCommandBuilder builder = new SparkSubmitCommandBuilder(help);
// note: the builder here is constructed from the help/usage arguments
cmd = buildCommand(builder, env, printLaunchCommand);
}
} else {
AbstractCommandBuilder builder = new SparkClassCommandBuilder(className, args);
// note: a different builder is used here (SparkClassCommandBuilder)
cmd = buildCommand(builder, env, printLaunchCommand);
}
// ...
}
// ...
}
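The branching above boils down to: the first argument (the class name) alone decides which command builder handles the rest. A toy sketch of that dispatch (dispatch_demo and its output strings are illustrative, not Spark code):

```shell
#!/usr/bin/env bash
# Toy dispatch mirroring Main's two modes: the first argument (the class name)
# selects the command builder for the remaining arguments.
dispatch_demo() {
  local class_name="$1"; shift
  case "$class_name" in
    org.apache.spark.deploy.SparkSubmit)
      echo "SparkSubmitCommandBuilder: $*" ;;
    *)
      echo "SparkClassCommandBuilder($class_name): $*" ;;
  esac
}

dispatch_demo org.apache.spark.deploy.SparkSubmit --master yarn --class com.example.App
dispatch_demo org.apache.spark.deploy.master.Master --host localhost
```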
Now look at the main method of SparkSubmit:
Source location: spark/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
object SparkSubmit extends CommandLineUtils with Logging
override def main(args: Array[String]): Unit = {
val submit = new SparkSubmit() {
self =>
// encapsulate the configuration parameters
override protected def parseArguments(args: Array[String]): SparkSubmitArguments = {
// create a SparkSubmitArguments instance
new SparkSubmitArguments(args) {
// ...
}
}
// ...
override def doSubmit(args: Array[String]): Unit = {
try {
// execution reaches here: delegate to the parent class's doSubmit
super.doSubmit(args)
} catch {
case e: SparkUserAppException =>
exitFn(e.exitCode)
}
}
}
// invoke doSubmit
submit.doSubmit(args)
}
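The anonymous subclass simply wraps the parent's doSubmit with error handling: delegate to the "super" implementation, and on a SparkUserAppException hand its exit code to exitFn. The shape of that wrapping can be sketched in shell (the function names and the 101 exit code are made up for illustration):

```shell
#!/usr/bin/env bash
# super_do_submit stands in for the parent class's doSubmit; a non-zero
# return plays the role of SparkUserAppException carrying an exit code.
super_do_submit() {
  [ $# -gt 0 ] || return 101   # fake "user app failed" exit code
  echo "SparkSubmit.doSubmit: $*"
}

# do_submit mirrors the override: delegate, and route failures to "exitFn".
do_submit() {
  super_do_submit "$@" || echo "exitFn($?)"
}

do_submit --master yarn
do_submit
```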
1.1.1 Encapsulating arguments: new SparkSubmitArguments
As seen above, a SparkSubmitArguments object is created:
Source location: spark/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
/**
* Parses and encapsulates arguments from the spark-submit script.
* The env argument is used for testing.
*/
private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, String] = sys.env)
extends SparkSubmitArgumentsParser with Logging {
//...
// Set parameters from command line arguments
// parse the arguments passed in on the command line
parse(args.asJava)
//...
}
The concrete implementation of parse is in:
Source location: spark/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitOptionParser.java
/**
* Parse a list of spark-submit command line options.
*
* See SparkSubmitArguments.scala for a more formal description of available options.
*
* @throws IllegalArgumentException If an error is found during parsing.
*/
protected final void parse(List<String> args) {
Pattern eqSeparatedOpt = Pattern.compile("(--[^=]+)=(.+)");
int idx = 0;
// the outer for loop iterates over every argument
for (idx = 0; idx < args.size(); idx++) {
String arg = args.get(idx);
// ...
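The eqSeparatedOpt pattern splits `--name=value` forms before option matching, so `--conf=key` and `--conf key` are handled alike; note that `[^=]+` stops at the first `=`, so any later `=` stays inside the value. The same regex can be tried out in bash (the `^`/`$` anchors are added because bash's `=~` searches for a match, whereas Java's Matcher.matches() requires the whole string to match):

```shell
#!/usr/bin/env bash
# Same regex as eqSeparatedOpt in SparkSubmitOptionParser, anchored for =~.
eq_separated_opt='^(--[^=]+)=(.+)$'

split_opt() {
  if [[ "$1" =~ $eq_separated_opt ]]; then
    echo "opt=${BASH_REMATCH[1]} value=${BASH_REMATCH[2]}"
  else
    echo "no match"
  fi
}

split_opt '--conf=spark.executor.memory=4g'   # only the first '=' splits
split_opt '--verbose'                          # flags without '=' fall through
```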