This article starts directly from SparkSubmit; the execution flow of the spark-submit shell script itself was covered in the earlier article "The spark-submit Script Execution Process".
I. Overview of the Main Steps
1. The main method of org.apache.spark.deploy.SparkSubmit is executed to submit the application.
2. The run method of the YARN client (Client) is invoked.
3. The client submits the application to the ResourceManager, requesting a container in which to run the ApplicationMaster.
4. The ApplicationMaster's main method runs, starting the driver program and registering the AM.
5. The user program begins to run; when it reaches an action, job scheduling starts.
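For reference, a command that kicks off this whole chain in yarn-cluster mode might look like the following sketch (the class name, jar path, and resource sizes are placeholders, not taken from this article):

```shell
# Submitting in yarn-cluster mode launches org.apache.spark.deploy.SparkSubmit,
# which then walks through steps 1-5 above.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --num-executors 4 \
  --executor-memory 2g \
  /path/to/my-app.jar arg1 arg2
```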
II. Source Code Analysis
First, the SparkSubmit entry point:
```scala
override def main(args: Array[String]): Unit = {
  val submit = new SparkSubmit() {
    self =>

    override protected def parseArguments(args: Array[String]): SparkSubmitArguments = {
      new SparkSubmitArguments(args) {
        override protected def logInfo(msg: => String): Unit = self.logInfo(msg)

        override protected def logWarning(msg: => String): Unit = self.logWarning(msg)
      }
    }

    override protected def logInfo(msg: => String): Unit = printMessage(msg)

    override protected def logWarning(msg: => String): Unit = printMessage(s"Warning: $msg")

    override def doSubmit(args: Array[String]): Unit = {
      try {
        super.doSubmit(args)
      } catch {
        case e: SparkUserAppException =>
          exitFn(e.exitCode)
      }
    }
  }

  submit.doSubmit(args)
}
```
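The anonymous subclass above exists only to redirect the logging hooks to printMessage and to turn a SparkUserAppException into a clean exit code. The same override-at-the-call-site pattern can be sketched in isolation (all names below are illustrative, not Spark's):

```scala
object OverrideDemo {
  class Submitter {
    def logInfo(msg: => String): Unit = println(s"INFO: $msg")
    def doSubmit(args: Array[String]): Unit = logInfo(s"submitting ${args.mkString(" ")}")
  }

  // Build a Submitter whose logging is redirected at the call site,
  // much as SparkSubmit.main redirects logInfo/logWarning to printMessage,
  // and return what it logged.
  def run(): String = {
    val captured = scala.collection.mutable.Buffer.empty[String]
    val submit = new Submitter {
      override def logInfo(msg: => String): Unit = captured += msg
    }
    submit.doSubmit(Array("--class", "Main"))
    captured.head
  }

  def main(args: Array[String]): Unit =
    println(run())  // prints: submitting --class Main
}
```

The benefit is that the base class stays untouched; the customized behaviour lives entirely at the point where the instance is created.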
Next, doSubmit is invoked:
```scala
def doSubmit(args: Array[String]): Unit = {
  // Initialize logging if it hasn't been done yet. Keep track of whether logging needs to
  // be reset before the application starts.
  val uninitLog = initializeLogIfNecessary(true, silent = true)

  val appArgs = parseArguments(args)
  if (appArgs.verbose) {
    logInfo(appArgs.toString)
  }
  appArgs.action match {
    // For a normal submission, this branch matches
    case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
    case SparkSubmitAction.PRINT_VERSION => printVersion()
  }
}
```
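The match above dispatches on the action parsed from the command line; with ordinary submit arguments, SparkSubmitAction.SUBMIT is selected. A stand-in sketch of this sealed-enumeration dispatch style (SparkSubmitAction itself is internal to Spark, so the names below are made up):

```scala
object ActionDemo {
  // Stand-in for Spark's internal SparkSubmitAction enumeration.
  sealed trait Action
  case object Submit extends Action
  case object Kill extends Action
  case object RequestStatus extends Action
  case object PrintVersion extends Action

  // Because Action is sealed, the compiler can warn when a case is
  // missing, which is the safety this dispatch style buys.
  def dispatch(action: Action): String = action match {
    case Submit        => "submit"
    case Kill          => "kill"
    case RequestStatus => "status"
    case PrintVersion  => "version"
  }

  def main(args: Array[String]): Unit =
    println(dispatch(Submit))  // prints: submit
}
```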
The actual submit runs in two steps:
- Step 1: prepare the launch environment by setting up the classpath, system properties, and application arguments, so that the child main class determined by the cluster manager and deploy mode can be run.
- Step 2: invoke the main method of that child main class.
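The second step boils down to loading the child main class and invoking its main method reflectively. A rough, self-contained sketch of that idea, assuming made-up class and argument names (in Spark the real logic lives in prepareSubmitEnvironment and runMain):

```scala
object ReflectDemo {
  val log = scala.collection.mutable.Buffer.empty[String]

  // Stand-in for the child main class that SparkSubmit would resolve.
  object ChildApp {
    def main(args: Array[String]): Unit = log += s"child ran with ${args.mkString(" ")}"
  }

  def main(args: Array[String]): Unit = {
    // Step 1 (sketch): SparkSubmit computes the classpath, system
    // properties, child arguments and child main class name; here we
    // hardcode stand-ins (a Scala object compiles to a `...$` class
    // with a static MODULE$ field).
    val childMainClass = "ReflectDemo$ChildApp$"
    val childArgs = Array("--input", "data.txt")

    // Step 2 (sketch): load the class and invoke its main method.
    val clazz = Class.forName(childMainClass)
    val module = clazz.getField("MODULE$").get(null)
    val mainMethod = clazz.getMethod("main", classOf[Array[String]])
    mainMethod.invoke(module, childArgs)

    println(log.head)  // prints: child ran with --input data.txt
  }
}
```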
The code is as follows: