Spark on YARN Cluster Submission Flow Analysis (Part 2)
Picking up from the previous article: before the cluster is involved at all, a SparkSubmit process is started locally, and inside that process the main method of the Client companion object is executed. This time, let us start from what that Client actually is.

Client
```scala
def main(argStrings: Array[String]) {
  if (!sys.props.contains("SPARK_SUBMIT")) {
    logWarning("WARNING: This client is deprecated and will be removed in a " +
      "future version of Spark. Use ./bin/spark-submit with \"--master yarn\"")
  }
  System.setProperty("SPARK_YARN_MODE", "true")
  val sparkConf = new SparkConf
  // SparkSubmit would use yarn cache to distribute files & jars in yarn mode,
  // so remove them from sparkConf here for yarn mode.
  sparkConf.remove("spark.jars")
  sparkConf.remove("spark.files")
  val args = new ClientArguments(argStrings)
  new Client(args, sparkConf).run()
}
```
1. On entering main, the method first works through some configuration. All we need to know is that, because we submitted via spark-submit, the SPARK_SUBMIT system property is already set, so the deprecation-warning branch is skipped.

2. The statements that follow only adjust a few configuration entries, and they are easy to understand from the code comments.
3. The key code is:

```scala
// Wrap the arguments passed in from SparkSubmit
val args = new ClientArguments(argStrings)
// Create a Client object and run it
new Client(args, sparkConf).run()
```
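The argument-wrapping step above can be pictured with a tiny, self-contained sketch. Note that SimpleClientArguments and its fields are hypothetical simplifications for illustration only, not Spark's actual ClientArguments parser:

```scala
// A minimal, hypothetical stand-in for ClientArguments: it only pairs up
// "--flag value" tokens, loosely mimicking how the real parser walks argStrings.
case class SimpleClientArguments(userClass: Option[String], userJar: Option[String])

object SimpleClientArguments {
  def parse(argStrings: Array[String]): SimpleClientArguments = {
    // Walk the arguments two at a time, collecting flag/value pairs.
    val pairs = argStrings.grouped(2).collect {
      case Array(flag, value) => flag -> value
    }.toMap
    SimpleClientArguments(pairs.get("--class"), pairs.get("--jar"))
  }
}

val args = SimpleClientArguments.parse(Array("--class", "com.example.Main", "--jar", "app.jar"))
```

The real ClientArguments does proper validation and supports many more flags; the point here is just that the raw string array from SparkSubmit gets wrapped into a typed object before Client is constructed.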
4. Step into the primary constructor of the Client class:

```scala
private[spark] class Client(
    val args: ClientArguments,
    val hadoopConf: Configuration,
    val sparkConf: SparkConf)
  extends Logging {

  import Client._
  import YarnSparkHadoopUtil._

  def this(clientArgs: ClientArguments, spConf: SparkConf) =
    this(clientArgs, SparkHadoopUtil.get.newConfiguration(spConf), spConf)

  // Holds a YarnClient
  private val yarnClient = YarnClient.createYarnClient
  private val yarnConf = new YarnConfiguration(hadoopConf)

  private val isClusterMode = sparkConf.get("spark.submit.deployMode", "client") == "cluster"
```
5. It is easy to see that Client holds a YarnClient object; a reasonable guess is that this is the object that communicates with YARN.

6. Back in main, besides constructing the object, the Client's run() method is also invoked.
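The isClusterMode check in the constructor boils down to a single config lookup with a default. A minimal sketch, using a plain Map in place of SparkConf:

```scala
// SparkConf stand-in: a plain key/value map with a get-with-default,
// mirroring sparkConf.get("spark.submit.deployMode", "client") == "cluster".
def isClusterMode(conf: Map[String, String]): Boolean =
  conf.getOrElse("spark.submit.deployMode", "client") == "cluster"

val clusterConf = Map("spark.submit.deployMode" -> "cluster")
val defaultConf = Map.empty[String, String]
```

So if spark.submit.deployMode is absent, the mode defaults to "client"; only an explicit "cluster" value turns on cluster mode, which later decides which AM class gets launched.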
run()

```scala
def run(): Unit = {
  this.appId = submitApplication()
  if (!launcherBackend.isConnected() && fireAndForget) {
    val report = getApplicationReport(appId)
    val state = report.getYarnApplicationState
    logInfo(s"Application report for $appId (state: $state)")
    logInfo(formatReportDetails(report))
    if (state == YarnApplicationState.FAILED || state == YarnApplicationState.KILLED) {
      throw new SparkException(s"Application $appId finished with status: $state")
    }
  } else {
    val (yarnApplicationState, finalApplicationStatus) = monitorApplication(appId)
    if (yarnApplicationState == YarnApplicationState.FAILED ||
        finalApplicationStatus == FinalApplicationStatus.FAILED) {
      throw new SparkException(s"Application $appId finished with failed status")
    }
    if (yarnApplicationState == YarnApplicationState.KILLED ||
        finalApplicationStatus == FinalApplicationStatus.KILLED) {
      throw new SparkException(s"Application $appId is killed")
    }
    if (finalApplicationStatus == FinalApplicationStatus.UNDEFINED) {
      throw new SparkException(s"The final status of application $appId is undefined")
    }
  }
}
```
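The success/failure checks in run() can be condensed into one decision function. The enumeration below is a simplified stand-in for YARN's YarnApplicationState and FinalApplicationStatus, purely to show the branching:

```scala
// Simplified stand-ins for YARN's terminal application states.
sealed trait AppState
case object Finished extends AppState
case object Failed extends AppState
case object Killed extends AppState

// Mirrors the logic in run(): FAILED or KILLED turns into an error,
// anything that finished cleanly is reported as success.
def checkFinalState(appId: String, state: AppState): Either[String, String] = state match {
  case Failed   => Left(s"Application $appId finished with failed status")
  case Killed   => Left(s"Application $appId is killed")
  case Finished => Right(s"Application $appId finished successfully")
}
```

In the real run(), a Left would correspond to throwing a SparkException, which is what makes spark-submit exit with a failure when the YARN application does not succeed.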
1. The key line in the code above is the second one:

```scala
this.appId = submitApplication()
```

2. The method is named submitApplication, so it must be related to submitting the application. Step into it:
```scala
/**
 * Submit an application running our ApplicationMaster to the ResourceManager.
 *
 * The stable Yarn API provides a convenience method (YarnClient#createApplication) for
 * creating applications and setting up the application submission context. This was not
 * available in the alpha API.
 */
def submitApplication(): ApplicationId = {
  var appId: ApplicationId = null
  try {
    launcherBackend.connect()
    // Setup the credentials before doing anything else,
    // so we have don't have issues at any point.
    setupCredentials()
    yarnClient.init(yarnConf)
    yarnClient.start()

    logInfo("Requesting a new application from cluster with %d NodeManagers"
      .format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))

    // Get a new application from our RM
    val newApp = yarnClient.createApplication()
    val newAppResponse = newApp.getNewApplicationResponse()
    appId = newAppResponse.getApplicationId()

    reportLauncherState(SparkAppHandle.State.SUBMITTED)
    launcherBackend.setAppId(appId.toString)

    new CallerContext("CLIENT", Option(appId.toString)).setCurrentContext()

    // Verify whether the cluster has enough resources for our AM
    verifyClusterResources(newAppResponse)

    // Set up the appropriate contexts to launch our AM
    val containerContext = createContainerLaunchContext(newAppResponse)
    val appContext = createApplicationSubmissionContext(newApp, containerContext)

    // Finally, submit and monitor the application
    logInfo(s"Submitting application $appId to ResourceManager")
    yarnClient.submitApplication(appContext)
    appId
  } catch {
    case e: Throwable =>
      if (appId != null) {
        cleanupStagingDir(appId)
      }
      throw e
  }
}
```
3. The doc comment above says that YARN provides the YarnClient#createApplication API for creating an application and setting up the application submission context, and that the application is created on the RM. From this we can see that YarnClient's createApplication() interacts with the RM to request the application, and the response carries back an ApplicationId.
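The client-to-RM protocol inside submitApplication is: start the client, ask the RM for a new application (which yields the ApplicationId), then submit the prepared launch context. The trait and stub below are hypothetical simplifications, written only to make that call order explicit; they are not the real Hadoop YarnClient API:

```scala
// Hypothetical abstraction over the handful of YarnClient calls used above.
trait RmClient {
  def start(): Unit
  def createApplication(): String                               // returns an application id
  def submitApplication(appId: String, context: String): Unit
}

// A stub RM that records the order of calls, mimicking the real round trips.
class StubRm extends RmClient {
  val calls = scala.collection.mutable.ListBuffer[String]()
  def start(): Unit = calls += "start"
  def createApplication(): String = { calls += "createApplication"; "application_1_0001" }
  def submitApplication(appId: String, context: String): Unit = calls += s"submit:$appId"
}

// The skeleton of submitApplication: request an id first, submit the context second.
def submit(rm: RmClient, containerContext: String): String = {
  rm.start()
  val appId = rm.createApplication()            // first Client -> RM round trip
  rm.submitApplication(appId, containerContext) // second round trip, with the AM launch context
  appId
}
```

The important structural point this captures is that there are two distinct Client-to-RM interactions: one to obtain the ApplicationId, and a later one to hand over the fully built submission context.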
Summary of step ①:
- 1. This corresponds to the first connection from Client to RM in the diagram.
- 2. The application is created through the createApplication method.
- 3. This confirms that the YarnClient held by Client is the object that connects to YARN.
createContainerLaunchContext(newAppResponse)

1. Reading further down in submitApplication, we find this line:

```scala
// Set up the appropriate contexts to launch our AM
val containerContext = createContainerLaunchContext(newAppResponse)
```
2. The comment means: set up the appropriate context to launch the AM. The AM (ApplicationMaster) is the manager of a Spark application, which is why it is the first daemon process to be started.

3. That makes this method well worth stepping into:
```scala
/**
 * Set up a ContainerLaunchContext to launch our ApplicationMaster container.
 * This sets up the launch environment, java options, and the command for launching the AM.
 */
private def createContainerLaunchContext(newAppResponse: GetNewApplicationResponse)
  : ContainerLaunchContext = {
  logInfo("Setting up container launch context for our AM")
  val appId = newAppResponse.getApplicationId
  val appStagingDirPath = new Path(appStagingBaseDir, getAppStagingDir(appId))
  val pySparkArchives =
    if (sparkConf.get(IS_PYTHON_APP)) {
      findPySparkArchives()
    } else {
      Nil
    }

  val launchEnv = setupLaunchEnv(appStagingDirPath, pySparkArchives)
  val localResources = prepareLocalResources(appStagingDirPath, pySparkArchives)

  val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
  amContainer.setLocalResources(localResources.asJava)
  amContainer.setEnvironment(launchEnv.asJava)

  val javaOpts = ListBuffer[String]()

  // Set the environment variable through a command prefix
  // to append to the existing value of the variable
  var prefixEnv: Option[String] = None

  // Add Xmx for AM memory
  javaOpts += "-Xmx" + amMemory + "m"

  ........

  val amClass =
    if (isClusterMode) {
      Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
    } else {
      Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
    }

  ....

  val amArgs =
    Seq(amClass) ++ userClass ++ userJar ++ primaryPyFile ++ primaryRFile ++ userArgs ++
    Seq("--properties-file", buildPath(YarnSparkHadoopUtil.expandEnvironment(Environment.PWD),
      LOCALIZED_CONF_DIR, SPARK_CONF_FILE))

  // Command for the ApplicationMaster
  val commands = prefixEnv ++ Seq(
      YarnSparkHadoopUtil.expandEnvironment(Environment.JAVA_HOME) + "/bin/java", "-server"
    ) ++
    javaOpts ++ amArgs ++
    Seq(
      "1>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout",
      "2>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr")

  // TODO: it would be nicer to just make sure there are no null commands here
  val printableCommands = commands.map(s => if (s == null) "null" else s).toList
  amContainer.setCommands(printableCommands.asJava)

  ......

  // send the acl settings into YARN to control who has access via YARN interfaces
  val securityManager = new SecurityManager(sparkConf)
  amContainer.setApplicationACLs(
    YarnSparkHadoopUtil.getApplicationAclsForYarn(securityManager).asJava)
  setupSecurityToken(amContainer)
  amContainer
}
```
4. The main idea here is to assemble a java command and send it to the cluster for execution. To see exactly what that command looks like, keep reading.

5. In the code we find the comment `// Command for the ApplicationMaster`: this marks the command that creates the AM.

6. Looking upward, the key variable is amClass. Since we are running in cluster mode:

amClass = org.apache.spark.deploy.yarn.ApplicationMaster
7. After all the pieces are concatenated, the resulting command comes out roughly as:

```
${JAVA_HOME}/bin/java org.apache.spark.deploy.yarn.ApplicationMaster
```
8. This command is then sent to the cluster to execute. In the cluster, the RM selects an NM node on which to run the main() of ApplicationMaster; in other words, an AM process is created on one of the nodes.
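The command assembly at the end of createContainerLaunchContext can be reproduced in miniature. The concrete values below (memory size, log dir placeholder) are made-up examples, but the Seq-concatenation structure matches the source above:

```scala
// Mimics the Seq concatenation that builds the AM container launch command.
def buildAmCommand(javaHome: String, amMemoryMb: Int, isClusterMode: Boolean,
                   logDir: String): String = {
  val javaOpts = Seq(s"-Xmx${amMemoryMb}m")
  // In cluster mode the AM class is ApplicationMaster; in client mode, ExecutorLauncher.
  val amClass =
    if (isClusterMode) "org.apache.spark.deploy.yarn.ApplicationMaster"
    else "org.apache.spark.deploy.yarn.ExecutorLauncher"
  val commands = Seq(javaHome + "/bin/java", "-server") ++ javaOpts ++ Seq(amClass) ++
    Seq("1>", logDir + "/stdout", "2>", logDir + "/stderr")
  commands.mkString(" ")
}

val cmd = buildAmCommand("${JAVA_HOME}", 1024, isClusterMode = true, "<LOG_DIR>")
```

Note that YARN, not the client, expands ${JAVA_HOME} and the log dir variable on the chosen NM node; the client only ships the command string inside the ContainerLaunchContext.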
Summary of step ②:
- 1. After the application is first submitted to the RM, the same submitApplication method also performs the work of creating the AM.
- 2. Concretely, a java command is assembled locally and sent to the cluster to run.
- 3. Once it reaches the cluster, the RM picks an NM node, the command runs ApplicationMaster's main(), and an AM process is created on that node.
To find out what happens next, stay tuned for the next installment!
Continue to Part 3: https://blog.csdn.net/long_World/article/details/114984490