In production, the client submits Spark programs via the spark-submit script.
spark.version: 2.4.0
scala.version: 2.12
Source code walkthrough:
spark-submit:
--main() # entry point org.apache.spark.deploy.SparkSubmit.main() (InProcessSparkSubmit is the in-process variant used by the launcher library)
(new SparkSubmit()).doSubmit(args)
doSubmit() pattern-matches on the action and dispatches to submit()
# doSubmit first parses the arguments
-> parseArguments(); then, inside submit(), prepareSubmitEnvironment() computes the important childMainClass, which in yarn cluster mode is org.apache.spark.deploy.yarn.YarnClusterApplication
# submit() finally calls its nested method doRunMain(), which in turn calls runMain()
runMain() -> mainClass = Utils.classForName(childMainClass) # crosses over into the yarn module
-> app.start(); for a plain main class this ends in mainMethod.invoke(), a reflective call
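To make the dispatch concrete, here is a minimal, self-contained sketch of the same pattern; AppLike and ReflectiveApp are illustrative stand-ins for Spark's SparkApplication and JavaMainApplication, not Spark's actual classes:

trait AppLike {
  def start(args: Array[String]): Unit
}

// adapter for a plain main class: start() reflectively invokes the static main method,
// which is exactly the mainMethod.invoke() step above
class ReflectiveApp(mainClass: Class[_]) extends AppLike {
  override def start(args: Array[String]): Unit = {
    val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
    mainMethod.invoke(null, args) // static method, so the receiver is null
  }
}

def runMainSketch(childMainClass: String, args: Array[String]): Unit = {
  val mainClass = Class.forName(childMainClass)
  val app: AppLike =
    if (classOf[AppLike].isAssignableFrom(mainClass)) {
      mainClass.getDeclaredConstructor().newInstance().asInstanceOf[AppLike]
    } else {
      new ReflectiveApp(mainClass)
    }
  app.start(args)
}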
yarn: org.apache.spark.deploy.yarn.YarnClusterApplication
overrides SparkApplication's start() method
class YarnClusterApplication extends SparkApplication
# starts the YARN client; note that yarnClient does not run inside YARN at this point, it lives in the same client process as spark-submit
start()->new Client(new ClientArguments(args), conf).run()
new ClientArguments(args) # simply parses the arguments
new Client() ->
prepares the YARN client
val yarnClient = YarnClient.createYarnClient ->YarnClient client = new YarnClientImpl()
plus the cluster configuration (the Hadoop/YARN conf)
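For orientation, the bare Hadoop YarnClient lifecycle that Client wraps looks roughly like this (plain Hadoop API, not Spark's code; the sketch stops at createApplication):

import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

val yarnConf = new YarnConfiguration()          // cluster configuration (RM address, etc.)
val yarnClient = YarnClient.createYarnClient()  // backed by a YarnClientImpl
yarnClient.init(yarnConf)
yarnClient.start()                              // connects to the ResourceManager
val newApp = yarnClient.createApplication()     // asks the RM for a new application id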
/**
* Submit an application to the ResourceManager.
* If set spark.yarn.submit.waitAppCompletion to true, it will stay alive
* reporting the application's status until the application has exited for any reason.
* Otherwise, the client process will exit after submission.
* If the application finishes with a failed, killed, or undefined status,
* throw an appropriate SparkException.
*/
run() -> submitApplication() [connects the launcher backend, initializes the YARN client, starts the YARN client]
// Get a new application from our RM (this only obtains an application id; the AM container is launched on one of the NodeManagers later)
val newApp = yarnClient.createApplication()
// Set up the appropriate contexts to launch our AM
// what is submitted is essentially a /bin/java ... command, so the AM is a separate JVM process, which is why jps can show it
val containerContext = createContainerLaunchContext(newAppResponse)
createContainerLaunchContext
val amClass =
  if (isClusterMode) {
    Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
  } else {
    // `xcall jps` during a spark-shell session shows ExecutorLauncher: spark-shell can only run in client mode, never in cluster mode
    Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
  }
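In essence the AM launch context carries a plain java command line; a hedged sketch with illustrative paths and placeholders (Spark's real command line is longer and built from the conf):

import scala.collection.JavaConverters._
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext
import org.apache.hadoop.yarn.util.Records

val amClass = "org.apache.spark.deploy.yarn.ApplicationMaster" // cluster mode
val amContainer = Records.newRecord(classOf[ContainerLaunchContext])
amContainer.setCommands(Seq(
  "{{JAVA_HOME}}/bin/java",  // the /bin/java here is why the AM is its own JVM, visible in jps
  "-server",
  amClass,
  "--properties-file", "__spark_conf__/__spark_conf__.properties", // illustrative path
  "1><LOG_DIR>/stdout",
  "2><LOG_DIR>/stderr"
).asJava)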
org.apache.spark.deploy.yarn.ApplicationMaster
ApplicationMaster.main() -> val amArgs = new ApplicationMasterArguments(args) # parses the arguments
master = new ApplicationMaster(amArgs)
starts the ApplicationMaster
System.exit(master.run())
run() -> runImpl() -> runDriver() (cluster mode) / runExecutorLauncher() (client mode) ->
userClassThread = startUserApplication() -> userThread.setName("Driver"); userThread.start() # so the driver here is a thread, not a separate process
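A simplified, self-contained sketch of what startUserApplication amounts to (Spark's version additionally wires up error handling and coordination with the SparkContext):

def startUserApplicationSketch(userClass: String, userArgs: Array[String]): Thread = {
  val mainMethod = Class.forName(userClass).getMethod("main", classOf[Array[String]])
  val userThread = new Thread {
    override def run(): Unit = mainMethod.invoke(null, userArgs) // user code runs here
  }
  userThread.setName("Driver") // hence the driver is just a named thread in the AM JVM
  userThread.start()
  userThread
}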
# resource allocation
createAllocator->allocator.allocateResources()
handleAllocatedContainers(allocatedContainers.asScala)
runAllocatedContainers(containersToUse)
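Underneath allocateResources() is the standard AMRMClient protocol; a rough sketch using the plain Hadoop API (resource sizes are made-up values):

import scala.collection.JavaConverters._
import org.apache.hadoop.yarn.api.records.{Priority, Resource}
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
import org.apache.hadoop.yarn.conf.YarnConfiguration

val amClient = AMRMClient.createAMRMClient[ContainerRequest]()
amClient.init(new YarnConfiguration())
amClient.start()
amClient.registerApplicationMaster("", 0, "") // register this AM with the RM
amClient.addContainerRequest(new ContainerRequest(
  Resource.newInstance(2048, 1), // 2 GB, 1 vcore: illustrative executor sizing
  null, null, Priority.newInstance(1)))
val granted = amClient.allocate(0.1f).getAllocatedContainers.asScala
// granted plays the role of allocatedContainers in handleAllocatedContainers above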
ExecutorRunnable.run()->
# asks one of the NodeManagers to create the container
nmClient = NMClient.createNMClient()
nmClient.init(conf)
nmClient.start()
startContainer()->prepareCommand()->org.apache.spark.executor.CoarseGrainedExecutorBackend
-> main() -> run() -> env.rpcEnv.setupEndpoint("Executor", new CoarseGrainedExecutorBackend(...))
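The NMClient steps above reduce to the standard Hadoop container-launch call; a simplified sketch (plain Hadoop API, with the launch context assumed to carry the CoarseGrainedExecutorBackend command):

import org.apache.hadoop.yarn.api.records.{Container, ContainerLaunchContext}
import org.apache.hadoop.yarn.client.api.NMClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

def launchExecutor(container: Container, ctx: ContainerLaunchContext): Unit = {
  val nmClient = NMClient.createNMClient()
  nmClient.init(new YarnConfiguration())
  nmClient.start()
  nmClient.startContainer(container, ctx) // the NodeManager forks the executor JVM
}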
/**
* An end point for the RPC that defines what functions to trigger given a message.
*
* It is guaranteed that `onStart`, `receive` and `onStop` will be called in sequence.
*
* The life-cycle of an endpoint is:
*
* {@code constructor -> onStart -> receive* -> onStop}
*
* Note: `receive` can be called concurrently. If you want `receive` to be thread-safe, please use
* [[ThreadSafeRpcEndpoint]]
*
* If any error is thrown from one of [[RpcEndpoint]] methods except `onError`, `onError` will be
* invoked with the cause. If `onError` throws an error, [[RpcEnv]] will ignore it.
*/
# the executor first registers with the driver, receives the driver's registration-success response, and finally launches tasks
-> onStart() -> receive: [RegisteredExecutor, RegisterExecutorFailed, LaunchTask]
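A self-contained toy endpoint to illustrate the quoted lifecycle (constructor -> onStart -> receive* -> onStop); the trait and messages mirror Spark's RpcEndpoint and CoarseGrainedExecutorBackend in shape only, this is not Spark's implementation:

trait ToyEndpoint {
  def onStart(): Unit = {}
  def receive: PartialFunction[Any, Unit]
  def onStop(): Unit = {}
}

case object RegisteredExecutor
case object RegisterExecutorFailed
case class LaunchTask(taskId: Long)

class ToyExecutorBackend extends ToyEndpoint {
  override def onStart(): Unit =
    println("onStart: send RegisterExecutor to the driver") // registration happens first
  override def receive: PartialFunction[Any, Unit] = {
    case RegisteredExecutor     => println("driver accepted: create the Executor")
    case RegisterExecutorFailed => println("driver rejected: exit the process")
    case LaunchTask(id)         => println(s"run task $id on the executor thread pool")
  }
}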