SparkSubmit
SparkSubmit.main() --entry point
SparkSubmit.submit() --submit method; calls SparkApplication.start()
Client
ClientApp.start() --implements SparkApplication; creates the ClientEndpoint
ClientEndpoint.onStart() --asks the Master to launch the driver
Master
Master.receiveAndReply() --replies to the Client and looks for a suitable Worker to launch the driver
Master.schedule() --launches drivers and executors according to the available resources
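
The driver-placement half of Master.schedule() can be pictured with a toy model. This is only a sketch under stated assumptions: the `Worker` and `Driver` records and the `schedule_drivers` function are hypothetical names, not Spark classes, and the random shuffling of workers that the real Master performs is left out so the result is deterministic.

```python
from dataclasses import dataclass


@dataclass
class Worker:          # hypothetical stand-in for the Master's view of a worker
    id: str
    cores_free: int
    memory_free: int   # MB


@dataclass
class Driver:          # hypothetical stand-in for a waiting driver description
    id: str
    cores: int
    memory: int        # MB


def schedule_drivers(workers, waiting_drivers):
    """Place each waiting driver on the first worker that has enough
    free cores and memory, visiting workers round-robin.

    Returns {driver_id: worker_id}; drivers that fit nowhere stay waiting.
    """
    placements = {}
    cur_pos = 0
    for driver in list(waiting_drivers):   # copy: the list is mutated below
        visited = 0
        while visited < len(workers) and driver.id not in placements:
            worker = workers[cur_pos]
            visited += 1
            if worker.cores_free >= driver.cores and worker.memory_free >= driver.memory:
                # Reserve the resources, as a LaunchDriver message would.
                worker.cores_free -= driver.cores
                worker.memory_free -= driver.memory
                placements[driver.id] = worker.id
                waiting_drivers.remove(driver)
            cur_pos = (cur_pos + 1) % len(workers)
    return placements


workers = [Worker("w1", 1, 512), Worker("w2", 4, 4096)]
waiting = [Driver("d1", 2, 1024), Driver("d2", 1, 256), Driver("d3", 8, 1024)]
placements = schedule_drivers(workers, waiting)
```

Here `d3` asks for more cores than any worker has free, so it remains in `waiting` until resources change and scheduling runs again.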
Worker
Worker.receive() --on LaunchDriver, starts a DriverRunner
DriverRunner
DriverRunner.start() --assembles the launch command and runs it
SparkContext
new SparkContext() --initialization; creates and starts the TaskScheduler and the DAGScheduler
SparkContext.createTaskScheduler() --creates the StandaloneSchedulerBackend and the TaskSchedulerImpl
taskScheduler.start() --1. registers the DriverEndpoint 2. registers the application 3. launches executors 4. executors register with the driver
StandaloneSchedulerBackend
super.start() --calls the parent class CoarseGrainedSchedulerBackend, which creates the DriverEndpoint
StandaloneAppClient.start() --registers the application
CoarseGrainedSchedulerBackend
CoarseGrainedSchedulerBackend.start() --creates the DriverEndpoint
StandaloneAppClient
StandaloneAppClient.start() --creates the ClientEndpoint
ClientEndpoint.onStart().registerWithMaster() --sends RegisterApplication to the Master; the Master replies with the registration result
Master
Master.RegisterApplication() --receives the registration info sent by the ClientEndpoint
Master.send.RegisteredApplication() --tells the ClientEndpoint that registration succeeded
Master.schedule() --same as above; launches the executors
Master.launchExecutor()
Worker
Worker.LaunchExecutor() --calls ExecutorRunner.start()
ExecutorRunner
ExecutorRunner.start() --starts the executor
ExecutorRunner.fetchAndRunExecutor() --builds the command and launches the CoarseGrainedExecutorBackend process (this is what "starting an executor" actually means)
CoarseGrainedExecutorBackend
main()
CoarseGrainedExecutorBackend.onStart() --registers the executor with the driver; CoarseGrainedSchedulerBackend receives RegisterExecutor, and on success the Executor object is created
executor = new Executor() --the Executor object is now created
RDD
sc.runJob() --starts the job
DAGScheduler
DAGScheduler.submitJob()
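DAGScheduler.handleJobSubmitted() --invoked from DAGSchedulerEventProcessLoop; walks the RDD lineage to build the final stage, splitting stages at shuffle dependencies
DAGScheduler.createResultStage() --builds the final stage
DAGScheduler.getShuffleDependencies() --starting from the RDD of the final ResultStage, walks backwards to find its parents
submitStage(finalStage) --submits the final stage; recursive, so it climbs to the topmost stage (the one with no missing parents) and starts executing there
submitMissingTasks() --for a stage that is ready to run, builds the TaskSet and computes the preferred location of each task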
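
The stage split and the recursive submit can be illustrated with a toy model. This is a sketch, not Spark code: `RDD`, `Stage`, `shuffle_dependencies`, `create_stage`, and `submit_stage` are hypothetical names, and the asynchronous waiting-stage bookkeeping of the real DAGScheduler is collapsed into a plain depth-first recursion.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class RDD:             # hypothetical: an RDD with its two kinds of parents
    name: str
    narrow_parents: list = field(default_factory=list)
    shuffle_parents: list = field(default_factory=list)


@dataclass(eq=False)
class Stage:           # hypothetical: a stage identified by its last RDD
    rdd: RDD
    parents: list


def shuffle_dependencies(rdd):
    """Walk back through narrow dependencies only and collect the RDDs
    that sit directly behind a shuffle boundary."""
    found, seen, stack = [], set(), [rdd]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        for parent in current.shuffle_parents:
            if parent not in found:
                found.append(parent)
        stack.extend(current.narrow_parents)
    return found


def create_stage(rdd, cache):
    """Build the stage ending at `rdd`, creating parent stages first."""
    if id(rdd) not in cache:
        parents = [create_stage(p, cache) for p in shuffle_dependencies(rdd)]
        cache[id(rdd)] = Stage(rdd, parents)
    return cache[id(rdd)]


def submit_stage(stage, finished, order):
    """Submit parents before the stage itself; `order` records the
    sequence in which stages would run."""
    if stage.rdd.name in finished:
        return
    for parent in stage.parents:
        submit_stage(parent, finished, order)
    finished.add(stage.rdd.name)
    order.append(stage.rdd.name)


# lines -> pairs (narrow), pairs -> counts (shuffle), counts -> sorted (shuffle)
lines = RDD("lines")
pairs = RDD("pairs", narrow_parents=[lines])
counts = RDD("counts", shuffle_parents=[pairs])
sorted_rdd = RDD("sorted", shuffle_parents=[counts])

final_stage = create_stage(sorted_rdd, {})
order = []
submit_stage(final_stage, set(), order)
```

Two shuffles give three stages, and the recursion runs them from the topmost ancestor down to the final stage.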
TaskSchedulerImpl
submitTasks() --submits the TaskSet
reviveOffers() --the driver asks for tasks to be launched on the workers; carried out by CoarseGrainedSchedulerBackend
CoarseGrainedSchedulerBackend
launchTasks() --sends LaunchTask to the executor
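
The path from reviveOffers() to launchTasks() is essentially matching pending tasks against free executor cores. A minimal sketch, assuming locality preferences are ignored; `make_offers` and its arguments are hypothetical names for illustration, not the Spark API.

```python
def make_offers(free_cores, pending_tasks, cpus_per_task=1):
    """Assign pending tasks to executors that still have free cores,
    one task per executor per round.

    free_cores: {executor_id: number of free cores}, updated in place.
    pending_tasks: list of task ids, consumed from the front.
    Returns [(executor_id, task_id)] in launch order.
    """
    launched = []
    progress = True
    while pending_tasks and progress:
        progress = False
        for executor_id in free_cores:
            if not pending_tasks:
                break
            if free_cores[executor_id] >= cpus_per_task:
                # Reserve the cores before the LaunchTask message would be sent.
                free_cores[executor_id] -= cpus_per_task
                launched.append((executor_id, pending_tasks.pop(0)))
                progress = True
    return launched


free_cores = {"exec-0": 2, "exec-1": 1}
pending = ["t0", "t1", "t2", "t3"]
launched = make_offers(free_cores, pending)
```

With three free cores and four tasks, `t3` stays pending until a running task finishes and frees a core.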
CoarseGrainedExecutorBackend
LaunchTask() --receives the message from the driver
executor.launchTask() --the Executor launches the task: creates a TaskRunner and starts it on a thread
TaskRunner
run() --calls Task.run(), which ends up in the runTask of ShuffleMapTask or ResultTask
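
The final dispatch is ordinary polymorphism: the runner only knows about `Task.run()`, and the subclass decides what `runTask` does. A toy sketch follows; these Python classes borrow the Spark names but are hypothetical simplifications (the bucketing by `hash(key) % num_reducers` and the returned dict are assumptions made for illustration, not Spark's shuffle format).

```python
from abc import ABC, abstractmethod


class Task(ABC):
    def run(self, partition):
        # Shared bookkeeping (task context, metrics) would live here.
        return self.run_task(partition)

    @abstractmethod
    def run_task(self, partition):
        ...


class ShuffleMapTask(Task):
    """Toy version: splits records into buckets, one per downstream reducer."""

    def __init__(self, key_fn, num_reducers):
        self.key_fn = key_fn
        self.num_reducers = num_reducers

    def run_task(self, partition):
        buckets = {i: [] for i in range(self.num_reducers)}
        for record in partition:
            buckets[hash(self.key_fn(record)) % self.num_reducers].append(record)
        return buckets


class ResultTask(Task):
    """Toy version: applies the job's function to the partition."""

    def __init__(self, func):
        self.func = func

    def run_task(self, partition):
        return self.func(partition)


map_output = ShuffleMapTask(key_fn=lambda x: x, num_reducers=2).run([1, 2, 3, 4])
result = ResultTask(sum).run([1, 2, 3])
```

Only the last stage of a job runs ResultTasks; every earlier stage runs ShuffleMapTasks whose output feeds the next stage.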