- RDD action => SparkContext.runJob(rdd: RDD[T], func: Iterator[T] => U)
- runJob() => dagScheduler.runJob: applies a function to every partition of the RDD and returns the results; this is the main entry point for all RDD actions.
- DAGScheduler.runJob => submitJob() returns a future => eventProcessLoop.post(JobSubmitted) posts the event, then waits for the job to complete
- eventProcessLoop.onReceive(event) => doOnReceive(event) => handleJobSubmitted() => createResultStage(finalRDD, func, partitions, jobId, callSite) => create an ActiveJob => submitStage(finalStage)
- submitStage(stage) => recursively submits any missing parent stages first, building the stage DAG from the RDD's dependency graph => submitMissingTasks(stage, jobId.get) => taskScheduler.submitTasks(new TaskSet(tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties))
- submitWaitingChildStages(stage)
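The recursion in submitStage above can be sketched as follows. This is a minimal, language-neutral illustration (the real DAGScheduler is Scala and tracks waiting/running/failed stage sets, shuffle map outputs, and task sets); the `Stage`/`submit_stage` names and the `computed` flag here are hypothetical simplifications, not Spark's actual API.

```python
# Hypothetical sketch of DAGScheduler.submitStage's parents-first recursion.
# A stage can only run once all of its shuffle-dependency parents have
# produced their outputs, so missing parents are submitted before it.

class Stage:
    def __init__(self, stage_id, parents=None):
        self.stage_id = stage_id
        self.parents = parents or []   # parent stages (shuffle dependencies)
        self.computed = False          # True once the stage's tasks finished

submitted_order = []

def submit_stage(stage):
    """Recursively submit missing parent stages, then this stage's tasks."""
    missing = [p for p in stage.parents if not p.computed]
    for parent in sorted(missing, key=lambda s: s.stage_id):
        submit_stage(parent)
    # submitMissingTasks equivalent: hand this stage's TaskSet to the
    # task scheduler (here we just record the submission order).
    submitted_order.append(stage.stage_id)
    stage.computed = True              # pretend the TaskSet ran to completion

# Example DAG: stage 2 depends on stages 0 and 1 (e.g. a join of two maps).
s0, s1 = Stage(0), Stage(1)
s2 = Stage(2, parents=[s0, s1])
submit_stage(s2)
print(submitted_order)  # → [0, 1, 2]
```

Note the parents-first ordering: the final (result) stage is handed to the task scheduler only after every ancestor shuffle stage has completed, which is exactly why submitStage is the point where the stage DAG gets walked.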
Spark job submission and Task generation by the DAGScheduler