本文要解决的问题:
从scheduler各个类的具体方法阅读源码,进一步了解Spark的scheduler的工作原理和过程。
Scheduler的基本过程
用户提交的Job到DAGScheduler后,会封装成ActiveJob,同时启动JobWaiter监听作业的完成情况。同时依据job中RDD的dependency和dependency属性(窄依赖NarrowDependency,宽依赖ShufflerDependecy),DAGScheduler会根据依赖关系的先后产生出不同的stage DAG(result stage, shuffle map stage)。在每一个stage内部,根据stage产生出相应的task,包括ResultTask或是ShuffleMapTask,这些task会根据RDD中partition的数量和分布,产生出一组相应的task,并将其包装为TaskSet提交到TaskScheduler上去。
DAGScheduler
DAGScheduler是高层级别的调度器。实现了stage-oriented调度。它计算一个DAG中stage的工作。并将这些stage输出落地物化。
最终提交stage以taskSet方式提交给TaskScheduler。DAGScheduler需要接收上下层的消息,它也是一个actor。这里主要看看他的一些事件处理。以下是的所处理的事件。
private[scheduler] case class JobSubmitted(
jobId: Int,
finalRDD: RDD[_],
func: (TaskContext, Iterator[_]) => _,
partitions: Array[Int],
callSite: CallSite,
listener: JobListener,
properties: Properties = null)
extends DAGSchedulerEvent
private[scheduler] case class StageCancelled(stageId: Int) extends DAGSchedulerEvent
private[scheduler] case class JobCancelled(jobId: Int) extends DAGSchedulerEvent
private[scheduler] case class JobGroupCancelled(groupId: String) extends DAGSchedulerEvent
private[scheduler] case object AllJobsCancelled extends DAGSchedulerEvent
private[scheduler]
case class BeginEvent(task: Task[_], taskInfo: TaskInfo) extends DAGSchedulerEvent
private[scheduler]
case class GettingResultEvent(taskInfo: TaskInfo) extends DAGSchedulerEvent
还有很多,不一一罗列。
JobSubmitted
//处理提交的作业,完成Job到stage的 转换,生成finalStage,并SubmitStage。
private[scheduler] def handleJobSubmitted(jobId: Int,
finalRDD: RDD[_],
func: (TaskContext, Iterator[_]) => _,
partitions: Array[Int],
callSite: CallSite,
listener: JobListener,
properties: Properties) {
var finalStage: ResultStage = null
try {
// New stage creation may throw an exception if, for example, jobs are run on a
// HadoopRDD whose underlying HDFS files have been deleted.
finalStage = newResultStage(finalRDD, func, partitions, jobId, callSite)
} catch {
case e: Exception =>
logWarning("Creating new stage failed due to exception - job: " + jobId, e)
listener.jobFailed(e)
return
}
val job = new ActiveJob(jobId, finalStage, callSite, listener, properties)
clearCacheLocs()
logInfo("Got job %s (%s) with %d output partitions".format(
job.jobId, callSite.shortForm, partitions.length))
logInfo("Final stage: " + finalStage + " (" + finalStage.name