The Spark Shuffle Mechanism

1. Overview

1) In terms of stage division, the last stage of a job is the ResultStage and every stage before it is a ShuffleMapStage; each ShuffleMapStage ends with a write of its shuffle output to disk.

2) A Spark shuffle has a map side and a reduce side, but unlike Hadoop MapReduce there are no separate Map and Reduce programs; both sides run as tasks of ordinary stages.

3) When the ResultStage finishes, the Job is done; every shuffle along the way necessarily involves both writing to and reading from disk.

4) The number of shuffle tasks equals the number of partitions.
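Point 4) can be made concrete with a minimal sketch of hash partitioning in plain Scala (no Spark dependency; `nonNegativeMod` mirrors the behavior of Spark's `HashPartitioner`, but the object and method names here are hypothetical):

```scala
// Minimal sketch of how a hash partitioner routes keys to reduce tasks:
// one shuffle (reduce) task per output partition.
object PartitionerSketch {
  // Non-negative modulo: a negative hashCode must still map to a valid partition.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  def getPartition(key: Any, numPartitions: Int): Int =
    nonNegativeMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    val keys = Seq("a", "b", "c", "d")
    val numPartitions = 3
    // Every key lands in exactly one of the numPartitions buckets,
    // so exactly numPartitions shuffle tasks are needed to read them.
    keys.foreach { k =>
      println(s"$k -> partition ${getPartition(k, numPartitions)}")
    }
  }
}
```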

 

2. Tracing the Shuffle Process Through the Source Code

After the Executor receives a Task sent over by the Driver, it directly calls the Task's run method, i.e., the Task begins to execute:

override def run(): Unit = {
  val res = task.run(
    taskAttemptId = taskId,
    attemptNumber = attemptNumber,
    metricsSystem = env.metricsSystem)
  threwException = false
  res
}

So what we need to trace is the Task sent over by the Driver. Start by stepping in from an action operator.

——>SparkContext

  —>runJob(): calls runJob() on the DAGScheduler (the directed-acyclic-graph scheduler)

  def runJob[T, U: ClassTag](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      resultHandler: (Int, U) => Unit): Unit = {
    val cleanedFunc = clean(func)  // closure-cleaned before shipping
    dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get)
    progressBar.foreach(_.finishAll())
    rdd.doCheckpoint()
  }

——>DAGScheduler

  —>runJob(): calls submitJob()

  —>submitJob(): puts a JobSubmitted event into a BlockingQueue; the EventLoop's run() method takes it out with val event = eventQueue.take() and invokes the subclass's onReceive(event). This is the template-method design pattern: the parent class provides the template (the loop), the concrete handling is implemented by the subclass, and the call happens in the parent.
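The template-method pattern described above can be sketched in a few lines of plain Scala (a simplified stand-in for Spark's EventLoop: no stop flag, no event thread, no error handling; `MiniEventLoop` and its methods are hypothetical names):

```scala
import java.util.concurrent.LinkedBlockingQueue

// Minimal sketch of the template-method pattern used by DAGScheduler's
// EventLoop: the parent class owns the take-and-dispatch loop, while the
// concrete event handling is left to the subclass's onReceive().
abstract class MiniEventLoop[E] {
  private val eventQueue = new LinkedBlockingQueue[E]()

  def post(event: E): Unit = eventQueue.put(event)

  // The "template": the loop lives in the parent...
  def runOnce(): Unit = {
    val event = eventQueue.take()
    onReceive(event)
  }

  // ...and the concrete behavior lives in the child.
  protected def onReceive(event: E): Unit
}

object MiniEventLoopDemo {
  val received = scala.collection.mutable.Buffer[String]()

  object Loop extends MiniEventLoop[String] {
    override protected def onReceive(event: String): Unit = received += event
  }

  def main(args: Array[String]): Unit = {
    Loop.post("JobSubmitted")   // producer side: submitJob() posting an event
    Loop.runOnce()              // consumer side: the event loop dispatching it
    println(received.mkString(","))  // JobSubmitted
  }
}
```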

  def submitJob[T, U](
      rdd: RDD[T],
      func: (TaskContext, Iterator[T]) => U,
      partitions: Seq[Int],
      callSite: CallSite,
      resultHandler: (Int, U) => Unit,
      properties: Properties): JobWaiter[U] = {
    val jobId = nextJobId.getAndIncrement()
    val func2 = func.asInstanceOf[(TaskContext, Iterator[_]) => _]
    val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler)
    eventProcessLoop.post(JobSubmitted(
      jobId, rdd, func2, partitions.toArray, callSite, waiter,
      SerializationUtils.clone(properties)))
    waiter
  }

  —>doOnReceive(): the event matches the JobSubmitted type, so the handleJobSubmitted() method is executed

  override def onReceive(event: DAGSchedulerEvent): Unit = {
    val timerContext = timer.time()
    try {
      doOnReceive(event)
    } finally {
      timerContext.stop()
    }
  }

  private def doOnReceive(event: DAGSchedulerEvent): Unit = event match {
    case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
      dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties)
    ...
  }

  —>handleJobSubmitted(): calls submitStage()

  —>submitStage(): recurses over the stages, submitting missing parent stages first, so that the tasks of every stage eventually get submitted

  private def submitStage(stage: Stage) {
    val jobId = activeJobForStage(stage)
    if (jobId.isDefined) {
      if (!waitingStages(stage) && !runningStages(stage) && !failedStages(stage)) {
        val missing = getMissingParentStages(stage).sortBy(_.id)
        if (missing.isEmpty) {
          submitMissingTasks(stage, jobId.get)
        } else {
          for (parent <- missing) {
            submitStage(parent)
          }
          waitingStages += stage
        }
      }
    } else {
      abortStage(stage, "No active job for stage " + stage.id, None)
    }
  }
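The recursive control flow of submitStage() above can be modeled with a toy DAG in plain Scala (the stage ids and the `parents` map are made up; the real code parks a blocked stage in waitingStages rather than retrying it immediately):

```scala
// Toy model of submitStage(): recursively submit missing parent stages
// first, then the stage itself.
object StageSubmissionSketch {
  val submitted = scala.collection.mutable.Buffer[Int]()

  // stage id -> parent stage ids (a tiny DAG: stage 2 depends on 0 and 1)
  val parents: Map[Int, Seq[Int]] = Map(0 -> Nil, 1 -> Nil, 2 -> Seq(0, 1))

  def submitStage(stage: Int): Unit = {
    val missing = parents(stage).filterNot(submitted.contains).sorted
    if (missing.isEmpty) {
      submitted += stage            // submitMissingTasks() in the real code
    } else {
      missing.foreach(submitStage)  // submit parents first
      submitStage(stage)            // then this stage (simplified: the real
                                    // code parks it in waitingStages)
    }
  }

  def main(args: Array[String]): Unit = {
    submitStage(2)
    println(submitted.mkString(","))  // 0,1,2 — parents before child
  }
}
```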

  —>submitMissingTasks(): in essence, for every partition that still needs to be computed, create a ShuffleMapTask (for a ShuffleMapStage) or a ResultTask (for the ResultStage), then send it to an Executor for execution.

    val tasks: Seq[Task[_]] = try {
      stage match {
        case stage: ShuffleMapStage =>
          partitionsToCompute.map { id =>
            ...
            new ShuffleMapTask(stage.id, stage.latestInfo.attemptId,
              taskBinary, part, locs, stage.latestInfo.taskMetrics, properties, Option(jobId),
              Option(sc.applicationId), sc.applicationAttemptId)
          }

        case stage: ResultStage =>
          partitionsToCompute.map { id =>
            ...
            new ResultTask(stage.id, stage.latestInfo.attemptId,
              taskBinary, part, locs, id, properties, stage.latestInfo.taskMetrics,
              Option(jobId), Option(sc.applicationId), sc.applicationAttemptId)
          }
      }
    } catch {
      ...
    }
    if (tasks.size > 0) {
      taskScheduler.submitTasks(new TaskSet(
        tasks.toArray, stage.id, stage.latestInfo.attemptId, jobId, properties))

——>ShuffleMapTask: this is the write side. Before writing there is a read (rdd.iterator), i.e., reading the result files produced by the previous Stage.

  override def runTask(context: TaskContext): MapStatus = {
    var writer: ShuffleWriter[Any, Any] = null
    try {
      val manager = SparkEnv.get.shuffleManager
      writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
      writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
      writer.stop(success = true).get
    } catch {
      ...
    }
  }

  —>ResultTask: here rdd.iterator(partition, context) is the step that reads the shuffle output from disk.

  override def runTask(context: TaskContext): U = {
    func(context, rdd.iterator(partition, context))
  }

3. Notes on the Spill-to-Disk Logic

Spark currently uses SortShuffleManager: no matter how many reduce partitions there are, each map task produces only two files, one index file and one data file.
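The two-file layout can be sketched in plain Scala: the data file stores every partition's records back to back, and the index file is essentially an array of cumulative byte offsets, so the reducer for partition p fetches the byte range [offsets(p), offsets(p + 1)) from the data file. A minimal sketch (object and method names are hypothetical):

```scala
// Sketch of the sort-shuffle file layout: one data file holding all
// partitions' records back to back, plus one index file of cumulative
// byte offsets into it.
object ShuffleIndexSketch {
  // Build the index from per-partition output sizes (in bytes).
  // An empty partition simply contributes a zero-length range.
  def buildIndex(partitionSizes: Seq[Long]): Array[Long] =
    partitionSizes.scanLeft(0L)(_ + _).toArray

  // The byte range in the data file that reducer `p` must fetch.
  def rangeFor(offsets: Array[Long], p: Int): (Long, Long) =
    (offsets(p), offsets(p + 1))

  def main(args: Array[String]): Unit = {
    // Hypothetical sizes for one map task's three output partitions.
    val offsets = buildIndex(Seq(100L, 0L, 250L))
    println(offsets.mkString(","))  // 0,100,100,350
    println(rangeFor(offsets, 2))   // (100,350)
  }
}
```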

 
