Spark Execution Environment: The Output Commit Coordinator

When a Spark application uses Spark SQL (including Hive), or needs to save task output to HDFS, the output commit coordinator OutputCommitCoordinator comes into play: it decides whether a task may commit its output to HDFS. Both the Driver and the Executors carry an OutputCommitCoordinator sub-component in their SparkEnv. The Driver registers an OutputCommitCoordinatorEndpoint, and the OutputCommitCoordinator on every Executor asks the Driver-side OutputCommitCoordinator, through an RpcEndpointRef to that OutputCommitCoordinatorEndpoint, whether output may be committed to HDFS.

The code in SparkEnv that creates the OutputCommitCoordinator is as follows:

//org.apache.spark.SparkEnv
val outputCommitCoordinator = mockOutputCommitCoordinator.getOrElse {
  new OutputCommitCoordinator(conf, isDriver)
}
val outputCommitCoordinatorRef = registerOrLookupEndpoint("OutputCommitCoordinator",
  new OutputCommitCoordinatorEndpoint(rpcEnv, outputCommitCoordinator))
outputCommitCoordinator.coordinatorRef = Some(outputCommitCoordinatorRef)

From this code, the creation of OutputCommitCoordinator proceeds in these steps:

  • 1) Create a new OutputCommitCoordinator instance.
  • 2) If the current instance is the Driver, create an OutputCommitCoordinatorEndpoint and register it with the Dispatcher under the name OutputCommitCoordinator. If the current process is an Executor, look up the reference to the OutputCommitCoordinatorEndpoint in the Dispatcher of the remote Driver's NettyRpcEnv (see the registerOrLookupEndpoint sketch after this list).
  • 3) Whether on the Driver or on an Executor, the OutputCommitCoordinator's coordinatorRef property finally holds the reference to the OutputCommitCoordinatorEndpoint.
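
Step 2 is carried out by SparkEnv's registerOrLookupEndpoint helper, which branches on isDriver. Its body in the Spark 2.x source is roughly the following (paraphrased; logging details may differ across versions):

//org.apache.spark.SparkEnv
def registerOrLookupEndpoint(
    name: String, endpointCreator: => RpcEndpoint): RpcEndpointRef = {
  if (isDriver) {
    logInfo("Registering " + name)
    rpcEnv.setupEndpoint(name, endpointCreator) // Driver: register locally
  } else {
    RpcUtils.makeDriverRef(name, conf, rpcEnv)  // Executor: look up the Driver's endpoint
  }
}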

1 Implementation of OutputCommitCoordinatorEndpoint

The code of OutputCommitCoordinatorEndpoint is as follows:

//org.apache.spark.scheduler.OutputCommitCoordinator
private[spark] object OutputCommitCoordinator {
  private[spark] class OutputCommitCoordinatorEndpoint(
      override val rpcEnv: RpcEnv, outputCommitCoordinator: OutputCommitCoordinator)
    extends RpcEndpoint with Logging {
    override def receive: PartialFunction[Any, Unit] = {
      case StopCoordinator =>
        logInfo("OutputCommitCoordinator stopped!")
        stop()
    }
    override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
      case AskPermissionToCommitOutput(stage, partition, attemptNumber) =>
        context.reply(
          outputCommitCoordinator.handleAskPermissionToCommit(stage, partition, attemptNumber))
    }
  }
}

OutputCommitCoordinatorEndpoint receives two kinds of messages (their definitions are sketched after this list):

  • StopCoordinator: stops the OutputCommitCoordinatorEndpoint.
  • AskPermissionToCommitOutput: handled by OutputCommitCoordinator's handleAskPermissionToCommit method, which determines whether the client has permission to commit its output to HDFS.
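
Both messages are plain case definitions that live alongside the coordinator. Their shape in the Spark 2.x source is approximately the following (field order matches the handler above; version differences are possible):

//org.apache.spark.scheduler.OutputCommitCoordinator
private sealed trait OutputCommitCoordinationMessage extends Serializable

private case object StopCoordinator extends OutputCommitCoordinationMessage
private case class AskPermissionToCommitOutput(
    stage: Int, partition: Int, attemptNumber: Int)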

2 Implementation of OutputCommitCoordinator

OutputCommitCoordinator decides whether the task for a given partition of a given Stage has permission to commit its output to HDFS, and coordinates multiple attempts (TaskAttempt) of the task for the same partition. OutputCommitCoordinator has the following properties (their declarations are sketched after this list):

  • conf: the SparkConf.
  • isDriver: whether the current node is the Driver.
  • coordinatorRef: the NettyRpcEndpointRef referencing the OutputCommitCoordinatorEndpoint.
  • NO_AUTHORIZED_COMMITTER: a constant whose value is -1.
  • authorizedCommittersByStage: caches, for each Stage, the task attempt authorized to commit each partition.
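
For orientation, these members appear in the Spark 2.x source roughly as follows (type aliases included; exact declarations may vary by version):

//org.apache.spark.scheduler.OutputCommitCoordinator
private type StageId = Int
private type PartitionId = Int
private type TaskAttemptNumber = Int

private val NO_AUTHORIZED_COMMITTER: TaskAttemptNumber = -1

// Each Stage maps to an array indexed by partition; a slot holds the attempt
// number authorized to commit that partition, or NO_AUTHORIZED_COMMITTER.
private val authorizedCommittersByStage =
  mutable.Map[StageId, Array[TaskAttemptNumber]]()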

With the properties of OutputCommitCoordinator covered, let's look at the methods it implements.

2.1 handleAskPermissionToCommit

Determines whether the given task attempt has permission to commit the data of the specified partition of the given Stage to HDFS:

//org.apache.spark.scheduler.OutputCommitCoordinator
private[scheduler] def handleAskPermissionToCommit(
    stage: StageId,
    partition: PartitionId,
    attemptNumber: TaskAttemptNumber): Boolean = synchronized {
  authorizedCommittersByStage.get(stage) match {
    case Some(authorizedCommitters) =>
      authorizedCommitters(partition) match {
        case NO_AUTHORIZED_COMMITTER =>
          logDebug(s"Authorizing attemptNumber=$attemptNumber to commit for stage=$stage, " + s"partition=$partition")
          authorizedCommitters(partition) = attemptNumber
          true
        case existingCommitter =>
          logDebug(s"Denying attemptNumber=$attemptNumber to commit for stage=$stage, " +
            s"partition=$partition; existingCommitter = $existingCommitter")
          false
      }
    case None =>
      logDebug(s"Stage $stage has completed, so not allowing attempt number $attemptNumber of " + s"partition $partition to commit")
      false
  }
}

According to the code, the method proceeds as follows:

  • 1) Look up the TaskAttemptNumber (an Int) recorded for the specified partition of the given Stage in the authorizedCommittersByStage cache.
  • 2) If the TaskAttemptNumber from step 1 equals NO_AUTHORIZED_COMMITTER, no attempt has yet committed output for this partition of the given Stage, so under the "first committer wins" policy the given TaskAttemptNumber (i.e., attemptNumber) is granted permission to commit the partition's output to HDFS. To signal later task attempts that the partition has been taken, attemptNumber is stored at the partition's index in the TaskAttemptNumber array.
  • 3) If the TaskAttemptNumber from step 1 does not equal NO_AUTHORIZED_COMMITTER, an earlier task attempt has already won the right to commit this partition's output to HDFS, so under the same "first committer wins" policy the given TaskAttemptNumber (i.e., attemptNumber) is denied permission. A standalone simulation of this policy follows the list.
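
The following self-contained Scala sketch (hypothetical names that mirror the coordinator's, not Spark code) reproduces the first-committer-wins decision so both branches can be exercised directly:

import scala.collection.mutable

object FirstCommitterWinsDemo {
  private val NO_AUTHORIZED_COMMITTER = -1
  private val authorizedCommittersByStage = mutable.Map[Int, Array[Int]]()

  def stageStart(stage: Int, maxPartitionId: Int): Unit =
    authorizedCommittersByStage(stage) =
      Array.fill(maxPartitionId + 1)(NO_AUTHORIZED_COMMITTER)

  def handleAskPermissionToCommit(stage: Int, partition: Int, attemptNumber: Int): Boolean =
    synchronized {
      authorizedCommittersByStage.get(stage) match {
        case Some(committers) if committers(partition) == NO_AUTHORIZED_COMMITTER =>
          committers(partition) = attemptNumber // first asker wins the partition
          true
        case _ => false // partition already taken, or the stage has ended
      }
    }

  def main(args: Array[String]): Unit = {
    stageStart(stage = 0, maxPartitionId = 1)
    println(handleAskPermissionToCommit(0, 0, attemptNumber = 0)) // true: first committer
    println(handleAskPermissionToCommit(0, 0, attemptNumber = 1)) // false: already taken
    println(handleAskPermissionToCommit(0, 1, attemptNumber = 1)) // true: other partition
  }
}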

2.2 isEmpty

Checks whether authorizedCommittersByStage is empty:

//org.apache.spark.scheduler.OutputCommitCoordinator
def isEmpty: Boolean = {
  authorizedCommittersByStage.isEmpty
}

2.3 canCommit

Sends AskPermissionToCommitOutput to the OutputCommitCoordinatorEndpoint and, based on the endpoint's reply, determines whether there is permission to commit the output of the specified partition of the Stage to HDFS:

//org.apache.spark.scheduler.OutputCommitCoordinator
def canCommit(
    stage: StageId,
    partition: PartitionId,
    attemptNumber: TaskAttemptNumber): Boolean = {
  val msg = AskPermissionToCommitOutput(stage, partition, attemptNumber)
  coordinatorRef match {
    case Some(endpointRef) =>
      // Ask whether this attempt may commit the Stage's partition output to HDFS
      endpointRef.askWithRetry[Boolean](msg)
    case None =>
      logError(
        "canCommit called after coordinator was stopped (is SparkEnv shutdown in progress)?")
      false
  }
}
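
Task-side code consults canCommit just before moving its output into its final location; in Spark the real caller is SparkHadoopMapRedUtil.commitTask. The pattern looks roughly like the following hypothetical helper (illustrative only, and since OutputCommitCoordinator is private[spark], such code must live inside the org.apache.spark package):

import org.apache.spark.SparkEnv

// Hypothetical helper showing where canCommit sits in a task's commit path.
def commitIfAllowed(stage: Int, partition: Int, attemptNumber: Int)
                   (doCommit: () => Unit): Unit = {
  val coordinator = SparkEnv.get.outputCommitCoordinator
  if (coordinator.canCommit(stage, partition, attemptNumber)) {
    doCommit() // this attempt holds the commit "lock" for the partition
  } else {
    // Spark raises CommitDeniedException at this point, which reaches the
    // scheduler as the TaskCommitDenied end reason rather than a plain failure.
    throw new IllegalStateException(
      s"commit denied: stage=$stage, partition=$partition, attempt=$attemptNumber")
  }
}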

2.4 stageStart

The stageStart method turns on commit coordination for a Stage's output to HDFS. In essence, it creates the TaskAttemptNumber array for the given Stage and sets every element of the array to NO_AUTHORIZED_COMMITTER:

//org.apache.spark.scheduler.OutputCommitCoordinator
private[scheduler] def stageStart(
    stage: StageId,
    maxPartitionId: Int): Unit = {
  val arr = new Array[TaskAttemptNumber](maxPartitionId + 1)
  java.util.Arrays.fill(arr, NO_AUTHORIZED_COMMITTER)
  synchronized {
    authorizedCommittersByStage(stage) = arr
  }
}

2.5 stageEnd

Turns off commit coordination for the given Stage's output to HDFS. In essence, it removes the given Stage and its TaskAttemptNumber array from authorizedCommittersByStage:

//org.apache.spark.scheduler.OutputCommitCoordinator
private[scheduler] def stageEnd(stage: StageId): Unit = synchronized {
  authorizedCommittersByStage.remove(stage)
}

2.6 taskCompleted

The taskCompleted method is called when the task for the specified partition of the given Stage finishes:

//org.apache.spark.scheduler.OutputCommitCoordinator
private[scheduler] def taskCompleted(
    stage: StageId,
    partition: PartitionId,
    attemptNumber: TaskAttemptNumber,
    reason: TaskEndReason): Unit = synchronized {
  val authorizedCommitters = authorizedCommittersByStage.getOrElse(stage, {
    logDebug(s"Ignoring task completion for completed stage")
    return
  })
  reason match {
    case Success =>
    case denied: TaskCommitDenied =>
      logInfo(s"Task was denied committing, stage: $stage, partition: $partition, " + s"attempt: $attemptNumber")
    case otherReason =>
      if (authorizedCommitters(partition) == attemptNumber) {
        logDebug(s"Authorized committer (attemptNumber=$attemptNumber, stage=$stage, " + s"partition=$partition) failed; clearing lock")
        authorizedCommitters(partition) = NO_AUTHORIZED_COMMITTER
      }
  }
}

According to the code listing, a task is considered complete in three cases:

  • The task succeeded: reason is Success, which extends the TaskEndReason trait; nothing further needs to be done.
  • The task's commit was denied: reason is the TaskEndReason subtype TaskCommitDenied; the denial is simply logged.
  • Any other reason: reason is some other subtype of TaskEndReason. In this case, if the failed attempt was the authorized committer, its slot in the Stage's TaskAttemptNumber array is reset to NO_AUTHORIZED_COMMITTER so that a later task attempt can obtain permission to commit (see the simulation after this list).
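
A tiny self-contained simulation (hypothetical helper names, one stage only, runnable as a Scala script) shows why clearing the slot matters for retries:

// One stage with 4 partitions; -1 means no authorized committer.
val NO_AUTHORIZED_COMMITTER = -1
val committers = Array.fill(4)(NO_AUTHORIZED_COMMITTER)

// Simplified analogues of handleAskPermissionToCommit and of the
// "other reason" branch of taskCompleted.
def ask(partition: Int, attempt: Int): Boolean =
  if (committers(partition) == NO_AUTHORIZED_COMMITTER) {
    committers(partition) = attempt; true
  } else false

def taskFailed(partition: Int, attempt: Int): Unit =
  if (committers(partition) == attempt) {
    committers(partition) = NO_AUTHORIZED_COMMITTER // clear the commit lock
  }

println(ask(partition = 2, attempt = 0)) // true : attempt 0 is authorized
println(ask(partition = 2, attempt = 1)) // false: attempt 0 holds the lock
taskFailed(partition = 2, attempt = 0)   // attempt 0 dies before committing
println(ask(partition = 2, attempt = 1)) // true : lock cleared, attempt 1 wins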

2.7 stop

The stop method sends a StopCoordinator message to the OutputCommitCoordinatorEndpoint to stop it, and then clears authorizedCommittersByStage; as the guard in the code shows, this happens only on the Driver:

//org.apache.spark.scheduler.OutputCommitCoordinator
def stop(): Unit = synchronized {
  if (isDriver) {
    coordinatorRef.foreach(_ send StopCoordinator)
    coordinatorRef = None
    authorizedCommittersByStage.clear()
  }
}

3 How OutputCommitCoordinator Works

Having examined OutputCommitCoordinatorEndpoint and OutputCommitCoordinator in detail, we can summarize how OutputCommitCoordinator decides whether a task may commit its output to HDFS:

authorizedCommittersByStage is the in-memory structure that caches every Stage and its partitions: S0, S1, and SN stand for different Stages, P0 and P1 for partitions within a Stage, and Pn and Pm indicate that the number of partitions differs from Stage to Stage. The interactions are as follows (a toy reconstruction of the resulting state appears after the list):

  • ①: OutputCommitCoordinatorEndpoint receives a StopCoordinator message and calls the stop method of its parent trait RpcEndpoint, which in turn calls NettyRpcEnv's stop method, halting the OutputCommitCoordinatorEndpoint.
  • ②: Upon receiving an AskPermissionToCommitOutput message, OutputCommitCoordinatorEndpoint calls OutputCommitCoordinator's handleAskPermissionToCommit method to decide whether the given task attempt may commit the data of the specified partition of the given Stage to HDFS.
  • ③: An AskPermissionToCommitOutput message carries Stage S0, partition Pn, and task attempt number 1. handleAskPermissionToCommit finds that partition Pn of Stage S0 is not yet held by any attempt (its value is -1), so it allows the current attempt to commit the data of partition Pn of Stage S0 to HDFS and sets Pn's slot to 1.
  • ④: An AskPermissionToCommitOutput message carries Stage S1, partition Pm, and task attempt number 11. handleAskPermissionToCommit finds that partition Pm of Stage S1 is already held by another attempt (attempt number 10), so it denies the current attempt permission to commit the data of partition Pm of Stage S1 to HDFS.
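
The state after steps ③ and ④ can be pictured with a toy version of the cache (hypothetical literal values; partition counts and indices chosen arbitrarily):

import scala.collection.mutable

// Stage S0: partition Pn (index 2 here) was just claimed by attempt 1 (step ③).
// Stage S1: partition Pm (index 0 here) is held by attempt 10, so attempt 11
// was denied (step ④). -1 marks partitions with no authorized committer.
val authorizedCommittersByStage = mutable.Map(
  0 -> Array(-1, -1, 1),      // S0: P0, P1 free; Pn held by attempt 1
  1 -> Array(10, -1, -1, -1)  // S1: Pm held by attempt 10; the rest free
)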
