How CheckpointCoordinator Triggers a Checkpoint


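CheckpointCoordinator is the component that coordinates checkpoints: it triggers checkpoints on the Source nodes (by having them emit Barrier events on a timer) and manages checkpoints for the entire job.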

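It uses a thread pool that supports periodic scheduling to trigger the Source nodes' checkpoints at a fixed interval (that is, each Source node sends Barrier events downstream). Once a checkpoint has been performed, the TaskManager returns an ACK message to the CheckpointCoordinator.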

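The CheckpointCoordinator is started by CheckpointCoordinatorDeActivator, which is essentially a JobStatusListener that watches the job's status. When the job's JobStatus becomes RUNNING, CheckpointCoordinatorDeActivator is notified and starts the CheckpointCoordinator's checkpoint scheduler; on any other status it stops the scheduler.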

/**
 * Listens for the JobStatus to become RUNNING and then starts the CheckpointCoordinator
 */
public class CheckpointCoordinatorDeActivator implements JobStatusListener {

    private final CheckpointCoordinator coordinator;

    public CheckpointCoordinatorDeActivator(CheckpointCoordinator coordinator) {
        this.coordinator = checkNotNull(coordinator);
    }

    /**
     * Listener callback for job status changes: starts or stops checkpointing
     */
    @Override
    public void jobStatusChanges(JobID jobId, JobStatus newJobStatus, long timestamp, Throwable error) {
        if (newJobStatus == JobStatus.RUNNING) {
            // The job is RUNNING: start the CheckpointCoordinator's scheduler, which coordinates checkpoint triggering
            coordinator.startCheckpointScheduler();
        } else {
            // Any other status: stop the job's checkpoint scheduling
            coordinator.stopCheckpointScheduler();
        }
    }
}
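
The listener pattern above can be tried out without a Flink cluster. The following is a minimal sketch, not Flink code: `SketchJobStatus`, `SketchJobStatusListener`, `SketchCoordinator`, and `SketchDeActivator` are hypothetical stand-ins invented for illustration, and the coordinator only records which scheduler method was called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Flink's JobStatus (illustration only).
enum SketchJobStatus { CREATED, RUNNING, FAILING, FINISHED }

// Hypothetical stand-in for Flink's JobStatusListener (illustration only).
interface SketchJobStatusListener {
    void jobStatusChanges(SketchJobStatus newStatus);
}

// Records scheduler start/stop calls so the listener's behaviour can be observed.
final class SketchCoordinator {
    final List<String> calls = new ArrayList<>();

    void startCheckpointScheduler() { calls.add("start"); }

    void stopCheckpointScheduler() { calls.add("stop"); }
}

// Starts the scheduler on RUNNING and stops it on every other status.
final class SketchDeActivator implements SketchJobStatusListener {
    private final SketchCoordinator coordinator;

    SketchDeActivator(SketchCoordinator coordinator) {
        this.coordinator = coordinator;
    }

    @Override
    public void jobStatusChanges(SketchJobStatus newStatus) {
        if (newStatus == SketchJobStatus.RUNNING) {
            coordinator.startCheckpointScheduler();
        } else {
            coordinator.stopCheckpointScheduler();
        }
    }
}
```

Note that the `else` branch covers every non-RUNNING status, so a job that moves from RUNNING to FAILING stops scheduling checkpoints right away.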

When the CheckpointCoordinator starts triggering checkpoints, what it actually does is add a Runnable task to **a thread pool that supports periodic scheduling**.

/**
 * The CheckpointCoordinator starts triggering checkpoints: in essence it adds a periodically scheduled Runnable task to the thread pool
 */
public void startCheckpointScheduler() {
    synchronized (lock) {
        if (shutdown) {
            throw new IllegalArgumentException("Checkpoint coordinator is shut down");
        }

        // To be safe, stop checkpoint scheduling first so that any previously registered timer is cancelled
        stopCheckpointScheduler();

        periodicScheduling = true;
        // Add the periodically scheduled Runnable task to the already created checkpointCoordinatorTimer thread pool
        currentPeriodicTrigger = scheduleTriggerWithDelay(getRandomInitDelay());
    }
}


/**
 * Periodic scheduling of checkpoints
 *     The CheckpointCoordinator can run checkpoints periodically (at the configured interval)
 *     because the ScheduledExecutor (a thread pool that supports periodic scheduling) can run a Runnable periodically.
 */
private ScheduledFuture<?> scheduleTriggerWithDelay(long initDelay) {
    // Periodically run the Runnable task in the thread pool
    return timer.scheduleAtFixedRate(
        // ScheduledTrigger is a Runnable task
        new ScheduledTrigger(),
        // baseInterval: the interval between checkpoints
        initDelay, baseInterval, TimeUnit.MILLISECONDS);
}
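
To see the periodic scheduling in isolation, here is a minimal sketch built on the JDK's `ScheduledExecutorService` rather than Flink's own `ScheduledExecutor` wrapper. The class `PeriodicTriggerSketch` and the method `runTrigger` are hypothetical names used only for this illustration; the Runnable just counts how often it fires.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PeriodicTriggerSketch {

    /**
     * Schedules a periodic "trigger" and waits until it has fired at least
     * {@code times} times (or 5 seconds pass). Returns the observed count.
     */
    static int runTrigger(long initDelayMs, long intervalMs, int times) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger fired = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(times);

        // Same shape as scheduleTriggerWithDelay: initial delay, then a fixed rate.
        ScheduledFuture<?> handle = timer.scheduleAtFixedRate(() -> {
            fired.incrementAndGet();   // stands in for triggerCheckpoint(...)
            done.countDown();
        }, initDelayMs, intervalMs, TimeUnit.MILLISECONDS);

        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Mirrors stopCheckpointScheduler(): cancel the periodic handle.
            handle.cancel(false);
            timer.shutdownNow();
        }
        return fired.get();
    }
}
```

Calling `PeriodicTriggerSketch.runTrigger(10, 20, 3)` returns a count of at least 3; it can be slightly higher because one more run may start before the handle is cancelled.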

The checkpoint trigger itself is implemented inside this Runnable task.

/**
 * The Runnable task that the CheckpointCoordinator schedules periodically (it performs the checkpoint trigger)
 */
private final class ScheduledTrigger implements Runnable {

    @Override
    public void run() {
        try {
            // Trigger the checkpoint
            triggerCheckpoint(System.currentTimeMillis(), true);
        }
        catch (Exception e) {
            LOG.error("Exception while triggering checkpoint for job {}.", job, e);
        }
    }
}

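Inside the Runnable, the call goes to the CheckpointCoordinator's method for triggering a checkpoint. The CheckpointCoordinator triggers a checkpoint in 3 phases:

  • 1. Pre-checks: build the Execution[] array that holds the Executions of all Source nodes whose checkpoint the CheckpointCoordinator triggers, and build the collection of ExecutionVertex instances for all nodes that must report an ACK message to the CheckpointCoordinator.
  • 2. Create the PendingCheckpoint (so the checkpoint can be tracked while it runs: from the moment it starts until the ACKs come back, it stays in the Pending state), and prepare the Runnable task dedicated to cleaning up an expired checkpoint.
  • 3. Trigger the checkpoint: while holding the CheckpointCoordinator's lock, use the ScheduledExecutor to schedule the Runnable that cleans up an expired checkpoint; then iterate over every Execution of every Source node that the CheckpointCoordinator triggers and have it perform the checkpoint synchronously or asynchronously.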

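Phase 1: Pre-checks

This phase prepares the Execution array for the Source nodes whose checkpoint is triggered, and the collection of ExecutionVertex instances that must report an ACK message to the CheckpointCoordinator.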

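/**
 * Phase 1: pre-checks
 *     Build the Execution[] array holding the Executions of all Source nodes whose checkpoint is triggered by the CheckpointCoordinator
 *     Build the collection of ExecutionVertex instances for all nodes that must report an ACK message to the CheckpointCoordinator
 */
synchronized (lock) {
   // Check whether the CheckpointCoordinator is shut down and whether the number of checkpoints exceeds the configured maximum
   preCheckBeforeTriggeringCheckpoint(isPeriodic, props.forceCheckpoint());
}

// Build the Execution[] array: it holds all Source nodes that must perform the checkpoint
// (the CheckpointCoordinator only triggers the Source nodes; the other nodes are triggered through Barrier alignment)
Execution[] executions = new Execution[tasksToTrigger.length];
for (int i = 0; i < tasksToTrigger.length; i++) {
   // The Execution of the task that must perform the checkpoint
   Execution ee = tasksToTrigger[i].getCurrentExecutionAttempt();
   // A null Execution means the task is not being executed, so the checkpoint is aborted with an exception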
   if (ee == null) {
      LOG.info("Checkpoint triggering task {} of job {} is not being executed at the moment. Aborting checkpoint.",
            tasksToTrigger[i].getTaskNameWithSubtaskIndex(),
            job);
      throw new CheckpointException(CheckpointFailureReason.NOT_ALL_REQUIRED_TASKS_RUNNING);
   } else if (ee.getState() == ExecutionState.RUNNING) {
      // If this Execution is RUNNING, add it to the Execution array; otherwise throw a CheckpointException
      executions[i] = ee;
   } else {
      LOG.info("Checkpoint triggering task {} of job {} is not in state {} but {} instead. Aborting checkpoint.",
            tasksToTrigger[i].getTaskNameWithSubtaskIndex(),
            job,
            ExecutionState.RUNNING,
            ee.getState());
      throw new CheckpointException(CheckpointFailureReason.NOT_ALL_REQUIRED_TASKS_RUNNING);
   }
}

// Build the map of tasks (all nodes) that must send an ACK message to the CheckpointCoordinator: it covers every ExecutionVertex in the ExecutionGraph
Map<ExecutionAttemptID, ExecutionVertex> ackTasks = new HashMap<>(tasksToWaitFor.length);

// tasksToWaitFor holds every ExecutionVertex in the ExecutionGraph.
// In other words, the Task instance of each ExecutionVertex must report an ACK message to the CheckpointCoordinator
for (ExecutionVertex ev : tasksToWaitFor) {
   // The Execution of this node's task
   Execution ee = ev.getCurrentExecutionAttempt();
   if (ee != null) {
      // Store the ExecutionVertex in the map, keyed by the Execution's attempt ID
      ackTasks.put(ee.getAttemptId(), ev);
   } else {
      LOG.info("Checkpoint acknowledging task {} of job {} is not being executed at the moment. Aborting checkpoint.",
            ev.getTaskNameWithSubtaskIndex(),
            job);
      throw new CheckpointException(CheckpointFailureReason.NOT_ALL_REQUIRED_TASKS_RUNNING);
   }
}

Phase 2: Build the PendingCheckpoint

From the moment the checkpoint starts until all nodes have returned their ACK messages, the checkpoint is in the Pending state.

/**
 * Phase 2: create the PendingCheckpoint (so the checkpoint can be tracked while it runs: from start until the ACKs come back, it stays in the Pending state)
 */

// Where the state data is stored during the checkpoint
final CheckpointStorageLocation checkpointStorageLocation;
// Every checkpoint is identified by a unique checkpointID
final long checkpointID;

try {
   // this must happen outside the coordinator-wide lock, because it communicates
   // with external services (in HA mode) and may block for a while.
   // Every checkpoint gets a unique checkpointID
   checkpointID = checkpointIdCounter.getAndIncrement();

   // Determine where the state snapshot data is stored during the checkpoint
   checkpointStorageLocation = props.isSavepoint() ?
         checkpointStorage.initializeLocationForSavepoint(checkpointID, externalSavepointLocation) :
         checkpointStorage.initializeLocationForCheckpoint(checkpointID);
}
catch (Throwable t) {
   int numUnsuccessful = numUnsuccessfulCheckpointsTriggers.incrementAndGet();
   LOG.warn("Failed to trigger checkpoint for job {} ({} consecutive failed attempts so far).",
         job,
         numUnsuccessful,
         t);
   throw new CheckpointException(CheckpointFailureReason.EXCEPTION, t);
}

// Initialize the PendingCheckpoint; it is later stored in a map
final PendingCheckpoint checkpoint = new PendingCheckpoint(
   job,
   checkpointID,
   timestamp,
   ackTasks,
   masterHooks.keySet(),
   props,
   checkpointStorageLocation,
   executor);
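
The idea of a checkpoint that stays pending until every task has acknowledged it can be sketched in a few lines. This is a simplified illustration, not Flink's PendingCheckpoint: the class `PendingCheckpointSketch`, its methods, and the use of plain strings as attempt IDs are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

final class PendingCheckpointSketch {
    private final long checkpointId;
    // Attempt IDs (plain strings here) that still owe an ACK.
    private final Set<String> notYetAcknowledged;
    private final CompletableFuture<Long> completion = new CompletableFuture<>();

    PendingCheckpointSketch(long checkpointId, Set<String> tasksToAck) {
        this.checkpointId = checkpointId;
        this.notYetAcknowledged = new HashSet<>(tasksToAck);
    }

    /** Records one ACK; returns false for unknown, duplicate, or late ACKs. */
    synchronized boolean acknowledge(String attemptId) {
        if (completion.isDone() || !notYetAcknowledged.remove(attemptId)) {
            return false;
        }
        if (notYetAcknowledged.isEmpty()) {
            // The last ACK arrived: the checkpoint leaves the pending state.
            completion.complete(checkpointId);
        }
        return true;
    }

    /** Aborts the checkpoint, for example when it expires. */
    synchronized void abort(String reason) {
        completion.completeExceptionally(new IllegalStateException(reason));
    }

    CompletableFuture<Long> getCompletionFuture() {
        return completion;
    }
}
```

With two tasks `"a"` and `"b"`, the completion future stays incomplete after the first ACK and completes with the checkpoint ID after the second one.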

Phase 3: Trigger the checkpoint

Iterate over the Executions of all Source nodes and perform each Source node's checkpoint synchronously or asynchronously.

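/**
 * Phase 3: trigger and run the checkpoint
 *        The ScheduledExecutor only runs a Runnable task on a thread once its scheduled time arrives; the rest of the time it is polling for due tasks
 */
// Acquire the CheckpointCoordinator's lock
synchronized (lock) {
   // Pre-check: the CheckpointCoordinator's state and the number of pending checkpoint attempts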
   preCheckBeforeTriggeringCheckpoint(isPeriodic, props.forceCheckpoint());

   LOG.info("Triggering checkpoint {} @ {} for job {}.", checkpointID, timestamp, job);

   // Store the newly created PendingCheckpoint in the map
   pendingCheckpoints.put(checkpointID, checkpoint);

   // Use the ScheduledExecutor to schedule the Runnable task dedicated to cleaning up this checkpoint if it expires
   ScheduledFuture<?> cancellerHandle = timer.schedule(
         canceller,
         checkpointTimeout, TimeUnit.MILLISECONDS);

   if (!checkpoint.setCancellerHandle(cancellerHandle)) {
      // checkpoint is already disposed!
      // (the checkpoint has already been released, so cancel the canceller)
      cancellerHandle.cancel(false);
   }

   // TODO, asynchronously snapshots master hook without waiting here
   for (MasterTriggerRestoreHook<?> masterHook : masterHooks.values()) {
      final MasterState masterState =
         MasterHooks.triggerHook(masterHook, checkpointID, timestamp, executor)
            .get(checkpointTimeout, TimeUnit.MILLISECONDS);
      checkpoint.acknowledgeMasterState(masterHook.getIdentifier(), masterState);
   }
   Preconditions.checkState(checkpoint.areMasterStatesFullyAcknowledged());
}
// end of lock scope

final CheckpointOptions checkpointOptions = new CheckpointOptions(
      props.getCheckpointType(),
      checkpointStorageLocation.getLocationReference());

// send the messages to the tasks that trigger their checkpoint
/**
 * Core step: iterate over the Execution[] array (the Executions of the Source nodes triggered by the CheckpointCoordinator) and trigger the checkpoint on each
 */
for (Execution execution: executions) {
   if (props.isSynchronous()) {
      // Perform the checkpoint synchronously
      execution.triggerSynchronousSavepoint(checkpointID, timestamp, checkpointOptions, advanceToEndOfTime);
   } else {
      // Perform the checkpoint asynchronously
      execution.triggerCheckpoint(checkpointID, timestamp, checkpointOptions);
   }
}

numUnsuccessfulCheckpointsTriggers.set(0);
// Return the checkpoint's completion Future
return checkpoint.getCompletionFuture();
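
The interplay between the scheduled canceller and the completion future can be illustrated with plain JDK classes. The following is a minimal sketch under stated assumptions: `CheckpointTimeoutSketch` and `runCheckpoint` are hypothetical names, the acknowledgement is simulated by a second delayed task, and the result is a plain string instead of a completed checkpoint.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class CheckpointTimeoutSketch {

    /**
     * Simulates one checkpoint: a canceller fires after timeoutMs, while the
     * (simulated) last ACK arrives after ackAfterMs. Whichever comes first wins.
     */
    static String runCheckpoint(long timeoutMs, long ackAfterMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<String> result = new CompletableFuture<>();

        // The canceller: expires the checkpoint if it is still pending at timeout.
        ScheduledFuture<?> cancellerHandle = timer.schedule(
                () -> { result.complete("EXPIRED"); },
                timeoutMs, TimeUnit.MILLISECONDS);

        // Simulated arrival of the final ACK.
        timer.schedule(() -> {
            if (result.complete("COMPLETED")) {
                // The checkpoint finished in time, so the canceller is no longer needed.
                cancellerHandle.cancel(false);
            }
        }, ackAfterMs, TimeUnit.MILLISECONDS);

        try {
            return result.join();
        } finally {
            timer.shutdownNow();
        }
    }
}
```

`runCheckpoint(2000, 10)` yields `"COMPLETED"` because the ACK arrives long before the timeout, while `runCheckpoint(20, 2000)` yields `"EXPIRED"`.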

The actual work is carried out by the Execution that corresponds to each Source node.

/**
 * Perform the checkpoint asynchronously
 */
public void triggerCheckpoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {
    // Handle the checkpoint for this Execution's Source node by triggering it through the Task instance
    triggerCheckpointHelper(checkpointId, timestamp, checkpointOptions, false);
}


/**
 * Trigger the source node's checkpoint through the Task that corresponds to this Execution
 */
private void triggerCheckpointHelper(long checkpointId, long timestamp, CheckpointOptions checkpointOptions, boolean advanceToEndOfEventTime) {

    final CheckpointType checkpointType = checkpointOptions.getCheckpointType();
    if (advanceToEndOfEventTime && !(checkpointType.isSynchronous() && checkpointType.isSavepoint())) {
        throw new IllegalArgumentException("Only synchronous savepoints are allowed to advance the watermark to MAX.");
    }

    // Get the LogicalSlot assigned to this Execution
    final LogicalSlot slot = assignedResource;

    // A non-null LogicalSlot means the Execution has been assigned a Slot;
    // otherwise the Execution lacks the Slot it needs, and its Task instance is not running
    if (slot != null) {
        // Through the LogicalSlot, get the TaskManagerGateway of the TaskManager that hosts it
        final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();

        // Use the RPC method offered by TaskManagerGateway to trigger the checkpoint of the given Task
        // When the TaskExecutor receives the "trigger checkpoint" request, it performs the checkpoint of that Task instance locally
        taskManagerGateway.triggerCheckpoint(attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions, advanceToEndOfEventTime);
    } else {
        LOG.debug("The execution has no slot assigned. This indicates that the execution is no longer running.");
    }
}

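The first step is to check whether the current Execution has been assigned a LogicalSlot; only an Execution that owns a Slot can take part in the checkpoint. If it does, the request goes through the TaskManagerGateway so that the TaskExecutor performs the actual checkpoint.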

/**
 * Notify the TaskExecutor to trigger the checkpoint
 */
@Override
public void triggerCheckpoint(ExecutionAttemptID executionAttemptID, JobID jobId, long checkpointId, long timestamp, CheckpointOptions checkpointOptions, boolean advanceToEndOfEventTime) {
    // Through the TaskExecutorGateway, notify the TaskExecutor to perform the checkpoint
    taskExecutorGateway.triggerCheckpoint(
        executionAttemptID,
        checkpointId,
        timestamp,
        checkpointOptions,
        advanceToEndOfEventTime);
}

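When the TaskExecutor receives the "trigger checkpoint" request sent by the Execution, it uses the execution attempt's unique ID to look up the matching Task instance in the TaskSlotTable and lets that Task perform the checkpoint. If no matching Task is found, a CheckpointException is returned to the CheckpointCoordinator.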

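/**
 * Through an RPC call, the Execution forwards the CheckpointCoordinator's "trigger checkpoint" request to the TaskExecutor.
 * The TaskExecutor then uses the Execution information to locate the Task instance and calls it to trigger the Source node's checkpoint.
 * If the Task is unavailable, a CheckpointException is returned to the CheckpointCoordinator.
 */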
@Override
public CompletableFuture<Acknowledge> triggerCheckpoint(
    ExecutionAttemptID executionAttemptID,
    long checkpointId,
    long checkpointTimestamp,
    CheckpointOptions checkpointOptions,
    boolean advanceToEndOfEventTime) {
    log.debug("Trigger checkpoint {}@{} for {}.", checkpointId, checkpointTimestamp, executionAttemptID);

    // CheckpointType has 3 values: CHECKPOINT (full or incremental), SAVEPOINT, and SYNC_SAVEPOINT (which may advance the watermark to MAX)
    final CheckpointType checkpointType = checkpointOptions.getCheckpointType();
    if (advanceToEndOfEventTime && !(checkpointType.isSynchronous() && checkpointType.isSavepoint())) {
        throw new IllegalArgumentException("Only synchronous savepoints are allowed to advance the watermark to MAX.");
    }

    // Look up the Task instance for this Execution in the TaskSlotTable (which keeps several indexes for fast access to Tasks and allocated Slots)
    final Task task = taskSlotTable.getTask(executionAttemptID);

    /**
     * Core logic of performing the checkpoint
     */
    if (task != null) {
        // The Task instance exists, so its checkpoint can be triggered
        task.triggerCheckpointBarrier(checkpointId, checkpointTimestamp, checkpointOptions, advanceToEndOfEventTime);

        return CompletableFuture.completedFuture(Acknowledge.get());
    } else {
        // The Task is null: the TaskExecutor does not know this task (for example because it has failed), so the checkpoint cannot be performed.
        final String message = "TaskManager received a checkpoint request for unknown task " + executionAttemptID + '.';

        log.debug(message);
        // The checkpoint cannot be performed: wrap the failure in a CheckpointException for the CheckpointCoordinator to handle
        return FutureUtils.completedExceptionally(new CheckpointException(message, CheckpointFailureReason.TASK_CHECKPOINT_FAILURE));
    }
}
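
The lookup-then-respond pattern in this method can be sketched without Flink. In the example below, `TaskLookupSketch` is a hypothetical class: a plain map stands in for the TaskSlotTable, strings stand in for ExecutionAttemptID, and an `IllegalStateException` stands in for CheckpointException.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

final class TaskLookupSketch {
    // Hypothetical stand-in for TaskSlotTable: attempt ID -> checkpoint action of that task.
    private final Map<String, Runnable> tasks = new ConcurrentHashMap<>();

    void register(String attemptId, Runnable onTrigger) {
        tasks.put(attemptId, onTrigger);
    }

    /** Triggers the task's checkpoint if the task is known, else fails the future. */
    CompletableFuture<String> triggerCheckpoint(String attemptId) {
        Runnable task = tasks.get(attemptId);
        if (task != null) {
            task.run();
            // Known task: acknowledge that the trigger request was delivered.
            return CompletableFuture.completedFuture("ACK");
        }
        // Unknown task: report the failure through the future instead of throwing.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(
                new IllegalStateException("received a checkpoint request for unknown task " + attemptId));
        return failed;
    }
}
```

Returning a failed future rather than throwing keeps the error on the same asynchronous path as the success case, so the caller handles both in one place.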

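Finally, StreamTask provides the concrete implementation of triggering the checkpoint: the trigger action is wrapped in a Mail, submitted to the Mailbox through the MailboxExecutor, and waits there until the MailboxProcessor runs it.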

/**
 * Trigger the checkpoint in the StreamTask: asynchronously, through the MailboxExecutor, wrap the "perform checkpoint" request in a Mail,
 * submit it to the TaskMailbox, and let the MailboxProcessor handle it
 */
@Override
public Future<Boolean> triggerCheckpointAsync(
      CheckpointMetaData checkpointMetaData,
      CheckpointOptions checkpointOptions,
      boolean advanceToEndOfEventTime) {

   // Through the MailboxExecutor, wrap the checkpoint trigger logic in a Mail and submit it to the Mailbox; the MailboxProcessor runs it later
   return mailboxProcessor.getMainMailboxExecutor().submit(
         // The actual checkpoint trigger logic
         () -> triggerCheckpoint(checkpointMetaData, checkpointOptions, advanceToEndOfEventTime),
         "checkpoint %s with %s",
      checkpointMetaData,
      checkpointOptions);
}
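
The mailbox idea (other threads enqueue actions, one thread runs them in order) can be shown with a small sketch. This is not Flink's MailboxProcessor: `MailboxSketch`, `submit`, and `processAllMails` are hypothetical names, and a JDK `FutureTask` plays the role of a Mail.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;

final class MailboxSketch {
    // The mailbox: actions wait here until the processing thread picks them up.
    private final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();

    /** Wraps the action in a "mail" and enqueues it; safe to call from any thread. */
    <T> FutureTask<T> submit(Callable<T> action) {
        FutureTask<T> mail = new FutureTask<>(action);
        mailbox.add(mail);
        return mail;
    }

    /** Runs all queued mails on the calling thread, in order; returns how many ran. */
    int processAllMails() {
        int processed = 0;
        Runnable mail;
        while ((mail = mailbox.poll()) != null) {
            mail.run();
            processed++;
        }
        return processed;
    }
}
```

A submitted action is not done until `processAllMails()` runs on the task's thread, which is why the checkpoint trigger never races with the task's regular record processing in this model.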

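If the checkpoint succeeds, the result is Success. If it fails, it is time to "let Ekko cast his ultimate" (the author's joke for rewinding to an earlier state).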

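The logical call chain is as follows:

# The JobStatusListener observes that the job's status is RUNNING
					↓
# The CheckpointCoordinator takes charge of triggering checkpoints
					↓
# In essence: a Runnable task is added to the thread pool that supports periodic scheduling; the Runnable defines how a checkpoint is triggered
					↓
# The Runnable task calls the CheckpointCoordinator's "trigger checkpoint" method
					↓
# Each Execution performs the checkpoint synchronously or asynchronously
					↓
# Locate the TaskExecutor that hosts the Slot assigned to this Execution (via TaskManagerGateway and TaskExecutorGateway)
					↓
# The Task instance of the Execution performs the actual checkpoint
					↓
# StreamTask defines the concrete "trigger checkpoint" logic; the MailboxExecutor submits it for the MailboxProcessor to run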