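The CheckpointCoordinator component coordinates checkpointing: it triggers checkpoints on the Source nodes (by having them periodically emit Barrier events) and manages the checkpoints of the whole job.
Using a thread pool that supports periodic scheduling, it periodically triggers the Source nodes' checkpoints (that is, the Source nodes send Barrier events downstream). Once a task has completed its checkpoint, the TaskManager returns an ACK message to the CheckpointCoordinator.
The CheckpointCoordinator is started by the CheckpointCoordinatorDeActivator, which is essentially a JobStatusListener that observes the job's status. When the job's JobStatus becomes RUNNING, the CheckpointCoordinatorDeActivator is notified and starts the CheckpointCoordinator's scheduler.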
/**
 * Starts the CheckpointCoordinator's scheduler when the JobStatus becomes RUNNING.
 */
public class CheckpointCoordinatorDeActivator implements JobStatusListener {
    private final CheckpointCoordinator coordinator;
    public CheckpointCoordinatorDeActivator(CheckpointCoordinator coordinator) {
        this.coordinator = checkNotNull(coordinator);
    }
    /**
     * Job status callback: starts or stops checkpoint scheduling.
     */
    @Override
    public void jobStatusChanges(JobID jobId, JobStatus newJobStatus, long timestamp, Throwable error) {
        if (newJobStatus == JobStatus.RUNNING) {
            // The job is RUNNING: start the CheckpointCoordinator's scheduler, which coordinates checkpoint triggering
            coordinator.startCheckpointScheduler();
        } else {
            // Any other status: stop checkpoint scheduling for this job
            coordinator.stopCheckpointScheduler();
        }
    }
}
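To make the listener mechanism concrete, here is a minimal, self-contained sketch of the same start/stop pattern. The `JobStatus` values, the `Scheduler` interface, and the class and method names are simplified stand-ins invented for illustration; they are not Flink's real types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the DeActivator pattern; not Flink code. */
class DeActivatorSketch {
    /** Reduced set of job states, for illustration only. */
    enum JobStatus { CREATED, RUNNING, FAILING, FINISHED }

    /** Stand-in for the two coordinator methods the listener calls. */
    interface Scheduler {
        void start();
        void stop();
    }

    /** RUNNING starts periodic scheduling; every other status stops it. */
    static void onJobStatusChange(JobStatus newStatus, Scheduler scheduler) {
        if (newStatus == JobStatus.RUNNING) {
            scheduler.start();
        } else {
            scheduler.stop();
        }
    }

    /** Replays a sequence of status changes and records what the scheduler was asked to do. */
    static List<String> replay(JobStatus... statuses) {
        List<String> calls = new ArrayList<>();
        Scheduler recorder = new Scheduler() {
            @Override public void start() { calls.add("start"); }
            @Override public void stop() { calls.add("stop"); }
        };
        for (JobStatus status : statuses) {
            onJobStatusChange(status, recorder);
        }
        return calls;
    }
}
```

Note that the listener does not distinguish between the non-RUNNING states: a job that is still being created and a job that is failing both result in a `stop()` call.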
When the CheckpointCoordinator triggers checkpoints, what it really does is add a Runnable task to **a thread pool that supports periodic scheduling**.
/**
 * Starts checkpoint triggering: in essence, registers a periodically scheduled Runnable with the thread pool.
 */
public void startCheckpointScheduler() {
    synchronized (lock) {
        if (shutdown) {
            throw new IllegalArgumentException("Checkpoint coordinator is shut down");
        }
        // To be safe, stop scheduling first so that any previously registered timer is cancelled
        stopCheckpointScheduler();
        periodicScheduling = true;
        // Register the periodic Runnable with the already created checkpointCoordinatorTimer thread pool
        currentPeriodicTrigger = scheduleTriggerWithDelay(getRandomInitDelay());
    }
}
/**
 * Periodic scheduling of checkpoints.
 * The CheckpointCoordinator can trigger checkpoints periodically (at the configured interval)
 * because a ScheduledExecutor (a thread pool with periodic scheduling) runs the Runnable at a fixed rate.
 */
private ScheduledFuture<?> scheduleTriggerWithDelay(long initDelay) {
    // Run the Runnable periodically on the thread pool
    return timer.scheduleAtFixedRate(
        // ScheduledTrigger is the Runnable task
        new ScheduledTrigger(),
        // baseInterval: the interval between checkpoints
        initDelay, baseInterval, TimeUnit.MILLISECONDS);
}
The checkpoint itself is triggered inside this Runnable task.
/**
 * The Runnable that the CheckpointCoordinator schedules periodically; it triggers a checkpoint.
 */
private final class ScheduledTrigger implements Runnable {
    @Override
    public void run() {
        try {
            // Trigger the checkpoint
            triggerCheckpoint(System.currentTimeMillis(), true);
        }
        catch (Exception e) {
            LOG.error("Exception while triggering checkpoint for job {}.", job, e);
        }
    }
}
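The try/catch in `ScheduledTrigger` matters more than it may seem: with `ScheduledExecutorService.scheduleAtFixedRate`, an exception that escapes the task suppresses all subsequent executions. The following self-contained sketch reproduces the pattern with the plain JDK. The class name `PeriodicTriggerSketch`, its fields, and the way the random initial delay is drawn are assumptions made for illustration, not Flink's implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified periodic trigger; not Flink code. */
class PeriodicTriggerSketch {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final long baseIntervalMillis;
    private final Runnable triggerAction;
    private ScheduledFuture<?> currentPeriodicTrigger;

    PeriodicTriggerSketch(long baseIntervalMillis, Runnable triggerAction) {
        this.baseIntervalMillis = baseIntervalMillis;
        this.triggerAction = triggerAction;
    }

    synchronized void start() {
        stop(); // cancel any previously registered trigger first
        // Assumption: a random initial delay in [0, baseInterval] spreads out the first trigger
        long initDelay = ThreadLocalRandom.current().nextLong(baseIntervalMillis + 1);
        currentPeriodicTrigger = timer.scheduleAtFixedRate(() -> {
            try {
                triggerAction.run();
            } catch (Exception e) {
                // Swallow and log; otherwise the executor would stop rescheduling this task
                System.err.println("Exception while triggering: " + e.getMessage());
            }
        }, initDelay, baseIntervalMillis, TimeUnit.MILLISECONDS);
    }

    synchronized void stop() {
        if (currentPeriodicTrigger != null) {
            currentPeriodicTrigger.cancel(false);
            currentPeriodicTrigger = null;
        }
    }

    void shutdown() {
        stop();
        timer.shutdownNow();
    }

    /** Demo: the action always throws, yet it is still invoked at least three times. */
    static boolean demo() {
        CountDownLatch threeRuns = new CountDownLatch(3);
        PeriodicTriggerSketch sketch = new PeriodicTriggerSketch(10L, () -> {
            threeRuns.countDown();
            throw new IllegalStateException("simulated trigger failure");
        });
        sketch.start();
        try {
            return threeRuns.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            sketch.shutdown();
        }
    }
}
```

If the catch block were removed, the first failure would end the periodic schedule and the latch in `demo()` would never reach zero.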
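The Runnable calls the CheckpointCoordinator's own method for triggering a checkpoint. Triggering proceeds in three phases:
- 1. Pre-checks: prepare the Execution[] array holding the Executions of all Source nodes whose checkpoint the CheckpointCoordinator triggers, and the collection of ExecutionVertex instances of all nodes that must report an ACK to the CheckpointCoordinator.
- 2. Create the PendingCheckpoint (from the moment the checkpoint starts until the ACKs have arrived, the checkpoint stays in the pending state) and prepare the Runnable dedicated to cleaning up an expired checkpoint.
- 3. Trigger the checkpoint: while holding the CheckpointCoordinator's lock, schedule the expiry clean-up Runnable on the ScheduledExecutor; then, after releasing the lock, iterate over every Execution of the Source nodes and have each one trigger either a synchronous savepoint or a regular checkpoint.
Phase 1: Pre-checks
This phase prepares the array of Executions of the Source nodes that trigger the checkpoint and the collection of ExecutionVertex instances that report ACKs to the CheckpointCoordinator.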
/**
 * Phase 1: pre-checks.
 * Build the Execution[] array holding the Executions of all Source nodes whose checkpoint the CheckpointCoordinator triggers.
 * Build the collection of ExecutionVertex instances of all nodes that must report an ACK to the CheckpointCoordinator.
 */
synchronized (lock) {
    // Check whether the CheckpointCoordinator is shut down and whether too many checkpoints are already in progress
    preCheckBeforeTriggeringCheckpoint(isPeriodic, props.forceCheckpoint());
}
// Build the Execution[] array of all Source tasks that must trigger the checkpoint
// (the CheckpointCoordinator only triggers the Source nodes; the other nodes are triggered through Barrier alignment)
Execution[] executions = new Execution[tasksToTrigger.length];
for (int i = 0; i < tasksToTrigger.length; i++) {
    // The current Execution attempt of the task that must trigger the checkpoint
    Execution ee = tasksToTrigger[i].getCurrentExecutionAttempt();
    // A null Execution means the task is not being executed, so the checkpoint is aborted with an exception
    if (ee == null) {
        LOG.info("Checkpoint triggering task {} of job {} is not being executed at the moment. Aborting checkpoint.",
            tasksToTrigger[i].getTaskNameWithSubtaskIndex(),
            job);
        throw new CheckpointException(CheckpointFailureReason.NOT_ALL_REQUIRED_TASKS_RUNNING);
    } else if (ee.getState() == ExecutionState.RUNNING) {
        // An Execution in state RUNNING is added to the array; any other state raises a CheckpointException
        executions[i] = ee;
    } else {
        LOG.info("Checkpoint triggering task {} of job {} is not in state {} but {} instead. Aborting checkpoint.",
            tasksToTrigger[i].getTaskNameWithSubtaskIndex(),
            job,
            ExecutionState.RUNNING,
            ee.getState());
        throw new CheckpointException(CheckpointFailureReason.NOT_ALL_REQUIRED_TASKS_RUNNING);
    }
}
// Build the map of all tasks that must send an ACK to the CheckpointCoordinator: it covers every ExecutionVertex in the ExecutionGraph
Map<ExecutionAttemptID, ExecutionVertex> ackTasks = new HashMap<>(tasksToWaitFor.length);
// tasksToWaitFor holds all ExecutionVertex instances of the ExecutionGraph,
// so the Task instance of every ExecutionVertex must report an ACK to the CheckpointCoordinator
for (ExecutionVertex ev : tasksToWaitFor) {
    // The current Execution attempt of this vertex's task
    Execution ee = ev.getCurrentExecutionAttempt();
    if (ee != null) {
        // Register the vertex under its Execution attempt ID
        ackTasks.put(ee.getAttemptId(), ev);
    } else {
        LOG.info("Checkpoint acknowledging task {} of job {} is not being executed at the moment. Aborting checkpoint.",
            ev.getTaskNameWithSubtaskIndex(),
            job);
        throw new CheckpointException(CheckpointFailureReason.NOT_ALL_REQUIRED_TASKS_RUNNING);
    }
}
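The asymmetry between the two loops in phase 1 is easy to miss: tasks to trigger must be RUNNING, while tasks to acknowledge only need a current execution attempt. A stripped-down sketch of that logic follows. `Attempt`, `State`, and both method names are hypothetical simplifications, and a plain `IllegalStateException` stands in for the CheckpointException.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified version of the phase 1 pre-checks; not Flink code. */
class PreCheckSketch {
    enum State { CREATED, RUNNING, FINISHED }

    /** Stand-in for an execution attempt: an id plus its current state. */
    static final class Attempt {
        final String attemptId;
        final State state;

        Attempt(String attemptId, State state) {
            this.attemptId = attemptId;
            this.state = state;
        }
    }

    /** All-or-nothing: every task to trigger must exist and be RUNNING, otherwise abort. */
    static Attempt[] collectTriggerTargets(Attempt[] tasksToTrigger) {
        Attempt[] executions = new Attempt[tasksToTrigger.length];
        for (int i = 0; i < tasksToTrigger.length; i++) {
            Attempt attempt = tasksToTrigger[i];
            if (attempt == null || attempt.state != State.RUNNING) {
                throw new IllegalStateException("Not all required tasks are running");
            }
            executions[i] = attempt;
        }
        return executions;
    }

    /** Tasks to acknowledge only need a current attempt; their state is not checked here. */
    static Map<String, Attempt> collectAckTargets(Attempt[] tasksToWaitFor) {
        Map<String, Attempt> ackTasks = new HashMap<>(tasksToWaitFor.length);
        for (Attempt attempt : tasksToWaitFor) {
            if (attempt == null) {
                throw new IllegalStateException("Not all required tasks are running");
            }
            ackTasks.put(attempt.attemptId, attempt);
        }
        return ackTasks;
    }
}
```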
Phase 2: Create the PendingCheckpoint
From the moment the checkpoint is triggered until every node has returned its ACK message, the checkpoint is in the pending state.
/**
 * Phase 2: create the PendingCheckpoint (from the moment the checkpoint starts until the ACKs have arrived, the checkpoint stays in the pending state).
 */
// Where the state data is stored during the checkpoint
final CheckpointStorageLocation checkpointStorageLocation;
// Every checkpoint is identified by a unique checkpointID
final long checkpointID;
try {
    // this must happen outside the coordinator-wide lock, because it communicates
    // with external services (in HA mode) and may block for a while.
    checkpointID = checkpointIdCounter.getAndIncrement();
    // Determine where the state snapshot is stored: a savepoint location or a checkpoint location
    checkpointStorageLocation = props.isSavepoint() ?
        checkpointStorage.initializeLocationForSavepoint(checkpointID, externalSavepointLocation) :
        checkpointStorage.initializeLocationForCheckpoint(checkpointID);
}
catch (Throwable t) {
    int numUnsuccessful = numUnsuccessfulCheckpointsTriggers.incrementAndGet();
    LOG.warn("Failed to trigger checkpoint for job {} ({} consecutive failed attempts so far).",
        job,
        numUnsuccessful,
        t);
    throw new CheckpointException(CheckpointFailureReason.EXCEPTION, t);
}
// Create the PendingCheckpoint; it is stored in the pendingCheckpoints map later on
final PendingCheckpoint checkpoint = new PendingCheckpoint(
    job,
    checkpointID,
    timestamp,
    ackTasks,
    masterHooks.keySet(),
    props,
    checkpointStorageLocation,
    executor);
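The idea behind the pending state can be sketched in a few lines: the checkpoint remembers which tasks have not yet acknowledged and completes its future once that set is empty. The class below is a hypothetical, heavily reduced model (string attempt IDs, no state handles, no master hooks), not the real PendingCheckpoint.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical, reduced model of a pending checkpoint; not Flink code. */
class PendingCheckpointSketch {
    private final long checkpointId;
    private final Set<String> notYetAcknowledged;
    private final CompletableFuture<Long> completionFuture = new CompletableFuture<>();

    PendingCheckpointSketch(long checkpointId, Set<String> tasksToAcknowledge) {
        this.checkpointId = checkpointId;
        this.notYetAcknowledged = new HashSet<>(tasksToAcknowledge);
    }

    /** Returns false for unknown or duplicate ACKs, or when the checkpoint is already finished. */
    synchronized boolean acknowledge(String attemptId) {
        if (completionFuture.isDone() || !notYetAcknowledged.remove(attemptId)) {
            return false;
        }
        if (notYetAcknowledged.isEmpty()) {
            // The last outstanding ACK turns the pending checkpoint into a completed one
            completionFuture.complete(checkpointId);
        }
        return true;
    }

    /** Aborts the checkpoint, for example because it expired. */
    synchronized void abort(String reason) {
        completionFuture.completeExceptionally(new IllegalStateException(reason));
    }

    CompletableFuture<Long> getCompletionFuture() {
        return completionFuture;
    }
}
```

A caller would typically hold on to `getCompletionFuture()` while ACKs arrive from the tasks, which mirrors how the trigger method in the article hands out `checkpoint.getCompletionFuture()` at the end.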
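Phase 3: Trigger the checkpoint
Iterate over the Executions of all Source nodes and trigger each Source node's checkpoint, either as a synchronous savepoint or as a regular checkpoint.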
/**
 * Phase 3: trigger and run the checkpoint.
 * The ScheduledExecutor only runs a Runnable on a thread once its scheduled time arrives; until then the task simply waits in the executor's queue.
 */
// Acquire the CheckpointCoordinator's lock
synchronized (lock) {
    // Re-run the pre-checks: coordinator state and the number of checkpoints already in progress
    preCheckBeforeTriggeringCheckpoint(isPeriodic, props.forceCheckpoint());
    LOG.info("Triggering checkpoint {} @ {} for job {}.", checkpointID, timestamp, job);
    // Store the newly created PendingCheckpoint in the map
    pendingCheckpoints.put(checkpointID, checkpoint);
    // Schedule the Runnable dedicated to cleaning up this checkpoint if it expires (runs after checkpointTimeout)
    ScheduledFuture<?> cancellerHandle = timer.schedule(
        canceller,
        checkpointTimeout, TimeUnit.MILLISECONDS);
    if (!checkpoint.setCancellerHandle(cancellerHandle)) {
        // checkpoint is already disposed!
        cancellerHandle.cancel(false);
    }
    // TODO, asynchronously snapshots master hook without waiting here
    for (MasterTriggerRestoreHook<?> masterHook : masterHooks.values()) {
        final MasterState masterState =
            MasterHooks.triggerHook(masterHook, checkpointID, timestamp, executor)
                .get(checkpointTimeout, TimeUnit.MILLISECONDS);
        checkpoint.acknowledgeMasterState(masterHook.getIdentifier(), masterState);
    }
    Preconditions.checkState(checkpoint.areMasterStatesFullyAcknowledged());
}
// end of lock scope
final CheckpointOptions checkpointOptions = new CheckpointOptions(
    props.getCheckpointType(),
    checkpointStorageLocation.getLocationReference());
// send the messages to the tasks that trigger their checkpoint
/**
 * Core step: iterate over the Execution[] array (the Executions of the Source nodes triggered by the CheckpointCoordinator) and trigger the checkpoint on each
 */
for (Execution execution: executions) {
    if (props.isSynchronous()) {
        // Trigger a synchronous savepoint
        execution.triggerSynchronousSavepoint(checkpointID, timestamp, checkpointOptions, advanceToEndOfTime);
    } else {
        // Trigger a regular (non-synchronous) checkpoint
        execution.triggerCheckpoint(checkpointID, timestamp, checkpointOptions);
    }
}
numUnsuccessfulCheckpointsTriggers.set(0);
// Return the future that completes when the checkpoint completes
return checkpoint.getCompletionFuture();
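The excerpt schedules a `canceller` but does not show its body. The general pattern, sketched here under the assumption that expiry simply fails the completion future, is to schedule a one-shot timeout task and to cancel that task as soon as the checkpoint finishes on its own. `CheckpointTimeoutSketch` and `withTimeout` are hypothetical names used only for this illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of an expiry canceller; not Flink code. */
class CheckpointTimeoutSketch {

    /**
     * Fails the given completion future if it is still pending after timeoutMillis.
     * If the future completes first, the scheduled canceller is cancelled.
     */
    static CompletableFuture<Long> withTimeout(
            ScheduledExecutorService timer,
            CompletableFuture<Long> completion,
            long timeoutMillis) {
        // One-shot task, comparable in role to the canceller in the excerpt
        ScheduledFuture<?> cancellerHandle = timer.schedule(() -> {
            completion.completeExceptionally(new TimeoutException("checkpoint expired"));
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        // Whatever the outcome, the canceller is no longer needed once the future is done
        completion.whenComplete((id, error) -> cancellerHandle.cancel(false));
        return completion;
    }
}
```

Calling `completeExceptionally` on a future that has already completed is a no-op, so a race between a late ACK and the timeout resolves to whichever happens first.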
The actual work is carried out by the Execution of each Source node.
/**
 * Triggers a regular (non-synchronous) checkpoint.
 */
public void triggerCheckpoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {
    // Delegate to the helper, which asks the Task of this Execution to trigger the Source node's checkpoint
    triggerCheckpointHelper(checkpointId, timestamp, checkpointOptions, false);
}
/**
 * Asks the Task that belongs to this Execution to trigger the Source node's checkpoint.
 */
private void triggerCheckpointHelper(long checkpointId, long timestamp, CheckpointOptions checkpointOptions, boolean advanceToEndOfEventTime) {
    final CheckpointType checkpointType = checkpointOptions.getCheckpointType();
    if (advanceToEndOfEventTime && !(checkpointType.isSynchronous() && checkpointType.isSavepoint())) {
        throw new IllegalArgumentException("Only synchronous savepoints are allowed to advance the watermark to MAX.");
    }
    // The LogicalSlot currently assigned to this Execution
    final LogicalSlot slot = assignedResource;
    // A non-null LogicalSlot means the Execution has been assigned a slot;
    // otherwise the Execution has no slot to run in, and its Task instance is not running either
    if (slot != null) {
        // Use the LogicalSlot to obtain the TaskManagerGateway of the TaskManager that hosts it
        final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();
        // Trigger the checkpoint of the given Task through the RPC method of the TaskManagerGateway.
        // When the TaskExecutor receives the request, it performs the checkpoint for that Task locally
        taskManagerGateway.triggerCheckpoint(attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions, advanceToEndOfEventTime);
    } else {
        LOG.debug("The execution has no slot assigned. This indicates that the execution is no longer running.");
    }
}
The Execution first checks whether it has been assigned a LogicalSlot; only an Execution that holds a slot can take part in the checkpoint. If it does hold one, it goes through the TaskManagerGateway to have the TaskExecutor perform the actual checkpoint.
/**
 * Tells the TaskExecutor to trigger the checkpoint.
 */
@Override
public void triggerCheckpoint(ExecutionAttemptID executionAttemptID, JobID jobId, long checkpointId, long timestamp, CheckpointOptions checkpointOptions, boolean advanceToEndOfEventTime) {
    // Forward the request to the TaskExecutor through the TaskExecutorGateway
    taskExecutorGateway.triggerCheckpoint(
        executionAttemptID,
        checkpointId,
        timestamp,
        checkpointOptions,
        advanceToEndOfEventTime);
}
When the TaskExecutor receives the trigger request sent by the Execution, it uses the ExecutionAttemptID (the unique identifier of the execution attempt) to look up the corresponding Task instance in the TaskSlotTable and lets that Task perform the checkpoint. If no such Task is found, it returns a CheckpointException to the CheckpointCoordinator.
/**
 * The Execution forwards the CheckpointCoordinator's trigger request to the TaskExecutor through an RPC call.
 * The TaskExecutor identifies the Task instance from the Execution attempt ID and asks it to trigger the Source node's checkpoint.
 * If the Task cannot be found, a CheckpointException is returned to the CheckpointCoordinator.
 */
@Override
public CompletableFuture<Acknowledge> triggerCheckpoint(
        ExecutionAttemptID executionAttemptID,
        long checkpointId,
        long checkpointTimestamp,
        CheckpointOptions checkpointOptions,
        boolean advanceToEndOfEventTime) {
    log.debug("Trigger checkpoint {}@{} for {}.", checkpointId, checkpointTimestamp, executionAttemptID);
    // CheckpointType has three values: CHECKPOINT (full or incremental), SAVEPOINT, and SYNC_SAVEPOINT (which may advance the watermark to MAX)
    final CheckpointType checkpointType = checkpointOptions.getCheckpointType();
    if (advanceToEndOfEventTime && !(checkpointType.isSynchronous() && checkpointType.isSavepoint())) {
        throw new IllegalArgumentException("Only synchronous savepoints are allowed to advance the watermark to MAX.");
    }
    // Look up the Task for this Execution attempt in the TaskSlotTable (which keeps several indices for fast access to tasks and allocated slots)
    final Task task = taskSlotTable.getTask(executionAttemptID);
    /**
     * Core logic of triggering the checkpoint
     */
    if (task != null) {
        // The Task exists, so it can trigger its checkpoint
        task.triggerCheckpointBarrier(checkpointId, checkpointTimestamp, checkpointOptions, advanceToEndOfEventTime);
        return CompletableFuture.completedFuture(Acknowledge.get());
    } else {
        // The Task is unknown to this TaskExecutor, so the checkpoint cannot be triggered
        final String message = "TaskManager received a checkpoint request for unknown task " + executionAttemptID + '.';
        log.debug(message);
        // Wrap the failure in a CheckpointException and hand it back to the CheckpointCoordinator
        return FutureUtils.completedExceptionally(new CheckpointException(message, CheckpointFailureReason.TASK_CHECKPOINT_FAILURE));
    }
}
Finally, StreamTask provides the concrete implementation of the trigger: it wraps the trigger action in a Mail, which the MailboxExecutor submits to the Mailbox, where it waits until the MailboxProcessor runs it.
/**
 * Triggers the checkpoint in the StreamTask: asynchronously, through the MailboxExecutor, the trigger request is wrapped in a Mail
 * and submitted to the TaskMailbox, where the MailboxProcessor eventually handles it.
 */
@Override
public Future<Boolean> triggerCheckpointAsync(
        CheckpointMetaData checkpointMetaData,
        CheckpointOptions checkpointOptions,
        boolean advanceToEndOfEventTime) {
    // Wrap the trigger logic in a Mail and submit it to the Mailbox through the MailboxExecutor; the MailboxProcessor runs it later
    return mailboxProcessor.getMainMailboxExecutor().submit(
        // The actual trigger logic
        () -> triggerCheckpoint(checkpointMetaData, checkpointOptions, advanceToEndOfEventTime),
        "checkpoint %s with %s",
        checkpointMetaData,
        checkpointOptions);
}
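The mailbox idea can be illustrated with a plain queue: any thread may enqueue an action, but only the task's own thread takes actions out and runs them, so the trigger logic never runs concurrently with record processing. The sketch below is a hypothetical, minimal model of that pattern; `MailboxSketch`, `submit`, and `drain` are names chosen for this example and do not correspond to Flink's mailbox classes.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical, minimal mailbox model; not Flink code. */
class MailboxSketch {
    private final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();

    /** May be called from any thread: wrap the action as a mail and enqueue it. */
    <T> Future<T> submit(Callable<T> action) {
        FutureTask<T> mail = new FutureTask<>(action);
        mailbox.add(mail);
        return mail;
    }

    /** Called only by the task thread: run all queued mails in arrival order. */
    int drain() {
        int processed = 0;
        Runnable mail;
        while ((mail = mailbox.poll()) != null) {
            mail.run();
            processed++;
        }
        return processed;
    }
}
```

In this model the returned `Future` stays incomplete until the owning thread calls `drain()`, which corresponds to the article's point that the submitted Mail waits until the MailboxProcessor runs it.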
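If the checkpoint succeeds, success is returned. If it fails, then it is time to "let Ekko cast his ultimate" (Ekko being the League of Legends champion whose ultimate rewinds time).
The call chain is as follows:
# The JobStatusListener observes that the job's status is RUNNING
↓
# The CheckpointCoordinator takes charge of triggering checkpoints
↓
# In essence: a Runnable is added to the thread pool that supports periodic scheduling; the Runnable defines how a checkpoint is triggered
↓
# The Runnable calls the CheckpointCoordinator's method for triggering a checkpoint
↓
# Each Execution is asked to trigger either a synchronous savepoint or a regular checkpoint
↓
# The request reaches the TaskExecutor that hosts the Slot assigned to this Execution (via TaskManagerGateway and TaskExecutorGateway)
↓
# The Task instance of the Execution performs the actual checkpoint
↓
# StreamTask defines the concrete trigger logic; the MailboxExecutor submits it to the Mailbox for the MailboxProcessor to run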