1. Overview
I. Preface
Once the ExecutionGraph has been generated, Flink can derive concrete Tasks from it and schedule them onto TaskManagers for execution.
II. Related Concepts
The scheduler is the core component of Flink job execution. It manages every process involved in running a job: converting the JobGraph into an ExecutionGraph, managing the job lifecycle (submission, cancellation, stopping), managing the lifecycle of the job's Tasks (deployment, cancellation, stopping), requesting and releasing resources, and handling failover of the job and its Tasks.
Scheduling involves several important components:
- Scheduler: SchedulerNG and its subclasses / implementation classes
- Scheduling strategy: SchedulingStrategy and its implementation classes
- Schedule mode: ScheduleMode, which covers both streaming and batch scheduling, each with its own modes (see the sketch below)
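For reference, a simplified sketch of the ScheduleMode enum (based on the Flink 1.12-era org.apache.flink.runtime.jobgraph.ScheduleMode; the exact members and javadoc may differ in your version):

```java
// Simplified sketch of ScheduleMode (Flink 1.12 era); not a verbatim copy of the Flink source.
public enum ScheduleMode {

    // downstream tasks are scheduled only after their inputs are ready (batch)
    LAZY_FROM_SOURCES,

    // as above, but slots are requested as batch slot requests
    LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST,

    // all tasks are scheduled immediately (streaming)
    EAGER;

    // whether tasks may be deployed lazily rather than all at once
    public boolean allowLazyDeployment() {
        return this != EAGER;
    }
}
```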
2.1. Scheduler
Responsibilities of the scheduler:
1) Job lifecycle management, such as submitting, suspending, and cancelling the job
2) Requesting, allocating, and releasing the resources needed to execute the job
3) Job state management, i.e. state transitions while the job is being deployed and failover when the job fails
4) Providing job information, exposing detailed information about the job to the outside
Implementation class: DefaultScheduler
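How these pieces fit together: DefaultScheduler implements SchedulerNG as well as the SchedulerOperations callback interface that the strategy uses (see the banner in section 3.8), and it delegates the decision of when to deploy which vertices to a SchedulingStrategy. Below is a minimal structural sketch of that relationship; the class and interface names mirror Flink's, but the types and bodies are illustrative stand-ins, not the actual source:

```java
import java.util.List;
import java.util.function.Function;

// Illustrative sketch of how SchedulerNG, SchedulerOperations and SchedulingStrategy relate.
// The names mirror Flink's, but the types and bodies here are toy stand-ins.
public class SchedulerSketch {

    // stand-in for org.apache.flink.runtime.scheduler.ExecutionVertexDeploymentOption
    static class ExecutionVertexDeploymentOption {}

    // scheduler-facing entry point (SchedulerNG in Flink)
    interface SchedulerNG {
        void startScheduling();
    }

    // callbacks a strategy uses to act on its decisions (SchedulerOperations in Flink)
    interface SchedulerOperations {
        void allocateSlotsAndDeploy(List<ExecutionVertexDeploymentOption> options);
    }

    // decides when and which vertices to deploy (SchedulingStrategy in Flink)
    interface SchedulingStrategy {
        void startScheduling();
    }

    // DefaultScheduler implements both interfaces and delegates the "when" to its strategy
    static class DefaultSchedulerLike implements SchedulerNG, SchedulerOperations {
        private final SchedulingStrategy strategy;

        DefaultSchedulerLike(Function<SchedulerOperations, SchedulingStrategy> strategyFactory) {
            // in Flink the strategy is created via a factory that receives the scheduler's
            // SchedulerOperations; we mimic that wiring here
            this.strategy = strategyFactory.apply(this);
        }

        @Override
        public void startScheduling() {
            // the scheduler only triggers the strategy; the strategy calls back into
            // allocateSlotsAndDeploy(...) when it decides something is ready to run
            strategy.startScheduling();
        }

        @Override
        public void allocateSlotsAndDeploy(List<ExecutionVertexDeploymentOption> options) {
            System.out.println("allocating slots and deploying " + options.size() + " vertices");
        }
    }

    public static void main(String[] args) {
        // a toy "eager" strategy that deploys an (empty) set of vertices immediately
        SchedulerNG scheduler = new DefaultSchedulerLike(
                operations -> () -> operations.allocateSlotsAndDeploy(List.of()));
        scheduler.startScheduling();
    }
}
```

Running the sketch simply shows the round trip: startScheduling() on the scheduler triggers the strategy, which immediately calls back into allocateSlotsAndDeploy(...).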
2.2. Scheduling Strategy
SchedulingStrategy is an interface that defines four methods:
Method | Description |
---|---|
void startScheduling(); | Scheduling entry point; triggers the strategy's scheduling behaviour |
void restartTasks(Set&lt;ExecutionVertexID&gt; verticesToRestart); | Restarts Tasks whose execution failed, usually because a Task threw an exception |
void onExecutionStateChange(ExecutionVertexID executionVertexId, ExecutionState executionState); | Called when an Execution changes state |
void onPartitionConsumable(IntermediateResultPartitionID resultPartitionId); | Called when data in an IntermediateResultPartition becomes consumable |
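Assembled as code, the interface looks roughly as follows (a sketch built from the table above; the import locations are the Flink 1.12-era packages and should be verified against your version):

```java
import java.util.Set;

import org.apache.flink.runtime.execution.ExecutionState;
import org.apache.flink.runtime.jobgraph.IntermediateResultPartitionID;
import org.apache.flink.runtime.scheduler.strategy.ExecutionVertexID;

// Sketch of the SchedulingStrategy interface, assembled from the four methods listed above.
public interface SchedulingStrategy {

    // scheduling entry point: triggers the strategy's scheduling behaviour
    void startScheduling();

    // restart Tasks whose execution failed
    void restartTasks(Set<ExecutionVertexID> verticesToRestart);

    // called whenever an Execution changes state
    void onExecutionStateChange(ExecutionVertexID executionVertexId, ExecutionState executionState);

    // called when data in an IntermediateResultPartition becomes consumable
    void onPartitionConsumable(IntermediateResultPartitionID resultPartitionId);
}
```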
SchedulingStrategy has three implementations:
- EagerSchedulingStrategy: used for streaming; schedules all tasks at the same time.
- LazyFromSourcesSchedulingStrategy: used for batch; schedules vertices once their input data is ready (i.e. the upstream has finished producing it).
- PipelinedRegionSchedulingStrategy: schedules at the granularity of a pipelined region.
PipelinedRegionSchedulingStrategy was added in 1.11, and from 1.12 onwards scheduling is performed per pipelined region. A pipelined region is a set of tasks connected through pipelined edges. For a streaming job consisting of multiple regions, Flink no longer waits for all tasks to acquire slots before starting to deploy any of them; instead, a region can be deployed as soon as it has acquired enough slots. For a batch job, tasks are not assigned slots and deployed individually; instead, once a region has acquired enough slots, all tasks in that region are deployed together. For example, a batch job with a blocking shuffle between its map and reduce stages forms separate regions, and the reduce region is only scheduled after the map region has finished producing its results.
III. Code Walkthrough
3.1. Call Chain
-> org.apache.flink.runtime.jobmaster.JobMaster#startJobExecution
-> org.apache.flink.runtime.jobmaster.JobMaster#resetAndStartScheduler
-> org.apache.flink.runtime.jobmaster.JobMaster#startScheduling
-> org.apache.flink.runtime.scheduler.DefaultScheduler#startSchedulingInternal
-> org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy#startScheduling
-> org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy#maybeScheduleRegions
-> org.apache.flink.runtime.scheduler.DefaultScheduler#allocateSlotsAndDeploy
-> org.apache.flink.runtime.scheduler.DefaultScheduler#waitForAllSlotsAndDeploy
-> org.apache.flink.runtime.scheduler.DefaultScheduler#deployAll
-> org.apache.flink.runtime.scheduler.DefaultScheduler#deployOrHandleError
-> org.apache.flink.runtime.scheduler.DefaultScheduler#deployTaskSafe
-> org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations#deploy
-> org.apache.flink.runtime.executiongraph.ExecutionVertex#deploy
-> org.apache.flink.runtime.executiongraph.Execution#deploy
-> org.apache.flink.runtime.jobmaster.RpcTaskManagerGateway#submitTask
-> org.apache.flink.runtime.taskexecutor.TaskExecutor#submitTask
3.2. JobMaster#startJobExecution
// ----------------------------------------------------------------------------------------------
// Internal methods
// ----------------------------------------------------------------------------------------------
// -- job starting and stopping
// -----------------------------------------------------------------
private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {
validateRunsInMainThread();
checkNotNull(newJobMasterId, "The new JobMasterId must not be null.");
if (Objects.equals(getFencingToken(), newJobMasterId)) {
log.info("Already started the job execution with JobMasterId {}.", newJobMasterId);
return Acknowledge.get();
}
setNewFencingToken(newJobMasterId);
// actually start the JobMaster services
startJobMasterServices();
// Starting execution of job
// Socket Window WordCount (694474d11da6100e82744c9e47e2f511)
// under job master id
// 00000000000000000000000000000000.
log.info(
"Starting execution of job {} ({}) under job master id {}.",
jobGraph.getName(),
jobGraph.getJobID(),
newJobMasterId);
// reset & start the scheduler
resetAndStartScheduler();
return Acknowledge.get();
}
3.3. JobMaster#resetAndStartScheduler
// reset (if necessary) and start the scheduler
private void resetAndStartScheduler() throws Exception {
validateRunsInMainThread();
final CompletableFuture<Void> schedulerAssignedFuture;
if (schedulerNG.requestJobStatus() == JobStatus.CREATED) {
schedulerAssignedFuture = CompletableFuture.completedFuture(null);
schedulerNG.setMainThreadExecutor(getMainThreadExecutor());
} else {
suspendAndClearSchedulerFields(
new FlinkException(
"ExecutionGraph is being reset in order to be rescheduled."));
final JobManagerJobMetricGroup newJobManagerJobMetricGroup =
jobMetricGroupFactory.create(jobGraph);
final SchedulerNG newScheduler =
createScheduler(executionDeploymentTracker, newJobManagerJobMetricGroup);
schedulerAssignedFuture =
schedulerNG
.getTerminationFuture()
.handle(
(ignored, throwable) -> {
newScheduler.setMainThreadExecutor(getMainThreadExecutor());
assignScheduler(newScheduler, newJobManagerJobMetricGroup);
return null;
});
}
// kick off scheduling via startScheduling
FutureUtils.assertNoException(schedulerAssignedFuture.thenRun(this::startScheduling));
}
3.4. JobMaster#startScheduling
private void startScheduling() {
checkState(jobStatusListener == null);
// register self as job status change listener
jobStatusListener = new JobManagerJobStatusListener();
schedulerNG.registerJobStatusListener(jobStatusListener);
// start scheduling
schedulerNG.startScheduling();
}
3.5. DefaultScheduler#startSchedulingInternal
@Override
protected void startSchedulingInternal() {
log.info(
"Starting scheduling with scheduling strategy [{}]",
schedulingStrategy.getClass().getName());
prepareExecutionGraphForNgScheduling();
// default scheduling strategy: PipelinedRegionSchedulingStrategy
// -> PipelinedRegionSchedulingStrategy#startScheduling
schedulingStrategy.startScheduling();
}
3.6. PipelinedRegionSchedulingStrategy#startScheduling
@Override
public void startScheduling() {
final Set<SchedulingPipelinedRegion> sourceRegions =
IterableUtils.toStream(schedulingTopology.getAllPipelinedRegions())
.filter(region -> !region.getConsumedResults().iterator().hasNext())
.collect(Collectors.toSet());
// schedule the source regions
maybeScheduleRegions(sourceRegions);
}
3.7. PipelinedRegionSchedulingStrategy#maybeScheduleRegions
private void maybeScheduleRegions(final Set<SchedulingPipelinedRegion> regions) {
final List<SchedulingPipelinedRegion> regionsSorted =
SchedulingStrategyUtils.sortPipelinedRegionsInTopologicalOrder(
schedulingTopology, regions);
for (SchedulingPipelinedRegion region : regionsSorted) {
// try to schedule this region
maybeScheduleRegion(region);
}
}
private void maybeScheduleRegion(final SchedulingPipelinedRegion region) {
if (!areRegionInputsAllConsumable(region)) {
return;
}
checkState(
areRegionVerticesAllInCreatedState(region),
"BUG: trying to schedule a region which is not in CREATED state");
final List<ExecutionVertexDeploymentOption> vertexDeploymentOptions =
SchedulingStrategyUtils.createExecutionVertexDeploymentOptions(
regionVerticesSorted.get(region), id -> deploymentOption);
// hand over to the scheduler:
// DefaultScheduler#allocateSlotsAndDeploy
schedulerOperations.allocateSlotsAndDeploy(vertexDeploymentOptions);
}
3.8. DefaultScheduler#allocateSlotsAndDeploy
// ------------------------------------------------------------------------
// SchedulerOperations
// ------------------------------------------------------------------------
@Override
public void allocateSlotsAndDeploy(
final List<ExecutionVertexDeploymentOption> executionVertexDeploymentOptions) {
validateDeploymentOptions(executionVertexDeploymentOptions);
// {ExecutionVertexID@7434} "bc764cd8ddf7a0cff126f51c16239658_0" -> {ExecutionVertexDeploymentOption@7484}
// {ExecutionVertexID@7453} "ea632d67b7d595e5b851708ae9ad79d6_1" -> {ExecutionVertexDeploymentOption@7485}
// {ExecutionVertexID@7445} "0a448493b4782967b150582570326227_2" -> {ExecutionVertexDeploymentOption@7486}
// {ExecutionVertexID@7451} "ea632d67b7d595e5b851708ae9ad79d6_0" -> {ExecutionVertexDeploymentOption@7487}
// {ExecutionVertexID@7448} "0a448493b4782967b150582570326227_3" -> {ExecutionVertexDeploymentOption@7488}
// {ExecutionVertexID@7460} "6d2677a0ecc3fd8df0b72ec675edf8f4_0" -> {ExecutionVertexDeploymentOption@7489}
// {ExecutionVertexID@7457} "ea632d67b7d595e5b851708ae9ad79d6_3" -> {ExecutionVertexDeploymentOption@7490}
// {ExecutionVertexID@7441} "0a448493b4782967b150582570326227_0" -> {ExecutionVertexDeploymentOption@7491}
// {ExecutionVertexID@7455} "ea632d67b7d595e5b851708ae9ad79d6_2" -> {ExecutionVertexDeploymentOption@7492}
// {ExecutionVertexID@7443} "0a448493b4782967b150582570326227_1" -> {ExecutionVertexDeploymentOption@7493}
final Map<ExecutionVertexID, ExecutionVertexDeploymentOption> deploymentOptionsByVertex =
groupDeploymentOptionsByVertexId(executionVertexDeploymentOptions);
// 0 = {ExecutionVertexID@7434} "bc764cd8ddf7a0cff126f51c16239658_0"
// 1 = {ExecutionVertexID@7441} "0a448493b4782967b150582570326227_0"
// 2 = {ExecutionVertexID@7443} "0a448493b4782967b150582570326227_1"
// 3 = {ExecutionVertexID@7445} "0a448493b4782967b150582570326227_2"
// 4 = {ExecutionVertexID@7448} "0a448493b4782967b150582570326227_3"
// 5 = {ExecutionVertexID@7451} "ea632d67b7d595e5b851708ae9ad79d6_0"
// 6 = {ExecutionVertexID@7453} "ea632d67b7d595e5b851708ae9ad79d6_1"
// 7 = {ExecutionVertexID@7455} "ea632d67b7d595e5b851708ae9ad79d6_2"
// 8 = {ExecutionVertexID@7457} "ea632d67b7d595e5b851708ae9ad79d6_3"
// 9 = {ExecutionVertexID@7460} "6d2677a0ecc3fd8df0b72ec675edf8f4_0"
final List<ExecutionVertexID> verticesToDeploy =
executionVertexDeploymentOptions.stream()
.map(ExecutionVertexDeploymentOption::getExecutionVertexId)
.collect(Collectors.toList());
// {ExecutionVertexID@7434} "bc764cd8ddf7a0cff126f51c16239658_0" -> {ExecutionVertexVersion@7566}
// key = {ExecutionVertexID@7434} "bc764cd8ddf7a0cff126f51c16239658_0"
// value = {ExecutionVertexVersion@7566}
// executionVertexId = {ExecutionVertexID@7434} "bc764cd8ddf7a0cff126f51c16239658_0"
// version = 1
// {ExecutionVertexID@7453} "ea632d67b7d595e5b851708ae9ad79d6_1" -> {ExecutionVertexVersion@7567}
// key = {ExecutionVertexID@7453} "ea632d67b7d595e5b851708ae9ad79d6_1"
// value = {ExecutionVertexVersion@7567}
// executionVertexId = {ExecutionVertexID@7453} "ea632d67b7d595e5b851708ae9ad79d6_1"
// version = 1
// {ExecutionVertexID@7445} "0a448493b4782967b150582570326227_2" -> {ExecutionVertexVersion@7568}
// {ExecutionVertexID@7451} "ea632d67b7d595e5b851708ae9ad79d6_0" -> {ExecutionVertexVersion@7569}
// {ExecutionVertexID@7448} "0a448493b4782967b150582570326227_3" -> {ExecutionVertexVersion@7570}
// {ExecutionVertexID@7460} "6d2677a0ecc3fd8df0b72ec675edf8f4_0" -> {ExecutionVertexVersion@7571}
// {ExecutionVertexID@7457} "ea632d67b7d595e5b851708ae9ad79d6_3" -> {ExecutionVertexVersion@7572}
// {ExecutionVertexID@7441} "0a448493b4782967b150582570326227_0" -> {ExecutionVertexVersion@7573}
// {ExecutionVertexID@7455} "ea632d67b7d595e5b851708ae9ad79d6_2" -> {ExecutionVertexVersion@7574}
// {ExecutionVertexID@7443} "0a448493b4782967b150582570326227_1" -> {ExecutionVertexVersion@7575}
final Map<ExecutionVertexID, ExecutionVertexVersion> requiredVersionByVertex =
executionVertexVersioner.recordVertexModifications(verticesToDeploy);
transitionToScheduled(verticesToDeploy);
// allocate slots for the vertices to deploy
// slotExecutionVertexAssignments = {ArrayList@8148} size = 10
// 0 = {SlotExecutionVertexAssignment@8064}
// 1 = {SlotExecutionVertexAssignment@8070}
// 2 = {SlotExecutionVertexAssignment@8072}
// 3 = {SlotExecutionVertexAssignment@8066}
// 4 = {SlotExecutionVertexAssignment@8068}
// 5 = {SlotExecutionVertexAssignment@8067}
// 6 = {SlotExecutionVertexAssignment@8065}
// 7 = {SlotExecutionVertexAssignment@8073}
// 8 = {SlotExecutionVertexAssignment@8071}
// 9 = {SlotExecutionVertexAssignment@8069}
final List<SlotExecutionVertexAssignment> slotExecutionVertexAssignments =
allocateSlots(executionVertexDeploymentOptions);
// A DeploymentHandle bundles the required vertex version, the deployment option, and the slot assignment
// deploymentHandles = {ArrayList@8194} size = 10
// 0 = {DeploymentHandle@8212}
// requiredVertexVersion = {ExecutionVertexVersion@7566}
// executionVertexDeploymentOption = {ExecutionVertexDeploymentOption@7484}
// slotExecutionVertexAssignment = {SlotExecutionVertexAssignment@8064}
// 1 = {DeploymentHandle@8213}
// requiredVertexVersion = {ExecutionVertexVersion@7573}
// executionVertexDeploymentOption = {ExecutionVertexDeploymentOption@7491}
// slotExecutionVertexAssignment = {SlotExecutionVertexAssignment@8070}
// 2 = {DeploymentHandle@8214}
// 3 = {DeploymentHandle@8215}
// 4 = {DeploymentHandle@8216}
// 5 = {DeploymentHandle@8217}
// 6 = {DeploymentHandle@8218}
// 7 = {DeploymentHandle@8219}
// 8 = {DeploymentHandle@8220}
// 9 = {DeploymentHandle@8221}
final List<DeploymentHandle> deploymentHandles =
createDeploymentHandles(
requiredVersionByVertex,
deploymentOptionsByVertex,
slotExecutionVertexAssignments);
// start deploying
waitForAllSlotsAndDeploy(deploymentHandles);
}
3.9. DefaultScheduler#waitForAllSlotsAndDeploy
private void waitForAllSlotsAndDeploy(final List<DeploymentHandle> deploymentHandles) {
// assign all resources, then deploy
FutureUtils.assertNoException(
assignAllResources(deploymentHandles).handle(deployAll(deploymentHandles)));
}
3.10. DefaultScheduler#deployAll
private BiFunction<Void, Throwable, Void> deployAll(
final List<DeploymentHandle> deploymentHandles) {
return (ignored, throwable) -> {
propagateIfNonNull(throwable);
for (final DeploymentHandle deploymentHandle : deploymentHandles) {
final SlotExecutionVertexAssignment slotExecutionVertexAssignment =
deploymentHandle.getSlotExecutionVertexAssignment();
final CompletableFuture<LogicalSlot> slotAssigned =
slotExecutionVertexAssignment.getLogicalSlotFuture();
checkState(slotAssigned.isDone());
// once the slot is assigned, deploy the task: deployOrHandleError
FutureUtils.assertNoException(
slotAssigned.handle(deployOrHandleError(deploymentHandle)));
}
return null;
};
}
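Note how waitForAllSlotsAndDeploy composes assignAllResources(...).handle(deployAll(...)): the BiFunction returned by deployAll runs whether resource assignment succeeded or failed, receives the failure (if any) as its second argument, and propagateIfNonNull rethrows it before any task is deployed. A minimal, self-contained JDK illustration of that handle(...) pattern (plain CompletableFuture, not Flink code; deployAllLike is a hypothetical stand-in):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;

// Minimal illustration (plain JDK, not Flink) of the handle(...) pattern used in
// waitForAllSlotsAndDeploy: the BiFunction runs whether the upstream future completed
// normally or exceptionally, and decides itself how to react to the throwable.
public class HandlePatternDemo {

    static BiFunction<Void, Throwable, Void> deployAllLike() {
        return (ignored, throwable) -> {
            if (throwable != null) {
                // mirrors propagateIfNonNull: fail fast if resource assignment failed
                throw new IllegalStateException("resource assignment failed", throwable);
            }
            System.out.println("all resources assigned, deploying tasks...");
            return null;
        };
    }

    public static void main(String[] args) {
        // success case: the BiFunction sees throwable == null and "deploys"
        CompletableFuture<Void> ok = CompletableFuture.completedFuture(null);
        ok.handle(deployAllLike()).join();

        // failure case: the BiFunction sees the assignment failure and rethrows it
        CompletableFuture<Void> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("no slots available"));
        CompletableFuture<Void> result = failed.handle(deployAllLike());
        System.out.println("downstream completed exceptionally: "
                + result.isCompletedExceptionally());
    }
}
```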
3.11. DefaultScheduler#deployOrHandleError
private BiFunction<Object, Throwable, Void> deployOrHandleError(
final DeploymentHandle deploymentHandle) {
final ExecutionVertexVersion requiredVertexVersion =
deploymentHandle.getRequiredVertexVersion();
final ExecutionVertexID executionVertexId = requiredVertexVersion.getExecutionVertexId();
return (ignored, throwable) -> {
if (executionVertexVersioner.isModified(requiredVertexVersion)) {
log.debug(
"Refusing to deploy execution vertex {} because this deployment was "
+ "superseded by another deployment",
executionVertexId);
return null;
}
if (throwable == null) {
// deploy the task
deployTaskSafe(executionVertexId);
} else {
handleTaskDeploymentFailure(executionVertexId, throwable);
}
return null;
};
}
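The isModified check above is an optimistic-versioning guard: each vertex's version was recorded in allocateSlotsAndDeploy (executionVertexVersioner.recordVertexModifications, section 3.8), and a pending deployment is dropped if the vertex has been re-scheduled since then. A toy, self-contained sketch of the idea (the VertexVersioner/Version names here are hypothetical stand-ins, not Flink's ExecutionVertexVersioner):

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the optimistic-versioning guard behind deployOrHandleError.
// VertexVersioner and Version are hypothetical stand-ins for Flink's
// ExecutionVertexVersioner / ExecutionVertexVersion, not the real classes.
public class VertexVersioner {

    static final class Version {
        final String vertexId;
        final long version;

        Version(String vertexId, long version) {
            this.vertexId = vertexId;
            this.version = version;
        }
    }

    private final Map<String, Long> currentVersions = new HashMap<>();

    // called when a vertex is (re-)scheduled: bump and record its version
    Version recordModification(String vertexId) {
        long v = currentVersions.merge(vertexId, 1L, Long::sum);
        return new Version(vertexId, v);
    }

    // a pending deployment is stale if the vertex was re-scheduled after the version was taken
    boolean isModified(Version required) {
        return currentVersions.get(required.vertexId) != required.version;
    }

    public static void main(String[] args) {
        VertexVersioner versioner = new VertexVersioner();

        Version v1 = versioner.recordModification("vertex-0");      // scheduled: version 1
        System.out.println("stale? " + versioner.isModified(v1));   // false -> safe to deploy

        versioner.recordModification("vertex-0");                   // re-scheduled, e.g. after failover
        System.out.println("stale? " + versioner.isModified(v1));   // true  -> refuse to deploy
    }
}
```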
3.12. DefaultScheduler#deployTaskSafe
private void deployTaskSafe(final ExecutionVertexID executionVertexId) {
try {
// look up the ExecutionVertex
final ExecutionVertex executionVertex = getExecutionVertex(executionVertexId);
// deploy the ExecutionVertex
// -> DefaultExecutionVertexOperations#deploy
executionVertexOperations.deploy(executionVertex);
} catch (Throwable e) {
handleTaskDeploymentFailure(executionVertexId, e);
}
}
3.13. DefaultExecutionVertexOperations#deploy
@Override
public void deploy(final ExecutionVertex executionVertex) throws JobException {
executionVertex.deploy();
}
3.14. ExecutionVertex#deploy
public void deploy() throws JobException {
// deploy the current execution attempt
currentExecution.deploy();
}
3.15. Execution#deploy
/**
* Deploys the execution to the previously assigned resource.
*
* @throws JobException if the execution cannot be deployed to the assigned resource
*/
public void deploy() throws JobException {
assertRunningInJobMasterMainThread();
// get the assigned slot
// slotRequestId = {SlotRequestId@8931} "SlotRequestId{7d3611a3599a124ed703d75c55561420}"
// slotContext = {AllocatedSlot@8932} "AllocatedSlot e5eeb5d0e767c407ea81ab345a14ebd8 @ container_1619273419318_0017_01_000002 @ henghe-030 (dataPort=39722) - 0"
// slotSharingGroupId = null
// locality = {Locality@8933} "UNKNOWN"
// slotOwner = {SharedSlot@8934}
// releaseFuture = {CompletableFuture@8935} "java.util.concurrent.CompletableFuture@7ea60a0f[Not completed]"
// state = {SingleLogicalSlot$State@8936} "ALIVE"
// payload = {Execution@8899} "Attempt #0 (Source: Socket Stream (1/1)) @ org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@7f697d27 - [SCHEDULED]"
// willBeOccupiedIndefinitely = true
final LogicalSlot slot = assignedResource;
checkNotNull(
slot,
"In order to deploy the execution we first have to assign a resource via tryAssignResource.");
// Check if the TaskManager died in the meantime
// This only speeds up the response to TaskManagers failing concurrently to deployments.
// The more general check is the rpcTimeout of the deployment call
if (!slot.isAlive()) {
throw new JobException("Target slot (TaskManager) for deployment is no longer alive.");
}
// make sure exactly one deployment call happens from the correct state
// note: the transition from CREATED to DEPLOYING is for testing purposes only
ExecutionState previous = this.state;
if (previous == SCHEDULED || previous == CREATED) {
if (!transitionState(previous, DEPLOYING)) {
// race condition, someone else beat us to the deploying call.
// this should actually not happen and indicates a race somewhere else
throw new IllegalStateException(
"Cannot deploy task: Concurrent deployment call race.");
}
} else {
// vertex may have been cancelled, or it was already scheduled
throw new IllegalStateException(
"The vertex must be in CREATED or SCHEDULED state to be deployed. Found state "
+ previous);
}
if (this != slot.getPayload()) {
throw new IllegalStateException(
String.format(
"The execution %s has not been assigned to the assigned slot.", this));
}
try {
// race double check, did we fail/cancel and do we need to release the slot?
if (this.state != DEPLOYING) {
slot.releaseSlot(
new FlinkException(
"Actual state of execution "
+ this
+ " ("
+ state
+ ") does not match expected state DEPLOYING."));
return;
}
LOG.info(
"Deploying {} (attempt #{}) with attempt id {} to {} with allocation id {}",
vertex.getTaskNameWithSubtaskIndex(),
attemptNumber,
vertex.getCurrentExecutionAttempt().getAttemptId(),
getAssignedResourceLocation(),
slot.getAllocationId());
if (taskRestore != null) {
checkState(
taskRestore.getTaskStateSnapshot().getSubtaskStateMappings().stream()
.allMatch(
entry ->
entry.getValue()
.getInputRescalingDescriptor()
.equals(
InflightDataRescalingDescriptor
.NO_RESCALE)
&& entry.getValue()
.getOutputRescalingDescriptor()
.equals(
InflightDataRescalingDescriptor
.NO_RESCALE)),
"Rescaling from unaligned checkpoint is not yet supported.");
}
// Build the deployment descriptor:
// - IntermediateResultPartitions are turned into ResultPartitions
// - ExecutionEdges are turned into InputChannelDeploymentDescriptors (which become InputGates at execution time)
final TaskDeploymentDescriptor deployment =
TaskDeploymentDescriptorFactory.fromExecutionVertex(vertex, attemptNumber) // task
.createDeploymentDescriptor(
slot.getAllocationId(),
slot.getPhysicalSlotNumber(),
taskRestore,
producedPartitions.values());
// null taskRestore to let it be GC'ed
taskRestore = null;
final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();
final ComponentMainThreadExecutor jobMasterMainThreadExecutor =
vertex.getExecutionGraph().getJobMasterMainThreadExecutor();
getVertex().notifyPendingDeployment(this);
// We run the submission in the future executor so that the serialization of large TDDs
// does not block
// the main thread and sync back to the main thread once submission is completed.
CompletableFuture.supplyAsync(
// submit the task to the TaskManager
() -> taskManagerGateway.submitTask(deployment, rpcTimeout), executor)
.thenCompose(Function.identity())
.whenCompleteAsync(
(ack, failure) -> {
if (failure == null) {
vertex.notifyCompletedDeployment(this);
} else {
if (failure instanceof TimeoutException) {
String taskname =
vertex.getTaskNameWithSubtaskIndex()
+ " ("
+ attemptId
+ ')';
markFailed(
new Exception(
"Cannot deploy task "
+ taskname
+ " - TaskManager ("
+ getAssignedResourceLocation()
+ ") not responding after a rpcTimeout of "
+ rpcTimeout,
failure));
} else {
markFailed(failure);
}
}
},
jobMasterMainThreadExecutor);
} catch (Throwable t) {
markFailed(t);
if (isLegacyScheduling()) {
ExceptionUtils.rethrow(t);
}
}
}
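One detail worth calling out: submitTask already returns a CompletableFuture, so wrapping it in CompletableFuture.supplyAsync(...) yields a nested CompletableFuture&lt;CompletableFuture&lt;Acknowledge&gt;&gt;; thenCompose(Function.identity()) flattens that back into a single future while keeping the expensive TDD serialization off the main thread. A minimal JDK illustration of the flattening pattern (plain CompletableFuture; submitTask here is a hypothetical stand-in, not the Flink gateway):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Plain-JDK illustration of the supplyAsync + thenCompose(Function.identity()) pattern
// used in Execution#deploy: the expensive part (here, "serialization") runs on a
// separate executor, while the async RPC it kicks off is flattened into one future.
public class FlattenFutureDemo {

    // stand-in for taskManagerGateway.submitTask(...): already asynchronous
    static CompletableFuture<String> submitTask(String payload) {
        return CompletableFuture.completedFuture("ACK for " + payload);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // supplyAsync returns CompletableFuture<CompletableFuture<String>>;
        // thenCompose(Function.identity()) flattens it to CompletableFuture<String>
        CompletableFuture<String> ack =
                CompletableFuture.supplyAsync(() -> submitTask("task-deployment"), executor)
                        .thenCompose(Function.identity());

        System.out.println(ack.get()); // ACK for task-deployment
        executor.shutdown();
    }
}
```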
3.16. RpcTaskManagerGateway#submitTask
@Override
public CompletableFuture<Acknowledge> submitTask(TaskDeploymentDescriptor tdd, Time timeout) {
return taskExecutorGateway.submitTask(tdd, jobMasterId, timeout);
}
3.17. TaskExecutor#submitTask
// ----------------------------------------------------------------------
// Task lifecycle RPCs
// submit a task for execution
// ----------------------------------------------------------------------
@Override
public CompletableFuture<Acknowledge> submitTask(
TaskDeploymentDescriptor tdd, JobMasterId jobMasterId, Time timeout) {
try {
final JobID jobId = tdd.getJobId();
final ExecutionAttemptID executionAttemptID = tdd.getExecutionAttemptId();
final JobTable.Connection jobManagerConnection =
jobTable.getConnection(jobId)
.orElseThrow(
() -> {
final String message =
"Could not submit task because there is no JobManager "
+ "associated for the job "
+ jobId
+ '.';
log.debug(message);
return new TaskSubmissionException(message);
});
if (!Objects.equals(jobManagerConnection.getJobMasterId(), jobMasterId)) {
final String message =
"Rejecting the task submission because the job manager leader id "
+ jobMasterId
+ " does not match the expected job manager leader id "
+ jobManagerConnection.getJobMasterId()
+ '.';
log.debug(message);
throw new TaskSubmissionException(message);
}
if (!taskSlotTable.tryMarkSlotActive(jobId, tdd.getAllocationId())) {
final String message =
"No task slot allocated for job ID "
+ jobId
+ " and allocation ID "
+ tdd.getAllocationId()
+ '.';
log.debug(message);
throw new TaskSubmissionException(message);
}
// re-integrate offloaded data:
try {
tdd.loadBigData(blobCacheService.getPermanentBlobService());
} catch (IOException | ClassNotFoundException e) {
throw new TaskSubmissionException(
"Could not re-integrate offloaded TaskDeploymentDescriptor data.", e);
}
// deserialize the pre-serialized information
final JobInformation jobInformation;
final TaskInformation taskInformation;
try {
jobInformation =
tdd.getSerializedJobInformation()
.deserializeValue(getClass().getClassLoader());
taskInformation =
tdd.getSerializedTaskInformation()
.deserializeValue(getClass().getClassLoader());
} catch (IOException | ClassNotFoundException e) {
throw new TaskSubmissionException(
"Could not deserialize the job or task information.", e);
}
if (!jobId.equals(jobInformation.getJobId())) {
throw new TaskSubmissionException(
"Inconsistent job ID information inside TaskDeploymentDescriptor ("
+ tdd.getJobId()
+ " vs. "
+ jobInformation.getJobId()
+ ")");
}
TaskMetricGroup taskMetricGroup =
taskManagerMetricGroup.addTaskForJob(
jobInformation.getJobId(),
jobInformation.getJobName(),
taskInformation.getJobVertexId(),
tdd.getExecutionAttemptId(),
taskInformation.getTaskName(),
tdd.getSubtaskIndex(),
tdd.getAttemptNumber());
InputSplitProvider inputSplitProvider =
new RpcInputSplitProvider(
jobManagerConnection.getJobManagerGateway(),
taskInformation.getJobVertexId(),
tdd.getExecutionAttemptId(),
taskManagerConfiguration.getTimeout());
final TaskOperatorEventGateway taskOperatorEventGateway =
new RpcTaskOperatorEventGateway(
jobManagerConnection.getJobManagerGateway(),
executionAttemptID,
(t) -> runAsync(() -> failTask(executionAttemptID, t)));
TaskManagerActions taskManagerActions = jobManagerConnection.getTaskManagerActions();
CheckpointResponder checkpointResponder = jobManagerConnection.getCheckpointResponder();
GlobalAggregateManager aggregateManager =
jobManagerConnection.getGlobalAggregateManager();
LibraryCacheManager.ClassLoaderHandle classLoaderHandle =
jobManagerConnection.getClassLoaderHandle();
ResultPartitionConsumableNotifier resultPartitionConsumableNotifier =
jobManagerConnection.getResultPartitionConsumableNotifier();
PartitionProducerStateChecker partitionStateChecker =
jobManagerConnection.getPartitionStateChecker();
final TaskLocalStateStore localStateStore =
localStateStoresManager.localStateStoreForSubtask(
jobId,
tdd.getAllocationId(),
taskInformation.getJobVertexId(),
tdd.getSubtaskIndex());
final JobManagerTaskRestore taskRestore = tdd.getTaskRestore();
// build the TaskStateManager
final TaskStateManager taskStateManager =
new TaskStateManagerImpl(
jobId,
tdd.getExecutionAttemptId(),
localStateStore,
taskRestore,
checkpointResponder);
MemoryManager memoryManager;
try {
memoryManager = taskSlotTable.getTaskMemoryManager(tdd.getAllocationId());
} catch (SlotNotFoundException e) {
throw new TaskSubmissionException("Could not submit task.", e);
}
// construct a new Task
Task task =
new Task(
jobInformation,
taskInformation,
tdd.getExecutionAttemptId(),
tdd.getAllocationId(),
tdd.getSubtaskIndex(),
tdd.getAttemptNumber(),
tdd.getProducedPartitions(),
tdd.getInputGates(),
tdd.getTargetSlotNumber(),
memoryManager,
taskExecutorServices.getIOManager(),
taskExecutorServices.getShuffleEnvironment(),
taskExecutorServices.getKvStateService(),
taskExecutorServices.getBroadcastVariableManager(),
taskExecutorServices.getTaskEventDispatcher(),
externalResourceInfoProvider,
taskStateManager,
taskManagerActions,
inputSplitProvider,
checkpointResponder,
taskOperatorEventGateway,
aggregateManager,
classLoaderHandle,
fileCache,
taskManagerConfiguration,
taskMetricGroup,
resultPartitionConsumableNotifier,
partitionStateChecker,
getRpcService().getExecutor());
taskMetricGroup.gauge(MetricNames.IS_BACKPRESSURED, task::isBackPressured);
// Received task
// Window(
// TumblingProcessingTimeWindows(5000),
// ProcessingTimeTrigger,
// ReduceFunction$1, PassThroughWindowFunction
// ) ->
// Sink: Print to Std. Out (1/1)#0 (141dd597dc560a831b2b4bc195943f0b),
//
// deploy into slot with allocation id
// 3755cb8f9962a9a7738db04f2a02084c.
log.info(
"Received task {} ({}), deploy into slot with allocation id {}.",
task.getTaskInfo().getTaskNameWithSubtasks(),
tdd.getExecutionAttemptId(),
tdd.getAllocationId());
boolean taskAdded;
try {
taskAdded = taskSlotTable.addTask(task);
} catch (SlotNotFoundException | SlotNotActiveException e) {
throw new TaskSubmissionException("Could not submit task.", e);
}
if (taskAdded) {
// start the task's executing thread
task.startTaskThread();
setupResultPartitionBookkeeping(
tdd.getJobId(), tdd.getProducedPartitions(), task.getTerminationFuture());
return CompletableFuture.completedFuture(Acknowledge.get());
} else {
final String message =
"TaskManager already contains a task for id " + task.getExecutionId() + '.';
log.debug(message);
throw new TaskSubmissionException(message);
}
} catch (TaskSubmissionException e) {
return FutureUtils.completedExceptionally(e);
}
}