Flink JobMaster Scheduling Source Code Analysis 1: The Scheduling Process

北_鱼

Last modified: 2024-06-08 12:10:33


First published: 2024-03-20 13:03:50

Article link: https://blog.csdn.net/White_Ink_/article/details/136873091


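Articles in this series:
Flink JobMaster Scheduling Source Code Analysis 1: The Scheduling Process
Flink JobMaster Scheduling Source Code Analysis 2: Slot Allocation Strategies
Flink JobMaster Scheduling Source Code Analysis 3: The Physical Slot Allocation Process

The source code analyzed in this article is Flink 1.18.0 (the Scala 2.12 build).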

Scheduling

The scheduling code lives in the org.apache.flink.runtime.scheduler package of the flink-runtime module.
DefaultScheduler extends SchedulerBase and exposes startScheduling() as the entry point.

startScheduling()
-> startSchedulingInternal()
	-> // different scheduling strategies plug in here
	schedulingStrategy.startScheduling()
		->  // taking PipelinedRegionSchedulingStrategy as the example (it schedules tasks at pipelined-region granularity)
		maybeScheduleRegions(sourceRegions) // scheduling starts from the source regions
			-> this::scheduleRegion
				-> schedulerOperations.allocateSlotsAndDeploy();

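allocateSlotsAndDeploy() allocates a slot for each operator instance and deploys the instances once the slots are returned. Deployment happens only after every operator instance has been assigned a slot, and it follows the given topological order. Only operator instances in the CREATED state are accepted. Below is the code of DefaultScheduler.allocateSlotsAndDeploy(). (An Execution is one attempt to execute an ExecutionVertex. When a failure occurs or data has to be recomputed, an ExecutionVertex can end up with several ExecutionAttemptIDs. Each Execution is uniquely identified by its ExecutionAttemptID. Between the JM and the TMs, both task deployment and task status updates use the ExecutionAttemptID to determine the message recipient.)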

// convert the vertices into their current Executions
final List<Execution> executionsToDeploy =
        verticesToDeploy.stream()
                .map(this::getCurrentExecutionOfVertex)
                .collect(Collectors.toList());
// deploy the Executions
executionDeployer.allocateSlotsAndDeploy(executionsToDeploy, requiredVersionByVertex);
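The relationship between an ExecutionVertex and its Executions, described in the parenthetical note above, can be illustrated with a toy model. This is only a sketch: Vertex, startNewAttempt() and acceptsUpdate() are hypothetical names, attempt IDs are plain strings rather than Flink's ExecutionAttemptID, and the rule that a stale attempt's update is ignored is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT Flink's classes: one vertex can have many execution attempts.
class ExecutionAttemptSketch {

    /** Hypothetical stand-in for ExecutionVertex: remembers every attempt, exposes the current one. */
    static final class Vertex {
        private final String vertexId;
        private final List<String> attemptIds = new ArrayList<>();

        Vertex(String vertexId) {
            this.vertexId = vertexId;
            startNewAttempt(); // the first attempt exists as soon as the vertex does
        }

        /** Called after a failure or when results must be recomputed; returns the new attempt ID. */
        String startNewAttempt() {
            String attemptId = vertexId + "_attempt-" + attemptIds.size();
            attemptIds.add(attemptId);
            return attemptId;
        }

        /** The attempt that deployment and status messages should currently target. */
        String currentAttempt() {
            return attemptIds.get(attemptIds.size() - 1);
        }
    }

    /** Assumption for illustration: a status update is accepted only if it targets the current attempt. */
    static boolean acceptsUpdate(Vertex vertex, String targetAttemptId) {
        return vertex.currentAttempt().equals(targetAttemptId);
    }

    public static void main(String[] args) {
        Vertex map = new Vertex("map-0");
        String first = map.currentAttempt();
        String second = map.startNewAttempt(); // simulate a restart after a failure
        System.out.println(first + " accepted: " + acceptsUpdate(map, first));
        System.out.println(second + " accepted: " + acceptsUpdate(map, second));
    }
}
```

Keying messages by attempt rather than by vertex is what lets a late message from a failed attempt be told apart from one that belongs to its replacement.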

Inside executionDeployer.allocateSlotsAndDeploy(), slots are allocated by the following code.

@Override
public void allocateSlotsAndDeploy(
        final List<Execution> executionsToDeploy,
        final Map<ExecutionVertexID, ExecutionVertexVersion> requiredVersionByVertex) {
    // make sure every Execution has been created, i.e. its state is CREATED
    validateExecutionStates(executionsToDeploy);
    // move the Executions to the SCHEDULED state
    transitionToScheduled(executionsToDeploy);
    // allocate a slot for each Execution
    final Map<ExecutionAttemptID, ExecutionSlotAssignment> executionSlotAssignmentMap =
            allocateSlotsFor(executionsToDeploy);
    final List<ExecutionDeploymentHandle> deploymentHandles =
            createDeploymentHandles(
                    executionsToDeploy, requiredVersionByVertex, executionSlotAssignmentMap);
    // deploy the instances once all slots are available
    waitForAllSlotsAndDeploy(deploymentHandles);
}
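The "deploy only after every slot has arrived" behaviour of this method can be sketched with plain CompletableFutures. This is a minimal illustration under stated assumptions, not Flink's implementation: Task, State and the deploymentLog parameter are hypothetical, and slots are represented by strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Toy model of the validate -> SCHEDULED -> allocate -> wait-for-all -> deploy sequence.
class AllocateThenDeploySketch {

    enum State { CREATED, SCHEDULED, DEPLOYING }

    /** Hypothetical stand-in for Execution. */
    static final class Task {
        final String name;
        State state = State.CREATED;

        Task(String name) {
            this.name = name;
        }
    }

    /** slotFutures.get(i) is the pending slot of tasks.get(i); deployments are appended to deploymentLog. */
    static CompletableFuture<Void> allocateSlotsAndDeploy(
            List<Task> tasks, List<CompletableFuture<String>> slotFutures, List<String> deploymentLog) {
        // 1. only tasks that are still CREATED are accepted
        for (Task task : tasks) {
            if (task.state != State.CREATED) {
                throw new IllegalStateException(task.name + " is not in CREATED state");
            }
        }
        // 2. mark all of them as SCHEDULED before any slot is available
        for (Task task : tasks) {
            task.state = State.SCHEDULED;
        }
        // 3. deploy in the given order, but only once every slot future has completed
        return CompletableFuture.allOf(slotFutures.toArray(new CompletableFuture<?>[0]))
                .thenRun(() -> {
                    for (int i = 0; i < tasks.size(); i++) {
                        tasks.get(i).state = State.DEPLOYING;
                        deploymentLog.add(tasks.get(i).name + "@" + slotFutures.get(i).join());
                    }
                });
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(new Task("source"), new Task("map"));
        CompletableFuture<String> slot1 = new CompletableFuture<>();
        CompletableFuture<String> slot2 = new CompletableFuture<>();
        List<String> log = new ArrayList<>();

        allocateSlotsAndDeploy(tasks, Arrays.asList(slot1, slot2), log);
        slot1.complete("slot-1");
        System.out.println("after first slot: " + log);  // still empty, one slot is missing
        slot2.complete("slot-2");
        System.out.println("after second slot: " + log); // both tasks deployed, in list order
    }
}
```

Waiting on the combined future keeps a partially allocated group from being deployed, which matches the rule that deployment starts only after all instances have slots.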

Next, follow the call chain of allocateSlotsFor(executionsToDeploy).

allocateSlotsFor(executionsToDeploy)
-> executionSlotAllocator.allocateSlotsFor(executionAttemptIds)
// executionSlotAllocator can be a SimpleExecutionSlotAllocator or a SlotSharingExecutionSlotAllocator

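1. Slot allocation strategy

The allocation strategies already implemented in Flink are SimpleExecutionSlotAllocator and SlotSharingExecutionSlotAllocator.
A custom allocation strategy has to implement the ExecutionSlotAllocator interface, including the following method.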

Map<ExecutionAttemptID, ExecutionSlotAssignment> allocateSlotsFor(  
        List<ExecutionAttemptID> executionAttemptIds);
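To make the shape of such a strategy concrete, here is a minimal sketch of a custom allocator. It deliberately does not use Flink's types: SlotAllocator is a simplified stand-in for ExecutionSlotAllocator, attempt IDs and slot assignments are plain strings, and RoundRobinAllocator is a hypothetical strategy invented for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Flink's interface: strings replace ExecutionAttemptID and ExecutionSlotAssignment.
class CustomAllocatorSketch {

    /** Simplified stand-in for ExecutionSlotAllocator. */
    interface SlotAllocator {
        Map<String, String> allocateSlotsFor(List<String> executionAttemptIds);
    }

    /** Hypothetical strategy: hand out a fixed set of slots in round-robin order. */
    static final class RoundRobinAllocator implements SlotAllocator {
        private final List<String> slots;
        private int next = 0;

        RoundRobinAllocator(List<String> slots) {
            if (slots.isEmpty()) {
                throw new IllegalArgumentException("at least one slot is required");
            }
            this.slots = slots;
        }

        @Override
        public Map<String, String> allocateSlotsFor(List<String> executionAttemptIds) {
            // LinkedHashMap keeps the assignments in request order
            Map<String, String> assignments = new LinkedHashMap<>();
            for (String attemptId : executionAttemptIds) {
                assignments.put(attemptId, slots.get(next));
                next = (next + 1) % slots.size();
            }
            return assignments;
        }
    }

    public static void main(String[] args) {
        SlotAllocator allocator = new RoundRobinAllocator(Arrays.asList("slot-1", "slot-2"));
        System.out.println(allocator.allocateSlotsFor(Arrays.asList("a_0", "b_0", "c_0")));
    }
}
```

In the real interface the returned assignments are asynchronous, since a slot may not be available yet when the request is made; the sketch returns them eagerly only to stay short.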

The allocation strategy is fixed while the JobMaster is being created, at the point where the scheduler is built.
More precisely, the strategy is passed in when the ExecutionDeployer (for example DefaultExecutionDeployer) is created, and the ExecutionDeployer is created in the constructor of the scheduler (for example DefaultScheduler). The code is as follows.

// create the slot allocation strategy
this.executionSlotAllocator =
        // executionSlotAllocatorFactory is the slot allocation strategy passed into the constructor
        checkNotNull(executionSlotAllocatorFactory)
                .createInstance(new DefaultExecutionSlotAllocationContext());
......
// create the ExecutionDeployer
this.executionDeployer =
        executionDeployerFactory.createInstance(
                log,
                executionSlotAllocator,
                executionOperations,
                executionVertexVersioner,
                rpcTimeout,
                this::startReserveAllocation,
                mainThreadExecutor);

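executionSlotAllocatorFactory is obtained in DefaultSchedulerFactory, in the createInstance() method that builds the scheduler, through the following statement.

return new DefaultScheduler(
		......
		schedulerComponents.getAllocatorFactory(),  // this selects the allocation strategy
		......
		);

schedulerComponents is created earlier in the same method.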

final DefaultSchedulerComponents schedulerComponents =  
        createSchedulerComponents(  
                jobGraph.getJobType(),  
                jobGraph.isApproximateLocalRecoveryEnabled(),  
                jobMasterConfiguration,  
                slotPool,  
                slotRequestTimeout);

Its call chain is as follows.

final DefaultSchedulerComponents schedulerComponents = createSchedulerComponents(......);
-> return createPipelinedRegionSchedulerComponents(......);
	-> final ExecutionSlotAllocatorFactory allocatorFactory = new SlotSharingExecutionSlotAllocatorFactory(......);

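In other words, the code above selects SlotSharingExecutionSlotAllocatorFactory as the slot allocation strategy. As for SimpleExecutionSlotAllocator, as far as the current code shows, that class is used for batch processing.

2. Slot selection

When the slot allocation strategy is chosen, a SlotSelectionStrategy is chosen as well. The code is as follows.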

// strategy for choosing among the slots offered by the physical nodes
final SlotSelectionStrategy slotSelectionStrategy =
        SlotSelectionStrategyUtils.selectSlotSelectionStrategy(
                jobType, jobMasterConfiguration);
......
// create the physicalSlotProvider from slotSelectionStrategy; it is used later to allocate slots
final PhysicalSlotProvider physicalSlotProvider =
        new PhysicalSlotProviderImpl(slotSelectionStrategy, slotPool);

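Inside SlotSelectionStrategyUtils.selectSlotSelectionStrategy(), two configuration options come into play: ClusterOptions.EVENLY_SPREAD_OUT_SLOTS_STRATEGY and CheckpointingOptions.LOCAL_RECOVERY.
EVENLY_SPREAD_OUT_SLOTS_STRATEGY controls whether slots are spread evenly across all available TaskManagers, so that the tasks of a job are distributed over the cluster instead of being concentrated on a few nodes. In Flink 1.18 it is a boolean option (default: false) that is enabled by setting cluster.evenly-spread-out-slots: true in the Flink configuration file.
If it is enabled, the SlotSelectionStrategy is EvenlySpreadOutLocationPreferenceSlotSelectionStrategy; otherwise it is DefaultLocationPreferenceSlotSelectionStrategy.
LOCAL_RECOVERY (state.backend.local-recovery in Flink 1.18, default: false) enables local recovery: each task keeps a copy of its checkpointed state on the TaskManager it runs on, and on recovery it restores from that local copy where possible instead of fetching the state from remote storage. This only helps if a task is redeployed to the slot it occupied before, so when the option is set, the SlotSelectionStrategy is PreviousAllocationSlotSelectionStrategy, which prefers the previously allocated slot.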
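The decision described above can be summarised as a small function. This is a sketch based on my reading of the 1.18 code, not a copy of it: the strategies are represented by their class names as strings, select() is a hypothetical helper, and two details are assumptions that should be checked against SlotSelectionStrategyUtils itself, namely that local recovery is honoured only for streaming jobs and that PreviousAllocationSlotSelectionStrategy wraps the location-preference strategy as its fallback.

```java
// Toy model of the strategy choice; strings stand in for the real strategy objects.
class SlotSelectionChoiceSketch {

    enum JobType { STREAMING, BATCH }

    /** Hypothetical helper mirroring the two configuration switches. */
    static String select(JobType jobType, boolean evenlySpreadOutSlots, boolean localRecovery) {
        // first switch: how to choose among slots when only location preferences matter
        String locationPreference =
                evenlySpreadOutSlots
                        ? "EvenlySpreadOutLocationPreferenceSlotSelectionStrategy"
                        : "DefaultLocationPreferenceSlotSelectionStrategy";

        // second switch (assumption: streaming jobs only): prefer the previously used slot,
        // falling back to the location-preference strategy when it is not available
        if (localRecovery && jobType == JobType.STREAMING) {
            return "PreviousAllocationSlotSelectionStrategy(fallback=" + locationPreference + ")";
        }
        return locationPreference;
    }

    public static void main(String[] args) {
        System.out.println(select(JobType.STREAMING, false, false));
        System.out.println(select(JobType.STREAMING, true, false));
        System.out.println(select(JobType.STREAMING, true, true));
        System.out.println(select(JobType.BATCH, false, true));
    }
}
```

Read this way, the two options are not alternatives: the spread-out switch picks the base strategy, and local recovery optionally layers a preference for the previous slot on top of it.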
