This article continues from the previous one; I recommend reading "Flink Elastic Scaling - Reactive Mode (Part 1)" before reading this part.
IV. How the AdaptiveScheduler Works
1. The classes involved in the Reactive-mode scheduler
2. The AdaptiveScheduler is itself a state machine
3. Transitions between the AdaptiveScheduler's execution states
(WaitingForResources -> CreatingExecutionGraph)
org.apache.flink.runtime.scheduler.adaptive.WaitingForResources.java
private void createExecutionGraphWithAvailableResources() {
    context.goToCreatingExecutionGraph();
}
This code calls the AdaptiveScheduler's goToCreatingExecutionGraph method, moving the scheduler into the state that creates the execution graph.
org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.java
@Override
public void goToCreatingExecutionGraph() {
    final CompletableFuture<CreatingExecutionGraph.ExecutionGraphWithVertexParallelism>
            executionGraphWithAvailableResourcesFuture =
                    createExecutionGraphWithAvailableResourcesAsync();

    transitionToState(
            new CreatingExecutionGraph.Factory(
                    this, executionGraphWithAvailableResourcesFuture, LOG));
}
Here, createExecutionGraphWithAvailableResourcesAsync rebuilds the execution graph, and the scheduler then transitions to the CreatingExecutionGraph state. While rebuilding the execution graph, the parallelism of every job vertex in the JobGraph is recomputed from the resources currently available in the cluster. For example, if the cluster has 10 slots and the JobGraph contains 2 slot sharing groups, each group can be given 5 slots, so its parallelism is 5. Once computed, the parallelism of the job vertices in the JobGraph is updated, a new ExecutionGraph is generated, the state of the last Checkpoint is restored, and execution continues in the CreatingExecutionGraph state.
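As a rough illustration of the parallelism calculation described above (this is not the actual Flink implementation, which lives in the SlotAllocator machinery; the method name here is made up):

```java
// Hypothetical sketch: dividing the available slots evenly across the
// slot sharing groups yields the per-group parallelism. NOT real Flink code.
public class ParallelismSketch {

    /** Each slot sharing group gets an equal share of the available slots. */
    static int parallelismPerGroup(int availableSlots, int slotSharingGroups) {
        if (slotSharingGroups <= 0) {
            throw new IllegalArgumentException("need at least one slot sharing group");
        }
        return availableSlots / slotSharingGroups;
    }

    public static void main(String[] args) {
        // The example from the text: 10 slots, 2 slot sharing groups -> parallelism 5.
        System.out.println(parallelismPerGroup(10, 2)); // prints 5
    }
}
```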
org.apache.flink.runtime.scheduler.adaptive.CreatingExecutionGraph.java
private void handleExecutionGraphCreation(
        @Nullable ExecutionGraphWithVertexParallelism executionGraphWithVertexParallelism,
        @Nullable Throwable throwable) {
    if (throwable != null) {
        log.info(
                "Failed to go from {} to {} because the ExecutionGraph creation failed.",
                CreatingExecutionGraph.class.getSimpleName(),
                Executing.class.getSimpleName(),
                throwable);
        context.goToFinished(context.getArchivedExecutionGraph(JobStatus.FAILED, throwable));
    } else {
        final AssignmentResult result =
                context.tryToAssignSlots(executionGraphWithVertexParallelism);

        if (result.isSuccess()) {
            log.debug(
                    "Successfully reserved and assigned the required slots for the ExecutionGraph.");
            context.goToExecuting(result.getExecutionGraph());
        } else {
            log.debug(
                    "Failed to reserve and assign the required slots. Waiting for new resources.");
            context.goToWaitingForResources();
        }
    }
}
(CreatingExecutionGraph -> Executing)
During the CreatingExecutionGraph phase, the handleExecutionGraphCreation method is invoked. It first checks whether anything went wrong while the graph was being created; if so, the scheduler transitions to the Finished state. If everything is fine, slots are assigned to the execution graph and the scheduler transitions to the Executing state; if the required slots cannot be reserved, it goes back to the WaitingForResources state.
org.apache.flink.runtime.scheduler.adaptive.Executing.java
Executing(
        ExecutionGraph executionGraph,
        ExecutionGraphHandler executionGraphHandler,
        OperatorCoordinatorHandler operatorCoordinatorHandler,
        Logger logger,
        Context context,
        ClassLoader userCodeClassLoader) {
    super(context, executionGraph, executionGraphHandler, operatorCoordinatorHandler, logger);
    this.context = context;
    this.userCodeClassLoader = userCodeClassLoader;
    Preconditions.checkState(
            executionGraph.getState() == JobStatus.RUNNING, "Assuming running execution graph");
    deploy();
    ...
}
While the Executing state is being constructed, the deploy method is called to deploy the executionGraph.
org.apache.flink.runtime.scheduler.adaptive.Executing.java
private void deploy() {
    for (ExecutionJobVertex executionJobVertex :
            getExecutionGraph().getVerticesTopologically()) {
        for (ExecutionVertex executionVertex : executionJobVertex.getTaskVertices()) {
            if (executionVertex.getExecutionState() == ExecutionState.CREATED
                    || executionVertex.getExecutionState() == ExecutionState.SCHEDULED) {
                deploySafely(executionVertex);
            }
        }
    }
}

private void deploySafely(ExecutionVertex executionVertex) {
    try {
        executionVertex.deploy();
    } catch (JobException e) {
        handleDeploymentFailure(executionVertex, e);
    }
}
The deploy method contains two nested loops. The outer loop iterates over the execution job vertices of the execution graph (the nodes of the graph shown in the Flink web UI); the inner loop iterates over each job vertex's subtasks (for example, a vertex with parallelism 5 has 5 subtasks). For each subtask, the ExecutionVertex's deploy method is invoked.
org.apache.flink.runtime.executiongraph.ExecutionVertex.java / Execution.java
// ExecutionVertex#deploy delegates to the current execution attempt.
public void deploy() throws JobException {
    currentExecution.deploy();
}

// Execution#deploy builds the deployment descriptor and submits the task.
public void deploy() throws JobException {
    ...
    final TaskDeploymentDescriptor deployment =
            TaskDeploymentDescriptorFactory.fromExecutionVertex(vertex, attemptNumber)
                    .createDeploymentDescriptor(
                            slot.getAllocationId(),
                            taskRestore,
                            producedPartitions.values());
    ...
    CompletableFuture.supplyAsync(
                    () -> taskManagerGateway.submitTask(deployment, rpcTimeout), executor)
            .thenCompose(Function.identity())
    ...
    } catch (Throwable t) {
        markFailed(t);
    }
}
Here, the subtask to be deployed is wrapped in a TaskDeploymentDescriptor object and then submitted to a specific TaskManager for execution via taskManagerGateway.submitTask.
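One detail worth noting in the snippet above is the `thenCompose(Function.identity())` idiom: `supplyAsync` returns a `CompletableFuture<CompletableFuture<...>>` because the supplied call (submitTask) itself returns a future, and composing with the identity function flattens the nesting. A standalone sketch of that pattern (the `remoteCall` name is our own stand-in):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class FlattenFutureDemo {

    // Stand-in for an RPC that itself returns a future,
    // like taskManagerGateway.submitTask in the snippet above.
    static CompletableFuture<String> remoteCall() {
        return CompletableFuture.completedFuture("ack");
    }

    static String run() {
        // supplyAsync(FlattenFutureDemo::remoteCall) yields a
        // CompletableFuture<CompletableFuture<String>>;
        // thenCompose(Function.identity()) flattens it into a single future.
        CompletableFuture<String> flat =
                CompletableFuture.supplyAsync(FlattenFutureDemo::remoteCall)
                        .thenCompose(Function.identity());
        return flat.join();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints ack
    }
}
```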
4. The sections above walked through the AdaptiveScheduler's whole journey from creation, to the start of scheduling, to job execution. The part of the AdaptiveScheduler that manages the job lifecycle is its implementation of the SchedulerNG interface (the interface every other scheduler must implement as well; it is the core of a scheduler).
org.apache.flink.runtime.scheduler.SchedulerNG.java
public interface SchedulerNG extends AutoCloseableAsync {

    void startScheduling();

    void cancel();
}
startScheduling is the entry point through which the scheduler begins scheduling a job;
cancel is the entry point through which the scheduler cancels a running job.
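To make the lifecycle concrete, here is a minimal sketch of a class implementing those two entry points. This is not real Flink code: the interface is trimmed down to just the two methods quoted above, and the state strings are invented for illustration.

```java
// Simplified stand-in for org.apache.flink.runtime.scheduler.SchedulerNG,
// reduced to the two lifecycle methods discussed in the text.
interface SchedulerNG {
    void startScheduling();
    void cancel();
}

public class LoggingScheduler implements SchedulerNG {

    private String state = "CREATED";

    @Override
    public void startScheduling() {
        // A real scheduler would transition into its first scheduling state here
        // (for the AdaptiveScheduler, WaitingForResources).
        state = "SCHEDULING";
    }

    @Override
    public void cancel() {
        state = "CANCELED";
    }

    public String getState() {
        return state;
    }

    public static void main(String[] args) {
        LoggingScheduler scheduler = new LoggingScheduler();
        scheduler.startScheduling();
        System.out.println(scheduler.getState()); // prints SCHEDULING
    }
}
```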
5. The mechanism by which the AdaptiveScheduler achieves elastic scaling
org.apache.flink.runtime.scheduler.adaptive.AdaptiveScheduler.java
private void newResourcesAvailable(Collection<? extends PhysicalSlot> physicalSlots) {
    state.tryRun(
            ResourceConsumer.class,
            ResourceConsumer::notifyNewResourcesAvailable,
            "newResourcesAvailable");
}
When a new TaskManager joins the cluster, the newResourcesAvailable method is invoked. It calls the tryRun method on the scheduler's current state to decide what happens next; in essence this invokes the current state's notifyNewResourcesAvailable method. Only two states, Executing and WaitingForResources, implement ResourceConsumer::notifyNewResourcesAvailable, which means the scheduler reacts to resource-scaling events only while it is in one of these two states.
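The dispatch pattern behind tryRun can be sketched as follows: the callback is delivered only if the current state object implements the required capability interface, otherwise it is silently ignored. This is a simplified reconstruction, not the actual Flink classes:

```java
import java.util.function.Consumer;

// Hypothetical sketch of the tryRun dispatch pattern used by the
// AdaptiveScheduler's state machine. Names mirror the Flink code,
// but the classes here are heavily simplified for illustration.
public class StateDispatchDemo {

    interface State {}

    interface ResourceConsumer {
        void notifyNewResourcesAvailable();
    }

    // Executing reacts to new resources; Finished does not implement the interface.
    static class Executing implements State, ResourceConsumer {
        boolean notified = false;

        @Override
        public void notifyNewResourcesAvailable() {
            notified = true;
        }
    }

    static class Finished implements State {}

    /** Run the action only when the current state has the required capability. */
    static <T> void tryRun(State current, Class<T> clazz, Consumer<T> action) {
        if (clazz.isInstance(current)) {
            action.accept(clazz.cast(current));
        }
    }

    public static void main(String[] args) {
        Executing executing = new Executing();
        tryRun(executing, ResourceConsumer.class,
                ResourceConsumer::notifyNewResourcesAvailable);
        System.out.println(executing.notified); // prints true

        // A Finished state simply ignores the notification.
        tryRun(new Finished(), ResourceConsumer.class,
                ResourceConsumer::notifyNewResourcesAvailable);
    }
}
```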
V. Further Reading: An Introduction to Finite State Machines
A state machine can be reduced to four elements: the current state, the condition, the action, and the next state. The current state and the condition are the cause; the action and the next state are the effect. In detail:
① Current state: the state the machine is presently in.
② Condition: also called an "event". When a condition is satisfied, it triggers an action and/or a state transition.
③ Action: what is executed once the condition is met. After the action completes, the machine may move to a new state or remain in the current one. An action is optional; when the condition is met, the machine may also transition directly to the new state without performing any action.
④ Next state: the new state to move to once the condition is met. "Next" is relative to the current state; once the next state is activated, it becomes the new current state.
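The four elements above can be sketched as a tiny FSM. The classic turnstile is our own illustrative choice of domain, unrelated to Flink:

```java
// Tiny FSM illustrating the four elements: current state, event (condition),
// action, next state. A locked turnstile unlocks on COIN; an unlocked one
// relocks on PUSH; any other event leaves the state unchanged.
public class TurnstileFsm {

    enum State { LOCKED, UNLOCKED }
    enum Event { COIN, PUSH }

    private State state = State.LOCKED; // the current state

    /** When the condition (event) is met, perform the action and move to the next state. */
    State fire(Event event) {
        switch (state) {
            case LOCKED:
                if (event == Event.COIN) {
                    state = State.UNLOCKED; // action: unlock; next state becomes current
                }
                break;
            case UNLOCKED:
                if (event == Event.PUSH) {
                    state = State.LOCKED;   // action: relock; next state becomes current
                }
                break;
        }
        return state;
    }

    public static void main(String[] args) {
        TurnstileFsm fsm = new TurnstileFsm();
        System.out.println(fsm.fire(Event.COIN)); // prints UNLOCKED
        System.out.println(fsm.fire(Event.PUSH)); // prints LOCKED
    }
}
```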