Flink 1.13 Source Code Analysis: JobManager Startup Overview

EdwardsWang丶

Last modified 2023-02-21 18:00:47

First published 2022-08-25 20:38:51

Article link: https://blog.csdn.net/edwardwong_/article/details/126526013

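1. Basic Concepts

Before analyzing the Flink JobManager startup flow, we need to understand a few important concepts.

Flink's master node, the JobManager, is only a logical master. Its implementation class differs depending on the deployment mode.

The (logical) JobManager has three core components: ResourceManager, Dispatcher, and WebMonitorEndpoint.

ResourceManager:

The resource manager of the Flink cluster. There is only one, and it is responsible for managing and allocating slots.

Dispatcher:

1. Receives the JobGraph submitted by the user and then starts a JobMaster, which is similar to the AppMaster in YARN or the Driver in Spark.

2. Contains a persistent service, JobGraphStore, which stores JobGraphs. If the master node crashes while the execution graph or physical execution graph is being built, the JobGraph can be fetched again from this store after recovery.

WebMonitorEndpoint:

The REST service. It runs a Netty server internally, and all client requests are received and handled by this component.

An example shows how the three components work together:

When a client submits a job to the cluster (the client first builds the job into a JobGraph), the master node receives the REST request. The WebMonitorEndpoint resolves the request through its Router to find the matching Handler, and after the handler has processed it the job is passed to the Dispatcher. The Dispatcher launches a JobMaster, which is responsible for deploying and executing the job's tasks. The JobMaster requests the resources those tasks need from the ResourceManager.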
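The hand-off described above can be sketched as a toy model. This is only an illustration of the roles, not Flink code: every class name here (ToyRestEndpoint, ToyDispatcher, ToyJobMaster, ToyResourceManager) is hypothetical, and slots are reduced to a simple counter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Entry point of the toy model; all classes in this sketch are hypothetical, not Flink classes. */
class SubmissionFlowDemo {
    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager(4);
        ToyRestEndpoint rest = new ToyRestEndpoint(new ToyDispatcher(rm));
        System.out.println(rest.handle("/jobs", "wordcount", 3)); // prints RUNNING
        System.out.println(rest.handle("/jobs", "big-job", 3));   // prints WAITING_FOR_SLOTS
    }
}

/** Single owner of the slot pool. */
class ToyResourceManager {
    private int freeSlots;

    ToyResourceManager(int freeSlots) {
        this.freeSlots = freeSlots;
    }

    /** Grants the requested slots only if enough are free. */
    boolean requestSlots(int count) {
        if (count > freeSlots) {
            return false;
        }
        freeSlots -= count;
        return true;
    }

    int freeSlots() {
        return freeSlots;
    }
}

/** One JobMaster per job; it is the JobMaster, not the Dispatcher, that asks for slots. */
class ToyJobMaster {
    final String jobName;
    final boolean running;

    ToyJobMaster(String jobName, int parallelism, ToyResourceManager rm) {
        this.jobName = jobName;
        this.running = rm.requestSlots(parallelism);
    }
}

/** Accepts submissions, stores the job description, then launches a JobMaster. */
class ToyDispatcher {
    private final ToyResourceManager rm;
    private final Map<String, Integer> jobGraphStore = new HashMap<>();
    final List<ToyJobMaster> jobMasters = new ArrayList<>();

    ToyDispatcher(ToyResourceManager rm) {
        this.rm = rm;
    }

    ToyJobMaster submitJob(String jobName, int parallelism) {
        jobGraphStore.put(jobName, parallelism); // persist first so the job could be recovered
        ToyJobMaster jobMaster = new ToyJobMaster(jobName, parallelism, rm);
        jobMasters.add(jobMaster);
        return jobMaster;
    }

    boolean isStored(String jobName) {
        return jobGraphStore.containsKey(jobName);
    }
}

/** Routes a request path to a handler; only job submission is modelled. */
class ToyRestEndpoint {
    private final ToyDispatcher dispatcher;

    ToyRestEndpoint(ToyDispatcher dispatcher) {
        this.dispatcher = dispatcher;
    }

    String handle(String path, String jobName, int parallelism) {
        if (!"/jobs".equals(path)) {
            return "404";
        }
        return dispatcher.submitJob(jobName, parallelism).running ? "RUNNING" : "WAITING_FOR_SLOTS";
    }
}
```

The point of the sketch is the direction of the calls: the REST endpoint only talks to the Dispatcher, and only the JobMaster talks to the ResourceManager.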

With these core concepts covered, let's move on to the main topic of this chapter: the JobManager startup process.

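2. JobManager Startup Source Code Analysis

2.1 Startup Overview

The JobManager startup flow has three major parts:

1. Initialize the 8 basic services
2. Create the factory instances
3. Use the factory instances to create the three core components: ResourceManager, Dispatcher, and WebMonitorEndpoint

The first step initializes the following basic services: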

// Initialize and start the AkkaRpcService, which wraps an ActorSystem
commonRpcService = AkkaRpcServiceUtils.createRemoteRpcService(...)
// Start a JMXService so that clients can connect to the JobManager JVM for monitoring
JMXService.startInstance(configuration.getString(JMXServerOptions.JMX_SERVER_PORT));
// Initialize a thread pool for IO work
ioExecutor = Executors.newFixedThreadPool(...)
// Initialize the HA services; with ZooKeeper HA configured the implementation is ZooKeeperHaServices
haServices = createHaServices(configuration, ioExecutor);
// Initialize and start the BlobServer
blobServer = new BlobServer(configuration, haServices.createBlobStore());
blobServer.start();
// Initialize the heartbeat services (HeartbeatServices)
heartbeatServices = createHeartbeatServices(configuration);
// Start the metrics (performance monitoring) services, which also start an ActorSystem internally
MetricUtils.startRemoteMetricsRpcService(configuration,
commonRpcService.getAddress());
// Initialize a store for ExecutionGraphs; the implementation is FileArchivedExecutionGraphStore
archivedExecutionGraphStore = createSerializableExecutionGraphStore(...)

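The second step creates three important factories:

1. DispatcherRunner factory (DispatcherRunnerFactory). Default implementation: DefaultDispatcherRunnerFactory, which produces DefaultDispatcherRunner

2. ResourceManager factory (ResourceManagerFactory). Default implementation: StandaloneResourceManagerFactory, which produces StandaloneResourceManager

3. WebMonitorEndpoint factory (RestEndpointFactory). Default implementation: SessionRestEndpointFactory, which produces DispatcherRestEndpoint

The third step uses these factories to create the three core components:

1. DispatcherRunner, implemented by DefaultDispatcherRunner

2. ResourceManager, implemented by StandaloneResourceManager

3. WebMonitorEndpoint, implemented by DispatcherRestEndpoint

With the overall flow in mind, let's look at the code.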

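2.2 Startup Source Code Walkthrough

2.2.1 Master Node Preparation

From the flink-daemon script we can see that the master node's entry class is StandaloneSessionClusterEntrypoint (this walkthrough uses Standalone mode). Let's look at that class: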

public class StandaloneSessionClusterEntrypoint extends SessionClusterEntrypoint {

    public StandaloneSessionClusterEntrypoint(Configuration configuration) {
        super(configuration);
    }

    @Override
    protected DefaultDispatcherResourceManagerComponentFactory
            createDispatcherResourceManagerComponentFactory(Configuration configuration) {
        return DefaultDispatcherResourceManagerComponentFactory.createSessionComponentFactory(
                StandaloneResourceManagerFactory.getInstance());
    }

    public static void main(String[] args) {
        // startup checks and logging
        EnvironmentInformation.logEnvironmentInfo(
                LOG, StandaloneSessionClusterEntrypoint.class.getSimpleName(), args);
        SignalHandler.register(LOG);
        JvmShutdownSafeguard.installAsShutdownHook(LOG);

        // TODO 1. Parse the command-line arguments passed to the entrypoint (e.g. the config directory and dynamic properties)
        final EntrypointClusterConfiguration entrypointClusterConfiguration =
                ClusterEntrypointUtils.parseParametersOrExit(
                        args,
                        new EntrypointClusterConfigurationParserFactory(),
                        StandaloneSessionClusterEntrypoint.class);

        // TODO 2. Load the flink-conf.yaml configuration file
        Configuration configuration = loadConfiguration(entrypointClusterConfiguration);

        // TODO 3. Create the master node
        StandaloneSessionClusterEntrypoint entrypoint =
                new StandaloneSessionClusterEntrypoint(configuration);

        // TODO 4. Start the master node
        ClusterEntrypoint.runClusterEntrypoint(entrypoint);
        ClusterEntrypoint.runClusterEntrypoint(entrypoint);
    }
}

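This entry class does four things:

1. Parses the command-line arguments passed to the entrypoint

2. Loads the flink-conf.yaml configuration file

3. Creates the master node

4. Starts the master node

Let's first look at how flink-conf.yaml is loaded by stepping into loadConfiguration. The call ends up in the following method, which takes the configuration directory and the dynamic properties: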

 public static Configuration loadConfiguration(
            final String configDir, @Nullable final Configuration dynamicProperties) {

        if (configDir == null) {
            throw new IllegalArgumentException(
                    "Given configuration directory is null, cannot load configuration");
        }

        final File confDirFile = new File(configDir);
        if (!(confDirFile.exists())) {
            throw new IllegalConfigurationException(
                    "The given configuration directory name '"
                            + configDir
                            + "' ("
                            + confDirFile.getAbsolutePath()
                            + ") does not describe an existing directory.");
        }

        // get Flink yaml configuration file
        // TODO Locate the flink-conf.yaml file
        final File yamlConfigFile = new File(confDirFile, FLINK_CONF_FILENAME);

        // Fail if the file does not exist
        if (!yamlConfigFile.exists()) {
            throw new IllegalConfigurationException(
                    "The Flink config file '"
                            + yamlConfigFile
                            + "' ("
                            + yamlConfigFile.getAbsolutePath()
                            + ") does not exist.");
        }
        // TODO Parse the flink-conf.yaml file
        Configuration configuration = loadYAMLResource(yamlConfigFile);

        if (dynamicProperties != null) {
            configuration.addAll(dynamicProperties);
        }

        return configuration;
    }

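The method first locates the file under the conf directory, then parses its settings with loadYAMLResource(). Any dynamic properties are added on top, and the resulting configuration is returned.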
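The "file first, dynamic properties on top" order can be illustrated with a small stand-alone sketch. It is a simplification under stated assumptions, not Flink's loadYAMLResource: it only understands flat `key: value` lines and `#` comments, and the names FlatYamlConfigDemo, parseFlatYaml, and load are made up for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not a Flink class: loads flat "key: value" settings and overlays overrides. */
class FlatYamlConfigDemo {

    /** Parses flat "key: value" lines, skipping blank lines, comments, and malformed entries. */
    static Map<String, String> parseFlatYaml(List<String> lines) {
        Map<String, String> config = new LinkedHashMap<>();
        for (String line : lines) {
            // Everything after '#' is treated as a comment in this simplified parser.
            String content = line.split("#", 2)[0].trim();
            if (content.isEmpty()) {
                continue;
            }
            String[] keyValue = content.split(": ", 2);
            if (keyValue.length != 2) {
                continue; // not a "key: value" pair
            }
            String key = keyValue[0].trim();
            String value = keyValue[1].trim();
            if (!key.isEmpty() && !value.isEmpty()) {
                config.put(key, value);
            }
        }
        return config;
    }

    /** File settings are loaded first; dynamic properties then overwrite matching keys. */
    static Map<String, String> load(List<String> yamlLines, Map<String, String> dynamicProperties) {
        Map<String, String> config = parseFlatYaml(yamlLines);
        if (dynamicProperties != null) {
            config.putAll(dynamicProperties);
        }
        return config;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList(
                "# sample settings",
                "jobmanager.rpc.port: 6123 # RPC port",
                "rest.port: 8081");
        Map<String, String> overrides = Collections.singletonMap("rest.port", "9091");
        System.out.println(load(file, overrides)); // prints {jobmanager.rpc.port=6123, rest.port=9091}
    }
}
```

Because the overrides are applied last, a value given as a dynamic property wins over the same key in the file, which matches the `configuration.addAll(dynamicProperties)` call in the listing above.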

Next comes the most important part of this chapter: the master node startup process.

2.2.2 Master Node Startup

Step into ClusterEntrypoint.runClusterEntrypoint(entrypoint):

final String clusterEntrypointName = clusterEntrypoint.getClass().getSimpleName();
        try {
            // TODO Start the cluster entrypoint
            clusterEntrypoint.startCluster();
        } catch (ClusterEntrypointException e) {
            LOG.error(
                    String.format("Could not start cluster entrypoint %s.", clusterEntrypointName),
                    e);
            System.exit(STARTUP_FAILURE_RETURN_CODE);
        }

Step into startCluster, find the following code, and then step into runCluster:

SecurityContext securityContext = installSecurityContext(configuration);

securityContext.runSecured(
        (Callable<Void>)
         () -> {
               runCluster(configuration, pluginManager);
               return null;
                 });

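We have now reached runCluster, the core method of master node startup. It does three important things:

1. Initializes the basic services that the three core components need at startup in order to serve requests.

2. Initializes a DispatcherResourceManagerComponentFactory instance, which in turn holds the factory instances for the three core components.

3. Creates the three core components from the factories and the basic services.

Let's start with step one, initializing the 8 basic services: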

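/*
            TODO Initialize the basic services that the three core components need at startup
             1. commonRpcService:   Akka-based RpcService implementation that wraps an ActorSystem
             2. JMXService:         starts a JMXService
             3. ioExecutor:         starts a thread pool
             4. haServices:         access to all services needed for high availability: registration, distributed counters, and leader election
             5. blobServer:         listens for incoming requests and spawns threads to handle them; also creates the directory structure for stored blobs or temporary caches
             6. heartbeatServices:  everything needed for heartbeating, including creating heartbeat receivers and senders
             7. metricRegistry:     tracks all registered metrics and connects MetricGroups with MetricReporters
             8. archivedExecutionGraphStore: stores a serializable form of the ExecutionGraph
             */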
            initializeServices(configuration, pluginManager);

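Stepping into initializeServices shows which services are initialized.

They are:

1. commonRpcService:

Akka-based RpcService implementation that wraps an ActorSystem. It is essentially a TCP RPC service; the default port is 6123.

2. JMXService:

Starts a JMXService so that clients can connect to the JobManager JVM for monitoring.

3. ioExecutor:

Starts a thread pool. By default its size is 4 times the number of CPU cores on the node.

4. haServices:

Initializes the HA services. With ZooKeeper-based HA configured this is ZooKeeperHaServices. The HA services provide access to everything high availability needs: registration, distributed counters, and leader election.

5. blobServer:

Initializes the BlobServer, which stores large files, for example the dependency jars uploaded together with a Flink job jar, or log files uploaded by TaskManagers.

6. heartbeatServices:

Provides everything needed for heartbeating, including creating heartbeat receivers and heartbeat senders.

7. metricRegistry:

Starts the metrics (performance monitoring) services, which also start an ActorSystem internally. The registry tracks all registered metrics and acts as the link between MetricGroups and MetricReporters.

8. archivedExecutionGraphStore:

Stores a serializable form of the ExecutionGraph (the field is named executionGraphInfoStore in the listing below). Note that this is not the JobGraphStore; the JobGraphStore is started when the Dispatcher starts.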

    protected void initializeServices(Configuration configuration, PluginManager pluginManager)
            throws Exception {

        LOG.info("Initializing cluster services.");

        synchronized (lock) {
            // TODO 1. Initialize and start the AkkaRpcService, which wraps an ActorSystem
            // Creates the Akka-based RpcService implementation.
            // commonRpcService is essentially a TCP RPC service; the default port is 6123
            commonRpcService =
                    AkkaRpcServiceUtils.createRemoteRpcService(
                            configuration,
                            configuration.getString(JobManagerOptions.ADDRESS),
                            getRPCPortRange(configuration),
                            configuration.getString(JobManagerOptions.BIND_HOST),
                            configuration.getOptional(JobManagerOptions.RPC_BIND_PORT));

            // TODO 2. Start a JMXService so that clients can connect to the JobManager JVM for monitoring
            JMXService.startInstance(configuration.getString(JMXServerOptions.JMX_SERVER_PORT));

            // update the configuration used to create the high availability services
            configuration.setString(JobManagerOptions.ADDRESS, commonRpcService.getAddress());
            configuration.setInteger(JobManagerOptions.PORT, commonRpcService.getPort());

            // TODO 3. Initialize the IO thread pool; by default its size is 4 * the CPU cores of this node
            // Much of Flink's code is written asynchronously
            ioExecutor =
                    Executors.newFixedThreadPool(
                            ClusterEntrypointUtils.getPoolSize(configuration),
                            new ExecutorThreadFactory("cluster-io"));

            // TODO 4. Initialize the HA services (ZooKeeperHaServices when ZooKeeper HA is configured)
            haServices = createHaServices(configuration, ioExecutor);
            // TODO 5. Initialize the BlobServer for large files, e.g. dependency jars uploaded with a Flink job jar, or log files uploaded by TaskManagers
            blobServer = new BlobServer(configuration, haServices.createBlobStore());
            blobServer.start();
            // TODO 6. Heartbeat services
            heartbeatServices = createHeartbeatServices(configuration);
            // TODO 7. Start the metrics (performance monitoring) services, which also start an ActorSystem internally
            metricRegistry = createMetricRegistry(configuration, pluginManager);

            final RpcService metricQueryServiceRpcService =
                    MetricUtils.startRemoteMetricsRpcService(
                            configuration, commonRpcService.getAddress());
            metricRegistry.startQueryService(metricQueryServiceRpcService, null);

            final String hostname = RpcUtils.getHostname(commonRpcService);

            processMetricGroup =
                    MetricUtils.instantiateProcessMetricGroup(
                            metricRegistry,
                            hostname,
                            ConfigurationUtils.getSystemResourceMetricsProbingInterval(
                                    configuration));
            // TODO 8. Initialize a store for ExecutionGraphs; the implementation is FileArchivedExecutionGraphStore
            // The JobGraphStore is started when the Dispatcher starts
            executionGraphInfoStore =
                    createSerializableExecutionGraphStore(
                            configuration, commonRpcService.getScheduledExecutor());
        }
    }

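Most of this method just reads values from the configuration and hands them to the services. Nothing here is particularly complex, so we won't go into further detail. Next we look at how the factories are built.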
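As a small illustration of the ioExecutor step, the sketch below builds a fixed pool sized at 4 times the CPU core count with named threads. It is not Flink's code: IoPoolDemo and its methods are hypothetical, the thread-name format is invented, and the 4x factor is simply the default the article mentions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo class: a fixed IO pool with named daemon threads. */
class IoPoolDemo {

    /** Pool size used in this sketch: 4 threads per available CPU core. */
    static int defaultPoolSize() {
        return 4 * Runtime.getRuntime().availableProcessors();
    }

    /** Thread factory that gives each thread a readable, numbered name. */
    static ThreadFactory namedFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger(1);
        return runnable -> {
            Thread thread = new Thread(runnable, prefix + "-thread-" + counter.getAndIncrement());
            thread.setDaemon(true); // do not keep the JVM alive because of idle IO threads
            return thread;
        };
    }

    /** Runs one task on a fresh pool and returns the name of the thread that executed it. */
    static String runOneTask(String prefix) {
        ExecutorService pool = Executors.newFixedThreadPool(defaultPoolSize(), namedFactory(prefix));
        try {
            return pool.submit(() -> Thread.currentThread().getName()).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("task did not complete", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultPoolSize() + " threads; first task ran on " + runOneTask("cluster-io"));
    }
}
```

Naming the threads is what makes a pool like this recognizable in thread dumps and logs, which is presumably why the listing above passes a dedicated thread factory with the "cluster-io" name.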

Step two, building the core factories:

 /*
            TODO Core call: initializes a DispatcherResourceManagerComponentFactory instance,
             which holds the factory instances for the three core components:
             1. Dispatcher = DefaultDispatcherRunnerFactory, produces DefaultDispatcherRunner
             2. ResourceManager = StandaloneResourceManagerFactory, produces StandaloneResourceManager
             3. WebMonitorEndpoint = SessionRestEndpointFactory, produces DispatcherRestEndpoint

             */
            final DispatcherResourceManagerComponentFactory
                    dispatcherResourceManagerComponentFactory =
                            createDispatcherResourceManagerComponentFactory(configuration);

Inside this factory, the factory instances for the three core components are initialized:

1. Dispatcher = DefaultDispatcherRunnerFactory, produces DefaultDispatcherRunner

2. ResourceManager = StandaloneResourceManagerFactory, produces StandaloneResourceManager

3. WebMonitorEndpoint = SessionRestEndpointFactory, produces DispatcherRestEndpoint

Step into createDispatcherResourceManagerComponentFactory and pick the StandaloneSessionClusterEntrypoint implementation:

    @Override
    protected DefaultDispatcherResourceManagerComponentFactory
            createDispatcherResourceManagerComponentFactory(Configuration configuration) {
        // Create the first factory, StandaloneResourceManagerFactory
        return DefaultDispatcherResourceManagerComponentFactory.createSessionComponentFactory(
                StandaloneResourceManagerFactory.getInstance());
    }

Here StandaloneResourceManagerFactory.getInstance() provides the first factory, StandaloneResourceManagerFactory.

Step into createSessionComponentFactory to see how the other two factories are built:

    public static DefaultDispatcherResourceManagerComponentFactory createSessionComponentFactory(
            ResourceManagerFactory<?> resourceManagerFactory) {
        // TODO Build the composite factory
        return new DefaultDispatcherResourceManagerComponentFactory(
                // TODO Second factory
                DefaultDispatcherRunnerFactory.createSessionRunner(
                        SessionDispatcherFactory.INSTANCE),
                // TODO First factory
                resourceManagerFactory,

                // TODO Third factory
                SessionRestEndpointFactory.INSTANCE);
    }

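The code is clear and concise: it creates the DispatcherRunner factory, takes the SessionRestEndpointFactory instance, and bundles both with the ResourceManager factory that was passed in.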
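The shape of this composite factory can be shown with a minimal sketch. It is an illustration only: ToyComponentFactory is a hypothetical class, the components are plain strings named after the Flink implementations, and the creation order in create() follows the order the article walks through later (REST endpoint, then ResourceManager, then DispatcherRunner).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical demo, not Flink code: one factory that bundles three component factories. */
class ComponentFactoryDemo {
    public static void main(String[] args) {
        ToyComponentFactory factory =
                ToyComponentFactory.createSessionFactory(() -> "StandaloneResourceManager");
        // prints [DispatcherRestEndpoint, StandaloneResourceManager, DefaultDispatcherRunner]
        System.out.println(factory.create());
    }
}

class ToyComponentFactory {
    private final Supplier<String> dispatcherRunnerFactory;
    private final Supplier<String> resourceManagerFactory;
    private final Supplier<String> restEndpointFactory;

    ToyComponentFactory(
            Supplier<String> dispatcherRunnerFactory,
            Supplier<String> resourceManagerFactory,
            Supplier<String> restEndpointFactory) {
        this.dispatcherRunnerFactory = dispatcherRunnerFactory;
        this.resourceManagerFactory = resourceManagerFactory;
        this.restEndpointFactory = restEndpointFactory;
    }

    /** Only the ResourceManager factory varies by deployment mode; the other two are fixed here. */
    static ToyComponentFactory createSessionFactory(Supplier<String> resourceManagerFactory) {
        return new ToyComponentFactory(
                () -> "DefaultDispatcherRunner",
                resourceManagerFactory,
                () -> "DispatcherRestEndpoint");
    }

    /** Creates the three components and returns them in creation order. */
    List<String> create() {
        List<String> components = new ArrayList<>();
        components.add(restEndpointFactory.get());
        components.add(resourceManagerFactory.get());
        components.add(dispatcherRunnerFactory.get());
        return components;
    }
}
```

The design choice worth noticing is that the deployment-specific part (the ResourceManager factory) is injected, so a different entrypoint could reuse the same composite with another resource manager.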

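At this point the factories for the master node's three core components exist. Next we look at the components themselves.

Step three, creating the ResourceManager, DispatcherRunner, and WebMonitorEndpoint: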

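            /*
            TODO Create the JobManager's three core component instances from the basic services built in step one
            1. WebMonitorEndpoint: receives the REST requests that clients send to run jobs
            2. resourceManager: responsible for resource allocation and bookkeeping
            3. dispatcher: receives job submissions, persists them, spawns JobManagers to execute the jobs, and recovers them if the master fails
             */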
            clusterComponent =
                    dispatcherResourceManagerComponentFactory.create(
                            configuration,
                            ioExecutor,
                            commonRpcService,
                            haServices,
                            blobServer,
                            heartbeatServices,
                            metricRegistry,
                            executionGraphInfoStore,
                            new RpcMetricQueryServiceRetriever(
                                    metricRegistry.getMetricQueryServiceRpcService()),
                            this);

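This step uses the three factory instances from step two to create the WebMonitorEndpoint, the ResourceManager, and the DispatcherRunner.

1. DispatcherRunner, implemented by DefaultDispatcherRunner

2. ResourceManager, implemented by StandaloneResourceManager

3. WebMonitorEndpoint, implemented by DispatcherRestEndpoint

Let's look at the code. The create method does a lot, so we go through it piece by piece.

Step 1. Initialize the services that track the current Dispatcher and ResourceManager leaders: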

  // TODO DefaultLeaderRetrievalService that watches the Dispatcher leader
            dispatcherLeaderRetrievalService =
                    highAvailabilityServices.getDispatcherLeaderRetriever();

            // TODO DefaultLeaderRetrievalService that watches the ResourceManager leader
            resourceManagerRetrievalService =
                    highAvailabilityServices.getResourceManagerLeaderRetriever();

            // TODO GatewayRetriever for the Dispatcher
            final LeaderGatewayRetriever<DispatcherGateway> dispatcherGatewayRetriever =
                    new RpcGatewayRetriever<>(
                            rpcService,
                            DispatcherGateway.class,
                            DispatcherId::fromUuid,
                            new ExponentialBackoffRetryStrategy(
                                    12, Duration.ofMillis(10), Duration.ofMillis(50)));

            // TODO GatewayRetriever for the ResourceManager
            final LeaderGatewayRetriever<ResourceManagerGateway> resourceManagerGatewayRetriever =
                    new RpcGatewayRetriever<>(
                            rpcService,
                            ResourceManagerGateway.class,
                            ResourceManagerId::fromUuid,
                            new ExponentialBackoffRetryStrategy(
                                    12, Duration.ofMillis(10), Duration.ofMillis(50)));
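Both retrievers are configured with ExponentialBackoffRetryStrategy(12, Duration.ofMillis(10), Duration.ofMillis(50)). To get a feel for what those numbers mean, here is a stand-alone sketch that assumes the usual scheme: the delay starts at the initial value, doubles after each attempt, and is capped at the maximum. BackoffDemo and retryDelays are hypothetical names, and the doubling rule is an assumption of this sketch rather than something the article states.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo of capped exponential backoff; not the Flink class itself. */
class BackoffDemo {

    /** Delay before each retry: start at {@code initial}, double every time, never exceed {@code max}. */
    static List<Duration> retryDelays(int retries, Duration initial, Duration max) {
        List<Duration> delays = new ArrayList<>();
        Duration current = initial;
        for (int i = 0; i < retries; i++) {
            delays.add(current);
            Duration doubled = current.multipliedBy(2);
            current = doubled.compareTo(max) > 0 ? max : doubled;
        }
        return delays;
    }

    public static void main(String[] args) {
        List<Duration> delays = retryDelays(12, Duration.ofMillis(10), Duration.ofMillis(50));
        long totalMillis = 0;
        for (Duration delay : delays) {
            totalMillis += delay.toMillis();
        }
        // Delays are 10, 20, 40 ms and then 50 ms for the remaining nine retries.
        System.out.println(delays.size() + " retries, total wait " + totalMillis + " ms"); // 12 retries, total wait 520 ms
    }
}
```

Under these assumptions the retriever gives up after roughly half a second of waiting, so the cap of 50 ms matters more than the initial delay.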

Step 2. Build a thread pool that executes the client requests received by the WebMonitorEndpoint

 // TODO Create the thread pool that executes the client requests received by the WebMonitorEndpoint
 final ScheduledExecutorService executor =
             WebMonitorEndpoint.createExecutorService(
                   configuration.getInteger(RestOptions.SERVER_NUM_THREADS),
                   configuration.getInteger(RestOptions.SERVER_THREAD_PRIORITY),
                   "DispatcherRestEndpoint");

Step 3. Initialize the MetricFetcher; the default refresh interval is 10 s

// TODO Initialize the MetricFetcher; the default interval is 10 s
final long updateInterval =
        configuration.getLong(MetricOptions.METRIC_FETCHER_UPDATE_INTERVAL);
final MetricFetcher metricFetcher =
        updateInterval == 0
                ? VoidMetricFetcher.INSTANCE
                : MetricFetcherImpl.fromConfiguration(
                        configuration,
                        metricQueryServiceRetriever,
                        dispatcherGatewayRetriever,
                        executor);
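The ternary above is a small null-object pattern: an interval of 0 selects a fetcher that does nothing, so callers never need to check whether fetching is enabled. A minimal sketch of that idea follows; ToyMetricFetcher, VoidToyFetcher, and CountingToyFetcher are hypothetical names and do not mirror the real MetricFetcher interface.

```java
/** Hypothetical demo of the "interval 0 means no-op fetcher" choice; not Flink code. */
class MetricFetcherDemo {

    /** Picks the no-op fetcher when the interval is 0, a real one otherwise. */
    static ToyMetricFetcher fromInterval(long updateIntervalMs) {
        return updateIntervalMs == 0 ? VoidToyFetcher.INSTANCE : new CountingToyFetcher();
    }

    public static void main(String[] args) {
        ToyMetricFetcher disabled = fromInterval(0);
        ToyMetricFetcher enabled = fromInterval(10_000);
        disabled.update();
        enabled.update();
        System.out.println(disabled.updateCount() + " " + enabled.updateCount()); // prints 0 1
    }
}

interface ToyMetricFetcher {
    void update();

    int updateCount();
}

/** Null object: accepts update() calls and ignores them. */
enum VoidToyFetcher implements ToyMetricFetcher {
    INSTANCE;

    @Override
    public void update() {}

    @Override
    public int updateCount() {
        return 0;
    }
}

/** Stand-in for a real fetcher; it only counts how often it was asked to update. */
class CountingToyFetcher implements ToyMetricFetcher {
    private int updates;

    @Override
    public void update() {
        updates++;
    }

    @Override
    public int updateCount() {
        return updates;
    }
}
```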

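Step 4. Create and start the WebMonitorEndpoint instance. In Standalone mode this is a DispatcherRestEndpoint, which starts a Netty server internally and registers a set of Handlers

 /*
 TODO Create the WebMonitorEndpoint instance; in Standalone mode this is a DispatcherRestEndpoint,
  which starts a Netty server internally and registers a set of Handlers
  */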
 webMonitorEndpoint =
         restEndpointFactory.createRestEndpoint(
                 configuration,
                 dispatcherGatewayRetriever,
                 resourceManagerGatewayRetriever,
                 blobServer,
                 executor,
                 metricFetcher,
                 highAvailabilityServices.getClusterRestEndpointLeaderElectionService(),
                 fatalErrorHandler);
 // TODO Start the WebMonitorEndpoint
 log.debug("Starting Dispatcher REST endpoint.");
 webMonitorEndpoint.start();

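Step 5. Create the ResourceManager object. Three points matter here:

1. ResourceManager is an RpcEndpoint (an Actor). Once the object has been built and is started, its onStart method is triggered (the counterpart of the Actor's preStart lifecycle method)

2. ResourceManager is also a LeaderContender, so it takes part in leader election and runs the callback for the election result

3. ResourceManagerService has two heartbeat services and two timer services:

a. Two heartbeat services: heartbeats between the worker nodes and the master node, and heartbeats between each job's master (JobMaster) and the master node

b. Two timer services: the TaskManager timeout check and the slot request timeout check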

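/*
TODO Create the ResourceManager instance
 Three points:
 1. ResourceManager is an RpcEndpoint (an Actor); once built and started, its onStart method is triggered (the counterpart of the Actor's preStart lifecycle method)
 2. ResourceManager is also a LeaderContender, so it takes part in leader election and runs the callback for the election result
 3. ResourceManagerService has two heartbeat services and two timer services:
        Two heartbeat services:
            between the worker nodes and the master node
            between each job's master (JobMaster) and the master node
        Two timer services:
            TaskManager timeout check
            slot request timeout check
 */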
resourceManager =
        resourceManagerFactory.createResourceManager(
                configuration,
                ResourceID.generate(),
                rpcService,
                highAvailabilityServices,
                heartbeatServices,
                fatalErrorHandler,
                new ClusterInformation(hostname, blobServer.getPort()),
                webMonitorEndpoint.getRestBaseUrl(),
                metricRegistry,
                hostname,
                ioExecutor);
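The heartbeat services mentioned for the ResourceManager boil down to "remember when each peer was last heard from and flag the ones that have been silent too long". The sketch below shows only that bookkeeping, under stated assumptions: ToyHeartbeatMonitor is a hypothetical class, timestamps are passed in explicitly instead of read from a clock, and nothing here reflects the real HeartbeatServices API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical demo of heartbeat timeout bookkeeping; not Flink code. */
class HeartbeatDemo {
    public static void main(String[] args) {
        ToyHeartbeatMonitor monitor = new ToyHeartbeatMonitor(50);
        monitor.receiveHeartbeat("taskmanager-1", 0);
        monitor.receiveHeartbeat("taskmanager-2", 40);
        System.out.println(monitor.timedOutTargets(60)); // prints [taskmanager-1]
    }
}

class ToyHeartbeatMonitor {
    private final long timeoutMs;
    private final Map<String, Long> lastSeen = new HashMap<>();

    ToyHeartbeatMonitor(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Records that a heartbeat from the given target arrived at {@code nowMs}. */
    void receiveHeartbeat(String targetId, long nowMs) {
        lastSeen.put(targetId, nowMs);
    }

    /** Returns, in sorted order, every target whose last heartbeat is older than the timeout. */
    List<String> timedOutTargets(long nowMs) {
        List<String> timedOut = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (nowMs - entry.getValue() > timeoutMs) {
                timedOut.add(entry.getKey());
            }
        }
        Collections.sort(timedOut);
        return timedOut;
    }
}
```

In this picture the ResourceManager would run one such monitor for TaskManagers and one for JobMasters, which corresponds to the two heartbeat services listed above.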

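Step 6. Build a DispatcherRunner. Note that this is not the Dispatcher itself; the Dispatcher is built and started inside the DispatcherRunner

/*
TODO Inside this call the Dispatcher component is created and started via start()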
 */
dispatcherRunner =
        dispatcherRunnerFactory.createDispatcherRunner(
                highAvailabilityServices.getDispatcherLeaderElectionService(),
                fatalErrorHandler,
                new HaServicesJobGraphStoreFactory(highAvailabilityServices),
                ioExecutor,
                rpcService,
                partialDispatcherServices);
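Why a runner instead of creating the Dispatcher directly? The call above passes in a leader election service, which suggests that the runner creates the Dispatcher only once leadership is granted. The sketch below illustrates that pattern with hypothetical types (ToyLeaderContender, ToyDispatcherRunner) and a string standing in for the Dispatcher; it is an assumption-laden illustration, not the DefaultDispatcherRunner implementation.

```java
import java.util.UUID;

/** Hypothetical demo: a runner that owns a dispatcher only while it is the leader. */
class DispatcherRunnerDemo {
    public static void main(String[] args) {
        ToyDispatcherRunner runner = new ToyDispatcherRunner();
        System.out.println(runner.hasDispatcher()); // prints false
        runner.grantLeadership(UUID.randomUUID());
        System.out.println(runner.hasDispatcher()); // prints true
        runner.revokeLeadership();
        System.out.println(runner.hasDispatcher()); // prints false
    }
}

/** Callbacks a leader election service would invoke on a candidate. */
interface ToyLeaderContender {
    void grantLeadership(UUID leaderSessionId);

    void revokeLeadership();
}

class ToyDispatcherRunner implements ToyLeaderContender {
    private String dispatcher; // null while this runner is not the leader
    private int dispatchersCreated;

    @Override
    public void grantLeadership(UUID leaderSessionId) {
        // A fresh dispatcher is created for every leadership session.
        dispatcher = "Dispatcher@" + leaderSessionId;
        dispatchersCreated++;
    }

    @Override
    public void revokeLeadership() {
        dispatcher = null; // stop serving as soon as leadership is lost
    }

    boolean hasDispatcher() {
        return dispatcher != null;
    }

    int dispatchersCreated() {
        return dispatchersCreated;
    }
}
```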

Step 7. Start the ResourceManager

// TODO Start the ResourceManager
log.debug("Starting ResourceManager.");
resourceManager.start();

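At this point the master node (the logical JobManager) has finished starting.

You may now be wondering how exactly the ResourceManager, WebMonitorEndpoint, and Dispatcher are started. Because so much code is involved, I have split that into three chapters, one per core component. The next chapter starts with how the WebMonitorEndpoint is started.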