Flink 1.13 Source Code Analysis: The Flink Job Submission Flow (Part 2)

EdwardsWang丶

Last modified: 2023-02-21 17:58:25

First published: 2022-09-07 18:25:43

Article link: https://blog.csdn.net/edwardwong_/article/details/126746960

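Related articles:

Flink 1.13 Source Code Analysis: Table of Contents

Flink 1.13 Source Code Analysis Prelude: The Akka Communication Model

Flink 1.13 Source Code Analysis: JobManager Startup Flow, WebMonitorEndpoint Startup

Flink 1.13 Source Code Analysis: The Flink Job Submission Flow (Part 1)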

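Preface

1. JobSubmitHandler parses the JobGraph and hands it to the Dispatcher

2. The Dispatcher receives the JobGraph, then initializes and starts the JobMaster

2.1 Initializing the base services the JobMaster needs

2.2 The JobMaster leader election flow

2.3 Initializing and starting the JobMaster

Summary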

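Preface

In the previous chapter we saw that env.execute builds a StreamGraph from the Transformations collection we assembled, converts the StreamGraph into a JobGraph, and persists the JobGraph to a file. The JobGraph file, the dependency JARs, and some other configuration are then packed into a RequestBody, which the Netty client built inside RestClient sends to the Netty server inside the JobManager's WebMonitorEndpoint. The Netty server resolves the URL and hands the request to JobSubmitHandler.

In this chapter we analyze how the JobManager processes the HttpRequest it receives from RestClient.

1. JobSubmitHandler parses the JobGraph and hands it to the Dispatcher

The JobGraph built by the client, together with the resources it needs, is sent to WebMonitorEndpoint. WebMonitorEndpoint contains a Router that resolves the URL and dispatches the request to the matching handler, whose handleRequest method is then invoked. Let's go straight to JobSubmitHandler's handleRequest method: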

    /*
    TODO Deserialize the JobGraph from the uploaded file on disk and hand it over to the Dispatcher
     */
    @Override
    protected CompletableFuture<JobSubmitResponseBody> handleRequest(
            @Nonnull HandlerRequest<JobSubmitRequestBody, EmptyMessageParameters> request,
            @Nonnull DispatcherGateway gateway)
            throws RestHandlerException {
        // TODO Get the uploaded files from the request, including the serialized JobGraph file (nameToFile)
        final Collection<File> uploadedFiles = request.getUploadedFiles();
        final Map<String, Path> nameToFile =
                uploadedFiles.stream()
                        .collect(Collectors.toMap(File::getName, Path::fromLocalFile));

        if (uploadedFiles.size() != nameToFile.size()) {
            throw new RestHandlerException(
                    String.format(
                            "The number of uploaded files was %s than the expected count. Expected: %s Actual %s",
                            uploadedFiles.size() < nameToFile.size() ? "lower" : "higher",
                            nameToFile.size(),
                            uploadedFiles.size()),
                    HttpResponseStatus.BAD_REQUEST);
        }

        // TODO Get the request body
        final JobSubmitRequestBody requestBody = request.getRequestBody();

        if (requestBody.jobGraphFileName == null) {
            throw new RestHandlerException(
                    String.format(
                            "The %s field must not be omitted or be null.",
                            JobSubmitRequestBody.FIELD_NAME_JOB_GRAPH),
                    HttpResponseStatus.BAD_REQUEST);
        }

        // TODO Deserialize the JobGraph
        // TODO This shows that what the server receives from the client is really just a JobGraph
        CompletableFuture<JobGraph> jobGraphFuture = loadJobGraph(requestBody, nameToFile);

        // TODO Get the job's own JAR files
        Collection<Path> jarFiles = getJarFilesToUpload(requestBody.jarFileNames, nameToFile);

        // TODO Get the job's dependency artifacts
        Collection<Tuple2<String, Path>> artifacts =
                getArtifactFilesToUpload(requestBody.artifactFileNames, nameToFile);

        // TODO Upload the JobGraph's files (program JAR + dependency JARs) to the BlobServer
        CompletableFuture<JobGraph> finalizedJobGraphFuture =
                uploadJobGraphFiles(gateway, jobGraphFuture, jarFiles, artifacts, configuration);

        // TODO Hand over to the Dispatcher
        CompletableFuture<Acknowledge> jobSubmissionFuture =
                finalizedJobGraphFuture.thenCompose(
                        // TODO JobSubmitHandler hands the job over to the Dispatcher for processing
                        // TODO The gateway here is the Dispatcher's proxy object
                        jobGraph -> gateway.submitJob(jobGraph, timeout));

        return jobSubmissionFuture.thenCombine(
                jobGraphFuture,
                (ack, jobGraph) -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID()));
    }

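This method does the following:

1. Gets the uploaded files from the request and builds nameToFile, which includes the serialized JobGraph file.

2. Gets the request body from the request.

3. Deserializes the JobGraph from the file named in the request body.

4. Resolves the job's own JAR files named in the request body.

5. Resolves the job's dependency artifacts named in the request body.

6. Uploads the job's own JARs and its dependency JARs, as referenced by the JobGraph, to the BlobServer.

7. Hands the JobGraph to the Dispatcher.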
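
The way these steps are chained with futures can be sketched in isolation. The following is a minimal sketch, not Flink code: the class and method names are hypothetical stand-ins, strings replace the JobGraph and the acknowledgement, and the port value 6124 is an arbitrary placeholder. It only illustrates the thenCombine and thenCompose composition that handleRequest relies on.

```java
import java.util.concurrent.CompletableFuture;

/** Minimal sketch of the future composition in a submit handler. All names are hypothetical. */
final class SubmitPipelineSketch {

    /** Stand-in for deserializing the job graph from an uploaded file. */
    static CompletableFuture<String> loadJobGraph(String jobId) {
        return CompletableFuture.supplyAsync(() -> jobId);
    }

    /** Stand-in for asking the dispatcher for the blob server port (arbitrary value). */
    static CompletableFuture<Integer> blobServerPort() {
        return CompletableFuture.completedFuture(6124);
    }

    /** Stand-in for the RPC call that hands the job to the dispatcher. */
    static CompletableFuture<String> submitToDispatcher(String jobGraph) {
        return CompletableFuture.supplyAsync(() -> "ACK");
    }

    static CompletableFuture<String> handleRequest(String jobId) {
        CompletableFuture<String> jobGraphFuture = loadJobGraph(jobId);

        // thenCombine: wait for two independent futures; the upload would happen here.
        CompletableFuture<String> finalizedJobGraphFuture =
                jobGraphFuture.thenCombine(blobServerPort(), (graph, port) -> graph);

        // thenCompose: chain a step that itself returns a future.
        CompletableFuture<String> ackFuture =
                finalizedJobGraphFuture.thenCompose(SubmitPipelineSketch::submitToDispatcher);

        // Build the response only after both the ack and the graph are available.
        return ackFuture.thenCombine(jobGraphFuture, (ack, graph) -> "/jobs/" + graph);
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("job-42").join()); // prints "/jobs/job-42"
    }
}
```

Nothing in handleRequest blocks: each stage runs when its inputs complete, which is why the handler can return a future to the REST layer immediately.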

Let's first look at how the JobGraph is parsed. Open the loadJobGraph method:

    private CompletableFuture<JobGraph> loadJobGraph(
            JobSubmitRequestBody requestBody, Map<String, Path> nameToFile)
            throws MissingFileException {
        final Path jobGraphFile =
                getPathAndAssertUpload(
                        requestBody.jobGraphFileName, FILE_TYPE_JOB_GRAPH, nameToFile);

        // TODO Deserialize the JobGraph from the file
        return CompletableFuture.supplyAsync(
                () -> {
                    JobGraph jobGraph;
                    try (ObjectInputStream objectIn =
                            new ObjectInputStream(
                                    jobGraphFile.getFileSystem().open(jobGraphFile))) {
                        jobGraph = (JobGraph) objectIn.readObject();
                    } catch (Exception e) {
                        throw new CompletionException(
                                new RestHandlerException(
                                        "Failed to deserialize JobGraph.",
                                        HttpResponseStatus.BAD_REQUEST,
                                        e));
                    }
                    return jobGraph;
                },
                executor);
    }

As we can see, the JobGraph file is opened from the file system and deserialized into a JobGraph.
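
To make the mechanism concrete, here is a small self-contained sketch of the same round trip using plain Java serialization. ToyGraph is a hypothetical stand-in for JobGraph, and the file handling uses java.nio rather than Flink's FileSystem abstraction, so treat it as an illustration of the technique only.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class GraphFileRoundTrip {

    /** Hypothetical stand-in for JobGraph; it only needs to be Serializable. */
    static final class ToyGraph implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int vertices;

        ToyGraph(String name, int vertices) {
            this.name = name;
            this.vertices = vertices;
        }
    }

    /** Client side: write the graph to a temporary file with Java serialization. */
    static Path write(ToyGraph graph) {
        try {
            Path file = Files.createTempFile("toy-graph-", ".bin");
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(graph);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Server side: read the graph back asynchronously and wrap any failure. */
    static CompletableFuture<ToyGraph> load(Path file) {
        return CompletableFuture.supplyAsync(() -> {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
                return (ToyGraph) in.readObject();
            } catch (IOException | ClassNotFoundException e) {
                throw new CompletionException("Failed to deserialize graph.", e);
            }
        });
    }

    public static void main(String[] args) {
        Path file = write(new ToyGraph("word-count", 3));
        ToyGraph restored = load(file).join();
        System.out.println(restored.name + " has " + restored.vertices + " vertices");
        file.toFile().delete();
    }
}
```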

Next, let's look at how the JobGraph's program JAR and dependency JARs are uploaded to the BlobServer. Open the uploadJobGraphFiles method:

    private CompletableFuture<JobGraph> uploadJobGraphFiles(
            DispatcherGateway gateway,
            CompletableFuture<JobGraph> jobGraphFuture,
            Collection<Path> jarFiles,
            Collection<Tuple2<String, Path>> artifacts,
            Configuration configuration) {
        CompletableFuture<Integer> blobServerPortFuture = gateway.getBlobServerPort(timeout);

        return jobGraphFuture.thenCombine(
                blobServerPortFuture,
                (JobGraph jobGraph, Integer blobServerPort) -> {
                    final InetSocketAddress address =
                            new InetSocketAddress(gateway.getHostname(), blobServerPort);
                    try {
                        // TODO Blocking I/O (BIO) communication: BlobClient => BlobServer
                        ClientUtils.uploadJobGraphFiles(
                                jobGraph,
                                jarFiles,
                                artifacts,
                                () -> new BlobClient(address, configuration));
                    } catch (FlinkException e) {
                        throw new CompletionException(
                                new RestHandlerException(
                                        "Could not upload job files.",
                                        HttpResponseStatus.INTERNAL_SERVER_ERROR,
                                        e));
                    }
                    return jobGraph;
                });
    }

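The resource files that belong to the JobGraph are uploaded here through a BlobClient to the BlobServer. As covered in the JobManager startup article, the BlobServer also runs a scheduled task every hour that cleans up resource files that are no longer needed.

2. The Dispatcher receives the JobGraph, then initializes and starts the JobMaster

The JobGraph is handed to the Dispatcher by calling a method on the Dispatcher's proxy object. Step into gateway.submitJob and choose the Dispatcher implementation: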

@Override
public CompletableFuture<Acknowledge> submitJob(JobGraph jobGraph, Time timeout) {
    log.info("Received JobGraph submission {} ({}).", jobGraph.getJobID(), jobGraph.getName());
    try {
        // TODO Check for a duplicate job ID
        if (isDuplicateJob(jobGraph.getJobID())) {
            return FutureUtils.completedExceptionally(
                    new DuplicateJobSubmissionException(jobGraph.getJobID()));
        } else if (isPartialResourceConfigured(jobGraph)) {
            return FutureUtils.completedExceptionally(
                    new JobSubmissionException(
                            jobGraph.getJobID(),
                            "Currently jobs is not supported if parts of the vertices have "
                                    + "resources configured. The limitation will be removed in future versions."));
        } else {
            // TODO Submit the job; by now the JARs and files the JobGraph needs have been uploaded
            // TODO The JobGraph carried here is used shortly, when the JobMaster starts, to build the ExecutionGraph
            return internalSubmitJob(jobGraph);
        }
    } catch (FlinkException e) {
        return FutureUtils.completedExceptionally(e);
    }
}

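By the time execution reaches this point, the JARs and other resource files needed by the JobGraph have been uploaded to the BlobServer. Let's continue into internalSubmitJob(jobGraph):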

    private CompletableFuture<Acknowledge> internalSubmitJob(JobGraph jobGraph) {
        log.info("Submitting job {} ({}).", jobGraph.getJobID(), jobGraph.getName());

        final CompletableFuture<Acknowledge> persistAndRunFuture =
                // TODO Persist first, then run (bring up the JobMaster): this::persistAndRunJob
                waitForTerminatingJob(jobGraph.getJobID(), jobGraph, this::persistAndRunJob)
                        .thenApply(ignored -> Acknowledge.get());

        return persistAndRunFuture.handleAsync(
                (acknowledge, throwable) -> {
                    if (throwable != null) {
                        cleanUpJobData(jobGraph.getJobID(), true);

                        ClusterEntryPointExceptionUtils.tryEnrichClusterEntryPointError(throwable);
                        final Throwable strippedThrowable =
                                ExceptionUtils.stripCompletionException(throwable);
                        log.error(
                                "Failed to submit job {}.", jobGraph.getJobID(), strippedThrowable);
                        throw new CompletionException(
                                new JobSubmissionException(
                                        jobGraph.getJobID(),
                                        "Failed to submit job.",
                                        strippedThrowable));
                    } else {
                        return acknowledge;
                    }
                },
                ioExecutor);
    }

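Continue into the persistAndRunJob method:

    private void persistAndRunJob(JobGraph jobGraph) throws Exception {
        // TODO The server saves the JobGraph: it is persisted to a FileSystem (for example HDFS), which returns a state handle, and the state handle is stored in ZooKeeper
        // TODO As covered in the master node startup article, the Dispatcher starts a JobGraphStore service and first recovers any JobGraphs in it that have not finished
        // TODO JobGraphWriter = DefaultJobGraphStore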
        jobGraphWriter.putJobGraph(jobGraph);
        // TODO
        runJob(jobGraph, ExecutionType.SUBMISSION);
    }

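As covered in the master node startup article, the Dispatcher starts a JobGraphStore service, and if it still contains JobGraphs that have not finished, it recovers them first. The JobGraphWriter here is that JobGraphStore. Step into jobGraphWriter.putJobGraph(jobGraph) and choose the DefaultJobGraphStore implementation: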

    @Override
    public void putJobGraph(JobGraph jobGraph) throws Exception {
        checkNotNull(jobGraph, "Job graph");

        final JobID jobID = jobGraph.getJobID();
        final String name = jobGraphStoreUtil.jobIDToName(jobID);

        LOG.debug("Adding job graph {} to {}.", jobID, jobGraphStateHandleStore);

        boolean success = false;

        while (!success) {
            synchronized (lock) {
                verifyIsRunning();

                final R currentVersion = jobGraphStateHandleStore.exists(name);

                if (!currentVersion.isExisting()) {
                    try {
                        // TODO
                        jobGraphStateHandleStore.addAndLock(name, jobGraph);

                        addedJobGraphs.add(jobID);

                        success = true;
                    } catch (StateHandleStore.AlreadyExistException ignored) {
                        LOG.warn("{} already exists in {}.", jobGraph, jobGraphStateHandleStore);
                    }
                } else if (addedJobGraphs.contains(jobID)) {
                    try {
                        jobGraphStateHandleStore.replace(name, currentVersion, jobGraph);
                        LOG.info("Updated {} in {}.", jobGraph, getClass().getSimpleName());

                        success = true;
                    } catch (StateHandleStore.NotExistException ignored) {
                        LOG.warn("{} does not exists in {}.", jobGraph, jobGraphStateHandleStore);
                    }
                } else {
                    throw new IllegalStateException(
                            "Trying to update a graph you didn't "
                                    + "#getAllSubmittedJobGraphs() or #putJobGraph() yourself before.");
                }
            }
        }

        LOG.info("Added {} to {}.", jobGraph, jobGraphStateHandleStore);
    }

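This code derives the JobID and the corresponding node name, verifies that the store itself is running (verifyIsRunning), and then either adds the JobGraph if no entry exists yet or replaces the entry if this store added it earlier. Step into jobGraphStateHandleStore.addAndLock and choose the ZooKeeper implementation: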

    @Override
    public RetrievableStateHandle<T> addAndLock(String pathInZooKeeper, T state)
            throws PossibleInconsistentStateException, Exception {
        checkNotNull(pathInZooKeeper, "Path in ZooKeeper");
        checkNotNull(state, "State");
        final String path = normalizePath(pathInZooKeeper);
        if (exists(path).isExisting()) {
            throw new AlreadyExistException(
                    String.format("ZooKeeper node %s already exists.", path));
        }
        // TODO Save to the file system and return a state handle
        final RetrievableStateHandle<T> storeHandle = storage.store(state);
        // TODO Serialize the state handle into a byte array first
        final byte[] serializedStoreHandle = serializeOrDiscard(storeHandle);
        try {
            // TODO Store it in ZooKeeper
            writeStoreHandleTransactionally(path, serializedStoreHandle);
            return storeHandle;
        } catch (KeeperException.NodeExistsException e) {
            // Transactions are not idempotent in the curator version we're currently using, so it
            // is actually possible that we've re-tried a transaction that has already succeeded.
            // We've ensured that the node hasn't been present prior executing the transaction, so
            // we can assume that this is a result of the retry mechanism.
            return storeHandle;
        } catch (Exception e) {
            if (indicatesPossiblyInconsistentState(e)) {
                throw new PossibleInconsistentStateException(e);
            }
            // In case of any other failure, discard the state and rethrow the exception.
            storeHandle.discardState();
            throw e;
        }
    }

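As we can see, the JobGraph is first persisted to an external storage system such as HDFS, which yields a handle, and the handle is then stored in ZooKeeper. Storing only the handle in ZooKeeper, rather than the JobGraph itself, is a performance and efficiency consideration.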
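
The indirection can be illustrated with a small sketch. This is not the ZooKeeperStateHandleStore implementation: HandleStoreSketch and its methods are hypothetical, a ConcurrentHashMap stands in for ZooKeeper, a temporary file stands in for HDFS, and the race handling is simplified. The point is only that the coordination store holds a few bytes that say where the large payload lives.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: large state goes to file storage, only a small handle is coordinated. */
final class HandleStoreSketch {

    /** Stand-in for ZooKeeper: node name -> serialized handle (here, just a file path). */
    private final Map<String, byte[]> coordinationStore = new ConcurrentHashMap<>();

    /** Writes the payload to a file, then records only the file's location under the name. */
    void add(String name, byte[] payload) {
        if (coordinationStore.containsKey(name)) {
            throw new IllegalStateException("Node " + name + " already exists.");
        }
        final Path file;
        try {
            file = Files.createTempFile("state-", ".bin");
            Files.write(file, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] handle = file.toString().getBytes(StandardCharsets.UTF_8);
        if (coordinationStore.putIfAbsent(name, handle) != null) {
            // Simplification: another writer won the race, so discard our copy.
            file.toFile().delete();
            throw new IllegalStateException("Node " + name + " already exists.");
        }
    }

    /** Size in bytes of what the coordination store holds for this name. */
    int handleSize(String name) {
        return coordinationStore.get(name).length;
    }

    /** Follows the handle back to file storage and loads the payload. */
    byte[] get(String name) {
        String location = new String(coordinationStore.get(name), StandardCharsets.UTF_8);
        try {
            return Files.readAllBytes(Paths.get(location));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HandleStoreSketch store = new HandleStoreSketch();
        byte[] payload = new byte[1024 * 1024]; // pretend this is a serialized job graph
        store.add("job-1", payload);
        System.out.println("payload bytes: " + store.get("job-1").length);
        System.out.println("handle bytes:  " + store.handleSize("job-1"));
    }
}
```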

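Once the JobGraph has been persisted, the job starts running. Let's return to this code:

    private void persistAndRunJob(JobGraph jobGraph) throws Exception {
        // TODO The server saves the JobGraph: it is persisted to a FileSystem (for example HDFS), which returns a state handle, and the state handle is stored in ZooKeeper
        // TODO As covered in the master node startup article, the Dispatcher starts a JobGraphStore service and first recovers any JobGraphs in it that have not finished
        // TODO JobGraphWriter = DefaultJobGraphStore
        jobGraphWriter.putJobGraph(jobGraph);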
        // TODO
        runJob(jobGraph, ExecutionType.SUBMISSION);
    }

Step into the runJob method:

    private void runJob(JobGraph jobGraph, ExecutionType executionType) throws Exception {
        Preconditions.checkState(!runningJobs.containsKey(jobGraph.getJobID()));
        long initializationTimestamp = System.currentTimeMillis();
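        /*
        TODO Create the JobManagerRunner. This is a launcher that internally initializes a DefaultJobMasterServiceProcessFactory object.
         After the JobMaster leader election completes, the DefaultJobMasterServiceProcessFactory does two important things:
         1. It creates the JobMaster instance
         2. While creating the JobMaster, it also turns the JobGraph into an ExecutionGraph

         TODO The two master/worker architectures in a Flink cluster:
          1. Resource management: ResourceManager + TaskExecutor
          2. Task execution: JobMaster + StreamTask
         */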
        JobManagerRunner jobManagerRunner =
                createJobManagerRunner(jobGraph, initializationTimestamp);

        // TODO Register the runner in the runningJobs map
        runningJobs.put(jobGraph.getJobID(), jobManagerRunner);

        final JobID jobId = jobGraph.getJobID();

        final CompletableFuture<CleanupJobState> cleanupJobStateFuture =
                jobManagerRunner
                        .getResultFuture()
                        .handleAsync(
                                (jobManagerRunnerResult, throwable) -> {
                                    Preconditions.checkState(
                                            runningJobs.get(jobId) == jobManagerRunner,
                                            "The job entry in runningJobs must be bound to the lifetime of the JobManagerRunner.");

                                    if (jobManagerRunnerResult != null) {
                                        return handleJobManagerRunnerResult(
                                                jobManagerRunnerResult, executionType);
                                    } else {
                                        return jobManagerRunnerFailed(jobId, throwable);
                                    }
                                },
                                getMainThreadExecutor());

        final CompletableFuture<Void> jobTerminationFuture =
                cleanupJobStateFuture
                        .thenApply(cleanupJobState -> removeJob(jobId, cleanupJobState))
                        .thenCompose(Function.identity());

        FutureUtils.assertNoException(jobTerminationFuture);
        registerJobManagerRunnerTerminationFuture(jobId, jobTerminationFuture);
    }

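This code first builds a JobManagerRunner, which acts as a launcher. Note that the JobManager in this name is not the master node that we usually call the JobManager. Step into the createJobManagerRunner method: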

    JobManagerRunner createJobManagerRunner(JobGraph jobGraph, long initializationTimestamp)
            throws Exception {
        final RpcService rpcService = getRpcService();

        // TODO Build the JobManagerRunner, which wraps a DefaultJobMasterServiceProcessFactory;
        //  once the leader election completes, that object builds and starts the JobMaster
        JobManagerRunner runner =
                jobManagerRunnerFactory.createJobManagerRunner(
                        jobGraph,
                        configuration,
                        rpcService,
                        highAvailabilityServices,
                        heartbeatServices,
                        jobManagerSharedServices,
                        new DefaultJobManagerJobMetricGroupFactory(jobManagerMetricGroup),
                        fatalErrorHandler,
                        initializationTimestamp);
        // TODO Start the JobMaster election; once it succeeds, the JobMaster is created and started from the isLeader callback of ZooKeeperLeaderElectionDriver
        runner.start();
        return runner;
    }

As we can see, a JobManagerRunner is built through a factory method and then started.

2.1 Initializing the base services the JobMaster needs

Step into jobManagerRunnerFactory.createJobManagerRunner:

    @Override
    public JobManagerRunner createJobManagerRunner(
            JobGraph jobGraph,
            Configuration configuration,
            RpcService rpcService,
            HighAvailabilityServices highAvailabilityServices,
            HeartbeatServices heartbeatServices,
            JobManagerSharedServices jobManagerServices,
            JobManagerJobMetricGroupFactory jobManagerJobMetricGroupFactory,
            FatalErrorHandler fatalErrorHandler,
            long initializationTimestamp)
            throws Exception {

        checkArgument(jobGraph.getNumberOfVertices() > 0, "The given job is empty");

        final JobMasterConfiguration jobMasterConfiguration =
                JobMasterConfiguration.fromConfiguration(configuration);

        final RunningJobsRegistry runningJobsRegistry =
                highAvailabilityServices.getRunningJobsRegistry();
        // TODO Get the election service in preparation for the JobMaster leader election
        final LeaderElectionService jobManagerLeaderElectionService =
                highAvailabilityServices.getJobManagerLeaderElectionService(jobGraph.getJobID());

        final SlotPoolServiceSchedulerFactory slotPoolServiceSchedulerFactory =
                DefaultSlotPoolServiceSchedulerFactory.fromConfiguration(
                        configuration, jobGraph.getJobType());

        if (jobMasterConfiguration.getConfiguration().get(JobManagerOptions.SCHEDULER_MODE)
                == SchedulerExecutionMode.REACTIVE) {
            Preconditions.checkState(
                    slotPoolServiceSchedulerFactory.getSchedulerType()
                            == JobManagerOptions.SchedulerType.Adaptive,
                    "Adaptive Scheduler is required for reactive mode");
        }

        final ShuffleMaster<?> shuffleMaster =
                ShuffleServiceLoader.loadShuffleServiceFactory(configuration)
                        .createShuffleMaster(configuration);

        final LibraryCacheManager.ClassLoaderLease classLoaderLease =
                jobManagerServices
                        .getLibraryCacheManager()
                        .registerClassLoaderLease(jobGraph.getJobID());

        final ClassLoader userCodeClassLoader =
                classLoaderLease
                        .getOrResolveClassLoader(
                                jobGraph.getUserJarBlobKeys(), jobGraph.getClasspaths())
                        .asClassLoader();

        // TODO Build the DefaultJobMasterServiceFactory, which wraps the base services the JobMaster needs to start
        final DefaultJobMasterServiceFactory jobMasterServiceFactory =
                new DefaultJobMasterServiceFactory(
                        jobManagerServices.getScheduledExecutorService(),
                        rpcService,
                        jobMasterConfiguration,
                        jobGraph,
                        highAvailabilityServices,
                        slotPoolServiceSchedulerFactory,
                        jobManagerServices,
                        heartbeatServices,
                        jobManagerJobMetricGroupFactory,
                        fatalErrorHandler,
                        userCodeClassLoader,
                        shuffleMaster,
                        initializationTimestamp);

        final DefaultJobMasterServiceProcessFactory jobMasterServiceProcessFactory =
                new DefaultJobMasterServiceProcessFactory(
                        jobGraph.getJobID(),
                        jobGraph.getName(),
                        jobGraph.getCheckpointingSettings(),
                        initializationTimestamp,
                        jobMasterServiceFactory);

        return new JobMasterServiceLeadershipRunner(
                jobMasterServiceProcessFactory,
                jobManagerLeaderElectionService,
                runningJobsRegistry,
                classLoaderLease,
                fatalErrorHandler);
    }

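This code is fairly long, but its structure is clear. It initializes the base services that the JobMaster needs, such as the JobMaster leader election service jobManagerLeaderElectionService, and it initializes an important component, DefaultJobMasterServiceProcessFactory, which is where the JobMaster is initialized and started.

Next, let's go back to the previous code and look at how the runner starts.

2.2 The JobMaster leader election flow

Step into the runner.start() method: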

@Override
    public void start() throws Exception {
        LOG.debug("Start leadership runner for job {}.", getJobID());
        // TODO
        leaderElectionService.start(this);
    }

Then step into leaderElectionService.start and choose the DefaultLeaderElectionService implementation:

    @Override
    public final void start(LeaderContender contender) throws Exception {
        checkNotNull(contender, "Contender must not be null.");
        Preconditions.checkState(leaderContender == null, "Contender was already set.");

        synchronized (lock) {
            /*
             TODO When called from WebMonitorEndpoint, the contender is DispatcherRestEndpoint
              When called from ResourceManager, the contender is ResourceManager
              When called from DispatcherRunner, the contender is DispatcherRunner
              During the JobMaster election, the contender is JobMasterServiceLeadershipRunner
             */
            leaderContender = contender;

            // TODO Create the election driver object, leaderElectionDriver
            leaderElectionDriver =
                    leaderElectionDriverFactory.createLeaderElectionDriver(
                            this,
                            new LeaderElectionFatalErrorHandler(),
                            leaderContender.getDescription());
            LOG.info("Starting DefaultLeaderElectionService with {}.", leaderElectionDriver);

            running = true;
        }
    }

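We are back at familiar code: when we analyzed the JobManager startup flow, the elections of all three core JobManager components went through this method. Because this is now the JobMaster election, the contender here is JobMasterServiceLeadershipRunner. Continue into leaderElectionDriverFactory.createLeaderElectionDriver and choose the ZooKeeper implementation: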

@Override
    public ZooKeeperLeaderElectionDriver createLeaderElectionDriver(
            LeaderElectionEventHandler leaderEventHandler,
            FatalErrorHandler fatalErrorHandler,
            String leaderContenderDescription)
            throws Exception {
        // TODO
        return new ZooKeeperLeaderElectionDriver(
                client,
                latchPath,
                leaderPath,
                leaderEventHandler,
                fatalErrorHandler,
                leaderContenderDescription);
    }

Then step into the ZooKeeperLeaderElectionDriver constructor:

    public ZooKeeperLeaderElectionDriver(
            CuratorFramework client,
            String latchPath,
            String leaderPath,
            LeaderElectionEventHandler leaderElectionEventHandler,
            FatalErrorHandler fatalErrorHandler,
            String leaderContenderDescription)
            throws Exception {
        this.client = checkNotNull(client);
        this.leaderPath = checkNotNull(leaderPath);
        this.leaderElectionEventHandler = checkNotNull(leaderElectionEventHandler);
        this.fatalErrorHandler = checkNotNull(fatalErrorHandler);
        this.leaderContenderDescription = checkNotNull(leaderContenderDescription);

        leaderLatch = new LeaderLatch(client, checkNotNull(latchPath));
        cache = new NodeCache(client, leaderPath);

        client.getUnhandledErrorListenable().addListener(this);

        running = true;

        // TODO Start the election
        leaderLatch.addListener(this);
        leaderLatch.start();

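        /*
        TODO Shortly after the election starts, a response arrives:
         1. If the election is won, this class's isLeader method is called back
         2. If the election is lost, this class's notLeader method is called back
         Each contender has its own election driver
         */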

        cache.getListenable().addListener(this);
        cache.start();

        client.getConnectionStateListenable().addListener(listener);
    }

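This is where the leader election starts. As covered in the JobManager startup article, once the election completes and this contender has won, the isLeader method of the current class is called back. Let's go straight to that method:

    /*
    Election won
     */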
    @Override
    public void isLeader() {
        // TODO
        leaderElectionEventHandler.onGrantLeadership();
    }

Then step into leaderElectionEventHandler.onGrantLeadership():

@Override
    @GuardedBy("lock")
    public void onGrantLeadership() {
        synchronized (lock) {
            if (running) {
                issuedLeaderSessionID = UUID.randomUUID();
                clearConfirmedLeaderInformation();

                if (LOG.isDebugEnabled()) {
                    LOG.debug(
                            "Grant leadership to contender {} with session ID {}.",
                            leaderContender.getDescription(),
                            issuedLeaderSessionID);
                }

                /*
                TODO There are 4 kinds of contender, so LeaderContender has 4 cases:
                 1.Dispatcher = DefaultDispatcherRunner
                 2.JobMaster = JobMasterServiceLeadershipRunner
                 3.ResourceManager = ResourceManager
                 4.WebMonitorEndpoint = WebMonitorEndpoint
                 */
                leaderContender.grantLeadership(issuedLeaderSessionID);
            } else {
                if (LOG.isDebugEnabled()) {
                    LOG.debug(
                            "Ignoring the grant leadership notification since the {} has "
                                    + "already been closed.",
                            leaderElectionDriver);
                }
            }
        }
    }
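
The callback chain traced so far (the driver notifies the election service, which issues a session ID and notifies the contender) can be reduced to a few lines. The sketch below uses hypothetical, simplified types rather than Flink's interfaces, and leaves out the ZooKeeper driver entirely; a real driver would be the caller of onGrantLeadership.

```java
import java.util.UUID;

final class LeaderCallbackSketch {

    /** Hypothetical, simplified version of a leader contender contract. */
    interface Contender {
        void grantLeadership(UUID leaderSessionId);

        String getDescription();
    }

    /** Simplified election service; an election driver would call onGrantLeadership. */
    static final class ElectionService {
        private final Object lock = new Object();
        private final Contender contender;
        private boolean running = true;
        private UUID issuedLeaderSessionId;

        ElectionService(Contender contender) {
            this.contender = contender;
        }

        void onGrantLeadership() {
            synchronized (lock) {
                if (!running) {
                    return; // ignore notifications that arrive after shutdown
                }
                // A fresh session ID identifies this leadership term.
                issuedLeaderSessionId = UUID.randomUUID();
                contender.grantLeadership(issuedLeaderSessionId);
            }
        }

        void stop() {
            synchronized (lock) {
                running = false;
            }
        }

        UUID getIssuedLeaderSessionId() {
            synchronized (lock) {
                return issuedLeaderSessionId;
            }
        }
    }

    /** Stand-in for a per-job runner that starts its work once it becomes leader. */
    static final class JobRunner implements Contender {
        private volatile UUID currentSession;

        @Override
        public void grantLeadership(UUID leaderSessionId) {
            currentSession = leaderSessionId;
        }

        @Override
        public String getDescription() {
            return "job-runner";
        }

        UUID getCurrentSession() {
            return currentSession;
        }
    }

    public static void main(String[] args) {
        JobRunner runner = new JobRunner();
        ElectionService service = new ElectionService(runner);
        service.onGrantLeadership();
        System.out.println(runner.getDescription() + " leads session " + runner.getCurrentSession());
    }
}
```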

Then step into leaderContender.grantLeadership and choose the JobMasterServiceLeadershipRunner implementation:

   @Override
    public void grantLeadership(UUID leaderSessionID) {
        // TODO Check that the runner is still in the running state
        runIfStateRunning(
                // TODO Create and start the JobMaster
                () -> startJobMasterServiceProcessAsync(leaderSessionID),
                "starting a new JobMasterServiceProcess");
    }

Next, step into the startJobMasterServiceProcessAsync method:

 @GuardedBy("lock")
    private void startJobMasterServiceProcessAsync(UUID leaderSessionId) {
        sequentialOperation =
                sequentialOperation.thenRun(
                        // TODO Verify the leader status
                        () ->
                                runIfValidLeader(
                                        leaderSessionId,
                                        ThrowingRunnable.unchecked(
                                                // TODO Create and start the JobMaster
                                                () ->
                                                        verifyJobSchedulingStatusAndCreateJobMasterServiceProcess(
                                                                leaderSessionId)),
                                        "verify job scheduling status and create JobMasterServiceProcess"));

        handleAsyncOperationError(sequentialOperation, "Could not start the job manager.");
    }
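
The pattern in the method above, chaining each action onto a single future and re-checking the leader session when the action finally runs, can be sketched on its own. The class below is a hypothetical simplification, not the runner's real code: a gate future is used only so that the demo can queue several actions before any of them executes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

final class SequentialLeaderOps {
    private final Object lock = new Object();
    private final List<String> log = new ArrayList<>();
    private CompletableFuture<Void> sequentialOperation;
    private UUID currentLeaderSession;

    /** The gate future lets the demo queue several operations before any of them runs. */
    SequentialLeaderOps(CompletableFuture<Void> gate) {
        this.sequentialOperation = gate;
    }

    void grantLeadership(UUID sessionId) {
        synchronized (lock) {
            currentLeaderSession = sessionId;
            // Each action is appended to the previous one, so actions never overlap.
            sequentialOperation =
                    sequentialOperation.thenRun(() -> runIfValidLeader(sessionId));
        }
    }

    void revokeLeadership() {
        synchronized (lock) {
            currentLeaderSession = null;
        }
    }

    /** Runs the action only if the session that queued it is still the current leader session. */
    private void runIfValidLeader(UUID sessionId) {
        synchronized (lock) {
            log.add(sessionId.equals(currentLeaderSession) ? "started" : "skipped");
        }
    }

    List<String> log() {
        synchronized (lock) {
            return new ArrayList<>(log);
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> gate = new CompletableFuture<>();
        SequentialLeaderOps runner = new SequentialLeaderOps(gate);
        runner.grantLeadership(UUID.randomUUID()); // first leadership term
        runner.revokeLeadership();
        runner.grantLeadership(UUID.randomUUID()); // second leadership term
        gate.complete(null); // queued actions now run in order
        System.out.println(runner.log()); // prints [skipped, started]
    }
}
```

The action queued during the first term is skipped because leadership changed before it ran, which is the situation the validity check guards against.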

As we can see, the leader status is verified here. Continue into verifyJobSchedulingStatusAndCreateJobMasterServiceProcess:

    @GuardedBy("lock")
    private void verifyJobSchedulingStatusAndCreateJobMasterServiceProcess(UUID leaderSessionId)
            throws FlinkException {
        final RunningJobsRegistry.JobSchedulingStatus jobSchedulingStatus =
                getJobSchedulingStatus();

        if (jobSchedulingStatus == RunningJobsRegistry.JobSchedulingStatus.DONE) {
            jobAlreadyDone();
        } else {
            // TODO Create and start the JobMaster
            createNewJobMasterServiceProcess(leaderSessionId);
        }
    }

The job's status is checked here to see whether the job has already finished. Then step into createNewJobMasterServiceProcess:

    @GuardedBy("lock")
    private void createNewJobMasterServiceProcess(UUID leaderSessionId) throws FlinkException {
        Preconditions.checkState(jobMasterServiceProcess.closeAsync().isDone());

        LOG.debug(
                "Create new JobMasterServiceProcess because we were granted leadership under {}.",
                leaderSessionId);

        try {
            // TODO Register the status, marking the current job as Running
            runningJobsRegistry.setJobRunning(getJobID());
        } catch (IOException e) {
            throw new FlinkException(
                    String.format(
                            "Failed to set the job %s to running in the running jobs registry.",
                            getJobID()),
                    e);
        }

        // TODO Create and start the JobMaster
        jobMasterServiceProcess = jobMasterServiceProcessFactory.create(leaderSessionId);

        forwardIfValidLeader(
                leaderSessionId,
                jobMasterServiceProcess.getJobMasterGatewayFuture(),
                jobMasterGatewayFuture,
                "JobMasterGatewayFuture from JobMasterServiceProcess");
        forwardResultFuture(leaderSessionId, jobMasterServiceProcess.getResultFuture());
        confirmLeadership(leaderSessionId, jobMasterServiceProcess.getLeaderAddressFuture());
    }

As we can see, the current job is first registered as Running. Then step into jobMasterServiceProcessFactory.create:

 @Override
    public JobMasterServiceProcess create(UUID leaderSessionId) {
        // TODO The JobMaster is built and started inside
        return new DefaultJobMasterServiceProcess(
                jobId,
                leaderSessionId,
                jobMasterServiceFactory,
                cause -> createArchivedExecutionGraph(JobStatus.FAILED, cause));
    }

Then step into the DefaultJobMasterServiceProcess constructor:

    public DefaultJobMasterServiceProcess(
            JobID jobId,
            UUID leaderSessionId,
            JobMasterServiceFactory jobMasterServiceFactory,
            Function<Throwable, ArchivedExecutionGraph> failedArchivedExecutionGraphFactory) {
        this.jobId = jobId;
        this.leaderSessionId = leaderSessionId;
        // TODO Build and start the JobMaster
        this.jobMasterServiceFuture =
                jobMasterServiceFactory.createJobMasterService(leaderSessionId, this);

        jobMasterServiceFuture.whenComplete(
                (jobMasterService, throwable) -> {
                    if (throwable != null) {
                        final JobInitializationException jobInitializationException =
                                new JobInitializationException(
                                        jobId, "Could not start the JobMaster.", throwable);

                        LOG.debug(
                                "Initialization of the JobMasterService for job {} under leader id {} failed.",
                                jobId,
                                leaderSessionId,
                                jobInitializationException);

                        resultFuture.complete(
                                JobManagerRunnerResult.forInitializationFailure(
                                        new ExecutionGraphInfo(
                                                failedArchivedExecutionGraphFactory.apply(
                                                        jobInitializationException)),
                                        jobInitializationException));
                    } else {
                        registerJobMasterServiceFutures(jobMasterService);
                    }
                });
    }

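Here the JobMaster is built and started asynchronously, and once the future completes the result is checked for an exception. Let's step into the jobMasterServiceFactory.createJobMasterService method: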

    @Override
    public CompletableFuture<JobMasterService> createJobMasterService(
            UUID leaderSessionId, OnCompletionActions onCompletionActions) {

        return CompletableFuture.supplyAsync(
                FunctionUtils.uncheckedSupplier(
                        // TODO Builds and starts the JobMaster internally
                        () -> internalCreateJobMasterService(leaderSessionId, onCompletionActions)),
                executor);
    }
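
The constructor and factory method above combine CompletableFuture.supplyAsync with whenComplete: creation runs on a separate executor, and a failure is turned into a regular result instead of being thrown at the caller. The following is only a minimal sketch of that pattern, not Flink code. The class AsyncServiceStartSketch, the method startService, and the result strings are hypothetical names chosen for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical stand-in for the supplyAsync + whenComplete pattern; not Flink code. */
class AsyncServiceStartSketch {

    /**
     * Runs the (possibly failing) factory on the executor and reports the outcome
     * through a separate result future, so callers always receive a value.
     */
    static CompletableFuture<String> startService(
            Supplier<String> serviceFactory, ExecutorService executor) {
        final CompletableFuture<String> resultFuture = new CompletableFuture<>();

        CompletableFuture.supplyAsync(serviceFactory, executor)
                .whenComplete(
                        (service, throwable) -> {
                            if (throwable != null) {
                                // Creation failed: report it as a regular result.
                                resultFuture.complete(
                                        "INITIALIZATION_FAILED: " + unwrap(throwable).getMessage());
                            } else {
                                resultFuture.complete("STARTED: " + service);
                            }
                        });
        return resultFuture;
    }

    /** supplyAsync wraps exceptions from the supplier in a CompletionException. */
    private static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            System.out.println(startService(() -> "job-master", executor).join());
            System.out.println(
                    startService(
                                    () -> {
                                        throw new IllegalStateException("boom");
                                    },
                                    executor)
                            .join());
        } finally {
            executor.shutdown();
        }
    }
}
```

The point of the separate result future in this sketch is that the caller never has to handle an exceptional completion: both outcomes arrive as values, similar in spirit to how the constructor above completes resultFuture with an initialization-failure result.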

2.3 Initialization and Startup of the JobMaster

Next, step into the internalCreateJobMasterService method:

    private JobMasterService internalCreateJobMasterService(
            UUID leaderSessionId, OnCompletionActions onCompletionActions) throws Exception {

        final JobMaster jobMaster =
                new JobMaster(
                        rpcService,
                        JobMasterId.fromUuidOrNull(leaderSessionId),
                        jobMasterConfiguration,
                        ResourceID.generate(),
                        jobGraph,
                        haServices,
                        slotPoolServiceSchedulerFactory,
                        jobManagerSharedServices,
                        heartbeatServices,
                        jobManagerJobMetricGroupFactory,
                        onCompletionActions,
                        fatalErrorHandler,
                        userCodeClassloader,
                        shuffleMaster,
                        lookup ->
                                new JobMasterPartitionTrackerImpl(
                                        jobGraph.getJobID(), shuffleMaster, lookup),
                        new DefaultExecutionDeploymentTracker(),
                        DefaultExecutionDeploymentReconciler::new,
                        initializationTimestamp);

        // TODO JobMaster extends RpcEndpoint, so after initialization its onStart method is called back
        jobMaster.start();

        return jobMaster;
    }

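As shown above, this is where the JobMaster is initialized and started. As covered in the earlier Akka chapter, JobMaster extends RpcEndpoint, so once initialization is complete its onStart lifecycle method is called back. The jobMaster.start() call itself does no substantive work; it only sends a message to the endpoint itself announcing that it has started. Let's look at JobMaster's onStart method: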

    @Override
    protected void onStart() throws JobMasterException {
        try {
            // TODO The JobMaster registers with the ResourceManager, starts requesting slots, and schedules and deploys StreamTasks
            startJobExecution();
        } catch (Exception e) {
            final JobMasterException jobMasterException =
                    new JobMasterException("Could not start the JobMaster.", e);
            handleJobMasterError(jobMasterException);
            throw jobMasterException;
        }
    }
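
The relationship between start() and onStart() can be pictured with a small model. This is only a sketch under simplifying assumptions: MiniEndpoint and MiniJobMaster are hypothetical classes, and a single-threaded executor stands in for the endpoint's main thread. It does not reproduce Flink's actual RpcEndpoint or Akka machinery.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical miniature endpoint whose start() only triggers onStart(); not Flink code. */
abstract class MiniEndpoint {
    // A single thread standing in for the endpoint's main thread.
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();

    /** Does no real work itself: it only enqueues a "start" message. */
    final void start() {
        mainThread.execute(this::onStart);
    }

    /** Lifecycle hook overridden by subclasses; runs on the main thread. */
    protected abstract void onStart();

    /** Stops accepting messages and waits for the queued ones to finish. */
    final void shutdownAndWait() throws InterruptedException {
        mainThread.shutdown();
        mainThread.awaitTermination(5, TimeUnit.SECONDS);
    }
}

/** Hypothetical subclass that records what its onStart() hook would kick off. */
class MiniJobMaster extends MiniEndpoint {
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    @Override
    protected void onStart() {
        events.add("startJobMasterServices");
        events.add("startScheduling");
    }

    /** Creates, starts, and stops one endpoint, returning the recorded events. */
    static List<String> runOnce() {
        MiniJobMaster jobMaster = new MiniJobMaster();
        jobMaster.start();
        try {
            jobMaster.shutdownAndWait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(jobMaster.events);
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

In this model all real work happens inside the onStart() hook on the endpoint's own thread, which mirrors the division of labor described above: start() only delivers the trigger.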

In onStart, the JobMaster calls startJobExecution() to perform its registration and to request slots. Let's step into the startJobExecution() method:

    private void startJobExecution() throws Exception {
        validateRunsInMainThread();

        // TODO Start some services
        startJobMasterServices();

        log.info(
                "Starting execution of job {} ({}) under job master id {}.",
                jobGraph.getName(),
                jobGraph.getJobID(),
                getFencingToken());

        // TODO Process the ExecutionGraph, request slots, and deploy tasks to the TaskExecutors
        startScheduling();
    }

Let's look at the startJobMasterServices() method first:

    private void startJobMasterServices() throws Exception {
        try {
            // TODO Start the two heartbeat services
            this.taskManagerHeartbeatManager = createTaskManagerHeartbeatManager(heartbeatServices);
            this.resourceManagerHeartbeatManager =
                    createResourceManagerHeartbeatManager(heartbeatServices);

            // start the slot pool make sure the slot pool now accepts messages for this leader
            // TODO Start the slot management service; it starts three scheduled tasks internally
            slotPoolService.start(getFencingToken(), getAddress(), getMainThreadExecutor());

            // job is ready to go, try to establish connection with resource manager
            //   - activate leader retrieval for the resource manager
            //   - on notification of the leader, the connection will be established and
            //     the slot pool will start requesting slots
            // TODO Listen for changes of the ResourceManager address
            resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener());
        } catch (Exception e) {
            handleStartJobMasterServicesError(e);
        }
    }

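As we can see, three things happen here:

1. Two heartbeat services are started (one towards the TaskManagers, one towards the ResourceManager)

2. The slot management service is started, which starts three scheduled tasks internally

3. A listener for the ResourceManager leader address is registered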
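
The third step follows a listener pattern: the JobMaster registers a callback and reacts whenever a new ResourceManager leader address is announced. A minimal sketch of that idea is shown below. All types in it (LeaderListener, LeaderRetriever, ResourceManagerConnector) are hypothetical and only loosely modelled on the code above; they are not Flink's classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical sketch of "register a listener, reconnect on leader change"; not Flink code. */
class LeaderListenerSketch {

    /** Hypothetical callback interface for leader address announcements. */
    interface LeaderListener {
        void notifyLeaderAddress(String leaderAddress, UUID leaderSessionId);
    }

    /** Hypothetical retriever: remembers the listener and forwards every leader change. */
    static final class LeaderRetriever {
        private LeaderListener listener;

        void start(LeaderListener listener) {
            this.listener = listener;
        }

        void leaderChanged(String leaderAddress, UUID leaderSessionId) {
            if (listener != null) {
                listener.notifyLeaderAddress(leaderAddress, leaderSessionId);
            }
        }
    }

    /** Records each address it would (re)connect to. */
    static final class ResourceManagerConnector implements LeaderListener {
        final List<String> connections = new ArrayList<>();

        @Override
        public void notifyLeaderAddress(String leaderAddress, UUID leaderSessionId) {
            // A real implementation would establish a connection here.
            connections.add(leaderAddress);
        }
    }

    /** Simulates two leader announcements and returns the addresses seen. */
    static List<String> demo() {
        LeaderRetriever retriever = new LeaderRetriever();
        ResourceManagerConnector connector = new ResourceManagerConnector();
        retriever.start(connector);
        retriever.leaderChanged("rm-host-1:6123", UUID.randomUUID());
        retriever.leaderChanged("rm-host-2:6123", UUID.randomUUID());
        return connector.connections;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The host names and ports in demo() are made up. The sketch only illustrates why the connection can be re-established after a ResourceManager failover without the JobMaster polling for the address.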

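Slot management and scheduling get a dedicated chapter later on, so we skip the slot details here. Let's go back to the calling method and look at startScheduling(); following the calls down, we pick the SchedulerBase implementation: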

    @Override
    public final void startScheduling() {
        mainThreadExecutor.assertRunningInMainThread();
        registerJobMetrics();
        operatorCoordinatorHandler.startAllOperatorCoordinators();
        // TODO
        startSchedulingInternal();
    }

Then step into the startSchedulingInternal method:

    @Override
    protected void startSchedulingInternal() {
        log.info(
                "Starting scheduling with scheduling strategy [{}]",
                schedulingStrategy.getClass().getName());
        transitionToRunning();
        // TODO Request slots, then deploy and run the StreamTasks
        schedulingStrategy.startScheduling();
    }

Next, step into the schedulingStrategy.startScheduling() method:

    @Override
    public void startScheduling() {
        final Set<SchedulingPipelinedRegion> sourceRegions =
                IterableUtils.toStream(schedulingTopology.getAllPipelinedRegions())
                        .filter(this::isSourceRegion)
                        .collect(Collectors.toSet());
        // TODO Request slots, then deploy and run the StreamTasks
        maybeScheduleRegions(sourceRegions);
    }

At this point the slot requests are about to begin. Let's step into the maybeScheduleRegions method:

    private void maybeScheduleRegions(final Set<SchedulingPipelinedRegion> regions) {
        final List<SchedulingPipelinedRegion> regionsSorted =
                SchedulingStrategyUtils.sortPipelinedRegionsInTopologicalOrder(
                        schedulingTopology, regions);

        final Map<ConsumedPartitionGroup, Boolean> consumableStatusCache = new HashMap<>();
        for (SchedulingPipelinedRegion region : regionsSorted) {
            // TODO Request slots, then deploy and run the StreamTasks
            maybeScheduleRegion(region, consumableStatusCache);
        }
    }

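Next, step into the maybeScheduleRegion method. It ultimately calls schedulerOperations.allocateSlotsAndDeploy, which is implemented as follows:

    @Override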
    public void allocateSlotsAndDeploy(
            final List<ExecutionVertexDeploymentOption> executionVertexDeploymentOptions) {
        validateDeploymentOptions(executionVertexDeploymentOptions);

        final Map<ExecutionVertexID, ExecutionVertexDeploymentOption> deploymentOptionsByVertex =
                groupDeploymentOptionsByVertexId(executionVertexDeploymentOptions);

        final List<ExecutionVertexID> verticesToDeploy =
                executionVertexDeploymentOptions.stream()
                        .map(ExecutionVertexDeploymentOption::getExecutionVertexId)
                        .collect(Collectors.toList());

        final Map<ExecutionVertexID, ExecutionVertexVersion> requiredVersionByVertex =
                executionVertexVersioner.recordVertexModifications(verticesToDeploy);

        transitionToScheduled(verticesToDeploy);

        // TODO Request slots
        final List<SlotExecutionVertexAssignment> slotExecutionVertexAssignments =
                allocateSlots(executionVertexDeploymentOptions);

        final List<DeploymentHandle> deploymentHandles =
                createDeploymentHandles(
                        requiredVersionByVertex,
                        deploymentOptionsByVertex,
                        slotExecutionVertexAssignments);

        // TODO Deploy the tasks
        waitForAllSlotsAndDeploy(deploymentHandles);
    }

This method mainly does two things:

1. Requests slots

2. Deploys the tasks

The implementation details will be analyzed in the later chapter on slot management.
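
The two steps can be pictured as "collect one future per slot request, then deploy once all of them have completed". The following is a minimal sketch of that idea using plain CompletableFuture. It is an assumption-laden illustration, not the scheduler's real logic: the class AllocateThenDeploySketch, its method bodies, and the strings it produces are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of "allocate slots, then deploy when all are ready"; not Flink code. */
class AllocateThenDeploySketch {

    /** Step 1: issue one asynchronous slot request per task. */
    static List<CompletableFuture<String>> allocateSlots(List<String> tasks) {
        List<CompletableFuture<String>> slotFutures = new ArrayList<>();
        for (String task : tasks) {
            // Each request is answered asynchronously with a made-up slot name.
            slotFutures.add(CompletableFuture.supplyAsync(() -> task + "@slot"));
        }
        return slotFutures;
    }

    /** Step 2: wait until every slot request has completed, then deploy all tasks. */
    static CompletableFuture<List<String>> waitForAllSlotsAndDeploy(
            List<CompletableFuture<String>> slotFutures) {
        return CompletableFuture.allOf(slotFutures.toArray(new CompletableFuture[0]))
                .thenApply(
                        ignored -> {
                            List<String> deployed = new ArrayList<>();
                            for (CompletableFuture<String> slotFuture : slotFutures) {
                                // join() does not block here: allOf has already completed.
                                deployed.add("deployed " + slotFuture.join());
                            }
                            return deployed;
                        });
    }

    public static void main(String[] args) {
        List<String> tasks = Arrays.asList("source", "sink");
        System.out.println(waitForAllSlotsAndDeploy(allocateSlots(tasks)).join());
    }
}
```

Keeping allocation and deployment as separate asynchronous stages means nothing is deployed while any slot request is still pending, which is the behavior the method name waitForAllSlotsAndDeploy suggests.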

At this point, the JobMaster has finished starting up.

Summary

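The JobGraph built by the client, together with the resources it needs, is sent to the WebMonitorEndpoint. Inside the WebMonitorEndpoint a Router parses the URL and dispatches the request to the handler registered for it, then calls back that handler's handleRequest method; here the handler is the JobSubmitHandler.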

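The handleRequest method parses the job information and the required resources from the request body, including the JobGraph, the job's own Jar, and the Jars it depends on. Once parsing is done, the JobSubmitHandler hands the JobGraph over to the Dispatcher.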

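After receiving the JobGraph, the Dispatcher starts preparing the initialization and startup of the JobMaster. It first initializes a set of basic services that the JobMaster needs, then builds an important object, DefaultJobMasterServiceFactory, and then prepares the leader election for the JobMaster.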

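Once the leader election for the JobMaster has completed, the isLeader method is called back and the initialization of the JobMaster begins. Because JobMaster extends RpcEndpoint, its onStart lifecycle method is called back after initialization completes.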

In the onStart lifecycle method, the JobMaster requests slots and deploys the tasks.