Flink源码-JobGraph的处理和JobMaster的创建

源码挖掘机

已于 2023-12-05 16:02:15 修改

阅读量299

点赞数

分类专栏： flink 文章标签： flink 大数据

于 2023-08-27 19:41:33 首次发布

本文链接：https://blog.csdn.net/bigdatakenan/article/details/132524782

版权

flink 专栏收录该内容

23 篇文章 3 订阅

订阅专栏

上节我们看到了RestClusterClient上传了jobgraph和作业相关的依赖，jar包等，可是flink集群是如何获取jobgraph如何继续下一步操作的呢，今天我们接着看。

1.handleRequest方法

flink集群是通过获取WebMonitorEndpoint来处理来自RestClient的JobSubmit请求的，而真正负责处理JobSubmit请求的是JobSubmitHandler，下面我们可以看一下这个方法的handleRequest方法

该方法中主要做了以下：

1.获取restClient的文件

2.获取jobgraph和对应的依赖，jar包

3.通过dispatchergateway提交jobgraph

@Override
    protected CompletableFuture<JobSubmitResponseBody> handleRequest(
            @Nonnull HandlerRequest<JobSubmitRequestBody, EmptyMessageParameters> request,
            @Nonnull DispatcherGateway gateway)
            throws RestHandlerException {
        
        //获取restClient上传的文件
        final Collection<File> uploadedFiles = request.getUploadedFiles();
        final Map<String, Path> nameToFile =
                uploadedFiles.stream()
                        .collect(Collectors.toMap(File::getName, Path::fromLocalFile));

        if (uploadedFiles.size() != nameToFile.size()) {
            throw new RestHandlerException(
                    String.format(
                            "The number of uploaded files was %s than the expected count. Expected: %s Actual %s",
                            uploadedFiles.size() < nameToFile.size() ? "lower" : "higher",
                            nameToFile.size(),
                            uploadedFiles.size()),
                    HttpResponseStatus.BAD_REQUEST);
        }

        final JobSubmitRequestBody requestBody = request.getRequestBody();

        if (requestBody.jobGraphFileName == null) {
            throw new RestHandlerException(
                    String.format(
                            "The %s field must not be omitted or be null.",
                            JobSubmitRequestBody.FIELD_NAME_JOB_GRAPH),
                    HttpResponseStatus.BAD_REQUEST);
        }
        //获取Jobgraph和对应的jar包和依赖
        CompletableFuture<JobGraph> jobGraphFuture = loadJobGraph(requestBody, nameToFile);

        Collection<Path> jarFiles = getJarFilesToUpload(requestBody.jarFileNames, nameToFile);

        Collection<Tuple2<String, Path>> artifacts =
                getArtifactFilesToUpload(requestBody.artifactFileNames, nameToFile);

        CompletableFuture<JobGraph> finalizedJobGraphFuture =
                uploadJobGraphFiles(gateway, jobGraphFuture, jarFiles, artifacts, configuration);        
        
        //通过dispatcherGateway提交jobgraph
        CompletableFuture<Acknowledge> jobSubmissionFuture =
                finalizedJobGraphFuture.thenCompose(
                        jobGraph -> gateway.submitJob(jobGraph, timeout));

        return jobSubmissionFuture.thenCombine(
                jobGraphFuture,
                (ack, jobGraph) -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID()));
    }

2.submitJob方法

submitjob有两个实现类：

dispatcher是在集群中使用的，MiniDispatcher适用于idea本地调试，所以这里我们直接看dispatcher的submitJob方法，该方法中开始了进行了一些简单的判断，然后调用internalSubmitJob方法

3.internalSubmitJob方法

此方法共做了两件事：

1.持久化并且运行提交的job，persistAndRunJob

2.捕捉提交运行job时的异常

 private CompletableFuture<Acknowledge> internalSubmitJob(JobGraph jobGraph) {
        log.info("Submitting job {} ({}).", jobGraph.getJobID(), jobGraph.getName());

        //持久化并且运行提交得job
        final CompletableFuture<Acknowledge> persistAndRunFuture =
                waitForTerminatingJob(jobGraph.getJobID(), jobGraph, this::persistAndRunJob)
                        .thenApply(ignored -> Acknowledge.get());

        return persistAndRunFuture.handleAsync(
                (acknowledge, throwable) -> {
                    if (throwable != null) {
                        cleanUpJobData(jobGraph.getJobID(), true);
 //捕捉提交和运行job时得异常
                        ClusterEntryPointExceptionUtils.tryEnrichClusterEntryPointError(throwable);
                        final Throwable strippedThrowable =
                                ExceptionUtils.stripCompletionException(throwable);
                        log.error(
                                "Failed to submit job {}.", jobGraph.getJobID(), strippedThrowable);
                        throw new CompletionException(
                                new JobSubmissionException(
                                        jobGraph.getJobID(),
                                        "Failed to submit job.",
                                        strippedThrowable));
                    } else {
                        return acknowledge;
                    }
                },
                ioExecutor);
    }

4.persistAndRunJob方法

5.runJob方法

该方法主要做三件事：
1.创建并启动jobmaster,运行提交的job

2.将该jobid和jobmaster加入运行job的集合中

3.异步获取jobmaster的执行结果

private void runJob(JobGraph jobGraph, ExecutionType executionType) throws Exception {
        Preconditions.checkState(!runningJobs.containsKey(jobGraph.getJobID()));
        long initializationTimestamp = System.currentTimeMillis();
        //创建jobmaster并启动，运行对应的job
        JobManagerRunner jobManagerRunner =
                createJobManagerRunner(jobGraph, initializationTimestamp);
        
        //将该jobid和jobmaster加入运行job的map中
        runningJobs.put(jobGraph.getJobID(), jobManagerRunner);

        final JobID jobId = jobGraph.getJobID();
        //异步获取jobmaster的执行结果
        final CompletableFuture<CleanupJobState> cleanupJobStateFuture =
                jobManagerRunner
                        .getResultFuture()
                        .handleAsync(
                                (jobManagerRunnerResult, throwable) -> {
                                    Preconditions.checkState(
                                            runningJobs.get(jobId) == jobManagerRunner,
                                            "The job entry in runningJobs must be bound to the lifetime of the JobManagerRunner.");

                                    if (jobManagerRunnerResult != null) {
                                        return handleJobManagerRunnerResult(
                                                jobManagerRunnerResult, executionType);
                                    } else {
                                        return jobManagerRunnerFailed(jobId, throwable);
                                    }
                                },
                                getMainThreadExecutor());

        final CompletableFuture<Void> jobTerminationFuture =
                cleanupJobStateFuture
                        .thenApply(cleanupJobState -> removeJob(jobId, cleanupJobState))
                        .thenCompose(Function.identity());

        FutureUtils.assertNoException(jobTerminationFuture);
        registerJobManagerRunnerTerminationFuture(jobId, jobTerminationFuture);
    }

6.creatJobManagerRunner方法

该方法主要的功能是创建jobmaster,下面我详细看一下中间都有哪些步骤

@Override
    public JobManagerRunner createJobManagerRunner(
            JobGraph jobGraph,
            Configuration configuration,
            RpcService rpcService,
            HighAvailabilityServices highAvailabilityServices,
            HeartbeatServices heartbeatServices,
            JobManagerSharedServices jobManagerServices,
            JobManagerJobMetricGroupFactory jobManagerJobMetricGroupFactory,
            FatalErrorHandler fatalErrorHandler,
            long initializationTimestamp)
            throws Exception {

        checkArgument(jobGraph.getNumberOfVertices() > 0, "The given job is empty");
        //从配置文件中获取jobmaster的相关配置
        final JobMasterConfiguration jobMasterConfiguration =
                JobMasterConfiguration.fromConfiguration(configuration);
        //下面两个是做jobmaster高可用的一些配置服务
        final RunningJobsRegistry runningJobsRegistry =
                highAvailabilityServices.getRunningJobsRegistry();
        final LeaderElectionService jobManagerLeaderElectionService =
                highAvailabilityServices.getJobManagerLeaderElectionService(jobGraph.getJobID());
        //这个是用来调度taskmanager slot的
        final SlotPoolServiceSchedulerFactory slotPoolServiceSchedulerFactory =
                DefaultSlotPoolServiceSchedulerFactory.fromConfiguration(
                        configuration, jobGraph.getJobType());

        if (jobMasterConfiguration.getConfiguration().get(JobManagerOptions.SCHEDULER_MODE)
                == SchedulerExecutionMode.REACTIVE) {
            Preconditions.checkState(
                    slotPoolServiceSchedulerFactory.getSchedulerType()
                            == JobManagerOptions.SchedulerType.Adaptive,
                    "Adaptive Scheduler is required for reactive mode");
        }
        //管理shuffle的
        final ShuffleMaster<?> shuffleMaster =
                ShuffleServiceLoader.loadShuffleServiceFactory(configuration)
                        .createShuffleMaster(configuration);

        //下面是获取job的主类和依赖
        final LibraryCacheManager.ClassLoaderLease classLoaderLease =
                jobManagerServices
                        .getLibraryCacheManager()
                        .registerClassLoaderLease(jobGraph.getJobID());

        final ClassLoader userCodeClassLoader =
                classLoaderLease
                        .getOrResolveClassLoader(
                                jobGraph.getUserJarBlobKeys(), jobGraph.getClasspaths())
                        .asClassLoader();
     
        final DefaultJobMasterServiceFactory jobMasterServiceFactory =
                new DefaultJobMasterServiceFactory(
                        jobManagerServices.getScheduledExecutorService(),
                        rpcService,
                        jobMasterConfiguration,
                        jobGraph,
                        highAvailabilityServices,
                        slotPoolServiceSchedulerFactory,
                        jobManagerServices,
                        heartbeatServices,
                        jobManagerJobMetricGroupFactory,
                        fatalErrorHandler,
                        userCodeClassLoader,
                        shuffleMaster,
                        initializationTimestamp);

        final DefaultJobMasterServiceProcessFactory jobMasterServiceProcessFactory =
                new DefaultJobMasterServiceProcessFactory(
                        jobGraph.getJobID(),
                        jobGraph.getName(),
                        jobGraph.getCheckpointingSettings(),
                        initializationTimestamp,
                        jobMasterServiceFactory);

        return new JobMasterServiceLeadershipRunner(
                jobMasterServiceProcessFactory,
                jobManagerLeaderElectionService,
                runningJobsRegistry,
                classLoaderLease,
                fatalErrorHandler);
    }

到这里jobmaster就创建好了，然后jobgraph在jobmaster中会被转化成executionGraph，开启它的下一形态，下节我们再继续看