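In the previous post we saw how RestClusterClient uploads the JobGraph together with the job's dependencies and jar files. But how does the Flink cluster receive the JobGraph, and what does it do next? That is what we look at today.
1.handleRequest method
The Flink cluster handles JobSubmit requests from the RestClient through WebMonitorEndpoint, but the handler that actually processes them is JobSubmitHandler. Let's look at this class's handleRequest method.
The method does three main things:
1.Gets the files uploaded by the RestClient
2.Gets the JobGraph and its dependencies and jar files
3.Submits the JobGraph through the DispatcherGateway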
@Override
protected CompletableFuture<JobSubmitResponseBody> handleRequest(
@Nonnull HandlerRequest<JobSubmitRequestBody, EmptyMessageParameters> request,
@Nonnull DispatcherGateway gateway)
throws RestHandlerException {
// Get the files uploaded by the RestClient
final Collection<File> uploadedFiles = request.getUploadedFiles();
final Map<String, Path> nameToFile =
uploadedFiles.stream()
.collect(Collectors.toMap(File::getName, Path::fromLocalFile));
if (uploadedFiles.size() != nameToFile.size()) {
throw new RestHandlerException(
String.format(
"The number of uploaded files was %s than the expected count. Expected: %s Actual %s",
uploadedFiles.size() < nameToFile.size() ? "lower" : "higher",
nameToFile.size(),
uploadedFiles.size()),
HttpResponseStatus.BAD_REQUEST);
}
final JobSubmitRequestBody requestBody = request.getRequestBody();
if (requestBody.jobGraphFileName == null) {
throw new RestHandlerException(
String.format(
"The %s field must not be omitted or be null.",
JobSubmitRequestBody.FIELD_NAME_JOB_GRAPH),
HttpResponseStatus.BAD_REQUEST);
}
// Get the JobGraph and its jar files and dependencies
CompletableFuture<JobGraph> jobGraphFuture = loadJobGraph(requestBody, nameToFile);
Collection<Path> jarFiles = getJarFilesToUpload(requestBody.jarFileNames, nameToFile);
Collection<Tuple2<String, Path>> artifacts =
getArtifactFilesToUpload(requestBody.artifactFileNames, nameToFile);
CompletableFuture<JobGraph> finalizedJobGraphFuture =
uploadJobGraphFiles(gateway, jobGraphFuture, jarFiles, artifacts, configuration);
// Submit the JobGraph through the DispatcherGateway
CompletableFuture<Acknowledge> jobSubmissionFuture =
finalizedJobGraphFuture.thenCompose(
jobGraph -> gateway.submitJob(jobGraph, timeout));
return jobSubmissionFuture.thenCombine(
jobGraphFuture,
(ack, jobGraph) -> new JobSubmitResponseBody("/jobs/" + jobGraph.getJobID()));
}
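The end of handleRequest chains three futures: the finalized JobGraph is passed to submitJob with thenCompose, and the acknowledgement is then joined with the original JobGraph future via thenCombine to build the response. Below is a minimal sketch of that chaining pattern in plain Java. The class SubmitChainSketch and its helper methods are hypothetical stand-ins, not Flink APIs, and the "job graph" is reduced to a job ID string.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the handler's future chain; none of these names are Flink APIs.
class SubmitChainSketch {

    // Pretends to load a job graph; here the "graph" is just its job ID.
    static CompletableFuture<String> loadJobGraph(String jobId) {
        return CompletableFuture.supplyAsync(() -> jobId);
    }

    // Pretends to upload jars and artifacts, then hands back the (unchanged) graph.
    static CompletableFuture<String> uploadFiles(CompletableFuture<String> graphFuture) {
        return graphFuture.thenApply(graph -> graph);
    }

    // Pretends to be gateway.submitJob: acknowledges asynchronously.
    static CompletableFuture<String> submit(String graph) {
        return CompletableFuture.supplyAsync(() -> "ACK");
    }

    static CompletableFuture<String> handle(String jobId) {
        CompletableFuture<String> graphFuture = loadJobGraph(jobId);
        CompletableFuture<String> finalized = uploadFiles(graphFuture);
        // thenCompose flattens the future returned by submit(...) into the chain.
        CompletableFuture<String> ack = finalized.thenCompose(SubmitChainSketch::submit);
        // thenCombine waits for both futures and builds the response from the graph.
        return ack.thenCombine(graphFuture, (a, graph) -> "/jobs/" + graph);
    }

    public static void main(String[] args) {
        System.out.println(handle("job-42").join()); // prints "/jobs/job-42"
    }
}
```

The response is only produced once the submission has been acknowledged, because thenCombine needs both inputs to complete.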
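2.submitJob method
submitJob has two implementing classes: Dispatcher and MiniDispatcher.
Dispatcher is the general cluster implementation, while MiniDispatcher is the variant used by per-job clusters, where it runs a single JobGraph. So here we go straight to Dispatcher's submitJob method. It starts with a few simple checks and then calls internalSubmitJob.
3.internalSubmitJob method
This method does two things:
1.Persists and runs the submitted job via persistAndRunJob
2.Catches exceptions thrown while submitting and running the job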
private CompletableFuture<Acknowledge> internalSubmitJob(JobGraph jobGraph) {
log.info("Submitting job {} ({}).", jobGraph.getJobID(), jobGraph.getName());
// Persist and run the submitted job
final CompletableFuture<Acknowledge> persistAndRunFuture =
waitForTerminatingJob(jobGraph.getJobID(), jobGraph, this::persistAndRunJob)
.thenApply(ignored -> Acknowledge.get());
return persistAndRunFuture.handleAsync(
(acknowledge, throwable) -> {
if (throwable != null) {
cleanUpJobData(jobGraph.getJobID(), true);
// Handle exceptions thrown while submitting and running the job
ClusterEntryPointExceptionUtils.tryEnrichClusterEntryPointError(throwable);
final Throwable strippedThrowable =
ExceptionUtils.stripCompletionException(throwable);
log.error(
"Failed to submit job {}.", jobGraph.getJobID(), strippedThrowable);
throw new CompletionException(
new JobSubmissionException(
jobGraph.getJobID(),
"Failed to submit job.",
strippedThrowable));
} else {
return acknowledge;
}
},
ioExecutor);
}
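The error handling in internalSubmitJob follows a common CompletableFuture pattern: run the work, then use handleAsync to either pass the acknowledgement through or rethrow the failure wrapped in a domain-specific exception. The sketch below shows that pattern on its own. SubmitErrorSketch, SubmissionFailedException and strip are hypothetical names introduced for illustration; strip only mimics the idea behind ExceptionUtils.stripCompletionException and is not Flink's implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;

class SubmitErrorSketch {

    // Hypothetical exception type standing in for JobSubmissionException.
    static class SubmissionFailedException extends RuntimeException {
        SubmissionFailedException(String jobId, Throwable cause) {
            super("Failed to submit job " + jobId, cause);
        }
    }

    // Unwraps one CompletionException layer so the real cause is reported.
    static Throwable strip(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    static CompletableFuture<String> submit(
            String jobId, Runnable persistAndRun, Executor executor) {
        CompletableFuture<String> persistAndRunFuture =
                CompletableFuture.runAsync(persistAndRun, executor)
                        .thenApply(ignored -> "ACK");
        return persistAndRunFuture.handleAsync(
                (ack, throwable) -> {
                    if (throwable != null) {
                        // A real dispatcher would also clean up the job's data here.
                        throw new CompletionException(
                                new SubmissionFailedException(jobId, strip(throwable)));
                    }
                    return ack;
                },
                executor);
    }

    public static void main(String[] args) {
        Executor direct = Runnable::run; // runs everything on the calling thread
        System.out.println(submit("job-1", () -> {}, direct).join());
        try {
            submit("job-2", () -> { throw new IllegalStateException("boom"); }, direct).join();
        } catch (CompletionException e) {
            Throwable failure = e.getCause();
            System.out.println(failure.getMessage() + " <- " + failure.getCause().getMessage());
        }
    }
}
```

Using handleAsync rather than exceptionally keeps the success and failure paths in one callback, which is the same shape as the method above.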
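4.persistAndRunJob method
As described above, this method persists the submitted job and then runs it; the running part is done by runJob.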
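5.runJob method
This method does three main things:
1.Creates and starts the JobMaster (through a JobManagerRunner) to run the submitted job
2.Adds the job ID and its JobManagerRunner to the map of running jobs
3.Asynchronously obtains the JobMaster's execution result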
private void runJob(JobGraph jobGraph, ExecutionType executionType) throws Exception {
Preconditions.checkState(!runningJobs.containsKey(jobGraph.getJobID()));
long initializationTimestamp = System.currentTimeMillis();
// Create and start the JobMaster (via a JobManagerRunner) to run the job
JobManagerRunner jobManagerRunner =
createJobManagerRunner(jobGraph, initializationTimestamp);
// Add the job ID and its JobManagerRunner to the map of running jobs
runningJobs.put(jobGraph.getJobID(), jobManagerRunner);
final JobID jobId = jobGraph.getJobID();
// Asynchronously obtain the JobMaster's execution result
final CompletableFuture<CleanupJobState> cleanupJobStateFuture =
jobManagerRunner
.getResultFuture()
.handleAsync(
(jobManagerRunnerResult, throwable) -> {
Preconditions.checkState(
runningJobs.get(jobId) == jobManagerRunner,
"The job entry in runningJobs must be bound to the lifetime of the JobManagerRunner.");
if (jobManagerRunnerResult != null) {
return handleJobManagerRunnerResult(
jobManagerRunnerResult, executionType);
} else {
return jobManagerRunnerFailed(jobId, throwable);
}
},
getMainThreadExecutor());
final CompletableFuture<Void> jobTerminationFuture =
cleanupJobStateFuture
.thenApply(cleanupJobState -> removeJob(jobId, cleanupJobState))
.thenCompose(Function.identity());
FutureUtils.assertNoException(jobTerminationFuture);
registerJobManagerRunnerTerminationFuture(jobId, jobTerminationFuture);
}
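The bookkeeping in runJob can be reduced to three moves: register the job in a map, react to its result future, and remove the entry once the job has terminated. Here is a minimal single-threaded sketch of that lifecycle. RunningJobsSketch and its methods are hypothetical, job state is simplified to a string, and the real Dispatcher runs these callbacks on its main-thread executor, which this sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class RunningJobsSketch {

    // Hypothetical registry: job ID -> future completing with the job's final state.
    private final Map<String, CompletableFuture<String>> runningJobs = new HashMap<>();

    // Registers the job and returns a future that completes once the entry is removed.
    CompletableFuture<Void> runJob(String jobId, CompletableFuture<String> resultFuture) {
        if (runningJobs.containsKey(jobId)) {
            throw new IllegalStateException("Job " + jobId + " is already running");
        }
        runningJobs.put(jobId, resultFuture);
        return resultFuture
                // Map both outcomes to a final state, like the handleAsync callback above.
                .handle((result, throwable) -> throwable == null ? result : "FAILED")
                // Same effect as thenApply(...) followed by thenCompose(Function.identity()).
                .thenCompose(finalState -> removeJob(jobId, finalState));
    }

    private CompletableFuture<Void> removeJob(String jobId, String finalState) {
        // A real implementation would decide what to clean up based on finalState.
        runningJobs.remove(jobId);
        return CompletableFuture.completedFuture(null);
    }

    boolean isRunning(String jobId) {
        return runningJobs.containsKey(jobId);
    }

    public static void main(String[] args) {
        RunningJobsSketch dispatcher = new RunningJobsSketch();
        CompletableFuture<String> result = new CompletableFuture<>();
        CompletableFuture<Void> termination = dispatcher.runJob("job-1", result);
        System.out.println(dispatcher.isRunning("job-1")); // true
        result.complete("FINISHED");
        System.out.println(dispatcher.isRunning("job-1") + " " + termination.isDone()); // false true
    }
}
```

Tying the map entry to the result future is what lets the precondition at the top of runJob reject a second submission of a job that is still running.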
6.createJobManagerRunner method
The main job of this method is to create the JobMaster. Let's look in detail at the steps involved.
@Override
public JobManagerRunner createJobManagerRunner(
JobGraph jobGraph,
Configuration configuration,
RpcService rpcService,
HighAvailabilityServices highAvailabilityServices,
HeartbeatServices heartbeatServices,
JobManagerSharedServices jobManagerServices,
JobManagerJobMetricGroupFactory jobManagerJobMetricGroupFactory,
FatalErrorHandler fatalErrorHandler,
long initializationTimestamp)
throws Exception {
checkArgument(jobGraph.getNumberOfVertices() > 0, "The given job is empty");
// Read the JobMaster-related settings from the configuration
final JobMasterConfiguration jobMasterConfiguration =
JobMasterConfiguration.fromConfiguration(configuration);
// The next two are services used for JobMaster high availability
final RunningJobsRegistry runningJobsRegistry =
highAvailabilityServices.getRunningJobsRegistry();
final LeaderElectionService jobManagerLeaderElectionService =
highAvailabilityServices.getJobManagerLeaderElectionService(jobGraph.getJobID());
// Used for scheduling TaskManager slots
final SlotPoolServiceSchedulerFactory slotPoolServiceSchedulerFactory =
DefaultSlotPoolServiceSchedulerFactory.fromConfiguration(
configuration, jobGraph.getJobType());
if (jobMasterConfiguration.getConfiguration().get(JobManagerOptions.SCHEDULER_MODE)
== SchedulerExecutionMode.REACTIVE) {
Preconditions.checkState(
slotPoolServiceSchedulerFactory.getSchedulerType()
== JobManagerOptions.SchedulerType.Adaptive,
"Adaptive Scheduler is required for reactive mode");
}
// Manages shuffle
final ShuffleMaster<?> shuffleMaster =
ShuffleServiceLoader.loadShuffleServiceFactory(configuration)
.createShuffleMaster(configuration);
// Resolve the class loader for the job's user code (its jars and classpaths)
final LibraryCacheManager.ClassLoaderLease classLoaderLease =
jobManagerServices
.getLibraryCacheManager()
.registerClassLoaderLease(jobGraph.getJobID());
final ClassLoader userCodeClassLoader =
classLoaderLease
.getOrResolveClassLoader(
jobGraph.getUserJarBlobKeys(), jobGraph.getClasspaths())
.asClassLoader();
final DefaultJobMasterServiceFactory jobMasterServiceFactory =
new DefaultJobMasterServiceFactory(
jobManagerServices.getScheduledExecutorService(),
rpcService,
jobMasterConfiguration,
jobGraph,
highAvailabilityServices,
slotPoolServiceSchedulerFactory,
jobManagerServices,
heartbeatServices,
jobManagerJobMetricGroupFactory,
fatalErrorHandler,
userCodeClassLoader,
shuffleMaster,
initializationTimestamp);
final DefaultJobMasterServiceProcessFactory jobMasterServiceProcessFactory =
new DefaultJobMasterServiceProcessFactory(
jobGraph.getJobID(),
jobGraph.getName(),
jobGraph.getCheckpointingSettings(),
initializationTimestamp,
jobMasterServiceFactory);
return new JobMasterServiceLeadershipRunner(
jobMasterServiceProcessFactory,
jobManagerLeaderElectionService,
runningJobsRegistry,
classLoaderLease,
fatalErrorHandler);
}
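The method returns a JobMasterServiceLeadershipRunner that is handed a process factory and a leader election service, rather than an already running JobMaster. That suggests the JobMaster service is only created once the runner is granted leadership. The following is a minimal sketch of that idea under this assumption. LeadershipRunnerSketch, Service, grantLeadership and revokeLeadership are hypothetical names for illustration and do not reproduce Flink's implementation.

```java
import java.util.function.Supplier;

class LeadershipRunnerSketch {

    // Hypothetical service interface; the real counterpart would be the JobMaster service.
    interface Service {
        void stop();
    }

    private final Supplier<Service> serviceFactory;
    private Service service; // null while this runner is not the leader

    LeadershipRunnerSketch(Supplier<Service> serviceFactory) {
        this.serviceFactory = serviceFactory;
    }

    // Called when this runner wins the election: create the service lazily.
    synchronized void grantLeadership() {
        if (service == null) {
            service = serviceFactory.get();
        }
    }

    // Called when leadership is lost: stop and drop the service.
    synchronized void revokeLeadership() {
        if (service != null) {
            service.stop();
            service = null;
        }
    }

    synchronized boolean isServiceRunning() {
        return service != null;
    }

    public static void main(String[] args) {
        LeadershipRunnerSketch runner =
                new LeadershipRunnerSketch(() -> () -> System.out.println("service stopped"));
        System.out.println(runner.isServiceRunning()); // false
        runner.grantLeadership();
        System.out.println(runner.isServiceRunning()); // true
        runner.revokeLeadership();
        System.out.println(runner.isServiceRunning()); // false
    }
}
```

Passing a factory instead of an instance means a standby runner holds no service at all until it becomes leader.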
At this point the JobMaster has been created. Inside the JobMaster the JobGraph is then converted into an ExecutionGraph, its next form. We will pick that up in the next post.