Flink source code analysis: job submission flow in flink-yarn-session shared mode

Latest recommended article published on 2024-07-24 11:58:44

攻城狮AA

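Category: Big Data. Tags: java, flink, hadoop

This is an original article by the author and may not be reproduced without the author's permission.

Article link: https://blog.csdn.net/Sunzhongwei1988/article/details/105939020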

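Preface

An earlier article analyzed how a job is submitted in the independent (per-job) mode of flink-yarn-session. Building on that, this article analyzes the job submission flow in the shared mode of flink-yarn-session.

Job submission flow in flink-yarn-session shared mode

As the earlier article explained, the point where the independent mode and the shared mode of flink-yarn-session diverge is the runProgram method of org.apache.flink.client.cli.CliFrontend, shown below: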

private <T> void runProgram(
			CustomCommandLine<T> customCommandLine,
			CommandLine commandLine,
			RunOptions runOptions,
			PackagedProgram program) throws ProgramInvocationException, FlinkException {
		final ClusterDescriptor<T> clusterDescriptor = customCommandLine.createClusterDescriptor(commandLine);

		try {
			final T clusterId = customCommandLine.getClusterId(commandLine);
			final ClusterClient<T> client;
			// directly deploy the job if the cluster is started in job mode and detached
			if (clusterId == null && runOptions.getDetachedMode()) {
				int parallelism = runOptions.getParallelism() == -1 ? defaultParallelism : runOptions.getParallelism();
				final JobGraph jobGraph = PackagedProgramUtils.createJobGraph(program, configuration, parallelism);
				final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
				client = clusterDescriptor.deployJobCluster(
					clusterSpecification,
					jobGraph,
					runOptions.getDetachedMode());
				logAndSysout("Job has been submitted with JobID " + jobGraph.getJobID());
				try {
					client.shutdown();
				} catch (Exception e) {
					LOG.info("Could not properly shut down the client.", e);
				}
			} else {
				final Thread shutdownHook;
				if (clusterId != null) {
					client = clusterDescriptor.retrieve(clusterId);
					shutdownHook = null;
				} else {
					// also in job mode we have to deploy a session cluster because the job
					// might consist of multiple parts (e.g. when using collect)
					final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
					client = clusterDescriptor.deploySessionCluster(clusterSpecification);
					// if not running in detached mode, add a shutdown hook to shut down cluster if client exits
					// there's a race-condition here if cli is killed before shutdown hook is installed
					if (!runOptions.getDetachedMode() && runOptions.isShutdownOnAttachedExit()) {
						shutdownHook = ShutdownHookUtil.addShutdownHook(client::shutDownCluster, client.getClass().getSimpleName(), LOG);
					} else {
						shutdownHook = null;
					}
				}

				try {
					client.setPrintStatusDuringExecution(runOptions.getStdoutLogging());
					client.setDetached(runOptions.getDetachedMode());

					LOG.debug("{}", runOptions.getSavepointRestoreSettings());

					int userParallelism = runOptions.getParallelism();
					LOG.debug("User parallelism is set to {}", userParallelism);
					if (ExecutionConfig.PARALLELISM_DEFAULT == userParallelism) {
						userParallelism = defaultParallelism;
					}

					executeProgram(program, client, userParallelism);
				} finally {
					if (clusterId == null && !client.isDetached()) {
						// terminate the cluster only if we have started it before and if it's not detached
						try {
							client.shutDownCluster();
						} catch (final Exception e) {
							LOG.info("Could not properly terminate the Flink cluster.", e);
						}
						if (shutdownHook != null) {
							// we do not need the hook anymore as we have just tried to shutdown the cluster.
							ShutdownHookUtil.removeShutdownHook(shutdownHook, client.getClass().getSimpleName(), LOG);
						}
					}
					try {
						client.shutdown();
					} catch (Exception e) {
						LOG.info("Could not properly shut down the client.", e);
					}
				}
			}
		} finally {
			try {
				clusterDescriptor.close();
			} catch (Exception e) {
				LOG.info("Could not properly close the cluster descriptor.", e);
			}
		}
	}
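The branching in runProgram can be condensed into a small decision function. The following is only a sketch of that logic: the class SubmissionPathSketch, the enum Path, and the method choose are names invented for illustration and are not part of Flink.

```java
/** Hypothetical model of the three paths taken by runProgram; not Flink code. */
final class SubmissionPathSketch {

    enum Path { DEPLOY_JOB_CLUSTER, RETRIEVE_EXISTING_SESSION, DEPLOY_NEW_SESSION }

    /**
     * @param clusterId the YARN application id passed by the user, or null if none was given
     * @param detached  whether the client runs in detached mode
     */
    static Path choose(String clusterId, boolean detached) {
        if (clusterId == null && detached) {
            // no existing cluster and detached: deploy a cluster dedicated to this job
            return Path.DEPLOY_JOB_CLUSTER;
        }
        if (clusterId != null) {
            // an existing session cluster was named: connect to it (shared mode)
            return Path.RETRIEVE_EXISTING_SESSION;
        }
        // no existing cluster and attached: deploy a new session cluster first
        return Path.DEPLOY_NEW_SESSION;
    }

    public static void main(String[] args) {
        System.out.println(choose(null, true));
        System.out.println(choose("some-application-id", false));
        System.out.println(choose(null, false));
    }
}
```

The shared mode discussed in this article corresponds to the second case, where a clusterId is supplied.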

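Let's continue with the job submission flow in flink-yarn-session shared mode. Before diving in, I want to raise two questions: what exactly is shared in flink-yarn-session, and how does the client interact with the shared object? The analysis below is organized around these two questions.

If the user passes a clusterId when submitting a job (the value corresponds to the YARN applicationId), the ClusterClient object is obtained from that value; if it is not set, a cluster is deployed first and the ClusterClient is created from it. Let's first see how the source obtains a ClusterClient from the clusterId. Also, ClusterClient is an abstract class, so which concrete ClusterClient do we actually end up with? Look at clusterDescriptor.retrieve(clusterId). An earlier article showed that clusterDescriptor here is actually a YarnClusterDescriptor instance, so we go to its parent class AbstractYarnClusterDescriptor for the retrieve method: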

@Override
	public ClusterClient<ApplicationId> retrieve(ApplicationId applicationId) throws ClusterRetrieveException {
		try {
			// check if required Hadoop environment variables are set. If not, warn user
			if (System.getenv("HADOOP_CONF_DIR") == null &&
				System.getenv("YARN_CONF_DIR") == null) {
				LOG.warn("Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set." +
					"The Flink YARN Client needs one of these to be set to properly load the Hadoop " +
					"configuration for accessing YARN.");
			}
			final ApplicationReport appReport = yarnClient.getApplicationReport(applicationId);
			if (appReport.getFinalApplicationStatus() != FinalApplicationStatus.UNDEFINED) {
				// Flink cluster is not running anymore
				LOG.error("The application {} doesn't run anymore. It has previously completed with final status: {}",
					applicationId, appReport.getFinalApplicationStatus());
				throw new RuntimeException("The Yarn application " + applicationId + " doesn't run anymore.");
			}
			final String host = appReport.getHost();
			final int rpcPort = appReport.getRpcPort();
			LOG.info("Found application JobManager host name '{}' and port '{}' from supplied application id '{}'",
				host, rpcPort, applicationId);
			flinkConfiguration.setString(JobManagerOptions.ADDRESS, host);
			flinkConfiguration.setInteger(JobManagerOptions.PORT, rpcPort);

			flinkConfiguration.setString(RestOptions.ADDRESS, host);
			flinkConfiguration.setInteger(RestOptions.PORT, rpcPort);

			return createYarnClusterClient(
				this,
				-1, // we don't know the number of task managers of a started Flink cluster
				-1, // we don't know how many slots each task manager has for a started Flink cluster
				appReport,
				flinkConfiguration,
				false);
		} catch (Exception e) {
			throw new ClusterRetrieveException("Couldn't retrieve Yarn cluster", e);
		}
	}

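The code above gives us the following flow. First, yarnClient uses the clusterId to fetch an ApplicationReport (a class from the YARN client API that holds the application status information obtained through yarnClient). The information stored in this ApplicationReport is in effect that of the Flink JobManager. So how do we talk to the JobManager? Through the host and rpcPort values of the ApplicationReport, which are the address and port for communicating with the JobManager. What made me curious is why the same data is set twice in flinkConfiguration, under two different option names. Let's look at the source: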

/**
 * Configuration options for the JobManager.
 */
@PublicEvolving
public class JobManagerOptions {

	/**
	 * The config parameter defining the network address to connect to
	 * for communication with the job manager.
	 *
	 * <p>This value is only interpreted in setups where a single JobManager with static
	 * name or address exists (simple standalone setups, or container setups with dynamic
	 * service name resolution). It is not used in many high-availability setups, when a
	 * leader-election service (like ZooKeeper) is used to elect and discover the JobManager
	 * leader from potentially multiple standby JobManagers.
	 */
	public static final ConfigOption<String> ADDRESS =
		key("jobmanager.rpc.address")
		.noDefaultValue()
		.withDescription("The config parameter defining the network address to connect to" +
			" for communication with the job manager." +
			" This value is only interpreted in setups where a single JobManager with static" +
			" name or address exists (simple standalone setups, or container setups with dynamic" +
			" service name resolution). It is not used in many high-availability setups, when a" +
			" leader-election service (like ZooKeeper) is used to elect and discover the JobManager" +
			" leader from potentially multiple standby JobManagers.");

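The key behind JobManagerOptions.ADDRESS is jobmanager.rpc.address. Its value is the address used to communicate with the JobManager and, according to the Javadoc, it is not used in many high-availability setups. Now the RestOptions code: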

/**
	 * The address that should be used by clients to connect to the server.
	 */
	public static final ConfigOption<String> ADDRESS =
		key("rest.address")
			.noDefaultValue()
			.withFallbackKeys(JobManagerOptions.ADDRESS.key())
			.withDescription("The address that should be used by clients to connect to the server.");

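The key behind RestOptions.ADDRESS is rest.address. It is the address clients use to connect to the server over REST. Comparing the two explains why the value is set twice: the two keys are different and belong to different contexts, one for RPC and one for REST. Note also that rest.address declares jobmanager.rpc.address as a fallback key, so the REST address falls back to the RPC address when rest.address itself is not set.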
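To make the fallback-key idea concrete, here is a minimal sketch of a lookup that tries a primary key first and then its fallback keys. It is a simplified stand-in written for this article, not Flink's Configuration class; FallbackConfigSketch and the host names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, simplified configuration with fallback keys; not Flink's Configuration. */
final class FallbackConfigSketch {

    private final Map<String, String> values = new HashMap<>();

    void set(String key, String value) {
        values.put(key, value);
    }

    /** Returns the value stored under key, or under the first fallback key that is set. */
    Optional<String> get(String key, String... fallbackKeys) {
        String value = values.get(key);
        if (value != null) {
            return Optional.of(value);
        }
        for (String fallbackKey : fallbackKeys) {
            value = values.get(fallbackKey);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        FallbackConfigSketch config = new FallbackConfigSketch();
        // only the RPC address is set, so the REST lookup falls back to it
        config.set("jobmanager.rpc.address", "host-a");
        System.out.println(config.get("rest.address", "jobmanager.rpc.address"));
        // once rest.address is set explicitly, it takes precedence
        config.set("rest.address", "host-b");
        System.out.println(config.get("rest.address", "jobmanager.rpc.address"));
    }
}
```

In retrieve both keys are written explicitly with the same host, so the fallback never has to kick in there.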

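At this point we know what is shared in shared mode: the JobManager and the TaskManagers it manages, in other words one Flink cluster. Back to the retrieve method in AbstractYarnClusterDescriptor: it finally returns a ClusterClient by calling createYarnClusterClient. Since createYarnClusterClient is abstract, we need to look at the concrete implementation: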

@Override
	protected ClusterClient<ApplicationId> createYarnClusterClient(
			AbstractYarnClusterDescriptor descriptor,
			int numberTaskManagers,
			int slotsPerTaskManager,
			ApplicationReport report,
			Configuration flinkConfiguration,
			boolean perJobCluster) throws Exception {
		return new RestClusterClient<>(
			flinkConfiguration,
			report.getApplicationId());
	}

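So in runProgram, in shared mode, the ClusterClient obtained through the clusterId is actually a RestClusterClient. What if no clusterId is passed? How does Flink obtain the ClusterClient then, and what is its concrete type? Without a clusterId, Flink first deploys a cluster on YARN. The method called is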

client = clusterDescriptor.deploySessionCluster(clusterSpecification);

We already know that clusterDescriptor is a YarnClusterDescriptor instance, so let's see how the deployment is done:

@Override
	public ClusterClient<ApplicationId> deploySessionCluster(ClusterSpecification clusterSpecification) throws ClusterDeploymentException {
		try {
			return deployInternal(
				clusterSpecification,
				"Flink session cluster",
				getYarnSessionClusterEntrypoint(),
				null,
				false);
		} catch (Exception e) {
			throw new ClusterDeploymentException("Couldn't deploy Yarn session cluster", e);
		}
	}

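deploySessionCluster calls deployInternal directly. If you read the earlier article, you know that deployInternal is also what ends up being called when a job is submitted in independent mode; only the arguments differ. deployInternal in turn calls the createYarnClusterClient method discussed above, so what it returns is again a RestClusterClient. Judging from the name, RestClusterClient talks to the server (JobManager) over HTTP in a RESTful style. Is that really so? Let's check the comment on the RestClusterClient class: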

/**
 * A {@link ClusterClient} implementation that communicates via HTTP REST requests.
 */
public class RestClusterClient<T> extends ClusterClient<T> implements NewClusterClient {
  // remaining code omitted
}

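Yes, exactly. In other words, when the AM/JobManager starts, it also starts a web server, the same one that serves the web monitoring page.

Now that we have the ClusterClient we need, let's look at how executeProgram runs: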
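Since the communication is plain HTTP, any HTTP client can in principle talk to the same endpoint. The sketch below only prepares a GET request for the /jobs/overview path of Flink's monitoring REST API and does not open a network connection. The class name RestRequestSketch, the host, and the port are assumptions for illustration; on YARN the real host and port come from the application report.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper that prepares (but does not send) a REST request to a JobManager. */
final class RestRequestSketch {

    static HttpURLConnection prepareJobsOverview(String host, int port) {
        try {
            URI uri = URI.create("http://" + host + ":" + port + "/jobs/overview");
            // openConnection only creates the connection object; nothing is sent yet
            HttpURLConnection connection = (HttpURLConnection) uri.toURL().openConnection();
            connection.setRequestMethod("GET");
            connection.setRequestProperty("Accept", "application/json");
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // host and port are placeholders for a locally reachable JobManager
        HttpURLConnection connection = prepareJobsOverview("localhost", 8081);
        System.out.println(connection.getRequestMethod() + " " + connection.getURL());
    }
}
```

Calling getInputStream() on the returned connection would actually send the request, which requires a running cluster.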

protected void executeProgram(PackagedProgram program, ClusterClient<?> client, int parallelism) throws ProgramMissingJobException, ProgramInvocationException {
		logAndSysout("Starting execution of program");

		final JobSubmissionResult result = client.run(program, parallelism);

		if (null == result) {
			throw new ProgramMissingJobException("No JobSubmissionResult returned, please make sure you called " +
				"ExecutionEnvironment.execute()");
		}

		if (result.isJobExecutionResult()) {
			logAndSysout("Program execution finished");
			JobExecutionResult execResult = result.getJobExecutionResult();
			System.out.println("Job with JobID " + execResult.getJobID() + " has finished.");
			System.out.println("Job Runtime: " + execResult.getNetRuntime() + " ms");
			Map<String, Object> accumulatorsResult = execResult.getAllAccumulatorResults();
			if (accumulatorsResult.size() > 0) {
				System.out.println("Accumulator Results: ");
				System.out.println(AccumulatorHelper.getResultsFormatted(accumulatorsResult));
			}
		} else {
			logAndSysout("Job has been submitted with JobID " + result.getJobID());
		}
	}

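executeProgram calls the run method of the client (here a RestClusterClient, which inherits run from ClusterClient) and gets back a JobSubmissionResult. Let's keep tracing and look at that run method: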

	/**
	 * General purpose method to run a user jar from the CliFrontend in either blocking or detached mode, depending
	 * on whether {@code setDetached(true)} or {@code setDetached(false)}.
	 * @param prog the packaged program
	 * @param parallelism the parallelism to execute the contained Flink job
	 * @return The result of the execution
	 * @throws ProgramMissingJobException
	 * @throws ProgramInvocationException
	 */
	public JobSubmissionResult run(PackagedProgram prog, int parallelism)
			throws ProgramInvocationException, ProgramMissingJobException {
		final ClassLoader contextClassLoader = Thread.currentThread().getContextClassLoader();
		try {
			Thread.currentThread().setContextClassLoader(prog.getUserCodeClassLoader());
			if (prog.isUsingProgramEntryPoint()) {
				final JobWithJars jobWithJars = prog.getPlanWithJars();
				return run(jobWithJars, parallelism, prog.getSavepointSettings());
			}
			else if (prog.isUsingInteractiveMode()) {
				log.info("Starting program in interactive mode (detached: {})", isDetached());

				final List<URL> libraries = prog.getAllLibraries();

				ContextEnvironmentFactory factory = new ContextEnvironmentFactory(this, libraries,
				prog.getClasspaths(), prog.getUserCodeClassLoader(), parallelism, isDetached(),
				prog.getSavepointSettings());
				ContextEnvironment.setAsContext(factory);

				try {
					// invoke main method
					prog.invokeInteractiveModeForExecution();
					if (lastJobExecutionResult == null && factory.getLastEnvCreated() == null) {
						throw new ProgramMissingJobException("The program didn't contain a Flink job.");
					}
					if (isDetached()) {
						// in detached mode, we execute the whole user code to extract the Flink job, afterwards we run it here
						return ((DetachedEnvironment) factory.getLastEnvCreated()).finalizeExecute();
					}
					else {
						// in blocking mode, we execute all Flink jobs contained in the user code and then return here
						return this.lastJobExecutionResult;
					}
				}
				finally {
					ContextEnvironment.unsetContext();
				}
			}
			else {
				throw new ProgramInvocationException("PackagedProgram does not have a valid invocation mode.");
			}
		}
		finally {
			Thread.currentThread().setContextClassLoader(contextClassLoader);
		}
	}
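One detail of run worth isolating is how it handles the thread's context class loader: it saves the current one, switches to the user-code class loader while the program runs, and restores the original in a finally block. Here is a minimal sketch of that pattern; ContextClassLoaderSketch and runWithClassLoader are names made up for this example.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

/** Hypothetical helper showing the save / swap / restore pattern for the context class loader. */
final class ContextClassLoaderSketch {

    static <T> T runWithClassLoader(ClassLoader userCodeClassLoader, Supplier<T> action) {
        final Thread current = Thread.currentThread();
        final ClassLoader original = current.getContextClassLoader();
        try {
            current.setContextClassLoader(userCodeClassLoader);
            return action.get();
        } finally {
            // always restore, even if the action throws
            current.setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        // an empty URLClassLoader stands in for the user-code class loader
        ClassLoader userLoader =
            new URLClassLoader(new URL[0], ContextClassLoaderSketch.class.getClassLoader());
        ClassLoader seenInside =
            runWithClassLoader(userLoader, () -> Thread.currentThread().getContextClassLoader());
        System.out.println("user loader active inside action: " + (seenInside == userLoader));
    }
}
```

The finally block is what guarantees the caller's class loader is back in place whether the user program returns normally or throws.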

We mentioned the PackagedProgram class in an earlier article. As a quick reminder, PackagedProgram does the following three things:

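Extract the jars the job depends on from the jar file (it parses the jar's directory structure; if there is a lib directory, it scans it for jar files and, if any are found, extracts them to a local temporary directory for later use);
Determine the job's entry class from the jar file (it parses the jar's manifest; a program-class entry takes precedence if present, otherwise it looks for a Main-Class entry);
Obtain the job's execution plan from the jar file;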
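The second item, the entry-class lookup, can be sketched with the standard java.util.jar API. This is an illustration of the precedence rule only, not PackagedProgram's actual code; EntryClassLookupSketch and the class names in the demo are hypothetical.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical sketch of the manifest lookup order: program-class first, then Main-Class. */
final class EntryClassLookupSketch {

    /** Returns the entry class name from the manifest, or null if neither attribute is set. */
    static String findEntryClassName(Manifest manifest) {
        Attributes attributes = manifest.getMainAttributes();
        String className = attributes.getValue("program-class");
        if (className == null) {
            className = attributes.getValue(Attributes.Name.MAIN_CLASS);
        }
        return className;
    }

    public static void main(String[] args) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Main-Class", "com.example.Main");
        System.out.println(findEntryClassName(manifest));
        // when both attributes exist, program-class wins
        manifest.getMainAttributes().putValue("program-class", "com.example.Job");
        System.out.println(findEntryClassName(manifest));
    }
}
```

For a real jar, the Manifest would come from java.util.jar.JarFile#getManifest() rather than being built in memory.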

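The run method splits into two branches depending on the type of prog's entry class: if the entry class implements org.apache.flink.api.common.Program, one branch is taken (isUsingProgramEntryPoint); if there is an entry class that does not implement org.apache.flink.api.common.Program, the other branch is taken (isUsingInteractiveMode). Let's start with the first branch:

When the entry class implements org.apache.flink.api.common.Program, Flink calls the overloaded run method: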

/**
	 * Runs a program on the Flink cluster to which this client is connected. The call blocks until the
	 * execution is complete, and returns afterwards.
	 *
	 * @param jobWithJars The program to be executed.
	 * @param parallelism The default parallelism to use when running the program. The default parallelism is used
	 *                    when the program does not set a parallelism by itself.
	 *
	 * @throws CompilerException Thrown, if the compiler encounters an illegal situation.
	 * @throws ProgramInvocationException Thrown, if the program could not be instantiated from its jar file,
	 *                                    or if the submission failed. That might be either due to an I/O problem,
	 *                                    i.e. the job-manager is unreachable, or due to the fact that the
	 *                                    parallel execution failed.
	 */
	public JobSubmissionResult run(JobWithJars jobWithJars, int parallelism, SavepointRestoreSettings savepointSettings)
			throws CompilerException, ProgramInvocationException {
		ClassLoader classLoader = jobWithJars.getUserCodeClassLoader();
		if (classLoader == null) {
			throw new IllegalArgumentException("The given JobWithJars does not provide a usercode class loader.");
		}

		OptimizedPlan optPlan = getOptimizedPlan(compiler, jobWithJars, parallelism);
		return run(optPlan, jobWithJars.getJarFiles(), jobWithJars.getClasspaths(), classLoader, savepointSettings);
	}

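According to its Javadoc, this call blocks until execution completes. It obtains the optimized execution plan and then calls another run overload: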

public JobSubmissionResult run(FlinkPlan compiledPlan,
			List<URL> libraries, List<URL> classpaths, ClassLoader classLoader, SavepointRestoreSettings savepointSettings)
			throws ProgramInvocationException {
		JobGraph job = getJobGraph(flinkConfig, compiledPlan, libraries, classpaths, savepointSettings);
		return submitJob(job, classLoader);
	}

This method builds the JobGraph and then calls submitJob. Let's keep following:

@Override
	public JobSubmissionResult submitJob(JobGraph jobGraph, ClassLoader classLoader) throws ProgramInvocationException {
		log.info("Submitting job {} (detached: {}).", jobGraph.getJobID(), isDetached());

		final CompletableFuture<JobSubmissionResult> jobSubmissionFuture = submitJob(jobGraph);

		if (isDetached()) {
			try {
				return jobSubmissionFuture.get();
			} catch (Exception e) {
				throw new ProgramInvocationException("Could not submit job",
					jobGraph.getJobID(), ExceptionUtils.stripExecutionException(e));
			}
		} else {
			final CompletableFuture<JobResult> jobResultFuture = jobSubmissionFuture.thenCompose(
				ignored -> requestJobResult(jobGraph.getJobID()));

			final JobResult jobResult;
			try {
				jobResult = jobResultFuture.get();
			} catch (Exception e) {
				throw new ProgramInvocationException("Could not retrieve the execution result.",
					jobGraph.getJobID(), ExceptionUtils.stripExecutionException(e));
			}

			try {
				this.lastJobExecutionResult = jobResult.toJobExecutionResult(classLoader);
				return lastJobExecutionResult;
			} catch (JobExecutionException e) {
				throw new ProgramInvocationException("Job failed.", jobGraph.getJobID(), e);
			} catch (IOException | ClassNotFoundException e) {
				throw new ProgramInvocationException("Job failed.", jobGraph.getJobID(), e);
			}
		}
	}

This method in turn calls the submitJob(JobGraph) overload. Let's keep tracing:

/**
	 * Submits the given {@link JobGraph} to the dispatcher.
	 *
	 * @param jobGraph to submit
	 * @return Future which is completed with the submission response
	 */
	@Override
	public CompletableFuture<JobSubmissionResult> submitJob(@Nonnull JobGraph jobGraph) {
		// we have to enable queued scheduling because slot will be allocated lazily
		jobGraph.setAllowQueuedScheduling(true);

		CompletableFuture<java.nio.file.Path> jobGraphFileFuture = CompletableFuture.supplyAsync(() -> {
			try {
				final java.nio.file.Path jobGraphFile = Files.createTempFile("flink-jobgraph", ".bin");
				try (ObjectOutputStream objectOut = new ObjectOutputStream(Files.newOutputStream(jobGraphFile))) {
					objectOut.writeObject(jobGraph);
				}
				return jobGraphFile;
			} catch (IOException e) {
				throw new CompletionException(new FlinkException("Failed to serialize JobGraph.", e));
			}
		}, executorService);

		CompletableFuture<Tuple2<JobSubmitRequestBody, Collection<FileUpload>>> requestFuture = jobGraphFileFuture.thenApply(jobGraphFile -> {
			List<String> jarFileNames = new ArrayList<>(8);
			List<JobSubmitRequestBody.DistributedCacheFile> artifactFileNames = new ArrayList<>(8);
			Collection<FileUpload> filesToUpload = new ArrayList<>(8);

			filesToUpload.add(new FileUpload(jobGraphFile, RestConstants.CONTENT_TYPE_BINARY));

			for (Path jar : jobGraph.getUserJars()) {
				jarFileNames.add(jar.getName());
				filesToUpload.add(new FileUpload(Paths.get(jar.toUri()), RestConstants.CONTENT_TYPE_JAR));
			}

			for (Map.Entry<String, DistributedCache.DistributedCacheEntry> artifacts : jobGraph.getUserArtifacts().entrySet()) {
				artifactFileNames.add(new JobSubmitRequestBody.DistributedCacheFile(artifacts.getKey(), new Path(artifacts.getValue().filePath).getName()));
				filesToUpload.add(new FileUpload(Paths.get(artifacts.getValue().filePath), RestConstants.CONTENT_TYPE_BINARY));
			}

			final JobSubmitRequestBody requestBody = new JobSubmitRequestBody(
				jobGraphFile.getFileName().toString(),
				jarFileNames,
				artifactFileNames);

			return Tuple2.of(requestBody, Collections.unmodifiableCollection(filesToUpload));
		});

		final CompletableFuture<JobSubmitResponseBody> submissionFuture = requestFuture.thenCompose(
			requestAndFileUploads -> sendRetriableRequest(
				JobSubmitHeaders.getInstance(),
				EmptyMessageParameters.getInstance(),
				requestAndFileUploads.f0,
				requestAndFileUploads.f1,
				isConnectionProblemOrServiceUnavailable())
		);

		submissionFuture
			.thenCombine(jobGraphFileFuture, (ignored, jobGraphFile) -> jobGraphFile)
			.thenAccept(jobGraphFile -> {
			try {
				Files.delete(jobGraphFile);
			} catch (IOException e) {
				log.warn("Could not delete temporary file {}.", jobGraphFile, e);
			}
		});

		return submissionFuture
			.thenApply(
				(JobSubmitResponseBody jobSubmitResponseBody) -> new JobSubmissionResult(jobGraph.getJobID()))
			.exceptionally(
				(Throwable throwable) -> {
					throw new CompletionException(new JobSubmissionException(jobGraph.getJobID(), "Failed to submit JobGraph.", ExceptionUtils.stripCompletionException(throwable)));
				});
	}

submitJob(JobGraph) is the method that finally submits the job. It mainly does the following:

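Write the JobGraph object to a local temporary file;
Collect the resource files that need to be uploaded to the JobManager (the serialized JobGraph, the user jars, and the artifacts);
Send the submit request with these files to the JobManager and finally return a JobSubmissionResult;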
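The three steps above can be sketched as a small CompletableFuture pipeline. This is a simplified model under stated assumptions: SubmitPipelineSketch is a made-up class, the payload stands in for the JobGraph, and the send function is a hypothetical transport that replaces the real HTTP upload.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;

/** Hypothetical model of the serialize / collect / send steps; not Flink code. */
final class SubmitPipelineSketch {

    static <R> CompletableFuture<R> submit(
            Serializable payload,
            List<Path> extraFiles,
            Function<List<Path>, CompletableFuture<R>> send) {

        // Step 1: serialize the payload to a temporary file, asynchronously
        CompletableFuture<Path> fileFuture = CompletableFuture.supplyAsync(() -> {
            try {
                Path file = Files.createTempFile("sketch-jobgraph", ".bin");
                try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
                    out.writeObject(payload);
                }
                return file;
            } catch (IOException e) {
                throw new CompletionException(e);
            }
        });

        // Step 2: collect every file that has to be uploaded
        CompletableFuture<List<Path>> uploadsFuture = fileFuture.thenApply(file -> {
            List<Path> uploads = new ArrayList<>();
            uploads.add(file);
            uploads.addAll(extraFiles);
            return uploads;
        });

        // Step 3: hand the files to the transport and wait for its response
        CompletableFuture<R> responseFuture = uploadsFuture.thenCompose(send);

        // After a successful send, delete the temporary file and pass the response on
        return responseFuture.thenCombine(fileFuture, (response, file) -> {
            try {
                Files.deleteIfExists(file);
            } catch (IOException e) {
                System.err.println("Could not delete temporary file " + file);
            }
            return response;
        });
    }

    public static void main(String[] args) {
        // the fake transport just reports how many files it was given
        int uploaded = submit(
            "demo payload",
            Collections.<Path>emptyList(),
            uploads -> CompletableFuture.completedFuture(uploads.size())).join();
        System.out.println("files handed to the transport: " + uploaded);
    }
}
```

As in the original method, the temporary file is only cleaned up on the success path of this sketch.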

That concludes the first branch. Now the second one, which ultimately calls:

prog.invokeInteractiveModeForExecution();

Here is its implementation:

/**
	 * This method assumes that the context environment is prepared, or the execution
	 * will be a local execution by default.
	 */
	public void invokeInteractiveModeForExecution() throws ProgramInvocationException{
		if (isUsingInteractiveMode()) {
			callMainMethod(mainClass, args);
		} else {
			throw new ProgramInvocationException("Cannot invoke a plan-based program directly.");
		}
	}

This method in turn calls callMainMethod. Let's keep tracing:

private static void callMainMethod(Class<?> entryClass, String[] args) throws ProgramInvocationException {
			Method mainMethod;
			if (!Modifier.isPublic(entryClass.getModifiers())) {
				throw new ProgramInvocationException("The class " + entryClass.getName() + " must be public.");
			}

			try {
				mainMethod = entryClass.getMethod("main", String[].class);
			} catch (NoSuchMethodException e) {
				throw new ProgramInvocationException("The class " + entryClass.getName() + " has no main(String[]) method.");
			}
			catch (Throwable t) {
				throw new ProgramInvocationException("Could not look up the main(String[]) method from the class " +
					entryClass.getName() + ": " + t.getMessage(), t);
			}

			if (!Modifier.isStatic(mainMethod.getModifiers())) {
				throw new ProgramInvocationException("The class " + entryClass.getName() + " declares a non-static main method.");
			}
			if (!Modifier.isPublic(mainMethod.getModifiers())) {
				throw new ProgramInvocationException("The class " + entryClass.getName() + " declares a non-public main method.");
			}

			try {
				mainMethod.invoke(null, (Object) args);
			}
			catch (IllegalArgumentException e) {
				throw new ProgramInvocationException("Could not invoke the main method, arguments are not matching.", e);
			}
			catch (IllegalAccessException e) {
				throw new ProgramInvocationException("Access to the main method was denied: " + e.getMessage(), e);
			}
			catch (InvocationTargetException e) {
				Throwable exceptionInMethod = e.getTargetException();
				if (exceptionInMethod instanceof Error) {
					throw (Error) exceptionInMethod;
				} else if (exceptionInMethod instanceof ProgramParametrizationException) {
					throw (ProgramParametrizationException) exceptionInMethod;
				} else if (exceptionInMethod instanceof ProgramInvocationException) {
					throw (ProgramInvocationException) exceptionInMethod;
				} else {
					throw new ProgramInvocationException("The main method caused an error: " + exceptionInMethod.getMessage(), exceptionInMethod);
				}
			}
			catch (Throwable t) {
			throw new ProgramInvocationException("An error occurred while invoking the program's main method: " + t.getMessage(), t);
		}
	}

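callMainMethod finally invokes the main method of the job via reflection, which means the user's main method runs directly on the client side. As the Javadoc of invokeInteractiveModeForExecution notes, this assumes the context environment has been prepared beforehand (run does so through ContextEnvironment.setAsContext); otherwise the execution would be a local one by default.

That concludes this analysis of the job submission flow in flink-yarn-session shared mode.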
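The core of callMainMethod, stripped of its detailed error mapping, can be sketched as follows. MainInvokerSketch and DemoJob are hypothetical names used only for this example; the real method distinguishes many more failure cases.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical, reduced version of the reflective main(String[]) invocation. */
final class MainInvokerSketch {

    /** Stand-in for a user program; it just records its first argument. */
    public static final class DemoJob {
        static String lastArg;

        public static void main(String[] args) {
            lastArg = args.length > 0 ? args[0] : null;
        }
    }

    static void callMain(Class<?> entryClass, String[] args) {
        try {
            // getMethod only finds public methods, so a non-public main fails here
            Method mainMethod = entryClass.getMethod("main", String[].class);
            if (!Modifier.isStatic(mainMethod.getModifiers())) {
                throw new IllegalArgumentException(
                    "The class " + entryClass.getName() + " declares a non-static main method.");
            }
            // cast to Object so the array is passed as one argument, not expanded as varargs
            mainMethod.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not invoke main of " + entryClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        callMain(DemoJob.class, new String[] {"hello"});
        System.out.println("DemoJob received: " + DemoJob.lastArg);
    }
}
```

The (Object) cast mirrors the original code: without it, the String[] would be spread across the varargs parameter of invoke and the call would fail with an argument mismatch.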