Preface
Flink on YARN has two main deployment modes:
1. Session mode
2. Per-job mode (a single job submitted directly to YARN)
This article focuses on the second mode and walks through the flow of submitting a job to YARN.
For details, see:
https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/yarn_setup.html
The launch command is:
./bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar
The code I am reading is flink-1.8-SNAPSHOT, the latest version cloned from Git.
Entry point
Looking at the contents of bin/flink, the class that is ultimately executed is the one below; we start from its main method.
org.apache.flink.client.cli.CliFrontend
parseParameters dispatches each action to a different method. Since ./bin/flink was followed by run, the action here is run, so we enter the run method.
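The dispatch in parseParameters can be sketched as a simple switch on the first argument. This is a simplified, self-contained model, not Flink's actual code; the action names mirror real CLI actions, and the string return values here are just labels for illustration:

```java
// Hypothetical sketch of CliFrontend#parseParameters dispatch: the first
// CLI argument selects the action, and each action maps to a handler method.
public class CliDispatchSketch {
    public static String dispatch(String[] args) {
        if (args.length < 1) {
            return "usage";
        }
        switch (args[0]) {
            case "run":    return "run";            // -> CliFrontend#run
            case "list":   return "list";           // -> CliFrontend#list
            case "cancel": return "cancel";         // -> CliFrontend#cancel
            default:       return "unknown action";
        }
    }

    public static void main(String[] args) {
        // "./bin/flink run -m yarn-cluster ..." puts "run" first,
        // so the run path is taken
        System.out.println(dispatch(new String[] {"run", "-m", "yarn-cluster"}));
    }
}
```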
Running the program enters the run method:
final CustomCommandLine<?> customCommandLine = getActiveCustomCommandLine(commandLine);
Based on the command-line arguments, this returns FlinkYarnSessionCli, a subclass of CustomCommandLine.
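How the active command line is chosen can be sketched as follows. This is a hedged, self-contained model, not Flink's API: each registered command line is asked in order whether it is active, the first match wins, and a default acts as the always-active fallback (in Flink, FlinkYarnSessionCli matches "-m yarn-cluster" style arguments and DefaultCLI is the fallback):

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of CliFrontend#getActiveCustomCommandLine:
// iterate candidates, return the first whose isActive(...) is true.
public class ActiveCliSketch {
    interface CustomCommandLine {
        boolean isActive(List<String> args);
        String name();
    }

    static final CustomCommandLine YARN_CLI = new CustomCommandLine() {
        public boolean isActive(List<String> args) {
            int i = args.indexOf("-m");
            return i >= 0 && i + 1 < args.size()
                && "yarn-cluster".equals(args.get(i + 1));
        }
        public String name() { return "FlinkYarnSessionCli"; }
    };

    static final CustomCommandLine DEFAULT_CLI = new CustomCommandLine() {
        public boolean isActive(List<String> args) { return true; } // fallback
        public String name() { return "DefaultCLI"; }
    };

    static String activeCli(List<String> args) {
        for (CustomCommandLine cli : Arrays.asList(YARN_CLI, DEFAULT_CLI)) {
            if (cli.isActive(args)) {
                return cli.name();
            }
        }
        throw new IllegalStateException("no active command line");
    }

    public static void main(String[] args) {
        System.out.println(activeCli(Arrays.asList("run", "-m", "yarn-cluster")));
        System.out.println(activeCli(Arrays.asList("run")));
    }
}
```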
Entering runProgram:
private <T> void runProgram(
		CustomCommandLine<T> customCommandLine,
		CommandLine commandLine,
		RunOptions runOptions,
		PackagedProgram program) throws ProgramInvocationException, FlinkException {
	// customCommandLine is the FlinkYarnSessionCli instance from above; its
	// createClusterDescriptor method returns a YarnClusterDescriptor, an
	// implementation of ClusterDescriptor
	final ClusterDescriptor<T> clusterDescriptor = customCommandLine.createClusterDescriptor(commandLine);
	try {
		final T clusterId = customCommandLine.getClusterId(commandLine);
		final ClusterClient<T> client;
		...
		// deploy the cluster
		final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
		client = clusterDescriptor.deploySessionCluster(clusterSpecification);
Inside this method we obtain an AbstractYarnClusterDescriptor:
org.apache.flink.yarn.cli.FlinkYarnSessionCli#createDescriptor
constructs the AbstractYarnClusterDescriptor, sets its parameters, and returns the instance:
private AbstractYarnClusterDescriptor createDescriptor(
		Configuration configuration,
		YarnConfiguration yarnConfiguration,
		String configurationDirectory,
		CommandLine cmd) {
	AbstractYarnClusterDescriptor yarnClusterDescriptor = getClusterDescriptor(
		configuration,
		yarnConfiguration,
		configurationDirectory);
	...
	return yarnClusterDescriptor;
getClusterDescriptor constructs a YarnClient, which is used to communicate with the YARN cluster and handle YARN application operations:
private AbstractYarnClusterDescriptor getClusterDescriptor(
		Configuration configuration,
		YarnConfiguration yarnConfiguration,
		String configurationDirectory) {
	final YarnClient yarnClient = YarnClient.createYarnClient();
	yarnClient.init(yarnConfiguration);
	yarnClient.start();
	return new YarnClusterDescriptor(
		configuration,
		yarnConfiguration,
		configurationDirectory,
		yarnClient,
		false);
}
Starting the Flink cluster:
final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
client = clusterDescriptor.deploySessionCluster(clusterSpecification);
Tracing the code further, we end up in org.apache.flink.yarn.AbstractYarnClusterDescriptor#deployInternal.
This method requests resources from YARN and starts the ApplicationMaster/JobManager.
It does quite a lot: a series of checks and preparations, such as validating the configuration files and the YARN cluster's resources, before calling startAppMaster to start the ApplicationMaster/JobManager.
startAppMaster is roughly 400 lines long and performs extensive checking and initialization work, such as uploading the jar files, before finally submitting the application through the YARN client:
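The kind of resource sanity check performed before submission can be illustrated with a minimal sketch. The method name validate and all the numbers here are hypothetical; the real deployInternal compares the ClusterSpecification against, among other things, the YARN maximum container allocation:

```java
// Hypothetical sketch: reject a cluster specification whose container
// memory requests exceed what YARN can allocate.
public class ResourceCheckSketch {
    static void validate(int jmMemoryMb, int tmMemoryMb, int yarnMaxAllocationMb) {
        if (jmMemoryMb > yarnMaxAllocationMb) {
            throw new IllegalArgumentException(
                "JobManager memory " + jmMemoryMb
                + "MB exceeds YARN max " + yarnMaxAllocationMb + "MB");
        }
        if (tmMemoryMb > yarnMaxAllocationMb) {
            throw new IllegalArgumentException(
                "TaskManager memory " + tmMemoryMb
                + "MB exceeds YARN max " + yarnMaxAllocationMb + "MB");
        }
    }

    public static void main(String[] args) {
        validate(1024, 4096, 8192);      // fits, passes silently
        try {
            validate(1024, 16384, 8192); // TaskManager request too large
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```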
LOG.info("Submitting application master " + appId);
yarnClient.submitApplication(appContext);
Executing the user job
Back in CliFrontend's runProgram method: after the cluster has been started by the steps above, the user's job is actually executed:
executeProgram(program, client, userParallelism);
This in turn calls the run method
org.apache.flink.client.program.ClusterClient#run(org.apache.flink.client.program.PackagedProgram, int)
which finally executes the user program via reflection:
// invoke main method
prog.invokeInteractiveModeForExecution();
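The reflection mechanism behind invokeInteractiveModeForExecution can be shown with a self-contained sketch. UserJob here is a hypothetical stand-in for the user's entry class; in Flink, PackagedProgram resolves the real entry class name from the -c option or the jar's Main-Class manifest attribute:

```java
import java.lang.reflect.Method;

// Minimal sketch of invoking a program's main(String[]) via reflection,
// the same mechanism PackagedProgram uses for the user's jar.
public class ReflectiveMainSketch {
    // stands in for the user's entry class inside the jar
    public static class UserJob {
        public static void main(String[] args) {
            System.out.println("user job running with " + args.length + " args");
        }
    }

    public static void main(String[] args) throws Exception {
        // load the entry class by name, as would be done for a jar's manifest entry
        Class<?> entryClass = Class.forName(UserJob.class.getName());
        Method mainMethod = entryClass.getMethod("main", String[].class);
        // cast to Object so the String[] is passed as one argument,
        // not expanded as varargs
        mainMethod.invoke(null, (Object) new String[] {"--input", "file:///tmp/in"});
    }
}
```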
The cluster receives the user job
The user's job is only truly submitted when execute is finally called.
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
First a StreamExecutionEnvironment is constructed; the StreamExecutionEnvironment class has the following subclasses.
(Figure: class hierarchy showing the subclasses of StreamExecutionEnvironment; original image unavailable.)
Taking RemoteStreamEnvironment as an example, we enter its execute method:
@Override
public JobExecutionResult execute(String jobName) throws ProgramInvocationException {
	StreamGraph streamGraph = getStreamGraph();
	streamGraph.setJobName(jobName);
	transformations.clear();
	return executeRemotely(streamGraph, jarFiles);
}
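The pattern in execute above (accumulate transformations as the pipeline is built, snapshot them into a graph, clear the buffer, then submit) can be modeled with a toy class. This is a hypothetical sketch, not Flink API; strings stand in for transformations and the graph:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the environment's execute() contract: transformations
// collected during pipeline construction are turned into a graph exactly
// once, and the buffer is cleared so the environment can be reused.
public class ExecuteSketch {
    private final List<String> transformations = new ArrayList<>();

    void addTransformation(String t) {
        transformations.add(t);
    }

    String execute(String jobName) {
        // snapshot the accumulated transformations into a "graph"
        String graph = jobName + ": " + String.join(" -> ", transformations);
        // clear, mirroring transformations.clear() in the real execute()
        transformations.clear();
        return graph; // the real code hands a StreamGraph to executeRemotely(...)
    }

    public static void main(String[] args) {
        ExecuteSketch env = new ExecuteSketch();
        env.addTransformation("source");
        env.addTransformation("map");
        env.addTransformation("sink");
        System.out.println(env.execute("WordCount"));
    }
}
```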
First a StreamGraph is built from the job information.
Continuing to trace the code, we enter ClusterClient's run method, which generates the JobGraph:
public JobSubmissionResult run(FlinkPlan compiledPlan,
		List<URL> libraries, List<URL> classpaths, ClassLoader classLoader, SavepointRestoreSettings savepointSettings)
		throws ProgramInvocationException {
	JobGraph job = getJobGraph(flinkConfig, compiledPlan, libraries, classpaths, savepointSettings);
	return submitJob(job, classLoader);
}
We then enter the ClusterClient subclass method
org.apache.flink.yarn.YarnClusterClient#submitJob, which checks which of the two execution modes is in use (detached or not) and calls a different method for each:
@Override
public JobSubmissionResult submitJob(JobGraph jobGraph, ClassLoader classLoader) throws ProgramInvocationException {
	if (isDetached()) {
		if (newlyCreatedCluster) {
			stopAfterJob(jobGraph.getJobID());
		}
		return super.runDetached(jobGraph, classLoader);
	} else {
		return super.run(jobGraph, classLoader);
	}
}
Finally, the job is sent to the JobManager server through the Akka client.
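The detached/attached branching in submitJob above can be reduced to a small self-contained sketch; the returned strings are illustrative labels for the two code paths, not real return values:

```java
// Hypothetical sketch of the submitJob decision: detached submission
// returns immediately (optionally tearing the cluster down after the job),
// while attached submission blocks until the job finishes.
public class SubmitModeSketch {
    static String submit(boolean detached, boolean newlyCreatedCluster) {
        if (detached) {
            // mirrors stopAfterJob(...) for clusters created just for this job
            String note = newlyCreatedCluster ? "stop-after-job, " : "";
            return note + "runDetached: submit and return immediately";
        }
        return "run: submit and block until the job finishes";
    }

    public static void main(String[] args) {
        System.out.println(submit(true, true));
        System.out.println(submit(false, false));
    }
}
```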