Flink Deep Dive (07): The Call Path of StreamExecutionEnvironment.execute(), Part 01

Today we begin analyzing how the first of Flink's graphs, the StreamGraph, is generated. The StreamGraph-related code lives mainly in the org.apache.flink.streaming.api.graph package, and the entry point for building the graph is StreamGraphGenerator.generate(). That method is reached from StreamExecutionEnvironment.execute(), the method that triggers program execution. In other words, the StreamGraph is constructed on the Client side, which means we can watch its construction locally in a debugger.
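To make this concrete, here is a minimal job sketch (hypothetical code; the host and port are arbitrary). Everything before env.execute(...) merely records transformations, so setting a breakpoint inside StreamExecutionEnvironment.execute() and running this main method in the IDE lets us step through the StreamGraph construction:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamGraphDebugJob {
	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		// Declaring the pipeline only records Transformations; nothing runs yet.
		env.socketTextStream("localhost", 9000, "\n").print();

		// execute() is where the StreamGraph is built on the client and the job is submitted.
		env.execute("stream-graph-debug");
	}
}

With that in mind, let's start reading the source from StreamExecutionEnvironment.execute(); the code is as follows: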

/**
	 * Triggers the program execution. The environment will execute all parts of
	 * the program that have resulted in a "sink" operation. Sink operations are
	 * for example printing results or forwarding them to a message queue.
	 *
	 * <p>The program execution will be logged and displayed with the provided name
	 *
	 * @param jobName
	 * 		Desired name of the job
	 * @return The result of the job execution, containing elapsed time and accumulators.
	 * @throws Exception which occurs during job execution.
	 */
	public JobExecutionResult execute(String jobName) throws Exception {
		Preconditions.checkNotNull(jobName, "Streaming Job name should not be null.");

		return execute(getStreamGraph(jobName));
	}

Here jobName must not be null. Next we step into getStreamGraph; its code is as follows:

/**
	 * Getter of the {@link org.apache.flink.streaming.api.graph.StreamGraph} of the streaming job.
	 *
	 * @param jobName Desired name of the job
	 * @return The streamgraph representing the transformations
	 */
	@Internal
	public StreamGraph getStreamGraph(String jobName) {
		return getStreamGraphGenerator().setJobName(jobName).generate();
	}

First, getStreamGraphGenerator creates a StreamGraphGenerator object; its code is as follows:

private StreamGraphGenerator getStreamGraphGenerator() {
		if (transformations.size() <= 0) {
			throw new IllegalStateException("No operators defined in streaming topology. Cannot execute.");
		}
		return new StreamGraphGenerator(transformations, config, checkpointCfg)
			.setStateBackend(defaultStateBackend)
			.setChaining(isChainingEnabled)
			.setUserArtifacts(cacheFile)
			.setTimeCharacteristic(timeCharacteristic)
			.setDefaultBufferTimeout(bufferTimeout);
	}
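The setter chain above simply copies values that were configured on the environment earlier. At the user level, those values come from calls like the following (a hypothetical configuration sketch; the class name, checkpoint path, and cached-file path are made up for illustration):

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EnvConfigSketch {
	public static StreamExecutionEnvironment configure() throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		env.setStateBackend(new FsStateBackend("file:///tmp/checkpoints")); // -> setStateBackend(defaultStateBackend)
		env.disableOperatorChaining();                                      // -> setChaining(isChainingEnabled)
		env.registerCachedFile("file:///tmp/dict.txt", "dict");             // -> setUserArtifacts(cacheFile)
		env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);      // -> setTimeCharacteristic(timeCharacteristic)
		env.setBufferTimeout(100);                                          // -> setDefaultBufferTimeout(bufferTimeout), in milliseconds
		return env;
	}
}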

Here transformations has type List<Transformation<?>>, i.e., it is a collection of Transformation objects. Let's first look at the Transformation class itself and at how this list gets populated. The inheritance hierarchy of Transformation is shown below:

[Figure: Transformation class inheritance hierarchy]

Methods on DataStream such as map, flatMap, filter, and window each create a Transformation object, and these objects are stored in the environment's List<Transformation<?>> (a small example follows the next figure). Next, let's analyze the DataStream class; its inheritance structure is shown below:

[Figure: DataStream class inheritance hierarchy]
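To watch this list being filled, we can build a small pipeline and print the plan derived from it. The sketch below is hypothetical user code; each operator call on the stream (map, filter, print) creates a Transformation and registers it with the environment, and getExecutionPlan() returns a JSON description of the StreamGraph that would be generated from those transformations:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TransformationListDemo {
	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		// map, filter and the print sink each append a Transformation to the environment's list
		env.socketTextStream("localhost", 9000, "\n")
			.map(value -> value.trim())
			.filter(value -> !value.isEmpty())
			.print();

		// JSON view of the StreamGraph built from the accumulated transformations
		System.out.println(env.getExecutionPlan());
	}
}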

Let's first look at the map function; its code is as follows:

/**
	 * Applies a Map transformation on a {@link DataStream}. The transformation
	 * calls a {@link MapFunction} for each element of the DataStream. Each
	 * MapFunction call returns exactly one element. The user can also extend
	 * {@link RichMapFunction} to gain access to other features provided by the
	 * {@link org.apache.flink.api.common.functions.RichFunction} interface.
	 *
	 * @param mapper
	 *            The MapFunction that is called for each element of the
	 *            DataStream.
	 * @param <R>
	 *            output type
	 * @return The transformed {@link DataStream}.
	 */
	public <R> SingleOutputStreamOperator<R> map(MapFunction<T, R> mapper) {
        // extract the return type of mapper via Java reflection
		TypeInformation<R> outType = TypeExtractor.getMapReturnTypes(clean(mapper), getType(),
				Utils.getCallLocationName(), true);
        // return a new DataStream; StreamMap is an implementation of StreamOperator
		return transform("Map", outType, new StreamMap<>(clean(mapper)));
	}

From the function above we can see that it takes a generic MapFunction<T, R> as input and returns a SingleOutputStreamOperator<R>. Let's first look at the clean function called in the first line of the method; its code is as follows:

/**
	 * Invokes the {@link org.apache.flink.api.java.ClosureCleaner}
	 * on the given function if closure cleaning is enabled in the {@link ExecutionConfig}.
	 *
	 * @return The cleaned Function
	 */
	protected <F> F clean(F f) {
        // delegate to the clean method of the execution environment
		return getExecutionEnvironment().clean(f);
	}

From the code above we can see that it first obtains the execution environment (the same environment the stream was created from, e.g. via DataStreamSource<String> text = env.socketTextStream("localhost", port, "\n");) and then delegates to that environment's clean function, whose code is as follows:

/**
	 * Returns a "closure-cleaned" version of the given function. Cleans only if closure cleaning
	 * is not disabled in the {@link org.apache.flink.api.common.ExecutionConfig}
	 */
	@Internal
	public <F> F clean(F f) {
        // check the config to see whether references to enclosing classes should be cleaned from f
		if (getConfig().isClosureCleanerEnabled()) {
            // remove references from f to its enclosing class so that serialization does not fail
			ClosureCleaner.clean(f, getConfig().getClosureCleanerLevel(), true);
		}
     
		ClosureCleaner.ensureSerializable(f);
		return f;
	}
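So whether cleaning happens at all is governed by the ExecutionConfig (it can be switched off via env.getConfig().disableClosureCleaner()), while ensureSerializable is always called afterwards. The situation the cleaner addresses can be sketched like this (hypothetical user code, not taken from the article's job):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;

// Hypothetical outer class that is NOT serializable.
public class ClosureCleaningDemo {
	private final Object nonSerializableState = new Object();

	public DataStream<String> upperCase(DataStream<String> lines) {
		// The anonymous inner class below implicitly holds a reference to
		// ClosureCleaningDemo.this. Because the function never actually uses the
		// outer instance, the closure cleaner can null that reference out, and
		// ensureSerializable(f) then succeeds. With the cleaner disabled, the same
		// call would fail because nonSerializableState cannot be serialized.
		return lines.map(new MapFunction<String, String>() {
			@Override
			public String map(String value) {
				return value.toUpperCase();
			}
		});
	}
}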

Inside StreamExecutionEnvironment.clean we saw the call to ClosureCleaner.clean; stepping into that function, its code is as follows:

/**
	 * Tries to clean the closure of the given object, if the object is a non-static inner
	 * class.
	 *
	 * @param func The object whose closure should be cleaned.
	 * @param level the clean up level.
	 * @param checkSerializable Flag to indicate whether serializability should be checked after
	 *                          the closure cleaning attempt.
	 *
	 * @throws InvalidProgramException Thrown, if 'checkSerializable' is true, and the object was
	 *                                 not serializable after the closure cleaning.
	 *
	 * @throws RuntimeException A RuntimeException may be thrown, if the code of the class could not
	 *                          be loaded, in order to process during the closure cleaning.
	 */
	public static void clean(Object func, ExecutionConfig.ClosureCleanerLevel level, boolean checkSerializable) {
		clean(func, level, checkSerializable, Collections.newSetFromMap(new IdentityHashMap<>()));
	}

Inside, it calls a four-argument overload of clean; we will dig into that in the next article.
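Even though the recursive details are deferred, the core idea of closure cleaning can be illustrated with a deliberately naive sketch (this is not Flink's actual implementation): for an instance of a non-static inner class, javac generates a synthetic field such as this$0 that points to the enclosing instance; if the function never uses that instance, nulling the field out is what makes the object serializable.

import java.lang.reflect.Field;

public class NaiveClosureCleaner {
	// Naive illustration only: Flink's ClosureCleaner additionally checks whether the
	// outer reference is actually used, recurses according to the ClosureCleanerLevel,
	// and can verify serializability afterwards.
	public static void cleanOuterReference(Object func) throws Exception {
		for (Field field : func.getClass().getDeclaredFields()) {
			if (field.getName().startsWith("this$")) {
				field.setAccessible(true);
				field.set(func, null); // drop the reference to the enclosing instance
			}
		}
	}
}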
