Today we begin analyzing the generation of the StreamGraph, the first of Flink's execution graphs. The relevant code lives mainly in the org.apache.flink.streaming.api.graph
package. The entry point for constructing a StreamGraph is StreamGraphGenerator.generate(env, transformations),
which is reached from StreamExecutionEnvironment.execute(), the method that triggers program execution.
In other words, the StreamGraph is built on the Client side, which means we can observe its construction locally with a debugger. Let's dig into the source, starting from StreamExecutionEnvironment.execute():
/**
* Triggers the program execution. The environment will execute all parts of
* the program that have resulted in a "sink" operation. Sink operations are
* for example printing results or forwarding them to a message queue.
*
* <p>The program execution will be logged and displayed with the provided name
*
* @param jobName
* Desired name of the job
* @return The result of the job execution, containing elapsed time and accumulators.
* @throws Exception which occurs during job execution.
*/
public JobExecutionResult execute(String jobName) throws Exception {
    Preconditions.checkNotNull(jobName, "Streaming Job name should not be null.");
    return execute(getStreamGraph(jobName));
}
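The shape of this entry path, execute(jobName) validating the name and delegating to a builder-style generator, can be sketched with simplified stand-in classes. Everything below (ExecuteEntryDemo, Generator, the toy StreamGraph) is hypothetical scaffolding, not Flink's real API:

```java
import java.util.Objects;

public class ExecuteEntryDemo {
    // Hypothetical stand-in for Flink's StreamGraph.
    static class StreamGraph {
        final String jobName;
        StreamGraph(String jobName) { this.jobName = jobName; }
    }

    // Hypothetical stand-in for StreamGraphGenerator's fluent setters.
    static class Generator {
        private String jobName;
        Generator setJobName(String jobName) { this.jobName = jobName; return this; }
        StreamGraph generate() { return new StreamGraph(jobName); }
    }

    static StreamGraph getStreamGraph(String jobName) {
        return new Generator().setJobName(jobName).generate();
    }

    static String execute(String jobName) {
        // Mirrors Preconditions.checkNotNull in the real execute().
        Objects.requireNonNull(jobName, "Streaming Job name should not be null.");
        return getStreamGraph(jobName).jobName;
    }

    public static void main(String[] args) {
        System.out.println(execute("word-count")); // word-count
    }
}
```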
Note that jobName must not be null. Next we step into getStreamGraph:
/**
* Getter of the {@link org.apache.flink.streaming.api.graph.StreamGraph} of the streaming job.
*
* @param jobName Desired name of the job
* @return The streamgraph representing the transformations
*/
@Internal
public StreamGraph getStreamGraph(String jobName) {
    return getStreamGraphGenerator().setJobName(jobName).generate();
}
getStreamGraphGenerator first creates a StreamGraphGenerator object:
private StreamGraphGenerator getStreamGraphGenerator() {
    if (transformations.size() <= 0) {
        throw new IllegalStateException("No operators defined in streaming topology. Cannot execute.");
    }
    return new StreamGraphGenerator(transformations, config, checkpointCfg)
            .setStateBackend(defaultStateBackend)
            .setChaining(isChainingEnabled)
            .setUserArtifacts(cacheFile)
            .setTimeCharacteristic(timeCharacteristic)
            .setDefaultBufferTimeout(bufferTimeout);
}
Here transformations is of type List<Transformation<?>>, a collection of Transformation objects. Before going further, let's look at the Transformation class and how this list gets populated. The Transformation class hierarchy is shown below:
![Transformation class hierarchy](https://img-blog.csdnimg.cn/20191221142100517.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl8zOTkzNTg4Nw==,size_16,color_FFFFFF,t_70)
DataStream methods such as map, flatMap, filter, and window each create a Transformation object, and these objects are stored in the environment's List<Transformation<?>>. Let's first examine the DataStream class; its inheritance structure is:
![DataStream class hierarchy](https://img-blog.csdnimg.cn/20191221214256621.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl8zOTkzNTg4Nw==,size_16,color_FFFFFF,t_70)
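The bookkeeping just described, where each operator call appends one Transformation to a list shared through the environment, can be sketched with simplified stand-in classes. The names Env, Stream, and Transformation below are hypothetical, not Flink's real classes:

```java
import java.util.ArrayList;
import java.util.List;

public class TransformationListDemo {
    // Toy stand-in for Flink's Transformation: just remembers its name.
    static class Transformation {
        final String name;
        Transformation(String name) { this.name = name; }
    }

    // Toy stand-in for the execution environment holding the shared list.
    static class Env {
        final List<Transformation> transformations = new ArrayList<>();
    }

    // Toy stand-in for DataStream: each operator registers a Transformation
    // on the shared environment and returns a new stream.
    static class Stream {
        final Env env;
        Stream(Env env) { this.env = env; }

        Stream transform(String name) {
            env.transformations.add(new Transformation(name));
            return new Stream(env);
        }

        Stream map()    { return transform("Map"); }
        Stream filter() { return transform("Filter"); }
    }

    public static void main(String[] args) {
        Env env = new Env();
        new Stream(env).map().filter().map();
        // Three operator calls -> three entries in the environment's list.
        System.out.println(env.transformations.size()); // 3
    }
}
```

This is the list that getStreamGraphGenerator later hands to the StreamGraphGenerator constructor.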
We start with the map function:
/**
* Applies a Map transformation on a {@link DataStream}. The transformation
* calls a {@link MapFunction} for each element of the DataStream. Each
* MapFunction call returns exactly one element. The user can also extend
* {@link RichMapFunction} to gain access to other features provided by the
* {@link org.apache.flink.api.common.functions.RichFunction} interface.
*
* @param mapper
* The MapFunction that is called for each element of the
* DataStream.
* @param <R>
* output type
* @return The transformed {@link DataStream}.
*/
public <R> SingleOutputStreamOperator<R> map(MapFunction<T, R> mapper) {
    // Extract the mapper's return type via Java reflection
    TypeInformation<R> outType = TypeExtractor.getMapReturnTypes(clean(mapper), getType(),
            Utils.getCallLocationName(), true);
    // Return a new DataStream; StreamMap is an implementation of StreamOperator
    return transform("Map", outType, new StreamMap<>(clean(mapper)));
}
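The TypeExtractor call above recovers the mapper's output type R by reflection. Flink's real TypeExtractor is far more elaborate, but the core reflection trick, reading the actual type arguments off a class that implements a generic interface, can be shown in isolation. MapFn, LenFn, and extractReturnType below are hypothetical stand-ins:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class ReturnTypeDemo {
    // Toy stand-in for MapFunction<T, R>.
    interface MapFn<T, R> { R map(T value); }

    // A concrete class (not a lambda): the generic type arguments survive
    // erasure in the class metadata and remain visible to reflection.
    static class LenFn implements MapFn<String, Integer> {
        public Integer map(String value) { return value.length(); }
    }

    static Type extractReturnType(Object fn) {
        for (Type iface : fn.getClass().getGenericInterfaces()) {
            if (iface instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) iface;
                if (p.getRawType() == MapFn.class) {
                    // Type arguments are [T, R]; index 1 is the return type R.
                    return p.getActualTypeArguments()[1];
                }
            }
        }
        throw new IllegalStateException("no generic interface info found");
    }

    public static void main(String[] args) {
        System.out.println(extractReturnType(new LenFn())); // class java.lang.Integer
    }
}
```

Note this simplified version only works when the function is a class implementing the interface directly; Flink's TypeExtractor also handles lambdas, type hierarchies, and type hints.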
As the code shows, map takes a MapFunction<T, R> and returns a SingleOutputStreamOperator<R>. Let's first look at the clean function invoked on the function's first line:
/**
* Invokes the {@link org.apache.flink.api.java.ClosureCleaner}
* on the given function if closure cleaning is enabled in the {@link ExecutionConfig}.
*
* @return The cleaned Function
*/
protected <F> F clean(F f) {
    // Delegate to the clean function of the execution environment
    return getExecutionEnvironment().clean(f);
}
The code first obtains the execution environment (the same one the stream was created from, e.g. by DataStreamSource<String> text = env.socketTextStream("localhost", port, "\n");) and then calls that environment's clean function:
/**
* Returns a "closure-cleaned" version of the given function. Cleans only if closure cleaning
* is not disabled in the {@link org.apache.flink.api.common.ExecutionConfig}
*/
@Internal
public <F> F clean(F f) {
    // Check the config to see whether closure cleaning is enabled
    if (getConfig().isClosureCleanerEnabled()) {
        // Remove references to the enclosing class from f, to avoid serialization failures
        ClosureCleaner.clean(f, getConfig().getClosureCleanerLevel(), true);
    }
    ClosureCleaner.ensureSerializable(f);
    return f;
}
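Why is clearing the enclosing-class reference necessary at all? A function defined as an anonymous inner class silently captures a reference to its enclosing instance, and if that instance is not serializable, shipping the function to the cluster fails. The sketch below reproduces the problem with plain JDK serialization; it does not use Flink, and ClosureDemo, Fn, and isSerializable are hypothetical names:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureDemo {
    private int offset = 10; // reading this field forces capture of 'this'

    interface Fn extends Serializable { int apply(int x); }

    Fn capturing() {
        // Anonymous inner class: holds an implicit reference to the
        // enclosing ClosureDemo instance, which is NOT serializable.
        return new Fn() {
            public int apply(int x) { return x + offset; }
        };
    }

    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            // NotSerializableException lands here: the closure drags in
            // the non-serializable outer instance.
            return false;
        }
    }

    public static void main(String[] args) {
        Fn bad = new ClosureDemo().capturing();
        Fn good = x -> x + 10; // captures nothing from the enclosing scope
        System.out.println(isSerializable(bad));  // false
        System.out.println(isSerializable(good)); // true
    }
}
```

ClosureCleaner's job is to null out exactly this kind of unused enclosing reference so the first case also serializes.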
Here we see a call to ClosureCleaner.clean; stepping into it:
/**
* Tries to clean the closure of the given object, if the object is a non-static inner
* class.
*
* @param func The object whose closure should be cleaned.
* @param level the clean up level.
* @param checkSerializable Flag to indicate whether serializability should be checked after
* the closure cleaning attempt.
*
* @throws InvalidProgramException Thrown, if 'checkSerializable' is true, and the object was
* not serializable after the closure cleaning.
*
* @throws RuntimeException A RuntimeException may be thrown, if the code of the class could not
* be loaded, in order to process during the closure cleaning.
*/
public static void clean(Object func, ExecutionConfig.ClosureCleanerLevel level, boolean checkSerializable) {
    clean(func, level, checkSerializable, Collections.newSetFromMap(new IdentityHashMap<>()));
}
This in turn delegates to an overloaded clean function, which we will analyze in depth in the next article.
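One detail worth noting before we move on: the visited set passed to the overload is built with Collections.newSetFromMap(new IdentityHashMap<>()), so objects are deduplicated by reference identity rather than by equals(). This matters when walking an object graph, because two distinct fields that happen to be equal() must still both be cleaned. A minimal demonstration (the class and method names here are hypothetical):

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class VisitedSetDemo {
    // Same construction as in ClosureCleaner.clean: a Set view over an
    // IdentityHashMap compares elements with ==, not equals().
    static Set<Object> newVisitedSet() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }

    public static void main(String[] args) {
        Set<Object> visited = newVisitedSet();
        String a = new String("fn");
        String b = new String("fn"); // equals(a), but a distinct object
        visited.add(a);
        System.out.println(visited.contains(a)); // true: same instance
        System.out.println(visited.contains(b)); // false: identity, not equality
        // A plain HashSet would report true for both.
    }
}
```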