I. Class diagram of the main Transformation implementations
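1. The subclasses of Transformation cover all DataStream transformation operations. Commonly used operators such as StreamMap, StreamFlatMap, and StreamFilter are wrapped in OneInputTransformation, the usual single-input kind of transformation. Two-input operations, such as connect (followed by a co-operator) and interval join, correspond to TwoInputTransformation. Note that union is represented by its own UnionTransformation rather than by TwoInputTransformation.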
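The difference between the single-input and two-input kinds can be sketched with a small, Flink-free model. The class names below (SketchTransformation, SketchSource, SketchOneInput, SketchTwoInput, HierarchySketch) are hypothetical stand-ins and not Flink's real classes; the sketch only shows that each kind differs in how many upstream transformations it references.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for the Transformation base class.
abstract class SketchTransformation {
    final String name;

    SketchTransformation(String name) {
        this.name = name;
    }

    // Upstream transformations this node reads from.
    abstract List<SketchTransformation> getInputs();
}

// A source has no upstream input.
class SketchSource extends SketchTransformation {
    SketchSource(String name) {
        super(name);
    }

    @Override
    List<SketchTransformation> getInputs() {
        return Collections.emptyList();
    }
}

// Single-input kind (map, flatMap, filter in the text above).
class SketchOneInput extends SketchTransformation {
    private final SketchTransformation input;

    SketchOneInput(String name, SketchTransformation input) {
        super(name);
        this.input = input;
    }

    @Override
    List<SketchTransformation> getInputs() {
        return Collections.singletonList(input);
    }
}

// Two-input kind (for example a co-operator applied after connect).
class SketchTwoInput extends SketchTransformation {
    private final SketchTransformation input1;
    private final SketchTransformation input2;

    SketchTwoInput(String name, SketchTransformation input1, SketchTransformation input2) {
        super(name);
        this.input1 = input1;
        this.input2 = input2;
    }

    @Override
    List<SketchTransformation> getInputs() {
        return Arrays.asList(input1, input2);
    }
}

class HierarchySketch {
    static String describe(SketchTransformation t) {
        return t.name + " has " + t.getInputs().size() + " input(s)";
    }

    public static void main(String[] args) {
        SketchTransformation left = new SketchSource("left");
        SketchTransformation right = new SketchSource("right");
        System.out.println(describe(new SketchOneInput("Map", left)));
        System.out.println(describe(new SketchTwoInput("Co-Map", left, right)));
    }
}
```

Exposing the inputs as a list lets code that walks the graph treat every kind of node uniformly, whatever its arity.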
II. Analysis of the Transformation implementation classes
2.1 PhysicalTransformation
public abstract class PhysicalTransformation<T> extends Transformation<T> {

    private boolean supportsConcurrentExecutionAttempts = true;

    PhysicalTransformation(String name, TypeInformation<T> outputType, int parallelism) {
        super(name, outputType, parallelism);
    }

    PhysicalTransformation(
            String name,
            TypeInformation<T> outputType,
            int parallelism,
            boolean parallelismConfigured) {
        super(name, outputType, parallelism, parallelismConfigured);
    }

    /**
     * The ChainingStrategy enum:
     *
     * ALWAYS: the operator in this Transformation is merged with its upstream operator
     * whenever possible, so that different operators run in the same SubTask instance
     * and network transfer between operators is reduced.
     *
     * NEVER: the operator in this Transformation is never chained with its upstream or
     * downstream operators, so it runs in its own SubTask instance.
     *
     * HEAD: the operator in this Transformation is a head operator. It cannot be chained
     * to an upstream operator but can be chained with downstream operators. In practice
     * this is the head operator of an OperatorChain.
     */
    public abstract void setChainingStrategy(ChainingStrategy strategy);

    public boolean isSupportsConcurrentExecutionAttempts() {
        return supportsConcurrentExecutionAttempts;
    }

    public void setSupportsConcurrentExecutionAttempts(
            boolean supportsConcurrentExecutionAttempts) {
        this.supportsConcurrentExecutionAttempts = supportsConcurrentExecutionAttempts;
    }
}
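The three strategies in the comment above can be turned into a minimal decision rule. This is a sketch under assumptions: SketchChainingStrategy and ChainingSketch are hypothetical names, and the rule looks only at the two strategies on an edge. Flink's real check also considers other conditions, such as equal parallelism and a forward partitioner, which are ignored here.

```java
// Hypothetical, simplified model of the three strategies described above.
enum SketchChainingStrategy { ALWAYS, NEVER, HEAD }

class ChainingSketch {

    // Simplified rule for an edge upstream -> downstream:
    //  - the downstream operator must accept chaining to its upstream (ALWAYS only;
    //    HEAD and NEVER both refuse an upstream link);
    //  - the upstream operator must accept chaining to its downstream (anything but NEVER).
    static boolean canChain(SketchChainingStrategy upstream, SketchChainingStrategy downstream) {
        return downstream == SketchChainingStrategy.ALWAYS
                && upstream != SketchChainingStrategy.NEVER;
    }

    public static void main(String[] args) {
        // Print the decision for every combination of strategies.
        for (SketchChainingStrategy up : SketchChainingStrategy.values()) {
            for (SketchChainingStrategy down : SketchChainingStrategy.values()) {
                System.out.println(up + " -> " + down + " : " + canChain(up, down));
            }
        }
    }
}
```

Under this rule a HEAD operator can start a chain (HEAD -> ALWAYS is allowed) but never joins one from upstream, which matches the description of the head operator of an OperatorChain.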
2.2 OneInputTransformation
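A transformation between datasets with a single input and a single output, for example map, flatMap, and filter.
The OneInputTransformation class
Taking flatMap as an example of the conversion process: a call to the flatMap API supplies the operator name "Flat Map", the output type (outType), and a StreamOperator, which are used to populate the member variables of the transformation. The different APIs all converge on calls to transform() and doTransform().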
public <R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T, R> flatMapper) {
    TypeInformation<R> outType =
            TypeExtractor.getFlatMapReturnTypes(
                    clean(flatMapper), getType(), Utils.getCallLocationName(), true);
    return flatMap(flatMapper, outType);
}

public <R> SingleOutputStreamOperator<R> flatMap(
        FlatMapFunction<T, R> flatMapper, TypeInformation<R> outputType) {
    return transform("Flat Map", outputType, new StreamFlatMap<>(clean(flatMapper)));
}

public <R> SingleOutputStreamOperator<R> transform(
        String operatorName,
        TypeInformation<R> outTypeInfo,
        OneInputStreamOperator<T, R> operator) {
    return doTransform(operatorName, outTypeInfo, SimpleOperatorFactory.of(operator));
}
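OneInputTransformation has a member variable, Transformation input, which records the upstream Transformation that this Transformation was derived from. Every DataStream holds a Transformation object, which describes how that DataStream was obtained from its upstream DataStream.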
protected <R> SingleOutputStreamOperator<R> doTransform(
        String operatorName,
        TypeInformation<R> outTypeInfo,
        StreamOperatorFactory<R> operatorFactory) {

    // read the output type of the input Transform to coax out errors about MissingTypeInfo
    transformation.getOutputType();

    OneInputTransformation<T, R> resultTransform =
            new OneInputTransformation<>(
                    this.transformation,
                    operatorName,
                    operatorFactory,
                    outTypeInfo,
                    environment.getParallelism(),
                    false);

    @SuppressWarnings({"unchecked", "rawtypes"})
    SingleOutputStreamOperator<R> returnStream =
            new SingleOutputStreamOperator(environment, resultTransform);

    // Adds resultTransform to the transformations list of the StreamExecutionEnvironment.
    // That list collects the Transformation produced by each DataStream API call and so
    // links the business logic together. It is used later when the StreamGraph and
    // JobGraph are generated.
    getExecutionEnvironment().addOperator(resultTransform);

    return returnStream;
}
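The call flow above (API method, then transform(), then doTransform(), then registration in the environment) can be imitated with a small, Flink-free model. All names here (MiniTransformation, MiniEnvironment, MiniStream, TransformFlowSketch) are hypothetical stand-ins; user functions, type information, and operator factories are left out, so this is only a sketch of how the input links and the collected list are built.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Transformation node; not Flink's real class.
class MiniTransformation {
    final String name;
    final MiniTransformation input; // upstream node, null for a source
    final int parallelism;

    MiniTransformation(String name, MiniTransformation input, int parallelism) {
        this.name = name;
        this.input = input;
        this.parallelism = parallelism;
    }
}

// Hypothetical stand-in for the execution environment.
class MiniEnvironment {
    final List<MiniTransformation> transformations = new ArrayList<>();
    final int parallelism;

    MiniEnvironment(int parallelism) {
        this.parallelism = parallelism;
    }

    void addOperator(MiniTransformation t) {
        transformations.add(t);
    }

    // In this sketch the source is not registered; it stays reachable through input links.
    MiniStream source(String name) {
        return new MiniStream(this, new MiniTransformation(name, null, parallelism));
    }
}

// Hypothetical stand-in for DataStream: each stream holds the node that produced it.
class MiniStream {
    final MiniEnvironment env;
    final MiniTransformation transformation;

    MiniStream(MiniEnvironment env, MiniTransformation transformation) {
        this.env = env;
        this.transformation = transformation;
    }

    // The API methods differ only in the operator name they pass on.
    MiniStream flatMap() { return transform("Flat Map"); }
    MiniStream filter() { return transform("Filter"); }

    MiniStream transform(String operatorName) {
        return doTransform(operatorName);
    }

    private MiniStream doTransform(String operatorName) {
        // The new node points back at the current stream's node as its input.
        MiniTransformation result =
                new MiniTransformation(operatorName, this.transformation, env.parallelism);
        MiniStream returnStream = new MiniStream(env, result);
        env.addOperator(result); // collect the node for later graph generation
        return returnStream;
    }
}

class TransformFlowSketch {
    // Walks the input links from a node back to its source, returned in source-first order.
    static List<String> lineage(MiniTransformation t) {
        List<String> names = new ArrayList<>();
        for (MiniTransformation cur = t; cur != null; cur = cur.input) {
            names.add(0, cur.name);
        }
        return names;
    }

    public static void main(String[] args) {
        MiniEnvironment env = new MiniEnvironment(4);
        MiniStream out = env.source("Source").flatMap().filter();
        System.out.println(lineage(out.transformation));
        System.out.println(env.transformations.size() + " transformations collected");
    }
}
```

Because every node keeps a reference to its input, the collected list is enough to reconstruct the whole pipeline by walking backwards from each registered node.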
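The conversion process for the other PhysicalTransformation subclasses, such as TwoInputTransformation, SinkTransformation, and SourceTransformation, is largely the same and is not covered in further detail here.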