Source operator parallelism
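Source operators are a special case: a source built from a function that implements only the plain SourceFunction interface can run only with a parallelism of 1.
Trying to set any other parallelism on such a source throws an exception (an IllegalArgumentException raised by Preconditions.checkArgument). To make the source's parallelism configurable, the function must implement the ParallelSourceFunction interface.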
The code below is from Flink 1.15 (the DataStreamSource and OperatorValidationUtils classes):
public DataStreamSource(StreamExecutionEnvironment environment, TypeInformation<T> outTypeInfo, StreamSource<T, ?> operator, boolean isParallel, String sourceName, Boundedness boundedness) {
    super(environment, new LegacySourceTransformation(sourceName, operator, outTypeInfo, environment.getParallelism(), boundedness));
    this.isParallel = isParallel;
    if (!isParallel) {
        this.setParallelism(1); // a non-parallel source defaults to a parallelism of 1
    }
}
public DataStreamSource<T> setParallelism(int parallelism) {
    // check that the requested parallelism is legal for this source
    OperatorValidationUtils.validateParallelism(parallelism, this.isParallel);
    super.setParallelism(parallelism);
    return this;
}
public static void validateParallelism(int parallelism, boolean canBeParallel) {
    // an operator that cannot be parallel accepts only a parallelism of exactly 1
    Preconditions.checkArgument(canBeParallel || parallelism == 1, "The parallelism of non parallel operator must be 1.");
    Preconditions.checkArgument(parallelism > 0 || parallelism == -1, "The parallelism of an operator must be at least 1, or ExecutionConfig.PARALLELISM_DEFAULT (use system default).");
}
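To make the two checks concrete, here is a minimal standalone sketch that reproduces the same conditions without depending on Flink. The class name ParallelismCheckSketch, the helper isAccepted, and the local PARALLELISM_DEFAULT constant are assumptions made for illustration; only the two conditions and their messages come from the code quoted above.

```java
// Standalone sketch: mirrors the two checks in validateParallelism, no Flink dependency.
final class ParallelismCheckSketch {

    // Local stand-in for ExecutionConfig.PARALLELISM_DEFAULT; the quoted check compares against -1.
    static final int PARALLELISM_DEFAULT = -1;

    static void validateParallelism(int parallelism, boolean canBeParallel) {
        // First check: a non-parallel operator accepts exactly 1 and nothing else.
        if (!(canBeParallel || parallelism == 1)) {
            throw new IllegalArgumentException("The parallelism of non parallel operator must be 1.");
        }
        // Second check: the value must be positive or the "use system default" marker.
        if (!(parallelism > 0 || parallelism == PARALLELISM_DEFAULT)) {
            throw new IllegalArgumentException(
                    "The parallelism of an operator must be at least 1, or ExecutionConfig.PARALLELISM_DEFAULT (use system default).");
        }
    }

    // Hypothetical helper: turns the exception into a boolean so the cases are easy to compare.
    static boolean isAccepted(int parallelism, boolean canBeParallel) {
        try {
            validateParallelism(parallelism, canBeParallel);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccepted(1, false));  // true: non-parallel source at 1
        System.out.println(isAccepted(4, false));  // false: non-parallel source above 1
        System.out.println(isAccepted(-1, false)); // false: even the default marker is rejected
        System.out.println(isAccepted(4, true));   // true: parallel source
        System.out.println(isAccepted(-1, true));  // true: parallel source, system default
        System.out.println(isAccepted(0, true));   // false: fails the second check
    }
}
```

Note the third case: for a non-parallel source the first check rejects every value other than 1, including -1, so the restriction is stricter than "no more than 1".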
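As validateParallelism shows, everything hinges on the canBeParallel parameter. Its value is the isParallel flag, which is computed when we call addSource on StreamExecutionEnvironment: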
private <OUT> DataStreamSource<OUT> addSource(SourceFunction<OUT> function, String sourceName, @Nullable TypeInformation<OUT> typeInfo, Boundedness boundedness) {
    Preconditions.checkNotNull(function);
    Preconditions.checkNotNull(sourceName);
    Preconditions.checkNotNull(boundedness);
    TypeInformation<OUT> resolvedTypeInfo = this.getTypeInfo(function, sourceName, SourceFunction.class, typeInfo);
    boolean isParallel = function instanceof ParallelSourceFunction;
    this.clean(function);
    StreamSource<OUT, ?> sourceOperator = new StreamSource(function);
    return new DataStreamSource(this, resolvedTypeInfo, sourceOperator, isParallel, sourceName, boundedness);
}
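The sixth line of addSource (boolean isParallel = function instanceof ParallelSourceFunction;) checks whether the function is an instance of ParallelSourceFunction. If it is not, the source's parallelism can only be 1.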
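The instanceof decision can be sketched in isolation as follows. This is an illustration under stated assumptions, not Flink code: SourceFn and ParallelSourceFn are hypothetical stand-ins for SourceFunction and ParallelSourceFunction (the latter extends the former in Flink as well), and PlainSource, ShardedSource, and initialParallelism are invented for the example.

```java
// Standalone sketch of the marker-interface check performed in addSource.
final class SourceKindSketch {

    // Hypothetical stand-ins for Flink's SourceFunction and ParallelSourceFunction.
    interface SourceFn<T> { }
    interface ParallelSourceFn<T> extends SourceFn<T> { }

    // A source that implements only the base interface.
    static final class PlainSource implements SourceFn<String> { }

    // A source that opts in to parallel execution through the sub-interface.
    static final class ShardedSource implements ParallelSourceFn<String> { }

    // Same decision as `function instanceof ParallelSourceFunction` in addSource.
    static boolean isParallel(SourceFn<?> function) {
        return function instanceof ParallelSourceFn;
    }

    // Mirrors the DataStreamSource constructor: a non-parallel source is pinned to 1,
    // a parallel one keeps the environment's parallelism.
    static int initialParallelism(SourceFn<?> function, int environmentParallelism) {
        return isParallel(function) ? environmentParallelism : 1;
    }

    public static void main(String[] args) {
        System.out.println(initialParallelism(new PlainSource(), 8));   // 1
        System.out.println(initialParallelism(new ShardedSource(), 8)); // 8
    }
}
```

Because the flag depends only on which interface the function implements, switching a source from SourceFunction to ParallelSourceFunction is enough to lift the restriction; the function must then be written so that several instances can run concurrently.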