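Window is one of the core features of Flink stream processing. The previous article covered the concepts behind Window and how it is implemented. A window turns an unbounded stream into a bounded collection of elements, but that is not the end goal. The end goal is to apply a function to that finite collection, which is the topic of this article: WindowFunction.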
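So when is a window function applied? When a trigger is invoked, it returns one of the values of the TriggerResult enum. If that value is FIRE or FIRE_AND_PURGE, the window function is applied to the window.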
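To make the FIRE / FIRE_AND_PURGE rule concrete, here is a minimal sketch. `SketchTriggerResult` is a simplified stand-in modeled on Flink's `TriggerResult` (it is not the library source), and `describe` is a hypothetical helper that only spells out what the operator does for each value.

```java
// Simplified stand-in modeled on Flink's TriggerResult; not the library source.
enum SketchTriggerResult {
    CONTINUE(false, false),
    FIRE_AND_PURGE(true, true),
    FIRE(true, false),
    PURGE(false, true);

    private final boolean fire;
    private final boolean purge;

    SketchTriggerResult(boolean fire, boolean purge) {
        this.fire = fire;
        this.purge = purge;
    }

    boolean isFire() { return fire; }
    boolean isPurge() { return purge; }
}

class TriggerResultDemo {
    // Hypothetical helper: describes the action taken for a given result.
    static String describe(SketchTriggerResult r) {
        StringBuilder sb = new StringBuilder();
        if (r.isFire()) {
            sb.append("apply window function");
        }
        if (r.isPurge()) {
            if (sb.length() > 0) {
                sb.append(", then ");
            }
            sb.append("clear window contents");
        }
        return sb.length() == 0 ? "do nothing" : sb.toString();
    }

    public static void main(String[] args) {
        for (SketchTriggerResult r : SketchTriggerResult.values()) {
            System.out.println(r + " -> " + describe(r));
        }
    }
}
```

Only the two values whose `isFire()` is true lead to the window function being applied; PURGE on its own just discards the window contents.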
Flink divides window functions into two kinds:
- AllWindowFunction: the window function for global windows that are not grouped by a key
- WindowFunction: the window function for windows that are grouped by a key
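The difference between the two kinds can be sketched without Flink on the classpath. The interfaces below are hypothetical stand-ins that keep only the part that differs (the presence of a key). The window parameter is omitted for brevity and `java.util.function.Consumer` plays the role of the collector, so this is an illustration rather than the real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Stand-in for a keyed window function: it is told which key it is evaluating.
interface SketchWindowFunction<IN, OUT, KEY> {
    void apply(KEY key, Iterable<IN> input, Consumer<OUT> out);
}

// Stand-in for a non-keyed (global) window function: there is no key.
interface SketchAllWindowFunction<IN, OUT> {
    void apply(Iterable<IN> input, Consumer<OUT> out);
}

class KeyedVsAllDemo {
    // Keyed case: elements are grouped by their first character,
    // and the function runs once per key.
    static List<String> runKeyed(List<String> words) {
        Map<Character, List<String>> byKey = new TreeMap<>();
        for (String w : words) {
            byKey.computeIfAbsent(w.charAt(0), k -> new ArrayList<>()).add(w);
        }
        SketchWindowFunction<String, String, Character> fn = (key, input, out) -> {
            int n = 0;
            for (String ignored : input) {
                n++;
            }
            out.accept(key + "=" + n);
        };
        List<String> result = new ArrayList<>();
        for (Map.Entry<Character, List<String>> e : byKey.entrySet()) {
            fn.apply(e.getKey(), e.getValue(), result::add);
        }
        return result;
    }

    // Non-keyed case: the function runs once over all elements.
    static List<String> runAll(List<String> words) {
        SketchAllWindowFunction<String, String> fn = (input, out) -> {
            int n = 0;
            for (String ignored : input) {
                n++;
            }
            out.accept("all=" + n);
        };
        List<String> result = new ArrayList<>();
        fn.apply(words, result::add);
        return result;
    }
}
```

For the input `["apple", "avocado", "banana"]`, the keyed variant emits one result per key and the non-keyed variant emits a single result for the whole window.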
This article focuses on WindowFunction and traces the code to show how WindowOperator invokes a user-defined window function.
The WindowFunction interface is defined as follows:
public interface WindowFunction<IN, OUT, KEY, W extends Window> extends Function, Serializable {

    /**
     * Evaluates the window and outputs none or several elements.
     *
     * @param key The key for which this window is evaluated.
     * @param window The window that is being evaluated.
     * @param input The elements in the window being evaluated.
     * @param out A collector for emitting elements.
     *
     * @throws Exception The function may throw exceptions to fail the program and trigger recovery.
     */
    void apply(KEY key, W window, Iterable<IN> input, Collector<OUT> out) throws Exception;
}
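As an illustration of what an implementation of this interface looks like, here is a minimal sketch that sums the integers in a window for one key. To stay runnable without Flink, it uses hypothetical stand-ins (`SketchWindowFn`, `SketchCollector`, `SketchTimeWindow`) that mirror the shape of `apply(key, window, input, out)`. In a real job you would implement Flink's `WindowFunction` and use its `Collector` and `TimeWindow` instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for Flink's Collector: receives emitted records.
interface SketchCollector<T> {
    void collect(T record);
}

// Stand-in for a time window covering [start, end).
final class SketchTimeWindow {
    final long start;
    final long end;

    SketchTimeWindow(long start, long end) {
        this.start = start;
        this.end = end;
    }
}

// Stand-in with the same parameter order as WindowFunction.apply.
interface SketchWindowFn<IN, OUT, KEY, W> {
    void apply(KEY key, W window, Iterable<IN> input, SketchCollector<OUT> out) throws Exception;
}

// User-defined function: sums all values of one key in one window.
class SumPerKey implements SketchWindowFn<Integer, String, String, SketchTimeWindow> {
    @Override
    public void apply(String key, SketchTimeWindow window, Iterable<Integer> input,
                      SketchCollector<String> out) {
        int sum = 0;
        for (int v : input) {
            sum += v;
        }
        // Emitting zero, one, or several records is allowed; this emits exactly one.
        out.collect(key + "@[" + window.start + "," + window.end + ")=" + sum);
    }
}

class SumPerKeyDemo {
    static List<String> run() {
        List<String> emitted = new ArrayList<>();
        new SumPerKey().apply("sensor-1", new SketchTimeWindow(0, 1000),
                Arrays.asList(3, 4, 5), emitted::add);
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The function sees the whole window content as an `Iterable`, which is why the operator has to buffer every element until the trigger fires.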
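How a user-defined window function ends up being called by WindowOperator:
1. First, the user calls apply on the WindowedStream object, which registers the user-defined window function with the system.
Tracing the apply code: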
public <R> SingleOutputStreamOperator<R> apply(WindowFunction<T, R, K, W> function, TypeInformation<R> resultType) {
    //clean the closure
    function = input.getExecutionEnvironment().clean(function);
    String callLocation = Utils.getCallLocationName();
    String udfName = "WindowedStream." + callLocation;
    SingleOutputStreamOperator<R> result = createFastTimeOperatorIfValid(function, resultType, udfName);
    if (result != null) {
        return result;
    }
    LegacyWindowOperatorType legacyWindowOpType = getLegacyWindowType(function);
    String opName;
    KeySelector<T, K> keySel = input.getKeySelector();
    WindowOperator<K, T, Iterable<T>, R, W> operator;
    if (evictor != null) {
        @SuppressWarnings({"unchecked", "rawtypes"})
        TypeSerializer<StreamRecord<T>> streamRecordSerializer =
            (TypeSerializer<StreamRecord<T>>) new StreamElementSerializer(input.getType().createSerializer(getExecutionEnvironment().getConfig()));
        ListStateDescriptor<StreamRecord<T>> stateDesc =
            new ListStateDescriptor<>("window-contents", streamRecordSerializer);
        opName = "TriggerWindow(" + windowAssigner + ", " + stateDesc + ", " + trigger + ", " + evictor + ", " + udfName + ")";
        operator =
            new EvictingWindowOperator<>(windowAssigner,
                windowAssigner.getWindowSerializer(getExecutionEnvironment().getConfig()),
                keySel,
                input.getKeyType().createSerializer(getExecutionEnvironment().getConfig()),
                stateDesc,
                new InternalIterableWindowFunction<>(function),
                trigger,
                evictor,
                allowedLateness);
    } else {
        ListStateDescriptor<T> stateDesc = new ListStateDescriptor<>("window-contents",
            input.getType().createSerializer(getExecutionEnvironment().getConfig()));
        opName = "TriggerWindow(" + windowAssigner + ", " + stateDesc + ", " + trigger + ", " + udfName + ")";
        operator =
            new WindowOperator<>(windowAssigner,
                windowAssigner.getWindowSerializer(getExecutionEnvironment().getConfig()),
                keySel,
                input.getKeyType().createSerializer(getExecutionEnvironment().getConfig()),
                stateDesc,
                // this is where the user-defined window function is wrapped in an InternalIterableWindowFunction
                new InternalIterableWindowFunction<>(function),
                trigger,
                allowedLateness,
                legacyWindowOpType);
    }
    return input.transform(opName, resultType, operator);
}
Next, look at InternalIterableWindowFunction:
public final class InternalIterableWindowFunction<IN, OUT, KEY, W extends Window>
        extends WrappingFunction<WindowFunction<IN, OUT, KEY, W>>
        implements InternalWindowFunction<Iterable<IN>, OUT, KEY, W> {

    private static final long serialVersionUID = 1L;

    public InternalIterableWindowFunction(WindowFunction<IN, OUT, KEY, W> wrappedFunction) {
        super(wrappedFunction);
    }

    @Override
    public void apply(KEY key, W window, Iterable<IN> input, Collector<OUT> out) throws Exception {
        wrappedFunction.apply(key, window, input, out);
    }

    @Override
    public RuntimeContext getRuntimeContext() {
        throw new RuntimeException("This should never be called.");
    }

    @Override
    public IterationRuntimeContext getIterationRuntimeContext() {
        throw new RuntimeException("This should never be called.");
    }
}
The apply method of InternalIterableWindowFunction simply calls the apply method of the user-defined window function, so the user-defined function ends up wrapped inside the WindowOperator.
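The wrapping step is a plain delegation pattern, sketched below with hypothetical stand-in types (`UserWindowFn`, `InternalFn`, `IterableWrapper`, `SingleValueWrapper`). The point of the extra layer is that the operator can talk to one internal interface regardless of how the window contents are stored. The single-value wrapper is an assumption on my part about how a pre-aggregated window could be adapted to the same user-facing signature. It is included only to show why an adapter is useful, not as a description of Flink's source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in for the user-facing function: always sees an Iterable.
interface UserWindowFn<IN, OUT, KEY> {
    void apply(KEY key, Iterable<IN> input, List<OUT> out);
}

// Stand-in for the operator-facing function: sees whatever the state holds (ACC).
interface InternalFn<ACC, OUT, KEY> {
    void apply(KEY key, ACC contents, List<OUT> out);
}

// Window state is already an Iterable: pass it straight through.
final class IterableWrapper<IN, OUT, KEY> implements InternalFn<Iterable<IN>, OUT, KEY> {
    private final UserWindowFn<IN, OUT, KEY> wrapped;

    IterableWrapper(UserWindowFn<IN, OUT, KEY> wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void apply(KEY key, Iterable<IN> contents, List<OUT> out) {
        wrapped.apply(key, contents, out);
    }
}

// Assumed variant: window state is one pre-aggregated value, adapted to an Iterable.
final class SingleValueWrapper<IN, OUT, KEY> implements InternalFn<IN, OUT, KEY> {
    private final UserWindowFn<IN, OUT, KEY> wrapped;

    SingleValueWrapper(UserWindowFn<IN, OUT, KEY> wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void apply(KEY key, IN contents, List<OUT> out) {
        wrapped.apply(key, Collections.singletonList(contents), out);
    }
}

class WrapperDemo {
    // The same user function is reused behind both wrappers.
    static final UserWindowFn<Integer, String, String> SUM = (key, input, out) -> {
        int sum = 0;
        for (int v : input) {
            sum += v;
        }
        out.add(key + ":" + sum);
    };

    static List<String> viaIterable() {
        List<String> out = new ArrayList<>();
        new IterableWrapper<>(SUM).apply("k", Arrays.asList(1, 2, 3), out);
        return out;
    }

    static List<String> viaSingleValue() {
        List<String> out = new ArrayList<>();
        new SingleValueWrapper<>(SUM).apply("k", 6, out);
        return out;
    }
}
```

Both paths end in the same user code. Only the adapter between the operator's state type and the user's `Iterable` parameter changes.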
Next, look at how WindowOperator executes.
Inside WindowOperator, the InternalIterableWindowFunction object is assigned to this.userFunction.
The event-time handler looks like this:
public void onEventTime(InternalTimer<K, W> timer) throws Exception {
    context.key = timer.getKey();
    context.window = timer.getNamespace();
    AppendingState<IN, ACC> windowState;
    MergingWindowSet<W> mergingWindows = null;
    if (windowAssigner instanceof MergingWindowAssigner) {
        mergingWindows = getMergingWindowSet();
        W stateWindow = mergingWindows.getStateWindow(context.window);
        if (stateWindow == null) {
            // timer firing for non-existent window, ignore
            windowState = null;
        } else {
            windowState = getPartitionedState(
                stateWindow,
                windowSerializer,
                windowStateDescriptor);
        }
    } else {
        windowState = getPartitionedState(context.window, windowSerializer, windowStateDescriptor);
    }
    ACC contents = null;
    if (windowState != null) {
        // fetch all elements from the window state
        contents = windowState.get();
    }
    if (contents != null) {
        // call the trigger's event-time handler through the context and get the trigger result
        TriggerResult triggerResult = context.onEventTime(timer.getTimestamp());
        // if the result says the window should fire, call emitWindowContents,
        // which invokes the user-defined window function on all elements in the window
        if (triggerResult.isFire()) {
            emitWindowContents(context.window, contents);
        }
        // if the result says the window should be purged, clear the window state
        if (triggerResult.isPurge()) {
            windowState.clear();
        }
    }
    if (windowAssigner.isEventTime() && isCleanupTime(context.window, timer.getTimestamp())) {
        clearAllState(context.window, windowState, mergingWindows);
    }
    if (mergingWindows != null) {
        // need to make sure to update the merging state in state
        mergingWindows.persist();
    }
}
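Stripped of merging windows and cleanup timers, the core of this handler is: look up the window contents, ask the trigger, then fire and/or purge. The sketch below models only that core. `MiniWindowOperator`, `SketchTrigger`, `SketchUserFn` and `SketchResult` are hypothetical stand-ins, state is a plain `HashMap` keyed by key (one window per key), and the trigger is a lambda that returns a fixed result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the trigger result.
enum SketchResult {
    CONTINUE(false, false), FIRE(true, false), PURGE(false, true), FIRE_AND_PURGE(true, true);

    private final boolean fire;
    private final boolean purge;

    SketchResult(boolean fire, boolean purge) {
        this.fire = fire;
        this.purge = purge;
    }

    boolean isFire() { return fire; }
    boolean isPurge() { return purge; }
}

interface SketchTrigger {
    SketchResult onEventTime(long timestamp);
}

interface SketchUserFn<K, IN, OUT> {
    void apply(K key, Iterable<IN> contents, List<OUT> out);
}

class MiniWindowOperator<K, IN, OUT> {
    private final Map<K, List<IN>> windowState = new HashMap<>();
    private final SketchTrigger trigger;
    private final SketchUserFn<K, IN, OUT> userFunction;
    final List<OUT> emitted = new ArrayList<>();

    MiniWindowOperator(SketchTrigger trigger, SketchUserFn<K, IN, OUT> userFunction) {
        this.trigger = trigger;
        this.userFunction = userFunction;
    }

    // Buffer an element in the window of the given key.
    void add(K key, IN value) {
        windowState.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Timer callback: same fire/purge decisions as the handler above.
    void onEventTime(K key, long timestamp) {
        List<IN> contents = windowState.get(key);
        if (contents == null) {
            return; // timer fired for a window with no state, ignore
        }
        SketchResult result = trigger.onEventTime(timestamp);
        if (result.isFire()) {
            userFunction.apply(key, contents, emitted);
        }
        if (result.isPurge()) {
            windowState.remove(key);
        }
    }

    boolean hasState(K key) {
        return windowState.containsKey(key);
    }
}

class MiniWindowOperatorDemo {
    // Builds an operator whose trigger always returns the given result.
    static MiniWindowOperator<String, Integer, String> build(SketchResult fixed) {
        return new MiniWindowOperator<String, Integer, String>(
                ts -> fixed,
                (key, contents, out) -> {
                    int sum = 0;
                    for (int v : contents) {
                        sum += v;
                    }
                    out.add(key + "=" + sum);
                });
    }
}
```

With FIRE the contents stay in state after the function runs, so a later firing sees them again. With FIRE_AND_PURGE they are dropped right after the function runs.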
Once the trigger fires, emitWindowContents runs the computation over the elements in the window.
Here is the emitWindowContents code:
private void emitWindowContents(W window, ACC contents) throws Exception {
    timestampedCollector.setAbsoluteTimestamp(window.maxTimestamp());
    userFunction.apply(context.key, context.window, contents, timestampedCollector);
}
Here userFunction.apply does the actual computation: the user-defined window function is what finally runs, and its results are sent to the next operator.
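One detail in emitWindowContents that is easy to miss is the first line: before the user function runs, the collector is told to stamp everything it emits with window.maxTimestamp(). The sketch below models that with hypothetical stand-ins (`SketchWindow`, `StampedRecord`, `SketchTimestampedCollector`). It assumes a time window covering [start, end) whose maximum timestamp is `end - 1`, and it uses a hard-coded sum in place of the user function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for a time window covering [start, end).
final class SketchWindow {
    final long start;
    final long end;

    SketchWindow(long start, long end) {
        this.start = start;
        this.end = end;
    }

    // Largest timestamp that still belongs to this window.
    long maxTimestamp() {
        return end - 1;
    }
}

// A value together with the timestamp it was emitted with.
final class StampedRecord<T> {
    final T value;
    final long timestamp;

    StampedRecord(T value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

// Stand-in collector: attaches the currently configured timestamp to every record.
class SketchTimestampedCollector<T> {
    private final List<StampedRecord<T>> downstream = new ArrayList<>();
    private long timestamp;

    void setAbsoluteTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    void collect(T value) {
        downstream.add(new StampedRecord<>(value, timestamp));
    }

    List<StampedRecord<T>> records() {
        return downstream;
    }
}

class EmitDemo {
    static List<StampedRecord<String>> emit(SketchWindow window, List<Integer> contents) {
        SketchTimestampedCollector<String> collector = new SketchTimestampedCollector<>();
        // Step 1: fix the output timestamp to the window's max timestamp.
        collector.setAbsoluteTimestamp(window.maxTimestamp());
        // Step 2: stand-in for userFunction.apply(...): sum the contents and emit.
        int sum = 0;
        for (int v : contents) {
            sum += v;
        }
        collector.collect("sum=" + sum);
        return collector.records();
    }

    public static void main(String[] args) {
        StampedRecord<String> r = emit(new SketchWindow(0, 5000), Arrays.asList(1, 2, 3)).get(0);
        System.out.println(r.value + " @ " + r.timestamp);
    }
}
```

The user function never sets a timestamp itself. Whatever it emits carries the window's timestamp downstream, which lets a later event-time window assign these results correctly.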