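The window operator processes the data inside windows. Data flows continuously into the operator, and each element is first handed to the WindowAssigner when it arrives. The WindowAssigner decides which window or windows the element belongs to; during this step it may create new windows or merge existing ones. A WindowOperator can hold several windows at the same time, and a single element can be placed in more than one window.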
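Note that a Window itself is only an identifier. It may carry some metadata, such as the start and end time of a TimeWindow, but it does not store the elements of the window. The elements are actually stored in key/value state, where the key is the Window and the value is the collection of elements (or the aggregated value).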
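Every window has its own Trigger. The Trigger can register timers and decides when a window is evaluated or purged. It is invoked whenever an element is assigned to the window or a previously registered timer fires.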
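After the Trigger fires, the elements of the window are handed to the Evictor, if one is specified. The Evictor iterates over the window's element list and decides how many of the elements that entered the window first should be removed. The remaining elements are passed to the user-specified window function for computation. Without an Evictor, all elements of the window are passed to the window function together.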
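The window function receives the elements of the window, computes the window's result, and emits it downstream. A window can produce one or more result values. The DataStream API accepts different kinds of functions, including the predefined sum(), min(), and max(), as well as ReduceFunction, WindowFunction, and others. WindowFunction is the most general of these, and the other predefined functions are essentially built on top of it.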
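The Window class has two subclasses, GlobalWindow and TimeWindow:
- GlobalWindow is the global window, used for example by count windows.
- TimeWindow is a time-interval window with an explicit start and end time, and therefore a well-defined time span.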
WindowAssigner
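When an element arrives, the WindowAssigner decides which set of windows it is assigned to.
Methods and subclasses of WindowAssigner
Most scenarios use event-time windows, which correspond to SlidingEventTimeWindows and TumblingEventTimeWindows.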
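Taking SlidingEventTimeWindows as an example, assignWindows() uses the window size, slide, and offset together with the element's timestamp to compute the start of every window that contains the element, and returns one TimeWindow for each start.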
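The assignment logic can be illustrated with a small standalone sketch. It is not Flink's source: `SlidingAssignSketch` is a hypothetical class, windows are represented as plain `[start, end)` pairs instead of TimeWindow objects, and `Math.floorMod` is assumed as a simple way to align the start for negative timestamps as well.

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingAssignSketch {

    /** Returns the [start, end) pair of every sliding window that contains the timestamp. */
    static List<long[]> assignWindows(long timestamp, long size, long slide, long offset) {
        List<long[]> windows = new ArrayList<>((int) (size / slide));
        // Start of the latest window that still contains the timestamp.
        long lastStart = timestamp - Math.floorMod(timestamp - offset, slide);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        // size = 10, slide = 5: timestamp 7 falls into two overlapping windows.
        for (long[] w : assignWindows(7, 10, 5, 0)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
        // prints "[5, 15)" and then "[0, 10)"
    }
}
```

With size equal to slide the loop runs exactly once, which is the tumbling-window case.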
Every WindowAssigner has a default Trigger.
WindowTrigger
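The Trigger decides when a window is evaluated or purged. Every window has its own Trigger, and the Trigger can register timers for this purpose. It is invoked whenever an element is added to the window or a previously registered timer fires.
Taking EventTimeTrigger as an example, the code is as follows: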
/**
* A {@link Trigger} that fires once the watermark passes the end of the window to which a pane
* belongs.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
@PublicEvolving
public class EventTimeTrigger extends Trigger<Object, TimeWindow> {
private static final long serialVersionUID = 1L;
private EventTimeTrigger() {}
@Override
public TriggerResult onElement(
Object element, long timestamp, TimeWindow window, TriggerContext ctx)
throws Exception {
if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
// if the watermark is already past the window fire immediately
return TriggerResult.FIRE;
} else {
ctx.registerEventTimeTimer(window.maxTimestamp());
return TriggerResult.CONTINUE;
}
}
@Override
public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
}
@Override
public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx)
throws Exception {
return TriggerResult.CONTINUE;
}
@Override
public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
ctx.deleteEventTimeTimer(window.maxTimestamp());
}
@Override
public boolean canMerge() {
return true;
}
@Override
public void onMerge(TimeWindow window, OnMergeContext ctx) {
// only register a timer if the watermark is not yet past the end of the merged window
// this is in line with the logic in onElement(). If the watermark is past the end of
// the window onElement() will fire and setting a timer here would fire the window twice.
long windowMaxTimestamp = window.maxTimestamp();
if (windowMaxTimestamp > ctx.getCurrentWatermark()) {
ctx.registerEventTimeTimer(windowMaxTimestamp);
}
}
@Override
public String toString() {
return "EventTimeTrigger()";
}
/**
* Creates an event-time trigger that fires once the watermark passes the end of the window.
*
* <p>Once the trigger fires all elements are discarded. Elements that arrive late immediately
* trigger window evaluation with just this one element.
*/
public static EventTimeTrigger create() {
return new EventTimeTrigger();
}
}
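A Trigger returns one of four results: CONTINUE (do nothing), FIRE (evaluate the window and keep its contents), PURGE (clear the contents without evaluating), and FIRE_AND_PURGE (evaluate, then clear).
When an element arrives, the Trigger is called to decide whether the computation should run. If the result is only FIRE, the window is evaluated and left as it is: its data is not cleared and stays unchanged, waiting to be computed again at the next firing. The data in the window is therefore computed repeatedly until a result that purges it is returned. Until then, neither the window nor its data is released, so the window keeps occupying memory.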
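How the operator reacts to a trigger result can be sketched in a few lines. The `Result` enum below only mirrors the four values of Flink's TriggerResult; the class `TriggerResultSketch` and the `handle` method are hypothetical, and a plain list stands in for the window state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TriggerResultSketch {

    /** Mirrors the four trigger results as a (fire, purge) pair of flags. */
    enum Result {
        CONTINUE(false, false),
        FIRE(true, false),
        PURGE(false, true),
        FIRE_AND_PURGE(true, true);

        final boolean fire;
        final boolean purge;

        Result(boolean fire, boolean purge) {
            this.fire = fire;
            this.purge = purge;
        }
    }

    /** Emits the contents on fire, clears them on purge, and returns what was emitted. */
    static List<Integer> handle(Result result, List<Integer> windowContents) {
        List<Integer> emitted = new ArrayList<>();
        if (result.fire) {
            emitted.addAll(windowContents);
        }
        if (result.purge) {
            windowContents.clear();
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> contents = new ArrayList<>(Arrays.asList(1, 2, 3));
        // FIRE emits but keeps the data, so a later firing sees it again.
        System.out.println(handle(Result.FIRE, contents) + " kept=" + contents.size());
        // FIRE_AND_PURGE emits and then releases the data.
        System.out.println(handle(Result.FIRE_AND_PURGE, contents) + " kept=" + contents.size());
    }
}
```

The two independent flags are the same checks that appear as isFire() and isPurge() in the operator code later in this article.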
WindowEvictor
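An Evictor can be thought of as a filter over the data of a window. It lets us clean up the window's contents, for example keeping only the most recent 100 elements.
There are three main implementations:
1) CountEvictor: count-based. Keeps a specified number of elements in the window and discards the rest, starting from the head of the window.
2) DeltaEvictor: threshold-based. It uses a user-supplied DeltaFunction to compute the delta between each element in the window and the last element, and discards the elements whose delta reaches or exceeds a predefined threshold.
3) TimeEvictor: time-based. Keeps the elements from the most recent period of time and discards the rest.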
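The count-based behaviour can be shown with a minimal sketch. This is not Flink's CountEvictor: `CountEvictSketch` and `evictByCount` are hypothetical names, and the window contents are modelled as an ordinary list in arrival order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CountEvictSketch {

    /** Removes elements from the head until at most maxCount elements remain. */
    static <T> void evictByCount(List<T> elements, int maxCount) {
        int toEvict = elements.size() - maxCount;
        Iterator<T> it = elements.iterator();
        while (toEvict > 0 && it.hasNext()) {
            it.next();
            it.remove(); // the oldest elements sit at the head of the list
            toEvict--;
        }
    }

    public static void main(String[] args) {
        List<Integer> window = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        evictByCount(window, 3);
        System.out.println(window); // prints [3, 4, 5]
    }
}
```

If the window holds fewer elements than `maxCount`, nothing is removed.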
WindowFunction
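After passing through the WindowAssigner, elements have been assigned to their windows. Next, a window function processes the data in each window. There are two main kinds of window function.
1) Incremental aggregation functions
With incremental aggregation, the window keeps a single intermediate value. Each incoming element is combined with the intermediate value to produce a new one, which is then stored back in the window. Examples are ReduceFunction, AggregateFunction, and FoldFunction (deprecated).
The advantage of incremental functions is that elements are processed as soon as they arrive and the window stores only the intermediate result, so they are efficient. Their computation pattern is fixed in advance, however; it covers most needs but may not fit special business requirements.
2) Full-window functions
With full-window computation, all elements of the window are buffered first, and the computation runs over all of them once the trigger condition is met. Flink's built-in ProcessWindowFunction is a full-window function. Buffering everything makes flexible computations possible, at the cost of somewhat lower efficiency than incremental aggregation, since more memory is needed.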
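The difference in what each style has to store can be seen in a small sketch. Both classes are hypothetical and independent of Flink: `RunningSum` plays the role of an incremental function, and `BufferedMedian` plays the role of a full-window function, using the median as an example of a result that cannot be derived from a single running value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AggregationSketch {

    /** Incremental style: the state is one running value, updated per element. */
    static class RunningSum {
        private long sum;

        void add(long value) {
            sum += value; // combine the new element with the intermediate value
        }

        long result() {
            return sum;
        }
    }

    /** Full-window style: the state is the whole buffer, evaluated only on firing. */
    static class BufferedMedian {
        private final List<Long> buffer = new ArrayList<>();

        void add(long value) {
            buffer.add(value); // nothing is computed yet
        }

        /** Returns the (upper) median; expects at least one buffered element. */
        long result() {
            List<Long> sorted = new ArrayList<>(buffer);
            Collections.sort(sorted);
            return sorted.get(sorted.size() / 2);
        }

        int bufferedElements() {
            return buffer.size();
        }
    }

    public static void main(String[] args) {
        RunningSum sum = new RunningSum();
        BufferedMedian median = new BufferedMedian();
        for (long v : new long[] {5, 1, 9, 3, 7}) {
            sum.add(v);
            median.add(v);
        }
        System.out.println("sum=" + sum.result());       // prints sum=25
        System.out.println("median=" + median.result()); // prints median=5
        System.out.println("buffered=" + median.bufferedElements()); // prints buffered=5
    }
}
```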
WindowOperator
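A WindowOperator is initialized with the components described above. For each incoming element, its main steps are:
- The WindowAssigner assigns the element to its windows.
- Set the window as the current state namespace.
- Add the element to the state.
- Use the current element to decide whether the window computation should be triggered.
- If it is triggered, call the user-defined window function.
- Register a cleanup timer for the window itself.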
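The steps above can be sketched as a toy operator. This is a heavily simplified model under stated assumptions, not Flink's implementation: `MiniWindowOperator` is a hypothetical class, it supports only non-keyed tumbling event-time windows, allowed lateness is zero, firing and cleanup happen together when the watermark passes the window, and the window function is a fixed sum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MiniWindowOperator {
    private final long size;
    // Window state: window start (the namespace) -> buffered elements.
    private final TreeMap<Long, List<Integer>> state = new TreeMap<>();
    private long watermark = Long.MIN_VALUE;
    final List<String> output = new ArrayList<>();

    MiniWindowOperator(long size) {
        this.size = size;
    }

    void processElement(int value, long timestamp) {
        // 1. Assign the window (tumbling, so exactly one).
        long start = timestamp - Math.floorMod(timestamp, size);
        long maxTimestamp = start + size - 1;
        // Drop the element if its window is already late.
        if (maxTimestamp <= watermark) {
            return;
        }
        // 2 and 3. Select the window's state and add the element to it.
        state.computeIfAbsent(start, k -> new ArrayList<>()).add(value);
        // 4 to 6. The timer is implicit here: advanceWatermark() fires the window.
    }

    void advanceWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        // Fire and clean up every window whose maxTimestamp the watermark has reached.
        while (!state.isEmpty() && state.firstKey() + size - 1 <= watermark) {
            Map.Entry<Long, List<Integer>> entry = state.pollFirstEntry();
            int sum = 0;
            for (int v : entry.getValue()) {
                sum += v; // the "window function"
            }
            output.add("[" + entry.getKey() + "," + (entry.getKey() + size) + ") sum=" + sum);
        }
    }

    public static void main(String[] args) {
        MiniWindowOperator op = new MiniWindowOperator(10);
        op.processElement(1, 1);
        op.processElement(2, 5);
        op.processElement(3, 12);
        op.advanceWatermark(9);  // fires [0,10)
        op.processElement(4, 3); // late: [0,10) has already been passed
        op.advanceWatermark(19); // fires [10,20)
        op.output.forEach(System.out::println);
        // prints "[0,10) sum=3" and then "[10,20) sum=3"
    }
}
```

The real operator keeps firing and cleanup separate, which is what makes allowed lateness and repeated firings possible.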
Code walkthrough
The core method is processElement(). It handles window merging, the checks against the watermark, allowedLateness, and related logic.
public void processElement(StreamRecord<IN> element) throws Exception {
final Collection<W> elementWindows =
windowAssigner.assignWindows(
element.getValue(), element.getTimestamp(), windowAssignerContext);
// if element is handled by none of assigned elementWindows
boolean isSkippedElement = true;
final K key = this.<K>getKeyedStateBackend().getCurrentKey();
if (windowAssigner instanceof MergingWindowAssigner) {
MergingWindowSet<W> mergingWindows = getMergingWindowSet();
for (W window : elementWindows) {
// adding the new window might result in a merge, in that case the actualWindow
// is the merged window and we work with that. If we don't merge then
// actualWindow == window
W actualWindow =
mergingWindows.addWindow(
window,
new MergingWindowSet.MergeFunction<W>() {
@Override
public void merge(
W mergeResult,
Collection<W> mergedWindows,
W stateWindowResult,
Collection<W> mergedStateWindows)
throws Exception {
if ((windowAssigner.isEventTime()
&& mergeResult.maxTimestamp() + allowedLateness
<= internalTimerService
.currentWatermark())) {
throw new UnsupportedOperationException(
"The end timestamp of an "
+ "event-time window cannot become earlier than the current watermark "
+ "by merging. Current watermark: "
+ internalTimerService
.currentWatermark()
+ " window: "
+ mergeResult);
} else if (!windowAssigner.isEventTime()) {
long currentProcessingTime =
internalTimerService.currentProcessingTime();
if (mergeResult.maxTimestamp()
<= currentProcessingTime) {
throw new UnsupportedOperationException(
"The end timestamp of a "
+ "processing-time window cannot become earlier than the current processing time "
+ "by merging. Current processing time: "
+ currentProcessingTime
+ " window: "
+ mergeResult);
}
}
triggerContext.key = key;
triggerContext.window = mergeResult;
triggerContext.onMerge(mergedWindows);
for (W m : mergedWindows) {
triggerContext.window = m;
triggerContext.clear();
deleteCleanupTimer(m);
}
// merge the merged state windows into the newly resulting
// state window
windowMergingState.mergeNamespaces(
stateWindowResult, mergedStateWindows);
}
});
// drop if the window is already late
if (isWindowLate(actualWindow)) {
mergingWindows.retireWindow(actualWindow);
continue;
}
isSkippedElement = false;
W stateWindow = mergingWindows.getStateWindow(actualWindow);
if (stateWindow == null) {
throw new IllegalStateException(
"Window " + window + " is not in in-flight window set.");
}
windowState.setCurrentNamespace(stateWindow);
windowState.add(element.getValue());
triggerContext.key = key;
triggerContext.window = actualWindow;
TriggerResult triggerResult = triggerContext.onElement(element);
if (triggerResult.isFire()) {
ACC contents = windowState.get();
if (contents == null) {
continue;
}
emitWindowContents(actualWindow, contents);
}
if (triggerResult.isPurge()) {
windowState.clear();
}
registerCleanupTimer(actualWindow);
}
// need to make sure to update the merging state in state
mergingWindows.persist();
} else {
for (W window : elementWindows) {
// drop if the window is already late
if (isWindowLate(window)) {
continue;
}
isSkippedElement = false;
windowState.setCurrentNamespace(window);
windowState.add(element.getValue());
triggerContext.key = key;
triggerContext.window = window;
TriggerResult triggerResult = triggerContext.onElement(element);
if (triggerResult.isFire()) {
ACC contents = windowState.get();
if (contents == null) {
continue;
}
emitWindowContents(window, contents);
}
if (triggerResult.isPurge()) {
windowState.clear();
}
registerCleanupTimer(window);
}
}
// side output input event if
// element not handled by any window
// late arriving tag has been set
// windowAssigner is event time and current timestamp + allowed lateness no less than
// element timestamp
if (isSkippedElement && isElementLate(element)) {
if (lateDataOutputTag != null) {
sideOutput(element);
} else {
this.numLateRecordsDropped.inc();
}
}
}
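The "window is already late" check used in processElement() can be approximated as follows. This is a sketch of the event-time case only: `LatenessSketch` is a hypothetical class, and windows are reduced to their maxTimestamp. It assumes the cleanup time is maxTimestamp plus allowedLateness, matching the comparison visible in the merge callback above.

```java
public class LatenessSketch {

    /** Time after which the window's state may be dropped. */
    static long cleanupTime(long windowMaxTimestamp, long allowedLateness) {
        long cleanup = windowMaxTimestamp + allowedLateness;
        // Guard against long overflow when allowedLateness is very large.
        return cleanup >= windowMaxTimestamp ? cleanup : Long.MAX_VALUE;
    }

    /** A window is late once the watermark has reached its cleanup time. */
    static boolean isWindowLate(long windowMaxTimestamp, long allowedLateness, long watermark) {
        return cleanupTime(windowMaxTimestamp, allowedLateness) <= watermark;
    }

    public static void main(String[] args) {
        long maxTimestamp = 9; // window [0, 10)
        System.out.println(isWindowLate(maxTimestamp, 5, 12)); // prints false
        System.out.println(isWindowLate(maxTimestamp, 5, 14)); // prints true
        System.out.println(isWindowLate(maxTimestamp, 0, 9));  // prints true
    }
}
```

An element whose windows are all late is the one that ends up in the side output or in the dropped-records counter at the end of processElement().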
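open() is called once when the operator starts, before any element is processed, not once per window. It sets up the late-record metric, the timer service, the trigger and window contexts, and creates or restores the window state: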
@Override
public void open() throws Exception {
super.open();
this.numLateRecordsDropped = metrics.counter(LATE_ELEMENTS_DROPPED_METRIC_NAME);
timestampedCollector = new TimestampedCollector<>(output);
internalTimerService = getInternalTimerService("window-timers", windowSerializer, this);
triggerContext = new Context(null, null);
processContext = new WindowContext(null);
windowAssignerContext =
new WindowAssigner.WindowAssignerContext() {
@Override
public long getCurrentProcessingTime() {
return internalTimerService.currentProcessingTime();
}
};
// create (or restore) the state that hold the actual window contents
// NOTE - the state may be null in the case of the overriding evicting window operator
if (windowStateDescriptor != null) {
windowState =
(InternalAppendingState<K, W, IN, ACC, ACC>)
getOrCreateKeyedState(windowSerializer, windowStateDescriptor);
}
// create the typed and helper states for merging windows
if (windowAssigner instanceof MergingWindowAssigner) {
// store a typed reference for the state of merging windows - sanity check
if (windowState instanceof InternalMergingState) {
windowMergingState = (InternalMergingState<K, W, IN, ACC, ACC>) windowState;
} else if (windowState != null) {
throw new IllegalStateException(
"The window uses a merging assigner, but the window state is not mergeable.");
}
@SuppressWarnings("unchecked")
final Class<Tuple2<W, W>> typedTuple = (Class<Tuple2<W, W>>) (Class<?>) Tuple2.class;
final TupleSerializer<Tuple2<W, W>> tupleSerializer =
new TupleSerializer<>(
typedTuple, new TypeSerializer[] {windowSerializer, windowSerializer});
final ListStateDescriptor<Tuple2<W, W>> mergingSetsStateDescriptor =
new ListStateDescriptor<>("merging-window-set", tupleSerializer);
// get the state that stores the merging sets
mergingSetsState =
(InternalListState<K, VoidNamespace, Tuple2<W, W>>)
getOrCreateKeyedState(
VoidNamespaceSerializer.INSTANCE, mergingSetsStateDescriptor);
mergingSetsState.setCurrentNamespace(VoidNamespace.INSTANCE);
}
}
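processElement() adds the element to each window it was assigned to and updates that window's state.
It then asks the Trigger whether the window should be evaluated. If the result is FIRE, the operator calls emitWindowContents(), which invokes the user's window function.
Once the window has expired, that is, once its cleanup timer fires, the operator calls clearAllState() to release the window's contents and its trigger state.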
Taking the event-time timer callback onEventTime() as an example:
public void onEventTime(InternalTimer<K, W> timer) throws Exception {
triggerContext.key = timer.getKey();
triggerContext.window = timer.getNamespace();
MergingWindowSet<W> mergingWindows;
if (windowAssigner instanceof MergingWindowAssigner) {
mergingWindows = getMergingWindowSet();
W stateWindow = mergingWindows.getStateWindow(triggerContext.window);
if (stateWindow == null) {
// Timer firing for non-existent window, this can only happen if a
// trigger did not clean up timers. We have already cleared the merging
// window and therefore the Trigger state, however, so nothing to do.
return;
} else {
windowState.setCurrentNamespace(stateWindow);
}
} else {
windowState.setCurrentNamespace(triggerContext.window);
mergingWindows = null;
}
TriggerResult triggerResult = triggerContext.onEventTime(timer.getTimestamp());
if (triggerResult.isFire()) {
ACC contents = windowState.get();
if (contents != null) {
emitWindowContents(triggerContext.window, contents);
}
}
if (triggerResult.isPurge()) {
windowState.clear();
}
if (windowAssigner.isEventTime()
&& isCleanupTime(triggerContext.window, timer.getTimestamp())) {
clearAllState(triggerContext.window, windowState, mergingWindows);
}
if (mergingWindows != null) {
// need to make sure to update the merging state in state
mergingWindows.persist();
}
}