What the heck is it?
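WindowOperator also has something called allowedLateness. What on earth is it? Simply put, it gives late data a second chance: I allow data to be late by up to a certain amount of time. Within that allowed lateness, every time late data for a window arrives, the window fires again. So when does the second chance run out? Let's go through it step by step.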
How allowedLateness works
Without further ado, let's start with the code. Here is the relevant member variable of WindowOperator:
```java
/**
 * The allowed lateness for elements. This is used for:
 * <ul>
 * <li>Deciding if an element should be dropped from a window due to lateness.
 * <li>Clearing the state of a window if the system time passes the
 * {@code window.maxTimestamp + allowedLateness} landmark.
 * </ul>
 */
protected final long allowedLateness;
```
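As the comment says, this field specifies how long an element is allowed to be late. It serves two purposes:
- Deciding, based on how late an element is, whether it should be dropped.
- Clearing window state: once time (the watermark, for event-time windows) reaches window.maxTimestamp + allowedLateness, Flink clears the window's state.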
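Both purposes come down to one threshold. The following is a minimal standalone sketch, not Flink's code: the class `LatenessThresholds` and its static methods are hypothetical and only mirror the idea behind WindowOperator's `cleanupTime`/`isWindowLate` for an event-time window. The overflow guard (returning `Long.MAX_VALUE` when the addition wraps around) is assumed to match what Flink does, so treat it as an illustration. The example window is [19:34:21.000, 19:34:24.000), whose maxTimestamp is end - 1.

```java
/** Hypothetical sketch of the event-time lateness threshold (not Flink's implementation). */
public class LatenessThresholds {

    /** cleanupTime = maxTimestamp + allowedLateness; if the sum overflows, the window never expires. */
    static long cleanupTime(long maxTimestamp, long allowedLateness) {
        long cleanup = maxTimestamp + allowedLateness;
        // Assumed overflow guard: a wrapped (negative) sum is replaced by Long.MAX_VALUE.
        return cleanup >= maxTimestamp ? cleanup : Long.MAX_VALUE;
    }

    /** A window counts as late once the watermark has reached its cleanup time. */
    static boolean isWindowLate(long maxTimestamp, long allowedLateness, long watermark) {
        return cleanupTime(maxTimestamp, allowedLateness) <= watermark;
    }

    public static void main(String[] args) {
        long maxTs = 1461756863999L; // window [...861000, ...864000), so maxTimestamp = end - 1
        long lateness = 2_000L;      // allowedLateness of 2 s
        System.out.println(cleanupTime(maxTs, lateness));                                  // 1461756865999
        System.out.println(isWindowLate(maxTs, lateness, 1461756865000L));                 // false
        System.out.println(isWindowLate(maxTs, lateness, 1461756866000L));                 // true
        System.out.println(cleanupTime(Long.MAX_VALUE - 1, lateness) == Long.MAX_VALUE);   // true
    }
}
```

Using `<=` means the window expires at the exact moment the watermark equals maxTimestamp + allowedLateness, and not one millisecond later.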
To understand this logic, we need to look at WindowOperator's processElement method. Here is the code:
```java
public void processElement(StreamRecord<IN> element) throws Exception {
    final Collection<W> elementWindows = windowAssigner.assignWindows(
        element.getValue(), element.getTimestamp(), windowAssignerContext);
    //if element is handled by none of assigned elementWindows
    boolean isSkippedElement = true;
    final K key = this.<K>getKeyedStateBackend().getCurrentKey();
    if (windowAssigner instanceof MergingWindowAssigner) {
        MergingWindowSet<W> mergingWindows = getMergingWindowSet();
        for (W window: elementWindows) {
            // adding the new window might result in a merge, in that case the actualWindow
            // is the merged window and we work with that. If we don't merge then
            // actualWindow == window
            W actualWindow = mergingWindows.addWindow(window, new MergingWindowSet.MergeFunction<W>() {
                @Override
                public void merge(W mergeResult,
                        Collection<W> mergedWindows, W stateWindowResult,
                        Collection<W> mergedStateWindows) throws Exception {
                    if ((windowAssigner.isEventTime() && mergeResult.maxTimestamp() + allowedLateness <= internalTimerService.currentWatermark())) {
                        throw new UnsupportedOperationException("The end timestamp of an " +
                            "event-time window cannot become earlier than the current watermark " +
                            "by merging. Current watermark: " + internalTimerService.currentWatermark() +
                            " window: " + mergeResult);
                    } else if (!windowAssigner.isEventTime()) {
                        long currentProcessingTime = internalTimerService.currentProcessingTime();
                        if (mergeResult.maxTimestamp() <= currentProcessingTime) {
                            throw new UnsupportedOperationException("The end timestamp of a " +
                                "processing-time window cannot become earlier than the current processing time " +
                                "by merging. Current processing time: " + currentProcessingTime +
                                " window: " + mergeResult);
                        }
                    }
                    triggerContext.key = key;
                    triggerContext.window = mergeResult;
                    triggerContext.onMerge(mergedWindows);
                    for (W m: mergedWindows) {
                        triggerContext.window = m;
                        triggerContext.clear();
                        deleteCleanupTimer(m);
                    }
                    // merge the merged state windows into the newly resulting state window
                    windowMergingState.mergeNamespaces(stateWindowResult, mergedStateWindows);
                }
            });
            // drop if the window is already late
            if (isWindowLate(actualWindow)) {
                mergingWindows.retireWindow(actualWindow);
                continue;
            }
            isSkippedElement = false;
            W stateWindow = mergingWindows.getStateWindow(actualWindow);
            if (stateWindow == null) {
                throw new IllegalStateException("Window " + window + " is not in in-flight window set.");
            }
            windowState.setCurrentNamespace(stateWindow);
            windowState.add(element.getValue());
            triggerContext.key = key;
            triggerContext.window = actualWindow;
            TriggerResult triggerResult = triggerContext.onElement(element);
            if (triggerResult.isFire()) {
                ACC contents = windowState.get();
                if (contents == null) {
                    continue;
                }
                emitWindowContents(actualWindow, contents);
            }
            if (triggerResult.isPurge()) {
                windowState.clear();
            }
            registerCleanupTimer(actualWindow);
        }
        // need to make sure to update the merging state in state
        mergingWindows.persist();
    } else {
        for (W window: elementWindows) {
            // drop if the window is already late
            if (isWindowLate(window)) {
                continue;
            }
            isSkippedElement = false;
            windowState.setCurrentNamespace(window);
            windowState.add(element.getValue());
            triggerContext.key = key;
            triggerContext.window = window;
            TriggerResult triggerResult = triggerContext.onElement(element);
            if (triggerResult.isFire()) {
                ACC contents = windowState.get();
                if (contents == null) {
                    continue;
                }
                emitWindowContents(window, contents);
            }
            if (triggerResult.isPurge()) {
                windowState.clear();
            }
            registerCleanupTimer(window);
        }
    }
    // side output input event if
    // element not handled by any window
    // late arriving tag has been set
    // windowAssigner is event time and current timestamp + allowed lateness no less than element timestamp
    if (isSkippedElement && isElementLate(element)) {
        if (lateDataOutputTag != null){
            sideOutput(element);
        } else {
            this.numLateRecordsDropped.inc();
        }
    }
}
```
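Let's analyze the code above. Roughly, it proceeds as follows:
- First, it uses the element's timestamp to compute which window(s) the element belongs to.
- Next, it checks whether the window assigner produces mergeable windows. Here we focus on the non-merging case.
- Now the key part: isWindowLate(window) checks whether a window has already expired. Its code is: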
```java
/**
 * Returns {@code true} if the watermark is after the end timestamp plus the allowed lateness
 * of the given window.
 */
protected boolean isWindowLate(W window) {
    return (windowAssigner.isEventTime() && (cleanupTime(window) <= internalTimerService.currentWatermark()));
}
```
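The code shows that the check only applies to event time (for processing-time windows isWindowLate always returns false). It compares cleanupTime(window), i.e. window.end - 1 + allowedLateness, with the current watermark. If the watermark has reached or passed that point, the window has expired and the element is not added to it. If the watermark is still before that point, the window has not expired and can fire again.
As shown in the figure below: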
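- sideOutput is yet another chance: it routes elements that are too late for every window into a separate stream, so they can be handled later.
The condition is element.timestamp + allowedLateness <= currentWatermark. Think of a morning stand-up: arrive 5 minutes after the boss and you count as late, and being late gets you fined. allowedLateness = 0 means you must arrive before the boss.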
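In processElement, an element goes to the side output only if isSkippedElement is still true, meaning none of its windows accepted it. The following is a minimal sketch of that non-merging branch. It assumes sliding event-time windows of 6 s that slide by 3 s, with offset 0 and allowedLateness = 2 s. These values are hypothetical and not taken from the test below, and the class is a simplified model, not Flink code. The window-start arithmetic only loosely imitates a sliding assigner. With these values, an element can already be "late" by isElementLate and still not reach the side output, because a later window accepts it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the non-merging branch of processElement (not Flink code). */
public class SkippedElementSketch {
    enum Outcome { ADDED, SIDE_OUTPUT, DROPPED, IGNORED }

    /** Starts of all sliding windows (offset 0) that contain ts, latest first. */
    static List<Long> assignSliding(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts, slide);
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    static Outcome process(long ts, long size, long slide, long lateness,
                           long watermark, boolean hasLateTag) {
        boolean isSkippedElement = true;
        for (long start : assignSliding(ts, size, slide)) {
            long cleanupTime = start + size - 1 + lateness;
            if (cleanupTime <= watermark) {
                continue;                 // isWindowLate: skip only this window
            }
            isSkippedElement = false;     // at least one window accepts the element
        }
        if (!isSkippedElement) {
            return Outcome.ADDED;
        }
        if (ts + lateness <= watermark) { // isElementLate
            return hasLateTag ? Outcome.SIDE_OUTPUT : Outcome.DROPPED;
        }
        return Outcome.IGNORED;           // skipped but not "element late": nothing happens
    }

    public static void main(String[] args) {
        // t = 7 s belongs to [6 s, 12 s) and [3 s, 9 s).
        System.out.println(assignSliding(7_000L, 6_000L, 3_000L));                        // [6000, 3000]
        System.out.println(process(7_000L, 6_000L, 3_000L, 2_000L, 11_000L, true));       // ADDED
        System.out.println(process(7_000L, 6_000L, 3_000L, 2_000L, 14_000L, true));       // SIDE_OUTPUT
        System.out.println(process(7_000L, 6_000L, 3_000L, 2_000L, 14_000L, false));      // DROPPED
    }
}
```

At watermark 11 s, the window [3 s, 9 s) is already past its cleanup time (10.999 s), but [6 s, 12 s) is not, so the element is kept even though 7 s + 2 s <= 11 s. At 14 s both windows are gone, so the element is side-output, or counted as dropped when no late-data tag is set.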
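Below are my test results, with my explanation of them (3 s tumbling event-time windows, allowedLateness of 2 s, watermark = largest timestamp seen minus 10 s):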
```
// Input 000001,1461756862000. 1461756862000 is 2016-04-27 19:34:22.000, so it falls into the window [19:34:21.000, 19:34:24.000)
timestamp:000001,1461756862000|2016-04-27 19:34:22.000,1461756862000|2016-04-27 19:34:22.000,Watermark @ -10000
// Input 000001,1461756866000
timestamp:000001,1461756866000|2016-04-27 19:34:26.000,1461756866000|2016-04-27 19:34:26.000,Watermark @ 1461756852000
// Input 000001,1461756872000
timestamp:000001,1461756872000|2016-04-27 19:34:32.000,1461756872000|2016-04-27 19:34:32.000,Watermark @ 1461756856000
// Input 000001,1461756873000
timestamp:000001,1461756873000|2016-04-27 19:34:33.000,1461756873000|2016-04-27 19:34:33.000,Watermark @ 1461756862000
// Input 000001,1461756874000. 1461756874000 is 2016-04-27 19:34:34.000; minus 10 s this is exactly 19:34:24.000, so
// the window [19:34:21.000, 19:34:24.000) fires. Note that only elements falling into this interval are computed,
// so this firing contains just one element: 000001,1461756862000.
timestamp:000001,1461756874000|2016-04-27 19:34:34.000,1461756874000|2016-04-27 19:34:34.000,Watermark @ 1461756863000
(000001,1461756862000)
6> (000001,1,2016-04-27 19:34:22.000,2016-04-27 19:34:22.000,2016-04-27 19:34:21.000,2016-04-27 19:34:24.000)
// Because allowedLateness(2 s) is set, when 000001,1461756863000 (also in [19:34:21.000, 19:34:24.000)) arrives, the window fires again.
// Note: when you update state downstream, overwrite the previous result instead of accumulating it, because each firing re-emits the whole window.
timestamp:000001,1461756863000|2016-04-27 19:34:23.000,1461756874000|2016-04-27 19:34:34.000,Watermark @ 1461756864000
(000001,1461756862000)
(000001,1461756863000)
6> (000001,2,2016-04-27 19:34:22.000,2016-04-27 19:34:23.000,2016-04-27 19:34:21.000,2016-04-27 19:34:24.000)
// 000001,1461756861000: same as above
timestamp:000001,1461756861000|2016-04-27 19:34:21.000,1461756874000|2016-04-27 19:34:34.000,Watermark @ 1461756864000
(000001,1461756861000)
(000001,1461756862000)
(000001,1461756863000)
6> (000001,3,2016-04-27 19:34:21.000,2016-04-27 19:34:23.000,2016-04-27 19:34:21.000,2016-04-27 19:34:24.000)
// Input 000001,1461756875000. 1461756875000 is 2016-04-27 19:34:35.000, so the watermark becomes 2016-04-27 19:34:25.000,
// still before the window's cleanup time of 19:34:23.999 + 2 s = 19:34:25.999.
timestamp:000001,1461756875000|2016-04-27 19:34:35.000,1461756875000|2016-04-27 19:34:35.000,Watermark @ 1461756864000
// 000001,1461756861000: same as above
timestamp:000001,1461756861000|2016-04-27 19:34:21.000,1461756875000|2016-04-27 19:34:35.000,Watermark @ 1461756865000
(000001,1461756861000)
(000001,1461756861000)
(000001,1461756862000)
(000001,1461756863000)
6> (000001,4,2016-04-27 19:34:21.000,2016-04-27 19:34:23.000,2016-04-27 19:34:21.000,2016-04-27 19:34:24.000)
// Only when 000001,1461756876000 arrives does Flink start dropping late data. 1461756876000 is 2016-04-27 19:34:36.000, so the
// watermark becomes 2016-04-27 19:34:26.000, which reaches the cleanup time 19:34:23.999 + 2 s = 19:34:25.999. In other words,
// after 000001,1461756876000 arrives, Flink destroys the contents of window [19:34:21.000, 19:34:24.000); with no contents left,
// any later element for that window can only be discarded.
timestamp:000001,1461756876000|2016-04-27 19:34:36.000,1461756876000|2016-04-27 19:34:36.000,Watermark @ 1461756865000
// As it happens, we set sideOutputLateData, so 000001,1461756861000 is emitted on the outputStream side-output stream.
timestamp:000001,1461756861000|2016-04-27 19:34:21.000,1461756876000|2016-04-27 19:34:36.000,Watermark @ 1461756866000
6> 000001:outside
```
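To double-check this reading of the log, here is a toy replay of the same inputs. It is a simplified model, not Flink. The class `LatenessReplay` and its methods are hypothetical, and the model rests on these assumptions:
- the watermark advances to (largest timestamp seen - 10 s) right after each element is processed;
- a window fires when the watermark first reaches its maxTimestamp, in the manner of an event-time trigger;
- a late element that still lands in a live window fires that window again immediately;
- the window's state is dropped once the watermark reaches its cleanup time.

The `latestCount` map stands in for downstream state and overwrites on each firing, as the note in the log recommends.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical replay of the test above (simplified model, not Flink code). */
public class LatenessReplay {
    static final long SIZE = 3_000L;              // 3 s tumbling windows
    static final long LATENESS = 2_000L;          // allowedLateness = 2 s
    static final long OUT_OF_ORDERNESS = 10_000L; // watermark = max timestamp - 10 s

    private final Map<Long, List<Long>> state = new TreeMap<>(); // window start -> contents
    private long watermark = Long.MIN_VALUE;
    private long maxSeen = Long.MIN_VALUE;
    final List<String> log = new ArrayList<>();
    // Downstream view keyed by window start: each firing overwrites, never accumulates.
    final Map<Long, Integer> latestCount = new TreeMap<>();

    static long windowStart(long ts) { return ts - Math.floorMod(ts, SIZE); }
    static long maxTimestamp(long start) { return start + SIZE - 1; }
    static long cleanupTime(long start) { return maxTimestamp(start) + LATENESS; }

    private void fire(long start) {
        int n = state.get(start).size();
        log.add("FIRE " + start + " n=" + n);
        latestCount.put(start, n); // overwrite the earlier result for this window
    }

    void process(long ts) {
        long start = windowStart(ts);
        if (cleanupTime(start) <= watermark) {       // window late: its state is already gone
            if (ts + LATENESS <= watermark) {        // element late: goes to the side output
                log.add("SIDE_OUTPUT " + ts);
            }
        } else {
            state.computeIfAbsent(start, k -> new ArrayList<>()).add(ts);
            if (maxTimestamp(start) <= watermark) {  // late but within allowedLateness: fire again
                fire(start);
            }
        }
        advanceWatermark(ts);
    }

    private void advanceWatermark(long ts) {
        maxSeen = Math.max(maxSeen, ts);
        long newWatermark = maxSeen - OUT_OF_ORDERNESS;
        if (newWatermark <= watermark) {
            return;
        }
        long oldWatermark = watermark;
        watermark = newWatermark;
        Iterator<Map.Entry<Long, List<Long>>> it = state.entrySet().iterator();
        while (it.hasNext()) {
            long start = it.next().getKey();
            long end = maxTimestamp(start);
            if (end > oldWatermark && end <= watermark) {
                fire(start);                         // regular on-time firing
            }
            if (cleanupTime(start) <= watermark) {   // cleanup: drop the window's state
                log.add("CLEAR " + start);
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        long base = 1461756000000L;
        long[] suffixSeconds = {862, 866, 872, 873, 874, 863, 861, 875, 861, 876, 861};
        LatenessReplay replay = new LatenessReplay();
        for (long s : suffixSeconds) {
            replay.process(base + s * 1000);
        }
        replay.log.forEach(System.out::println);
    }
}
```

Running `main` prints four `FIRE 1461756861000` lines with n = 1 to 4, then `CLEAR 1461756861000` and `SIDE_OUTPUT 1461756861000`. This lines up with the four firings of [19:34:21.000, 19:34:24.000) and the final `outside` record in the log.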