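Question
With Flink's default window Trigger, is a window closed automatically when no more data is sent?
Short answer: no. When no more data arrives, the window is not closed automatically; it keeps waiting for data.
If you are interested, the analysis follows.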
Analysis
This question came up while I was using Flink 1.16.1, and I am recording it here.
To get to the point: windows are used as follows.
```
stream
       .keyBy(...)               <-  keyed versus non-keyed windows
       .window(...)              <-  required: "assigner"
      [.trigger(...)]            <-  optional: "trigger" (else default trigger)
      [.evictor(...)]            <-  optional: "evictor" (else no evictor)
      [.allowedLateness(...)]    <-  optional: "lateness" (else zero)
      [.sideOutputLateData(...)] <-  optional: "output tag" (else no side output for late data)
       .reduce/aggregate/apply() <-  required: "function"
      [.getSideOutput(...)]      <-  optional: "output tag"
```
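As shown, trigger() is optional; if it is not specified, the window assigner's default trigger is used. Taking sliding windows as an example, the source of SlidingEventTimeWindows shows that its default trigger is EventTimeTrigger.
So only the logic of EventTimeTrigger needs to be examined: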
```java
@PublicEvolving
public class EventTimeTrigger extends Trigger<Object, TimeWindow> {
    private static final long serialVersionUID = 1L;

    private EventTimeTrigger() {}

    @Override
    public TriggerResult onElement(
            Object element, long timestamp, TimeWindow window, TriggerContext ctx)
            throws Exception {
        if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
            // if the watermark is already past the window fire immediately
            return TriggerResult.FIRE;
        } else {
            ctx.registerEventTimeTimer(window.maxTimestamp());
            return TriggerResult.CONTINUE;
        }
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx)
            throws Exception {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
    }

    @Override
    public boolean canMerge() {
        return true;
    }

    @Override
    public void onMerge(TimeWindow window, OnMergeContext ctx) {
        // only register a timer if the watermark is not yet past the end of the merged window
        // this is in line with the logic in onElement(). If the watermark is past the end of
        // the window onElement() will fire and setting a timer here would fire the window twice.
        long windowMaxTimestamp = window.maxTimestamp();
        if (windowMaxTimestamp > ctx.getCurrentWatermark()) {
            ctx.registerEventTimeTimer(windowMaxTimestamp);
        }
    }

    @Override
    public String toString() {
        return "EventTimeTrigger()";
    }

    /**
     * Creates an event-time trigger that fires once the watermark passes the end of the window.
     *
     * <p>Once the trigger fires all elements are discarded. Elements that arrive late immediately
     * trigger window evaluation with just this one element.
     */
    public static EventTimeTrigger create() {
        return new EventTimeTrigger();
    }
}
```
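Reading this source raises questions: what is each method for, and in what order are they called? With these questions in mind, I first went to the official documentation, which has the following description (https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/datastream/operators/windows/#triggers).
From the documentation, the purpose of each method is roughly:
- onElement() is called for each element that is added to a window.
- onEventTime() is called when a registered event-time timer fires.
- onProcessingTime() is called when a registered processing-time timer fires.
- onMerge() is relevant for stateful triggers: it merges the states of two triggers when their corresponding windows merge, for example when using session windows.
- Finally, clear() performs any action needed when the corresponding window is removed.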
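Combining this with the source: onElement() runs for every element that enters the window. It compares the window's maxTimestamp() (the window end minus 1 ms) with the current watermark, not with the element's own timestamp. If the watermark is already past the window, it returns FIRE immediately; otherwise it calls registerEventTimeTimer(window.maxTimestamp()). The timer is registered with the operator's InternalTimerService, and onEventTime() is called only when the watermark later reaches that timestamp.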
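The interplay between onElement() and the timer service can be modelled in a few lines of plain Java. This is an illustrative sketch, not Flink code: the class EventTimeTimerSketch and its methods are hypothetical, a PriorityQueue stands in for the InternalTimerService, and only one key and one window are modelled. It shows the key point: a registered event-time timer is only checked when a new watermark arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Plain-Java model of EventTimeTrigger plus an event-time timer queue; not Flink code. */
public class EventTimeTimerSketch {

    /** Mirrors TriggerResult.CONTINUE / TriggerResult.FIRE. */
    public enum Result { CONTINUE, FIRE }

    /** Stand-in for the operator's event-time timer queue (one key, one window). */
    private final PriorityQueue<Long> eventTimeTimers = new PriorityQueue<>();
    /** Before the first watermark, the current watermark is Long.MIN_VALUE. */
    private long currentWatermark = Long.MIN_VALUE;
    /** Every onEventTime() call that returned FIRE, recorded as "fire@<timer>". */
    public final List<String> fired = new ArrayList<>();

    /** Same decision as EventTimeTrigger.onElement() for a window whose maxTimestamp is given. */
    public Result onElement(long windowMaxTimestamp) {
        if (windowMaxTimestamp <= currentWatermark) {
            return Result.FIRE; // the watermark is already past the window
        }
        // like ctx.registerEventTimeTimer(): registering the same timestamp twice keeps one timer
        if (!eventTimeTimers.contains(windowMaxTimestamp)) {
            eventTimeTimers.add(windowMaxTimestamp);
        }
        return Result.CONTINUE;
    }

    /** Timers are only checked here, i.e. only when a new watermark arrives. */
    public void advanceWatermark(long watermark) {
        currentWatermark = Math.max(currentWatermark, watermark);
        while (!eventTimeTimers.isEmpty() && eventTimeTimers.peek() <= currentWatermark) {
            long timer = eventTimeTimers.poll();
            // EventTimeTrigger.onEventTime(): FIRE because timer == window.maxTimestamp()
            fired.add("fire@" + timer);
        }
    }

    public static void main(String[] args) {
        EventTimeTimerSketch op = new EventTimeTimerSketch();
        op.onElement(59_999L);        // window [0, 60000): registers a timer at 59999
        op.advanceWatermark(30_000L); // timer not due yet
        System.out.println(op.fired); // []
        // If no more data arrives, advanceWatermark() is never called again and the timer waits forever.
        op.advanceWatermark(60_000L); // only a later watermark fires it
        System.out.println(op.fired); // [fire@59999]
    }
}
```

In this model, as in the real operator, nothing but a new watermark can make the pending timer fire.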
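Putting it together: EventTimeTrigger only fires once the watermark passes the end of the window, and with a watermark strategy such as forBoundedOutOfOrderness the watermark only advances when data with larger timestamps arrives. So when no more data is sent, the watermark stops moving, the timer never fires, and the window is neither computed nor closed; it keeps waiting for data.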
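Why the watermark stops can be seen from the arithmetic of the generator behind WatermarkStrategy.forBoundedOutOfOrderness (Flink's BoundedOutOfOrdernessWatermarks). The plain-Java sketch below mirrors that arithmetic; the class name BoundedOutOfOrdernessSketch is hypothetical and the periodic calls are made by hand. The emitted watermark depends only on the largest timestamp seen so far, so without new data it never moves, however often it is polled.

```java
/** Plain-Java model of bounded-out-of-orderness watermark generation; not Flink code. */
public class BoundedOutOfOrdernessSketch {
    private final long outOfOrdernessMillis;
    private long maxTimestamp;

    public BoundedOutOfOrdernessSketch(long outOfOrdernessMillis) {
        this.outOfOrdernessMillis = outOfOrdernessMillis;
        // start value chosen so that the first emitted watermark is Long.MIN_VALUE without underflow
        this.maxTimestamp = Long.MIN_VALUE + outOfOrdernessMillis + 1;
    }

    /** Called for every event: only the largest timestamp seen so far is remembered. */
    public void onEvent(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    /** Called periodically by the framework; the result depends only on the data seen so far. */
    public long onPeriodicEmit() {
        return maxTimestamp - outOfOrdernessMillis - 1;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessSketch wm = new BoundedOutOfOrdernessSketch(1_000L); // Duration.ofSeconds(1)
        wm.onEvent(100_000L);
        System.out.println(wm.onPeriodicEmit()); // 98999
        // no more events: polling again returns the same watermark
        System.out.println(wm.onPeriodicEmit()); // 98999
    }
}
```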
Test example
The following code can be used to verify this yourself:
```java
import cn.hutool.core.date.DateUtil;
import com.tigerclub.grassland.entity.Word;
import com.tigerclub.grassland.function.AggWindowFunction;
import com.tigerclub.grassland.function.SumAggFunction;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

import java.time.Duration;

// Word, SumAggFunction and AggWindowFunction are the author's own classes and are not shown here.
public class WindowOperator {
    /**
     * Sample input lines sent to the socket:
     * test,2023-05-29 15:24:07
     * test,2023-05-29 15:27:07
     * test,2023-05-29 15:31:07
     * test,2023-05-29 15:36:07
     *
     *
     * test,2024-01-19 13:43:07
     */
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        // read "name,yyyy-MM-dd HH:mm:ss" lines from a local socket
        DataStreamSource<String> source = env.socketTextStream("127.0.0.1", 9999);
        source.print("source>>>>>>");
        SingleOutputStreamOperator<Word> aggregate = source.flatMap(new Splitter())
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<Word>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                                .withTimestampAssigner((word, timestamp) -> word.getTime()))
                .keyBy(word -> word.getName())
                //.window(TumblingEventTimeWindows.of(Time.minutes(5)))
                .window(SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(2)))
                // .trigger(new EventTimeOrIntervalTrigger("78722"))
                .aggregate(new SumAggFunction(), new AggWindowFunction());
        aggregate.print(">>>>>>>>>>>>>>");
        try {
            env.execute("test");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /** Parses "name,yyyy-MM-dd HH:mm:ss" into a Word with an event-time timestamp and a count of 1. */
    public static class Splitter implements FlatMapFunction<String, Word> {
        @Override
        public void flatMap(String sentence, Collector<Word> out) throws Exception {
            String[] word = sentence.split(",");
            if (word.length < 2) {
                return; // skip blank or malformed lines instead of failing the job
            }
            out.collect(new Word(word[0], word[1], DateUtil.parse(word[1], "yyyy-MM-dd HH:mm:ss").getTime(), 1L));
        }
    }
}
```
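The test uses SlidingEventTimeWindows.of(Time.minutes(1), Time.seconds(2)), so each element belongs to size / slide = 30 windows, each with its own event-time timer under the default trigger. The plain-Java sketch below mirrors how sliding event-time windows are assigned with offset 0 (following the arithmetic of TimeWindow.getWindowStartWithOffset); SlidingWindowSketch and its Window type are hypothetical stand-ins, not Flink classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of sliding event-time window assignment (offset 0); not Flink code. */
public class SlidingWindowSketch {

    /** Hypothetical stand-in for Flink's TimeWindow, covering [start, end). */
    public static final class Window {
        public final long start;
        public final long end;

        Window(long start, long end) {
            this.start = start;
            this.end = end;
        }

        /** Last timestamp inside the window, like TimeWindow.maxTimestamp(). */
        public long maxTimestamp() {
            return end - 1;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Start of the latest window that contains timestamp. */
    static long lastWindowStart(long timestamp, long slide) {
        return timestamp - Math.floorMod(timestamp, slide);
    }

    /** All windows of the given size and slide that contain timestamp, latest first. */
    public static List<Window> assign(long timestamp, long size, long slide) {
        List<Window> windows = new ArrayList<>();
        for (long start = lastWindowStart(timestamp, slide); start > timestamp - size; start -= slide) {
            windows.add(new Window(start, start + size));
        }
        return windows;
    }

    public static void main(String[] args) {
        List<Window> windows = assign(125_500L, 60_000L, 2_000L); // 1 min size, 2 s slide
        System.out.println(windows.size());                  // 30
        System.out.println(windows.get(0));                  // [124000, 184000)
        System.out.println(windows.get(windows.size() - 1)); // [66000, 126000)
    }
}
```

With the default EventTimeTrigger, each of these windows fires separately, when the watermark reaches its maxTimestamp().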
As the screenshot of the run showed, the window was still not computed long after its trigger time had passed.
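Applying the same arithmetic to the sample input in the test's Javadoc shows why nothing fires. The sketch below assumes the timestamps are UTC (the test parses them in the JVM default zone; for a zone with a whole-minute offset such as Asia/Shanghai the counts are the same), a 1 s out-of-orderness bound and offset 0; SampleDataSketch and its methods are hypothetical. After the last 2023 record, the watermark is 1001 ms behind it, while even the earliest-ending window containing that record has a maxTimestamp 999 ms after it, so none of its 30 windows can fire until a later record, such as the 2024 line in the sample input, advances the watermark.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Plain-Java check of the test's sample input against its window settings; not Flink code. */
public class SampleDataSketch {
    static final long SIZE = 60_000L;            // Time.minutes(1)
    static final long SLIDE = 2_000L;            // Time.seconds(2)
    static final long OUT_OF_ORDERNESS = 1_000L; // Duration.ofSeconds(1)
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Epoch millis of a sample timestamp, assuming UTC. */
    public static long millis(String text) {
        return LocalDateTime.parse(text, FMT).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    /** Watermark once ts is the largest timestamp seen (bounded out-of-orderness). */
    public static long watermarkAfter(long ts) {
        return ts - OUT_OF_ORDERNESS - 1;
    }

    /** How many of the sliding windows containing ts have fired under EventTimeTrigger at this watermark. */
    public static int firedWindows(long ts, long watermark) {
        int fired = 0;
        for (long start = ts - Math.floorMod(ts, SLIDE); start > ts - SIZE; start -= SLIDE) {
            if (start + SIZE - 1 <= watermark) { // window.maxTimestamp() <= watermark
                fired++;
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        long last = millis("2023-05-29 15:36:07");
        System.out.println(firedWindows(last, watermarkAfter(last)));  // 0
        long flush = millis("2024-01-19 13:43:07");
        System.out.println(firedWindows(last, watermarkAfter(flush))); // 30
    }
}
```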
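To solve this, a custom Trigger is needed. The one below registers, in addition to the usual event-time timer, a processing-time timer at 1.2 times the window length after the window's first element; when that timer fires, the window is fired and purged (FIRE_AND_PURGE) even though the watermark has not reached the end of the window.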
```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.tigerclub.grassland.function;

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

/**
 * A {@link Trigger} that fires once the watermark passes the end of the window or, if the
 * watermark does not get there in time, once a processing-time timer set to 1.2x the window
 * length after the window's first element expires (in that case the window is also purged).
 *
 * <p>Per-window trigger state: null = no element yet, 1 = waiting, 2 = fired by the
 * processing-time timer, 3 = fired by the event-time timer.
 */
public class EventTimeOrIntervalTrigger extends Trigger<Object, TimeWindow> {
    private static final long serialVersionUID = 1L;

    private static final int WAITING = 1;
    private static final int FIRED_BY_PROCESSING_TIME = 2;
    private static final int FIRED_BY_EVENT_TIME = 3;

    private final ValueStateDescriptor<Integer> triggerStateDescriptor;

    public EventTimeOrIntervalTrigger(String instanceId) {
        triggerStateDescriptor = new ValueStateDescriptor<>("triggerState" + instanceId, Types.INT);
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) throws Exception {
        ValueState<Integer> triggerState = ctx.getPartitionedState(triggerStateDescriptor);
        if (triggerState.value() == null) { // first element in this window
            // fallback: fire on processing time 1.2 window lengths from now, even if no more data arrives
            long processTriggerTime = ctx.getCurrentProcessingTime() + (long) ((window.getEnd() - window.getStart()) * 1.2);
            ctx.registerProcessingTimeTimer(processTriggerTime);
            // normal path: fire when the watermark reaches the end of the window
            ctx.registerEventTimeTimer(window.maxTimestamp());
            triggerState.update(WAITING);
            return TriggerResult.CONTINUE;
        }
        if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
            // if the watermark is already past the window fire immediately
            return TriggerResult.FIRE;
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) throws Exception {
        if (time == window.maxTimestamp()) {
            ValueState<Integer> triggerState = ctx.getPartitionedState(triggerStateDescriptor);
            Integer state = triggerState.value();
            // skip if the processing-time fallback has already fired (and purged) this window
            if (state != null && state != FIRED_BY_PROCESSING_TIME) {
                triggerState.update(FIRED_BY_EVENT_TIME);
                return TriggerResult.FIRE;
            }
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) throws Exception {
        ValueState<Integer> triggerState = ctx.getPartitionedState(triggerStateDescriptor);
        Integer state = triggerState.value();
        // state is null once clear() has run for this window, so check it before unboxing
        if (state != null && state != FIRED_BY_EVENT_TIME) {
            triggerState.update(FIRED_BY_PROCESSING_TIME);
            return TriggerResult.FIRE_AND_PURGE;
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
        // release the per-window state too; a processing-time timer that fires afterwards
        // finds null state and onProcessingTime() returns CONTINUE
        ctx.getPartitionedState(triggerStateDescriptor).clear();
    }

    @Override
    public boolean canMerge() {
        // a ValueState cannot be merged, so this trigger is not meant for merging (session) windows
        return false;
    }

    @Override
    public String toString() {
        return "EventTimeOrIntervalTrigger()";
    }

    /** Creates the trigger; instanceId is appended to the name of its state. */
    public static EventTimeOrIntervalTrigger create(String instanceId) {
        return new EventTimeOrIntervalTrigger(instanceId);
    }
}
```
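The decisions this trigger makes for a single window can be checked without a Flink cluster. The following plain-Java model is hypothetical (IntervalTriggerSketch is not a Flink class, clocks and timers are driven by hand, and only the trigger's state machine with the values 1/2/3 is modelled). It shows the scenario from this article: no more data arrives, the watermark never reaches the window end, but the processing-time fallback still flushes the window, and a late event-time firing does not emit it twice.

```java
/** Plain-Java model of the per-window decisions of EventTimeOrIntervalTrigger; not Flink code. */
public class IntervalTriggerSketch {

    /** Mirrors the TriggerResult values the trigger uses. */
    public enum Result { CONTINUE, FIRE, FIRE_AND_PURGE }

    static final int WAITING = 1;
    static final int FIRED_BY_PROCESSING_TIME = 2;
    static final int FIRED_BY_EVENT_TIME = 3;

    private final long windowStart;
    private final long windowEnd;
    private Integer state;             // null until the first element arrives
    public long processingTimer = -1L; // fallback timer registered by the first element

    public IntervalTriggerSketch(long windowStart, long windowEnd) {
        this.windowStart = windowStart;
        this.windowEnd = windowEnd;
    }

    long maxTimestamp() {
        return windowEnd - 1;
    }

    public Result onElement(long processingTime, long watermark) {
        if (state == null) {
            // first element: fallback timer 1.2 window lengths later (the event-time timer at
            // maxTimestamp() is not modelled; onEventTime() is called by hand instead)
            processingTimer = processingTime + (long) ((windowEnd - windowStart) * 1.2);
            state = WAITING;
            return Result.CONTINUE;
        }
        return maxTimestamp() <= watermark ? Result.FIRE : Result.CONTINUE;
    }

    public Result onEventTime(long time) {
        if (time == maxTimestamp() && state != null && state != FIRED_BY_PROCESSING_TIME) {
            state = FIRED_BY_EVENT_TIME;
            return Result.FIRE;
        }
        return Result.CONTINUE;
    }

    public Result onProcessingTime(long time) {
        if (state != null && state != FIRED_BY_EVENT_TIME) {
            state = FIRED_BY_PROCESSING_TIME;
            return Result.FIRE_AND_PURGE;
        }
        return Result.CONTINUE;
    }

    public static void main(String[] args) {
        IntervalTriggerSketch t = new IntervalTriggerSketch(0L, 60_000L);
        System.out.println(t.onElement(1_000L, Long.MIN_VALUE));  // CONTINUE
        System.out.println(t.processingTimer);                     // 73000
        // no more data, so the watermark never reaches 59999, but wall-clock time keeps moving
        System.out.println(t.onProcessingTime(t.processingTimer)); // FIRE_AND_PURGE
        // if the watermark catches up later, the window is not emitted a second time
        System.out.println(t.onEventTime(59_999L));                // CONTINUE
    }
}
```

In the actual job, the Flink trigger is switched on by uncommenting the `.trigger(new EventTimeOrIntervalTrigger("78722"))` line in the test program.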