Since Flink 1.11, the recommended way to generate watermarks is a WatermarkStrategy. After creating a DataStream, specify the strategy with assignTimestampsAndWatermarks(WatermarkStrategy<T>).
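For custom watermark logic, implement the WatermarkGenerator<T> interface and plug it in with WatermarkStrategy.forGenerator. The interface has two methods:
onEvent is called for every incoming event. Its first parameter, event, is the event itself; the second, eventTimestamp, is the event's timestamp; the third, output, can emit a Watermark via output.emitWatermark.
onPeriodicEmit is called periodically, at the interval configured with setAutoWatermarkInterval. This is cheaper than emitting a watermark for every element. It receives a WatermarkOutput parameter named output, on which output.emitWatermark likewise emits a Watermark.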
@Public
public interface WatermarkGenerator<T> {

    /**
     * Called for every event, allows the watermark generator to examine and remember the event
     * timestamps, or to emit a watermark based on the event itself.
     */
    void onEvent(T event, long eventTimestamp, WatermarkOutput output);

    /**
     * Called periodically, and might emit a new watermark, or not.
     *
     * <p>The interval in which this method is called and Watermarks are generated depends on {@link
     * ExecutionConfig#getAutoWatermarkInterval()}.
     */
    void onPeriodicEmit(WatermarkOutput output);
}
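The two methods support two emission styles. A periodic generator only records timestamps in onEvent and emits from onPeriodicEmit. A punctuated generator decides inside onEvent, for example when a record carries a special marker. The sketch below shows the punctuated style. To compile without Flink, it declares simplified stand-ins for Watermark, WatermarkOutput and WatermarkGenerator. In real code these come from org.apache.flink.api.common.eventtime, and the real WatermarkOutput has more methods, so it cannot be written as a lambda. The event type MarkedEvent and its marker flag are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for org.apache.flink.api.common.eventtime types (assumption: only what this sketch needs).
final class Watermark {
    private final long timestamp;
    Watermark(long timestamp) { this.timestamp = timestamp; }
    long getTimestamp() { return timestamp; }
}

interface WatermarkOutput {
    void emitWatermark(Watermark watermark);
}

interface WatermarkGenerator<T> {
    void onEvent(T event, long eventTimestamp, WatermarkOutput output);
    void onPeriodicEmit(WatermarkOutput output);
}

// Hypothetical event type: marker == true means "no earlier events will follow".
final class MarkedEvent {
    final long timestamp;
    final boolean marker;
    MarkedEvent(long timestamp, boolean marker) { this.timestamp = timestamp; this.marker = marker; }
}

// Punctuated generator: emits from onEvent only when it sees a marker record.
final class PunctuatedGenerator implements WatermarkGenerator<MarkedEvent> {
    @Override
    public void onEvent(MarkedEvent event, long eventTimestamp, WatermarkOutput output) {
        if (event.marker) {
            output.emitWatermark(new Watermark(eventTimestamp));
        }
    }

    @Override
    public void onPeriodicEmit(WatermarkOutput output) {
        // Intentionally empty: all watermarks are emitted from onEvent.
    }
}

final class PunctuatedDemo {
    // Feeds events through the generator, also calling onPeriodicEmit after each one,
    // and collects the timestamps of all emitted watermarks.
    static List<Long> run(List<MarkedEvent> events) {
        List<Long> emitted = new ArrayList<>();
        WatermarkOutput output = w -> emitted.add(w.getTimestamp());
        PunctuatedGenerator gen = new PunctuatedGenerator();
        for (MarkedEvent e : events) {
            gen.onEvent(e, e.timestamp, output);
            gen.onPeriodicEmit(output);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<MarkedEvent> events = List.of(
                new MarkedEvent(1L, false), new MarkedEvent(3L, true),
                new MarkedEvent(2L, false), new MarkedEvent(5L, true));
        System.out.println(run(events)); // [3, 5]
    }
}
```

Only the marker records at 3 and 5 produce watermarks, even though onPeriodicEmit is called after every event. A periodic generator would do the opposite: track state in onEvent and emit only in onPeriodicEmit.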
Fixed out-of-orderness bound strategy (forBoundedOutOfOrderness)
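Use the static factory method WatermarkStrategy.forBoundedOutOfOrderness. It takes a Duration as the maximum out-of-orderness bound, and the generated watermark trails the largest timestamp seen so far by that bound. The withTimestampAssigner method on the resulting WatermarkStrategy specifies how to extract the event timestamp from each record.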
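The arithmetic behind this strategy can be modeled in plain Java. To our understanding, Flink's built-in BoundedOutOfOrdernessWatermarks remembers the maximum timestamp in onEvent and emits maxTimestamp - bound - 1 in onPeriodicEmit. The sketch below reproduces that formula without a Flink dependency. The ToLongFunction plays the role of withTimestampAssigner; in real Flink that parameter is a SerializableTimestampAssigner<T> with extractTimestamp(element, recordTimestamp). The class names BoundedOutOfOrdernessSketch, Click and BoundedDemo are hypothetical.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Plain-Java model of a bounded-out-of-orderness watermark generator (no Flink dependency).
final class BoundedOutOfOrdernessSketch<T> {
    private final long outOfOrdernessMillis;
    private final ToLongFunction<T> timestampAssigner; // stands in for withTimestampAssigner
    private long maxTimestamp;

    BoundedOutOfOrdernessSketch(Duration maxOutOfOrderness, ToLongFunction<T> timestampAssigner) {
        this.outOfOrdernessMillis = maxOutOfOrderness.toMillis();
        this.timestampAssigner = timestampAssigner;
        // Start low enough that the first watermark computation cannot underflow.
        this.maxTimestamp = Long.MIN_VALUE + outOfOrdernessMillis + 1;
    }

    // Counterpart of onEvent: remember the largest event timestamp seen so far.
    void onEvent(T event) {
        maxTimestamp = Math.max(maxTimestamp, timestampAssigner.applyAsLong(event));
    }

    // Counterpart of onPeriodicEmit: the watermark trails the max timestamp by the bound (plus 1 ms).
    long currentWatermark() {
        return maxTimestamp - outOfOrdernessMillis - 1;
    }
}

// Hypothetical event type: a user and an event-time timestamp in milliseconds.
final class Click {
    final String user;
    final long ts;
    Click(String user, long ts) { this.user = user; this.ts = ts; }
}

final class BoundedDemo {
    // Returns the watermark after each event, as if onPeriodicEmit fired after every record.
    static List<Long> watermarksAfterEach(List<Click> clicks, Duration bound) {
        BoundedOutOfOrdernessSketch<Click> gen = new BoundedOutOfOrdernessSketch<>(bound, c -> c.ts);
        List<Long> result = new ArrayList<>();
        for (Click c : clicks) {
            gen.onEvent(c);
            result.add(gen.currentWatermark());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Click> clicks = List.of(
                new Click("user1", 1000L), new Click("user1", 5000L),
                new Click("user1", 3000L), new Click("user1", 7000L));
        System.out.println(watermarksAfterEach(clicks, Duration.ofSeconds(2))); // [-1001, 2999, 2999, 4999]
    }
}
```

With a 2-second bound, the out-of-order event at 3000 ms arrives after 5000 ms but is still above the current watermark of 2999. An event-time window would therefore not treat it as late.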
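// Example: use WatermarkStrategy.forBoundedOutOfOrderness inside assignTimestampsAndWatermarks to extract timestamps and generate periodic watermarks
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class Test {
    public static void main(String[] args) throws Exception {
        // Create the stream execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Use event-time semantics (the default in 1.11 is processing time; this call is deprecated since 1.12, where event time is the default)
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        // Interval for periodic watermark generation (10 milliseconds)
        env.getConfig().setAutoWatermarkInterval(10L);
        // Parallelism 1
        env.setParallelism(1);
        // Demo data
        DataStreamSource<ClickEvent> mySource = env.fromElements(
                new ClickEvent("user1", 1L, 1),
                new ClickEvent("user1", 2L, 2),
                new ClickEvent("user1", 3L, 3),
                new ClickEvent("user1", 4L, 4),
                new ClickEvent("user1"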