Flink Watermark Programming Examples

1. Periodic Watermark Generation Strategy

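        With this strategy, watermarks are generated at a fixed time interval, and you can specify a maximum tolerated delay (OutOfOrderness). The watermark is then the largest element timestamp seen so far minus the maximum out-of-orderness (Flink's built-in generator subtracts one further millisecond). Once the watermark reaches the end of a window (more precisely, the window's last timestamp, which is the window end minus 1 ms), the window is triggered for computation. This check is performed by the default trigger.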
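
The arithmetic above can be checked with a small plain-Java sketch that does not depend on Flink. The class and method names (WatermarkMath, watermark, windowFires) are made up for illustration. The extra 1 ms and the "window end minus 1" comparison reflect how the built-in bounded-out-of-orderness generator and the default event-time trigger are understood to behave; treat them as assumptions to verify against your Flink version.

```java
import java.time.Duration;

// Plain-Java sketch (no Flink dependency) of the bounded-out-of-orderness arithmetic.
// All names here are hypothetical helpers, not Flink API.
class WatermarkMath {

    // Watermark after the largest timestamp seen so far is maxTimestampMs.
    // The extra -1 ms mirrors the behavior assumed for Flink's built-in generator.
    static long watermark(long maxTimestampMs, Duration outOfOrderness) {
        return maxTimestampMs - outOfOrderness.toMillis() - 1;
    }

    // A window [start, end) fires once the watermark reaches its last timestamp, end - 1.
    static boolean windowFires(long watermarkMs, long windowEndMs) {
        return watermarkMs >= windowEndMs - 1;
    }

    public static void main(String[] args) {
        Duration delay = Duration.ofSeconds(1);
        long windowEnd = 10_000L; // tumbling window [0, 10000)
        for (long ts : new long[] {9_500L, 10_999L, 11_000L}) {
            long wm = watermark(ts, delay);
            System.out.println("maxTs=" + ts + " watermark=" + wm
                    + " fires=" + windowFires(wm, windowEnd));
        }
    }
}
```

With a 1-second delay, the window [0, 10000) only fires once an element with timestamp 11000 or later has been seen, which is the practical meaning of "the watermark delays the window".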

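        A watermark delays the moment at which a window fires, whereas allowedLateness is a property of the window that controls what happens to late data arriving after the window has already fired. By default, late data is dropped. It can instead be sent to a side output with sideOutputLateData, or it can re-trigger the window computation when allowedLateness is set.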
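
To make the three outcomes concrete, here is a plain-Java sketch (again without Flink) that classifies an element by the current watermark, the end of its window, and the allowed lateness. LatenessDemo, Outcome, and classify are hypothetical names, and the boundary conditions follow the rule as I understand it (window state is kept until the watermark reaches the window's last timestamp plus the allowed lateness), so treat them as an assumption.

```java
// Plain-Java sketch (no Flink dependency) of how an element is treated relative to its window.
// All names here are hypothetical helpers, not Flink API.
class LatenessDemo {

    enum Outcome { ON_TIME, LATE_REFIRES_WINDOW, DROPPED_OR_SIDE_OUTPUT }

    // windowEndMs is exclusive; the window's last timestamp is windowEndMs - 1.
    static Outcome classify(long watermarkMs, long windowEndMs, long allowedLatenessMs) {
        long maxTimestamp = windowEndMs - 1;
        if (watermarkMs < maxTimestamp) {
            return Outcome.ON_TIME; // the window has not fired yet
        }
        if (watermarkMs < maxTimestamp + allowedLatenessMs) {
            return Outcome.LATE_REFIRES_WINDOW; // window state is still kept, the result is updated
        }
        return Outcome.DROPPED_OR_SIDE_OUTPUT; // window state is gone: drop, or side output if configured
    }

    public static void main(String[] args) {
        long windowEnd = 10_000L;       // element belongs to window [0, 10000)
        long allowedLateness = 2_000L;  // 2 seconds
        for (long wm : new long[] {9_000L, 9_999L, 11_999L}) {
            System.out.println("watermark=" + wm + " -> " + classify(wm, windowEnd, allowedLateness));
        }
    }
}
```

With an allowed lateness of 0, the middle case disappears and every element arriving after the window has fired is dropped or routed to the side output.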

import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.time.*;

import java.util.Random;

public class WaterMarkBoundedOutOfOrderness {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
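        // Since Flink 1.12 event time is the default and this call is deprecated; it is only needed on older versions
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        // Interval (in ms) at which watermarks are generated; periodic generation is used for performance reasons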
        env.getConfig().setAutoWatermarkInterval(100L);

        DataStreamSource<Tuple2<String, Long>> stringDataStreamSource = env.addSource(new SourceFunction<Tuple2<String,Long>>() {
            volatile boolean flag = true;

            @Override
            public void run(SourceContext<Tuple2<String,Long>> sourceContext) throws Exception {
                String[] s = {"张三","王五","李四","秋英"};
                while(flag) {
                    Thread.sleep(1000);
                    int i = new Random().nextInt(4);
                    sourceContext.collect(new Tuple2<String,Long>(s[i],System.currentTimeMillis()));
                }

            }

            @Override
            public void cancel() {
                flag = false;
            }
        });

        // Set the watermark strategy on the stream: bounded out-of-orderness, where Duration.ofSeconds(1) is the maximum tolerated delay
        SingleOutputStreamOperator<Tuple2<String, Long>> tuple2SingleOutputStreamOperator = stringDataStreamSource.assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                    // Extract the event timestamp from the element
                    @Override
                    public long extractTimestamp(Tuple2<String, Long> integerLongTuple2, long l) {
                        return integerLongTuple2.f1;
                    }
                }));

        tuple2SingleOutputStreamOperator.map(new MapFunction<Tuple2<String, Long>, Tuple3<String,Long,Integer>>() {
            @Override
            public Tuple3<String,Long,Integer> map(Tuple2<String, Long> stringLongTuple2) throws Exception {
                // Print name, event timestamp, and processing time, separated by spaces
                System.out.println(stringLongTuple2.f0 + " " + stringLongTuple2.f1 + " " + System.currentTimeMillis());
                return new Tuple3<String,Long,Integer>(stringLongTuple2.f0,stringLongTuple2.f1,1);
            }
        }).keyBy(new KeySelector<Tuple3<String,Long,Integer>, String>() {
            @Override
            public String getKey(Tuple3<String,Long,Integer> s) throws Exception {
                return s.f0;
            }
        }).window(TumblingEventTimeWindows.of(Time.seconds(10)))
          .sum(2)
          .print();

        env.execute("watermark test");

    }
}

===================== Divider =========================

The following code does much the same as the boundedOutOfOrderness example, but this approach is deprecated:

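// Required imports: org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks,
// org.apache.flink.streaming.api.watermark.Watermark, javax.annotation.Nullable
SingleOutputStreamOperator<Tuple2<String, Long>> tuple2SingleOutputStreamOperator = streamSource.assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<Tuple2<String, Long>>() {
    final long maxOutOfOrderness = 1000L;
    // Start at MIN_VALUE + maxOutOfOrderness so that the subtraction in getCurrentWatermark cannot overflow
    long maxTimestamp = Long.MIN_VALUE + maxOutOfOrderness;
    long currentWatermark = Long.MIN_VALUE;

    // Extract the element's own timestamp and track the largest timestamp seen so far
    @Override
    public long extractTimestamp(Tuple2<String, Long> s, long l) {
        System.out.println("(" + s.f0 + "," + s.f1 + ")");
        System.out.println("current watermark: " + currentWatermark);
        maxTimestamp = Math.max(s.f1, maxTimestamp);
        return s.f1;
    }

    // Generate the watermark (called periodically)
    @Nullable
    @Override
    public Watermark getCurrentWatermark() {
        currentWatermark = maxTimestamp - maxOutOfOrderness;
        System.out.println("newly generated watermark: " + currentWatermark);
        return new Watermark(currentWatermark);
    }
});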
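
One pitfall to watch for when hand-writing a periodic generator like this is the initial value of the maximum timestamp. If it starts at Long.MIN_VALUE and the periodic callback runs before the first element arrives, subtracting the out-of-orderness wraps around to a huge positive number. The plain-Java sketch below (InitialWatermarkDemo is a hypothetical name, no Flink involved) shows the wrap-around and one common way to avoid it.

```java
// Plain-Java sketch of the overflow that occurs when the maximum timestamp starts at Long.MIN_VALUE.
// The class and method names are hypothetical helpers, not Flink API.
class InitialWatermarkDemo {

    // Same arithmetic as a hand-written periodic generator: max timestamp minus tolerated delay.
    static long watermark(long maxTimestamp, long maxOutOfOrderness) {
        return maxTimestamp - maxOutOfOrderness;
    }

    public static void main(String[] args) {
        long delay = 1000L;

        // Naive start value: the subtraction silently wraps around to a large positive value.
        long naive = watermark(Long.MIN_VALUE, delay);
        System.out.println("naive initial watermark is positive: " + (naive > 0));

        // Safer start value: the first watermark is exactly Long.MIN_VALUE, i.e. "nothing seen yet".
        long safe = watermark(Long.MIN_VALUE + delay, delay);
        System.out.println("safe initial watermark is Long.MIN_VALUE: " + (safe == Long.MIN_VALUE));
    }
}
```

A wrapped-around watermark would make every subsequent element look late, so initializing to Long.MIN_VALUE plus the delay is the safer choice.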

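2. What If the Watermark Stops Advancing?

What happens if a source receives no events for a long time?

Because no events arrive, that source produces no new watermarks, and the event-time clock that depends on it cannot advance. A source in this state is called an idle source. Since an operator's watermark is the minimum of the watermarks of its inputs, a single task whose clock is stuck prevents windows downstream from ever firing.

In Flink, we can use withIdleness to detect idle sources.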

SingleOutputStreamOperator<Tuple2<String, Long>> wordWithTsDS =
        wordSource.assignTimestampsAndWatermarks(WatermarkStrategy
                .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))  // tolerate up to 5 seconds of out-of-orderness
                .withIdleness(Duration.ofSeconds(15))                                   // mark the source idle after 15 seconds without events
                .withTimestampAssigner((event, timestamp) -> event.f1));                // extract the event time

With this setting, a source that receives no events for more than 15 seconds is marked as an idle source, and the window operator ignores it when computing its watermark.
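
The effect of idleness on the combined watermark can be illustrated with a plain-Java sketch that does not use Flink. IdleInputDemo, isIdle, and combined are hypothetical names. The sketch assumes the combined watermark is the minimum over all inputs that are not idle, which matches the description above but simplifies what the runtime actually does.

```java
import java.util.OptionalLong;

// Plain-Java sketch (no Flink dependency) of combining input watermarks while skipping idle inputs.
// All names here are hypothetical helpers, not Flink API.
class IdleInputDemo {

    // An input counts as idle if it has produced no event for at least idleTimeoutMs.
    static boolean isIdle(long nowMs, long lastEventMs, long idleTimeoutMs) {
        return nowMs - lastEventMs >= idleTimeoutMs;
    }

    // Combined watermark: minimum over non-idle inputs; empty if every input is idle.
    static OptionalLong combined(long[] watermarks, boolean[] idle) {
        OptionalLong min = OptionalLong.empty();
        for (int i = 0; i < watermarks.length; i++) {
            if (idle[i]) {
                continue; // idle inputs do not hold the watermark back
            }
            if (!min.isPresent() || watermarks[i] < min.getAsLong()) {
                min = OptionalLong.of(watermarks[i]);
            }
        }
        return min;
    }

    public static void main(String[] args) {
        long now = 60_000L;
        long timeout = 15_000L;
        long[] watermarks = {50_000L, 5_000L}; // input 1 stopped receiving events long ago
        long[] lastEvent = {59_000L, 40_000L};

        boolean[] noIdleness = {false, false};
        boolean[] withIdleness = {
            isIdle(now, lastEvent[0], timeout),
            isIdle(now, lastEvent[1], timeout)
        };

        System.out.println("without idleness: " + combined(watermarks, noIdleness));
        System.out.println("with idleness:    " + combined(watermarks, withIdleness));
    }
}
```

Without idleness detection the stalled input pins the combined watermark at 5000, while with a 15-second timeout it is skipped and the watermark follows the active input.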
