浅议Flink Window Join时Watermark的推进机制

Flink Window Join就是将两条数据流通过窗口划分为有界流​,在划分的窗口内两条流的元素进行inner join,将共有的元素​发送出去。这里的窗口支持滚动/滑动/会话窗口等​。

固定的Api模板如下

stream.join(otherStream)
    .where(<KeySelector>)
    .equalTo(<KeySelector>)
    .window(<WindowAssigner>)
    .apply(<JoinFunction>);

滚动时间窗口的Window Join

滑动时间窗口的Window Join

假设有两条流,流元素分别是(id,timestamp)。对两条流进行Window Join,使用了一个窗口长度为5秒的​滚动时间窗口。相关代码见文末。

整个Flink作业的Watermark是两条流​watermark的最小值。只有当整个作业的Watermark超过了窗口的结束时候,此窗口内Join上的元素才会​被输出。

查看org.apache.flink.streaming.api.operators.AbstractStreamOperator类中processWatermark方法,可看到两条流中的最小watermark值作为整个作业的watermark.

附文中案例的相关程序

public class test {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3,5000));
        env.setParallelism(1);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);


        Properties pros = new Properties();
        pros.setProperty("bootstrap.servers", "xxx");
        DataStream<String> is1 = env.addSource(new FlinkKafkaConsumer<String>("topicName", new SimpleStringSchema(), pros));

        SingleOutputStreamOperator<Tuple2<String, Long>> watermarks1 = is1
                .map(new MapFunction<String, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(String s) throws Exception {
                        String[] split = s.split(",");
                        return Tuple2.of(split[0],  Long.parseLong(split[1]));
                    }
                }).assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(0))
                                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                                    @Override
                                    public long extractTimestamp(Tuple2<String, Long> stringStringLongTuple2, long l) {
                                        return stringStringLongTuple2.f1;
                                    }
                                })
                );

       watermarks1.print("watermark1");


        DataStream<String> is2 = env.addSource(new FlinkKafkaConsumer<String>("topicName", new SimpleStringSchema(), pros));
        SingleOutputStreamOperator<Tuple2<String, Long>> watermarks2 = is2
                .map(new MapFunction<String, Tuple2<String, Long>>() {
                    @Override
                    public Tuple2<String, Long> map(String s) throws Exception {
                        String[] split = s.split(",");
                        return Tuple2.of(split[0],  Long.parseLong(split[1]));
                    }
                }).assignTimestampsAndWatermarks(
                        WatermarkStrategy
                                .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(0))
                                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                                    @Override
                                    public long extractTimestamp(Tuple2<String, Long> stringStringLongTuple2, long l) {
                                        return stringStringLongTuple2.f1;
                                    }
                                })
                );

        watermarks2.print("watermark2");

        watermarks1.join(watermarks2)
                        .where(a -> a.f0)
                                .equalTo(b -> b.f0)
                                        .window(TumblingEventTimeWindows.of(Time.seconds(5)))
                                                .apply(new JoinFunction<Tuple2<String, Long>, Tuple2<String, Long>, String>() {
                                                    @Override
                                                    public String join(Tuple2<String, Long> stringLongTuple2, Tuple2<String, Long> stringLongTuple22) throws Exception {
                                                        return stringLongTuple2 + " => " + stringLongTuple22;
                                                    }
                                                })
                                                        .print();
        
        env.execute("ConnectStreamViaWindowJoin");

    }

}

  • 5
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值