Flink Study Notes (13): Joining Data Streams


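I. inner join

        Two streams are connected with join. After the where and equalTo key selectors are applied, elements whose keys match and that fall into the same window are passed on to the subsequent window function.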

1. Start nc

Open two ports to simulate two data sources:

nc -lp 8888
nc -lp 8899

2. Example

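// Imports shared by the examples in this post (org.junit.Test assumes JUnit 4)
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
import org.junit.Test;

    @Test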
    public void joinTumblingTest() throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
                .setParallelism(1);
        // data stream 1
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream1 = env.socketTextStream("172.16.10.159", 8888)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // data stream 2
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream2 = env.socketTextStream("172.16.10.159", 8899)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // join the two data streams
        stream1.join(stream2)
                // key selector for the first stream
                .where(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // key selector for the second stream
                .equalTo(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // tumbling window, size 10 ms
                .window(TumblingEventTimeWindows.of(Time.milliseconds(10)))
                .apply(new JoinFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple3<String, Integer, Integer>>() {
                    @Override
                    public Tuple3<String, Integer, Integer> join(Tuple2<String, Integer> first, Tuple2<String, Integer> second) throws Exception {
                        return new Tuple3<>(first.f0, first.f1, second.f1);
                    }
                })
                .print("join");
        env.execute("joinTumblingTest");
    }

3. Test

Data stream 1

a,1
a,5
b,6
a,10

Data stream 2

a,7
a,8
a,11

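Because the tumbling window size is 10 ms, once the window [0, 10) closes, the elements with timestamps in that range are joined. b,6 has no counterpart in data stream 2 and produces no output, while a,10 and a,11 belong to the next window, [10, 20).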
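Why does entering a,10 close the window [0, 10)? The sketch below models the firing rule in plain Java, without Flink. It assumes that a monotonous-timestamp watermark trails the largest timestamp seen by 1 ms, that a two-input operator uses the smaller of its two input watermarks, and that a window [start, end) fires once the watermark reaches end - 1. The class and method names are made up for illustration.

```java
final class WatermarkFiringSketch {
    // Assumed watermark rule for monotonous timestamps: largest timestamp seen so far, minus 1.
    static long watermark(long maxTimestampSeen) {
        return maxTimestampSeen - 1;
    }

    // A two-input operator (such as a join) advances to the smaller of its input watermarks.
    static long combined(long watermark1, long watermark2) {
        return Math.min(watermark1, watermark2);
    }

    // An event-time window [start, end) fires once the watermark reaches end - 1.
    static boolean fires(long windowEnd, long watermark) {
        return watermark >= windowEnd - 1;
    }

    public static void main(String[] args) {
        long wm1 = watermark(10); // stream 1 has seen a,1 a,5 b,6 a,10
        long wm2 = watermark(11); // stream 2 has seen a,7 a,8 a,11
        System.out.println("window [0,10) fires: " + fires(10, combined(wm1, wm2)));
    }
}
```

Under these assumptions both streams need an element with a timestamp of at least 10 before the first window produces output.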

Result

join> (a,1,7)
join> (a,1,8)
join> (a,5,7)
join> (a,5,8)
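The four rows above can be reproduced without a cluster. The following is a minimal plain-Java sketch of what a windowed inner join computes for one window: every pair of elements with the same key whose timestamps both fall into [start, end). It is not Flink code; the class name and the {key, timestamp} string-array representation are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class TumblingJoinSketch {
    // True if the element {key, timestamp} has a timestamp inside [start, end).
    static boolean inWindow(String[] element, long start, long end) {
        long ts = Long.parseLong(element[1]);
        return ts >= start && ts < end;
    }

    // Inner join restricted to one window: same key on both sides, both timestamps in the window.
    static List<String> joinWindow(String[][] left, String[][] right, long start, long end) {
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            if (!inWindow(l, start, end)) {
                continue;
            }
            for (String[] r : right) {
                if (inWindow(r, start, end) && l[0].equals(r[0])) {
                    out.add("(" + l[0] + "," + l[1] + "," + r[1] + ")");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] stream1 = {{"a", "1"}, {"a", "5"}, {"b", "6"}, {"a", "10"}};
        String[][] stream2 = {{"a", "7"}, {"a", "8"}, {"a", "11"}};
        // Window [0, 10) with the test data from this section.
        joinWindow(stream1, stream2, 0, 10).forEach(System.out::println);
    }
}
```

Calling joinWindow with the range 10 to 20 returns only (a,10,11), the pair that would be emitted once the next window closes.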


II. sliding-inner-join

        This section tests an inner join over sliding windows.

1. Example

The sliding window size is 4 ms and the slide is 2 ms.

 @Test
    public void joinSlidingTest() throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
                .setParallelism(1);
        // data stream 1
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream1 = env.socketTextStream("172.16.10.159", 8888)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // data stream 2
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream2 = env.socketTextStream("172.16.10.159", 8899)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // join the two data streams
        stream1.join(stream2)
                // key selector for the first stream
                .where(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // key selector for the second stream
                .equalTo(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // sliding window, size 4 ms, slide 2 ms
                .window(SlidingEventTimeWindows.of(Time.milliseconds(4),Time.milliseconds(2)))
                .apply(new JoinFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple3<String, Integer, Integer>>() {
                    @Override
                    public Tuple3<String, Integer, Integer> join(Tuple2<String, Integer> first, Tuple2<String, Integer> second) throws Exception {
                        return new Tuple3<>(first.f0, first.f1, second.f1);
                    }
                })
                .print("sliding-inner-join");
        env.execute("joinSlidingTest");
    }

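2. Test

Data stream 1 input: a,2 and a,4
Data stream 2 input: a,3 and a,4

Once the watermark reaches the end of the first 4 ms window, [0, 4), the join fires and prints sliding-inner-join> (a,2,3)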
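Next:
Data stream 1 input: a,5 and a,6
Data stream 2 input: a,5 and a,6

The watermark has now advanced by one slide interval (2 ms), so the next window, [2, 6), closes and the elements with timestamps in that range are joined.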
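With a sliding window, each element belongs to several windows, which is why the same element can appear in more than one join result. The sketch below lists the windows a timestamp falls into for a size of 4 ms and a slide of 2 ms. It is plain Java written for this post, not Flink's own assigner, and it assumes non-negative timestamps and a window offset of zero.

```java
import java.util.ArrayList;
import java.util.List;

final class SlidingWindowSketch {
    // Returns every window {start, end} containing the timestamp, latest window first.
    // Assumes non-negative timestamps and a window offset of zero.
    static List<long[]> windowsFor(long timestamp, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        for (long ts : new long[] {2, 3, 4, 5, 6}) {
            StringBuilder line = new StringBuilder("timestamp " + ts + " ->");
            for (long[] w : windowsFor(ts, 4, 2)) {
                line.append(" [").append(w[0]).append(",").append(w[1]).append(")");
            }
            System.out.println(line);
        }
    }
}
```

Timestamps 2 and 3 share the window [0, 4), which matches the first output (a,2,3). Timestamps 2, 3, 4 and 5 all fall into [2, 6), the window that closes in the second step.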


III. session-inner-join

        This section tests an inner join over session windows.

1. Example

Session window with a gap of 10 ms

	@Test
    public void joinSessionTest() throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
                .setParallelism(1);
        // data stream 1
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream1 = env.socketTextStream("172.16.10.159", 8888)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // data stream 2
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream2 = env.socketTextStream("172.16.10.159", 8899)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // join the two data streams
        stream1.join(stream2)
                // key selector for the first stream
                .where(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // key selector for the second stream
                .equalTo(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // session window, gap 10 ms
                .window(EventTimeSessionWindows.withGap(Time.milliseconds(10)))
                .apply(new JoinFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple3<String, Integer, Integer>>() {
                    @Override
                    public Tuple3<String, Integer, Integer> join(Tuple2<String, Integer> first, Tuple2<String, Integer> second) throws Exception {
                        return new Tuple3<>(first.f0, first.f1, second.f1);
                    }
                })
                .print("session-inner-join");
        env.execute("joinSessionTest");
    }

2. Test

Data stream 1 input: a,3 and a,14
Data stream 2 input: a,5 and a,20
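How these four timestamps group into sessions can be worked out by hand. The sketch below does so in plain Java under two assumptions: the elements of both streams with the same key share one set of session windows, and each element starts a window [t, t + gap) that is merged with any window it overlaps or touches. The touching case is an assumption about Flink's merge rule, and the class name is made up for this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SessionWindowSketch {
    // Gives each timestamp its own window {t, t + gap} and merges windows that overlap or touch.
    static List<long[]> sessions(long[] timestamps, long gap) {
        long[] sorted = timestamps.clone();
        Arrays.sort(sorted);
        List<long[]> merged = new ArrayList<>();
        for (long t : sorted) {
            long[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && t <= last[1]) {
                last[1] = Math.max(last[1], t + gap); // extend the current session
            } else {
                merged.add(new long[] {t, t + gap}); // start a new session
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Timestamps from both test streams for key "a".
        for (long[] s : sessions(new long[] {3, 14, 5, 20}, 10)) {
            System.out.println("[" + s[0] + "," + s[1] + ")");
        }
    }
}
```

Under these assumptions the four inputs collapse into a single session, [3, 30), so the join would not be expected to fire until the watermark moves past the end of that session.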

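IV. left-join

Left join of two data streams. The DataStream join operator only emits matched pairs (an inner join), so this example uses coGroup and emits the unmatched left-side elements itself.

1. Example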

@Test
    public void leftJoinTest() throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
                .setParallelism(1);
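        // data stream 1: parse "key,timestamp" and use the second field as the event timestamp
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream1 = env.socketTextStream("172.16.10.159", 8888)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));
        // data stream 2: same parsing and timestamp assignment
        SingleOutputStreamOperator<Tuple2<String, Integer>> stream2 = env.socketTextStream("172.16.10.159", 8899)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1));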
        // co-group the two data streams
        stream1.coGroup(stream2)
                // key selector for the first stream
                .where(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // key selector for the second stream
                .equalTo(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                })
                // tumbling window, size 10 ms
                .window(TumblingEventTimeWindows.of(Time.milliseconds(10)))
                .apply(new CoGroupFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple3<String, Integer, Integer>>() {
                    @Override
                    public void coGroup(Iterable<Tuple2<String, Integer>> first, Iterable<Tuple2<String, Integer>> second, Collector<Tuple3<String, Integer, Integer>> out) throws Exception {
                        // left join
                        for (Tuple2<String, Integer> left : first) {
                            boolean isJoin = false;
                            for (Tuple2<String, Integer> right : second) {
                                isJoin = true;
                                out.collect(new Tuple3<>(left.f0, left.f1, right.f1));
                            }
                            // no matching element on the right side
                            if (!isJoin) {
                                out.collect(new Tuple3<>(left.f0, left.f1, null));
                            }
                        }

                    }
                })
                .print("left join");
        env.execute("coGroupTest");
    }

2. Test

Once the timestamps advance past the end of a 10 ms window, the window fires and the left-join result is emitted.
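The left-join logic inside the coGroup function can be tried on its own. The sketch below is plain Java, not Flink code: it takes the left and right values of one key within one window, roughly as coGroup would hand them over, and pads unmatched left values with null. The class name, the method name and the sample values are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class LeftJoinSketch {
    // first and second hold the values of one key inside one window.
    static List<String> leftJoin(String key, List<Integer> first, List<Integer> second) {
        List<String> out = new ArrayList<>();
        for (Integer left : first) {
            if (second.isEmpty()) {
                // No right-side element: keep the left element and pad with null.
                out.add("(" + key + "," + left + ",null)");
                continue;
            }
            for (Integer right : second) {
                out.add("(" + key + "," + left + "," + right + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Key "a" has data on both sides, key "b" only on the left.
        System.out.println(leftJoin("a", Arrays.asList(1, 5), Arrays.asList(7)));
        System.out.println(leftJoin("b", Arrays.asList(6), Collections.<Integer>emptyList()));
    }
}
```

A key that appears only on the right side produces nothing here, because the outer loop runs over the left values only.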


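V. interval-join

For two joined streams a and b, an element of b is joined with an element of a if
b.timestamp ∈ [a.timestamp + lowerBound; a.timestamp + upperBound]
or, equivalently,
a.timestamp + lowerBound <= b.timestamp <= a.timestamp + upperBound
That is, the join is triggered when b's timestamp lies between the lower and upper bounds relative to a's timestamp. Both bounds are inclusive by default.

1. Example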

@Test
    public void intervalJoinTest() throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
                .setParallelism(1);
        // data stream 1
        KeyedStream<Tuple2<String, Integer>, String> stream1 = env.socketTextStream("172.16.10.159", 8888)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1))
                .keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                });
        // data stream 2
        KeyedStream<Tuple2<String, Integer>, String> stream2 = env.socketTextStream("172.16.10.159", 8899)
                .map(new MapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public Tuple2<String, Integer> map(String value) throws Exception {
                        String[] s = value.split(",");
                        return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                    }
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps().withTimestampAssigner((element, recordTimestamp) -> element.f1))
                .keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
                    @Override
                    public String getKey(Tuple2<String, Integer> value) throws Exception {
                        return value.f0;
                    }
                });
        // interval-join the two keyed streams
        stream1.intervalJoin(stream2)
                // use event time
                .inEventTime()
                // define the lower and upper bounds
                .between(Time.milliseconds(-2), Time.milliseconds(2))
                // exclude the lower bound
                .lowerBoundExclusive()
                .process(new ProcessJoinFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple3<String,Integer, Integer>>() {
                    @Override
                    public void processElement(Tuple2<String, Integer> left, Tuple2<String, Integer> right, Context ctx, Collector<Tuple3<String, Integer, Integer>> out) throws Exception {
                        out.collect(new Tuple3<>(left.f0, left.f1, right.f1));
                    }
                })
                .print("interval-join");
        env.execute("intervalJoinTest");
    }

2. Test

Data stream 1 input:

a,5

Data stream 2 input:

a,3
a,4
a,6
a,7
a,8

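With bounds of -2 and 2, elements of data stream 2 whose timestamps lie between 5-2 and 5+2 are joined. Because lowerBoundExclusive() is set, the lower bound 3 itself is excluded, so a,4, a,6 and a,7 match while a,3 and a,8 do not.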
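The bound check can be verified with a few lines of plain Java. The sketch below applies the interval condition from the start of this section to the test data, with a flag for the exclusive lower bound. It is an illustration only; the class and method names are not part of Flink.

```java
import java.util.ArrayList;
import java.util.List;

final class IntervalJoinSketch {
    // True if b lies within [a + lowerBound, a + upperBound], optionally excluding the lower bound.
    static boolean matches(long a, long b, long lowerBound, long upperBound, boolean lowerExclusive) {
        long low = a + lowerBound;
        long high = a + upperBound;
        boolean aboveLow = lowerExclusive ? b > low : b >= low;
        return aboveLow && b <= high;
    }

    // Collects the right-side timestamps that join with the left-side timestamp a.
    static List<Long> matching(long a, long[] bs, long lowerBound, long upperBound, boolean lowerExclusive) {
        List<Long> out = new ArrayList<>();
        for (long b : bs) {
            if (matches(a, b, lowerBound, upperBound, lowerExclusive)) {
                out.add(b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] stream2 = {3, 4, 6, 7, 8};
        System.out.println(matching(5, stream2, -2, 2, true));  // lower bound excluded
        System.out.println(matching(5, stream2, -2, 2, false)); // both bounds inclusive
    }
}
```

With the lower bound excluded the matches are 4, 6 and 7. With both bounds inclusive, 3 would match as well.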
