[Basics] Flink -- Multistream Conversion

情绪大瓜皮丶

Last modified: 2023-03-13 19:09:55

Category: Flink. Tags: flink, java, big data

First published: 2023-03-13 16:21:12

Article link: https://blog.csdn.net/zqf787351070/article/details/129496964

Flink -- Multistream Conversion

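Overview of Multi-Stream Conversion
Splitting Streams
- Simple Implementation
- Side Outputs
Merging Streams

Overview of Multi-Stream Conversion

The operations covered so far, whether simple transformations, aggregations, or window computations, all process the data of a single stream. In real-world development, however, business logic may need to merge data from different sources for joint processing, or to split one data stream into several streams that are processed separately.

To address these needs, Flink provides multi-stream conversion operations for both splitting and merging streams.

The Event model and the custom source operator EventSource used in the code throughout this article are shown below: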

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class Event {

    public String user;
    public String url;
    public Long timestamp;

}

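import org.apache.flink.streaming.api.functions.source.SourceFunction;

import java.util.Calendar;
import java.util.Random;

public class EventSource implements SourceFunction<Event> {

    // volatile so that a cancel() call from another thread is visible to run()
    private volatile boolean flag = true;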

    String[] users = {"曹操", "刘备", "孙权", "诸葛亮"};
    String[] urls = {"/home", "/test?id=1", "/test?id=2", "/play/football", "/play/basketball"};

    @Override
    public void run(SourceContext<Event> sourceContext) throws Exception {
        Random random = new Random();
        while (flag) {
            sourceContext.collect(new Event(
                    users[random.nextInt(users.length)],
                    urls[random.nextInt(urls.length)],
                    Calendar.getInstance().getTimeInMillis()
            ));
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        flag = false;
    }
}

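Splitting Streams

Splitting divides a single data stream into two or more fully independent streams.

[Figure: one data stream being split into several streams]

Simple Implementation

A stream can be split simply with the filter() transformation operator. The following code uses it to divide a single stream into several independent streams according to the visitor's name: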

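import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FilterDemo {

    public static void main(String[] args) throws Exception {
        // 1. Set up the environment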
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);
        environment.setParallelism(1);
        // 2. Set up the data source
        DataStreamSource<Event> source = environment.addSource(new EventSource());
        // 3. Split the stream by user name
        SingleOutputStreamOperator<Event> weiFilter = source.filter(new FilterFunction<Event>() {
            @Override
            public boolean filter(Event event) throws Exception {
                return "曹操".equals(event.user);
            }
        });
        SingleOutputStreamOperator<Event> shuFilter = source.filter(new FilterFunction<Event>() {
            @Override
            public boolean filter(Event event) throws Exception {
                return "刘备".equals(event.user) || "诸葛亮".equals(event.user);
            }
        });
        SingleOutputStreamOperator<Event> wuFilter = source.filter(new FilterFunction<Event>() {
            @Override
            public boolean filter(Event event) throws Exception {
                return "孙权".equals(event.user);
            }
        });
        // 4. Print each of the split streams
        weiFilter.print("魏");
        shuFilter.print("蜀");
        wuFilter.print("吴");
        // 5. Execute the job
        environment.execute();
    }

}

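This approach is simple to write, but under the hood the original stream (source) is replicated three times, and each copy is then filtered to form a new stream. Every record is therefore evaluated three times, which is clearly inefficient.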

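Side Outputs

Another way to split a stream is to use side outputs in a process function. With the filter() operator above, each operator produces only a single output. With side outputs, one operator can emit several streams, and the data type of each stream can be defined independently, which makes this approach more flexible and efficient.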
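The idea of routing each record exactly once can be sketched in plain Java, without any Flink dependency. This is only an illustration under assumptions: the class SideOutputSketch, the method route, the tag names "wei", "shu", "wu", and the ASCII user names are hypothetical stand-ins and are not part of the Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SideOutputSketch {

    // Routes every user to at most one tagged bucket in a single pass,
    // instead of scanning the input once per filter condition.
    public static Map<String, List<String>> route(List<String> users) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        outputs.put("wei", new ArrayList<>());
        outputs.put("shu", new ArrayList<>());
        outputs.put("wu", new ArrayList<>());
        for (String user : users) {
            if ("CaoCao".equals(user)) {
                outputs.get("wei").add(user);
            } else if ("LiuBei".equals(user) || "ZhugeLiang".equals(user)) {
                outputs.get("shu").add(user);
            } else if ("SunQuan".equals(user)) {
                outputs.get("wu").add(user);
            }
            // Users matching no branch are dropped, as in an if/else-if chain without else.
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> users = Arrays.asList("CaoCao", "LiuBei", "SunQuan", "ZhugeLiang");
        System.out.println(route(users));
    }
}
```

Each record passes through the branching logic once, which is the property that makes the side-output approach cheaper than applying one filter per target stream.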

Example code that splits a stream with side outputs:

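import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutputDemo {

    public static void main(String[] args) throws Exception {
        // 1. Set up the environment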
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);
        environment.setParallelism(1);
        // 2. Set up the data source
        DataStreamSource<Event> source = environment.addSource(new EventSource());
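        // 3. Define the side-output tags and their element type
        //    (the anonymous subclass {} lets Flink capture the generic type;
        //    the explicit <Event> also compiles on Java 8)
        OutputTag<Event> weiOutputTag = new OutputTag<Event>("魏"){};
        OutputTag<Event> shuOutputTag = new OutputTag<Event>("蜀"){};
        OutputTag<Event> wuOutputTag = new OutputTag<Event>("吴"){};
        // 4. Split the stream with a process function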
        SingleOutputStreamOperator<Event> process = source.process(new ProcessFunction<Event, Event>() {
            @Override
            public void processElement(Event event, ProcessFunction<Event, Event>.Context context,
                                       Collector<Event> collector) throws Exception {
                String name = event.getUser();
                if ("曹操".equals(name)) {
                    context.output(weiOutputTag, event);
                } else if ("刘备".equals(name) || "诸葛亮".equals(name)) {
                    context.output(shuOutputTag, event);
                } else if ("孙权".equals(name)) {
                    context.output(wuOutputTag, event);
                }
            }
        });
        // 5. Retrieve the side outputs and print them
        process.getSideOutput(weiOutputTag).print("魏");
        process.getSideOutput(shuOutputTag).print("蜀");
        process.getSideOutput(wuOutputTag).print("吴");
        // 6. Execute the job
        environment.execute();
    }

}

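Merging Streams

Real applications also frequently need to merge several streams from different sources into one, and Flink provides a rich set of APIs for this.

Union

union merges several streams directly. All of the streams must have the same data type; the merged stream contains every element of the input streams, and the data type is unchanged.

union accepts multiple streams as arguments, so it can merge more than two streams in a single call.

The watermark of the merged stream follows the same rule described earlier for watermark propagation across parallel tasks: it is determined by the smallest watermark among the inputs. The example code below offers a simple test: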

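import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class UnionDemo {

    public static void main(String[] args) throws Exception {
        // 1. Set up the environment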
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setParallelism(1)
                .setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);
        // 2. Set up the data sources (each socket line is parsed as: user,url,timestamp)
        SingleOutputStreamOperator<Event> stream01 = environment.socketTextStream("XX.XXX.XXX.XX", 8080)
                .map(data -> {
                    String[] field = data.split(",");
                    return new Event(field[0].trim(), field[1].trim(), Long.valueOf(field[2].trim()));
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy
                        .<Event>forMonotonousTimestamps()
                        .withTimestampAssigner(new SerializableTimestampAssigner<Event>() {
                            @Override
                            public long extractTimestamp(Event event, long l) {
                                return event.getTimestamp();
                            }
                        }));
        stream01.print("stream01");
        SingleOutputStreamOperator<Event> stream02 = environment.socketTextStream("XX.XXX.XXX.XX", 8081)
                .map(data -> {
                    String[] field = data.split(",");
                    return new Event(field[0].trim(), field[1].trim(), Long.valueOf(field[2].trim()));
                })
                .assignTimestampsAndWatermarks(WatermarkStrategy
                        .<Event>forMonotonousTimestamps()
                        .withTimestampAssigner(new SerializableTimestampAssigner<Event>() {
                            @Override
                            public long extractTimestamp(Event event, long l) {
                                return event.getTimestamp();
                            }
                        }));
        stream02.print("stream02");
        // 3. Union the streams and print the current watermark
        stream01.union(stream02)
                .process(new ProcessFunction<Event, String>() {
                    @Override
                    public void processElement(Event event, ProcessFunction<Event, String>.Context context,
                                               Collector<String> collector) throws Exception {
                        collector.collect("watermark>>>" + context.timerService().currentWatermark());
                    }
                })
                .print();
        // 4. Execute the job
        environment.execute();

    }

}

The observed watermark propagation is as follows:

[Figure: watermark propagation observed when running UnionDemo]
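The propagation rule can be sketched in plain Java, independent of Flink: an operator with several inputs advances its own watermark only as far as the slowest input allows. The class UnionWatermarkSketch, the method combinedWatermark, and the sample values are illustrative assumptions, not Flink code.

```java
import java.util.Arrays;

public class UnionWatermarkSketch {

    // The watermark seen downstream of a merge is the minimum over all input watermarks.
    public static long combinedWatermark(long... inputWatermarks) {
        return Arrays.stream(inputWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        // Both inputs start without any watermark progress.
        long[] watermarks = {Long.MIN_VALUE, Long.MIN_VALUE};

        // Only the first input advances: the combined watermark does not move.
        watermarks[0] = 999L;
        System.out.println(combinedWatermark(watermarks));

        // Once the second input advances too, the combined watermark follows the smaller value.
        watermarks[1] = 1999L;
        System.out.println(combinedWatermark(watermarks)); // prints 999
    }
}
```

This is why, in a test like the one above, sending data to only one of the two sockets leaves the printed watermark unchanged until the other socket also receives data.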

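Connect

union is very simple to use, but it requires all streams to have the same data type, and the type of the merged stream cannot change, so it is unsuitable for more complex scenarios.

For this reason, Flink provides the connect method, which allows the data types of the input streams and of the merged stream to be handled more flexibly.

To use it, call connect() on one stream and pass in the other. The result has the type ConnectedStreams. On such a stream you can call keyBy() to partition by key, and use map(), flatMap(), process(), and similar methods for co-processing.

For these co-processing operations, the interface to implement contains two parallel methods distinguished by the suffixes 1 and 2; they are called when data arrives from the first and the second stream respectively. map() and flatMap() are straightforward, so this section focuses on co-processing with process().

process() takes a subclass of the abstract class CoProcessFunction, which is also a member of the process-function family. It requires processElement1() and processElement2() to be implemented; for each arriving element, one of the two is called depending on the source stream. Like other process functions, CoProcessFunction can access the timestamp and the watermark through the context ctx and register timers through the TimerService; it also provides onTimer() for defining what happens when a timer fires.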

public abstract class CoProcessFunction<IN1, IN2, OUT> extends AbstractRichFunction {
    ...
    public abstract void processElement1(IN1 value, Context ctx, Collector<OUT> out) throws Exception;
    public abstract void processElement2(IN2 value, Context ctx, Collector<OUT> out) throws Exception;
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<OUT> out) throws Exception {}
    public abstract class Context {...}
    ...
}

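The following is a concrete CoProcessFunction example: a real-time reconciliation requirement, i.e. a two-stream join between payment events from the app and payment events from a third-party platform. App payment events and third-party payment events wait for each other for 5 seconds; if the matching event does not arrive in time, an alert is emitted. Example code: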

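import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.api.java.tuple.Tuple4;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

public class ConnectDemo {

    public static void main(String[] args) throws Exception {
        // 1. Set up the environment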
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setParallelism(1)
                .setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);
        // 2. Configure the data sources and watermarks
        // Payment log from the app
        SingleOutputStreamOperator<Tuple3<String, String, Long>> appStream = environment
                .fromElements(
                        Tuple3.of("order-1", "app", 1000L),
                        Tuple3.of("order-2", "app", 2000L)
                ).assignTimestampsAndWatermarks(WatermarkStrategy
                        .<Tuple3<String, String, Long>>forMonotonousTimestamps()
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple3<String, String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple3<String, String, Long> element, long recordTimestamp) {
                                return element.f2;
                            }
                        })
                );
        // Payment log from the third-party payment platform
        SingleOutputStreamOperator<Tuple4<String, String, String, Long>> thirdpartStream = environment
                .fromElements(
                        Tuple4.of("order-1", "third-party", "success", 3000L),
                        Tuple4.of("order-3", "third-party", "success", 4000L)
        ).assignTimestampsAndWatermarks(WatermarkStrategy
                        .<Tuple4<String, String, String, Long>>forMonotonousTimestamps()
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple4<String, String, String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple4<String, String, String, Long> element, long recordTimestamp) {
                                return element.f3;
                            }
                        })
        );
        // 3. Connect the streams, key both by order id, co-process, and print
        appStream.connect(thirdpartStream)
                .keyBy(data -> data.f0, data -> data.f0)
                .process(new OrderProcess())
                .print();
        // 4. Execute the job
        environment.execute();

    }

    public static class OrderProcess extends CoProcessFunction<Tuple3<String, String, Long>,
                                                               Tuple4<String, String, String, Long>,
                                                               String> {
        // State variables that hold the events that have already arrived
        private ValueState<Tuple3<String, String, Long>> appEventState;
        private ValueState<Tuple4<String, String, String, Long>> thirdPartyEventState;

        @Override
        public void open(Configuration parameters) throws Exception {
            appEventState = getRuntimeContext().getState(new ValueStateDescriptor<Tuple3<String, String, Long>>
                    ("app-event", Types.TUPLE(Types.STRING, Types.STRING, Types.LONG))
            );
            thirdPartyEventState = getRuntimeContext().getState(
                    new ValueStateDescriptor<Tuple4<String, String, String, Long>>
                    ("thirdparty-event", Types.TUPLE(Types.STRING, Types.STRING, Types.STRING,Types.LONG))
            );
        }

        @Override
        public void processElement1(Tuple3<String, String, Long> appEvent,
                                    CoProcessFunction<Tuple3<String, String, Long>,
                                    Tuple4<String, String, String, Long>, String>.Context context,
                                    Collector<String> collector) throws Exception {
            if (thirdPartyEventState.value() != null){
                collector.collect("Reconciled>>>" + appEvent + " || " + thirdPartyEventState.value());
                // Clear the state
                thirdPartyEventState.clear();
            } else {
                // Save the event in state
                appEventState.update(appEvent);
                // Register a timer 5 seconds ahead and start waiting for the event from the other stream
                context.timerService().registerEventTimeTimer(appEvent.f2 + 5000L);
            }

        }

        @Override
        public void processElement2(Tuple4<String, String, String, Long> thirdPartyEvent,
                                    CoProcessFunction<Tuple3<String, String, Long>,
                                    Tuple4<String, String, String, Long>, String>.Context context,
                                    Collector<String> collector) throws Exception {
            if (appEventState.value() != null){
                collector.collect("Reconciled>>>" + appEventState.value() + " || " + thirdPartyEvent);
                // Clear the state
                appEventState.clear();
            } else {
                // Save the event in state
                thirdPartyEventState.update(thirdPartyEvent);
                // Register a timer 5 seconds ahead and start waiting for the event from the other stream
                context.timerService().registerEventTimeTimer(thirdPartyEvent.f3 + 5000L);
            }
        }

        @Override
        public void onTimer(long timestamp, CoProcessFunction<Tuple3<String, String, Long>, Tuple4<String, String, String, Long>, String>.OnTimerContext ctx, Collector<String> out) throws Exception {
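            // The timer fired: a state that is still non-empty means the event from the other stream never arrived
            if (appEventState.value() != null) {
                out.collect("Reconciliation failed>>>" + appEventState.value() + " || no third-party payment event received");
            }
            if (thirdPartyEventState.value() != null) {
                out.collect("Reconciliation failed>>>" + thirdPartyEventState.value() + " || no app event received");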
            }
            appEventState.clear();
            thirdPartyEventState.clear();
        }
    }

}

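Join – Time-Based Merging

connect() together with the co-processing function CoProcessFunction covers most scenarios and can flexibly implement all kinds of custom logic. However, process functions are a low-level interface; they are rather abstract to use and can be hard to follow.

To make time-based merging more convenient, Flink provides the built-in join and coGroup operators.

Window Join

Using a Window Join

The general call pattern of a window join is: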

stream1.join(stream2)
       .where(<KeySelector>)
       .equalTo(<KeySelector>)
       .window(<WindowAssigner>)
       .apply(<JoinFunction>)

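First, call join() to combine the two streams;
then call where() and equalTo() with key selectors to specify the key of each stream;
then call window() to assign windows;
finally, call apply() with a JoinFunction that defines how matched data is processed: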

public interface JoinFunction<IN1, IN2, OUT> extends Function, Serializable {
     OUT join(IN1 first, IN2 second) throws Exception;
}

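How a Window Join Is Processed

[Figure: processing flow of a window join]

When data arrives, it is first grouped by key, and the grouped data is stored in the corresponding window;
when a time window closes, the operator computes all combinations of the matching data from the two streams, i.e. the Cartesian product of the two streams' data for the same key and window, and passes each matched pair to the join() method of the JoinFunction;
join() is called once for every matched pair.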
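The pairing step described above can be sketched in plain Java for the contents of a single window. This is a simplified model and not how Flink implements the operator internally; the class WindowJoinSketch and the helpers elem and joinWindow are hypothetical names introduced for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class WindowJoinSketch {

    // Hypothetical helper: builds a (key, timestamp) element.
    public static Map.Entry<String, Long> elem(String key, long timestamp) {
        return new SimpleEntry<>(key, timestamp);
    }

    // Emits one result per (left, right) pair with equal keys,
    // mirroring one join() call per matched pair.
    public static List<String> joinWindow(List<Map.Entry<String, Long>> left,
                                          List<Map.Entry<String, Long>> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> l : left) {
            for (Map.Entry<String, Long> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    out.add("(" + l.getKey() + "," + l.getValue() + ")>>>("
                            + r.getKey() + "," + r.getValue() + ")");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> left =
                Arrays.asList(elem("a", 1000), elem("b", 1000), elem("a", 2000));
        List<Map.Entry<String, Long>> right =
                Arrays.asList(elem("a", 3000), elem("a", 4000));
        // Two left "a" elements times two right "a" elements give four pairs; "b" has no partner.
        joinWindow(left, right).forEach(System.out::println);
    }
}
```

Note that an element without a partner on the other side produces no output at all, which is the inner-join behavior implied by the Cartesian product.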

Window Join Example Code

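import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowJoinDemo {

    public static void main(String[] args) throws Exception {
        // 1. Set up the environment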
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setParallelism(1)
                .setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);
        // 2. Configure the data sources and watermarks
        DataStream<Tuple2<String, Long>> stream01 = environment
                .fromElements(
                        Tuple2.of("a", 1000L),
                        Tuple2.of("b", 1000L),
                        Tuple2.of("a", 2000L),
                        Tuple2.of("b", 2000L)
                )
                .assignTimestampsAndWatermarks(WatermarkStrategy
                    .<Tuple2<String, Long>>forMonotonousTimestamps()
                    .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                        @Override
                        public long extractTimestamp(Tuple2<String, Long> stringLongTuple2, long l) {
                            return stringLongTuple2.f1;
                        }
                    })
                );
        DataStream<Tuple2<String, Long>> stream02 = environment
                .fromElements(
                        Tuple2.of("a", 3000L),
                        Tuple2.of("b", 3000L),
                        Tuple2.of("a", 4000L),
                        Tuple2.of("b", 4000L)
                )
                .assignTimestampsAndWatermarks(WatermarkStrategy
                    .<Tuple2<String, Long>>forMonotonousTimestamps()
                    .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                        @Override
                        public long extractTimestamp(Tuple2<String, Long> stringLongTuple2, long l) {
                            return stringLongTuple2.f1;
                        }
                    })
                );
        // 3. Window-join the two streams and print the result
        stream01.join(stream02)
                .where(data -> data.f0)
                .equalTo(data -> data.f0)
                .window(TumblingEventTimeWindows.of(Time.seconds(5L)))
                .apply(new JoinFunction<Tuple2<String, Long>, Tuple2<String, Long>, String>() {
                    @Override
                    public String join(Tuple2<String, Long> left, Tuple2<String, Long> right) throws Exception {
                        return left + ">>>" + right;
                    }
                })
                .print();
        // 4. Execute the job
        environment.execute();
    }

}

Output:

(a,1000)>>>(a,3000)
(a,1000)>>>(a,4000)
(a,2000)>>>(a,3000)
(a,2000)>>>(a,4000)
(b,1000)>>>(b,3000)
(b,1000)>>>(b,4000)
(b,2000)>>>(b,3000)
(b,2000)>>>(b,4000)

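Interval Join

An interval join takes each element of one stream, opens a time interval before and after its timestamp, and matches it with the elements of the other stream that fall into that interval.

A window join can be viewed as a many-to-many relationship, whereas an interval join is one-to-many.

How an Interval Join Works

[Figure: interval join between two streams]

Suppose the lower bound is set to -2 and the upper bound to 1. For the lower stream, the element with timestamp 2 can then match elements of the upper stream with timestamps between 0 and 3, which appear in the figure as the two matched pairs 2-0 and 2-1.

The matching condition is therefore: a.timestamp + lowerBound <= b.timestamp <= a.timestamp + upperBound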
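The condition can be checked with a small plain-Java predicate. This is a minimal sketch of the inequality only, not Flink's implementation; the class IntervalJoinSketch and the method matches are hypothetical names.

```java
public class IntervalJoinSketch {

    // True if b falls into the interval opened around a:
    // a + lowerBound <= b <= a + upperBound (both bounds inclusive).
    public static boolean matches(long aTimestamp, long bTimestamp, long lowerBound, long upperBound) {
        return aTimestamp + lowerBound <= bTimestamp && bTimestamp <= aTimestamp + upperBound;
    }

    public static void main(String[] args) {
        // lowerBound = -2, upperBound = 1, element a at timestamp 2: the interval is [0, 3].
        for (long b = 0; b <= 4; b++) {
            System.out.println("b=" + b + " matches a=2: " + matches(2, b, -2, 1));
        }
    }
}
```

With these bounds, b = 0 through b = 3 satisfy the condition and b = 4 does not, which agrees with the interval 0 to 3 described above.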

Using an Interval Join

The general call pattern of an interval join is:

stream1
 .keyBy(<KeySelector>)
 .intervalJoin(stream2.keyBy(<KeySelector>))
 .between(Time.milliseconds(-2), Time.milliseconds(1))
 .process(<ProcessJoinFunction>);

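First, call keyBy() to partition the stream by key;
then call intervalJoin() and pass in the stream to join with, which must be keyed in the same way;
then call between() to set the lower and upper bounds of the time interval;
finally, call process() with a ProcessJoinFunction that implements the processing logic.

Interval Join Example Code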

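import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class IntervalJoinDemo {

    public static void main(String[] args) throws Exception {
        // 1. Set up the environment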
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setParallelism(1)
                .setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);
        // 2. Configure the data sources and watermarks
        DataStream<Tuple2<String, Long>> stream01 = environment
                .fromElements(
                        Tuple2.of("a", 5000L),
                        Tuple2.of("b", 7000L)
                )
                .assignTimestampsAndWatermarks(WatermarkStrategy
                        .<Tuple2<String, Long>>forMonotonousTimestamps()
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple2<String, Long> stringLongTuple2, long l) {
                                return stringLongTuple2.f1;
                            }
                        })
                );
        DataStream<Tuple2<String, Long>> stream02 = environment
                .fromElements(
                        Tuple2.of("a", 1000L),
                        Tuple2.of("b", 2000L),
                        Tuple2.of("a", 6000L),
                        Tuple2.of("b", 8000L)
                )
                .assignTimestampsAndWatermarks(WatermarkStrategy
                        .<Tuple2<String, Long>>forMonotonousTimestamps()
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple2<String, Long> stringLongTuple2, long l) {
                                return stringLongTuple2.f1;
                            }
                        })
                );
        // 3. Interval-join the two keyed streams and print the result
        stream01.keyBy(data -> data.f0)
                .intervalJoin(stream02.keyBy(data -> data.f0))
                .between(Time.seconds(-4), Time.seconds(1))
                .process(new ProcessJoinFunction<Tuple2<String, Long>, Tuple2<String, Long>, String>() {
                    @Override
                    public void processElement(Tuple2<String, Long> left, Tuple2<String, Long> right,
                                               ProcessJoinFunction<Tuple2<String, Long>, Tuple2<String, Long>, String>.Context context,
                                               Collector<String> collector) throws Exception {
                        collector.collect(left + ">>>" + right);
                    }
                })
                .print();
        // 4. Execute the job
        environment.execute();
    }

}

Output:

(a,5000)>>>(a,1000)
(a,5000)>>>(a,6000)
(b,7000)>>>(b,8000)

Window CoGroup

Using a Window CoGroup

The general call pattern of a window coGroup is:

stream1.coGroup(stream2)
 .where(<KeySelector>)
 .equalTo(<KeySelector>)
 .window(TumblingEventTimeWindows.of(Time.hours(1)))
 .apply(<CoGroupFunction>)

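Its basic usage is similar to that of a window join, except that join() is replaced by coGroup() and apply() takes an implementation of the CoGroupFunction interface, which is defined as follows: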

public interface CoGroupFunction<IN1, IN2, O> extends Function, Serializable {
 void coGroup(Iterable<IN1> first, Iterable<IN2> second, Collector<O> out) throws Exception;
}

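A window coGroup differs from a window join in that the data of the two streams that falls into the same window is passed to the final processing function as two separate collections, rather than as the pairs of a Cartesian product.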
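The difference can be sketched in plain Java for the contents of one window: the function is invoked once per key and sees both groups as whole collections. This is a simplified model under assumptions; the class CoGroupSketch and the method coGroupWindow are hypothetical, and the inputs are assumed to be already grouped by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class CoGroupSketch {

    // Produces one result per key, handing over both groups as whole lists.
    // A key present on only one side is still processed, with an empty list for the other side.
    public static List<String> coGroupWindow(Map<String, List<Long>> left,
                                             Map<String, List<Long>> right) {
        Set<String> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            List<Long> first = left.getOrDefault(key, Collections.emptyList());
            List<Long> second = right.getOrDefault(key, Collections.emptyList());
            out.add(key + ": " + first + ">>>" + second);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> left = new HashMap<>();
        left.put("a", Arrays.asList(1000L, 2000L));
        Map<String, List<Long>> right = new HashMap<>();
        right.put("a", Arrays.asList(3000L, 4000L));
        right.put("c", Arrays.asList(5000L));
        coGroupWindow(left, right).forEach(System.out::println);
    }
}
```

Because the function receives the complete groups, including an empty one when a key has no partner, this shape is what makes outer-join style logic possible, which a pairwise join cannot express.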

Window CoGroup Example Code

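import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WindowCoGroupDemo {

    public static void main(String[] args) throws Exception {
        // 1. Set up the environment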
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        environment.setParallelism(1)
                .setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);
        // 2. Configure the data sources and watermarks
        DataStream<Tuple2<String, Long>> stream01 = environment
                .fromElements(
                        Tuple2.of("a", 1000L),
                        Tuple2.of("b", 1000L),
                        Tuple2.of("a", 2000L),
                        Tuple2.of("b", 2000L)
                )
                .assignTimestampsAndWatermarks(WatermarkStrategy
                        .<Tuple2<String, Long>>forMonotonousTimestamps()
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple2<String, Long> stringLongTuple2, long l) {
                                return stringLongTuple2.f1;
                            }
                        })
                );
        DataStream<Tuple2<String, Long>> stream02 = environment
                .fromElements(
                        Tuple2.of("a", 3000L),
                        Tuple2.of("b", 3000L),
                        Tuple2.of("a", 4000L),
                        Tuple2.of("b", 4000L)
                )
                .assignTimestampsAndWatermarks(WatermarkStrategy
                        .<Tuple2<String, Long>>forMonotonousTimestamps()
                        .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Long>>() {
                            @Override
                            public long extractTimestamp(Tuple2<String, Long> stringLongTuple2, long l) {
                                return stringLongTuple2.f1;
                            }
                        })
                );
        // 3. coGroup the two streams within a window and print the result
        stream01.coGroup(stream02)
                .where(data -> data.f0)
                .equalTo(data -> data.f0)
                .window(TumblingEventTimeWindows.of(Time.seconds(5L)))
                .apply(new CoGroupFunction<Tuple2<String, Long>, Tuple2<String, Long>, String>() {
                    @Override
                    public void coGroup(Iterable<Tuple2<String, Long>> iterable, Iterable<Tuple2<String, Long>> iterable1,
                                        Collector<String> collector) throws Exception {
                        collector.collect(iterable + ">>>" + iterable1);
                    }
                })
                .print();
        // 4. Execute the job
        environment.execute();

    }

}

Output:

[(a,1000), (a,2000)]>>>[(a,3000), (a,4000)]
[(b,1000), (b,2000)]>>>[(b,3000), (b,4000)]
