Flink Getting Started Series 03 - Multi-Stream Operation APIs

Side Output

Requirement: split a stream of behavior events. Route A events to one side stream, B events to another side stream, and keep all other events in the main stream.
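The snippet below assumes an EventLog POJO carrying an eventId field and an already-built streamSource. As a minimal, hypothetical sketch of that setup (names and sample events are illustrative, not from the original):

// Hypothetical setup assumed by the side-output example below:
// an EventLog POJO with an eventId, plus a small bounded demo source.
public class EventLog {
    private String eventId;
    public EventLog() { }
    public EventLog(String eventId) { this.eventId = eventId; }
    public String getEventId() { return eventId; }
    public void setEventId(String eventId) { this.eventId = eventId; }
}

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<EventLog> streamSource = env.fromElements(
        new EventLog("appLaunch"), new EventLog("putBack"), new EventLog("addCart"));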

// define the OutputTags once, so the same instances can be reused when reading the side outputs later
OutputTag<EventLog> launchTag = new OutputTag<>("launch", TypeInformation.of(EventLog.class));
OutputTag<String> backTag = new OutputTag<>("back", TypeInformation.of(String.class));

SingleOutputStreamOperator<EventLog> processed = streamSource.process(new ProcessFunction<EventLog, EventLog>() {
    /**
     * @param eventLog the input record
     * @param ctx the context, which provides the side-output capability
     * @param out the collector for the main stream
     * @throws Exception
     */
    @Override
    public void processElement(EventLog eventLog, ProcessFunction<EventLog, EventLog>.Context ctx, Collector<EventLog> out) throws Exception {
        String eventId = eventLog.getEventId();
        if ("appLaunch".equals(eventId)) {
            // route appLaunch events to the "launch" side output
            ctx.output(launchTag, eventLog);
        } else if ("putBack".equals(eventId)) {
            // route putBack events to the "back" side output, serialized as JSON
            ctx.output(backTag, JSON.toJSONString(eventLog));
        } else {
            // all other events stay in the main stream
            out.collect(eventLog);
        }
    }
});

// get the "launch" side-output stream
DataStream<EventLog> launchStream = processed.getSideOutput(launchTag);

// get the "back" side-output stream
DataStream<String> backStream = processed.getSideOutput(backTag);

launchStream.print("launch");

backStream.print("back");

processed.print("main");

Two-stream connect
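connect pairs two streams (which may even have different element types) so that a single function can process both, sharing state between them. The example below assumes two String streams; a hypothetical sketch of such sources, reusing the env from the earlier setup:

// Hypothetical sources assumed by the connect example (illustrative only):
// stream1 carries numeric strings, stream2 carries letters.
DataStream<String> stream1 = env.fromElements("1", "2", "3");
DataStream<String> stream2 = env.fromElements("a", "b", "c");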

ConnectedStreams<String, String> connectedStreams = stream1.connect(stream2);

SingleOutputStreamOperator<String> resultStream = connectedStreams.map(new CoMapFunction<String, String, String>() {
    // data shared by both map methods
    String prefix = "prefix_";

    // logic for the left (first) stream
    @Override
    public String map1(String value) throws Exception {
        // multiply the number by 10 and return it as a string
        return prefix + (Integer.parseInt(value) * 10);
    }

    // logic for the right (second) stream
    @Override
    public String map2(String value) throws Exception {
        return prefix + value.toUpperCase();
    }
});
resultStream.print();

Two-stream union

All streams participating in a union must have the same data type.

DataStream<String> unioned = stream1.union(stream2);
unioned.map(s -> "prefix_" + s).print();
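Since union is variadic, more than two same-typed streams can be merged in a single call; stream3 below is a hypothetical third String stream:

// union accepts varargs, so several same-typed streams can be merged at once
DataStream<String> all = stream1.union(stream2, stream3);   // stream3 is hypothetical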

Two-stream coGroup

coGroup performs a windowed join between two streams and can express inner, left, right, and full outer join semantics.

DataStream<String> resultStream = s1.coGroup(s2)
        .where(tp -> tp.f0)  // key selector for the left stream: field f0
        .equalTo(tp -> tp.f0)   // key selector for the right stream: field f0
        .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))  // assign windows
        .apply(new CoGroupFunction<Tuple2<String, String>, Tuple3<String, String, String>, String>() {
            /**
             * @param first  the records from the first stream in this co-group
             * @param second the records from the second stream in this co-group
             * @param out the collector for the results
             * @throws Exception
             */
            @Override
            public void coGroup(Iterable<Tuple2<String, String>> first, Iterable<Tuple3<String, String, String>> second, Collector<String> out) throws Exception {
                // implement a left outer join here
                for (Tuple2<String, String> t1 : first) {
                    boolean flag = false;
                    for (Tuple3<String, String, String> t2 : second) {
                        // concatenate the fields of both sides and emit
                        out.collect(t1.f0 + "," + t1.f1 + "," + t2.f0 + "," + t2.f1 + "," + t2.f2);
                        flag = true;
                    }
                    if (!flag) {
                        // reaching here means the right side had no matching records in this window, so emit the left record with null right fields
                        out.collect(t1.f0 + "," + t1.f1 + "," + null + "," + null + "," + null);
                    }
                }
                // TODO implement right outer join (see the sketch after this block)
                // TODO implement full outer join
                // TODO implement inner join
            }
        });
resultStream.print();
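As a sketch of the first TODO above: a right outer join simply inverts the loop nesting, iterating the right-side records and probing the left side. This is an illustrative completion under the same types, not from the original:

// Right outer join sketch for the same coGroup() body: iterate the right
// stream; when no left record matches in this window, emit null left fields.
for (Tuple3<String, String, String> t2 : second) {
    boolean matched = false;
    for (Tuple2<String, String> t1 : first) {
        out.collect(t1.f0 + "," + t1.f1 + "," + t2.f0 + "," + t2.f1 + "," + t2.f2);
        matched = true;
    }
    if (!matched) {
        out.collect(null + "," + null + "," + t2.f0 + "," + t2.f1 + "," + t2.f2);
    }
}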

Two-stream join

join only emits records that match on both sides, i.e., an inner join; for the other join types, use coGroup.

DataStream<String> joinedStream = s1.join(s2)
    .where(tp2 -> tp2.f0)
    .equalTo(tp3 -> tp3.f0)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(20)))
    .apply(new JoinFunction<Tuple2<String, String>, Tuple3<String, String, String>, String>() {
        @Override
        public String join(Tuple2<String, String> t1, Tuple3<String, String, String> t2) throws Exception {
            return t1.f0 + "," + t1.f1 + "," + t2.f0 + "," + t2.f1 + "," + t2.f2;
        }
    });

joinedStream.print();

Broadcast stream

Scenario: joining a fact-table stream with a dimension-table stream. In this case, the dimension-table stream is typically converted into a broadcast stream.
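The code below assumes s1 is the fact stream of (userId, event) pairs and s2 is the dimension stream of (userId, name, gender) triples; a hypothetical sketch of both sources:

// Hypothetical streams assumed by the broadcast example (illustrative only):
// s1 is the fact (main) stream, s2 is the dimension (dictionary) stream.
DataStream<Tuple2<String, String>> s1 = env.fromElements(
        Tuple2.of("u01", "click"), Tuple2.of("u02", "scroll"));
DataStream<Tuple3<String, String, String>> s2 = env.fromElements(
        Tuple3.of("u01", "Alice", "F"), Tuple3.of("u02", "Bob", "M"));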

// turn s2, the stream carrying the dictionary (dimension) data, into a broadcast stream
MapStateDescriptor<String, Tuple2<String, String>> userInfoStateDesc = new MapStateDescriptor<>("userInfoStateDesc", TypeInformation.of(String.class), TypeInformation.of(new TypeHint<Tuple2<String, String>>() {}));
BroadcastStream<Tuple3<String, String, String>> s2BroadcastStream = s2.broadcast(userInfoStateDesc);

// whichever stream needs the broadcast state data must connect to this broadcast stream
BroadcastConnectedStream<Tuple2<String, String>, Tuple3<String, String, String>> connected = s1.connect(s2BroadcastStream);

SingleOutputStreamOperator<String> resultStream = connected.process(new BroadcastProcessFunction<Tuple2<String, String>, Tuple3<String, String, String>, String>() {
    /**
     * Processes the records of the main stream (invoked once per record).
     * @param element a record from the left (main) stream
     * @param ctx the context
     * @param out the collector
     * @throws Exception
     */
    @Override
    public void processElement(Tuple2<String, String> element, BroadcastProcessFunction<Tuple2<String, String>, Tuple3<String, String, String>, String>.ReadOnlyContext ctx, Collector<String> out) throws Exception {
        // the broadcast state obtained through the ReadOnlyContext is a read-only view
        ReadOnlyBroadcastState<String, Tuple2<String, String>> broadcastState = ctx.getBroadcastState(userInfoStateDesc);
        if (broadcastState != null) {
            Tuple2<String, String> userInfo = broadcastState.get(element.f0);
            out.collect(element.f0 + "," + element.f1 + "," + (userInfo == null ? null : userInfo.f0) + "," + (userInfo == null ? null : userInfo.f1));
        } else {
            out.collect(element.f0 + "," + element.f1 + "," + null + "," + null);
        }
    }

    /**
     * Processes the records of the broadcast stream.
     * @param element a record from the broadcast stream
     * @param ctx the context
     * @param out the collector
     * @throws Exception
     */
    @Override
    public void processBroadcastElement(Tuple3<String, String, String> element, BroadcastProcessFunction<Tuple2<String, String>, Tuple3<String, String, String>, String>.Context ctx, Collector<String> out) throws Exception {
        // obtain the broadcast state from the context (readable and writable here)
        BroadcastState<String, Tuple2<String, String>> broadcastState = ctx.getBroadcastState(userInfoStateDesc);
        // split this broadcast record and put it into the broadcast state
        broadcastState.put(element.f0, Tuple2.of(element.f1, element.f2));
    }
});
resultStream.print();
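Each of the examples above still needs the job to be launched; a minimal, assumed closing line for any of these demos (the job name string is illustrative):

// launch the job; nothing runs until execute() is called
env.execute("multi-stream demo");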