Flink API - ProcessFunction

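Flink provides several layers of API. From highest to lowest level they are SQL, the Table API, the DataStream API, and the stateful process functions described here.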
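The process function (ProcessFunction) is Flink's lowest-level API. It does not define any specific operator; all logic goes through a single, uniform process operation. Inside a process function the user works directly with the most basic building blocks of a data stream: events, state, and time.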
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/operators/process_function/
Flink provides 8 different process functions:

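  1. ProcessFunction: the most basic process function, passed as the argument when process() is called directly on a DataStream.
  2. KeyedProcessFunction: the process function for a stream partitioned by key, passed as the argument when process() is called on a KeyedStream. Timers and keyed state are only available on a KeyedStream.
  3. ProcessWindowFunction: the process function applied after windowing, and the typical full-window function. Passed as the argument when process() is called on a WindowedStream.
  4. ProcessAllWindowFunction: also applied after windowing, passed as the argument when process() is called on an AllWindowedStream.
  5. CoProcessFunction: the process function for two streams joined with connect.
  6. ProcessJoinFunction: the process function for two streams joined with an interval join.
  7. BroadcastProcessFunction: the process function for a broadcast connected stream. Here the BroadcastConnectedStream is the result of connecting (connect) an ordinary DataStream that has not been keyed with a BroadcastStream.
  8. KeyedBroadcastProcessFunction: the process function for a keyed broadcast connected stream. Here the broadcast connected stream is the result of connecting a KeyedStream with a BroadcastStream.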

ProcessFunction

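For a data stream that has not been keyed with keyBy, use ProcessFunction. For each element of the stream it can emit 0, 1, or more elements (similar to RichFlatMapFunction).
● ProcessFunction<IN, OUT>: IN is the input type, OUT is the output type.
● processElement: called once for every incoming record.
● Invoke it with .process(new ProcessFunction<IN, OUT>() {...}).
A ProcessFunction on a non-keyed stream can still reference timers and keyed state in its code, and the code compiles. The job fails at run time, however: accessing keyed state raises "Keyed state can only be used on a 'keyed stream', i.e., after a 'keyBy()' operation", and registering a timer is likewise only supported on a keyed stream.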

        DataStreamSource<Integer> sourceStream = env
                .addSource(new SourceFunction<Integer>() {
                    private boolean running = true;
                    private Random random = new Random();

                    @Override
                    public void run(SourceContext<Integer> ctx) throws Exception {
                        while (running) {
                            ctx.collect(random.nextInt(100));
                            Thread.sleep(1000);
                        }
                    }

                    @Override
                    public void cancel() {
                        running = false;
                    }
                });
        
        sourceStream
                .process(new ProcessFunction<Integer, Integer>() {
                    @Override
                    public void processElement(Integer value, Context ctx, Collector<Integer> out) throws Exception {
                        if (value % 10 == 1){
                            out.collect(value);
                            out.collect(value);
                        }
                    }
                })
                .print();

# Exception demo
public class ProcessFunctions {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);

        // A ProcessFunction on a non-keyed stream cannot use keyed state or timers: this job compiles but fails at run time

        // Compute a running average (map-reduce style)

        DataStreamSource<Integer> sourceStream = env
                .addSource(new SourceFunction<Integer>() {
                    private boolean running = true;
                    private Random random = new Random();

                    @Override
                    public void run(SourceContext<Integer> ctx) throws Exception {
                        while (running) {
                            ctx.collect(random.nextInt(100));
                            Thread.sleep(1000);
                        }
                    }

                    @Override
                    public void cancel() {
                        running = false;
                    }
                });
        
        // Keyed state and onTimer are not available on a non-keyed stream: the job fails at run time
        sourceStream
                .process(new ProcessFunction<Integer, String>() {

                    private ValueState<Tuple2<Integer, Integer>> avgState;

                    @Override
                    public void open(Configuration parameters) throws Exception {
                        super.open(parameters);
                        avgState = getRuntimeContext().getState(new ValueStateDescriptor<Tuple2<Integer, Integer>>("avg-state", Types.TUPLE(Types.INT, Types.INT)));
                    }

                    @Override
                    public void processElement(Integer value, Context ctx, Collector<String> out) throws Exception {

                        if (avgState.value() == null) {
                            avgState.update(Tuple2.of(value, 1));
                        } else {
                            avgState.update(Tuple2.of(avgState.value().f0 + value, avgState.value().f1 + 1));
                        }

                        ctx.timerService().registerProcessingTimeTimer(ctx.timerService().currentProcessingTime() + 10L);

                    }

                    @Override
                    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
                        super.onTimer(timestamp, ctx, out);

                        ctx.timerService().registerProcessingTimeTimer(timestamp + 10L);

                        // Average = sum / count (f0 holds the sum, f1 holds the count)
                        out.collect("avg = " + ((double) avgState.value().f0 / avgState.value().f1));

                    }
                })
                .print("avg:");


        env.execute();

    }

}
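
The demo above fails because of the non-keyed state access, not because of the averaging itself. As a minimal sketch of that accumulator logic with Flink removed, the plain-Java class below keeps a (sum, count) pair and divides sum by count. The class name RunningAverage and its methods are hypothetical and exist only for illustration; in a real job the pair would live in a ValueState on a KeyedStream.

```java
/** Plain-Java stand-in for the (sum, count) accumulator kept in avgState. */
class RunningAverage {
    private long sum = 0L;   // plays the role of Tuple2.f0
    private long count = 0L; // plays the role of Tuple2.f1

    /** Mirrors processElement: fold one value into the accumulator. */
    void add(int value) {
        sum += value;
        count++;
    }

    /** Mirrors onTimer: average = sum / count, or NaN before any element arrived. */
    double average() {
        return count == 0 ? Double.NaN : (double) sum / count;
    }

    public static void main(String[] args) {
        RunningAverage avg = new RunningAverage();
        for (int v : new int[] {10, 20, 60}) {
            avg.add(v);
        }
        System.out.println("avg = " + avg.average()); // prints "avg = 30.0"
    }
}
```

Returning NaN for an empty accumulator is a design choice of this sketch; it avoids the null access that would occur if a timer fired before any element had been seen.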

KeyedProcessFunction

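For a keyed stream (KeyedStream) produced by keyBy, use KeyedProcessFunction.
● KeyedProcessFunction<KEY, IN, OUT>: KEY is the key type, IN is the input type, OUT is the output type.
● processElement: called once for every incoming record.
● onTimer: the timer callback, called when time reaches the timestamp a timer was registered for.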


public class NumValueDataContinuousRiseKeyedProcessFunction {

    public static void main(String[] args) throws Exception {

        // Raise an alert when the integer value keeps rising for 1 s

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);

        DataStreamSource<Integer> randomNumSourceStream = env
                .addSource(new SourceFunction<Integer>() {

                    private boolean running = true;
                    private Random random = new Random();

                    @Override
                    public void run(SourceContext<Integer> ctx) throws Exception {
                        while (running) {
                            ctx.collect(random.nextInt(100000));

                            try {
                                TimeUnit.MILLISECONDS.sleep(200);
                            } catch (InterruptedException e) {
                                e.printStackTrace();
                            }
                        }
                    }

                    @Override
                    public void cancel() {
                        // Only flip the flag; nulling `random` here could cause a NullPointerException in run()
                        running = false;
                    }
                });

        randomNumSourceStream
                // Send every record to the same key, so all data lands in one partition
                .keyBy(r -> 1)
                // Type parameters: KEY, INPUT, OUTPUT
                .process(new KeyedProcessFunction<Integer, Integer, String>() {

                    // Holds the previous value
                    private ValueState<Integer> lastValueState;

                    // Holds the timestamp of the currently registered timer
                    private ValueState<Long> timerTs;

                    private final Long oneS = 1000L;

                    @Override
                    public void open(Configuration parameters) throws Exception {
                        lastValueState = getRuntimeContext().getState(new ValueStateDescriptor<Integer>("lastValueState", Types.INT));
                        timerTs = getRuntimeContext().getState(new ValueStateDescriptor<Long>("timerTs", Types.LONG));
                    }

                    @Override
                    public void processElement(Integer value, Context ctx, Collector<String> out) throws Exception {

                        // =====================
                        Integer lastVal = lastValueState.value() == null ? Integer.MIN_VALUE : lastValueState.value();

                        // Update the stored previous value
                        lastValueState.update(value);

                        Long ts = null;
                        if (timerTs.value() != null){
                            ts = timerTs.value();
                        }

                        if (lastVal >= value){
                            if (ts != null){
                                System.out.println("Current value is not greater than the previous one, curr = " + value + ", last = " + lastVal + ", deleting timer = " + ts);
                                ctx.timerService().deleteProcessingTimeTimer(ts);
                                timerTs.clear();
                            }
                        }else {
                            System.out.println("Current value is greater than the previous one -->>>, curr = " + value + ", last = " + lastVal);
                            if (ts == null){
                                long timer = ctx.timerService().currentProcessingTime() + oneS;
                                System.out.println("Current value is greater than the previous one and no timer is pending, curr = " + value + ", last = " + lastVal + ", registering timer = " + timer);
                                ctx.timerService().registerProcessingTimeTimer(timer);
                                timerTs.update(timer);
                            }

                        }

                    }

                    @Override
                    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
                        out.collect("The integer value has risen continuously for one second");
                        timerTs.clear();
                    }
                })
                .print();


        env.execute();


    }

}
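
The timer bookkeeping in the example above can be hard to follow among the Flink callbacks. The following is a minimal plain-Java simulation of the same idea, with the clock passed in explicitly instead of coming from the timer service. The class RiseDetector and the method names onElement and advanceTo are hypothetical stand-ins for processElement and onTimer, not Flink APIs.

```java
/** Plain-Java simulation of the rise-detection logic, with time passed in explicitly. */
class RiseDetector {
    private static final long ONE_SECOND = 1000L;

    private Integer lastValue; // stands in for lastValueState
    private Long timerTs;      // stands in for timerTs; null means no pending timer

    /** Mirrors processElement: start a timer on a rise, cancel it on a drop or a tie. */
    void onElement(int value, long now) {
        int last = (lastValue == null) ? Integer.MIN_VALUE : lastValue;
        lastValue = value;
        if (last >= value) {
            timerTs = null;             // rise interrupted: drop the pending timer
        } else if (timerTs == null) {
            timerTs = now + ONE_SECOND; // first rise: register a timer one second ahead
        }
    }

    /** Mirrors onTimer: returns true if a pending timer is due at time `now`. */
    boolean advanceTo(long now) {
        if (timerTs != null && timerTs <= now) {
            timerTs = null;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RiseDetector d = new RiseDetector();
        d.onElement(1, 0L);    // first element counts as a rise, timer set for t = 1000
        d.onElement(2, 200L);  // still rising, timer unchanged
        System.out.println(d.advanceTo(1000L)); // prints "true"
        d.onElement(5, 1200L); // new rise, timer set for t = 2200
        d.onElement(4, 1400L); // drop, timer cancelled
        System.out.println(d.advanceTo(5000L)); // prints "false"
    }
}
```

As in the Flink version, only one timer is pending at a time: it is set on the first rise and survives only if no drop arrives before it is due.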

ProcessWindowFunction

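For a WindowedStream (a keyed stream after window()), use ProcessWindowFunction.
● ProcessWindowFunction<IN, OUT, KEY, W extends Window>: IN is the input type, OUT is the output type, KEY is the key type, W is the window type.
● process(KEY key, Context context, Iterable<IN> elements, Collector<OUT> out): called once when the window fires, with all elements collected for that key and window.
● clear(Context context): called when the window is purged, to clean up any per-window state.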


public class WinElemsCountProcessWindowFunction {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);

        SingleOutputStreamOperator<Tuple2<String, Integer>> withWatermarkStream = env
                .fromCollection(
                        Arrays.asList(
                                Tuple2.of("Alick", 1),
                                Tuple2.of("Alick", 2),
                                Tuple2.of("Alick", 3),
                                Tuple2.of("BOb", 3),
                                Tuple2.of("BOb", 5),
                                Tuple2.of("BOb", 7),
                                Tuple2.of("BOb", 10),
                                Tuple2.of("Alick", 7)
                        )
                )
                .assignTimestampsAndWatermarks(
                        new WatermarkStrategy<Tuple2<String, Integer>>() {
                            @Override
                            public WatermarkGenerator<Tuple2<String, Integer>> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
                                return new WatermarkGenerator<Tuple2<String, Integer>>() {

                                    private Long delay = 0L;
                                    private Long watermark = -Long.MAX_VALUE + delay + 1L;

                                    @Override
                                    public void onEvent(Tuple2<String, Integer> event, long eventTimestamp, WatermarkOutput output) {
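                                        // Track the largest event timestamp in milliseconds, the same unit used by extractTimestamp
                                        watermark = Math.max(eventTimestamp, watermark);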
                                    }

                                    @Override
                                    public void onPeriodicEmit(WatermarkOutput output) {
                                        output.emitWatermark(new Watermark(watermark - delay - 1L));
                                    }
                                };
                            }

                            @Override
                            public TimestampAssigner<Tuple2<String, Integer>> createTimestampAssigner(TimestampAssignerSupplier.Context context) {
                                return new TimestampAssigner<Tuple2<String, Integer>>() {
                                    @Override
                                    public long extractTimestamp(Tuple2<String, Integer> element, long recordTimestamp) {
                                        return element.f1 * 1000L;
                                    }
                                };
                            }
                        }
                );
//                .assignTimestampsAndWatermarks(
//                        WatermarkStrategy
//                                .<Tuple2<String, Integer>>forBoundedOutOfOrderness(Duration.ofSeconds(0L))
//                                .withTimestampAssigner(new SerializableTimestampAssigner<Tuple2<String, Integer>>() {
//                                    @Override
//                                    public long extractTimestamp(Tuple2<String, Integer> element, long recordTimestamp) {
//                                        return element.f1 * 1000L;
//                                    }
//                                })
//                );


        KeyedStream<Tuple2<String, Integer>, String> keyedStream = withWatermarkStream
                .keyBy(r -> r.f0);

        WindowedStream<Tuple2<String, Integer>, String, TimeWindow> winStream = keyedStream
                .window(TumblingEventTimeWindows.of(Time.seconds(5)));

        SingleOutputStreamOperator<String> countWinElemsStream = winStream
                .process(
                        new ProcessWindowFunction<Tuple2<String, Integer>, String, String, TimeWindow>() {
                            @Override
                            public void process(String key, Context context, Iterable<Tuple2<String, Integer>> elements, Collector<String> out) throws Exception {

                                Timestamp winStart = new Timestamp(context.window().getStart());
                                Timestamp winEnd = new Timestamp(context.window().getEnd());

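                                // Count by iterating: spliterator().getExactSizeIfKnown() returns -1 when the size is unknown
                                long count = 0L;
                                for (Tuple2<String, Integer> ignored : elements) {
                                    count++;
                                }

                                out.collect("key = " + key + " , win [ " + winStart + " - " + winEnd + " ) contains " + count + " elements");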

                            }
                        }
                );

        countWinElemsStream.print();


        env.execute();


    }

}
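
To make it easier to predict what the example above prints, the sketch below reproduces the window assignment in plain Java for the same sample data. It assumes non-negative timestamps and no window offset, in which case the start of a tumbling window is the timestamp rounded down to a multiple of the window size. The class TumblingWindowCount and its methods are hypothetical helpers, not part of Flink.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java sketch of per-key counting in 5-second tumbling windows. */
class TumblingWindowCount {

    /** Start of the tumbling window that contains a non-negative timestamp (no offset). */
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    /** Counts elements per "key@windowStart", given parallel arrays of keys and seconds. */
    static Map<String, Integer> count(String[] keys, int[] seconds, long sizeMs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            long start = windowStart(seconds[i] * 1000L, sizeMs);
            counts.merge(keys[i] + "@" + start, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] keys = {"Alick", "Alick", "Alick", "BOb", "BOb", "BOb", "BOb", "Alick"};
        int[] seconds = {1, 2, 3, 3, 5, 7, 10, 7};
        // Alick has 3 elements in [0, 5000) and 1 in [5000, 10000);
        // BOb has 1 in [0, 5000), 2 in [5000, 10000) and 1 in [10000, 15000)
        count(keys, seconds, 5000L).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```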

ProcessAllWindowFunction

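For an AllWindowedStream (a non-keyed stream after windowAll()), use ProcessAllWindowFunction.
● ProcessAllWindowFunction<IN, OUT, W extends Window>: IN is the input type, OUT is the output type, W is the window type.
● process(Context context, Iterable elements, Collector out): runs when the window fires.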


public class WinElemsCountProcessAllWindowFunction {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.setParallelism(1);


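        // Event and ClickSource are user-defined classes not shown in this article;
        // Event is assumed to expose a public long `timestamp` field
        SingleOutputStreamOperator<Event> withWatermarkStream = env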
                .addSource(new ClickSource())
                .assignTimestampsAndWatermarks(
                        new WatermarkStrategy<Event>() {
                            @Override
                            public WatermarkGenerator<Event> createWatermarkGenerator(WatermarkGeneratorSupplier.Context context) {
                                return new WatermarkGenerator<Event>() {

                                    private long delay = 0L;

                                    private long maxWatermark = -Long.MAX_VALUE + delay + 1L;

                                    @Override
                                    public void onEvent(Event event, long eventTimestamp, WatermarkOutput output) {
                                        maxWatermark = Math.max(event.timestamp, maxWatermark);
                                    }

                                    @Override
                                    public void onPeriodicEmit(WatermarkOutput output) {
                                        output.emitWatermark(new Watermark(maxWatermark - delay - 1L));
                                    }
                                };

                            }

                            @Override
                            public TimestampAssigner<Event> createTimestampAssigner(TimestampAssignerSupplier.Context context) {
                                return new TimestampAssigner<Event>() {
                                    @Override
                                    public long extractTimestamp(Event element, long recordTimestamp) {
                                        return element.timestamp;
                                    }
                                };
                            }
                        }
                );

        AllWindowedStream<Event, GlobalWindow> allWindowedStream = withWatermarkStream
                .windowAll(GlobalWindows.create());

        SingleOutputStreamOperator<String> countAlWinElemsStream = allWindowedStream
                .trigger(
                        // Fire the window computation every 5 s
                        new Trigger<Event, GlobalWindow>() {

                            // Note: despite the name, this interval is 5 seconds
                            public final Long tenS = 5_000L;

                            @Override
                            public TriggerResult onElement(Event element, long timestamp, GlobalWindow window, TriggerContext ctx) throws Exception {

                                ValueState<Long> tenSTriggerState = ctx.getPartitionedState(new ValueStateDescriptor<Long>("tenSTriggerState", Types.LONG));

                                if (tenSTriggerState.value() == null) {
                                    long timer = ctx.getCurrentWatermark() + tenS;
                                    ctx.registerEventTimeTimer(timer);
                                    tenSTriggerState.update(timer);
                                }

                                return TriggerResult.CONTINUE;
                            }

                            @Override
                            public TriggerResult onProcessingTime(long time, GlobalWindow window, TriggerContext ctx) throws Exception {
                                return TriggerResult.CONTINUE;
                            }

                            @Override
                            public TriggerResult onEventTime(long time, GlobalWindow window, TriggerContext ctx) throws Exception {
                                ValueState<Long> tenSTriggerState = ctx.getPartitionedState(new ValueStateDescriptor<Long>("tenSTriggerState", Types.LONG));

                                tenSTriggerState.clear();

                                return TriggerResult.FIRE;
                            }

                            @Override
                            public void clear(GlobalWindow window, TriggerContext ctx) throws Exception {
                                System.out.println("=============================");
                            }
                        }
                )
                .process(
                        new ProcessAllWindowFunction<Event, String, GlobalWindow>() {
                            @Override
                            public void process(Context context, Iterable<Event> elements, Collector<String> out) throws Exception {

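                                // Count by iterating: getExactSizeIfKnown() returns -1 when the size is unknown
                                long count = 0L;
                                for (Event ignored : elements) {
                                    count++;
                                }
                                out.collect("The window contains " + count + " elements");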

                            }
                        }
                );

        countAlWinElemsStream.print();


        env.execute();


    }

}
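
Both window examples use a hand-written periodic watermark generator: it remembers the largest event timestamp seen so far and emits that value minus the allowed delay minus 1 ms. The plain-Java sketch below shows just that arithmetic. BoundedWatermark is a hypothetical class for illustration, and the 2-second delay in main is an assumed value (the examples above use a delay of 0).

```java
/** Plain-Java sketch of a bounded-out-of-orderness watermark calculation. */
class BoundedWatermark {
    private final long delayMs;
    private long maxTimestamp;

    BoundedWatermark(long delayMs) {
        this.delayMs = delayMs;
        // Start low enough that current() cannot underflow before the first event
        this.maxTimestamp = Long.MIN_VALUE + delayMs + 1L;
    }

    /** Mirrors onEvent: remember the largest event timestamp seen so far. */
    void onEvent(long eventTimestampMs) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestampMs);
    }

    /** Mirrors onPeriodicEmit: the watermark trails the maximum by delay + 1 ms. */
    long current() {
        return maxTimestamp - delayMs - 1L;
    }

    public static void main(String[] args) {
        BoundedWatermark w = new BoundedWatermark(2000L);
        w.onEvent(5000L);
        w.onEvent(3000L); // a late event does not move the watermark backwards
        w.onEvent(7000L);
        System.out.println(w.current()); // prints "4999"
    }
}
```

A watermark of 4999 means that all events with timestamps up to 4999 ms are considered to have arrived, which is the point at which an event-time window covering [0, 5000) can fire.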

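The CoProcessFunction, ProcessJoinFunction, BroadcastProcessFunction, and KeyedBroadcastProcessFunction APIs are covered in detail in the article on multi-stream operations.
The internals of ProcessFunction are covered in detail in the article on time and windows.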

References
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/operators/process_function/
