Flink ProcessFunction API

老鼠扛刀满街找猫@

Article link: https://blog.csdn.net/qq_27242695/article/details/118338331


Flink provides 8 Process Functions:

ProcessFunction
KeyedProcessFunction
CoProcessFunction
ProcessJoinFunction
BroadcastProcessFunction
KeyedBroadcastProcessFunction
ProcessWindowFunction
ProcessAllWindowFunction

1 KeyedProcessFunction

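KeyedProcessFunction operates on a KeyedStream. It is invoked for every element of the stream and can emit zero, one, or several elements. All Process Functions implement the RichFunction interface, so they all provide open(), close(), and getRuntimeContext(). KeyedProcessFunction<K, I, O> additionally provides two methods: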

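processElement(I value, Context ctx, Collector<O> out) is called for every element in the stream; results are emitted through the Collector. The Context gives access to the element's timestamp, the element's key, and the TimerService. The Context can also emit records to other streams (side outputs).
onTimer(long timestamp, OnTimerContext ctx, Collector<O> out) is a callback that is invoked when a previously registered timer fires. The timestamp parameter is the time the timer was registered for, and the Collector emits the results. OnTimerContext offers the same context information as the Context of processElement, plus the time domain of the firing timer (event time or processing time).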

2 TimerService and Timers

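The TimerService held by Context and OnTimerContext provides the following methods:

long currentProcessingTime()
Returns the current processing time.
long currentWatermark()
Returns the timestamp of the current watermark.
void registerProcessingTimeTimer(long timestamp)
Registers a processing-time timer for the current key. The timer fires when processing time reaches the given timestamp.
void registerEventTimeTimer(long timestamp)
Registers an event-time timer for the current key. The timer fires, and the callback runs, once the watermark is greater than or equal to the registered timestamp.
void deleteProcessingTimeTimer(long timestamp)
Deletes a previously registered processing-time timer. If no timer exists for this timestamp, the call does nothing.
void deleteEventTimeTimer(long timestamp)
Deletes a previously registered event-time timer. If no timer exists for this timestamp, the call does nothing.

When a timer fires, the onTimer() callback is executed. Note that timers can only be used on keyed streams.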
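The event-time behaviour described above can be pictured with a small plain-Java model. This is only a sketch of the semantics, not Flink's implementation: the class EventTimeTimerModel and its advanceWatermark method are hypothetical names, and the model assumes that timers are unique per key and timestamp and fire in timestamp order once the watermark reaches them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Plain-Java model of per-key event-time timers (hypothetical, not Flink code). */
final class EventTimeTimerModel {
    // timestamp -> keys that have a timer at that timestamp; sets collapse duplicates
    private final TreeMap<Long, TreeSet<String>> timers = new TreeMap<>();
    private long watermark = Long.MIN_VALUE;

    void registerEventTimeTimer(String key, long timestamp) {
        timers.computeIfAbsent(timestamp, t -> new TreeSet<>()).add(key);
    }

    void deleteEventTimeTimer(String key, long timestamp) {
        TreeSet<String> keys = timers.get(timestamp);
        if (keys == null) {
            return; // no timer for this timestamp: nothing to do
        }
        keys.remove(key);
        if (keys.isEmpty()) {
            timers.remove(timestamp);
        }
    }

    /** Advances the watermark and returns "key@timestamp" for every timer that fires. */
    List<String> advanceWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark); // watermarks never move backwards
        List<String> fired = new ArrayList<>();
        // every timer with timestamp <= watermark is due
        NavigableMap<Long, TreeSet<String>> due = timers.headMap(watermark, true);
        for (Map.Entry<Long, TreeSet<String>> entry : due.entrySet()) {
            for (String key : entry.getValue()) {
                fired.add(key + "@" + entry.getKey());
            }
        }
        due.clear(); // each timer fires once, then it is gone
        return fired;
    }

    public static void main(String[] args) {
        EventTimeTimerModel model = new EventTimeTimerModel();
        model.registerEventTimeTimer("a", 5000L);
        model.registerEventTimeTimer("b", 3000L);
        System.out.println(model.advanceWatermark(2999L));
        System.out.println(model.advanceWatermark(5000L));
    }
}
```

Registering the same key and timestamp twice leaves a single timer in this model, and deleting a timer that was never registered is a no-op, which matches the descriptions of the register and delete methods above.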

3 Side Outputs (SideOutput)

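Most DataStream API operators produce a single output, that is, one stream of one data type. The exception is the split operator (deprecated and removed in later Flink releases), which divides one stream into several streams that all share the same data type. The side outputs feature of process functions can produce several streams whose data types may differ. A side output is defined by an OutputTag<X> object, where X is the data type of the output stream. A process function emits an event to one or more side outputs through the Context object.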
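To illustrate the idea of typed tags routing records to differently typed outputs, here is a minimal plain-Java sketch. It is not Flink code: SideOutputModel, Tag, and the thresholds are hypothetical names and values chosen for the example, and Tag<X> only loosely mirrors the role of OutputTag<X>.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of one main output plus typed side outputs (hypothetical). */
final class SideOutputModel {
    /** Typed tag: the type parameter X records the element type of the side output. */
    static final class Tag<X> {
        final String id;
        Tag(String id) { this.id = id; }
    }

    // Two side outputs with different element types
    static final Tag<String> INVALID = new Tag<>("invalid");
    static final Tag<Long> HIGH = new Tag<>("high");

    final List<Integer> mainOutput = new ArrayList<>();
    private final Map<String, List<Object>> sideOutputs = new HashMap<>();

    /** Emits a value to the side output identified by the tag. */
    <X> void output(Tag<X> tag, X value) {
        sideOutputs.computeIfAbsent(tag.id, k -> new ArrayList<>()).add(value);
    }

    /** Returns everything emitted for the tag; the tag's type makes the cast safe here. */
    @SuppressWarnings("unchecked")
    <X> List<X> getSideOutput(Tag<X> tag) {
        return (List<X>) sideOutputs.getOrDefault(tag.id, new ArrayList<>());
    }

    /** Routes one reading: negatives go to INVALID, values above 100 are also sent to HIGH. */
    void processElement(int reading) {
        if (reading < 0) {
            output(INVALID, "invalid reading: " + reading);
            return;
        }
        mainOutput.add(reading);
        if (reading > 100) {
            output(HIGH, Long.valueOf(reading));
        }
    }

    public static void main(String[] args) {
        SideOutputModel model = new SideOutputModel();
        for (int reading : new int[] {5, -1, 150}) {
            model.processElement(reading);
        }
        System.out.println(model.mainOutput);
        System.out.println(model.getSideOutput(INVALID));
        System.out.println(model.getSideOutput(HIGH));
    }
}
```

One element can reach the main output and a side output at the same time, and each side output keeps its own element type, which is the property that distinguishes side outputs from split.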

4 Test

Code

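import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Sensor is the author's POJO (not shown in the article) with key, eventTime and val fields.
// The wrapper class name is illustrative; the original shows only its members.
public class ProcessFunctionTest {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Use event time
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties consumerProperties = new Properties();
        consumerProperties.setProperty("bootstrap.servers", "127.0.0.1:9092");
        consumerProperties.setProperty("group.id", "test");

        FlinkKafkaConsumer<String> kafkaConsumer = new FlinkKafkaConsumer<>(
                "test",
                new SimpleStringSchema(),
                consumerProperties);

        DataStream<String> dataSource = env.addSource(kafkaConsumer);

        SingleOutputStreamOperator<Integer> dataStream = dataSource
                .map(new MyMapFunction()) // parse each line into a Sensor
                // watermark = largest event time seen so far minus 1 s
                .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<Sensor>(Time.seconds(1)) {
                    @Override
                    public long extractTimestamp(Sensor sensor) {
                        return sensor.getEventTime(); // event time of the record
                    }
                })
                .keyBy(Sensor::getKey)
                .process(new MyKeyProcessFunction());

        // Fetch the side output and print it
        dataStream.getSideOutput(MyKeyProcessFunction.tempTag).print();

        env.execute();
    }

    // Parses "key,eventTime,val" lines into Sensor objects
    public static class MyMapFunction implements MapFunction<String, Sensor> {

        @Override
        public Sensor map(String s) throws Exception {
            String[] split = s.split(",");
            Sensor sensor = new Sensor();
            sensor.setKey(split[0]);
            sensor.setEventTime(Long.parseLong(split[1]));
            sensor.setVal(Integer.parseInt(split[2]));
            return sensor;
        }
    }

    public static class MyKeyProcessFunction extends KeyedProcessFunction<String, Sensor, Integer> {

        // Side output tag; the trailing {} creates an anonymous subclass so Flink can infer the element type
        static final OutputTag<String> tempTag = new OutputTag<String>("tempTag") {};

        // Largest value reported so far
        private ValueState<Sensor> maxState;
        // Timestamp of the pending timer
        private ValueState<Long> timeState;

        @Override
        public void open(Configuration parameters) throws Exception {
            super.open(parameters);
            maxState = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("last", Sensor.class));
            timeState = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("time", Long.class));
        }

        @Override
        public void processElement(Sensor sensor, Context context, Collector<Integer> collector) throws Exception {
            Sensor max = maxState.value();
            if (max == null || max.getVal() < sensor.getVal()) {
                // The value is rising: remember it and, if no timer is pending, schedule one 5 s ahead in event time
                maxState.update(sensor);
                if (timeState.value() == null) {
                    long timerTs = sensor.getEventTime() + 5000L;
                    timeState.update(timerTs);
                    context.timerService().registerEventTimeTimer(timerTs);
                }
            } else {
                // The value did not rise: cancel the pending timer and reset the state
                Long timerTs = timeState.value();
                if (timerTs != null) {
                    context.timerService().deleteEventTimeTimer(timerTs);
                }
                maxState.clear();
                timeState.clear();
            }
        }

        // Timer callback: runs only if the timer was not cancelled, i.e. the value kept rising
        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<Integer> out) throws Exception {
            super.onTimer(timestamp, ctx, out);
            System.out.println("Reported value rose continuously for 5 s: " + maxState.value());
            ctx.output(tempTag, maxState.value().toString());
            maxState.clear();
            timeState.clear();
        }
    }
}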
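To make the expected behaviour of MyKeyProcessFunction easier to follow, the sketch below replays its state logic for a single key in plain Java. It is a hand-written trace, not Flink code: RisingValueTrace is a hypothetical class, only the numeric value is kept instead of the whole Sensor, and the watermark is assumed to advance to (largest event time minus 1000 ms) right after every element, whereas a real job emits watermarks periodically.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java trace of the rising-value logic for one key (hypothetical, not Flink code). */
final class RisingValueTrace {
    private Integer max;                        // models maxState
    private Long timer;                         // models timeState and the registered timer
    private long watermark = Long.MIN_VALUE;    // assumed: max event time - 1000 ms
    final List<String> sideOutput = new ArrayList<>();

    /** Feeds one reading, then advances the assumed watermark and fires a due timer. */
    void onElement(long eventTime, int val) {
        // Same branches as processElement in the job
        if (max == null || max < val) {
            max = val;
            if (timer == null) {
                timer = eventTime + 5000L;
            }
        } else {
            max = null;
            timer = null;
        }
        // Assumed watermark progress: one second behind the largest event time
        watermark = Math.max(watermark, eventTime - 1000L);
        if (timer != null && watermark >= timer) {
            // Same effect as onTimer: emit to the side output and reset the state
            sideOutput.add("value rose for 5 s, last value " + max);
            max = null;
            timer = null;
        }
    }

    boolean hasPendingTimer() {
        return timer != null;
    }

    public static void main(String[] args) {
        RisingValueTrace trace = new RisingValueTrace();
        long[][] readings = {{1000, 10}, {3000, 20}, {5000, 30}, {7000, 40}};
        for (long[] r : readings) {
            trace.onElement(r[0], (int) r[1]);
        }
        System.out.println(trace.sideOutput);
    }
}
```

With these four readings the timer is registered for 6000 by the first element and fires when the element at 7000 pushes the assumed watermark to 6000. A reading that does not exceed the stored maximum cancels the timer instead.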

Result

[Screenshots omitted: Kafka input and console output]

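5 Flink Keyed State

Flink's Keyed State supports the following state types:

ValueState<T> stores a single value of type T.

get: ValueState.value()
set: ValueState.update(T value)

ListState<T> stores a list whose elements are of type T.
The basic operations are:

ListState.add(T value)
ListState.addAll(List<T> values)
ListState.get() returns an Iterable<T>
ListState.update(List<T> values)

MapState<UK, UV> stores key-value pairs.

MapState.get(UK key)
MapState.put(UK key, UV value)
MapState.contains(UK key)
MapState.remove(UK key)

ReducingState<T> stores a single value that combines all added values through a ReduceFunction.
AggregatingState<IN, OUT> stores an aggregate built by an AggregateFunction; the input type and the result type may differ.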
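All of these state types are scoped to the current key. A rough plain-Java picture of that scoping, for the reducing and aggregating cases, is given below. It is a sketch under stated assumptions rather than Flink's state backend: KeyedStateModel, addToSum, and addToAvg are hypothetical names, and ordinary HashMaps stand in for the managed state.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of key-scoped reducing and aggregating state (hypothetical). */
final class KeyedStateModel {
    // Models ReducingState<Integer>: one combined value per key
    private final Map<String, Integer> sumByKey = new HashMap<>();
    // Models AggregatingState<Integer, Double>: accumulator {sum, count} per key
    private final Map<String, long[]> accByKey = new HashMap<>();

    /** Adds a value for the key and returns the running sum for that key only. */
    int addToSum(String key, int value) {
        return sumByKey.merge(key, value, Integer::sum);
    }

    /** Adds a value for the key and returns the running average for that key only. */
    double addToAvg(String key, int value) {
        long[] acc = accByKey.computeIfAbsent(key, k -> new long[2]);
        acc[0] += value; // sum
        acc[1] += 1;     // count
        return (double) acc[0] / acc[1];
    }

    public static void main(String[] args) {
        KeyedStateModel state = new KeyedStateModel();
        System.out.println(state.addToSum("s1", 3));
        System.out.println(state.addToSum("s1", 4));
        System.out.println(state.addToSum("s2", 10));
        System.out.println(state.addToAvg("s1", 3));
        System.out.println(state.addToAvg("s1", 4));
    }
}
```

The reducing case keeps input and result in the same type, while the aggregating case keeps an internal accumulator (sum and count) and exposes a different result type (the average), which is the distinction between ReducingState and AggregatingState.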

5.1 Example: ReducingState

        .process(new KeyedProcessFunction<String, WaterSensor, String>() {
            private ReducingState<Integer> state;

            @Override
            public void open(Configuration parameters) throws Exception {
                // Initialize the state
                state = getRuntimeContext()
                        .getReducingState(new ReducingStateDescriptor<Integer>("state",
                                        new ReduceFunction<Integer>() {
                                            @Override
                                            public Integer reduce(Integer value1, Integer value2) throws Exception {
                                                return value1 + value2;
                                            }
                                        },
                                        Integer.class));
            }
            
            @Override
            public void processElement(WaterSensor value, Context ctx, Collector<String> out) throws Exception {
                state.add(value.getVc());
                out.collect(value.getId() + ":" + state.get());
            }
        })
        .print();

5.2 Example: AggregatingState

.process(new KeyedProcessFunction<String, WaterSensor, Double>() {

    private AggregatingState<Integer, Double> avgState;

    @Override
    public void open(Configuration parameters) throws Exception {
        AggregatingStateDescriptor<Integer, Tuple2<Integer, Integer>, Double> aggregatingStateDescriptor = new AggregatingStateDescriptor<>("avgState", new AggregateFunction<Integer, Tuple2<Integer, Integer>, Double>() {
            @Override
            public Tuple2<Integer, Integer> createAccumulator() {
                return Tuple2.of(0, 0);
            }

            @Override
            public Tuple2<Integer, Integer> add(Integer value, Tuple2<Integer, Integer> accumulator) {
                return Tuple2.of(accumulator.f0 + value, accumulator.f1 + 1);
            }

            @Override
            public Double getResult(Tuple2<Integer, Integer> accumulator) {
                return accumulator.f0 * 1D / accumulator.f1;
            }

            @Override
            public Tuple2<Integer, Integer> merge(Tuple2<Integer, Integer> a, Tuple2<Integer, Integer> b) {
                return Tuple2.of(a.f0 + b.f0, a.f1 + b.f1);
            }
        }, Types.TUPLE(Types.INT, Types.INT));
        avgState = getRuntimeContext().getAggregatingState(aggregatingStateDescriptor);
    }

    @Override
    public void processElement(WaterSensor value, Context ctx, Collector<Double> out) throws Exception {
        avgState.add(value.getVc());
        out.collect(avgState.get());
    }
})
