Flink Window Mechanism

Windows are at the heart of processing unbounded streams: a window splits an infinite stream into finite-size "buckets" over which we can run computations. This article focuses on how windowing works in Flink and how programmers can get the most out of what windows provide.
  The general structure of a windowed Flink program is as follows: the first form applies to keyed streams, the second to non-keyed streams. The only difference is that a keyed stream calls keyBy(...) followed by window(...), while a non-keyed stream replaces window(...) with windowAll(...). This distinction carries through the rest of the article.

Demo 1

Using countWindow to sum the data every two records:
The sample POJO is the same SensorReading class used in the previous article.

public class Window1 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//        String filePath="D:\\Project\\FlinkStu\\resources\\sensor.txt";
//        DataStreamSource<String> inputStream = env.readTextFile(filePath);
        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"192.168.146.222:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG,"sensor_group1");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"latest");
        DataStreamSource<String> inputStream = env.addSource(new FlinkKafkaConsumer011<String>("sensor", new SimpleStringSchema(), prop));

        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
        });

        SingleOutputStreamOperator<SensorReading> resultMaxStream
                = mapStream.keyBy("id")
//                .timeWindow(Time.seconds(15));
//                .timeWindow(Time.seconds(15),Time.seconds(15));
//                .countWindow(6);
//                .countWindow(6,2);
//                .window(EventTimeSessionWindows.withGap(Time.seconds(15)));
//                .window(TumblingEventTimeWindows.of(Time.seconds(5)));   // tumbling window based on event time
//                .window(TumblingProcessingTimeWindows.of(Time.seconds(5)));   // tumbling window based on processing time
//                .timeWindow(Time.seconds(15)).max("temperature");
//                .max("temperature");
                .countWindow(6,2)
                .reduce(new ReduceFunction<SensorReading>() {
                    @Override
                    public SensorReading reduce(SensorReading sensorReading, SensorReading t1) throws Exception {
                        return new SensorReading(sensorReading.getId(),
                                sensorReading.getTimestamp(),
                                sensorReading.getTemperature()+t1.getTemperature()
                                );
                    }
                });

        resultMaxStream.print("max");
        env.execute("flinkwindow");
    }
}
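Here countWindow(6, 2) defines a sliding count window: per key it fires every 2 records and reduces over at most the last 6. The firing pattern can be traced in plain Java, with no Flink dependency; the class and method names below are ours, purely for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingCountSketch {
    // Mimics countWindow(size, slide) with a summing ReduceFunction:
    // every `slide` elements, emit the sum of the last `size` (or fewer)
    // elements seen so far for this key.
    public static List<Double> slidingSums(double[] temps, int size, int slide) {
        List<Double> out = new ArrayList<>();
        for (int i = 1; i <= temps.length; i++) {
            if (i % slide == 0) {                      // window fires every `slide` records
                double sum = 0;
                for (int j = Math.max(0, i - size); j < i; j++) {
                    sum += temps[j];
                }
                out.add(sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 4 readings, size 6, slide 2: fires after the 2nd and 4th readings
        System.out.println(slidingSums(new double[]{1, 2, 3, 4}, 6, 2)); // [3.0, 10.0]
    }
}
```

With four readings the window fires after the 2nd and 4th records, which is why the demo prints partial sums before six records have ever arrived.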

The run output is shown in the screenshot below:
[screenshot of run output]

Demo 2

Implementing an average-value computation:

public class Window2 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"192.168.146.222:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG,"group_id_2");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer");

        DataStreamSource<String> inputStream = env.addSource(new FlinkKafkaConsumer011<String>("sensor", new SimpleStringSchema(), prop));
        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
        });

        SingleOutputStreamOperator<Double> resultAvgStream = mapStream.keyBy("id")
                .countWindow(6, 2).aggregate(new AvgFunction());
        resultAvgStream.print("avg");

        env.execute("avgwindow");

    }
    private static class AvgFunction implements AggregateFunction<SensorReading, Tuple2<Double,Integer>,Double> {
		// initialize the accumulator
        @Override
        public Tuple2<Double, Integer> createAccumulator() {
            return new Tuple2<>(0.0,0);
        }
		// accumulate the running sum and count of incoming records
        @Override
        public Tuple2<Double, Integer> add(SensorReading sensorReading, Tuple2<Double, Integer> doubleIntegerTuple2) {
            double temp = sensorReading.getTemperature() + doubleIntegerTuple2.f0;
            int count = doubleIntegerTuple2.f1 + 1;
            return new Tuple2<>(temp,count);
        }
		// compute the average from the accumulated sum and count
        @Override
        public Double getResult(Tuple2<Double, Integer> doubleIntegerTuple2) {
            double resultAvg = doubleIntegerTuple2.f0 / doubleIntegerTuple2.f1;
            return resultAvg;
        }
		// merge two partial accumulators
        @Override
        public Tuple2<Double, Integer> merge(Tuple2<Double, Integer> doubleIntegerTuple2, Tuple2<Double, Integer> acc1) {
            double tempSum = doubleIntegerTuple2.f0 + acc1.f0;
            int countSum = doubleIntegerTuple2.f1 + acc1.f1;
            return new Tuple2<>(tempSum,countSum);
        }
    }
}
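The AggregateFunction lifecycle (createAccumulator, then add per record, then getResult, with merge combining partial accumulators) can be exercised without a cluster. Below is a plain-Java mirror of AvgFunction's sum-and-count logic; the class name and the double[] accumulator are our stand-ins for Tuple2<Double, Integer>:

```java
public class AvgSketch {
    // Accumulator mirrors Tuple2<Double, Integer>: [running sum, count].
    static double[] createAccumulator() {
        return new double[]{0.0, 0};
    }

    // Fold one temperature reading into the accumulator.
    static double[] add(double temperature, double[] acc) {
        return new double[]{acc[0] + temperature, acc[1] + 1};
    }

    // Average = accumulated sum / accumulated count.
    static double getResult(double[] acc) {
        return acc[0] / acc[1];
    }

    // Combine two partial accumulators (used when windows merge).
    static double[] merge(double[] a, double[] b) {
        return new double[]{a[0] + b[0], a[1] + b[1]};
    }

    public static void main(String[] args) {
        double[] acc = createAccumulator();
        for (double t : new double[]{35.5, 36.5, 37.5}) {
            acc = add(t, acc);
        }
        System.out.println(getResult(acc)); // 36.5
    }
}
```

Keeping the sum and count separate until getResult is what lets Flink combine partial results incrementally instead of buffering every element in the window.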


The run output is shown in the screenshot below:
[screenshot of run output]

Demo 3

Event-time windows with allowed lateness and a side output for late data. The code is as follows:

public class Window3 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // set the time characteristic to event time: TimeCharacteristic.EventTime
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"192.168.146.222:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG,"sensor_group1");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"latest");

        DataStreamSource<String> inputStream =
                env.addSource(new FlinkKafkaConsumer011<String>(
                        "sensor",
                        new SimpleStringSchema(),
                        prop)
                );

        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
        })
                .assignTimestampsAndWatermarks(  // handle out-of-order events
                        new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.seconds(0)) {
                            @Override
                            public long extractTimestamp(SensorReading sensorReading) {
                                return sensorReading.getTimestamp()*1000L;
                            }
                        });

        OutputTag<SensorReading> outputTag = new OutputTag<SensorReading>("late") {
        };

        SingleOutputStreamOperator<SensorReading> maxResultStream =
                mapStream.keyBy("id")
//                .timeWindow(Time.seconds(15))  //, Time.seconds(2) ,Time.seconds(5)
                        .window(TumblingEventTimeWindows.of(Time.seconds(15),Time.seconds(1)))
                        .allowedLateness(Time.seconds(30))
                        .sideOutputLateData(outputTag)
                        .max("temperature");

        maxResultStream.print("max");
        DataStream<SensorReading> sideOutput = maxResultStream.getSideOutput(outputTag);
        sideOutput.print("late");


        env.execute("flinkwindow");

    }
}

The run result is shown in the screenshot below.
The event-time window is 15 seconds with 30 seconds of allowed lateness, so any value for the 1-15 second range that arrives before time reaches 45 seconds still updates that window's maximum. For example:
At second 1, a reading arrives: 35.5
At second 10, a reading arrives: 36.6 (the maximum is now 36.6)
At second 20, a reading arrives: 37.7 (this record belongs to the second 15-second window)
Then a reading for second 8 arrives: 39.9
The first window's maximum is now updated to 39.9. Until event time reaches 45 seconds (exclusive), new maxima can still be added for the 1-15 second range. Once a record with time greater than or equal to 45 seconds appears, the first window closes; any further data for the 1-15 second range is collected by the sideOutput as late data for downstream handling (so no data is lost).
[screenshot of run output]
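The routing rule described above can be sketched in plain Java. This is a simplification of Flink's actual trigger and state-cleanup logic, and the names are ours: a late element still updates the window while the watermark is below window end + allowed lateness; after that, the window state is cleared and the element goes to the side output.

```java
public class LatenessSketch {
    enum Route { UPDATE_WINDOW, SIDE_OUTPUT }

    // A window [start, end) with allowedLateness keeps its state until the
    // watermark reaches end + allowedLateness. Late elements arriving before
    // that point re-fire the window; later ones are emitted to the side output.
    static Route route(long watermark, long windowEnd, long allowedLateness) {
        return watermark < windowEnd + allowedLateness
                ? Route.UPDATE_WINDOW
                : Route.SIDE_OUTPUT;
    }

    public static void main(String[] args) {
        long windowEnd = 15, lateness = 30;                 // seconds, as in the demo
        System.out.println(route(44, windowEnd, lateness)); // UPDATE_WINDOW
        System.out.println(route(45, windowEnd, lateness)); // SIDE_OUTPUT
    }
}
```

This matches the walkthrough: a reading for second 8 arriving while the watermark is 44 still updates the first window's maximum, while one arriving at watermark 45 is routed to the "late" side output.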
