Flink Window Functions

Overview

Stream processing is a data-processing paradigm designed for unbounded data sets, i.e. data sets that grow continuously and are essentially infinite. A window is the mechanism that slices this unbounded data into finite chunks for processing.
Windows are the core of unbounded stream processing: a window splits an infinite stream into finite-size "buckets", and we can run computations on those buckets.

Which notions of time does Flink have?

Time in Flink is not necessarily the same as wall-clock time; Flink distinguishes three notions: event time, ingestion time, and processing time.
1. Windows defined on event time form EventTimeWindows; this requires each record to carry its own event timestamp.
2. Windows defined on ingestion time form IngestionTimeWindows, based on the system time of the source operator.
3. Windows defined on processing time form ProcessingTimeWindows, based on the system time of the processing operator.

Window types

Time windows: windows formed by time
Tumbling time window
Sliding time window
Session window
Count windows: windows formed by a fixed number of records, independent of time
Tumbling count window
Sliding count window

1. Tumbling windows
Slice the data by a fixed window length.
Time-aligned, fixed window length, no overlap.
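The time-alignment rule can be made concrete. Below is a minimal plain-Java sketch (a hypothetical `TumblingSketch` helper, not the Flink API) of how the tumbling window containing a timestamp is determined, assuming non-negative timestamps and no window offset:

```java
// Sketch: which tumbling window does an event timestamp fall into?
// window start = ts - (ts % size); hypothetical helper, not Flink code.
class TumblingSketch {
    // Start of the tumbling window of length `size` (ms) that contains `ts`.
    static long windowStart(long ts, long size) {
        return ts - (ts % size);
    }

    public static void main(String[] args) {
        long size = 15_000L; // 15-second windows
        // 17s and 29.999s land in the same [15s, 30s) window; 30s starts a new one.
        System.out.println(windowStart(17_000L, size)); // 15000
        System.out.println(windowStart(29_999L, size)); // 15000
        System.out.println(windowStart(30_000L, size)); // 30000
    }
}
```

Because the start is derived purely from the timestamp, every event maps to exactly one window, which is why tumbling windows never overlap.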

2. Sliding windows
Slide forward by a fixed step over windows of fixed length.
A sliding window is defined by a fixed window length and a slide interval.
Windows may overlap (they overlap whenever the slide interval is shorter than the window length).
A tumbling window can be seen as a special sliding window whose size equals its slide interval.
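The overlap means one element belongs to several windows at once. A plain-Java sketch (hypothetical `SlidingSketch` helper, not the Flink API; assumes non-negative timestamps and no offset) that lists the starts of every window covering a timestamp:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of sliding-window membership: an element belongs to every
// window whose [start, start + size) interval covers its timestamp.
class SlidingSketch {
    static List<Long> windowStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - (ts % slide); // most recent window start at or before ts
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // size 15s, slide 5s: each element is in 15/5 = 3 overlapping windows.
        System.out.println(windowStarts(17_000L, 15_000L, 5_000L)); // [15000, 10000, 5000]
    }
}
```

With size equal to slide the loop runs exactly once, which is the tumbling-window special case mentioned above.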
3. Session windows
Formed by a series of events plus a timeout gap of a specified length: a new window starts whenever no data has been received for that long.
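The gap rule can be sketched outside Flink. A minimal plain-Java illustration (hypothetical `SessionSketch` helper, not Flink's window-merging logic) that splits a sorted timestamp sequence into sessions wherever the gap exceeds the timeout:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of session-window formation: sorted timestamps are split
// into sessions wherever consecutive events are further apart than `gap`.
class SessionSketch {
    static List<List<Long>> sessions(long[] sortedTs, long gap) {
        List<List<Long>> out = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTs) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > gap) {
                out.add(current); // timeout exceeded: close the session
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) out.add(current);
        return out;
    }

    public static void main(String[] args) {
        long[] ts = {1_000L, 2_000L, 20_000L, 21_000L};
        // With a 15s gap, the jump from 2s to 20s exceeds the timeout,
        // so two sessions form: [1s, 2s] and [20s, 21s].
        System.out.println(sessions(ts, 15_000L).size()); // 2
    }
}
```

Unlike tumbling and sliding windows, session windows have no fixed start or length; they are shaped entirely by the data.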

Window API

1. timeWindow, countWindow, window

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class Window1 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.232.211:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "group_2");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        DataStreamSource<String> inputStream = env.addSource(new FlinkKafkaConsumer011<String>("sensor",
                new SimpleStringSchema(), prop));
        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {

            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
        });
        SingleOutputStreamOperator<SensorReading> maxStream = mapStream.keyBy("id")
                // Tumbling time window: splits the stream into non-overlapping windows;
                // each event belongs to exactly one window.
                .timeWindow(Time.seconds(15))
                // Sliding time window: overlapping windows for a smooth rolling aggregation.
                // Every 5 seconds, compute the max temperature of the last 15 seconds:
//                .timeWindow(Time.seconds(15), Time.seconds(5))
                // Max temperature over every 15 events:
//                .countWindow(15)
                // Every 2 events, compute the max over the last 6 events:
//                .countWindow(6, 2)
                // Session window: closes once no element arrives within the gap
                // (static 15-second gap):
//                .window(EventTimeSessionWindows.withGap(Time.seconds(15)))
                .max("temperature");
        maxStream.print("max");
        env.execute("max_1");
    }
}
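The commented-out `countWindow(6, 2)` variant fires every 2 elements and aggregates over the last 6. A plain-Java sketch of that sliding-count semantics for a single key (hypothetical `CountWindowSketch` helper, ignoring Flink's keyed state and evictor machinery):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of countWindow(size=6, slide=2) for one key:
// every `slide` elements, emit the max over the last `size` elements seen so far.
class CountWindowSketch {
    static List<Double> slidingCountMax(List<Double> temps, int size, int slide) {
        List<Double> fired = new ArrayList<>();
        for (int i = 1; i <= temps.size(); i++) {
            if (i % slide == 0) { // window fires every `slide` elements
                int from = Math.max(0, i - size);
                fired.add(Collections.max(temps.subList(from, i)));
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        List<Double> temps = List.of(30.1, 32.5, 31.0, 29.8, 35.2, 33.3);
        // Fires after elements 2, 4, and 6.
        System.out.println(slidingCountMax(temps, 6, 2)); // [32.5, 32.5, 35.2]
    }
}
```

Note that, as in Flink, early firings aggregate over fewer than `size` elements until the window has filled up.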

2. reduce

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class WindowReduce {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.232.211:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "group_3");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common" +
                ".serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common" +
                ".serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        DataStreamSource<String> inputStream = env.addSource(new FlinkKafkaConsumer011<String>("sensor",
                new SimpleStringSchema(), prop));
        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
        });
        SingleOutputStreamOperator<SensorReading> resultStream = mapStream.keyBy("id")
                .countWindow(6, 2)
                // Sum the temperatures in the window, keeping the first record's id and timestamp.
                .reduce(new ReduceFunction<SensorReading>() {
                    @Override
                    public SensorReading reduce(SensorReading acc, SensorReading next) throws Exception {
                        return new SensorReading(acc.getId(),
                                acc.getTimestamp(),
                                acc.getTemperature() + next.getTemperature());
                    }
                });
        resultStream.print("sum");
        env.execute("reduce_1");
    }
}

3. aggregate

import java.util.Properties;

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class WindowAgg {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.232.211:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "group_3");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common" +
                ".serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common" +
                ".serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        DataStreamSource<String> inputStream = env.addSource(new FlinkKafkaConsumer011<String>("sensor",
                new SimpleStringSchema(), prop));
        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
        });
        SingleOutputStreamOperator<Double> aggStream = mapStream.keyBy("id")
                .countWindow(6, 2)
                // Accumulator is a (sum, count) pair; the result is the average temperature.
                .aggregate(new AggregateFunction<SensorReading, Tuple2<Double, Integer>, Double>() {
                    @Override
                    public Tuple2<Double, Integer> createAccumulator() {
                        // initial value: sum = 0.0, count = 0
                        return new Tuple2<>(0.0, 0);
                    }

                    @Override
                    public Tuple2<Double, Integer> add(SensorReading reading, Tuple2<Double, Integer> acc) {
                        // add the temperature to the sum and increment the count
                        return new Tuple2<>(acc.f0 + reading.getTemperature(), acc.f1 + 1);
                    }

                    @Override
                    public Double getResult(Tuple2<Double, Integer> acc) {
                        // average = sum / count
                        return acc.f0 / acc.f1;
                    }

                    @Override
                    public Tuple2<Double, Integer> merge(Tuple2<Double, Integer> acc1, Tuple2<Double, Integer> acc2) {
                        return new Tuple2<>(acc1.f0 + acc2.f0, acc1.f1 + acc2.f1);
                    }
                });
        aggStream.print("avg");
        env.execute("agg_1");
    }
}
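The (sum, count) accumulator above can be illustrated outside Flink. A minimal plain-Java sketch (hypothetical `AvgSketch` helper, not part of any library) showing why `add`, `merge`, and `getResult` together compute a correct average even when partial accumulators are combined:

```java
// Sketch of the (sum, count) accumulator: add() folds in one reading,
// merge() combines two partial accumulators, getResult() divides.
class AvgSketch {
    double sum;
    int count;

    AvgSketch add(double temp) { sum += temp; count++; return this; }

    AvgSketch merge(AvgSketch other) { sum += other.sum; count += other.count; return this; }

    double getResult() { return sum / count; }

    public static void main(String[] args) {
        // Two partial accumulators, as might exist on different subtasks.
        AvgSketch a = new AvgSketch().add(30.0).add(34.0);
        AvgSketch b = new AvgSketch().add(38.0);
        // (30 + 34 + 38) / 3
        System.out.println(a.merge(b).getResult()); // 34.0
    }
}
```

Keeping the count in the accumulator is what makes merging possible; a running average alone could not be combined correctly.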

4. allowedLateness

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class Window3 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2);
        // Use event time (the time each event occurred) as the time characteristic.
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.232.211:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "sensor_group2");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        DataStreamSource<String> inputStream = env.addSource(new FlinkKafkaConsumer011<String>
                ("sensor", new SimpleStringSchema(), prop));
        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
            // BoundedOutOfOrdernessTimestampExtractor handles out-of-order events;
            // Time.seconds(0) means the watermark is not delayed at all.
        }).assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.seconds(0)) {
            @Override
            public long extractTimestamp(SensorReading sensorReading) {
                // timestamps arrive in seconds; Flink expects milliseconds
                return sensorReading.getTimestamp() * 1000L;
            }
        });
        SingleOutputStreamOperator<SensorReading> maxResultStream
                = mapStream.keyBy("id")
                .timeWindow(Time.seconds(15))
                .allowedLateness(Time.seconds(30))
                .max("temperature");
        maxResultStream.print("max");
        env.execute("window3");
    }
}
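With `allowedLateness`, a window's state is kept (and its result re-emitted) until the watermark passes the window end plus the lateness. A plain-Java sketch of that acceptance rule (hypothetical `LatenessSketch` helper mirroring the documented behavior, not Flink internals):

```java
// Sketch of the allowedLateness rule: an element for a window ending at
// `windowEnd` is still accepted while watermark < windowEnd + allowedLateness.
class LatenessSketch {
    static boolean accepted(long windowEnd, long allowedLateness, long watermark) {
        return watermark < windowEnd + allowedLateness;
    }

    public static void main(String[] args) {
        long windowEnd = 15_000L, lateness = 30_000L; // 15s window, 30s lateness
        System.out.println(accepted(windowEnd, lateness, 20_000L)); // true: updates the result
        System.out.println(accepted(windowEnd, lateness, 50_000L)); // false: window state dropped
    }
}
```

Each late-but-accepted element triggers an updated firing of the window, so downstream consumers may see several results for the same window.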

5. sideOutputLateData

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.util.OutputTag;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class Window4 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(2);
        // Use event time (the time each event occurred) as the time characteristic.
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        Properties prop = new Properties();
        prop.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.232.211:9092");
        prop.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "sensor_group2");
        prop.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        prop.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
        DataStreamSource<String> inputStream = env.addSource(new FlinkKafkaConsumer011<String>
                ("sensor", new SimpleStringSchema(), prop));
        SingleOutputStreamOperator<SensorReading> mapStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] split = s.split(",");
                return new SensorReading(split[0], Long.parseLong(split[1]), Double.parseDouble(split[2]));
            }
            // BoundedOutOfOrdernessTimestampExtractor handles out-of-order events;
            // Time.seconds(0) means the watermark is not delayed at all.
        }).assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.seconds(0)) {
            @Override
            public long extractTimestamp(SensorReading sensorReading) {
                return sensorReading.getTimestamp() * 1000L;
            }
        });
        // Anonymous subclass so Flink can capture the element type of the side output.
        OutputTag<SensorReading> outputTag = new OutputTag<SensorReading>("late11111"){};
        SingleOutputStreamOperator<SensorReading> maxResultStream
                = mapStream.keyBy("id")
                .timeWindow(Time.seconds(15))
                .allowedLateness(Time.seconds(30))
                // Elements too late even for allowedLateness go to the side output.
                .sideOutputLateData(outputTag)
                .max("temperature");
        maxResultStream.print("max");
        DataStream<SensorReading> sideOutput = maxResultStream.getSideOutput(outputTag);
        sideOutput.print("sideout");
        env.execute("window");
    }
}
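The routing decision behind `sideOutputLateData` can be sketched in plain Java (hypothetical `SideOutputSketch` helper, not Flink's `OutputTag` machinery): elements that are too late even for `allowedLateness` are not silently dropped but collected on a separate stream.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of side-output routing: an element is "too late" once its
// timestamp plus the allowed lateness is at or behind the watermark.
class SideOutputSketch {
    final List<Long> mainOut = new ArrayList<>();
    final List<Long> lateOut = new ArrayList<>();

    void process(long ts, long watermark, long allowedLateness) {
        if (ts + allowedLateness <= watermark) {
            lateOut.add(ts); // too late for any window: route to the side output
        } else {
            mainOut.add(ts); // still in time: normal window processing
        }
    }

    public static void main(String[] args) {
        SideOutputSketch s = new SideOutputSketch();
        s.process(10_000L, 50_000L, 30_000L); // 10s + 30s <= 50s watermark -> side output
        s.process(45_000L, 50_000L, 30_000L); // still in time -> main output
        System.out.println(s.lateOut.size() + " " + s.mainOut.size()); // 1 1
    }
}
```

Printing the side output, as the job above does, is a simple way to verify whether data is being lost to lateness before tuning the watermark delay.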

