4.2.9 Flink-流处理框架-Table API 与 SQL-流处理中的特殊概念之窗口(Windows)

目录

1.写在前面

2.窗口(Windows)

2.1 分组窗口(Group Windows)

2.1.1 滚动窗口

2.1.2 滑动窗口

2.1.3 会话窗口

 2.2 Over Windows

2.2.1 无界 Over Windows

2.2.2 有界的 over window

2.3 SQL 中窗口的定义

2.3.1 Group Windows

2.3.2 Over Windows

3.代码总结

4.代码练习(以分组滚动窗口为例)


1.写在前面

        时间语义,要配合窗口操作才能发挥作用。最主要的用途,当然就是开窗口、根据时间段做计算了。下面我们就来看看 Table API 和 SQL 中,怎么利用时间字段做窗口操作。 在 Table API 和 SQL 中,主要有两种窗口:Group WindowsOver Windows

2.窗口(Windows)

2.1 分组窗口(Group Windows)

        分组窗口(Group Windows)会根据时间或行计数间隔,将行聚合到有限的组(Group) 中,并对每个组的数据执行一次聚合函数。Table API 中的 Group Windows 都是使用.window(w:GroupWindow)子句定义的,并且必须由 as 子句指定一个别名。为了按窗口对表进行分组,窗口的别名必须在 group by 子句 中,像常规的分组字段一样引用。

2.1.1 滚动窗口

        代码如下: 

2.1.2 滑动窗口

        代码如下:

2.1.3 会话窗口

        代码如下:

 2.2 Over Windows

        可以在事件时间或处理时间,以及指定为时间间隔、或行计数的范围内,定义 Over windows。 无界的over window是使用常量指定的。也就是说,时间间隔要指定UNBOUNDED_RANGE, 或者行计数间隔要指定 UNBOUNDED_ROW。而有界的 over window 是用间隔的大小指定的。

2.2.1 无界 Over Windows

2.2.2 有界的 over window

2.3 SQL 中窗口的定义

        我们已经了解了在 Table API 里 window 的调用方式,同样,我们也可以在 SQL 中直接加入窗口的定义和使用。

2.3.1 Group Windows

        Group Windows 在 SQL 查询的 Group BY 子句中定义。与使用常规 GROUP BY 子句的查询 一样,使用 GROUP BY 子句的查询会计算每个组的单个结果行。

2.3.2 Over Windows

        由于 Over 本来就是 SQL 内置支持的语法,所以这在 SQL 中属于基本的聚合操作。所有聚合必须在同一窗口上定义,也就是说,必须是相同的分区、排序和范围。目前仅支持在当前行范围之前的窗口(无边界和有边界)。 注意,ORDER BY 必须在单一的时间属性上指定。

3.代码总结

import com.atguigu.apitest.beans.SensorReading;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.table.api.Over;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.Tumble;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class TableTest5_TimeAndWindow {
    public static void main(String[] args) throws Exception {
        // 1. 创建环境
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // 2. 读入文件数据,得到DataStream
        DataStream<String> inputStream = env.readTextFile("D:\\Projects\\BigData\\FlinkTutorial\\src\\main\\resources\\sensor.txt");

        // 3. 转换成POJO
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
        })
                .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.seconds(2)) {
                    @Override
                    public long extractTimestamp(SensorReading element) {
                        return element.getTimestamp() * 1000L;
                    }
                });

        // 4. 将流转换成表,定义时间特性
//        Table dataTable = tableEnv.fromDataStream(dataStream, "id, timestamp as ts, temperature as temp, pt.proctime");
        Table dataTable = tableEnv.fromDataStream(dataStream, "id, timestamp as ts, temperature as temp, rt.rowtime");

        tableEnv.createTemporaryView("sensor", dataTable);

        // 5. 窗口操作
        // 5.1 Group Window
        // table API
        Table resultTable = dataTable.window(Tumble.over("10.seconds").on("rt").as("tw"))
                .groupBy("id, tw")
                .select("id, id.count, temp.avg, tw.end");

        // SQL
        Table resultSqlTable = tableEnv.sqlQuery("select id, count(id) as cnt, avg(temp) as avgTemp, tumble_end(rt, interval '10' second) " +
                "from sensor group by id, tumble(rt, interval '10' second)");

        // 5.2 Over Window
        // table API
        Table overResult = dataTable.window(Over.partitionBy("id").orderBy("rt").preceding("2.rows").as("ow"))
                .select("id, rt, id.count over ow, temp.avg over ow");

        // SQL
        Table overSqlResult = tableEnv.sqlQuery("select id, rt, count(id) over ow, avg(temp) over ow " +
                " from sensor " +
                " window ow as (partition by id order by rt rows between 2 preceding and current row)");

//        dataTable.printSchema();
//        tableEnv.toAppendStream(resultTable, Row.class).print("result");
//        tableEnv.toRetractStream(resultSqlTable, Row.class).print("sql");
        tableEnv.toAppendStream(overResult, Row.class).print("result");
        tableEnv.toRetractStream(overSqlResult, Row.class).print("sql");

        env.execute();
    }
}

4.代码练习(以分组滚动窗口为例)

        我们可以综合学习过的内容,用一段完整的代码实现一个具体的需求。例如,可以开一 个滚动窗口,统计 10 秒内出现的每个 sensor 的个数。

package com.ucas.windows;

import com.ucas.beans.SensorReading;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.Tumble;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import sun.management.Sensor;

/**
 * @author GONG
 * @version 1.0
 * @date 2021/3/14 23:01
 */
public class WindowDemo {
    public static void main(String[] args) throws Exception {
        /**
         * 注册环境
         */
        EnvironmentSettings mySetting = EnvironmentSettings
                .newInstance()
//                .useOldPlanner()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        //1.设置流执行环境
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //2.设置并行度
        env.setParallelism(1).setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        //3.根据流执行环境创建流转表的环境
        StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(env, mySetting);


        //4.读取文件,得到dataStream
        String filePath = "D:\\IDEA_WORK\\flinkstudy\\src\\main\\resources\\sensor.txt";


        //5.利用流执行环境读取txt
        DataStreamSource<String> inputStream = env.readTextFile(filePath);
        //6.转换成 Java Bean,并指定 timestamp 和 watermark
        SingleOutputStreamOperator<SensorReading> dataStream = inputStream.map(line -> {
            String[] fields = line.split(",");
            return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
        }).assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.seconds(1)) {
            @Override
            public long extractTimestamp(SensorReading sensorReading) {
                return sensorReading.getTimestamp() * 1000L;
            }
        });

        //7.将 DataStream 转换为 Table
        Table sensorTable = tableEnvironment.fromDataStream(dataStream, "id,timestamp.rowtime as ts,temperature");

        //8.开窗聚合
        Table selectTable = sensorTable.window(Tumble.over("10.seconds").on("ts").as("tw"))
                .groupBy("id,tw")
                .select("id,id.count,temperature.avg,tw.end");
        //9.sql
        tableEnvironment.createTemporaryView("sensor", sensorTable);
        Table resultSqlTable = tableEnvironment.sqlQuery(
                "select id, count(id), avg(temperature), tumble_end(ts, interval '10' second) " + "from sensor group by id, tumble(ts, interval '10' second)");
        //10.转换成流打印输出
        tableEnvironment.toAppendStream(selectTable, Row.class).print("select:");
        tableEnvironment.toAppendStream(resultSqlTable, Row.class).print("sql:");
        //11.开始执行环境
        env.execute("time and window test");
    }
}
package com.ucas.beans;

// 传感器温度读数的数据类型
public class SensorReading {
    // 属性:id,时间戳,温度值
    private String id;
    private Long timestamp;
    private Double temperature;

    public SensorReading() {
    }

    public SensorReading(String id, Long timestamp, Double temperature) {
        this.id = id;
        this.timestamp = timestamp;
        this.temperature = temperature;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public Long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(Long timestamp) {
        this.timestamp = timestamp;
    }

    public Double getTemperature() {
        return temperature;
    }

    public void setTemperature(Double temperature) {
        this.temperature = temperature;
    }

    @Override
    public String toString() {
        return "SensorReading{" +
                "id='" + id + '\'' +
                ", timestamp=" + timestamp +
                ", temperature=" + temperature +
                '}';
    }
}

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值