26. Using watermarks and event_time in Flink Table

This chapter is mainly about getting an overall grasp of the Table API rather than walking through every detail of the code; it is suitable for intermediate readers. Flink ultimately translates both the Table API and the SQL API into DataStream programs through a translation layer in between, so there is no point asking whether the Table API will completely replace the DataStream API.
Note: the pure Table environment (TableEnvironment) and the Table environment built on top of DataStream (StreamTableEnvironment) differ slightly in syntax in practice.
With TableEnvironment, event_time and the watermark are declared in SQL as part of the connector DDL.
With StreamTableEnvironment, event_time and the watermark can be defined with a Schema directly when converting the stream into a table.

If you have read my earlier article on watermarks, you know that defining event time and watermarks in the DataStream API takes a fair amount of code. The Table API is by now mature enough that defining event_time and watermarks there is very convenient.
Keep in mind that the Table environment comes in two flavors, the pure TableEnvironment and the mixed StreamTableEnvironment, and the two define event_time and watermarks slightly differently. Read on.

One complaint: the more I use Flink, the harder these articles become to write, so this may be my last Flink post. The official Flink documentation is huge, and work keeps me too busy to keep writing. Goodbye, everyone.

1. Two ways to create a Table environment

1.1 Pure Table environment

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

EnvironmentSettings settings = EnvironmentSettings
    .newInstance()
    .inStreamingMode()
    //.inBatchMode()
    .build();

TableEnvironment tEnv = TableEnvironment.create(settings);

inStreamingMode() and inBatchMode() decide whether the job runs in streaming or batch mode; streaming and batch behave differently (see my earlier post on streams vs. batches for details).
The Table API will be one of Flink's core, actively promoted modules going forward: it fully supports both streaming and batch, while the batch-only DataSet API is being phased out.

1.2 Building a Table environment from a DataStream (this approach does not support batch execution)

To repeat: a Table environment built this way currently supports streaming only, not batch. The official documentation says:

Currently, the StreamTableEnvironment does not support enabling the batch execution mode yet. Nevertheless, bounded streams can be processed there using the streaming execution mode but with lower efficiency.
Note, however, that the general TableEnvironment can work in both streaming execution or optimized batch execution mode.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

Some low-level, complex logic may have to be written with the DataStream API, while we still want to do the later analysis with the Table API. This bridge module exists precisely to convert between DataStream and Table, and you can switch back and forth as often as you like during processing. It is the right API for more complex jobs.

In practice we often work like this:

1. Before implementing the main logic in the DataStream API, use a table connector to access the external data source easily, then use the convenience of SQL to filter out dirty data.
2. If you need custom triggers, stateful processing, or anything else SQL cannot express, switch back to the DataStream API.
3. When writing out the results, switch back to the Table API again and let a connector emit them easily (a minimal sketch of this round-trip follows below).

Connectors are ready-made data transport pipes that Flink implements for us, with support for Kafka, MySQL, Elasticsearch, and so on. This chapter does not cover connectors; it only covers the core logic.
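A minimal sketch of this round-trip, assuming hypothetical connector tables named source_table and sink_table (these names, and the placeholder filter in step 2, are not from the article):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

// 1. Read through a table connector and use SQL to drop dirty records
//    ("source_table" is assumed to have been registered via executeSql + a connector).
Table cleaned = tEnv.sqlQuery("SELECT name, score FROM source_table WHERE score IS NOT NULL");

// 2. Switch to the DataStream API for logic SQL cannot express
//    (custom triggers, timers, state, ...). The filter below is only a stand-in.
DataStream<Row> stream = tEnv.toDataStream(cleaned);
DataStream<Row> processed = stream.filter(row -> row.getArity() == 2);

// 3. Switch back to the Table API and let a sink connector write the result out.
tEnv.fromDataStream(processed).executeInsert("sink_table");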

1.3 Permanent tables and temporary tables (views)

A permanent table is a table Flink creates by connecting to an external system; put simply, it reads the external system's data. A temporary table, also called a view, is usually created from the result of a sqlQuery.
Creating a permanent table:

     tableEnv.executeSql("CREATE TABLE sinkTable (name STRING,score INT,PRIMARY KEY (name) NOT ENFORCED) WITH (" +
                "'connector' = 'jdbc'," +
                "'url' = 'jdbc:mysql://你的ip:3306/TestDB'" +
                ",'table-name' = 'socket_test'," +
                "'username'='root'," +
                "'password'='123456'," +
                "'driver'='com.mysql.cj.jdbc.Driver')");

The table above is created with the MySQL JDBC connector; sinkTable is now a real, physical table.
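For contrast, a temporary table (view) is registered from a query result and lives only in the current table environment; a minimal sketch (the view name and the score filter are illustrative, not from the article):

// Register a query result as a temporary view; it is not backed by any external system.
Table highScores = tableEnv.sqlQuery("SELECT name, score FROM sinkTable WHERE score > 60");
tableEnv.createTemporaryView("highScores", highScores);

// The view can now be referenced by name in later SQL.
tableEnv.sqlQuery("SELECT name FROM highScores");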

2. Two ways to query a table

Taking sinkTable as the example:

tableEnv.sqlQuery("select name,score from sinkTable");

Table t = tableEnv.from("sinkTable").as("name", "score");
t.select($("name")).filter(...).groupBy(...);

The first is plain SQL; the second uses Flink's fluent Table API methods. Either one works.

3. Writing query results out to a sink

1. A batch Table can only be written to a BatchTableSink.
2. A streaming Table requires either an AppendStreamTableSink, a RetractStreamTableSink, or an UpsertStreamTableSink.

The two lines above are quoted from the official documentation, and the English puts it most clearly: in batch mode the result is deterministic and only needs to be emitted once, via a BatchTableSink. For a streaming Table, the sink is chosen according to the table's change behavior: AppendStreamTableSink, RetractStreamTableSink, or UpsertStreamTableSink.

  1. AppendStreamTableSink: for append-only results; query rows are written straight to the destination, with no updates or other complex operations.
  2. UpsertStreamTableSink: if the key already exists in the target it is updated, otherwise it is inserted; this is the familiar database "upsert" behavior (see the sketch below).
  3. RetractStreamTableSink: for complex aggregations, window processing, and other table logic whose results may later be retracted.
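A hedged sketch of the upsert behavior, reusing the sinkTable from section 1.3 (which declares PRIMARY KEY (name) NOT ENFORCED) and assuming an illustrative registered source table named sourceTable:

// Because the JDBC sink declares a primary key, this updating aggregate is written
// in upsert fashion: existing rows for a name are updated, new names are inserted.
// "sourceTable" is a placeholder for any registered table with (name, score) columns.
tableEnv.executeSql(
        "INSERT INTO sinkTable SELECT name, CAST(SUM(score) AS INT) AS score FROM sourceTable GROUP BY name");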

Flink offers two ways to write out the computed results:

  1. table1.executeInsert("table2");
  2. tableEnv.executeSql("INSERT INTO <table> <select subquery>");  // this reads just like ordinary database SQL

The second form is recommended: it is simple and explicit.
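A short hedged example of both forms, again reusing sinkTable and the same illustrative sourceTable (either form on its own is enough; running both would write the data twice):

// Form 1: write a Table object into a registered sink table.
Table result = tableEnv.sqlQuery("SELECT name, score FROM sourceTable");
result.executeInsert("sinkTable");

// Form 2 (recommended): a single INSERT INTO ... SELECT statement,
// which reads just like ordinary database SQL.
tableEnv.executeSql("INSERT INTO sinkTable SELECT name, score FROM sourceTable");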

4. Understanding table-to-stream conversion in depth

4.1 Four methods for converting a Table to a DataStream

  • toChangelogStream(...)
  • toDataStream(...)
  • toAppendStream(...)
  • toRetractStream(...)

Look at the following code:
package com.flink.demo.tableApi.kafka;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Schema;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
public class ChangeLogStream {
    public static void main(String[] args) throws Exception {
// create environments of both APIs
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

// create a DataStream
        DataStream<Row> dataStream = env.fromElements(
                Row.of("Alice", 12),
                Row.of("Bob", 10),
                Row.of("Alice", 100));

// interpret the insert-only DataStream as a Table
        Table inputTable = tableEnv.fromDataStream(dataStream).as("name", "score");

// register the Table object as a view and query it
// the query contains an aggregation that produces updates
        tableEnv.createTemporaryView("InputTable", inputTable);
        Table resultTable = tableEnv.sqlQuery(
                "SELECT name, SUM(score) FROM InputTable GROUP BY name");

//        DataStream<Row> resultStream = tableEnv.toChangelogStream(resultTable);
        DataStream<Row> resultStream = tableEnv.toAppendStream(resultTable,Row.class);
//        DataStream<Tuple2<Boolean, Row>> resultStream = tableEnv.toRetractStream(resultTable,Row.class);

        resultStream.print(">>>>>>>>>>>>>>>>>>>>>>>>>>>>:");
        env.execute();

    }
}

Run this yourself and you will see: after the aggregating SQL, only toChangelogStream and toRetractStream work when converting back to a DataStream; toDataStream and toAppendStream both throw errors. In other words, the behavior of the SQL determines which conversion methods are valid, and using the wrong one is guaranteed to fail.

For example, a query like select * from table involves no grouping state, so all of its output is an insert-only stream and toDataStream or toAppendStream can be used. Anything involving grouping, aggregation, or windows must use toChangelogStream or toRetractStream (see the short sketch below).
If the reason is unclear, read my earlier post with some thoughts on Flink streams vs. batches.
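A small hedged illustration, reusing tableEnv and the InputTable view from the code above: a plain projection is insert-only and converts fine, while the grouped aggregation needs a changelog or retract conversion:

// Insert-only result: a simple projection, no grouping state involved.
Table projection = tableEnv.sqlQuery("SELECT name, score FROM InputTable");
DataStream<Row> appendOnly = tableEnv.toDataStream(projection);              // works

// Updating result: the aggregation from the example above.
Table aggregated = tableEnv.sqlQuery("SELECT name, SUM(score) FROM InputTable GROUP BY name");
DataStream<Row> changelog = tableEnv.toChangelogStream(aggregated);          // works
// tableEnv.toDataStream(aggregated);  // fails: the result is not insert-only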

4.2 Configuration caveats when mixing the DataStream API and the Table API

We know the two APIs can convert into each other, but there is one caveat about job configuration: configuration set on the table environment is not passed on to the DataStream side, whereas configuration set on the DataStream environment is picked up by the table side. Configuration should therefore be set on the DataStream environment.
It is recommended to set all configuration parameters in the DataStream API as early as possible, before switching to the Table API.

import java.time.ZoneId;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

// create Java DataStream API

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// set the configuration parameters you need

env.setMaxParallelism(256);

env.getConfig().addDefaultKryoSerializer(MyCustomType.class, CustomKryoSerializer.class);

env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

// now switch to the Table environment

StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
...
...
...

5. Running multiple pipelines in one job

The code below reads data from the database and writes it into two directories:

5.1 The MySQL source data:

[screenshot of the MySQL source data omitted]


import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class TableStatementPiple {
    public static void main(String[] args) {
        EnvironmentSettings settings = EnvironmentSettings
                .newInstance()
                .inStreamingMode()
//                .inBatchMode()
                .build();

        TableEnvironment tEnv = TableEnvironment.create(settings);
        tEnv.executeSql("CREATE TABLE MyUserTable (\n" +
                "  id INT,\n" +
                "  order_name STRING,\n" +
                "  price DOUBLE\n" +
                ") WITH (\n" +
                "   'connector' = 'jdbc',\n" +
                "   'url' = 'jdbc:mysql://localhost:3306/testdb',\n" +
                "   'driver'='com.mysql.cj.jdbc.Driver',\n "+
                "   'username'='root',\n"+
                "   'password'='123456',\n"+
                "   'table-name' = 'order'\n" +
                ")");
        tEnv.executeSql("CREATE TABLE sink01 (\n" +
                "  id INT,\n" +
                "  order_name STRING,\n" +
                "  price DOUBLE\n" +
                ") PARTITIONED BY (order_name) WITH (\n" +
                "  'connector' = 'filesystem',           \n" +
                "  'path' = 'C:\\\\Users\\\\penggan\\\\Desktop\\\\logdir\\\\re',  \n" +
                "  'format' = 'csv'\n" +
                ")");
        tEnv.executeSql("CREATE TABLE sink02 (\n" +
                "  id INT,\n" +
                "  order_name STRING,\n" +
                "  price DOUBLE\n" +
                ") PARTITIONED BY (order_name) WITH (\n" +
                "  'connector' = 'filesystem',           \n" +
                "  'path' = 'C:\\\\Users\\\\penggan\\\\Desktop\\\\logdir\\\\re2',  \n" +
                "  'format' = 'csv'\n" +
                ")");

        // First way to emit the results
        tEnv.createStatementSet()
                .addInsertSql("INSERT INTO sink01 SELECT * FROM MyUserTable")
                .addInsertSql("INSERT INTO sink02 SELECT * FROM MyUserTable")
                .execute();

        // Second (equivalent) way, expressing the same two inserts with Table objects;
        // shown commented out so the job does not write everything twice.
//        tEnv.createStatementSet()
//                .addInsert("sink01", tEnv.from("MyUserTable"))
//                .addInsert("sink02", tEnv.from("MyUserTable"))
//                .execute();
    }
}

5.2 The results:

[screenshot of the output files omitted]
The SQL statements above are connector details and will not be explained further here; connectors will be covered later. The point of this section is createStatementSet(), which lets you write the same data to several different destinations in a single job.

6. Triggering behavior when mixing the DataStream and Table API

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv= StreamTableEnvironment.create(env);
..... (irrelevant code omitted)

// (1)

// adds a branch with a printing sink to the StreamExecutionEnvironment
tableEnv.toDataStream(table).print();

// (2)

// executes a Table API end-to-end pipeline as a Flink job and prints locally,
// thus (1) has still not been executed
table.execute().print();

// executes the DataStream API pipeline with the sink defined in (1) as a
// Flink job, (2) was already running before
env.execute();

In other words: unless env.execute() is called, none of the DataStream branches produced by converting tables to streams will actually run.
On the table-environment side, calling table.execute() submits and runs that table pipeline immediately.
Keep this difference in mind.

7. Insert-only stream handling with StreamTableEnvironment

An insert-only stream carries no complex aggregate state: every emitted row is a new insert and never changes a previously emitted result.
If this is unclear, go back to section 4, "Understanding table-to-stream conversion in depth".

A StreamTableEnvironment offers the following methods to convert from and to DataStream API:

  • fromDataStream(DataStream): Interprets a stream of insert-only changes and arbitrary type as a table. By default, event-time and watermarks are not propagated.

  • fromDataStream(DataStream, Schema): Interprets a stream of insert-only changes and arbitrary type as a table. The Schema argument can define the table's columns and primary key, and add event-time and watermarks.

  • createTemporaryView(String, DataStream): Registers the stream under a name to access it in SQL. It is a shortcut for createTemporaryView(String, fromDataStream(DataStream)).

  • createTemporaryView(String, DataStream, Schema): Registers the stream under a name to access it in SQL. It is a shortcut for createTemporaryView(String, fromDataStream(DataStream, Schema)).

  • toDataStream(Table): Converts a table into a stream of insert-only changes. The default stream record type is org.apache.flink.types.Row. A single rowtime attribute column is written back into the DataStream API's record, and watermarks are propagated as well, i.e. the watermark and event_time flow from the Table API into the DataStream.

  • toDataStream(Table, AbstractDataType): Converts a table into a stream of insert-only changes. This method accepts a data type to express the desired stream record type. The planner might insert implicit casts and reorders columns to map columns to fields of the (possibly nested) data type.

  • toDataStream(Table, Class): A shortcut for toDataStream(Table, DataTypes.of(Class)) to quickly create the desired data type reflectively.

Summary:

  1. With toDataStream, the event_time and watermark defined on the table are carried over into the DataStream.
  2. With fromDataStream, the DataStream's event_time and watermark are not carried over into the Table; you need to define them with a Schema.
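Related to point 2: when the DataStream already has timestamps and watermarks assigned through the DataStream API, a Schema can carry them into the table instead of redefining them. A minimal sketch using the built-in SOURCE_WATERMARK() declaration (tableEnv and dataStream are placeholders for an existing environment and a stream that already has watermarks assigned):

// Expose the stream record's timestamp as a rowtime column and reuse the
// DataStream API's watermarks rather than declaring a new watermark expression.
Table table = tableEnv.fromDataStream(
        dataStream,
        Schema.newBuilder()
                .columnByMetadata("rowtime", "TIMESTAMP_LTZ(3)")
                .watermark("rowtime", "SOURCE_WATERMARK()")
                .build());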

Schema

Schema is a very important class when working with StreamTableEnvironment. It is mainly used when converting a DataStream into a Table to define the columns, and beyond column definitions it can also declare an event-time column, a processing-time column, and the watermark. A pure TableEnvironment does not need Schema at all: there, event time and the watermark are declared directly in the connector DDL. The examples below show how TableEnvironment and StreamTableEnvironment each define event_time and a watermark.

Defining event_time and the watermark with a pure TableEnvironment

The MySQL setup script is as follows:
/*
 Navicat Premium Data Transfer

 Source Server         : linux-master
 Source Server Type    : MySQL
 Source Server Version : 80018
 Source Host           : 192.168.134.128:3306
 Source Schema         : TestDB

 Target Server Type    : MySQL
 Target Server Version : 80018
 File Encoding         : 65001

 Date: 19/09/2022 00:56:27
*/

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for person
-- ----------------------------
DROP TABLE IF EXISTS `person`;
CREATE TABLE `person`  (
  `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` varchar(260) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL DEFAULT '' COMMENT '姓名',
  `age` int(11) UNSIGNED NOT NULL DEFAULT 0 COMMENT '年龄',
  `sex` tinyint(2) UNSIGNED NOT NULL DEFAULT 2 COMMENT '0:, 1男',
  `email` text CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NULL COMMENT '邮箱',
  `event_time` timestamp(0) NULL DEFAULT NULL,
  `num` int(11) NULL DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 18 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = Dynamic;

-- ----------------------------
-- Records of person
-- ----------------------------
INSERT INTO `person` VALUES (1, 'pg01', 12, 1, 'www.johngo689.com', '2022-09-19 23:01:01', 1);
INSERT INTO `person` VALUES (2, 'pg01', 13, 0, 'www.johngo689.com', '2022-09-19 23:01:02', 1);
INSERT INTO `person` VALUES (3, 'pg01', 14, 0, 'www.johngo689.com', '2022-09-19 23:01:03', 1);
INSERT INTO `person` VALUES (4, 'pg01', 15, 0, 'www.johngo689.com', '2022-09-19 23:01:04', 1);
INSERT INTO `person` VALUES (5, 'pg01', 16, 1, 'www.johngo689.com', '2022-09-19 23:01:05', 1);
INSERT INTO `person` VALUES (6, 'pg01', 17, 1, 'www.johngo689.com', '2022-09-19 23:01:06', 1);
INSERT INTO `person` VALUES (7, 'pg01', 18, 0, 'www.johngo689.com', '2022-09-19 23:01:07', 1);
INSERT INTO `person` VALUES (8, 'pg01', 19, 0, 'www.johngo689.com', '2022-09-19 23:01:08', 1);
INSERT INTO `person` VALUES (9, 'pg01', 20, 1, 'www.johngo689.com', '2022-09-19 23:01:09', 1);
INSERT INTO `person` VALUES (10, 'pg01', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:10', 1);
INSERT INTO `person` VALUES (16, 'pg01', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:11', 1);
INSERT INTO `person` VALUES (17, 'pg01', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:12', 1);
INSERT INTO `person` VALUES (18, 'pg01', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:13', 1);
INSERT INTO `person` VALUES (19, 'pg02', 13, 0, 'www.johngo689.com', '2022-09-19 23:01:01', 1);
INSERT INTO `person` VALUES (20, 'pg02', 14, 0, 'www.johngo689.com', '2022-09-19 23:01:02', 1);
INSERT INTO `person` VALUES (21, 'pg02', 15, 0, 'www.johngo689.com', '2022-09-19 23:01:03', 1);
INSERT INTO `person` VALUES (22, 'pg02', 20, 1, 'www.johngo689.com', '2022-09-19 23:01:04', 1);
INSERT INTO `person` VALUES (23, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:05', 1);
INSERT INTO `person` VALUES (24, 'pg02', 16, 1, 'www.johngo689.com', '2022-09-19 23:01:06', 1);
INSERT INTO `person` VALUES (25, 'pg02', 17, 1, 'www.johngo689.com', '2022-09-19 23:01:07', 1);
INSERT INTO `person` VALUES (26, 'pg02', 18, 0, 'www.johngo689.com', '2022-09-19 23:01:08', 1);
INSERT INTO `person` VALUES (27, 'pg02', 19, 0, 'www.johngo689.com', '2022-09-19 23:01:09', 1);
INSERT INTO `person` VALUES (28, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:10', 1);
INSERT INTO `person` VALUES (29, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:11', 1);
INSERT INTO `person` VALUES (30, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:12', 1);
INSERT INTO `person` VALUES (31, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:13', 1);
INSERT INTO `person` VALUES (32, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:14', 1);
INSERT INTO `person` VALUES (33, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:15', 1);
INSERT INTO `person` VALUES (34, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:16', 1);
INSERT INTO `person` VALUES (35, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:17', 1);
INSERT INTO `person` VALUES (36, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:18', 1);
INSERT INTO `person` VALUES (37, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:19', 1);
INSERT INTO `person` VALUES (38, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:20', 1);
INSERT INTO `person` VALUES (39, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:21', 1);
INSERT INTO `person` VALUES (40, 'pg02', 21, 0, 'www.johngo689.com', '2022-09-19 23:01:22', 1);

SET FOREIGN_KEY_CHECKS = 1;


The Java code is as follows:
package com.flink.demo.tableApi.kafka.onlyTable;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class tableEventTimeWindow {
    public static void main(String[] args) {

        EnvironmentSettings settings = EnvironmentSettings
                .newInstance()
//                .inStreamingMode()
                .inBatchMode()
                .build();

        TableEnvironment tableEnv = TableEnvironment.create(settings);
        // define the event-time column and the watermark in the connector DDL
        tableEnv.executeSql("CREATE TABLE MyTable (" +
                "name string ," +
                "num INT," +
                "event_time TIMESTAMP(0) ," +
                "WATERMARK FOR event_time as event_time -INTERVAL '0.001' SECOND," +
                "PRIMARY KEY (name) NOT ENFORCED" +
                ") WITH (" +
                "'connector' = 'jdbc'," +
                "'url' = 'jdbc:mysql://192.168.37.128:3306/TestDB'" +
                ",'table-name' = 'person'," +
                "'username'='root'," +
                "'password'='123456'," +
                "'driver'='com.mysql.cj.jdbc.Driver')");
//        tableEnv.sqlQuery("select * from MyTable").execute().print();
// define an event_time tumbling window in the query
        Table table =  tableEnv.sqlQuery("select name,sum(num)," +
                "TUMBLE_START(event_time, INTERVAL '2' SECOND) AS start_ts," +
                "TUMBLE_END(event_time, INTERVAL '2' SECOND) AS end_ts " +
                "from MyTable group by " +
                "name, TUMBLE(event_time, INTERVAL '2' SECOND)");//直接sql查询
        table.execute().print();
    }
}

Notes: event_time must be stored in MySQL as TIMESTAMP or DATETIME, not as a string or a long; only then can the column be declared as the event-time attribute when the connector table is created. The clause WATERMARK FOR event_time AS event_time - INTERVAL '0.001' SECOND derives a periodically emitted watermark from event_time, equal to the largest observed event_time minus 1 millisecond (a 1 ms out-of-orderness allowance).

+-------+--------+---------------------+---------------------+
| name  | EXPR$1 | start_ts            | end_ts              |
+-------+--------+---------------------+---------------------+
| pg02  |      1 | 2022-09-19 23:01:00 | 2022-09-19 23:01:02 |
| pg02  |      2 | 2022-09-19 23:01:02 | 2022-09-19 23:01:04 |
| pg02  |      2 | 2022-09-19 23:01:04 | 2022-09-19 23:01:06 |
| pg02  |      2 | 2022-09-19 23:01:06 | 2022-09-19 23:01:08 |
| pg02  |      2 | 2022-09-19 23:01:08 | 2022-09-19 23:01:10 |
| pg02  |      2 | 2022-09-19 23:01:10 | 2022-09-19 23:01:12 |
| pg02  |      2 | 2022-09-19 23:01:12 | 2022-09-19 23:01:14 |
| pg02  |      2 | 2022-09-19 23:01:14 | 2022-09-19 23:01:16 |
| pg02  |      2 | 2022-09-19 23:01:16 | 2022-09-19 23:01:18 |
| pg02  |      2 | 2022-09-19 23:01:18 | 2022-09-19 23:01:20 |
| pg02  |      2 | 2022-09-19 23:01:20 | 2022-09-19 23:01:22 |
| pg02  |      1 | 2022-09-19 23:01:22 | 2022-09-19 23:01:24 |
| pg01  |      1 | 2022-09-19 23:01:00 | 2022-09-19 23:01:02 |
| pg01  |      2 | 2022-09-19 23:01:02 | 2022-09-19 23:01:04 |
| pg01  |      2 | 2022-09-19 23:01:04 | 2022-09-19 23:01:06 |
| pg01  |      2 | 2022-09-19 23:01:06 | 2022-09-19 23:01:08 |
| pg01  |      2 | 2022-09-19 23:01:08 | 2022-09-19 23:01:10 |
| pg01  |      2 | 2022-09-19 23:01:10 | 2022-09-19 23:01:12 |
| pg01  |      2 | 2022-09-19 23:01:12 | 2022-09-19 23:01:14 |
+-------+--------+---------------------+---------------------+

Defining the watermark and event_time with StreamTableEnvironment

// some example POJO
public  class User {
    public String name;

    public Integer score;

    public String u_time;

    // default constructor for DataStream API
    public User() {}

    // fully assigning constructor for Table API
    public User(String name, Integer score, String u_time) {
        this.name = name;
        this.score = score;
        this.u_time = u_time;
    }
}

import com.flink.demo.enty.User;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Schema;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

import java.time.Instant;


public class WindowEventTime {
    public static void main(String[] args) throws Exception {
// create environments of both APIs
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
        System.out.println(Instant.ofEpochMilli(1000));
// create a DataStream

        DataStream<User> dataStream = env.fromElements(
                new User("Alice", 1, "2022-09-19 22:00:01"),
                new User("Alice", 1, "2022-09-19 22:00:02"),
                new User("Alice", 1, "2022-09-19 22:00:03"),
                new User("Alice", 1, "2022-09-19 22:00:04"),
                new User("Alice", 1, "2022-09-19 22:00:05"),
                new User("Alice", 1, "2022-09-19 22:00:06"),
                new User("Alice", 1, "2022-09-19 22:00:07"),
                new User("Alice", 1, "2022-09-19 22:00:08"),
                new User("Alice", 1, "2022-09-19 22:00:09"),
                new User("Alice", 1, "2022-09-19 22:00:10"),
                new User("Alice", 1, "2022-09-19 22:00:11"),
                new User("Alice", 1, "2022-09-19 22:00:12"),
                new User("Alice", 1, "2022-09-19 22:00:13"),
                new User("Alice", 1, "2022-09-19 22:00:14"),
                new User("Alice", 1, "2022-09-19 22:00:15"),
                new User("Bob", 1, "2022-09-19 22:00:01"),
                new User("Bob", 1, "2022-09-19 22:00:02"),
                new User("Bob", 1, "2022-09-19 22:00:03"),
                new User("Bob", 1, "2022-09-19 22:00:04"),
                new User("Bob", 1, "2022-09-19 22:00:05"),
                new User("Bob", 1, "2022-09-19 22:00:06"),
                new User("Bob", 1, "2022-09-19 22:00:07"),
                new User("Bob", 1, "2022-09-19 22:00:08"),
                new User("Bob", 1, "2022-09-19 22:00:09"),
                new User("Bob", 1, "2022-09-19 22:00:10"),
                new User("Bob", 1, "2022-09-19 22:00:11"),
                new User("Bob", 1, "2022-09-19 22:00:12"),
                new User("Bob", 1, "2022-09-19 22:00:13"),
                new User("Bob", 1, "2022-09-19 22:00:14"),
                new User("Bob", 1, "2022-09-19 22:00:15"));
//        Alternative: declare a processing-time column instead of event time
//        Schema schema = Schema.newBuilder()
//                .columnByExpression("proc_time", "PROCTIME()")
//                .build();
// declare the event-time column and the watermark
        Schema schema = Schema.newBuilder()
                .columnByExpression("event_time", "CAST(u_time AS TIMESTAMP_LTZ(3))")
                .watermark("event_time", "event_time - INTERVAL '0.001' SECOND")
                .build();
        Table table = tableEnv.fromDataStream(dataStream,schema);
//        table.printSchema();
        tableEnv.createTemporaryView("view",table);
        tableEnv.sqlQuery("select name,sum(score),TUMBLE_START(event_time, INTERVAL '2' SECOND) AS start_ts,TUMBLE_END(event_time, INTERVAL '2' SECOND) AS start_ts from view group by name,TUMBLE(event_time, INTERVAL '2' SECOND)").execute().print();


    }
}



Explanation:

  1. columnByExpression("event_time", "CAST(u_time AS TIMESTAMP_LTZ(3))") casts the u_time field of the data objects to Flink's TIMESTAMP_LTZ(3) type, stores the result under the name event_time, and adds it to the table as a new column named event_time.
  2. watermark("event_time", "event_time - INTERVAL '0.001' SECOND") makes the event_time column the stream's event time and derives the watermark from it as the largest observed event_time minus 1 millisecond.

The result:

+----+-------+--------+-------------------------+-------------------------+
| op | name  | EXPR$1 | start_ts                | end_ts                  |
+----+-------+--------+-------------------------+-------------------------+
| +I | Bob   |      2 | 2022-09-19 22:00:14.000 | 2022-09-19 22:00:16.000 |
| +I | Alice |      1 | 2022-09-19 22:00:00.000 | 2022-09-19 22:00:02.000 |
| +I | Alice |      2 | 2022-09-19 22:00:02.000 | 2022-09-19 22:00:04.000 |
| +I | Alice |      2 | 2022-09-19 22:00:04.000 | 2022-09-19 22:00:06.000 |
| +I | Alice |      2 | 2022-09-19 22:00:06.000 | 2022-09-19 22:00:08.000 |
| +I | Alice |      2 | 2022-09-19 22:00:08.000 | 2022-09-19 22:00:10.000 |
| +I | Alice |      2 | 2022-09-19 22:00:10.000 | 2022-09-19 22:00:12.000 |
| +I | Alice |      2 | 2022-09-19 22:00:12.000 | 2022-09-19 22:00:14.000 |
| +I | Alice |      2 | 2022-09-19 22:00:14.000 | 2022-09-19 22:00:16.000 |
+----+-------+--------+-------------------------+-------------------------+
