Flink SQL: Queries (Windowing TVF)

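This article covers the windowing table-valued functions (Windowing TVFs) in Apache Flink SQL, including the TUMBLE, HOP and CUMULATE types and the concept of window offset. These functions split an unbounded stream into segments, either of fixed size or by sliding, so that computations can be applied to each segment. Compared with the legacy Grouped Window Functions, which only support window aggregation, windowing TVFs are more flexible and closer to the SQL standard, and they support complex window computations such as Window TopN and Window Join.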

Windowing table-valued functions (Windowing TVFs)

Batch Streaming

Windows are at the heart of processing infinite streams. Windows split the stream into “buckets” of finite size, over which we can apply computations. This document focuses on how windowing is performed in Flink SQL and how to get the most out of the functionality it offers.

Apache Flink provides several windowing table-valued functions (TVFs) to divide the elements of your table into windows, including:

  • Tumble Windows
  • Hop Windows
  • Cumulate Windows
  • Session Windows (will be supported soon)

Note that each element can logically belong to more than one window, depending on the windowing table-valued function you use. For example, HOP windowing creates overlapping windows, so a single element can be assigned to multiple windows.
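To make the overlap concrete, here is a minimal sketch that counts rows per hopping window. It assumes the Bid table and the six sample rows shown in the TUMBLE example of this article. The HOP argument order used here (slide first, then size) follows the Flink documentation. The alias `bids` is made up for illustration.

```sql
-- Assumes the Bid table (bidtime, price, item) with the six sample rows
-- from the TUMBLE example in this article.
-- 10-minute windows that start every 5 minutes: each row lands in two windows.
SELECT window_start, window_end, COUNT(*) AS bids
FROM TABLE(
  HOP(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES, INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;
-- For the sample data the per-window counts are 3, 5, 3 and 1
-- (windows starting at 08:00, 08:05, 08:10 and 08:15).
-- They add up to 12, twice the 6 input rows.
```

The query aggregates rather than selecting the raw TVF output because some Flink versions only allow a windowing TVF in streaming mode when it is followed by a window-based operation.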

Windowing TVFs are Flink-defined Polymorphic Table Functions (abbreviated PTF). PTFs are part of the SQL 2016 standard: a PTF is a special table function that can take a table as a parameter. PTFs are a powerful feature for changing the shape of a table. Because PTFs are used semantically like tables, they are invoked in the FROM clause of a SELECT statement.

Windowing TVFs replace the legacy Grouped Window Functions. Windowing TVFs are more compliant with the SQL standard and powerful enough to support complex window-based computations, e.g. Window TopN and Window Join, whereas Grouped Window Functions only support Window Aggregation.
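The difference is easiest to see side by side. The sketch below computes the same 10-minute totals twice, once with the legacy syntax and once with a windowing TVF. It assumes the Bid table from the TUMBLE example in this article. The alias `total_price` is illustrative only.

```sql
-- Legacy Grouped Window Function: the window is declared inside GROUP BY,
-- and its bounds are read back through auxiliary functions.
SELECT
  TUMBLE_START(bidtime, INTERVAL '10' MINUTE) AS window_start,
  TUMBLE_END(bidtime, INTERVAL '10' MINUTE)   AS window_end,
  SUM(price)                                  AS total_price
FROM Bid
GROUP BY TUMBLE(bidtime, INTERVAL '10' MINUTE);

-- Windowing TVF: the window is assigned in FROM and exposed as ordinary columns,
-- so any window-based operation can consume window_start / window_end.
SELECT window_start, window_end, SUM(price) AS total_price
FROM TABLE(
  TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;
```

Because the TVF form turns the window into plain columns of a relation, operations other than aggregation can be layered on top of it. The legacy form has no equivalent for this.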

See the following pages for how to apply further computations on top of windowing TVFs:

  • Window Aggregation
  • Window TopN
  • Window Join
  • Window Deduplication
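As one illustration of these follow-up computations, the sketch below applies a Window TopN on top of a window aggregation. It assumes the Bid table from the TUMBLE example in this article. The aliases `total_price` and `rownum` and the cut-off of 2 are arbitrary choices for the example.

```sql
-- For every 10-minute tumbling window, keep the 2 items with the highest total price.
SELECT *
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY window_start, window_end   -- rank within each window
           ORDER BY total_price DESC) AS rownum
  FROM (
    -- Window aggregation: total price per item and window.
    SELECT window_start, window_end, item, SUM(price) AS total_price
    FROM TABLE(
      TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
    GROUP BY window_start, window_end, item
  )
)
WHERE rownum <= 2;
```

Partitioning by `window_start` and `window_end` is what makes this a Window TopN rather than a continuously updating TopN over the whole stream.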

Window Functions

Apache Flink provides 3 built-in windowing TVFs: TUMBLE, HOP and CUMULATE. The return value of a windowing TVF is a new relation that includes all columns of the original relation plus 3 additional columns named “window_start”, “window_end” and “window_time” that indicate the assigned window. In streaming mode, the “window_time” field is a time attribute of the window. In batch mode, the “window_time” field is an attribute of type TIMESTAMP or TIMESTAMP_LTZ, depending on the type of the input time field. The “window_time” field can be used in subsequent time-based operations, e.g. another windowing TVF, interval joins, or over aggregations. The value of window_time is always equal to window_end - 1ms.
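The role of “window_time” is clearest when one windowing TVF feeds another. The following is a minimal sketch of such a cascade, assuming the Bid table from the TUMBLE example in this article. The view name `bids_5min` and the column aliases are made up for illustration.

```sql
-- First level: 5-minute tumbling windows.
-- window_time is propagated and renamed so it can act as the time attribute downstream.
CREATE VIEW bids_5min AS
SELECT window_start AS window_5min_start,
       window_end   AS window_5min_end,
       window_time  AS rowtime,
       SUM(price)   AS partial_price
FROM TABLE(
  TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES))
GROUP BY window_start, window_end, window_time;

-- Second level: 10-minute tumbling windows built on the first level's window_time.
SELECT window_start, window_end, SUM(partial_price) AS total_price
FROM TABLE(
  TUMBLE(TABLE bids_5min, DESCRIPTOR(rowtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;
```

Since window_time equals window_end - 1ms, each 5-minute partial result falls inside the 10-minute window that contains it instead of landing on the boundary of the next one.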

TUMBLE

The TUMBLE function assigns each element to a window of a specified size. Tumbling windows have a fixed size and do not overlap. For example, if you specify a tumbling window with a size of 5 minutes, Flink evaluates the current window and starts a new window every five minutes, as illustrated by the following figure.

[Figure: tumbling windows with a size of 5 minutes]

The TUMBLE function assigns a window to each row of a relation based on a time attribute field. In streaming mode, the time attribute field must be either an event-time or a processing-time attribute. In batch mode, the time attribute field of the window table function must be an attribute of type TIMESTAMP or TIMESTAMP_LTZ. The return value of TUMBLE is a new relation that includes all columns of the original relation plus 3 additional columns named “window_start”, “window_end” and “window_time” that indicate the assigned window. The original time attribute “timecol” becomes a regular timestamp column after the windowing TVF.

The TUMBLE function takes three required parameters and one optional parameter:

TUMBLE(TABLE data, DESCRIPTOR(timecol), size [, offset ])
  • data: a table parameter that can be any relation with a time attribute column.
  • timecol: a column descriptor indicating which time attribute column of data should be mapped to tumbling windows.
  • size: a duration specifying the width of the tumbling windows.
  • offset: an optional parameter specifying the offset by which the window start is shifted.
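The optional offset is the least obvious of the four parameters, so here is a small sketch of it. It uses the Bid table from the example that follows, and it assumes a Flink version that accepts the offset argument. The window bounds in the comments are derived from the definition above (window starts shifted by one minute), not from a recorded run.

```sql
-- 10-minute tumbling windows whose start is shifted by 1 minute.
SELECT * FROM TABLE(
  TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES, INTERVAL '1' MINUTES));
-- Without the offset, windows are aligned to [08:00, 08:10), [08:10, 08:20), ...
-- With the 1-minute offset they become [08:01, 08:11), [08:11, 08:21), ...
-- so a bid at 08:05 is assigned to [08:01, 08:11)
-- and a bid at 08:11 to [08:11, 08:21).
```

The offset only changes how windows are aligned. Window size and the non-overlapping property of tumbling windows stay the same.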

Here is an example invocation on the Bid table:
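The example relies on a Bid table that already exists. A table with the schema and watermark shown in the `desc Bid` output below could be declared roughly as follows. This is a sketch only: the `datagen` connector is an assumption chosen to keep the statement self-contained, and it produces random rows rather than the six sample rows listed below.

```sql
-- Hypothetical DDL matching the schema printed by `desc Bid`.
-- The connector is a placeholder. Replace it with the real source of your bids.
CREATE TABLE Bid (
  bidtime TIMESTAMP(3),
  price   DECIMAL(10, 2),
  item    STRING,
  -- Declares bidtime as the event-time attribute, tolerating 1 second of lateness.
  WATERMARK FOR bidtime AS bidtime - INTERVAL '1' SECOND
) WITH (
  'connector' = 'datagen'
);
```

The WATERMARK clause is what turns `bidtime` into the `*ROWTIME*` time attribute that the windowing TVF requires in streaming mode.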

-- tables must have time attribute, e.g. `bidtime` in this table
Flink SQL> desc Bid;
+-------------+------------------------+------+-----+--------+---------------------------------+
|        name |                   type | null | key | extras |                       watermark |
+-------------+------------------------+------+-----+--------+---------------------------------+
|     bidtime | TIMESTAMP(3) *ROWTIME* | true |     |        | `bidtime` - INTERVAL '1' SECOND |
|       price |         DECIMAL(10, 2) | true |     |        |                                 |
|        item |                 STRING | true |     |        |                                 |
+-------------+------------------------+------+-----+--------+---------------------------------+

Flink SQL> SELECT * FROM Bid;
+------------------+-------+------+
|          bidtime | price | item |
+------------------+-------+------+
| 2020-04-15 08:05 |  4.00 | C    |
| 2020-04-15 08:07 |  2.00 | A    |
| 2020-04-15 08:09 |  5.00 | D    |
| 2020-04-15 08:11 |  3.00 | B    |
| 2020-04-15 08:13 |  1.00 | E    |
| 2020-04-15 08:17 |  6.00 | F    |
+------------------+-------+------+

Flink SQL> SELECT * FROM TABLE(
   TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES));
-- or with the named params
-- note: the DATA param must be the first
Flink SQL> SELECT * FROM TABLE(
   TUMBLE(
     DATA => TABLE Bid,
     TIMECOL => DESCRIPTOR(bidtime),
     SIZE => INTERVAL '10' MINUTES));
+------------------+-------+------+------------------+------------------+-------------------------+
|          bidtime | price | item |     window_start |       window_end |            window_time  |
+------------------+-------+------+------------------+------------------+-------------------------+
| 2020-04-15 08:05 |  4.00 | C    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:07 |  2.00 | A    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:09 |  5.00 | D    | 2020-04-15 08:00 | 2020-04-15 08:10 | 2020-04-15 08:09:59.999 |
| 2020-04-15 08:11 |  3.00 | B    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-04-15 08:19:59.999 |
| 2020-04-15 08:13 |  1.00 | E    | 2020-04-15 08:10 | 2020-04-15 08:20 | 2020-