Flink SQL: Queries (Window Deduplication)

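Window Deduplication in Flink SQL is a special form of deduplication for streaming queries. It emits only the final result when a window ends, which improves performance. It requires PARTITION BY to include window_start and window_end, and it orders rows by a time attribute (processing time or event time). Current limitations are that session windows are not supported and that the order key must be event time.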

Window Deduplication

Streaming
Window Deduplication is a special Deduplication that removes rows duplicated over a set of columns, keeping the first or the last row for each window and set of partition keys.
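The "first or last row per window and key" semantics can be modeled in a few lines of plain Python. This is a minimal sketch that assumes tumbling windows only, integer event timestamps, and rows shaped as (event_time, key, payload). The names tumble_window and window_dedup are hypothetical helpers, not Flink APIs.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape: (event_time, key, payload); event_time is an int timestamp.
Row = Tuple[int, str, str]


def tumble_window(ts: int, size: int) -> Tuple[int, int]:
    """Return (window_start, window_end) of the tumbling window containing ts."""
    start = ts - (ts % size)
    return start, start + size


def window_dedup(rows: Iterable[Row], size: int, keep: str = "first") -> List[Tuple[int, int, Row]]:
    """Keep one row per (window_start, window_end, key): the earliest or latest by event time."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    best: Dict[Tuple[int, int, str], Row] = {}
    for row in rows:
        ts, key, _ = row
        ws, we = tumble_window(ts, size)
        part = (ws, we, key)  # the partition: window bounds plus the dedup key
        cur = best.get(part)
        # Strict comparisons: on equal timestamps the row seen first is kept.
        if cur is None or (keep == "first" and ts < cur[0]) or (keep == "last" and ts > cur[0]):
            best[part] = row
    # One output row per partition, ordered by window and key.
    return [(ws, we, row) for (ws, we, _), row in sorted(best.items())]
```

For example, `window_dedup([(1, "a", "x1"), (3, "a", "x2")], 10, "last")` keeps only `(3, "a", "x2")` for the window [0, 10). Flink also supports hop and cumulate windows, which this sketch does not model.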

For streaming queries, unlike regular Deduplication on continuous tables, Window Deduplication does not emit intermediate results. It emits only a final result at the end of each window. Moreover, Window Deduplication purges all intermediate state once it is no longer needed. Therefore, Window Deduplication queries perform better when users don't need the result updated for every record. Usually, Window Deduplication is used directly with a Windowing TVF. It can also be combined with other operations based on Windowing TVFs, such as Window Aggregation, Window Top-N and Window Join.
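The difference in output behavior can be illustrated with a toy streaming model. This is a sketch under simplifying assumptions, not Flink's operator.

- WindowDedupSketch buffers one "keep last" row per (window_start, window_end, key).
- It emits nothing from process().
- When on_watermark() sees that the watermark has reached window_end, it emits the final rows and deletes their state. This firing rule is simplified.
- regular_dedup_keep_last shows, for contrast, how a keep-last deduplication on a continuous table updates its result on every record. It uses the changelog markers +I/-U/+U.

All class and function names here are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, str]  # hypothetical row shape: (event_time, key, payload)


class WindowDedupSketch:
    """Toy keep-last window deduplication over tumbling windows of a given size."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.state: Dict[Tuple[int, int, str], Row] = {}

    def process(self, row: Row) -> List[Row]:
        ts, key, _ = row
        start = ts - ts % self.size
        part = (start, start + self.size, key)
        cur = self.state.get(part)
        if cur is None or ts >= cur[0]:  # keep the latest row by event time
            self.state[part] = row
        return []  # no intermediate results are emitted

    def on_watermark(self, watermark: int) -> List[Row]:
        # Simplified firing rule: a window is complete once watermark >= window_end.
        fired = sorted(p for p in self.state if p[1] <= watermark)
        return [self.state.pop(p) for p in fired]  # emit final rows and purge their state


def regular_dedup_keep_last(rows: List[Row]) -> List[Tuple[str, Row]]:
    """Contrast: keep-last dedup on a continuous table changes its result on every row."""
    latest: Dict[str, Row] = {}
    changelog: List[Tuple[str, Row]] = []
    for row in rows:
        key = row[1]
        if key in latest:
            changelog.append(("-U", latest[key]))  # retract the previous result
            changelog.append(("+U", row))          # emit the new result
        else:
            changelog.append(("+I", row))
        latest[key] = row  # state for this key is kept indefinitely in this model
    return changelog
```

After on_watermark has fired a window, no entry for that window remains in state. The regular variant, by contrast, keeps one entry per key for as long as the query runs and produces two changelog rows for every duplicate.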

Window Deduplication uses the same syntax as regular Deduplication; see the Deduplication documentation for more information. In addition, Window Deduplication requires the PARTITION BY clause to contain the window_start and window_end columns of the relation. Otherwise, the optimizer cannot translate the query.

Flink uses ROW_NUMBER() to remove duplicates, in the same way as Window Top-N queries. In theory, Window Deduplication is a special case of Window Top-N in which N is one and rows are ordered by processing time or event time.
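The "Top-N with N = 1" relationship can be checked on a batch emulation of the ROW_NUMBER pattern using Python's built-in sqlite3 module. This is only an assumption-laden sketch.

- SQLite has no TUMBLE TVF and no watermarks, so window_start and window_end are computed with integer arithmetic for 10-unit tumbling windows.
- The bid table and its columns are hypothetical.
- Window functions require SQLite 3.25 or later.

With `n = 1`, the query keeps the latest bid per window and item, which is a deduplication. Larger values of n turn the same query into a Top-N.

```python
import sqlite3

# PARTITION BY window_start, window_end (plus a key) and ORDER BY the time column,
# mirroring the Window Deduplication pattern; ORDER BY bidtime DESC keeps the last row.
QUERY = """
SELECT window_start, window_end, item, bidtime, price
FROM (
  SELECT window_start, window_end, item, bidtime, price,
         ROW_NUMBER() OVER (PARTITION BY window_start, window_end, item
                            ORDER BY bidtime DESC) AS rownum
  FROM (
    SELECT bidtime - bidtime % 10 AS window_start,
           bidtime - bidtime % 10 + 10 AS window_end,
           item, bidtime, price
    FROM bid
  )
)
WHERE rownum <= ?
ORDER BY window_start, item, bidtime DESC
"""


def top_n_per_window(rows, n):
    """Run the emulated query on an in-memory table; n = 1 is deduplication."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE bid (bidtime INTEGER, item TEXT, price REAL)")
        conn.executemany("INSERT INTO bid VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY, (n,)).fetchall()
    finally:
        conn.close()
```

Switching `ORDER BY bidtime DESC` to `ASC` would keep the first row of each window instead of the last.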

The following shows the syntax of the Window Deduplication statement:

SELECT [column_list]
FROM (
   SELECT [column_list],
     ROW_NUMBER() OVER (PARTITION BY window_start, window_end [, col_key1...]
       ORDER BY time_attr [asc|desc]) AS rownum
   FROM table_name) -- relation with a windowing TVF applied
WHERE (rownum = 1 | rownum <=1 | rownum < 2) [AND conditions]