Flink Windows

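Windows are at the core of processing infinite streams in Flink: they turn an unbounded stream into bounded pieces that can be computed over. A window is created when its first element arrives and is removed once time passes the window's end timestamp plus the allowed lateness. Event time is based on the timestamp carried by each event, whereas processing time relies on the system clock. Once all data has been received, event-time window operations produce consistent and correct results even when the data arrives out of order. Processing time suits scenarios that do not need precise timing and offers better performance.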

Window

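In Flink, a DataStream created from a StreamExecutionEnvironment is typically unbounded, yet a project often needs statistics over a fixed period of time. In that case we use Flink's windows to split the unbounded stream into bounded pieces.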
Official definition: Windows are at the heart of processing infinite streams. Windows split the stream into “buckets” of finite size, over which we can apply computations.
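
To make the "bucket" idea concrete, here is a minimal sketch in plain Python (not the Flink API) of how a fixed-size, non-overlapping window could be derived from a timestamp. The names `window_start` and `assign_to_buckets` and the sample events are made up for illustration.

```python
from collections import defaultdict

def window_start(timestamp_ms: int, size_ms: int, offset_ms: int = 0) -> int:
    """Return the start of the fixed-size window that contains timestamp_ms."""
    # Python's % is non-negative for a positive divisor, so timestamps
    # before the epoch are aligned correctly as well.
    return timestamp_ms - (timestamp_ms - offset_ms) % size_ms

def assign_to_buckets(events, size_ms):
    """Group (timestamp_ms, value) pairs into [start, start + size_ms) buckets."""
    buckets = defaultdict(list)
    for ts, value in events:
        start = window_start(ts, size_ms)
        buckets[(start, start + size_ms)].append(value)
    return dict(buckets)

# Hypothetical sample data: (timestamp in ms, payload)
events = [(1_000, "a"), (4_999, "b"), (5_000, "c"), (12_300, "d")]
buckets = assign_to_buckets(events, size_ms=5_000)
for window, values in sorted(buckets.items()):
    print(window, values)
```

Each bucket is a half-open interval, so the element at 5000 ms belongs to the second window, not the first.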

Lifecycle

Official documentation: In a nutshell, a window is created as soon as the first element that should belong to this window arrives, and the window is completely removed when the time (event or processing time) passes its end timestamp plus the user-specified allowed lateness (see Allowed Lateness). Flink guarantees removal only for time-based windows and not for other types, e.g. global windows (see Window Assigners).
Put simply, a window is created when its first element arrives and is removed once time passes the window's end timestamp. If an allowed lateness is configured, removal is deferred until the end timestamp plus that lateness.
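
The lifecycle described above can be sketched as a toy model in plain Python. This is not Flink's implementation: the class `WindowStore`, its methods, and the exact boundary convention (a window is removed once the current time reaches end + allowed lateness) are assumptions made for illustration.

```python
class WindowStore:
    """Toy model of the time-window lifecycle; not Flink's implementation."""

    def __init__(self, size_ms: int, allowed_lateness_ms: int = 0):
        self.size_ms = size_ms
        self.allowed_lateness_ms = allowed_lateness_ms
        self.now_ms = float("-inf")  # current event or processing time
        self.windows = {}            # window start -> list of elements

    def _cleanup_time(self, start: int) -> int:
        # Assumed convention: end timestamp plus allowed lateness.
        return start + self.size_ms + self.allowed_lateness_ms

    def add(self, timestamp_ms: int, value) -> bool:
        start = timestamp_ms - timestamp_ms % self.size_ms
        if self.now_ms >= self._cleanup_time(start):
            return False  # the window is already gone, so the element is dropped
        # The window is created lazily, when its first element arrives.
        self.windows.setdefault(start, []).append(value)
        return True

    def advance_time(self, now_ms: int) -> list:
        """Move time forward and remove every window whose cleanup time passed."""
        self.now_ms = now_ms
        expired = sorted(s for s in self.windows if now_ms >= self._cleanup_time(s))
        for start in expired:
            del self.windows[start]
        return expired

store = WindowStore(size_ms=10_000, allowed_lateness_ms=2_000)
store.add(3_000, "x")                 # creates window [0, 10000)
print(sorted(store.windows))          # [0]
print(store.advance_time(10_000))     # []  end reached, still within lateness
print(store.add(9_500, "late"))       # True  accepted during the lateness period
print(store.advance_time(12_000))     # [0]  window removed
print(store.add(9_900, "too late"))   # False  window no longer exists
```

Without an allowed lateness (the default of 0 here), the window would be removed as soon as time reaches its end timestamp.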

Time in Flink

Event Time

Official documentation: Event time is the time that each individual event occurred on its producing device. This time is typically embedded within the records before they enter Flink, and that event timestamp can be extracted from each record. In event time, the progress of time depends on the data, not on any wall clocks. Event time programs must specify how to generate Event Time Watermarks, which is the mechanism that signals progress in event time. This watermarking mechanism is described in a later section, below.

In a perfect world, event time processing would yield completely consistent and deterministic results, regardless of when events arrive, or their ordering. However, unless the events are known to arrive in-order (by timestamp), event time processing incurs some latency while waiting for out-of-order events. As it is only possible to wait for a finite period of time, this places a limit on how deterministic event time applications can be.

Assuming all of the data has arrived, event time operations will behave as expected, and produce correct and consistent results even when working with out-of-order or late events, or when reprocessing historic data. For example, an hourly event time window will contain all records that carry an event timestamp that falls into that hour, regardless of the order in which they arrive, or when they are processed. (See the section on late events for more information.)

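In practice, data reaches Flink with network delay. For business logic whose results depend on when things actually happened, we need Event Time so that operators compute on the time the message was really produced rather than the time it arrived.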
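
The order-independence of event-time windows can be illustrated with a small plain-Python sketch (not the Flink API). It assumes all data has already arrived, so it sidesteps watermarks entirely; the record layout with an `event_time_ms` field and the function `hourly_counts` are hypothetical.

```python
import random
from collections import Counter

HOUR_MS = 60 * 60 * 1000

def hourly_counts(records):
    """Count records per hourly window, keyed by each record's own timestamp."""
    counts = Counter()
    for record in records:
        ts = record["event_time_ms"]      # timestamp embedded in the record
        counts[ts - ts % HOUR_MS] += 1    # start of the hour the event falls into
    return dict(counts)

in_order = [
    {"id": 1, "event_time_ms": 5 * 60 * 1000},    # 00:05
    {"id": 2, "event_time_ms": 59 * 60 * 1000},   # 00:59
    {"id": 3, "event_time_ms": 61 * 60 * 1000},   # 01:01
    {"id": 4, "event_time_ms": 119 * 60 * 1000},  # 01:59
]
shuffled = in_order[:]
random.Random(7).shuffle(shuffled)  # simulate reordering on the network

print(hourly_counts(in_order))
print(hourly_counts(shuffled) == hourly_counts(in_order))  # True
```

Because the window is chosen from the timestamp inside the record, the arrival order has no effect on the result. In a real streaming job the hard part is deciding when a window is complete, which is what watermarks are for.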

Processing Time

Official documentation: When a streaming program runs on processing time, all time-based operations (like time windows) will use the system clock of the machines that run the respective operator. An hourly processing time window will include all records that arrived at a specific operator between the times when the system clock indicated the full hour.
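If the business logic has no strict requirements on time, processing time is a reasonable choice: it needs no coordination between streams and machines, so it offers the best performance and the lowest latency, at the cost of results that are not deterministic.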
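
For contrast with event time, the sketch below (plain Python, not the Flink API) picks the hourly window purely from the clock reading at arrival. The clock values are passed in explicitly so the example is repeatable; in a real operator they would come from the machine's system clock. The names `processing_time_window` and `run` are hypothetical.

```python
HOUR_MS = 60 * 60 * 1000

def processing_time_window(clock_ms: int) -> tuple:
    """Return the hourly window chosen from the operator's clock at arrival."""
    start = clock_ms - clock_ms % HOUR_MS
    return (start, start + HOUR_MS)

def run(records, arrival_clock_ms):
    """Assign each record to a window using only its arrival time."""
    return {
        record_id: processing_time_window(clock)
        for record_id, clock in zip(records, arrival_clock_ms)
    }

records = ["r1", "r2"]
# Same records, two runs: in the second run r2 is delayed past the full hour.
first_run = run(records, [HOUR_MS - 2_000, HOUR_MS - 1_000])
second_run = run(records, [HOUR_MS - 2_000, HOUR_MS + 500])

print(first_run["r2"])   # (0, 3600000)
print(second_run["r2"])  # (3600000, 7200000)
```

The same record lands in a different window depending on when it happens to arrive, which is why processing-time results can differ between runs.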
