Using Time and Windows in Flink

Time Types in Flink

Event Time / Processing Time / Ingestion Time

Flink supports different notions of time in streaming programs.


Flink distinguishes three notions of time:

  • Event time

    Event time is the time at which each individual event occurred on its producing device. This time is typically embedded within the records before they enter Flink, and the event timestamp can be extracted from each record. In event time, the progress of time depends on the data, not on any wall clock. Event time programs must specify how to generate event time watermarks, which are the mechanism that signals progress in event time.

    In a perfect world, event time processing would yield completely consistent and deterministic results, regardless of when events arrive or in what order. However, unless the events are known to arrive in order (by timestamp), event time processing incurs some latency while waiting for out-of-order events. Because it is only possible to wait for a finite period of time, this limits how deterministic event time applications can be.

    Assuming all of the data has arrived, event time operations behave as expected and produce correct and consistent results even when working with out-of-order or late events, or when reprocessing historic data. For example, an hourly event time window contains all records whose event timestamp falls into that hour, regardless of the order in which they arrive or when they are processed. (See the section on late events for more information.)

    Note that when event time programs process live data in real time, they sometimes use processing time operations as well, in order to guarantee that they progress in a timely fashion.

  • Ingestion time

    Ingestion time is the time at which events enter Flink. At the source operator, each record gets the source's current time as a timestamp, and time-based operations (like time windows) refer to that timestamp.

    Ingestion time sits conceptually between event time and processing time. Compared to processing time, it is slightly more expensive but gives more predictable results. Because ingestion time uses stable timestamps (assigned once at the source), different window operations over the records refer to the same timestamp, whereas in processing time each window operator may assign the record to a different window (based on the local system clock and any transport delay).

    Compared to event time, ingestion time programs cannot handle any out-of-order events or late data, but they do not have to specify how to generate watermarks.

    Internally, ingestion time is treated much like event time, but with automatic timestamp assignment and automatic watermark generation.

  • Processing time

    Processing time refers to the system time of the machine that is executing the respective operation.

    When a streaming program runs on processing time, all time-based operations (like time windows) use the system clock of the machines that run the respective operator. An hourly processing time window includes all records that arrived at a specific operator between the times when the system clock indicated the full hour. For example, if an application begins running at 9:15am, the first hourly processing time window will include events processed between 9:15am and 10:00am, the next window will include events processed between 10:00am and 11:00am, and so on.

    Processing time is the simplest notion of time and requires no coordination between streams and machines. It provides the best performance and the lowest latency. However, in distributed and asynchronous environments processing time does not provide determinism, because it is susceptible to the speed at which records arrive in the system (for example from the message queue), to the speed at which the records flow between operators inside the system, and to outages (scheduled or otherwise).
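To make the event time behaviour described above more concrete, here is a minimal plain-Java sketch. It is a toy model, not Flink code: the class `EventTimeWindowSketch` and its methods are hypothetical names introduced only for illustration. It assumes tumbling windows aligned to the epoch, and it assumes a simplified watermark rule (largest timestamp seen so far minus a fixed allowed delay), which the text above does not specify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of event-time tumbling windows; hypothetical names, not Flink's API. */
final class EventTimeWindowSketch {
    static final long HOUR_MS = 60 * 60 * 1000L;

    /** Start of the epoch-aligned tumbling window that contains the timestamp. */
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    /** Groups event timestamps by window start, whatever order they arrive in. */
    static Map<Long, List<Long>> assign(long[] eventTimestampsMs, long sizeMs) {
        Map<Long, List<Long>> windows = new TreeMap<>();
        for (long ts : eventTimestampsMs) {
            windows.computeIfAbsent(windowStart(ts, sizeMs), k -> new ArrayList<>()).add(ts);
        }
        return windows;
    }

    /** Assumed watermark rule: largest timestamp seen so far minus a fixed allowed delay. */
    static long watermark(long maxSeenTimestampMs, long maxOutOfOrdernessMs) {
        return maxSeenTimestampMs - maxOutOfOrdernessMs;
    }

    /** A window [start, start + size) may fire once the watermark reaches its last millisecond. */
    static boolean canFire(long windowStartMs, long sizeMs, long watermarkMs) {
        return watermarkMs >= windowStartMs + sizeMs - 1;
    }
}
```

Calling `assign` with timestamps that arrive out of order still places each record in the window its event timestamp falls into, which is the determinism property of event time. The `canFire` check shows the cost: a window's result is held back until the watermark passes the end of the window, so a larger allowed delay tolerates more disorder but adds latency.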

Setting a Time Characteristic

The first part of a Flink DataStream program usually sets the base time characteristic. That setting defines how data stream sources behave (for example, whether they will assign timestamps), and what notion of time should be used by window operations like KeyedStream.timeWindow(Time.seconds(30)).

The following example shows a Flink program that aggregates events in hourly time windows. The behavior of the windows adapts to the time characteristic.

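import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);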
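
As a rough illustration of what the time characteristic changes, the following plain-Java sketch models which timestamp an hourly window would look at under each notion of time. It is a toy model and not Flink's implementation: `TimeCharacteristicSketch`, `Notion`, `timestampFor`, and `hourlyWindowStart` are hypothetical names, and timestamps are given as milliseconds since midnight purely for readability.

```java
/** Toy model of how the chosen notion of time selects a record's timestamp; not Flink's API. */
final class TimeCharacteristicSketch {
    static final long HOUR_MS = 60 * 60 * 1000L;

    /** Hypothetical stand-in for the three time characteristics. */
    enum Notion { EVENT_TIME, INGESTION_TIME, PROCESSING_TIME }

    /**
     * Picks the timestamp a window operation would use:
     * the embedded event timestamp, the time assigned once at the source,
     * or the clock of the operator at the moment it handles the record.
     */
    static long timestampFor(Notion notion, long eventMs, long ingestMs, long operatorClockMs) {
        switch (notion) {
            case EVENT_TIME:
                return eventMs;
            case INGESTION_TIME:
                return ingestMs;
            default:
                return operatorClockMs;
        }
    }

    /** Start of the hourly window, aligned to the full hour, that contains the timestamp. */
    static long hourlyWindowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, HOUR_MS);
    }
}
```

For a record created at 8:50, ingested at 9:15, and handled by the window operator at 10:02, this model assigns it to the 8:00 window under event time, the 9:00 window under ingestion time, and the 10:00 window under processing time. The same record can therefore land in three different hourly windows depending only on the setting.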