Flink's three time characteristics

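The relevant API class is:
org.apache.flink.streaming.api.TimeCharacteristic
The time characteristic defines how the system determines time for time-dependent ordering and time-dependent operations, such as time windows.
Flink has three time characteristics: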

  1. Event time (the time at which the event was generated)
    Event time means that the time of each individual element in the stream (also called an event) is determined by the event's own custom timestamp. These timestamps either exist in the elements before they enter the Flink streaming dataflow, or are assigned by the user at the sources. The major implication is that elements may arrive at the sources and at all operators out of order: elements with earlier timestamps may arrive after elements with later timestamps. Operators that window or order data by event time must therefore buffer data until they can be sure that all timestamps for a given time interval have been received. This is handled by so-called "time watermarks".
    Operations based on event time are very predictable: the result of a windowing operation is typically identical no matter when the window is executed and how fast the streams operate. At the same time, buffering and tracking event time is costlier than working with processing time, and it typically introduces more latency. The extra cost depends mostly on how far out of order the elements arrive, i.e., how long the time span is between the arrival of early and late elements. In terms of watermarks, this means the cost typically depends on how early or late the watermarks for their timestamps can be generated.
    Compared with ingestion time, event time is similar but refers to the event's original time rather than the time assigned at the data source. In practice, this means event time generally carries more meaning, but it also takes longer to determine that all elements for a certain time have arrived.

  2. Ingestion time (the time at which the event enters Flink)
    Ingestion time means that the time of each individual element in the stream is determined when the element enters the Flink streaming dataflow. Operations like windows group the elements based on that time, so the processing speed within the streaming dataflow does not affect windowing; only the speed at which the sources receive elements does.
    Ingestion time is often a good compromise between processing time and event time. It does not need any special manual form of watermark generation, and events are typically not far out of order when they arrive at operators; in fact, out-of-orderness can only be introduced by streaming shuffles or split/join/union operations. Because elements are not far out of order, the latency increase is moderate compared to event time.

  3. Processing time (the time at which the event is processed)
    Processing time for operators means that the operator uses the system clock of the machine to determine the current time of the data stream. Processing-time windows trigger based on wall-clock time and include whatever elements happen to have arrived at the operator at that point in time.
    Using processing time for window operations generally produces quite non-deterministic results, because the contents of the windows depend on the speed at which elements arrive.
    It is, however, the cheapest method of forming windows and the method that introduces the least latency.
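The event-time behaviour described above, where windows are buffered until a watermark shows they are complete, can be illustrated with a small self-contained Python simulation. This is a sketch under simplifying assumptions, not Flink's API:
- The function name `event_time_windows` and its parameters are hypothetical.
- The watermark is simply the largest timestamp seen so far minus a fixed out-of-orderness bound.
- A tumbling window [start, start + window_size) is emitted once the watermark reaches its end.
- Elements that arrive after their window was emitted are only reported as late.

Real Flink watermark generators and window triggers differ in details.

```python
from collections import defaultdict


def event_time_windows(events, window_size, max_out_of_orderness):
    """Group (timestamp, value) events into tumbling event-time windows.

    Hypothetical simulation, not Flink code. The watermark is the largest
    timestamp seen so far minus max_out_of_orderness. A window
    [start, start + window_size) stays buffered until the watermark reaches
    its end; an event whose window end is already at or below the watermark
    is treated as late.
    """
    buffered = defaultdict(list)  # window start -> values still waiting
    emitted = []                  # (window start, sorted values), in firing order
    late = []                     # events that arrived after their window fired
    watermark = float("-inf")
    for ts, value in events:
        start = ts - ts % window_size
        if start + window_size <= watermark:
            late.append((ts, value))
            continue
        buffered[start].append(value)
        # The watermark only ever moves forward.
        watermark = max(watermark, ts - max_out_of_orderness)
        # Emit every buffered window whose end the watermark has reached.
        for s in sorted(buffered):
            if s + window_size <= watermark:
                emitted.append((s, sorted(buffered.pop(s))))
    # End of input acts like a final watermark at +infinity: flush everything.
    for s in sorted(buffered):
        emitted.append((s, sorted(buffered.pop(s))))
    return emitted, late


if __name__ == "__main__":
    # Two arrival orders of the same events; the first also contains a straggler (9, "g").
    arrival_a = [(1, "a"), (4, "b"), (12, "c"), (7, "d"), (15, "e"), (23, "f"), (9, "g"), (31, "h")]
    arrival_b = [(4, "b"), (1, "a"), (7, "d"), (15, "e"), (12, "c"), (23, "f"), (31, "h")]
    print(event_time_windows(arrival_a, 10, 5))
    print(event_time_windows(arrival_b, 10, 5))
```

With a window size of 10 and a bound of 5, both arrival orders produce the same windows. This is the predictability described above. In the first order, the element with timestamp 9 arrives after the watermark has already passed its window, so it is reported as late. A large enough bound (for example 15) would keep it. The cost is that every window is held open longer, which is the latency trade-off the text describes.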
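For ingestion time, here is a rough sketch of why no manual watermark generation is needed. Each source stamps records with its own clock when they enter the dataflow. Timestamps within one source therefore never decrease, and a watermark can simply follow the latest timestamp. Disorder appears only when streams are combined, modelled here by a union whose interleaving reflects delivery timing.

The following names are hypothetical and none of them are Flink APIs:
- `make_clock` is a deterministic stand-in for a machine's wall clock.
- `pick_order` models the order in which the union receives elements.

```python
import itertools


def make_clock(start, step):
    """Hypothetical deterministic stand-in for a source machine's wall clock."""
    counter = itertools.count(start, step)
    return lambda: next(counter)


def ingest(records, clock):
    """Stamp each record with the time it enters the dataflow (ingestion time)."""
    return [(clock(), record) for record in records]


def is_ordered(stream):
    """True if timestamps never decrease, so 'watermark = last timestamp' is safe."""
    return all(a[0] <= b[0] for a, b in zip(stream, stream[1:]))


def union(left, right, pick_order):
    """Interleave two streams; pick_order is a sequence of "L"/"R" modelling delivery timing."""
    lefts, rights = iter(left), iter(right)
    return [next(lefts) if side == "L" else next(rights) for side in pick_order]


def out_of_order_count(stream):
    """Count elements whose timestamp is below the maximum timestamp seen before them."""
    count, max_ts = 0, float("-inf")
    for ts, _ in stream:
        if ts < max_ts:
            count += 1
        max_ts = max(max_ts, ts)
    return count
```

Stamping `["a", "b", "c"]` with `make_clock(100, 10)` and `["x", "y"]` with `make_clock(105, 10)` yields two ordered streams. Merging them with `union(..., "LLLRR")` delivers two elements behind the maximum timestamp already seen, while `"LRLRL"` delivers none. In both cases the disorder comes only from the merge, not from the sources themselves.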
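The non-determinism of processing time can be shown by assigning the same three events to tumbling windows twice:
- once by a hypothetical `arrival_time` field, standing in for the operator machine's system clock when each element arrives;
- once by `event_time`.

`RUN_FAST` and `RUN_SLOW` are made-up runs in which element "b" is delayed in the second one. This is a plain-Python sketch of window assignment only; it does not model Flink's wall-clock timers or triggers.

```python
from collections import defaultdict
from operator import itemgetter


def tumbling_windows(events, window_size, time_of):
    """Group event ids into tumbling windows keyed by time_of(event)."""
    windows = defaultdict(list)
    for event in events:
        t = time_of(event)
        windows[t - t % window_size].append(event["id"])
    return {start: sorted(ids) for start, ids in sorted(windows.items())}


# Hypothetical runs: identical event times, different arrival times at the operator.
RUN_FAST = [
    {"id": "a", "event_time": 1, "arrival_time": 2},
    {"id": "b", "event_time": 8, "arrival_time": 9},
    {"id": "c", "event_time": 12, "arrival_time": 13},
]
RUN_SLOW = [
    {"id": "a", "event_time": 1, "arrival_time": 2},
    {"id": "b", "event_time": 8, "arrival_time": 14},  # "b" is delayed in this run
    {"id": "c", "event_time": 12, "arrival_time": 15},
]

if __name__ == "__main__":
    for name, run in (("fast", RUN_FAST), ("slow", RUN_SLOW)):
        print(name, "processing time:", tumbling_windows(run, 10, itemgetter("arrival_time")))
        print(name, "event time:     ", tumbling_windows(run, 10, itemgetter("event_time")))
```

Grouping by arrival time puts "b" in window 0 in the fast run but in window 10 in the slow run. Grouping by event time gives {0: ['a', 'b'], 10: ['c']} for both runs. The processing-time path needs no buffering or watermarks, which is why it is the cheapest option with the least latency.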
