A Look at Flink's BoundedOutOfOrdernessTimestampExtractor

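This article examines how BoundedOutOfOrdernessTimestampExtractor in Apache Flink works and what it is for. The class handles out-of-order events. Its maxOutOfOrderness parameter sets the maximum delay that is tolerated, and the watermark is generated so that it trails the largest timestamp seen so far by that amount. The extractTimestamp method extracts a timestamp from each element, while getCurrentWatermark computes and returns the current watermark and guarantees that it never decreases (it is monotonically non-decreasing). This mechanism is central to event-time window computation in real-time stream processing.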
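The watermark rule in the summary can be written as two update steps. The symbols are introduced here for illustration only: d stands for maxOutOfOrderness, M for the largest timestamp seen so far (currentMaxTimestamp), and W for the last emitted watermark (lastEmittedWatermark).

```latex
\begin{aligned}
M &\leftarrow \max(M,\ t) && \text{for each element with event time } t, \quad M_0 = \texttt{Long.MIN\_VALUE} + d \\
W &\leftarrow \max(W,\ M - d) && \text{on each call to } \texttt{getCurrentWatermark}, \quad W_0 = \texttt{Long.MIN\_VALUE} \\
M - d \ge W &\iff M - W \ge d && \text{(the condition under which } W \text{ is moved up)}
\end{aligned}
```

Because W only ever takes the larger of its old value and M - d, it cannot decrease. For example, with d = 3 s, M = 15 s and a previous W of 9 s, the new watermark is max(9 s, 12 s) = 12 s.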


This article takes a look at BoundedOutOfOrdernessTimestampExtractor in Flink.

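  • The abstract class BoundedOutOfOrdernessTimestampExtractor implements the extractTimestamp and getCurrentWatermark methods of the AssignerWithPeriodicWatermarks interface (both are declared final), and declares the abstract method extractTimestamp(T element) for subclasses to implement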
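  • The constructor takes a maxOutOfOrderness argument (a Time, stored internally in milliseconds and rejected if negative). It is the maximum amount by which an element's event time may lag behind the largest timestamp seen so far, and the watermark is held exactly maxOutOfOrderness behind that maximum. An element whose event time t is below the current watermark t_w (lateness t_w - t > 0) is late. With the default allowedLateness of 0, a window ignores such an element if the watermark has already passed the end of the window it belongs to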
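  • Its extractTimestamp(element, previousElementTimestamp) method calls the subclass's extractTimestamp(element) to obtain the timestamp and, if that timestamp is greater than currentMaxTimestamp, updates currentMaxTimestamp. getCurrentWatermark first computes potentialWM = currentMaxTimestamp - maxOutOfOrderness; if potentialWM >= lastEmittedWatermark, it sets lastEmittedWatermark to potentialWM. The condition is equivalent to currentMaxTimestamp - lastEmittedWatermark >= maxOutOfOrderness: the last watermark has fallen at least maxOutOfOrderness behind the maximum timestamp, so it is moved up. Finally it returns new Watermark(lastEmittedWatermark)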
  • Source code of the class:

    package org.apache.flink.streaming.api.functions.timestamps;

    import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
    import org.apache.flink.streaming.api.watermark.Watermark;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public abstract class BoundedOutOfOrdernessTimestampExtractor<T> implements AssignerWithPeriodicWatermarks<T> {

        private static final long serialVersionUID = 1L;

        /** The current maximum timestamp seen so far. */
        private long currentMaxTimestamp;

        /** The timestamp of the last emitted watermark. */
        private long lastEmittedWatermark = Long.MIN_VALUE;

        /**
         * The (fixed) interval between the maximum seen timestamp seen in the records
         * and that of the watermark to be emitted.
         */
        private final long maxOutOfOrderness;

        public BoundedOutOfOrdernessTimestampExtractor(Time maxOutOfOrderness) {
            if (maxOutOfOrderness.toMilliseconds() < 0) {
                throw new RuntimeException("Tried to set the maximum allowed " +
                    "lateness to " + maxOutOfOrderness + ". This parameter cannot be negative.");
            }
            this.maxOutOfOrderness = maxOutOfOrderness.toMilliseconds();
            // Starting at Long.MIN_VALUE + maxOutOfOrderness keeps
            // currentMaxTimestamp - maxOutOfOrderness from underflowing before the first element.
            this.currentMaxTimestamp = Long.MIN_VALUE + this.maxOutOfOrderness;
        }

        public long getMaxOutOfOrdernessInMillis() {
            return maxOutOfOrderness;
        }

        /**
         * Extracts the timestamp from the given element.
         *
         * @param element The element that the timestamp is extracted from.
         * @return The new timestamp.
         */
        public abstract long extractTimestamp(T element);

        @Override
        public final Watermark getCurrentWatermark() {
            // this guarantees that the watermark never goes backwards.
            // Compare (current max timestamp - maxOutOfOrderness) with the last emitted
            // watermark and keep the larger value, so the watermark is non-decreasing.
            long potentialWM = currentMaxTimestamp - maxOutOfOrderness;
            if (potentialWM >= lastEmittedWatermark) {
                lastEmittedWatermark = potentialWM;
            }
            return new Watermark(lastEmittedWatermark);
        }

        // Extract the element's event time via the subclass; if it is larger than
        // currentMaxTimestamp, record it as the new maximum; return the extracted timestamp.
        @Override
        public final long extractTimestamp(T element, long previousElementTimestamp) {
            long timestamp = extractTimestamp(element);
            if (timestamp > currentMaxTimestamp) {
                currentMaxTimestamp = timestamp;
            }
            return timestamp;
        }
    }
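To see the bookkeeping in action without a Flink cluster, here is a minimal plain-Java model of the two methods above. The class BoundedOutOfOrdernessModel and its methods onElement, currentWatermark and run are hypothetical names. The sketch also polls the watermark after every element, whereas Flink calls getCurrentWatermark periodically, at the interval configured with ExecutionConfig#setAutoWatermarkInterval.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Flink-free model of BoundedOutOfOrdernessTimestampExtractor's state.
// Timestamps and watermarks are plain longs in milliseconds.
class BoundedOutOfOrdernessModel {
    private final long maxOutOfOrderness;
    private long currentMaxTimestamp;
    private long lastEmittedWatermark = Long.MIN_VALUE;

    BoundedOutOfOrdernessModel(long maxOutOfOrdernessMillis) {
        if (maxOutOfOrdernessMillis < 0) {
            // Flink throws a plain RuntimeException here; IllegalArgumentException is a subclass.
            throw new IllegalArgumentException("maxOutOfOrderness cannot be negative");
        }
        this.maxOutOfOrderness = maxOutOfOrdernessMillis;
        // Same initialization as the Flink source: avoids underflow in currentWatermark().
        this.currentMaxTimestamp = Long.MIN_VALUE + maxOutOfOrdernessMillis;
    }

    // Mirrors extractTimestamp(element, previousElementTimestamp):
    // remember the largest timestamp, return the element's own timestamp unchanged.
    long onElement(long timestamp) {
        if (timestamp > currentMaxTimestamp) {
            currentMaxTimestamp = timestamp;
        }
        return timestamp;
    }

    // Mirrors getCurrentWatermark(): trail the maximum by maxOutOfOrderness, never go back.
    long currentWatermark() {
        long potentialWM = currentMaxTimestamp - maxOutOfOrderness;
        if (potentialWM >= lastEmittedWatermark) {
            lastEmittedWatermark = potentialWM;
        }
        return lastEmittedWatermark;
    }

    // Feeds timestamps in arrival order and records the watermark after each element.
    static List<Long> run(long maxOutOfOrdernessMillis, long... timestamps) {
        BoundedOutOfOrdernessModel model = new BoundedOutOfOrdernessModel(maxOutOfOrdernessMillis);
        List<Long> watermarks = new ArrayList<>();
        for (long ts : timestamps) {
            model.onElement(ts);
            watermarks.add(model.currentWatermark());
        }
        return watermarks;
    }

    public static void main(String[] args) {
        // maxOutOfOrderness = 3000 ms; 11000 and 9000 arrive out of order.
        System.out.println(run(3000, 10000, 12000, 11000, 9000, 15000, 14000));
    }
}
```

With maxOutOfOrderness = 3000 ms, main prints [7000, 9000, 9000, 9000, 12000, 12000]. The out-of-order elements 11000 and 9000 leave the watermark where it is, and it only advances when a new maximum (15000) arrives.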
    
    
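The constructor's initial value Long.MIN_VALUE + maxOutOfOrderness is easy to overlook. Java long arithmetic wraps silently, so starting currentMaxTimestamp at Long.MIN_VALUE would make the very first watermark wrap around to a value near Long.MAX_VALUE, and because the watermark never moves backwards it would stay there. The hypothetical UnderflowDemo class below simply evaluates the watermark formula for both starting values.

```java
// Hypothetical demo: evaluates currentMaxTimestamp - maxOutOfOrderness before any element arrives.
class UnderflowDemo {
    // Java long subtraction wraps around on overflow instead of throwing.
    static long initialWatermark(long initialMax, long maxOutOfOrderness) {
        return initialMax - maxOutOfOrderness;
    }

    public static void main(String[] args) {
        long d = 5000L;
        long naive = initialWatermark(Long.MIN_VALUE, d);       // wraps to a huge positive value
        long guarded = initialWatermark(Long.MIN_VALUE + d, d); // exactly Long.MIN_VALUE
        System.out.println(naive);
        System.out.println(guarded);
    }
}
```

main prints 9223372036854770808 (the wrapped value, Long.MAX_VALUE - 4999) and then -9223372036854775808 (Long.MIN_VALUE).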

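Whether a late element is actually discarded is decided by the window, not by the extractor. The sketch below is a simplified model built on several assumptions: tumbling event-time windows with offset 0, non-negative timestamps, and the rule that an element is dropped once the watermark has reached its window's last timestamp (end - 1) plus allowedLateness. LatenessCheck, windowStart and isDropped are hypothetical names, not Flink API.

```java
// Hypothetical model of the late-element decision for tumbling event-time windows.
class LatenessCheck {
    // Start of the tumbling window [start, start + size) containing ts (offset 0, ts >= 0 assumed).
    static long windowStart(long ts, long size) {
        return ts - (ts % size);
    }

    // Dropped once the watermark has reached the window's last timestamp plus allowedLateness.
    static boolean isDropped(long ts, long size, long allowedLateness, long watermark) {
        long windowMaxTimestamp = windowStart(ts, size) + size - 1;
        return windowMaxTimestamp + allowedLateness <= watermark;
    }

    public static void main(String[] args) {
        long size = 10_000L;      // 10 s tumbling windows
        long watermark = 12_000L; // e.g. max timestamp 15000 ms minus maxOutOfOrderness 3000 ms
        System.out.println(isDropped(9_000L, size, 0L, watermark));     // window [0, 10000)
        System.out.println(isDropped(11_000L, size, 0L, watermark));    // window [10000, 20000)
        System.out.println(isDropped(9_000L, size, 5_000L, watermark)); // allowedLateness = 5000 ms
    }
}
```

main prints true, false, false. The element at 9000 falls into the window [0, 10000), which the watermark has already passed, so it is dropped. The element at 11000 is below the watermark but its window [10000, 20000) is still open. An allowedLateness of 5000 ms keeps the 9000 element as well.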