聊聊flink的Evictors

本文主要研究一下flink的Evictors

Evictor

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/evictors/Evictor.java

@PublicEvolving
public interface Evictor<T, W extends Window> extends Serializable {

	void evictBefore(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext evictorContext);

	void evictAfter(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext evictorContext);

	interface EvictorContext {

		long getCurrentProcessingTime();

		MetricGroup getMetricGroup();

		long getCurrentWatermark();
	}
}
  • Evictor接收两个泛型,一个是element的类型,一个是窗口类型;它定义了evictBefore(在windowing function之前)、evictAfter(在windowing function之后)两个方法,它们都有EvictorContext参数;EvictorContext定义了getCurrentProcessingTime、getMetricGroup、getCurrentWatermark方法

CountEvictor

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/evictors/CountEvictor.java

@PublicEvolving
public class CountEvictor<W extends Window> implements Evictor<Object, W> {
	private static final long serialVersionUID = 1L;

	private final long maxCount;
	private final boolean doEvictAfter;

	private CountEvictor(long count, boolean doEvictAfter) {
		this.maxCount = count;
		this.doEvictAfter = doEvictAfter;
	}

	private CountEvictor(long count) {
		this.maxCount = count;
		this.doEvictAfter = false;
	}

	@Override
	public void evictBefore(Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
		if (!doEvictAfter) {
			evict(elements, size, ctx);
		}
	}

	@Override
	public void evictAfter(Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
		if (doEvictAfter) {
			evict(elements, size, ctx);
		}
	}

	private void evict(Iterable<TimestampedValue<Object>> elements, int size, EvictorContext ctx) {
		if (size <= maxCount) {
			return;
		} else {
			int evictedCount = 0;
			for (Iterator<TimestampedValue<Object>> iterator = elements.iterator(); iterator.hasNext();){
				iterator.next();
				evictedCount++;
				if (evictedCount > size - maxCount) {
					break;
				} else {
					iterator.remove();
				}
			}
		}
	}

	public static <W extends Window> CountEvictor<W> of(long maxCount) {
		return new CountEvictor<>(maxCount);
	}

	public static <W extends Window> CountEvictor<W> of(long maxCount, boolean doEvictAfter) {
		return new CountEvictor<>(maxCount, doEvictAfter);
	}
}
  • CountEvictor实现了Evictor接口,其中element类型为Object;它有两个属性,分别是doEvictAfter、maxCount;其中doEvictAfter用于指定是使用evictBefore方法还是evictAfter方法;maxCount为窗口元素个数的阈值,超出则删掉

DeltaEvictor

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/evictors/DeltaEvictor.java

@PublicEvolving
public class DeltaEvictor<T, W extends Window> implements Evictor<T, W> {
	private static final long serialVersionUID = 1L;

	DeltaFunction<T> deltaFunction;
	private double threshold;
	private final boolean doEvictAfter;

	private DeltaEvictor(double threshold, DeltaFunction<T> deltaFunction) {
		this.deltaFunction = deltaFunction;
		this.threshold = threshold;
		this.doEvictAfter = false;
	}

	private DeltaEvictor(double threshold, DeltaFunction<T> deltaFunction, boolean doEvictAfter) {
		this.deltaFunction = deltaFunction;
		this.threshold = threshold;
		this.doEvictAfter = doEvictAfter;
	}

	@Override
	public void evictBefore(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext ctx) {
		if (!doEvictAfter) {
			evict(elements, size, ctx);
		}
	}

	@Override
	public void evictAfter(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext ctx) {
		if (doEvictAfter) {
			evict(elements, size, ctx);
		}
	}

	private void evict(Iterable<TimestampedValue<T>> elements, int size, EvictorContext ctx) {
		TimestampedValue<T> lastElement = Iterables.getLast(elements);
		for (Iterator<TimestampedValue<T>> iterator = elements.iterator(); iterator.hasNext();){
			TimestampedValue<T> element = iterator.next();
			if (deltaFunction.getDelta(element.getValue(), lastElement.getValue()) >= this.threshold) {
				iterator.remove();
			}
		}
	}

	@Override
	public String toString() {
		return "DeltaEvictor(" +  deltaFunction + ", " + threshold + ")";
	}

	public static <T, W extends Window> DeltaEvictor<T, W> of(double threshold, DeltaFunction<T> deltaFunction) {
		return new DeltaEvictor<>(threshold, deltaFunction);
	}

	public static <T, W extends Window> DeltaEvictor<T, W> of(double threshold, DeltaFunction<T> deltaFunction, boolean doEvictAfter) {
		return new DeltaEvictor<>(threshold, deltaFunction, doEvictAfter);
	}
}
  • DeltaEvictor实现了Evictor接口,它有三个属性,分别是doEvictAfter、threshold、deltaFunction;其中doEvictAfter用于指定是使用evictBefore方法还是evictAfter方法;threshold为阈值,如果deltaFunction.getDelta方法(每个element与lastElement计算delta)算出来的值大于等于该值,则需要移除该元素

TimeEvictor

flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/evictors/TimeEvictor.java

@PublicEvolving
public class TimeEvictor<W extends Window> implements Evictor<Object, W> {
	private static final long serialVersionUID = 1L;

	private final long windowSize;
	private final boolean doEvictAfter;

	public TimeEvictor(long windowSize) {
		this.windowSize = windowSize;
		this.doEvictAfter = false;
	}

	public TimeEvictor(long windowSize, boolean doEvictAfter) {
		this.windowSize = windowSize;
		this.doEvictAfter = doEvictAfter;
	}

	@Override
	public void evictBefore(Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
		if (!doEvictAfter) {
			evict(elements, size, ctx);
		}
	}

	@Override
	public void evictAfter(Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
		if (doEvictAfter) {
			evict(elements, size, ctx);
		}
	}

	private void evict(Iterable<TimestampedValue<Object>> elements, int size, EvictorContext ctx) {
		if (!hasTimestamp(elements)) {
			return;
		}

		long currentTime = getMaxTimestamp(elements);
		long evictCutoff = currentTime - windowSize;

		for (Iterator<TimestampedValue<Object>> iterator = elements.iterator(); iterator.hasNext(); ) {
			TimestampedValue<Object> record = iterator.next();
			if (record.getTimestamp() <= evictCutoff) {
				iterator.remove();
			}
		}
	}

	private boolean hasTimestamp(Iterable<TimestampedValue<Object>> elements) {
		Iterator<TimestampedValue<Object>> it = elements.iterator();
		if (it.hasNext()) {
			return it.next().hasTimestamp();
		}
		return false;
	}

	private long getMaxTimestamp(Iterable<TimestampedValue<Object>> elements) {
		long currentTime = Long.MIN_VALUE;
		for (Iterator<TimestampedValue<Object>> iterator = elements.iterator(); iterator.hasNext();){
			TimestampedValue<Object> record = iterator.next();
			currentTime = Math.max(currentTime, record.getTimestamp());
		}
		return currentTime;
	}

	@Override
	public String toString() {
		return "TimeEvictor(" + windowSize + ")";
	}

	@VisibleForTesting
	public long getWindowSize() {
		return windowSize;
	}

	public static <W extends Window> TimeEvictor<W> of(Time windowSize) {
		return new TimeEvictor<>(windowSize.toMilliseconds());
	}

	public static <W extends Window> TimeEvictor<W> of(Time windowSize, boolean doEvictAfter) {
		return new TimeEvictor<>(windowSize.toMilliseconds(), doEvictAfter);
	}
}
  • TimeEvictor实现了Evictor接口,其中element类型为Object;它有两个属性,分别是doEvictAfter、windowSize;其中doEvictAfter用于指定是使用evictBefore方法还是evictAfter方法;windowSize用于指定窗口的时间长度,以窗口元素最大时间戳-windowSize为evictCutoff,所有timestamp小于等于evictCutoff的元素都将会被剔除

小结

  • Evictor接收两个泛型,一个是element的类型,一个是窗口类型;它定义了evictBefore(在windowing function之前)、evictAfter(在windowing function之后)两个方法,它们都有EvictorContext参数;EvictorContext定义了getCurrentProcessingTime、getMetricGroup、getCurrentWatermark方法
  • Evictor有几个内置的实现类,分别是CountEvictor、DeltaEvictor、TimeEvictor;其中CountEvictor是按窗口元素个数来进行剔除,TimeEvictor是按窗口长度来进行剔除,DeltaEvictor则是根据窗口元素与lastElement的delta与指定的threshold对比来进行剔除
  • 如果指定了evictor(evictBefore)则会妨碍任何pre-aggregation操作,因为所有的窗口元素都会在windowing function计算之前先执行evictor操作;另外就是flink不保障窗口元素的顺序,也就是evictor如果有按窗口开头或末尾剔除元素,可能剔除的元素实际上并不是最先或最后达到的

doc

转载于:https://my.oschina.net/go4it/blog/2998188

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值