Implementing Dynamic Time Windows in Flink

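This article shows how to change the size of a time window in Flink dynamically, so the job no longer has to be restarted and redeployed. It uses a custom window assigner (a WindowAssigner subclass). Before records enter the window, each one is populated with the dynamic window time, and the assignWindows method then builds the windows from that value, so the window size adjusts at runtime.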

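The commonly used windows are sliding and tumbling windows, based on either processing time or event time. Their sizes are fixed, however: when a requirement calls for a different window size, the job has to be restarted and redeployed. That is acceptable when changes are rare, but it becomes time-consuming when they are frequent. The rest of this post describes a way to change the window time dynamically.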

The structure of a Flink windowed program is:

Keyed Windows

stream
       .keyBy(...)               <-  keyed versus non-keyed windows
       .window(...)              <-  required: "assigner"
      [.trigger(...)]            <-  optional: "trigger" (else default trigger)
      [.evictor(...)]            <-  optional: "evictor" (else no evictor)
      [.allowedLateness(...)]    <-  optional: "lateness" (else zero)
      [.sideOutputLateData(...)] <-  optional: "output tag" (else no side output for late data)
       .reduce/aggregate/apply()      <-  required: "function"
      [.getSideOutput(...)]      <-  optional: "output tag"

Non-Keyed Windows

stream
       .windowAll(...)           <-  required: "assigner"
      [.trigger(...)]            <-  optional: "trigger" (else default trigger)
      [.evictor(...)]            <-  optional: "evictor" (else no evictor)
      [.allowedLateness(...)]    <-  optional: "lateness" (else zero)
      [.sideOutputLateData(...)] <-  optional: "output tag" (else no side output for late data)
       .reduce/aggregate/apply()      <-  required: "function"
      [.getSideOutput(...)]      <-  optional: "output tag"

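For sliding and tumbling windows, Flink already ships ready-made classes that can be used directly, so the dynamic window is also built on top of these existing classes. Specifically, the change is made in the window assigner.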

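The window assigner defines how elements are assigned to windows. It is specified by passing a WindowAssigner to the window(...) call (for keyed streams) or the windowAll(...) call (for non-keyed streams).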

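A WindowAssigner is responsible for assigning each incoming element to one or more windows. Flink comes with predefined window assigners for the most common use cases: tumbling windows, sliding windows, session windows and global windows. You can also implement a custom window assigner by extending the WindowAssigner class. All built-in window assigners (except global windows) assign elements to windows based on time, which can be either processing time or event time.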

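A time-based window has a start timestamp (inclusive) and an end timestamp (exclusive) that together describe the size of the window. In code, Flink represents time-based windows with TimeWindow, which has methods for querying the start and end timestamps (getStart() and getEnd()) plus an additional method, maxTimestamp(), that returns the largest timestamp allowed in a given window (end - 1).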
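To make the inclusive/exclusive convention concrete, here is a tiny plain-Java model of a time window. It is not Flink's TimeWindow class; SimpleTimeWindow is a hypothetical stand-in invented for illustration that follows the same convention: start inclusive, end exclusive, and maxTimestamp() = end - 1.

```java
// Hypothetical stand-in for Flink's TimeWindow (not part of Flink).
// A window covers the half-open interval [start, end).
final class SimpleTimeWindow {
    private final long start; // inclusive
    private final long end;   // exclusive

    SimpleTimeWindow(long start, long end) {
        if (end <= start) {
            throw new IllegalArgumentException("end must be greater than start");
        }
        this.start = start;
        this.end = end;
    }

    long getStart() { return start; }

    long getEnd() { return end; }

    // Largest timestamp that still belongs to the window.
    long maxTimestamp() { return end - 1; }

    // True if the timestamp falls inside [start, end).
    boolean contains(long timestamp) {
        return timestamp >= start && timestamp < end;
    }

    // Window length in milliseconds.
    long size() { return end - start; }
}
```

For example, a 2-second window [0, 2000) contains the timestamps 0 and 1999 but not 2000, and its maxTimestamp() is 1999.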
 

First, the code of the built-in sliding window assigner, SlidingEventTimeWindows (abridged):

public class SlidingEventTimeWindows extends WindowAssigner<Object, TimeWindow> {
    private static final long serialVersionUID = 1L;
    
    // window size
    private final long size;
    // slide step
    private final long slide;

    private final long offset;

    protected SlidingEventTimeWindows(long size, long slide, long offset) {
        if (Math.abs(offset) >= slide || size <= 0) {
            throw new IllegalArgumentException(
                    "SlidingEventTimeWindows parameters must satisfy "
                            + "abs(offset) < slide and size > 0");
        }

        this.size = size;
        this.slide = slide;
        this.offset = offset;
    }

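    // Windows are built from size and slide, so this is the place where size and slide can be adjusted dynamically.
    // Every call also receives the original element, so the dynamically changing values can be extracted from it.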
    @Override
    public Collection<TimeWindow> assignWindows(
            Object element, long timestamp, WindowAssignerContext context) {
        if (timestamp > Long.MIN_VALUE) {
            List<TimeWindow> windows = new ArrayList<>((int) (size / slide));
            long lastStart = TimeWindow.getWindowStartWithOffset(timestamp, offset, slide);
            for (long start = lastStart; start > timestamp - size; start -= slide) {
                windows.add(new TimeWindow(start, start + size));
            }
            return windows;
        } else {
            throw new RuntimeException(
                    "Record has Long.MIN_VALUE timestamp (= no timestamp marker). "
                            + "Is the time characteristic set to 'ProcessingTime', or did you forget to call "
                            + "'DataStream.assignTimestampsAndWatermarks(...)'?");
        }
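    }

    // getDefaultTrigger(), getWindowSerializer(), isEventTime(), toString() and the of(...) factories are omitted here
}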
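The arithmetic in assignWindows does not depend on Flink, so it can be replayed in plain Java. The sketch below is a re-implementation for illustration only (SlidingAssignment, windowStart and assign are made-up names); windowStart is written to match, as far as I know, what TimeWindow.getWindowStartWithOffset computes in recent Flink versions, including the correction for negative remainders.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java replay of the sliding-window assignment logic (hypothetical helper, not a Flink class).
final class SlidingAssignment {

    // Start of the latest window of length windowSize (shifted by offset) that contains timestamp.
    static long windowStart(long timestamp, long offset, long windowSize) {
        long remainder = (timestamp - offset) % windowSize;
        // Java's % keeps the sign of the dividend, so negative remainders need correcting.
        return remainder < 0 ? timestamp - (remainder + windowSize) : timestamp - remainder;
    }

    // Returns every window, as a {start, end} pair, that a sliding assigner puts timestamp into.
    static List<long[]> assign(long timestamp, long size, long slide, long offset) {
        List<long[]> windows = new ArrayList<>((int) (size / slide));
        long lastStart = windowStart(timestamp, offset, slide);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }
}
```

With size 2000 ms, slide 1000 ms and offset 0, an element at timestamp 10500 lands in [10000, 12000) and [9000, 11000): each element belongs to size / slide = 2 windows.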

After the modification, the assigner becomes DynSlidingEventTimeWindows:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.WindowAssigner;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class DynSlidingEventTimeWindows extends WindowAssigner<Object, TimeWindow> {
    private static final long serialVersionUID = 1L;

    // default window size and slide, used when an element carries no override
    private final long size;

    private final long slide;

    private final long offset;

    // reads the window size from the original element
    private final TimeAdjustExtractor sizeTimeAdjustExtractor;
    // reads the window slide from the original element
    private final TimeAdjustExtractor slideTimeAdjustExtractor;

    protected DynSlidingEventTimeWindows(long size, long slide, long offset) {
        // no per-element override: both extractors return 0, so the defaults are always used
        this(size, slide, offset, elem -> 0, elem -> 0);
    }

    protected DynSlidingEventTimeWindows(long size, long slide, long offset,
                                         TimeAdjustExtractor sizeTimeAdjustExtractor,
                                         TimeAdjustExtractor slideTimeAdjustExtractor) {
        if (Math.abs(offset) >= slide || size <= 0) {
            throw new IllegalArgumentException(
                    "DynSlidingEventTimeWindows parameters must satisfy "
                            + "abs(offset) < slide and size > 0");
        }

        this.size = size;
        this.slide = slide;
        this.offset = offset;
        this.sizeTimeAdjustExtractor = sizeTimeAdjustExtractor;
        this.slideTimeAdjustExtractor = slideTimeAdjustExtractor;
    }

    // On every assignment, read the window size and slide from the element. A positive value replaces
    // the default, which is what makes the window adjustable; 0 keeps the default. Negative values are
    // also treated as "no override", so a misconfigured record cannot produce a negative size or slide.
    @Override
    public Collection<TimeWindow> assignWindows(
            Object element, long timestamp, WindowAssignerContext context) {
        if (timestamp > Long.MIN_VALUE) {
            long extractedSize = sizeTimeAdjustExtractor.extract(element);
            long extractedSlide = slideTimeAdjustExtractor.extract(element);
            long realSize = extractedSize > 0 ? extractedSize : size;
            long realSlide = extractedSlide > 0 ? extractedSlide : slide;
            List<TimeWindow> windows = new ArrayList<>((int) (realSize / realSlide));
            long lastStart = TimeWindow.getWindowStartWithOffset(timestamp, offset, realSlide);
            for (long start = lastStart; start > timestamp - realSize; start -= realSlide) {
                windows.add(new TimeWindow(start, start + realSize));
            }
            return windows;
        } else {
            throw new RuntimeException(
                    "Record has Long.MIN_VALUE timestamp (= no timestamp marker). "
                            + "Is the time characteristic set to 'ProcessingTime', or did you forget to call "
                            + "'DataStream.assignTimestampsAndWatermarks(...)'?");
        }
    }

    public long getSize() {
        return size;
    }

    public long getSlide() {
        return slide;
    }

    @Override
    public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return EventTimeTrigger.create();
    }

    @Override
    public String toString() {
        return "DynSlidingEventTimeWindows(" + size + ", " + slide + ")";
    }

    /**
     * Creates a new {@code DynSlidingEventTimeWindows} {@link WindowAssigner} that assigns elements to
     * sliding time windows based on the element timestamp.
     *
     * @param size The size of the generated windows.
     * @param slide The slide interval of the generated windows.
     * @return The time policy.
     */
    public static DynSlidingEventTimeWindows of(Time size, Time slide) {
        return new DynSlidingEventTimeWindows(size.toMilliseconds(), slide.toMilliseconds(), 0);
    }

    /**
     * Creates a new {@code DynSlidingEventTimeWindows} {@link WindowAssigner} that assigns elements to
     * time windows based on the element timestamp and offset.
     *
     * <p>For example, if you want to window a stream by hour, but have each window begin at the 15th
     * minute of the hour, you can use {@code of(Time.hours(1), Time.hours(1), Time.minutes(15))}; you
     * will then get time windows that start at 0:15:00, 1:15:00, 2:15:00, etc.
     *
     * <p>Likewise, if you live somewhere that does not use UTC±00:00, such as China (UTC+08:00), and
     * you want one-day windows that begin at 00:00:00 local time, you may use
     * {@code of(Time.days(1), Time.days(1), Time.hours(-8))}. The offset is {@code Time.hours(-8)}
     * because midnight in UTC+08:00 comes 8 hours earlier than midnight in UTC.
     *
     * @param size The size of the generated windows.
     * @param slide The slide interval of the generated windows.
     * @param offset The offset which window start would be shifted by.
     * @return The time policy.
     */
    public static DynSlidingEventTimeWindows of(Time size, Time slide, Time offset) {
        return new DynSlidingEventTimeWindows(
                size.toMilliseconds(), slide.toMilliseconds(), offset.toMilliseconds());
    }

    /**
     * Creates a {@code DynSlidingEventTimeWindows} whose size and slide can be overridden per element.
     *
     * @param size The default size of the generated windows.
     * @param slide The default slide interval of the generated windows.
     * @param offset The offset which window start would be shifted by.
     * @param sizeTimeAdjustExtractor Reads the window size (ms) from an element; 0 keeps {@code size}.
     * @param slideTimeAdjustExtractor Reads the slide (ms) from an element; 0 keeps {@code slide}.
     * @return The time policy.
     */
    public static DynSlidingEventTimeWindows of(Time size, Time slide, Time offset,
                                                TimeAdjustExtractor sizeTimeAdjustExtractor,
                                                TimeAdjustExtractor slideTimeAdjustExtractor) {
        return new DynSlidingEventTimeWindows(
                size.toMilliseconds(), slide.toMilliseconds(), offset.toMilliseconds(),
                sizeTimeAdjustExtractor, slideTimeAdjustExtractor);
    }


    @Override
    public TypeSerializer<TimeWindow> getWindowSerializer(ExecutionConfig executionConfig) {
        return new TimeWindow.Serializer();
    }

    @Override
    public boolean isEventTime() {
        return true;
    }
}
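The post never shows TimeAdjustExtractor. Judging only from how it is used (the lambda `(elem) -> 0` and anonymous classes overriding `long extract(Object element)`), a minimal definition might look like the one below. Extending java.io.Serializable is my assumption, but it matters: WindowAssigner is Serializable and Flink serializes the assigner when it ships the job, so the extractors stored in its fields must be serializable too, and a lambda is only serializable if its target interface is.

```java
import java.io.Serializable;

// Hypothetical definition: the original post does not show this interface.
// Returns the window size or slide (in milliseconds) carried by an element,
// or 0 to keep the assigner's configured default.
@FunctionalInterface
interface TimeAdjustExtractor extends Serializable {
    long extract(Object element);
}
```

Because the interface is Serializable, both the lambdas in the constructor and the anonymous classes in the job below can be written out by Java serialization.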

Usage:

    public static void main(String[] args) throws Exception {
        // FlinkEnvironment, JobConfig, FakeRecordSource, watermarkStrategy and AddAdjustTimeFunction
        // are the author's own helpers; they are not shown in the post
        StreamExecutionEnvironment env = FlinkEnvironment.getEnv(true, 1);
        JobConfig config = new JobConfig();
        env.getConfig().setGlobalJobParameters(config.getParameterTool());

        SingleOutputStreamOperator<String> source = env.addSource(new FakeRecordSource(100))
                .assignTimestampsAndWatermarks(watermarkStrategy).setParallelism(2);
        // Read the configuration and write the window time to adjust into every record
        SingleOutputStreamOperator<FakeRecordSource.TrafficRecord> resultWithAdjustMap =
                source.map(new AddAdjustTimeFunction());

        SingleOutputStreamOperator<FakeRecordSource.TrafficRecord> result = resultWithAdjustMap.keyBy(new KeySelector<FakeRecordSource.TrafficRecord, Integer>() {
            @Override
            public Integer getKey(FakeRecordSource.TrafficRecord s) throws Exception {
                return s.getCityId();
            }
        }).window(DynSlidingEventTimeWindows.of(Time.seconds(2), Time.seconds(1), Time.seconds(0), new TimeAdjustExtractor() {
            // window size override carried by the record (0 = keep the 2 s default)
            @Override
            public long extract(Object element) {
                return ((FakeRecordSource.TrafficRecord) element).getAdjustSize();
            }
        }, new TimeAdjustExtractor() {
            // slide override carried by the record (0 = keep the 1 s default)
            @Override
            public long extract(Object element) {
                return ((FakeRecordSource.TrafficRecord) element).getAdjustSlide();
            }
        }))
                .process(new ProcessWindowFunction<FakeRecordSource.TrafficRecord, FakeRecordSource.TrafficRecord, Integer, TimeWindow>() {
                    @Override
                    public void process(Integer cityId, Context context, Iterable<FakeRecordSource.TrafficRecord> records, Collector<FakeRecordSource.TrafficRecord> out) throws Exception {
                        // window computation goes here; context.window() returns the (possibly adjusted) TimeWindow
                    }
                }).setParallelism(2);

        result.addSink(new SinkFunction<FakeRecordSource.TrafficRecord>() {
            @Override
            public void invoke(FakeRecordSource.TrafficRecord value) throws Exception {
                // write the result to an external system
            }
        }).setParallelism(2);

        env.execute("flink-dynamic");
    }
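AddAdjustTimeFunction, FakeRecordSource.TrafficRecord and the configuration they read are not shown in the post. The following is a hypothetical plain-Java version of the idea: look up the currently configured size and slide for a record's key and write them onto the record, leaving 0 when nothing is configured so that the assigner keeps its defaults. In a real job this logic would live in a Flink MapFunction; the field names adjustSize and adjustSlide only mirror the getters used above, and how the configuration is refreshed while the job runs (for example by polling an external store, or with Flink's broadcast state) is outside this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical record type; only cityId, adjustSize and adjustSlide mirror the job above.
final class TrafficRecord {
    private final int cityId;
    private long adjustSize;  // window size override in ms; 0 means "use the assigner's default"
    private long adjustSlide; // slide override in ms; 0 means "use the assigner's default"

    TrafficRecord(int cityId) {
        this.cityId = cityId;
    }

    int getCityId() { return cityId; }

    long getAdjustSize() { return adjustSize; }

    long getAdjustSlide() { return adjustSlide; }

    void setAdjust(long sizeMs, long slideMs) {
        this.adjustSize = sizeMs;
        this.adjustSlide = slideMs;
    }
}

// Hypothetical stand-in for AddAdjustTimeFunction. In a Flink job this would be a MapFunction;
// here it is a plain UnaryOperator so the logic runs without Flink.
final class AddAdjustTime implements UnaryOperator<TrafficRecord> {
    // cityId -> {sizeMs, slideMs}: the "configuration" the post reads
    private final Map<Integer, long[]> config = new HashMap<>();

    void configure(int cityId, long sizeMs, long slideMs) {
        config.put(cityId, new long[] {sizeMs, slideMs});
    }

    @Override
    public TrafficRecord apply(TrafficRecord record) {
        long[] sizeAndSlide = config.get(record.getCityId());
        if (sizeAndSlide != null) {
            record.setAdjust(sizeAndSlide[0], sizeAndSlide[1]);
        }
        // Cities without an entry keep 0/0, so the assigner falls back to its defaults.
        return record;
    }
}
```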

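How it works:

1. Before records enter the window, each record is populated with the window time that should be applied.
2. Once a record reaches the window operator, assignWindows is called for it. Inside this method the window-related values can be read from the record itself, and they are used to set the size and slide of the windows created for that record.

This is how the window size is changed dynamically.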
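The two steps can be replayed without a cluster. The plain-Java sketch below (DynamicAssignment and assign are made-up names; this is not the post's class) takes the size and slide read from an element, falls back to the defaults when the element carries 0 (any non-positive value is treated the same way here, a choice of this sketch), and then runs the same sliding-window arithmetic as assignWindows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of DynSlidingEventTimeWindows.assignWindows (no Flink dependency).
final class DynamicAssignment {

    // Latest window start (shifted by offset) that is <= timestamp.
    static long windowStart(long timestamp, long offset, long windowSize) {
        long remainder = (timestamp - offset) % windowSize;
        return remainder < 0 ? timestamp - (remainder + windowSize) : timestamp - remainder;
    }

    // extractedSize / extractedSlide are the values read from the element (0 = no override).
    static List<long[]> assign(long timestamp, long extractedSize, long extractedSlide,
                               long defaultSize, long defaultSlide, long offset) {
        long size = extractedSize > 0 ? extractedSize : defaultSize;
        long slide = extractedSlide > 0 ? extractedSlide : defaultSlide;
        List<long[]> windows = new ArrayList<>();
        for (long start = windowStart(timestamp, offset, slide); start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size}); // half-open [start, start + size)
        }
        return windows;
    }
}
```

With the job's defaults (size 2 s, slide 1 s), a record at timestamp 10500 that carries no override goes to [10000, 12000) and [9000, 11000), while a record carrying size 5000 and slide 5000 goes only to [10000, 15000). As far as I understand Flink's window operator, window state is kept per key and per TimeWindow (identified by its start and end), so around a configuration change windows of the old and new sizes can be open at the same time, each firing when the watermark passes its own end.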
