Flink window Trigger

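Question

When no more data arrives, does Flink's default window Trigger close the window automatically?

The short answer: no. When no data arrives, the window is not closed automatically; it keeps waiting for data.

If you are interested, the analysis follows.

Problem analysis

This question came up while using Flink 1.16.1 and is recorded here.

To get started, this is how a window is used: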

stream
       .keyBy(...)               <-  keyed versus non-keyed windows
       .window(...)              <-  required: "assigner"
      [.trigger(...)]            <-  optional: "trigger" (else default trigger)
      [.evictor(...)]            <-  optional: "evictor" (else no evictor)
      [.allowedLateness(...)]    <-  optional: "lateness" (else zero)
      [.sideOutputLateData(...)] <-  optional: "output tag" (else no side output for late data)
       .reduce/aggregate/apply()      <-  required: "function"
      [.getSideOutput(...)]      <-  optional: "output tag"

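As shown, the Trigger is optional; when none is specified, the assigner's default trigger is used. Taking the sliding window as an example, the source code shows that the default is EventTimeTrigger.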


So we only need to look at the logic of EventTimeTrigger:

@PublicEvolving
public class EventTimeTrigger extends Trigger<Object, TimeWindow> {
    private static final long serialVersionUID = 1L;

    private EventTimeTrigger() {}

    @Override
    public TriggerResult onElement(
            Object element, long timestamp, TimeWindow window, TriggerContext ctx)
            throws Exception {
        if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
            // if the watermark is already past the window fire immediately
            return TriggerResult.FIRE;
        } else {
            ctx.registerEventTimeTimer(window.maxTimestamp());
            return TriggerResult.CONTINUE;
        }
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return time == window.maxTimestamp() ? TriggerResult.FIRE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx)
            throws Exception {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
        ctx.deleteEventTimeTimer(window.maxTimestamp());
    }

    @Override
    public boolean canMerge() {
        return true;
    }

    @Override
    public void onMerge(TimeWindow window, OnMergeContext ctx) {
        // only register a timer if the watermark is not yet past the end of the merged window
        // this is in line with the logic in onElement(). If the watermark is past the end of
        // the window onElement() will fire and setting a timer here would fire the window twice.
        long windowMaxTimestamp = window.maxTimestamp();
        if (windowMaxTimestamp > ctx.getCurrentWatermark()) {
            ctx.registerEventTimeTimer(windowMaxTimestamp);
        }
    }

    @Override
    public String toString() {
        return "EventTimeTrigger()";
    }

    /**
     * Creates an event-time trigger that fires once the watermark passes the end of the window.
     *
     * <p>Once the trigger fires all elements are discarded. Elements that arrive late immediately
     * trigger window evaluation with just this one element.
     */
    public static EventTimeTrigger create() {
        return new EventTimeTrigger();
    }
}

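Reading the source raises some questions: what does each method do, and in what order are they called? With these questions in mind, I first checked the official documentation, which describes the Trigger interface (https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/dev/datastream/operators/windows/#triggers).

From the official documentation, the role of each method is roughly as follows: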

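  • onElement() is called for each element that is added to a window.
  • onEventTime() is called when a registered event-time timer fires.
  • onProcessingTime() is called when a registered processing-time timer fires.
  • onMerge() is relevant for stateful triggers. It merges the trigger state of two windows when those windows are merged, for example when using session windows.
  • Finally, clear() performs whatever logic is needed when the corresponding window is removed.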

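Combined with the source, onElement() runs for every element that passes through. Each time, it compares the current watermark (not the element's own timestamp) with the window's maximum timestamp: if the watermark has already passed it, the window fires immediately; otherwise it calls registerEventTimeTimer. The timer is registered in the internalTimerService, and onEventTime() is invoked once the watermark reaches the timer's timestamp. The watermark, however, is derived from the timestamps of incoming elements, so when no elements arrive it stops advancing and the registered timer is never triggered.

Putting this together: with EventTimeTrigger, when no data arrives the window is not closed automatically; it keeps waiting for data.

Test example

The code below can be used to verify this yourself.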
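The reasoning above can be sketched in plain Java, without any Flink classes. This is a simplified model, not Flink's implementation: it assumes tumbling windows, a watermark equal to the largest timestamp seen minus the out-of-orderness bound minus 1 ms, and timers that fire only when that watermark reaches them. The names WatermarkStallDemo, idle and firedWindows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Plain-Java model (no Flink classes) of event-time timers that only a watermark can fire. */
final class WatermarkStallDemo {
    private final long outOfOrdernessMs;
    private final long windowSizeMs;
    private long maxSeenTimestamp;
    private final TreeSet<Long> pendingTimers = new TreeSet<>();
    private final List<Long> firedWindows = new ArrayList<>();

    WatermarkStallDemo(long outOfOrdernessMs, long windowSizeMs) {
        this.outOfOrdernessMs = outOfOrdernessMs;
        this.windowSizeMs = windowSizeMs;
        // Start low enough that the initial watermark is Long.MIN_VALUE without overflowing.
        this.maxSeenTimestamp = Long.MIN_VALUE + outOfOrdernessMs + 1;
    }

    /** Assumed watermark rule: largest timestamp seen, minus the bound, minus 1 ms. */
    long currentWatermark() {
        return maxSeenTimestamp - outOfOrdernessMs - 1;
    }

    /** Mirrors the onElement() decision: fire if the watermark is already past, else set a timer. */
    void onElement(long timestamp) {
        long windowStart = timestamp - Math.floorMod(timestamp, windowSizeMs);
        long windowMaxTimestamp = windowStart + windowSizeMs - 1;
        if (windowMaxTimestamp <= currentWatermark()) {
            firedWindows.add(windowMaxTimestamp);
        } else {
            pendingTimers.add(windowMaxTimestamp);
        }
        // Only an incoming element can move the watermark forward.
        maxSeenTimestamp = Math.max(maxSeenTimestamp, timestamp);
        fireDueTimers();
    }

    /** Wall-clock time passing with no input: the watermark, and so the timers, do not move. */
    void idle(long wallClockMs) {
        fireDueTimers();
    }

    private void fireDueTimers() {
        while (!pendingTimers.isEmpty() && pendingTimers.first() <= currentWatermark()) {
            firedWindows.add(pendingTimers.pollFirst());
        }
    }

    List<Long> firedWindows() {
        return firedWindows;
    }

    public static void main(String[] args) {
        WatermarkStallDemo demo = new WatermarkStallDemo(1_000L, 60_000L);
        demo.onElement(10_000L);      // window [0, 60000), timer at 59999
        demo.idle(3_600_000L);        // an hour of silence
        System.out.println("after idle: " + demo.firedWindows());
        demo.onElement(125_000L);     // a later element pushes the watermark past 59999
        System.out.println("after next element: " + demo.firedWindows());
    }
}
```

In this model the first window fires only once a second, later element arrives, no matter how long idle() is given.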

import cn.hutool.core.date.DateUtil;
import com.tigerclub.grassland.entity.Word;
import com.tigerclub.grassland.function.AggWindowFunction;
import com.tigerclub.grassland.function.SumAggFunction;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;
import java.time.Duration;

public class WindowOperator {
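    /**
     * Sample input lines (name,timestamp), sent one at a time over the socket on port 9999:
     *
     * test,2023-05-29 15:24:07
     * test,2023-05-29 15:27:07
     * test,2023-05-29 15:31:07
     * test,2023-05-29 15:36:07
     *
     * A much later event; its timestamp moves the watermark past all earlier windows:
     *
     * test,2024-01-19 13:43:07
     */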

    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        DataStreamSource<String> source = env.socketTextStream("127.0.0.1", 9999);
        source.print("source>>>>>>");
        SingleOutputStreamOperator<Word> aggregate = source.flatMap(new Splitter()).assignTimestampsAndWatermarks(
                WatermarkStrategy.
                        <Word>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                        .withTimestampAssigner((word, timestamp) -> word.getTime())
        ).keyBy(word -> word.getName())
                //.window(TumblingEventTimeWindows.of(Time.minutes(5)))
                .window(SlidingEventTimeWindows.of(Time.minutes(1),Time.seconds(2)))
//                .trigger(new EventTimeOrIntervalTrigger("78722"))
                .aggregate(new SumAggFunction(), new AggWindowFunction());
        aggregate.print(">>>>>>>>>>>>>>");
        try {
            env.execute("test");
        } catch (Exception e) {
            e.printStackTrace();
        }

    }


    public static class Splitter implements FlatMapFunction<String, Word> {

        @Override
        public void flatMap(String sentence, Collector<Word> out) throws Exception {
            String[] word = sentence.split(",");
            out.collect(new Word(word[0],word[1], DateUtil.parse(word[1], "yyyy-MM-dd HH:mm:ss").getTime(), 1L));

        }
    }
}
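For orientation, the job above uses a sliding window of 1 minute with a 2-second slide, so each element belongs to several overlapping windows, and onElement() registers a timer at maxTimestamp() for each of them. The sketch below reproduces that assignment arithmetic in plain Java, assuming a zero offset and non-negative timestamps; SlidingWindowDemo and its methods are hypothetical names, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of sliding-window assignment (zero offset assumed). */
final class SlidingWindowDemo {
    /** Start timestamps of all sliding windows that contain the element, newest first. */
    static List<Long> windowStarts(long timestamp, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // The newest window containing the element starts at the last slide boundary.
        long lastStart = timestamp - Math.floorMod(timestamp, slideMs);
        for (long start = lastStart; start > timestamp - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    /** A window [start, start + size) has its maximum timestamp at end - 1. */
    static long maxTimestamp(long start, long sizeMs) {
        return start + sizeMs - 1;
    }

    public static void main(String[] args) {
        long size = 60_000L;   // 1 minute, as in the job above
        long slide = 2_000L;   // 2 seconds, as in the job above
        List<Long> starts = windowStarts(127_000L, size, slide);
        System.out.println("windows per element: " + starts.size());
        System.out.println("earliest timer: " + maxTimestamp(starts.get(starts.size() - 1), size));
        System.out.println("latest timer: " + maxTimestamp(starts.get(0), size));
    }
}
```

With these settings one element lands in size / slide = 30 windows, each waiting on its own event-time timer.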

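In the test run, long after the expected trigger time had passed, the window had still not been evaluated.

[Figure: screenshot of the test output]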

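To solve this, a custom Trigger is needed. The one below registers a processing-time timer at 1.2 times the window length after the window's first element, which fires the window and closes it even if the watermark never advances.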

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.tigerclub.grassland.function;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

import java.io.IOException;

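/**
 * A {@link Trigger} that fires once the watermark passes the end of the window
 * to which a pane belongs, or once a processing-time timeout of 1.2x the window
 * length (measured from the window's first element) expires, whichever comes first.
 *
 * @see org.apache.flink.streaming.api.watermark.Watermark
 */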
@PublicEvolving
public class EventTimeOrIntervalTrigger extends Trigger<Object, TimeWindow> {
  private static final long serialVersionUID = 1L;

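  // Per-key, per-window trigger state:
  //   null = no element seen yet, 1 = timers registered,
  //   2 = fired by the processing-time timeout, 3 = fired by the event-time timer.
  private final ValueStateDescriptor<Integer> triggerStateDescriptor;

  public EventTimeOrIntervalTrigger(String instanceId) {
    triggerStateDescriptor = new ValueStateDescriptor<>("triggerState" + instanceId, Types.INT);
  }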

  @Override
  public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) throws Exception {
    ValueState<Integer> triggerState = ctx.getPartitionedState(triggerStateDescriptor);
    Integer state = triggerState.value();
    if (state == null) {// first element in this window
      long processTriggerTime = ctx.getCurrentProcessingTime() + (long) ((window.getEnd() - window.getStart()) * 1.2);
      ctx.registerProcessingTimeTimer(processTriggerTime);
      ctx.registerEventTimeTimer(window.maxTimestamp());

      triggerState.update(1);
      return TriggerResult.CONTINUE;
    }


    if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
      // if the watermark is already past the window fire immediately
      return TriggerResult.FIRE;
    }

    return TriggerResult.CONTINUE;
  }

  @Override
  public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) throws IOException {
    if (time == window.maxTimestamp()) {
      ValueState<Integer> triggerState = ctx.getPartitionedState(triggerStateDescriptor);
      if (triggerState.value() != null && triggerState.value() != 2) {
        triggerState.update(3);
        return TriggerResult.FIRE;
      }
    }
    return TriggerResult.CONTINUE;
  }

  @Override
  public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) throws Exception {
    ValueState<Integer> triggerState = ctx.getPartitionedState(triggerStateDescriptor);
    if (triggerState != null && triggerState.value() != 3) {
      triggerState.update(2);
      return TriggerResult.FIRE_AND_PURGE;
    } else {
      return TriggerResult.CONTINUE;
    }
  }

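  @Override
  public void clear(TimeWindow window, TriggerContext ctx) throws Exception {
    ctx.deleteEventTimeTimer(window.maxTimestamp());
    // Release the per-window state so it does not accumulate across windows.
    // The processing-time timer is not deleted here because its timestamp is not stored.
    ctx.getPartitionedState(triggerStateDescriptor).clear();
  }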

  @Override
  public boolean canMerge() {
    return true;
  }

  @Override
  public void onMerge(TimeWindow window,
                      OnMergeContext ctx) {
    // only register a timer if the watermark is not yet past the end of the merged window
    // this is in line with the logic in onElement(). If the watermark is past the end of
    // the window onElement() will fire and setting a timer here would fire the window twice.
    long windowMaxTimestamp = window.maxTimestamp();
    if (windowMaxTimestamp > ctx.getCurrentWatermark()) {
      ctx.registerEventTimeTimer(windowMaxTimestamp);
    }
  }

  @Override
  public String toString() {
    return "EventTimeOrIntervalTrigger()";
  }

  /**
   * Creates a trigger that fires once the watermark passes the end of the window, or once the
   * processing-time timeout expires, whichever happens first.
   *
   * <p>When the timeout fires, the window contents are purged (FIRE_AND_PURGE); when the
   * event-time timer fires, they are kept (FIRE).
   *
   * @param instanceId suffix that makes the state descriptor name unique
   */
  public static EventTimeOrIntervalTrigger create(String instanceId) {
    return new EventTimeOrIntervalTrigger(instanceId);
  }
}
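The interplay of the two timers in the trigger above can be checked in isolation. The sketch below is a plain-Java model of its state codes and of the timeout arithmetic, assuming a single key and a single window; it leaves out the late-element branch of onElement() and all Flink types. EventTimeOrIntervalModel and its Result enum are hypothetical stand-ins, not Flink classes.

```java
/** Plain-Java model of the trigger's state codes: null, 1, 2 (timeout fired), 3 (watermark fired). */
final class EventTimeOrIntervalModel {
    enum Result { CONTINUE, FIRE, FIRE_AND_PURGE }

    private Integer state;                     // null until the first element arrives
    private long processingTimeDeadline = -1L; // wall-clock time at which the timeout fires

    /** First element: compute the timeout as now + 1.2 * window length and mark timers as set. */
    Result onElement(long nowMs, long windowStartMs, long windowEndMs) {
        if (state == null) {
            processingTimeDeadline = nowMs + (long) ((windowEndMs - windowStartMs) * 1.2);
            state = 1;
        }
        return Result.CONTINUE;
    }

    /** Event-time timer: fires unless the timeout has already fired and purged the window. */
    Result onEventTime() {
        if (state != null && state != 2) {
            state = 3;
            return Result.FIRE;
        }
        return Result.CONTINUE;
    }

    /** Processing-time timer: fires and purges unless the watermark has already fired the window. */
    Result onProcessingTime() {
        if (state != null && state != 3) {
            state = 2;
            return Result.FIRE_AND_PURGE;
        }
        return Result.CONTINUE;
    }

    long processingTimeDeadline() {
        return processingTimeDeadline;
    }

    public static void main(String[] args) {
        // Idle stream: the watermark never arrives, so the timeout closes the window.
        EventTimeOrIntervalModel idle = new EventTimeOrIntervalModel();
        idle.onElement(1_000_000L, 0L, 60_000L);
        System.out.println("deadline: " + idle.processingTimeDeadline());
        System.out.println("timeout: " + idle.onProcessingTime());
        System.out.println("late watermark: " + idle.onEventTime());

        // Busy stream: the watermark arrives first, so the later timeout is a no-op.
        EventTimeOrIntervalModel busy = new EventTimeOrIntervalModel();
        busy.onElement(1_000_000L, 0L, 60_000L);
        System.out.println("watermark: " + busy.onEventTime());
        System.out.println("late timeout: " + busy.onProcessingTime());
    }
}
```

For a 1-minute window the timeout lands 72 seconds after the first element, and whichever timer fires first suppresses the other.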
