Flink: Implementing a Custom Window with a Trigger

Background

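        Flink ships with several window types that cover most use cases, but some scenarios need more flexibility. A processing-time window, for example, guarantees that computation runs at a fixed interval, but the number of records per window can vary widely, and an unusually large window can cause failures in downstream operators. What we want is a window that is still based on processing time but also closes early and triggers computation once the number of records in it reaches a threshold.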

Goal

        Use a Trigger to implement a custom window driven by both processing time and element count.

Description

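        A Trigger decides when a window is handed to the window function for processing, and whether the window's contents are purged afterwards. Each window assigner that Flink provides has its own default trigger; when none of them meets the requirement, you can implement a custom Trigger to get custom window behavior.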

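        The Trigger abstract class exposes five methods that you override to implement the trigger logic:

  • onElement() is called for every record that is added to the window.
  • onEventTime() is called when a registered event-time timer fires.
  • onProcessingTime() is called when a registered processing-time timer fires.
  • onMerge() is called when windows are merged, and merges the state of the corresponding triggers, as happens with session windows.
  • clear() is called when the window is cleaned up.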

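        The first three methods are the core of the trigger logic. After each of them runs, the window decides what to do next based on the return value. All three return the TriggerResult enum, which has four values:

  • CONTINUE: do nothing.
  • FIRE: trigger the window computation (that is, run the window function); the window and all of its elements are kept.
  • PURGE: clear all elements in the window without triggering the computation or emitting anything. According to the Flink documentation, meta-information about the window and any trigger state are left intact.
  • FIRE_AND_PURGE: trigger the window computation and emit the result, then clear all elements in the window.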
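
        To make the difference between the four values concrete, here is a minimal plain-Java sketch. It does not use the Flink API: TriggerResultModel, Result and apply are hypothetical names introduced only for illustration, and the model assumes each result boils down to a pair of flags (fire, purge), in the spirit of the isFire() and isPurge() accessors on Flink's TriggerResult.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the four trigger results; it does not depend on Flink. */
final class TriggerResultModel {

    /** Hypothetical stand-in for TriggerResult: each value is a (fire, purge) flag pair. */
    enum Result {
        CONTINUE(false, false),
        FIRE(true, false),
        PURGE(false, true),
        FIRE_AND_PURGE(true, true);

        final boolean fire;
        final boolean purge;

        Result(boolean fire, boolean purge) {
            this.fire = fire;
            this.purge = purge;
        }
    }

    /** Applies a result to the buffered elements and returns what would be emitted. */
    static List<Integer> apply(Result result, List<Integer> buffer) {
        // Firing hands a snapshot of the current contents to the window function.
        List<Integer> emitted = result.fire ? new ArrayList<>(buffer) : new ArrayList<>();
        // Purging drops the buffered elements, whether or not anything was emitted.
        if (result.purge) {
            buffer.clear();
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> buffer = new ArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(apply(Result.FIRE, buffer) + " kept=" + buffer.size());
        System.out.println(apply(Result.FIRE_AND_PURGE, buffer) + " kept=" + buffer.size());
    }
}
```

        In this model FIRE emits the three elements and keeps them buffered, so a later firing would emit them again, while FIRE_AND_PURGE emits them once and leaves the buffer empty.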

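        With these results in mind, here is a brief description of the custom trigger's design.

        For each record that arrives, the trigger first checks whether it is the first record it has seen (in other words, whether no timer has been registered yet); if so, it registers a timer for the end of the current interval. It then checks whether the number of records in the window has reached the configured threshold; if so, it replaces the pending timer with one for the end of the next interval, triggers the computation, and closes the current window. If the count has not reached the threshold by the time the timer fires, the trigger registers the timer for the end of the next interval, triggers the computation, and closes the current window. The code below has the detailed logic.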
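
        The rule described above can be sketched without Flink as a small simulation driven by a fake clock. This is only a model under stated assumptions, not the trigger itself: CountOrTimeModel, advanceTo and fired are hypothetical names, timestamps are plain milliseconds supplied by the caller, and a timer that finds the buffer empty simply reschedules itself without emitting anything.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of the count-or-time rule; it does not use the Flink API. */
final class CountOrTimeModel {
    private final long maxCount;
    private final long intervalMs;
    private final List<String> buffer = new ArrayList<>();
    private long nextTimer = -1; // -1 means no timer has been registered yet

    /** Every batch that was "fired", in order. */
    final List<List<String>> fired = new ArrayList<>();

    CountOrTimeModel(long maxCount, long intervalMs) {
        this.maxCount = maxCount;
        this.intervalMs = intervalMs;
    }

    /** Advances the fake clock and fires the pending timer each time it falls due. */
    void advanceTo(long now) {
        while (nextTimer >= 0 && nextTimer <= now) {
            nextTimer += intervalMs;      // always schedule the next interval
            if (!buffer.isEmpty()) {
                flush();                  // time-based firing
            }
        }
    }

    /** Handles one record arriving at time `now`. */
    void onElement(String value, long now) {
        advanceTo(now);
        if (nextTimer < 0) {
            nextTimer = now + intervalMs; // first record: start the interval
        }
        buffer.add(value);
        if (buffer.size() >= maxCount) {
            nextTimer = now + intervalMs; // replace the pending timer
            flush();                      // count-based firing
        }
    }

    /** Emits the buffered records as one batch and clears the buffer. */
    private void flush() {
        fired.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        CountOrTimeModel model = new CountOrTimeModel(3, 1000);
        model.onElement("a", 0);
        model.onElement("b", 100);
        model.onElement("c", 200);  // third record reaches the threshold
        model.onElement("d", 300);
        model.advanceTo(1200);      // timer set at 200 ms fires and flushes "d"
        System.out.println(model.fired); // prints [[a, b, c], [d]]
    }
}
```

        The first batch is closed by the count threshold and the second by the timer, which is the behavior the trigger is meant to give the window.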

Code

package trigger;

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.Window;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CountOrTimeTrigger<W extends Window> extends Trigger<Object, W> {
    private final long maxCount;
    private final Time interval;
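    // NOTE: these are plain instance fields, not Flink state. They are shared by every
    // window this trigger instance handles and are not checkpointed, which is adequate
    // only for the single global window used in the test below.
    private long nextTimerTime;
    private boolean hasIntervalTimer = false;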
    private final static Logger logger = LoggerFactory.getLogger(CountOrTimeTrigger.class);

    public CountOrTimeTrigger(long maxCount, Time interval) {
        this.maxCount = maxCount;
        this.interval = interval;
    }

    // State descriptor for the element count
    private final ReducingStateDescriptor<Long> countDescriptor = new ReducingStateDescriptor<>("count", new Sum(), Long.class);

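    /**
     * Creates the custom trigger.
     *
     * @param maxCount maximum number of elements before the window fires
     * @param interval processing-time interval after which the window fires
     * @return CountOrTimeTrigger
     */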
    public static <W extends Window> CountOrTimeTrigger<W> of(long maxCount, Time interval) {
        return new CountOrTimeTrigger<>(maxCount, interval);
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
        if (!hasIntervalTimer) {
            registerNextTimer(ctx, interval);
        }
        ReducingState<Long> countState = ctx.getPartitionedState(countDescriptor);
        // Increment the count state by 1
        countState.add(1L);
        if (countState.get() >= maxCount) {
            // logger.info("maxCount threshold reached");
            countState.clear();
            // logger.info("Deleting timer: " + nextTimerTime);
            ctx.deleteProcessingTimeTimer(nextTimerTime);
            registerNextTimer(ctx, interval);
            // logger.info("Firing computation");
            return TriggerResult.FIRE_AND_PURGE;
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
        // logger.info(time + ", timer fired");
        registerNextTimer(ctx, interval);
        ReducingState<Long> countState = ctx.getPartitionedState(countDescriptor);
        Long count = countState.get();
        if (count == null) {
            // No elements arrived during this interval, so there is nothing to fire
            return TriggerResult.CONTINUE;
        } else {
            // logger.info("Interval ended, firing computation, time: " + time + ", count: " + count);
            countState.clear();
            return TriggerResult.FIRE_AND_PURGE;
        }
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(W window, TriggerContext ctx) {
        // logger.info("Window cleared");
        ctx.deleteProcessingTimeTimer(nextTimerTime);
        ctx.getPartitionedState(countDescriptor).clear();
    }

    private static class Sum implements ReduceFunction<Long> {
        @Override
        public Long reduce(Long value1, Long value2) {
            return value1 + value2;
        }
    }

    private void registerNextTimer(TriggerContext ctx, Time time) {
        // Use Flink's processing-time clock rather than System.currentTimeMillis()
        nextTimerTime = ctx.getCurrentProcessingTime() + time.toMilliseconds();
        ctx.registerProcessingTimeTimer(nextTimerTime);
        hasIntervalTimer = true;
        // logger.info("Registered timer, time: " + nextTimerTime);
    }
}

package app;

import com.alibaba.fastjson.JSONObject;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
import org.apache.flink.streaming.api.windowing.windows.GlobalWindow;
import org.apache.flink.util.Collector;
import trigger.CountOrTimeTrigger;

import java.util.ArrayList;
import java.util.List;

public class CountOrTimeTriggerTest {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.socketTextStream("localhost", 9091)
                .windowAll(GlobalWindows.create())
                .trigger(new CountOrTimeTrigger<>(5, Time.seconds(3)))
                .process(new ProcessAllWindowFunction<String, String, GlobalWindow>() {
                    @Override
                    public void process(Context context, Iterable<String> elements, Collector<String> out) throws Exception {
                        System.out.println("Processing batch:");
                        List<String> dataList = new ArrayList<>();
                        for (String e : elements) {
                            dataList.add(e);
                        }
                        JSONObject result = new JSONObject();
                        result.put("data_size", dataList.size());
                        result.put("data_list", dataList);
                        out.collect(result.toJSONString());
                        System.out.println("Batch processed.");
                    }
                })
                .print("print_sink");
        env.execute();
    }
}

Official documentation

Source: Windows | Apache Flink

Triggers

Trigger determines when a window (as formed by the window assigner) is ready to be processed by the window function. Each WindowAssigner comes with a default Trigger. If the default trigger does not fit your needs, you can specify a custom trigger using trigger(...).

The trigger interface has five methods that allow a Trigger to react to different events:

  • The onElement() method is called for each element that is added to a window.
  • The onEventTime() method is called when a registered event-time timer fires.
  • The onProcessingTime() method is called when a registered processing-time timer fires.
  • The onMerge() method is relevant for stateful triggers and merges the states of two triggers when their corresponding windows merge, e.g. when using session windows.
  • Finally the clear() method performs any action needed upon removal of the corresponding window.

Two things to notice about the above methods are:

  1. The first three decide how to act on their invocation event by returning a TriggerResult. The action can be one of the following:
  • CONTINUE: do nothing,
  • FIRE: trigger the computation,
  • PURGE: clear the elements in the window, and
  • FIRE_AND_PURGE: trigger the computation and clear the elements in the window afterwards.
  2. Any of these methods can be used to register processing- or event-time timers for future actions.

Fire and Purge

Once a trigger determines that a window is ready for processing, it fires, i.e., it returns FIRE or FIRE_AND_PURGE. This is the signal for the window operator to emit the result of the current window. Given a window with a ProcessWindowFunction all elements are passed to the ProcessWindowFunction (possibly after passing them to an evictor). Windows with ReduceFunction, or AggregateFunction simply emit their eagerly aggregated result.

When a trigger fires, it can either FIRE or FIRE_AND_PURGE. While FIRE keeps the contents of the window, FIRE_AND_PURGE removes its content. By default, the pre-implemented triggers simply FIRE without purging the window state.

Purging will simply remove the contents of the window and will leave any potential meta-information about the window and any trigger state intact.

Default Triggers of WindowAssigners

The default Trigger of a WindowAssigner is appropriate for many use cases. For example, all the event-time window assigners have an EventTimeTrigger as default trigger. This trigger simply fires once the watermark passes the end of a window.

The default trigger of the GlobalWindow is the NeverTrigger which does never fire. Consequently, you always have to define a custom trigger when using a GlobalWindow.

By specifying a trigger using trigger() you are overwriting the default trigger of a WindowAssigner. For example, if you specify a CountTrigger for TumblingEventTimeWindows you will no longer get window firings based on the progress of time but only by count. Right now, you have to write your own custom trigger if you want to react based on both time and count.

Built-in and Custom Triggers

Flink comes with a few built-in triggers.

  • The (already mentioned) EventTimeTrigger fires based on the progress of event-time as measured by watermarks.
  • The ProcessingTimeTrigger fires based on processing time.
  • The CountTrigger fires once the number of elements in a window exceeds the given limit.
  • The PurgingTrigger takes as argument another trigger and transforms it into a purging one.

If you need to implement a custom trigger, you should check out the abstract Trigger class. Please note that the API is still evolving and might change in future versions of Flink.
