Preface
I previously noticed that a GlobalWindow requires a user-defined trigger, so I wrote a test case with a simple implementation.
Background
An earlier article covered windows. Normally we use the sliding and tumbling window assigners the API already provides, but in some special scenarios we need to define both the window itself and when it fires.
For example: how do we emit the value of a 1-minute window every 10 seconds? That is, for the window 10:00-10:01, output the running sum of that window every 10 s.
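To make the target behavior concrete, here is a plain-Java sketch, independent of Flink, that enumerates when such a trigger should fire within one minute (the epoch value is illustrative):

```java
public class FireSchedule {
    public static void main(String[] args) {
        long windowStep = 60_000L; // 1-minute window
        long outputStep = 10_000L; // emit every 10 s
        long windowStart = 1_616_949_540_000L; // an aligned minute boundary (illustrative)
        for (long t = windowStart + outputStep; t <= windowStart + windowStep; t += outputStep) {
            // fire at every 10 s boundary; fire-and-purge at the minute boundary
            String action = (t % windowStep == 0) ? "FIRE_AND_PURGE" : "FIRE";
            System.out.println(t + " -> " + action);
        }
    }
}
```

Five plain fires followed by one fire-and-purge per minute is exactly the schedule the trigger below implements.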
Trigger
This post focuses on the following three methods:
/**
* Called for every element that gets added to a pane. The result of this will determine
* whether the pane is evaluated to emit results.
*
* @param element The element that arrived.
* @param timestamp The timestamp of the element that arrived.
* @param window The window to which the element is being added.
* @param ctx A context object that can be used to register timer callbacks.
*/
public abstract TriggerResult onElement(T element, long timestamp, W window, TriggerContext ctx) throws Exception;
/**
* Called when a processing-time timer that was set using the trigger context fires.
*
* @param time The timestamp at which the timer fired.
* @param window The window for which the timer fired.
* @param ctx A context object that can be used to register timer callbacks.
*/
public abstract TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception;
/**
* Called when an event-time timer that was set using the trigger context fires.
*
* @param time The timestamp at which the timer fired.
* @param window The window for which the timer fired.
* @param ctx A context object that can be used to register timer callbacks.
*/
public abstract TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception;
onElement: called for every element that arrives. The timestamp parameter is the one assigned via assignTimestampsAndWatermarks; if no timestamp assigner is set, it is not usable.
onProcessingTime: called when a processing-time timer registered through the TriggerContext fires; time is the firing timestamp.
onEventTime: same as above, but for event-time timers.
The trigger implementation:
package com.realtime.flink.trigger;
import com.realtime.flink.dto.OrderDto;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.Window;
// Within a 1-minute window, emit the aggregate every 10 s
public class MyTrigger extends Trigger<OrderDto, Window> {
    // 1-minute window
    private final long windowStep = 60000L;
    // fire every 10 s
    private final long outputStep = 10000L;
    private final ReducingStateDescriptor<Long> stateDesc =
            new ReducingStateDescriptor<>("count", new Sum(), LongSerializer.INSTANCE);
    // Called for every element that arrives
    @Override
    public TriggerResult onElement(OrderDto element, long timestamp, Window window, TriggerContext ctx) throws Exception {
        // No timestamp assigner is set in the test job, so fall back to the order's own time
        timestamp = element.getOrderTime();
        ReducingState<Long> reducerState = ctx.getPartitionedState(stateDesc);
        if (reducerState.get() == null) {
            // First element: align the start down to the 10 s grid
            long start = timestamp - timestamp % outputStep;
            // nextFire is the first timer to register
            long nextFire = start + outputStep;
            reducerState.add(nextFire);
            System.out.println("SSSSSSSSSSSSSS" + timestamp + "-->" + nextFire);
            ctx.registerProcessingTimeTimer(nextFire);
        }
        return TriggerResult.CONTINUE;
    }
    // Called when a registered processing-time timer fires
    @Override
    public TriggerResult onProcessingTime(long time, Window window, TriggerContext ctx) throws Exception {
        ReducingState<Long> reducerState = ctx.getPartitionedState(stateDesc);
        if (time == reducerState.get()) {
            long nextFire = time + outputStep;
            // Sum adds outputStep onto the stored fire time, advancing it to nextFire
            reducerState.add(outputStep);
            ctx.registerProcessingTimeTimer(nextFire);
            System.out.println("KKKKKKKKKKKKKKKK" + nextFire + "-->" + time % windowStep);
            // At a minute boundary (e.g. 10:01:00) fire and clear the window contents
            if (time % windowStep == 0) {
                return TriggerResult.FIRE_AND_PURGE;
            }
            return TriggerResult.FIRE;
        }
        // Returning null here would NPE inside the window operator; continue instead
        return TriggerResult.CONTINUE;
    }
    // Event-time timers are not used by this trigger
    @Override
    public TriggerResult onEventTime(long time, Window window, TriggerContext ctx) throws Exception {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(Window window, TriggerContext ctx) throws Exception {
        // Release the timer state when the window is finally discarded
        ctx.getPartitionedState(stateDesc).clear();
    }
private static class Sum implements ReduceFunction<Long> {
private static final long serialVersionUID = 1L;
@Override
public Long reduce(Long value1, Long value2) throws Exception {
return value1 + value2;
}
}
}
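The timer alignment in onElement can be checked in isolation. This plain-Java sketch reproduces the `start`/`nextFire` arithmetic with the first timestamp from the run log further down (1616949527985, i.e. 00:38:47.985):

```java
public class TimerAlignment {
    // Same math as MyTrigger.onElement: align down to the 10 s grid, then step forward once
    static long nextFire(long timestamp, long outputStep) {
        long start = timestamp - timestamp % outputStep;
        return start + outputStep;
    }

    public static void main(String[] args) {
        long first = nextFire(1_616_949_527_985L, 10_000L);
        System.out.println(first); // 1616949530000, matching the SSSS... log line
    }
}
```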
The test job (Scala):
package com.realtime.flink.test
import java.time.Duration
import com.realtime.flink.dto.OrderDto
import com.realtime.flink.source.OrderSource
import com.realtime.flink.trigger.MyTrigger
import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}
import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows
object GlobalWindowTest {
def main(args: Array[String]): Unit = {
val env = StreamExecutionEnvironment.getExecutionEnvironment
// env.getConfig.setAutoWatermarkInterval(1000)
env.getConfig.setParallelism(1)
// val strategy = WatermarkStrategy.forBoundedOutOfOrderness(Duration.ofSeconds(2))
// .withTimestampAssigner(new SerializableTimestampAssigner[OrderDto] {
// override def extractTimestamp(element: OrderDto, recordTimestamp: Long): Long = {
// element.getOrderTime
// }
// }
// ) ;
env.addSource(new OrderSource)
// .assignTimestampsAndWatermarks(strategy)
.windowAll(GlobalWindows.create())
.trigger(new MyTrigger)
.aggregate (new AggregateFunction[OrderDto,(String,Double),(String,Double)] {
override def createAccumulator(): (String, Double) = {
("", 0.0)
}
override def add(in: OrderDto, acc: (String, Double)): (String, Double) = {
("",in.getOrderPrice+acc._2)
}
override def getResult(acc: (String, Double)): (String, Double) = {
acc
}
override def merge(acc: (String, Double), acc1: (String, Double)): (String, Double) = {
("",acc._2+acc1._2)
}
}).map(x=>{
println(x._2)
})
env.execute("tttt")
}
}
Output: as expected, 12 = 5 + 7, then 38 = 7 + 5 + 3 + 3 + 2 + 2 + 1 + 7 + 3 + 5 (zeros omitted)… After the minute boundary the window is purged and the sum starts over, with one emission every 10 s.
AAAAAAAAAAA数据:2021-03-29 00:38:47-->7
SSSSSSSSSSSSSS1616949527985-->1616949530000
AAAAAAAAAAA数据:2021-03-29 00:38:49-->5
AAAAAAAAAAA数据:2021-03-29 00:38:50-->3
KKKKKKKKKKKKKKKK1616949540000-->50000
12.0
AAAAAAAAAAA数据:2021-03-29 00:38:51-->3
AAAAAAAAAAA数据:2021-03-29 00:38:52-->2
AAAAAAAAAAA数据:2021-03-29 00:38:53-->0
AAAAAAAAAAA数据:2021-03-29 00:38:54-->2
AAAAAAAAAAA数据:2021-03-29 00:38:55-->1
AAAAAAAAAAA数据:2021-03-29 00:38:56-->7
AAAAAAAAAAA数据:2021-03-29 00:38:57-->3
AAAAAAAAAAA数据:2021-03-29 00:38:58-->5
AAAAAAAAAAA数据:2021-03-29 00:38:59-->0
KKKKKKKKKKKKKKKK1616949550000-->0
38.0
AAAAAAAAAAA数据:2021-03-29 00:39:00-->1
AAAAAAAAAAA数据:2021-03-29 00:39:01-->4
AAAAAAAAAAA数据:2021-03-29 00:39:02-->1
AAAAAAAAAAA数据:2021-03-29 00:39:03-->3
AAAAAAAAAAA数据:2021-03-29 00:39:04-->2
AAAAAAAAAAA数据:2021-03-29 00:39:05-->3
AAAAAAAAAAA数据:2021-03-29 00:39:06-->8
AAAAAAAAAAA数据:2021-03-29 00:39:07-->1
AAAAAAAAAAA数据:2021-03-29 00:39:08-->1
AAAAAAAAAAA数据:2021-03-29 00:39:09-->9
KKKKKKKKKKKKKKKK1616949560000-->10000
33.0
AAAAAAAAAAA数据:2021-03-29 00:39:10-->1
AAAAAAAAAAA数据:2021-03-29 00:39:11-->1
AAAAAAAAAAA数据:2021-03-29 00:39:12-->0
AAAAAAAAAAA数据:2021-03-29 00:39:13-->9
AAAAAAAAAAA数据:2021-03-29 00:39:14-->3
AAAAAAAAAAA数据:2021-03-29 00:39:15-->3
AAAAAAAAAAA数据:2021-03-29 00:39:16-->0
AAAAAAAAAAA数据:2021-03-29 00:39:17-->9
AAAAAAAAAAA数据:2021-03-29 00:39:18-->3
AAAAAAAAAAA数据:2021-03-29 00:39:19-->4
KKKKKKKKKKKKKKKK1616949570000-->20000
66.0
AAAAAAAAAAA数据:2021-03-29 00:39:20-->8
AAAAAAAAAAA数据:2021-03-29 00:39:21-->9
AAAAAAAAAAA数据:2021-03-29 00:39:22-->1
AAAAAAAAAAA数据:2021-03-29 00:39:23-->9
AAAAAAAAAAA数据:2021-03-29 00:39:24-->7
AAAAAAAAAAA数据:2021-03-29 00:39:25-->1
AAAAAAAAAAA数据:2021-03-29 00:39:26-->6
AAAAAAAAAAA数据:2021-03-29 00:39:27-->0
AAAAAAAAAAA数据:2021-03-29 00:39:28-->7
AAAAAAAAAAA数据:2021-03-29 00:39:29-->9
KKKKKKKKKKKKKKKK1616949580000-->30000
123.0
AAAAAAAAAAA数据:2021-03-29 00:39:30-->2
AAAAAAAAAAA数据:2021-03-29 00:39:31-->9
AAAAAAAAAAA数据:2021-03-29 00:39:32-->1
AAAAAAAAAAA数据:2021-03-29 00:39:33-->3
AAAAAAAAAAA数据:2021-03-29 00:39:34-->8
AAAAAAAAAAA数据:2021-03-29 00:39:35-->7
AAAAAAAAAAA数据:2021-03-29 00:39:36-->2
AAAAAAAAAAA数据:2021-03-29 00:39:37-->9
AAAAAAAAAAA数据:2021-03-29 00:39:38-->7
AAAAAAAAAAA数据:2021-03-29 00:39:39-->7
KKKKKKKKKKKKKKKK1616949590000-->40000
178.0
AAAAAAAAAAA数据:2021-03-29 00:39:40-->6
AAAAAAAAAAA数据:2021-03-29 00:39:41-->7
AAAAAAAAAAA数据:2021-03-29 00:39:42-->6
AAAAAAAAAAA数据:2021-03-29 00:39:43-->6
AAAAAAAAAAA数据:2021-03-29 00:39:44-->9
AAAAAAAAAAA数据:2021-03-29 00:39:45-->1
AAAAAAAAAAA数据:2021-03-29 00:39:46-->6
AAAAAAAAAAA数据:2021-03-29 00:39:47-->2
AAAAAAAAAAA数据:2021-03-29 00:39:48-->9
AAAAAAAAAAA数据:2021-03-29 00:39:49-->6
KKKKKKKKKKKKKKKK1616949600000-->50000
236.0
AAAAAAAAAAA数据:2021-03-29 00:39:50-->5
AAAAAAAAAAA数据:2021-03-29 00:39:51-->1
AAAAAAAAAAA数据:2021-03-29 00:39:52-->7
AAAAAAAAAAA数据:2021-03-29 00:39:53-->2
AAAAAAAAAAA数据:2021-03-29 00:39:54-->4
AAAAAAAAAAA数据:2021-03-29 00:39:55-->9
AAAAAAAAAAA数据:2021-03-29 00:39:56-->4
AAAAAAAAAAA数据:2021-03-29 00:39:57-->5
AAAAAAAAAAA数据:2021-03-29 00:39:58-->5
AAAAAAAAAAA数据:2021-03-29 00:39:59-->3
KKKKKKKKKKKKKKKK1616949610000-->0
281.0
AAAAAAAAAAA数据:2021-03-29 00:40:00-->0
AAAAAAAAAAA数据:2021-03-29 00:40:01-->3
AAAAAAAAAAA数据:2021-03-29 00:40:02-->6
AAAAAAAAAAA数据:2021-03-29 00:40:03-->4
AAAAAAAAAAA数据:2021-03-29 00:40:04-->1
AAAAAAAAAAA数据:2021-03-29 00:40:05-->3
AAAAAAAAAAA数据:2021-03-29 00:40:06-->2
AAAAAAAAAAA数据:2021-03-29 00:40:07-->8
AAAAAAAAAAA数据:2021-03-29 00:40:08-->1
AAAAAAAAAAA数据:2021-03-29 00:40:09-->9
KKKKKKKKKKKKKKKK1616949620000-->10000
37.0
AAAAAAAAAAA数据:2021-03-29 00:40:10-->6
AAAAAAAAAAA数据:2021-03-29 00:40:11-->8
AAAAAAAAAAA数据:2021-03-29 00:40:12-->4
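The first two window sums in the log can be cross-checked by hand. A small plain-Java sketch, using the per-second prices copied from the log above:

```java
import java.util.Arrays;

public class SumCheck {
    public static void main(String[] args) {
        // prices seen before the first fire at 00:38:50
        double firstWindow = Arrays.stream(new double[]{7, 5}).sum();
        // prices from 00:38:50 to 00:38:59, added on top (FIRE without PURGE keeps the contents)
        double secondWindow = firstWindow
                + Arrays.stream(new double[]{3, 3, 2, 0, 2, 1, 7, 3, 5, 0}).sum();
        System.out.println(firstWindow + " / " + secondWindow); // 12.0 / 38.0
    }
}
```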
Summary
My day-to-day work is mostly report-style statistics, where custom triggers are rarely needed, but online business development is a more likely use case. Follow-up posts will cover the remaining Trigger methods and how to handle late data.