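1. Flink's transformation operators cannot access an event's timestamp or the watermark, so the DataStream API provides a set of low-level APIs (the ProcessFunction family) for accessing event timestamps and watermarks and for registering timers. Flink SQL is implemented on top of ProcessFunction.
Flink provides 8 ProcessFunctions: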
ProcessFunction
KeyedProcessFunction
CoProcessFunction
ProcessJoinFunction
BroadcastProcessFunction
KeyedBroadcastProcessFunction
ProcessWindowFunction
ProcessAllWindowFunction
Taking KeyedProcessFunction as an example:
StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
// Input format: key number
DataStreamSource<String> stream = environment.socketTextStream("localhost", 7777);
SingleOutputStreamOperator<String> operator = stream.map(new MapFunction<String, Tuple2<String, Integer>>() {
    @Override
    public Tuple2<String, Integer> map(String s) throws Exception {
        String[] split = s.split("\\s");
        return new Tuple2<String, Integer>(split[0], Integer.valueOf(split[1]));
    }
}).keyBy(0)
    .process(new KeyedProcessFunction<Tuple, Tuple2<String, Integer>, String>() {
        // Delay before the timer fires, in milliseconds
        private long lazyTime = 10 * 1000;

        @Override
        public void processElement(Tuple2<String, Integer> stringIntegerTuple2, Context context, Collector<String> collector) throws Exception {
            Integer value = stringIntegerTuple2.f1;
            if (value > 10) { // register a timer
                context.timerService().registerProcessingTimeTimer(context.timerService().currentProcessingTime() + lazyTime);
            }
            collector.collect(stringIntegerTuple2.toString());
        }

        // Called when a registered timer fires
        @Override
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
            out.collect(ctx.getCurrentKey() + " exceeded the threshold");
        }
    });
operator.print();
environment.execute(KeyProcessFunctionTest.class.getSimpleName());
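Notes:
(1) processElement is invoked for every record in the stream.
(2) context gives access to the watermark, the timer service (timerService), and related information.
(3) context.timerService().registerProcessingTimeTimer(context.timerService().currentProcessingTime() + lazyTime)
Each timer is identified by its timestamp, and Flink keeps at most one timer per key and timestamp. The argument to registerProcessingTimeTimer is the timestamp at which the timer fires.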
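The timer behavior in note (3) can be illustrated with a small plain-Java model. This is only a sketch under stated assumptions: `TimerModel`, `register`, and `advanceTo` are hypothetical names invented for illustration and are not part of the Flink API. The model only mimics the idea that timers are tracked per key and timestamp, so registering the same pair twice results in a single firing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, simplified model of per-key timer bookkeeping (not Flink code).
class TimerModel {
    // key -> sorted set of pending timer timestamps; a set, so duplicates collapse
    private final Map<String, TreeSet<Long>> timers = new TreeMap<>();

    // Registers a timer; returns false if this key/timestamp pair is already pending.
    boolean register(String key, long timestamp) {
        return timers.computeIfAbsent(key, k -> new TreeSet<>()).add(timestamp);
    }

    // Advances the clock to `now` and returns "key@timestamp" for every due timer.
    // Timers are visited key by key, and in timestamp order within each key.
    List<String> advanceTo(long now) {
        List<String> fired = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Long>> entry : timers.entrySet()) {
            TreeSet<Long> pending = entry.getValue();
            while (!pending.isEmpty() && pending.first() <= now) {
                fired.add(entry.getKey() + "@" + pending.pollFirst());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        TimerModel model = new TimerModel();
        model.register("a", 10_000L);
        model.register("a", 10_000L); // same key and timestamp: ignored
        model.register("b", 12_000L);
        System.out.println(model.advanceTo(11_000L)); // prints [a@10000]
        System.out.println(model.advanceTo(12_000L)); // prints [b@12000]
    }
}
```

In the real job, the equivalent of `advanceTo` is driven by the processing-time clock, and each firing results in one call to onTimer for the key that registered the timer.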
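2. Side outputs (sideoutput)
Most DataStream API operators produce a single output stream. The split operator can produce multiple streams, but all of them carry the same data type.
The side output of a ProcessFunction can produce multiple streams with different data types.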
StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
// Input format: key number
DataStreamSource<String> stream = environment.socketTextStream("localhost", 7777);
DataStream<Tuple2<String, Integer>> outputTest = stream.map(new MapFunction<String, Tuple2<String, Integer>>() {
    @Override
    public Tuple2<String, Integer> map(String s) throws Exception {
        String[] split = s.split("\\s");
        return new Tuple2<String, Integer>(split[0], Integer.valueOf(split[1]));
    }
}).process(new ProcessFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
    @Override
    public void processElement(Tuple2<String, Integer> stringIntegerTuple2, Context context, Collector<Tuple2<String, Integer>> collector) throws Exception {
        Integer value = stringIntegerTuple2.f1;
        if (value > 10) {
            // Values above 10 go to the Tuple-typed side output
            OutputTag<Tuple2<String, Integer>> outputTag = new OutputTag<Tuple2<String, Integer>>("outputTest_tuple"){};
            context.output(outputTag, stringIntegerTuple2);
        } else if (value > 5) {
            // Values from 6 to 10 go to the String-typed side output (key only)
            OutputTag<String> outputTag = new OutputTag<String>("outputTest_String"){};
            context.output(outputTag, stringIntegerTuple2.f0);
        } else {
            // Everything else stays on the main output
            collector.collect(stringIntegerTuple2);
        }
    }
});
// Get the Tuple-typed side output using the outputTest_tuple tag
DataStream<Tuple2<String, Integer>> sideOutput_tuple = ((SingleOutputStreamOperator<Tuple2<String, Integer>>) outputTest).getSideOutput(new OutputTag<Tuple2<String, Integer>>("outputTest_tuple"){});
sideOutput_tuple.print();
// Get the String-typed side output using the outputTest_String tag
DataStream<String> sideOutput_String = ((SingleOutputStreamOperator<Tuple2<String, Integer>>) outputTest).getSideOutput(new OutputTag<String>("outputTest_String"){});
sideOutput_String.print();
environment.execute(SideoutputTest.class.getSimpleName());
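The routing rule used in processElement above can be checked in isolation with plain Java. This is a minimal sketch with no Flink dependency: `SideOutputRouting`, `route`, `group`, and the label "main" are hypothetical names introduced for illustration, while the two tag names and the thresholds are taken from the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that mirrors the if/else routing of the example (not Flink code).
class SideOutputRouting {
    // Returns the name of the output a value would be sent to.
    static String route(int value) {
        if (value > 10) {
            return "outputTest_tuple";  // Tuple-typed side output
        } else if (value > 5) {
            return "outputTest_String"; // String-typed side output
        }
        return "main";                  // assumed label for the main output
    }

    // Groups sample values by destination, preserving first-seen order of outputs.
    static Map<String, List<Integer>> group(int... values) {
        Map<String, List<Integer>> outputs = new LinkedHashMap<>();
        for (int value : values) {
            outputs.computeIfAbsent(route(value), k -> new ArrayList<>()).add(value);
        }
        return outputs;
    }

    public static void main(String[] args) {
        // prints {main=[3, 5], outputTest_String=[7, 10], outputTest_tuple=[12, 11]}
        System.out.println(group(3, 7, 12, 5, 10, 11));
    }
}
```

Note the boundaries: a value of exactly 10 goes to outputTest_String and a value of exactly 5 stays on the main output, because both comparisons are strict.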