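Preface
ProcessFunction is Flink's low-level stream processing operator. It is built on a few core concepts:
event: the individual records in the data stream
state: fault-tolerant, consistent state
timers: timers based on event time or processing time
Example
Count how many times each key appears per second, and print each key together with its count for every second.
1. Data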
public class CountWithTimestamp {
    public String key;
    public long count;
    public long lastModified;
}
2. Example method
@Test
public void KeyedProcessFunctionTest() throws Exception {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setRuntimeMode(RuntimeExecutionMode.STREAMING)
            .setParallelism(1);
    env.socketTextStream("172.16.10.159", 8888)
            // Each input line has the form "key,timestamp"
            .map(new MapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public Tuple2<String, Integer> map(String value) throws Exception {
                    String[] s = value.split(",");
                    return new Tuple2<>(s[0], Integer.parseInt(s[1]));
                }
            })
            // Use the second field as the event timestamp (milliseconds)
            .assignTimestampsAndWatermarks(
                    WatermarkStrategy.<Tuple2<String, Integer>>forMonotonousTimestamps()
                            .withTimestampAssigner((element, recordTimestamp) -> element.f1))
            .keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
                @Override
                public String getKey(Tuple2<String, Integer> value) throws Exception {
                    return value.f0;
                }
            })
            .process(new KeyedProcessFunction<String, Tuple2<String, Integer>, Tuple2<String, Long>>() {
                // Per-key state holding the running count and the last event timestamp
                private ValueState<CountWithTimestamp> state;

                @Override
                public void open(Configuration parameters) throws Exception {
                    state = getRuntimeContext().getState(new ValueStateDescriptor<>("state", CountWithTimestamp.class));
                }

                @Override
                public void processElement(Tuple2<String, Integer> value, Context ctx, Collector<Tuple2<String, Long>> out) throws Exception {
                    CountWithTimestamp current = state.value();
                    if (current == null) {
                        current = new CountWithTimestamp();
                        current.key = value.f0;
                    }
                    // Count the occurrences of the key
                    current.count++;
                    // Record the timestamp of the latest element
                    current.lastModified = ctx.timestamp();
                    // Write the updated value back to state
                    state.update(current);
                    // Register an event-time timer at the next whole-second boundary,
                    // so the count is emitted once per second
                    ctx.timerService().registerEventTimeTimer(current.lastModified / 1000 * 1000 + 1000);
                }

                @Override
                public void onTimer(long timestamp, OnTimerContext ctx, Collector<Tuple2<String, Long>> out) throws Exception {
                    CountWithTimestamp result = state.value();
                    if (result.lastModified > timestamp) {
                        // The latest element is later than this timer, so it was already
                        // counted but belongs to the next second: emit the count without it
                        out.collect(new Tuple2<String, Long>(result.key, result.count - 1));
                        // Reset the count to 1 to carry that element over
                        result.count = 1;
                    } else {
                        out.collect(new Tuple2<String, Long>(result.key, result.count));
                        // Reset the count to 0
                        result.count = 0;
                    }
                    state.update(result);
                }
            })
            .print("KeyedProcessFunction");
    env.execute("KeyedProcessFunctionTest");
}
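The timer registered in processElement rounds the event timestamp down to the start of its second and then adds one second. All elements of a key that fall into the same second therefore register the same timer timestamp, and Flink deduplicates timers per key and timestamp, so onTimer fires once per key per second. The standalone sketch below shows only that arithmetic; it does not use Flink, and the class and method names are hypothetical.

```java
final class TimerAlignment {
    // Mirrors the expression used in processElement:
    // round down to the start of the second, then add 1000 ms.
    static long nextSecondBoundary(long timestampMillis) {
        return timestampMillis / 1000 * 1000 + 1000;
    }

    public static void main(String[] args) {
        // Illustrative, non-negative sample timestamps in milliseconds
        long[] samples = {0L, 1L, 999L, 1000L, 1500L, 1999L};
        for (long ts : samples) {
            System.out.println(ts + " -> " + nextSecondBoundary(ts));
        }
    }
}
```

Timestamps 0 to 999 all map to 1000, and 1000 to 1999 all map to 2000, which is why a single timer covers each one-second interval.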
3. Test
Start nc:
nc -lp 8888
Enter data. In the first 1 s window, a appears three times.

In the second 1 s window, a appears twice and b appears twice.
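Each line typed into nc has to be of the form key,timestamp, because the map function splits on a comma and parses the second field as an integer that is then used as the event timestamp in milliseconds. As a rough illustration of what the job is meant to compute, the sketch below replays a few invented sample lines through a plain-Java per-second count. It does not use Flink, the class name and the sample values are hypothetical, and it ignores watermarks entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PerSecondCountSketch {
    // Parses "key,timestamp" lines and counts each key per one-second bucket.
    // Returns entries of the form "windowStartMillis key count",
    // ordered by window start and then by key.
    static List<String> countPerSecond(List<String> lines) {
        Map<Long, Map<String, Long>> buckets = new TreeMap<>();
        for (String line : lines) {
            String[] s = line.split(",");
            String key = s[0];
            long ts = Integer.parseInt(s[1]);
            long windowStart = ts / 1000 * 1000; // start of the second containing ts
            buckets.computeIfAbsent(windowStart, w -> new TreeMap<>())
                    .merge(key, 1L, Long::sum);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, Map<String, Long>> window : buckets.entrySet()) {
            for (Map.Entry<String, Long> e : window.getValue().entrySet()) {
                out.add(window.getKey() + " " + e.getKey() + " " + e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented input: a three times in the first second,
        // then a twice and b twice in the second one
        List<String> sample = List.of(
                "a,100", "a,200", "a,300",
                "a,1100", "b,1200", "a,1300", "b,1400");
        countPerSecond(sample).forEach(System.out::println);
    }
}
```

In the actual Flink job, the result for a given second is printed only after a later element advances the watermark past the end of that second, so the last interval stays pending until more data arrives.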




