Flink Personal Study Notes: State (Part 8)
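An operation is stateful if it needs to remember information across multiple events.
Typical uses of state in stream processing: deduplication, pattern detection, aggregation, and updating machine learning models.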
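State categories:
Managed State and Raw State
Managed State is managed by the Flink runtime: it is stored, restored, and rescaled automatically.
Flink provides several common data structures for it, for example ListState and MapState.
Prerequisite: extend a Rich function class or one of the other provided interface classes.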
Managed State categories
1. Keyed State: only available on a KeyedStream [ValueState, ListState, MapState, ReducingState, AggregatingState (input and output types may differ)]
2. Operator State: commonly used in sources [ListState, BroadcastState]
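The note that AggregatingState types may differ is what separates it from ReducingState: a reduce combines two values of one type into that same type, while an aggregate keeps an accumulator and may return a different output type. The sketch below is a plain-Java model of that difference, not Flink code. The Aggregator interface and the AverageAggregator class are hypothetical names that only mirror the general shape of an aggregate function (create an accumulator, add a value, get the result).

```java
import java.util.function.BinaryOperator;

class ReduceVsAggregate {

    /** Hypothetical interface: input, accumulator and output may all have different types. */
    interface Aggregator<IN, ACC, OUT> {
        ACC createAccumulator();
        ACC add(IN value, ACC accumulator);
        OUT getResult(ACC accumulator);
    }

    /** Averages Integer inputs: IN = Integer, ACC = long[]{sum, count}, OUT = Double. */
    static class AverageAggregator implements Aggregator<Integer, long[], Double> {
        @Override
        public long[] createAccumulator() {
            return new long[] {0L, 0L};
        }

        @Override
        public long[] add(Integer value, long[] acc) {
            acc[0] += value; // running sum
            acc[1] += 1;     // running count
            return acc;
        }

        @Override
        public Double getResult(long[] acc) {
            return acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1];
        }
    }

    /** Reduce style: the result has the same type as the inputs. */
    static Integer reduceAll(int[] values, BinaryOperator<Integer> op) {
        Integer result = null;
        for (int v : values) {
            result = (result == null) ? v : op.apply(result, v);
        }
        return result;
    }

    /** Aggregate style: Integer inputs go in, a Double comes out. */
    static Double averageAll(int[] values) {
        AverageAggregator agg = new AverageAggregator();
        long[] acc = agg.createAccumulator();
        for (int v : values) {
            acc = agg.add(v, acc);
        }
        return agg.getResult(acc);
    }

    public static void main(String[] args) {
        int[] readings = {10, 20, 30};
        System.out.println(reduceAll(readings, Integer::sum)); // sum, still an Integer
        System.out.println(averageAll(readings));              // average, a Double
    }
}
```

A sum fits the reduce style because the result is the same type as the inputs. An average needs the sum and the count, which is why an accumulator type distinct from both input and output is useful.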
State Time-To-Live (TTL)
Any type of keyed state can have a time-to-live (TTL).
Before using state TTL, build a StateTtlConfig object, then pass it to the state descriptor to enable TTL.
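A minimal sketch of these two steps, assuming a Flink 1.x release in which newBuilder accepts org.apache.flink.api.common.time.Time (some newer releases take a java.time.Duration instead). The class name TtlConfigSketch, the 10-second TTL, and the state name are illustrative choices, not taken from the post.

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

public class TtlConfigSketch {

    /** Builds a ValueState descriptor with TTL enabled (values below are illustrative). */
    public static ValueStateDescriptor<Integer> descriptorWithTtl() {
        // Step 1: build the TTL configuration; the time-to-live is the only required argument
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.seconds(10))
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();

        // Step 2: pass the configuration to the state descriptor
        ValueStateDescriptor<Integer> descriptor =
                new ValueStateDescriptor<>("va-state", Integer.class);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}
```

The returned descriptor would then be passed to getRuntimeContext().getState(...) inside open(), in the same way as in the demos further down.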
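The TTL configuration has the following options. The first argument of newBuilder is the time-to-live value and is required.
Update type, which controls when the TTL is refreshed (default: OnCreateAndWrite):
StateTtlConfig.UpdateType.OnCreateAndWrite: refreshed only on creation and on write
StateTtlConfig.UpdateType.OnReadAndWrite: also refreshed on read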
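State visibility, which controls whether data that has expired but has not yet been cleaned up is returned (default: NeverReturnExpired):
StateTtlConfig.StateVisibility.NeverReturnExpired: expired data is never returned
StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp: expired data is returned if it has not been cleaned up yet
With NeverReturnExpired, expired data behaves as if it no longer exists, whether or not it has been physically removed. This is useful when expired data must never be read, for example with sensitive data. With ReturnExpiredIfNotCleanedUp, the data is still returned until it is physically removed.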
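To make the two options concrete, here is a simplified plain-Java model of their semantics. It is not Flink's implementation: the class TtlSemanticsModel, its enums, and the explicit clock parameter are all hypothetical, and the choice to drop an expired entry when it is read is an assumption of the model.

```java
class TtlSemanticsModel {

    enum UpdateType { ON_CREATE_AND_WRITE, ON_READ_AND_WRITE }

    enum Visibility { NEVER_RETURN_EXPIRED, RETURN_EXPIRED_IF_NOT_CLEANED_UP }

    /** One TTL-guarded value. The clock is passed in so the behavior is deterministic. */
    static class TtlValue<T> {
        private final long ttlMillis;
        private final UpdateType updateType;
        private final Visibility visibility;
        private T stored;          // null = nothing stored (never written or already removed)
        private long lastRefresh;  // time from which the TTL is counted

        TtlValue(long ttlMillis, UpdateType updateType, Visibility visibility) {
            this.ttlMillis = ttlMillis;
            this.updateType = updateType;
            this.visibility = visibility;
        }

        /** Writes restart the TTL under both update types. */
        void update(T newValue, long now) {
            stored = newValue;
            lastRefresh = now;
        }

        /** Reads apply the visibility rule and, for ON_READ_AND_WRITE, restart the TTL. */
        T value(long now) {
            if (stored == null) {
                return null;
            }
            if (isExpired(now)) {
                T expired = stored;
                stored = null; // model assumption: an expired entry is dropped once it is read
                return visibility == Visibility.RETURN_EXPIRED_IF_NOT_CLEANED_UP ? expired : null;
            }
            if (updateType == UpdateType.ON_READ_AND_WRITE) {
                lastRefresh = now;
            }
            return stored;
        }

        /** Stands in for a cleanup pass that physically removes expired data. */
        void cleanUp(long now) {
            if (stored != null && isExpired(now)) {
                stored = null;
            }
        }

        private boolean isExpired(long now) {
            return now - lastRefresh >= ttlMillis;
        }
    }

    public static void main(String[] args) {
        TtlValue<String> writeOnly =
                new TtlValue<>(100, UpdateType.ON_CREATE_AND_WRITE, Visibility.NEVER_RETURN_EXPIRED);
        writeOnly.update("a", 0);
        System.out.println(writeOnly.value(50));   // a: still within the TTL
        System.out.println(writeOnly.value(120));  // null: the read at t=50 did not restart the TTL

        TtlValue<String> readRefresh =
                new TtlValue<>(100, UpdateType.ON_READ_AND_WRITE, Visibility.NEVER_RETURN_EXPIRED);
        readRefresh.update("a", 0);
        readRefresh.value(50);                      // restarts the TTL at t=50
        System.out.println(readRefresh.value(120)); // a: only 70 ms since the last read

        TtlValue<String> lenient =
                new TtlValue<>(100, UpdateType.ON_CREATE_AND_WRITE, Visibility.RETURN_EXPIRED_IF_NOT_CLEANED_UP);
        lenient.update("a", 0);
        System.out.println(lenient.value(120));     // a: expired but not yet cleaned up
    }
}
```

Passing the clock as a parameter keeps the example deterministic; a real state backend reads the time itself.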
The description of the TTL update strategy is reposted from: https://blog.csdn.net/weixin_42155491/article/details/104883019
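ValueState + side output demo: alert when vc jumps by more than 10
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Sensor is a class with the fields id, ts and vc; its definition is not part of this listing
public class Flink_State_ValueState_Process {
    // Side-output tag for alerts, declared once so the emitter and the consumer share it
    private static final OutputTag<String> ALERT_TAG = new OutputTag<String>("side") {};

    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        SingleOutputStreamOperator<Sensor> localhostDS = env.socketTextStream("localhost", 9999)
                // Parse "id,ts,vc" lines into Sensor objects
                .map(new MapFunction<String, Sensor>() {
                    @Override
                    public Sensor map(String value) throws Exception {
                        String[] strings = value.split(",");
                        return new Sensor(
                                strings[0],
                                Long.parseLong(strings[1]),
                                Integer.parseInt(strings[2])
                        );
                    }
                })
                .keyBy(Sensor::getId)
                .process(new KeyedProcessFunction<String, Sensor, Sensor>() {
                    // Last vc seen for the current key
                    private ValueState<Integer> valueState;

                    @Override
                    public void open(Configuration parameters) throws Exception {
                        valueState = getRuntimeContext().getState(
                                new ValueStateDescriptor<Integer>("va-state", Integer.class));
                    }

                    @Override
                    public void processElement(Sensor value, Context ctx, Collector<Sensor> out) throws Exception {
                        Integer vc = value.getVc();
                        Integer lastVa = valueState.value(); // null for the first element of a key
                        System.out.println(lastVa); // debug output
                        // Alert only when the change is strictly greater than 10
                        if (lastVa != null && Math.abs(vc - lastVa) > 10) {
                            // A timer set to the current processing time fires almost immediately
                            ctx.timerService().registerProcessingTimeTimer(ctx.timerService().currentProcessingTime());
                        }
                        valueState.update(vc);
                        out.collect(value);
                    }

                    @Override
                    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Sensor> out) throws Exception {
                        // Emit the alert to the side output rather than the main stream
                        ctx.output(ALERT_TAG, ctx.getCurrentKey() + " jumped by more than 10");
                    }
                });

        localhostDS.print("main");
        localhostDS.getSideOutput(ALERT_TAG).print("alert");
        env.execute();
    }
}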
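The demos construct and read Sensor objects, but the post does not show that class. The following is a minimal sketch inferred from the calls the demos make (a three-argument constructor plus getId, getTs and getVc). The field types, the setters, the no-argument constructor and the toString format are assumptions, chosen so that the class follows the usual JavaBean conventions.

```java
public class Sensor {
    private String id;   // sensor identifier, used as the key in keyBy
    private Long ts;     // timestamp parsed from the second input column
    private Integer vc;  // reading parsed from the third input column

    // No-argument constructor, part of the usual JavaBean conventions
    public Sensor() {
    }

    public Sensor(String id, Long ts, Integer vc) {
        this.id = id;
        this.ts = ts;
        this.vc = vc;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public Long getTs() {
        return ts;
    }

    public void setTs(Long ts) {
        this.ts = ts;
    }

    public Integer getVc() {
        return vc;
    }

    public void setVc(Integer vc) {
        this.vc = vc;
    }

    @Override
    public String toString() {
        return "Sensor(id=" + id + ", ts=" + ts + ", vc=" + vc + ")";
    }
}
```

With this class, an input line such as sensor_1,1000,10 on the socket would be parsed by the map step into a Sensor with id sensor_1, ts 1000 and vc 10.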
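MapState demo: deduplication
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class Flink_State_MapState {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        env.socketTextStream("localhost", 9999)
                // Parse "id,ts,vc" lines into Sensor objects
                .map(new MapFunction<String, Sensor>() {
                    @Override
                    public Sensor map(String value) throws Exception {
                        String[] strings = value.split(",");
                        return new Sensor(
                                strings[0],
                                Long.parseLong(strings[1]),
                                Integer.parseInt(strings[2])
                        );
                    }
                })
                .keyBy(Sensor::getId)
                .filter(new RichFilterFunction<Sensor>() {
                    // Per-key state holding the vc values already seen (the map is used as a set)
                    private MapState<Integer, Integer> mapState;

                    @Override
                    public void open(Configuration parameters) throws Exception {
                        mapState = getRuntimeContext().getMapState(new MapStateDescriptor<Integer, Integer>(
                                "map-state",
                                Integer.class,
                                Integer.class
                        ));
                    }

                    @Override
                    public boolean filter(Sensor value) throws Exception {
                        // Elements for which filter returns true are kept
                        if (mapState.contains(value.getVc())) {
                            // This vc was already seen for the current key: drop the element
                            return false;
                        } else {
                            mapState.put(value.getVc(), value.getVc());
                            return true;
                        }
                    }
                })
                .print("dedup");

        env.execute();
    }
}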
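ReducingState demo: running sum
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class Flink_State_ReducingState {
    public static void main(String[] args) throws Exception {
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        env.socketTextStream("localhost", 9999)
                // Parse "id,ts,vc" lines into Sensor objects
                .map(new MapFunction<String, Sensor>() {
                    @Override
                    public Sensor map(String value) throws Exception {
                        String[] strings = value.split(",");
                        return new Sensor(
                                strings[0],
                                Long.parseLong(strings[1]),
                                Integer.parseInt(strings[2])
                        );
                    }
                })
                .keyBy(Sensor::getId)
                // Use state programming to accumulate each sensor's water level (vc)
                .process(new KeyedProcessFunction<String, Sensor, Sensor>() {
                    // Per-key state holding the reduced Sensor
                    private ReducingState<Sensor> reducingState;

                    @Override
                    public void open(Configuration parameters) throws Exception {
                        // Initialize the state
                        reducingState = getRuntimeContext().getReducingState(new ReducingStateDescriptor<Sensor>(
                                "reduce-state",
                                new ReduceFunction<Sensor>() {
                                    @Override
                                    public Sensor reduce(Sensor value1, Sensor value2) throws Exception {
                                        // Keep the id, take the latest timestamp, and sum the vc values
                                        return new Sensor(
                                                value1.getId(),
                                                Math.max(value1.getTs(), value2.getTs()),
                                                value1.getVc() + value2.getVc());
                                    }
                                },
                                Sensor.class));
                    }

                    @Override
                    public void processElement(Sensor value, Context ctx, Collector<Sensor> out) throws Exception {
                        // add() merges the new element into the state through the ReduceFunction
                        reducingState.add(value);
                        Sensor sensor = reducingState.get();
                        out.collect(sensor);
                    }
                })
                .print();

        env.execute();
    }
}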