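State Programming in Flink
1. Keyed State
1.1. Value State (ValueState)
1.1.1. Definition
A value state stores exactly one value per key. ValueState itself is an interface, defined in the Flink source as follows: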
public interface ValueState<T> extends State {
    T value() throws IOException;
    void update(T value) throws IOException;
}
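To make the contract concrete, here is a minimal sketch in plain Java that imitates the operations a job typically calls on a value state: `value()`, `update()`, and `clear()` (the last one comes from the parent `State` interface). The class `InMemoryValueState` is a hypothetical stand-in written for this illustration, not a Flink class; real keyed state is scoped to the current key and managed by the state backend. The point is the null-until-updated behavior that a per-key counter has to handle.

```java
public class ValueStateSketch {

    /** Hypothetical in-memory stand-in for ValueState<T>; NOT part of Flink. */
    public static class InMemoryValueState<T> {
        private T current; // null means "nothing stored yet"

        public T value() { return current; }
        public void update(T value) { current = value; }
        public void clear() { current = null; }
    }

    /** Increment-or-initialize: the pattern a per-key counter uses. */
    public static long increment(InMemoryValueState<Long> state) {
        Long count = state.value();
        long next = (count == null) ? 1L : count + 1;
        state.update(next);
        return next;
    }

    public static void main(String[] args) {
        InMemoryValueState<Long> state = new InMemoryValueState<>();
        System.out.println("before any update: " + state.value());
        increment(state);
        increment(state);
        System.out.println("after two increments: " + state.value());
        state.clear();
        System.out.println("after clear: " + state.value());
    }
}
```

Reading the state before the first `update()` yields `null`, which is why the counter has to check for `null` before adding to it.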
1.1.2. Example
Use ValueState together with an event-time timer to emit each user's page-view (pv) count about every 10 seconds.
package com.hpsk.flink.state;
import com.hpsk.flink.beans.Event;
import com.hpsk.flink.source.EventWithWatermarkSource;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import java.sql.Timestamp;
public class ValueStateDS {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        DataStreamSource<Event> stream = env.addSource(new EventWithWatermarkSource());
        SingleOutputStreamOperator<String> result = stream.keyBy(t -> t.user)
                .process(new KeyedProcessFunction<String, Event, String>() {
                    // Two keyed states: the current pv count and the timestamp of the pending timer
                    private ValueState<Long> valueState;
                    private ValueState<Long> timerTsState;

                    @Override
                    public void open(Configuration parameters) throws Exception {
                        valueState = getRuntimeContext().getState(new ValueStateDescriptor<Long>("value-state", Long.class));
                        timerTsState = getRuntimeContext().getState(new ValueStateDescriptor<Long>("timerTs", Long.class));
                    }

                    @Override
                    public void processElement(Event value, Context ctx, Collector<String> collector) throws Exception {
                        // The state is null until the first update for this key
                        Long count = valueState.value();
                        if (count == null) {
                            valueState.update(1L);
                        } else {
                            valueState.update(count + 1);
                        }
                        // Register a timer only if none is pending for this key
                        if (timerTsState.value() == null) {
                            ctx.timerService().registerEventTimeTimer(value.timestamp + 10 * 1000L);
                            timerTsState.update(value.timestamp + 10 * 1000L);
                        }
                    }

                    @Override
                    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
                        out.collect("Time: " + new Timestamp(timestamp) + " -> user: " + ctx.getCurrentKey() + ", pv: " + valueState.value());
                        // Clear only the timer marker so the next event registers a new timer;
                        // the pv count itself keeps accumulating
                        timerTsState.clear();
                    }
                });
        result.print(">>>>");
        env.execute();
    }
}
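The interplay between the two states is easier to follow outside of Flink. The following is a minimal plain-Java sketch, not Flink code: the class `PvTimerSketch`, its two maps, and the `advanceWatermark` method are hypothetical stand-ins for keyed state and for the runtime firing event-time timers once the watermark reaches them. The user names and timestamps are made up for illustration. It shows that at most one timer is pending per user, and that each output is a running total because the count itself is never cleared.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PvTimerSketch {
    private final Map<String, Long> pvByUser = new HashMap<>();    // plays the role of valueState
    private final Map<String, Long> timerByUser = new HashMap<>(); // plays the role of timerTsState
    private final List<String> output = new ArrayList<>();

    /** Mirrors processElement: count the event, register a timer only if none is pending. */
    public void processElement(String user, long timestamp) {
        pvByUser.merge(user, 1L, Long::sum);
        timerByUser.putIfAbsent(user, timestamp + 10 * 1000L);
    }

    /** Stand-in for the runtime: fire every pending timer whose timestamp the watermark has reached. */
    public void advanceWatermark(long watermark) {
        // Iterate over a sorted copy (by user name) so removal is safe and the order is deterministic.
        for (Map.Entry<String, Long> timer : new TreeMap<>(timerByUser).entrySet()) {
            if (timer.getValue() <= watermark) {
                String user = timer.getKey();
                output.add(user + "@" + timer.getValue() + " pv=" + pvByUser.get(user));
                timerByUser.remove(user); // like timerTsState.clear(): the count is kept
            }
        }
    }

    public List<String> output() { return output; }

    public static void main(String[] args) {
        PvTimerSketch sketch = new PvTimerSketch();
        sketch.processElement("Mary", 1000L);
        sketch.processElement("Bob", 2000L);
        sketch.processElement("Mary", 3000L);
        sketch.advanceWatermark(3000L);   // no timer is due yet
        sketch.processElement("Mary", 11500L);
        sketch.advanceWatermark(11500L);  // Mary's timer (11000) is due
        sketch.processElement("Mary", 12500L);
        sketch.advanceWatermark(12500L);  // Bob's timer (12000) is due
        sketch.output().forEach(System.out::println);
    }
}
```

If a per-interval count is wanted instead of a running total, the count would also have to be cleared when the timer fires; the listing in this section does not do that.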
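1.2. List State (ListState)
1.2.1. Definition
A list state organizes the data to be stored as a list. The ListState interface also takes a type parameter T, the type of the elements in the list. ListState provides a set of methods for manipulating the state, and using it feels much like working with an ordinary List.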
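As a rough illustration of those methods, the sketch below imitates them in plain Java. The method names `get()`, `add()`, `addAll()`, `update()`, and `clear()` follow Flink's ListState interface, but `InMemoryListState` and the `toList` helper are hypothetical stand-ins written for this example, not Flink classes. In particular, this stand-in returns an empty iterable when nothing is stored, which is an assumption of the sketch rather than a statement about any Flink version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListStateSketch {

    /** Hypothetical in-memory stand-in for ListState<T>; NOT part of Flink. */
    public static class InMemoryListState<T> {
        private final List<T> items = new ArrayList<>();

        public Iterable<T> get() { return new ArrayList<>(items); } // read all elements
        public void add(T value) { items.add(value); }              // append one element
        public void addAll(List<T> values) { items.addAll(values); } // append several elements
        public void update(List<T> values) {                        // replace the whole list
            items.clear();
            items.addAll(values);
        }
        public void clear() { items.clear(); }                      // drop everything
    }

    /** Helper (hypothetical) that copies an Iterable into a List for printing and comparison. */
    public static <T> List<T> toList(Iterable<T> iterable) {
        List<T> result = new ArrayList<>();
        for (T item : iterable) {
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        InMemoryListState<String> state = new InMemoryListState<>();
        state.add("a");
        state.addAll(Arrays.asList("b", "c"));
        System.out.println(toList(state.get()));
        state.update(Arrays.asList("x"));
        System.out.println(toList(state.get()));
        state.clear();
        System.out.println(toList(state.get()));
    }
}
```

The main difference from an ordinary List is that reads go through `get()`, which hands back an `Iterable` rather than a `List`.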
1.2.2. Example
Use ListState to implement the equivalent of a SQL join.
package com.hpsk.flink.state;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;
public class ListStateDS {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        SingleOutputStreamOperator<Tuple3<String, String, Long>> stream1 = env.fromElements(
                Tuple3.of("a", "stream-1", 1000L),
                Tuple3.of("b", "stream-1", 2000L)
        ).assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple3<String, String, Long>>forMonotonousTimestamps().withTimestampAssigner(
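The listing is cut off at this point in the source, so the author's join logic is not available. Independently of it, the general idea of joining two keyed streams with per-key lists can be sketched in plain Java as follows. This is an assumption about the usual pattern, not a reconstruction of the original code: `ListJoinSketch`, `processLeft`, and `processRight` are hypothetical names, and the two maps stand in for one list state per input stream. Each arriving element is first matched against everything buffered from the other side under the same key, then buffered itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListJoinSketch {
    // One buffer per input, keyed by the join key (stand-ins for two list states).
    private final Map<String, List<String>> leftByKey = new HashMap<>();
    private final Map<String, List<String>> rightByKey = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    /** Element from the first stream: match against buffered right elements, then buffer it. */
    public void processLeft(String key, String value) {
        for (String right : rightByKey.getOrDefault(key, Collections.<String>emptyList())) {
            output.add(value + " => " + right);
        }
        leftByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Element from the second stream: match against buffered left elements, then buffer it. */
    public void processRight(String key, String value) {
        for (String left : leftByKey.getOrDefault(key, Collections.<String>emptyList())) {
            output.add(left + " => " + value);
        }
        rightByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public List<String> output() { return output; }

    public static void main(String[] args) {
        ListJoinSketch join = new ListJoinSketch();
        join.processLeft("a", "(a, stream-1, 1000)");
        join.processLeft("b", "(b, stream-1, 2000)");
        join.processRight("a", "(a, stream-2, 3000)");
        join.processRight("c", "(c, stream-2, 4000)"); // no left element with key c, so no output
        join.output().forEach(System.out::println);
    }
}
```

Because both buffers only grow, a long-running job following this pattern would also need some way to expire old entries.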

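This article covers state programming in Flink in detail, including the definitions and usage examples of the keyed state types: value state, list state, map state, reducing state, and aggregating state. It also discusses the concept of broadcast state and its application in real-time data warehouses.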