Flink's State Mechanism
State types
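- Managed state and raw state
managed state: Managed by the Flink runtime. The state is stored and restored automatically, and Flink optimizes how it is stored and persisted. When the job is rescaled, the state is automatically redistributed across the parallel instances. Flink provides common data structures for it, such as ValueState, ListState and MapState.
raw state: Managed by the developer and stored as byte arrays, so Flink knows nothing about its internal structure. It is used when implementing custom operators.
- Keyed state and operator state
keyed state: State on a KeyedStream. Each key has its own state. It can be thought of as operator state that has been partitioned by key.
operator state: Can be used on any operator. The operator instance in each slot (one parallel subtask) holds a single state, and every record that flows into that subtask can read and update it.
Both kinds of state are local: each operator subtask maintains the state store that belongs to that subtask, and subtasks cannot access each other's state.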
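The scoping rules above can be modelled in plain Java. This is only a sketch and does not use Flink's API: the class `StateScopeSketch`, the `Subtask` type and the `route` helper are hypothetical names made up for illustration, and the hash-modulo routing is a simplification of how Flink actually assigns keys to subtasks. It shows that keyed state changes only for the current record's key, that operator state is shared by every record one subtask sees, and that each subtask only ever touches its own local state.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of state scoping. Hypothetical names, NOT Flink API. */
public class StateScopeSketch {

    /** One parallel subtask of an operator, holding only its own local state. */
    static class Subtask {
        // Models keyed state: one independent value per key.
        private final Map<String, Long> keyedState = new HashMap<>();
        // Models operator state: one value shared by every record this subtask sees.
        private long operatorState = 0L;

        void process(String key) {
            keyedState.merge(key, 1L, Long::sum); // only the current key's entry changes
            operatorState++;                      // changes regardless of the key
        }

        long keyedCount(String key) {
            return keyedState.getOrDefault(key, 0L);
        }

        long operatorCount() {
            return operatorState;
        }
    }

    /** Simplified stand-in for keyBy: the same key always goes to the same subtask. */
    static int route(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        Subtask[] subtasks = { new Subtask(), new Subtask() };
        for (String key : new String[] {"a", "b", "a", "c", "a"}) {
            subtasks[route(key, subtasks.length)].process(key);
        }
        for (int i = 0; i < subtasks.length; i++) {
            System.out.println("subtask " + i + ": a=" + subtasks[i].keyedCount("a")
                    + ", records seen=" + subtasks[i].operatorCount());
        }
    }
}
```

Running `main` prints one line per subtask. The count for key `a` appears only on the subtask that owns that key, while each subtask's "records seen" counter covers every key routed to it.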
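Code implementation
To use keyed state on a KeyedStream, the operator's function has to extend the Rich variant of the function class, for example RichFlatMapFunction instead of FlatMapFunction, because state is obtained through the RuntimeContext that rich functions expose.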
package com.lagou.state;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
public class StateDemo {
    public static void main(String[] args) throws Exception {
        // (1,3)(1,5)(1,7)(1,4)(1,2)
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Trigger a checkpoint every 2000 ms
        env.enableCheckpointing(2000);
        DataStreamSource<String> data = env.socketTextStream("hdp-