A Deep Dive into State Implementation in Flink

哥伦布112

Original article: https://blog.csdn.net/u013939918/article/details/107068565

State hierarchy
keyedState => windowState
OperatorState => kafkaOffset
stateBackend
snapshot/restore
internalTimerService
A first look at RocksDB operations
state TTL
state local recovery
QueryableState
incremental checkpoint
state redistribution
broadcast state
CheckpointStreamFactory

Internal and External State

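Flink's state interfaces come in two layers: internal interfaces and external (user-facing) interfaces. The two layers correspond one-to-one, and every internal interface extends its external counterpart. The split serves two main purposes:

1. The internal interfaces offer additional methods, including access to the serialized bytes of a state and operations on its namespace. Internal state is mainly used for the state the runtime itself needs, such as the windowState in windowing, the SharedBuffer in CEP, and the ListState the Kafka consumer uses to manage offsets. The external State interfaces, by contrast, are meant for user-defined state.
2. Compatibility: the external interfaces must remain compatible across Flink versions, whereas the internal interfaces are largely free of this constraint and can therefore evolve more flexibly.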
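To make the layering concrete, here is a toy plain-Java model. The names echo Flink's (ValueState, InternalKvState, InternalValueState), but these interfaces are heavily simplified stand-ins written for illustration only; the real Flink methods have richer signatures. The point it shows: one state object implements both layers, so runtime code can switch namespaces (for example, one namespace per window) while user code only sees the plain ValueState view.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class StateLayersDemo {
    public static void main(String[] args) {
        HeapValueState<String, Long> state = new HeapValueState<>();

        // Runtime code (e.g. a window operator) uses the internal view to pick a namespace.
        state.setCurrentNamespace("window-1");
        state.update(3L);
        state.setCurrentNamespace("window-2");
        state.update(7L);

        // User code only ever sees the external ValueState view of the same object.
        ValueState<Long> userView = state;
        System.out.println(userView.value()); // prints 7
        state.setCurrentNamespace("window-1");
        System.out.println(userView.value()); // prints 3
    }
}

/** Simplified stand-in for the user-facing ValueState interface. */
interface ValueState<T> {
    T value();
    void update(T value);
}

/** Hypothetical, simplified internal extras: namespace switching and serialized access. */
interface InternalKvState<N> {
    void setCurrentNamespace(N namespace);
    byte[] getSerializedValue();
}

/** The internal interface extends the external one, so one object serves both layers. */
interface InternalValueState<N, T> extends InternalKvState<N>, ValueState<T> {}

/** Toy heap implementation: one value per namespace. */
class HeapValueState<N, T> implements InternalValueState<N, T> {
    private final Map<N, T> valuesByNamespace = new HashMap<>();
    private N currentNamespace;

    public void setCurrentNamespace(N namespace) { this.currentNamespace = namespace; }

    public T value() { return valuesByNamespace.get(currentNamespace); }

    public void update(T value) { valuesByNamespace.put(currentNamespace, value); }

    public byte[] getSerializedValue() {
        // Toy "serializer": the value's string form as UTF-8 bytes.
        return String.valueOf(value()).getBytes(StandardCharsets.UTF_8);
    }
}
```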

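Using State

With Flink's state hierarchy in mind, how is this state actually used, both in user programs and inside Flink itself?

Flink uses state in two main places: inside functions and inside operators.

Mechanisms: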

CheckpointedFunction
ListCheckpointed
RuntimeContext (DefaultKeyedStateStore)
StateContext
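Of these, the RuntimeContext route is the one most user code meets: a rich function obtains a state handle once (for example through getRuntimeContext().getState(...)), and every later read or write is implicitly scoped to the key of the record being processed. The plain-Java sketch below models only that scoping idea; KeyedBackend, KeyedValueState and setCurrentKey are hypothetical names, not Flink classes.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyedStateDemo {
    public static void main(String[] args) {
        KeyedBackend backend = new KeyedBackend();
        // Obtain the handle once, as a rich function would in open().
        KeyedValueState<Long> count = backend.getState("count");

        for (String key : new String[] {"a", "b", "a", "a"}) {
            backend.setCurrentKey(key);            // the runtime does this before each record
            Long current = count.value();          // user code never passes the key itself
            count.update(current == null ? 1L : current + 1);
        }

        backend.setCurrentKey("a");
        System.out.println(count.value()); // prints 3
        backend.setCurrentKey("b");
        System.out.println(count.value()); // prints 1
    }
}

/** Hypothetical keyed backend: one table per state name, indexed by record key. */
class KeyedBackend {
    private final Map<String, Map<Object, Object>> tables = new HashMap<>();
    private Object currentKey;

    void setCurrentKey(Object key) { this.currentKey = key; }

    Object getCurrentKey() { return currentKey; }

    <T> KeyedValueState<T> getState(String name) {
        return new KeyedValueState<>(tables.computeIfAbsent(name, n -> new HashMap<>()), this);
    }
}

/** Hypothetical value-state handle: every access is routed to the backend's current key. */
class KeyedValueState<T> {
    private final Map<Object, Object> table;
    private final KeyedBackend backend;

    KeyedValueState(Map<Object, Object> table, KeyedBackend backend) {
        this.table = table;
        this.backend = backend;
    }

    @SuppressWarnings("unchecked")
    T value() { return (T) table.get(backend.getCurrentKey()); }

    void update(T value) { table.put(backend.getCurrentKey(), value); }
}
```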

StateContext

StateInitializationContext

Iterable<StatePartitionStreamProvider> getRawOperatorStateInputs();

Iterable<KeyGroupStatePartitionStreamProvider> getRawKeyedStateInputs();

ManagedInitializationContext

OperatorStateStore getOperatorStateStore();
KeyedStateStore getKeyedStateStore();
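The context above exposes two kinds of state: raw state, which the operator writes as its own bytes and reads back through getRawOperatorStateInputs()/getRawKeyedStateInputs(), and managed state, which is registered in the OperatorStateStore/KeyedStateStore and snapshotted by Flink. The toy plain-Java sketch below contrasts the two; ManagedStore and the in-memory "snapshot" are hypothetical simplifications, not Flink's actual checkpoint mechanism.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RawVsManagedDemo {
    public static void main(String[] args) throws IOException {
        // Managed: the operator only registers a named structure; the store snapshots it.
        ManagedStore store = new ManagedStore();
        store.getListState("offsets").addAll(List.of(42L, 43L));
        Map<String, List<Long>> managedSnapshot = store.snapshot();

        // Raw: the operator serializes its own bytes into a stream it is handed.
        ByteArrayOutputStream rawBytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(rawBytes)) {
            out.writeLong(7L);
            out.writeUTF("pending");
        }

        // Restore: managed state comes back as structures, raw state as bytes to decode.
        ManagedStore restored = ManagedStore.restore(managedSnapshot);
        System.out.println(restored.getListState("offsets")); // prints [42, 43]
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(rawBytes.toByteArray()))) {
            long counter = in.readLong();
            String status = in.readUTF();
            System.out.println(counter + " " + status); // prints 7 pending
        }
    }
}

/** Hypothetical managed store: named lists plus deep-copied snapshots. */
class ManagedStore {
    private final Map<String, List<Long>> lists = new HashMap<>();

    List<Long> getListState(String name) {
        return lists.computeIfAbsent(name, n -> new ArrayList<>());
    }

    Map<String, List<Long>> snapshot() {
        Map<String, List<Long>> copy = new HashMap<>();
        lists.forEach((name, values) -> copy.put(name, new ArrayList<>(values)));
        return copy;
    }

    static ManagedStore restore(Map<String, List<Long>> snapshot) {
        ManagedStore store = new ManagedStore();
        snapshot.forEach((name, values) -> store.lists.put(name, new ArrayList<>(values)));
        return store;
    }
}
```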

Examples:

AbstractStreamOperator provides the method initializeState(StateInitializationContext context), through which an operator manages both its raw and its managed state.
CheckpointedFunction is likewise implemented on top of this state context.
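The delegation described above can be sketched in plain Java: the operator owns the initialization context, sets up its own state, and then hands the same context to the wrapped user function if that function opted in. InitContext, CheckpointedFn, UdfOperator and CountingFunction are hypothetical names for illustration; they are not Flink's classes, and the context is reduced to a single restored list.

```java
import java.util.List;

public class UdfOperatorDemo {
    public static void main(String[] args) {
        CountingFunction fn = new CountingFunction();
        UdfOperator op = new UdfOperator(fn);
        // Pretend these two entries were restored from a checkpoint.
        op.initializeState(new InitContext(List.of(5L, 10L), true));
        System.out.println(fn.localCount); // prints 15
    }
}

/** Hypothetical, heavily simplified initialization context. */
class InitContext {
    final List<Long> operatorListState;
    final boolean restored;

    InitContext(List<Long> operatorListState, boolean restored) {
        this.operatorListState = operatorListState;
        this.restored = restored;
    }
}

/** Hypothetical mirror of the CheckpointedFunction opt-in (initialization half only). */
interface CheckpointedFn {
    void initializeState(InitContext context);
}

/** Hypothetical operator that wraps a user function. */
class UdfOperator {
    private final Object userFunction;

    UdfOperator(Object userFunction) { this.userFunction = userFunction; }

    void initializeState(InitContext context) {
        // The operator would set up its own raw/managed state here first,
        // then forward the same context to the user function if it opted in.
        if (userFunction instanceof CheckpointedFn) {
            ((CheckpointedFn) userFunction).initializeState(context);
        }
    }
}

/** User function that rebuilds a local counter from restored operator (list) state. */
class CountingFunction implements CheckpointedFn {
    long localCount;

    @Override
    public void initializeState(InitContext context) {
        if (context.restored) {
            for (Long l : context.operatorListState) {
                localCount += l;
            }
        }
    }
}
```

A plain function that does not implement the opt-in interface is simply skipped, which mirrors the idea that only functions implementing CheckpointedFunction take part in state initialization.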

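CheckpointedFunction#initializeState is called when each parallel instance of the transformation function is initialized. It receives a FunctionInitializationContext, from which you can obtain either the OperatorStateStore or the KeyedStateStore. In other words, this interface lets you register both kinds of state, which is what sets it apart from the ListCheckpointed interface. The only restriction is that the KeyedStateStore can be used only on a KeyedStream; anywhere else, an error is raised. The following example uses both kinds of state. FlinkKafkaConsumerBase also manages its offsets through this interface.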

public class MyFunction<T> implements MapFunction<T, T>, CheckpointedFunction {

     private ReducingState<Long> countPerKey;
     private ListState<Long> countPerPartition;

     private long localCount;

     public void initializeState(FunctionInitializationContext context) throws Exception {
         // get the state data structure for the per-key state
         // (AddFunction is assumed to be a user-defined ReduceFunction that sums Long values; it is not a Flink class)
         countPerKey = context.getKeyedStateStore().getReducingState(
                 new ReducingStateDescriptor<>("perKeyCount", new AddFunction<>(), Long.class));

         // get the state data structure for the per-partition state
         // (getListState replaces the deprecated getOperatorState)
         countPerPartition = context.getOperatorStateStore().getListState(
                 new ListStateDescriptor<>("perPartitionCount", Long.class));

         // initialize the "local count variable" based on the operator state
         for (Long l : countPerPartition.get()) {
             localCount += l;
         }
     }

     public void snapshotState(FunctionSnapshotContext context) throws Exception {
         // the keyed state is always up to date anyways
         // just bring the per-partition state in shape
         countPerPartition.clear();
         countPerPartition.add(localCount);
     }

     public T map(T value) throws Exception {
         // update the states
         countPerKey.add(1L);
         localCount++;