Flink Function Series (2) - CheckpointedFunction

To use operator state (non-keyed state), you can implement the CheckpointedFunction interface to build a stateful function.

Key points:

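1. CheckpointedFunction is the core interface for stateful transformation functions, i.e. functions that maintain state across individual stream records. Lighter-weight alternatives exist if you do not want to implement this interface: for operator state there is ListCheckpointed, which is now deprecated; for keyed state you can use the RuntimeContext, which is available in a RichFunction (so implementing a RichFunction is enough). This interface, however, offers the most flexibility for managing both keyed state and operator state.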

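2. snapshotState is called whenever a checkpoint is taken. initializeState is called every time the user-defined function is initialized, both the very first time and when the function is recovered from an earlier checkpoint. It is therefore the place not only to initialize state but also to implement the state-recovery logic.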

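3. Managed operator state is currently supported only in list style: the state must be a List of serializable objects, so that it can be redistributed when the job is rescaled. There are two redistribution schemes. With Even-split redistribution, on restore/redistribution each parallel operator instance receives only a sublist of the whole state. With Union redistribution, on restore/redistribution each operator instance receives the complete list of the whole state; when the state is large this can cause out-of-memory errors or oversized RPC frames.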
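In Flink the scheme is chosen when the state is registered: OperatorStateStore.getListState(descriptor) gives even-split redistribution, while getUnionListState(descriptor) gives union redistribution. The difference between the two can be sketched without Flink as plain list manipulation. RedistributionModel and its methods are hypothetical names used only for illustration, and the contiguous chunking used for even-split is a simplifying assumption; Flink's own repartitioner may assign elements in a different order.

```java
import java.util.ArrayList;
import java.util.List;

// Flink-free model of the two operator-state redistribution schemes.
// Hypothetical helper for illustration only: the contiguous chunking in evenSplit
// is an assumption, and Flink's real repartitioner may order elements differently.
final class RedistributionModel {

    // Logically, the whole operator state is the concatenation of all per-instance lists.
    static <T> List<T> concat(List<List<T>> perInstanceState) {
        List<T> all = new ArrayList<>();
        for (List<T> part : perInstanceState) {
            all.addAll(part);
        }
        return all;
    }

    // Even-split: each new instance receives one sublist of roughly equal size.
    static <T> List<List<T>> evenSplit(List<List<T>> perInstanceState, int newParallelism) {
        List<T> all = concat(perInstanceState);
        int base = all.size() / newParallelism;
        int remainder = all.size() % newParallelism;
        List<List<T>> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < newParallelism; i++) {
            // the first `remainder` instances get one extra element
            int size = base + (i < remainder ? 1 : 0);
            result.add(new ArrayList<>(all.subList(start, start + size)));
            start += size;
        }
        return result;
    }

    // Union: every new instance receives the complete list.
    static <T> List<List<T>> union(List<List<T>> perInstanceState, int newParallelism) {
        List<T> all = concat(perInstanceState);
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>(all));
        }
        return result;
    }
}
```

For example, going from parallelism 2 with lists [a, b, c] and [d, e] to parallelism 3, this model's evenSplit yields [a, b], [c, d], [e], while union hands [a, b, c, d, e] to each of the three instances. The union case makes it clear why a large state multiplies memory use on restore.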

CheckpointedFunction declares two methods:

/**
 * This method is called when a snapshot for a checkpoint is requested. This acts as a hook to
 * the function to ensure that all state is exposed by means previously offered through {@link
 * FunctionInitializationContext} when the Function was initialized, or offered now by {@link
 * FunctionSnapshotContext} itself.
 *
 * @param context the context for drawing a snapshot of the operator
 * @throws Exception Thrown, if state could not be created or restored.
 */
void snapshotState(FunctionSnapshotContext context) throws Exception;

/**
 * This method is called when the parallel function instance is created during distributed
 * execution. Functions typically set up their state storing data structures in this method.
 *
 * @param context the context for initializing the operator
 * @throws Exception Thrown, if state could not be created or restored.
 */
void initializeState(FunctionInitializationContext context) throws Exception;

FunctionSnapshotContext extends the ManagedSnapshotContext interface, which defines getCheckpointId and getCheckpointTimestamp. FunctionInitializationContext extends the ManagedInitializationContext interface, which defines isRestored, getOperatorStateStore and getKeyedStateStore: these let you check whether the state was restored from a snapshot of a previous execution, and obtain the OperatorStateStore and KeyedStateStore objects.
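The call order from point 2 and the role of the two context objects can be made concrete with a Flink-free sketch. InitContext, SnapshotContext, TracingFunction and LifecycleDemo are hypothetical stand-ins written only for illustration; they mimic isRestored(), getCheckpointId() and the two CheckpointedFunction hooks, but they are not Flink types, and a plain List plays the role of ListState.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FunctionInitializationContext (not the Flink type).
final class InitContext {
    final boolean restored;        // plays the role of isRestored()
    final List<String> listState;  // plays the role of a ListState from the operator state store

    InitContext(boolean restored, List<String> listState) {
        this.restored = restored;
        this.listState = listState;
    }
}

// Hypothetical stand-in for FunctionSnapshotContext (not the Flink type).
final class SnapshotContext {
    final long checkpointId;       // plays the role of getCheckpointId()

    SnapshotContext(long checkpointId) {
        this.checkpointId = checkpointId;
    }
}

// Toy function following the CheckpointedFunction pattern; it logs each hook call.
final class TracingFunction {
    private final List<String> log;
    private List<String> state;
    private String lastSeen = "none";   // live value, kept in a plain field

    TracingFunction(List<String> log) {
        this.log = log;
    }

    void initializeState(InitContext ctx) {
        state = ctx.listState;
        if (ctx.restored && !state.isEmpty()) {
            lastSeen = state.get(0);      // the recovery logic lives here as well
        }
        log.add("init restored=" + ctx.restored + " lastSeen=" + lastSeen);
    }

    void process(String record) {
        lastSeen = record;
    }

    void snapshotState(SnapshotContext ctx) {
        state.clear();                    // overwrite the previous snapshot, do not append
        state.add(lastSeen);
        log.add("snapshot checkpointId=" + ctx.checkpointId);
    }
}

final class LifecycleDemo {
    // Fresh start -> records -> checkpoint 1 -> more records -> failure -> restore from checkpoint 1.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        List<String> durable = new ArrayList<>();           // pretend checkpoint storage
        TracingFunction first = new TracingFunction(log);
        first.initializeState(new InitContext(false, durable));
        first.process("a");
        first.process("b");
        first.snapshotState(new SnapshotContext(1L));
        first.process("c");                                 // not covered by checkpoint 1
        TracingFunction second = new TracingFunction(log);  // new instance after the failure
        second.initializeState(new InitContext(true, new ArrayList<>(durable)));
        return log;
    }
}
```

LifecycleDemo.run() returns the log [init restored=false lastSeen=none, snapshot checkpointId=1, init restored=true lastSeen=b]. initializeState runs on both the fresh start and the recovery, and only isRestored tells them apart; the record c, processed after checkpoint 1, is missing from the restored state, and in a real job a replayable source would have to re-deliver it.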

Example 1: BufferingSink

Below is an example of a stateful SinkFunction that uses CheckpointedFunction to buffer elements before sending them to the outside world. It demonstrates list state with even-split redistribution:

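import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

import java.util.ArrayList;
import java.util.List;

public class BufferingSink
        implements SinkFunction<Tuple2<String, Integer>>,
                   CheckpointedFunction {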

    private final int threshold;

    private transient ListState<Tuple2<String, Integer>> checkpointedState;

    private List<Tuple2<String, Integer>> bufferedElements;

    public BufferingSink(int threshold) {
        this.threshold = threshold;
        this.bufferedElements = new ArrayList<>();
    }

    @Override
    public void invoke(Tuple2<String, Integer> value, Context context) throws Exception {
        // buffer elements before sending them to the sink; once the threshold is reached,
        // flush the whole buffer to the sink and clear it
        bufferedElements.add(value);
        if (bufferedElements.size() >= threshold) {
            for (Tuple2<String, Integer> element: bufferedElements) {
                // send it to the sink
            }
            bufferedElements.clear();
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // before each checkpoint is taken, copy the latest buffered elements into the list state
        checkpointedState.clear();
        for (Tuple2<String, Integer> element : bufferedElements) {
            checkpointedState.add(element);
        }
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        ListStateDescriptor<Tuple2<String, Integer>> descriptor =
            new ListStateDescriptor<>(
                "buffered-elements",
                TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {}));

        // initialize the ListState
        checkpointedState = context.getOperatorStateStore().getListState(descriptor);

        // if restoring from a checkpoint, add the checkpointed elements back into the buffer
        if (context.isRestored()) {
            for (Tuple2<String, Integer> element : checkpointedState.get()) {
                bufferedElements.add(element);
            }
        }
    }
}
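Why BufferingSink snapshots bufferedElements at all can be shown with a Flink-free sketch: elements sitting in the buffer between flushes exist only in memory, so without the list state they would be lost on failure. BufferingSinkModel and its methods are hypothetical names; a plain List copy stands in for the ListState that Flink would persist, and Strings stand in for Tuple2<String, Integer>.

```java
import java.util.ArrayList;
import java.util.List;

// Flink-free model of BufferingSink. A plain List copy stands in for ListState.
final class BufferingSinkModel {
    private final int threshold;
    private final List<String> bufferedElements = new ArrayList<>();
    // Elements "sent to the external system"; passed in so the caller can observe them.
    final List<String> flushed;

    BufferingSinkModel(int threshold, List<String> flushed) {
        this.threshold = threshold;
        this.flushed = flushed;
    }

    // Mirrors invoke(): buffer the element, flush everything once the threshold is reached.
    void invoke(String value) {
        bufferedElements.add(value);
        if (bufferedElements.size() >= threshold) {
            flushed.addAll(bufferedElements);
            bufferedElements.clear();
        }
    }

    // Mirrors snapshotState(): copy the current buffer into the "list state".
    List<String> snapshotState() {
        return new ArrayList<>(bufferedElements);
    }

    // Mirrors initializeState() when isRestored() is true.
    void restore(List<String> checkpointedState) {
        bufferedElements.addAll(checkpointedState);
    }

    int bufferedCount() {
        return bufferedElements.size();
    }
}
```

With a threshold of 3, buffering a and b, taking a snapshot, and restoring a fresh instance from it leaves two buffered elements; one more invoke("c") then flushes a, b, c together. Without the snapshot, a and b would never reach the external system. Conversely, assuming a replayable source, elements flushed after the last completed checkpoint are re-delivered after recovery and flushed again, so this pattern yields at-least-once rather than exactly-once output.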

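Example 2: CountFunction, which uses keyed state and operator state together. Because it accesses keyed state, it has to be applied on a KeyedStream (i.e. after keyBy):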

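import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

public class CountFunction<T> implements MapFunction<T, T>, CheckpointedFunction {

    // keyed state: one running count per key (requires a KeyedStream)
    private transient ReducingState<Long> countPerKey;
    // operator state: the count of this parallel instance (partition)
    private transient ListState<Long> countPerPartition;

    private long localCount;

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        // get the state data structure for the per-key state;
        // the reduce function simply adds two counts together
        countPerKey = context.getKeyedStateStore().getReducingState(
                new ReducingStateDescriptor<>("perKeyCount", (a, b) -> a + b, Long.class));

        // get the state data structure for the per-partition state
        // (getListState uses even-split redistribution)
        countPerPartition = context.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("perPartitionCount", Long.class));

        // initialize the "local count variable" based on the operator state;
        // after rescaling, this instance may receive several entries or none
        for (Long l : countPerPartition.get()) {
            localCount += l;
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // the keyed state is always up to date anyway;
        // just bring the per-partition state in shape
        countPerPartition.clear();
        countPerPartition.add(localCount);
    }

    @Override
    public T map(T value) throws Exception {
        // update the states
        countPerKey.add(1L);
        localCount++;

        return value;
    }
}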
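CountFunction writes localCount as a one-element list in snapshotState and sums every element it receives in initializeState. The summing matters because, with even-split redistribution, a new instance may receive several old counts after scaling down, or none after scaling up. The Flink-free sketch below models only that arithmetic; CountRescaleModel is a hypothetical name, plain lists stand in for ListState<Long>, and the assignments in the usage are hand-written examples rather than Flink's actual assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Flink-free model of CountFunction's per-partition count (operator state) across a rescale.
final class CountRescaleModel {

    // Mirrors snapshotState(): every old instance writes its localCount as a one-element list.
    static List<List<Long>> snapshot(long... localCounts) {
        List<List<Long>> perInstance = new ArrayList<>();
        for (long c : localCounts) {
            perInstance.add(List.of(c));
        }
        return perInstance;
    }

    // Mirrors initializeState(): a new instance sums every entry it was assigned,
    // which may be zero, one, or several old counts depending on the new parallelism.
    static long restore(List<Long> assignedEntries) {
        long localCount = 0;
        for (long c : assignedEntries) {
            localCount += c;
        }
        return localCount;
    }
}
```

Scaling two instances with counts 5 and 7 down to one gives a single restored count of 12; scaling up to three could give 5, 7 and 0. Either way the total is preserved, which would not hold if initializeState kept only the first entry.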
