状态初始化源码解读

TaskManager启动Task线程后,会调用StreamTask#invoke()来触发当前Task内StreamOperator的执行。其中会在beforeInvoke()来初始化StreamOperator的State。

/**
 * 对当前StreamTask内所有的StreamOperator,(提供一个上下文来)初始化(算子 or 键控)State
 */
private void initializeStateAndOpen() throws Exception {

    // 取出OperatorChain中所有的StreamOperator
    StreamOperator<?>[] allOperators = operatorChain.getAllOperators();

    // 遍历这些StreamOperator,对每个StreamOperator,进行状态初始化
    for (StreamOperator<?> operator : allOperators) {
        if (null != operator) {
            // 对每个StreamOperator进行“状态初始化”:提供一个包含各种xxxStateStore的上下文StateInitializationContext
            // 本质:将(拥有xxxStateStore的)StateInitializationContext提供给自定义Function使用(基于xxxStateStore初始化xxxState)
            operator.initializeState();
            // 调用StreamOperator的open()方法,执行open()方法内的初始化逻辑
            // 例如:(实现了RichFunction的自定义函数)可以在open()方法中去getXXXState
            operator.open();
        }
    }
}

在对StreamOperator内的状态进行初始化后,就会执行StreamOperator内自定义Function中的open()方法。

/**
 * 对StreamOperator进行“状态初始化”:提供一个上下文,确定使用哪个StateBackend
 */
@Override
public final void initializeState() throws Exception {

    // 从ExecutionConfig中获取Key的序列化器
    final TypeSerializer<?> keySerializer = config.getStateKeySerializer(getUserCodeClassloader());

    // 当前StreamOperator所属的StreamTask
    final StreamTask<?, ?> containingTask =
        Preconditions.checkNotNull(getContainingTask());
    final CloseableRegistry streamTaskCloseableRegistry =
        Preconditions.checkNotNull(containingTask.getCancelables());
    /**
	 * StreamTaskStateInitializer是专门用来创建StreamOperatorStateContext的,
	 * StreamOperator必须得通过StreamOperatorStateContext,来初始化StateBackend、InternalTimeServiceManager等。
	 */
    final StreamTaskStateInitializer streamTaskStateManager =
        Preconditions.checkNotNull(containingTask.createStreamTaskStateInitializer());

    /**
	 * StreamOperatorStateContext存储了键控状态、算子状态的状态后端,以及管理时间服务的InternalTimeServiceManager
	 */
    final StreamOperatorStateContext context =
        // 创建AbstractKeyedStateBackend、OperatorStateBackend以及管理时间服务的InternalTimeServiceManager,并保存起来
        streamTaskStateManager.streamOperatorStateContext(
        getOperatorID(),
        getClass().getSimpleName(),
        getProcessingTimeService(),
        this,
        keySerializer,
        streamTaskCloseableRegistry,
        metrics);

    // 获取算子状态的状态后端
    this.operatorStateBackend = context.operatorStateBackend();
    // 获取键控状态的状态后端
    this.keyedStateBackend = context.keyedStateBackend();

    if (keyedStateBackend != null) {
        // 将KeyedStateBackend进一步包装成代理类:KeyedStateStore
        this.keyedStateStore = new DefaultKeyedStateStore(keyedStateBackend, getExecutionConfig());
    }

    // 获取InternalTimeServiceManager(被用来管理内部专用的InternalTimerService),以便能在当前StreamOperator中注册/管理Timer
    timeServiceManager = context.internalTimerServiceManager();

    // 提供了创建、获取KeyedState的方法
    CloseableIterable<KeyGroupStatePartitionStreamProvider> keyedStateInputs = context.rawKeyedStateInputs();
    // 提供了创建、获取OperatorState的方法
    CloseableIterable<StatePartitionStreamProvider> operatorStateInputs = context.rawOperatorStateInputs();

    try {
        /**
		 * 将各种xxxStateBackend进一步包装成xxxStateStore(状态后端的代理类),保存到StateInitializationContext。因此它就具备了管理算子状态、键控状态、原生状态的能力!
		 * StateInitializationContext会被用在StreamOperator、自定义Function中,(通过内部保存的xxxStateStore)来初始化State(getState、getMapState...)
		 */
        StateInitializationContext initializationContext = new StateInitializationContextImpl(
            context.isRestored(), // information whether we restore or start for the first time
            operatorStateBackend, // access to operator state backend
            keyedStateStore, // access to keyed state backend
            keyedStateInputs, // access to keyed state stream
            operatorStateInputs); // access to operator state stream

        // AbstractUdfStreamOperator使用自定义CheckpointedFunction时,会通过这个接口方法来初始化状态
        // 本质:利用(拥有xxxStateStore的)StateInitializationContext,来初始化StreamOperator中的xxxState
        initializeState(initializationContext);
    } finally {
        closeFromRegistry(operatorStateInputs, streamTaskCloseableRegistry);
        closeFromRegistry(keyedStateInputs, streamTaskCloseableRegistry);
    }
}

在以上方法中,核心操作就是“包装上下文”

核心 1:StreamTaskStateInitializer

StreamTaskStateInitializer是专门用来创建StreamOperatorStateContext的,只有把StreamOperatorStateContext暴露给StreamOperator,才能进一步的创建xxxStateBackend。可以说,StreamTaskStateInitializer就是连接StreamOperator和StreamOperatorStateContext的桥梁。

核心 2:StreamOperatorStateContext

在StreamTaskStateInitializer创建StreamOperatorStateContext的过程中,会创建好键控状态、算子状态的StateBackend,以及管理时间服务的InternalTimeServiceManager。然后将它们包装到StreamOperatorStateContext里面保存起来。

这样一来,AbstractStreamOperator就能对各种StateBackend和InternalTimeServiceManager,“随用随取”

/**
 * 基于StateBackend的实现子类提供的“创建xxxStateBackend”的具体实现逻辑,得到xxxStateBackend。
 * 然后初始化InternalTimeServiceManager。
 */
@Override
public StreamOperatorStateContext streamOperatorStateContext(
    @Nonnull OperatorID operatorID,
    @Nonnull String operatorClassName,
    @Nonnull ProcessingTimeService processingTimeService,
    @Nonnull KeyContext keyContext,
    @Nullable TypeSerializer<?> keySerializer,
    @Nonnull CloseableRegistry streamTaskCloseableRegistry,
    @Nonnull MetricGroup metricGroup) throws Exception {

    // 获取TaskInfo
    TaskInfo taskInfo = environment.getTaskInfo();
    // StreamOperator的SubTask描述(包含OperatorID、OperatorClassName),最终形态是String
    OperatorSubtaskDescriptionText operatorSubtaskDescription =
        new OperatorSubtaskDescriptionText(
        operatorID,
        operatorClassName,
        taskInfo.getIndexOfThisSubtask(),
        taskInfo.getNumberOfParallelSubtasks());

    final String operatorIdentifierText = operatorSubtaskDescription.toString();

    final PrioritizedOperatorSubtaskState prioritizedOperatorSubtaskStates =
        taskStateManager.prioritizedOperatorState(operatorID);

    AbstractKeyedStateBackend<?> keyedStatedBackend = null;
    OperatorStateBackend operatorStateBackend = null;
    CloseableIterable<KeyGroupStatePartitionStreamProvider> rawKeyedStateInputs = null;
    CloseableIterable<StatePartitionStreamProvider> rawOperatorStateInputs = null;
    InternalTimeServiceManager<?> timeServiceManager;

    try {

        // -------------- Keyed State Backend --------------
        // 创建键控状态的状态后端
        keyedStatedBackend = keyedStatedBackend(
            keySerializer,
            operatorIdentifierText,
            prioritizedOperatorSubtaskStates,
            streamTaskCloseableRegistry,
            metricGroup);

        // -------------- Operator State Backend --------------
        // 创建算子状态的状态后端
        operatorStateBackend = operatorStateBackend(
            operatorIdentifierText,
            prioritizedOperatorSubtaskStates,
            streamTaskCloseableRegistry);

        // -------------- Raw State Streams --------------
        rawKeyedStateInputs = rawKeyedStateInputs(
            prioritizedOperatorSubtaskStates.getPrioritizedRawKeyedState().iterator());
        streamTaskCloseableRegistry.registerCloseable(rawKeyedStateInputs);

        rawOperatorStateInputs = rawOperatorStateInputs(
            prioritizedOperatorSubtaskStates.getPrioritizedRawOperatorState().iterator());
        streamTaskCloseableRegistry.registerCloseable(rawOperatorStateInputs);

        // -------------- Internal Timer Service Manager --------------
        // InternalTimeServiceManager实例,用来管理内部的时间服务
        timeServiceManager = internalTimeServiceManager(keyedStatedBackend, keyContext, processingTimeService, rawKeyedStateInputs);

        // -------------- Preparing return value --------------

        // 将算子/键控状态的状态后端、管理时间服务的TimeServiceManager,封装到StreamOperatorStateContextImpl中
        return new StreamOperatorStateContextImpl(
            prioritizedOperatorSubtaskStates.isRestored(),
            operatorStateBackend,
            keyedStatedBackend,
            timeServiceManager,
            rawOperatorStateInputs,
            rawKeyedStateInputs);
    } catch (Exception ex) {

        // 省略部分代码...
    }
}

核心 3:StateInitializationContext

创建好的各种StateBackend还不会直接给CheckpointedFunction使用,而是要包装成代理类StateStore。为了能够保存这些StateStore,特地准备了StateInitializationContext,由StateInitializationContext来持有StateStore。

然后将StateInitializationContext传递给AbstractUdfStreamOperator中的CheckpointedFunction,如果有自定义需求,就可以在CheckpointedFunction接口定义的接口方法initializeState()的具体实现中,利用StateInitializationContext取出StateStore,并完成“状态初始化”操作!

/**
 * 把“保存有StateStore”的StateInitializationContext,交给CheckpointedFunction使用,
 * 使之能够在接口方法initializeState()中,使用这个上下文来初始化状态
 */
@Override
public void initializeState(StateInitializationContext context) throws Exception {
    super.initializeState(context);
    // 对自定义Function中的xxxState进行初始化和恢复,实际就是将StateInitializationContext提供给自定义Function使用(因为要用到xxxStateStore)
    StreamingFunctionUtils.restoreFunctionState(context, userFunction);
}

以下为百度扒来的自定义CheckpointedFunction的实现(不看逻辑,只关心状态初始化即可):

public static class 自定义Function implements MapFunction<Transaction, Long>, CheckpointedFunction {

    private ListState<Long> countPerPartition;
    private long localCount;

    /**
     * 计算交易总比数
     */
    @Override
    public Long map(Transaction transaction) throws Exception {
        localCount++;
        return localCount;
    }

    /**
     * 快照操作的特殊逻辑
     */
    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        countPerPartition.clear();
        countPerPartition.add(localCount);
    }

    /**
     * 初始化状态
     */
    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {

        // 从StateInitializationContext中取出StateStore,并初始化出ListState
        countPerPartition = context.getOperatorStateStore().getListState(
            new ListStateDescriptor<>("perPartitionCount", Long.class));

        // initialize the "local count variable" based on the operator state
        for (Long l : countPerPartition.get()) {
            localCount += l;
        }
    }
}

又比如我们在键控流上使用getRuntimeContext().getState(Desc)来初始化ValueState,也是由RuntimeContext接口的抽象实现AbstractRuntimeUDFContext提供,由StreamingRuntimeContext为其提供具体的实现逻辑:

/**
 * 通过RuntimeContext获取ValueState
 */
@Override
public <T> ValueState<T> getState(ValueStateDescriptor<T> stateProperties) {
    // 校验完StateDescription后,得到包装好的KeyedStateStore
    KeyedStateStore keyedStateStore = checkPreconditionsAndGetKeyedStateStore(stateProperties);
    // 初始化StateDescription的序列化器
    stateProperties.initializeSerializerUnlessSet(getExecutionConfig());
    // 通过keyedStateStore接口(默认实现子类为DefaultKeyedStateStore),创建和获取ValueState
    return keyedStateStore.getState(stateProperties);
}


/**
 * 校验StateDescription,并返回包装好的KeyedStateStore(可以用来初始化KeyedState)
 */
private KeyedStateStore checkPreconditionsAndGetKeyedStateStore(StateDescriptor<?, ?> stateDescriptor) {
    // 对StateDescriptor进行安全校验
    Preconditions.checkNotNull(stateDescriptor, "The state properties must not be null");
    // 获取StreamOperator中的KeyedStateStore(AbstractStreamOperator已经把创建好的KeyedStateStore保存起来了)
    KeyedStateStore keyedStateStore = operator.getKeyedStateStore();
    Preconditions.checkNotNull(keyedStateStore, "Keyed state can only be used on a 'keyed stream', i.e., after a 'keyBy()' operation.");
    // KeyedStateStore是KeyedStateBackend的代理,会提供给RuntimeContext来创建xxxState
    return keyedStateStore;
}

可以看出,AbstractStreamOperator会提供出StateBackend的代理类StateStore。别忘了,Flink系统内部进行状态初始化的过程中,已经创建好了StateStore,且由AbstractStreamOperator的全局变量持有。这里正是将StateStore取出,并利用它来初始化得到ValueState。至此,一切就都解释得通了。

总结一下,Flink系统内部初始化状态,就是将状态后端保存到一个上下文中,并将这个上下文传递给StreamOperator的自定义function。开发者可以在自定义Function中,使用上下文来灵活的获取各种State。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值