[flink] Flink 1.12.2 Source Code Walkthrough: A Brief Look at StreamTask

九师兄

Original link: https://zhangboyi.blog.csdn.net/article/details/116432184

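1. Overview

Reposted from: Flink 1.12.2 Source Code Walkthrough: A Brief Look at StreamTask

In the doRun method of the Task class, a runtime environment object, RuntimeEnvironment, is built first. Then loadAndInstantiateInvokable is called to load and instantiate the task's executable code.

loadAndInstantiateInvokable takes the class loader userCodeClassLoader.asClassLoader(), the name of the class to instantiate nameOfInvokableClass, and the RuntimeEnvironment carrying the environment information the task needs, and uses them to instantiate the task.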

org.apache.flink.runtime.taskmanager.Task#doRun

private void doRun() {
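    // State-related initialization code ...
    // Load the task-related resources needed to run the code ...
    // Request and initialize the user's code and methods
    // Build the environment that the code needs in order to run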
    Environment env =
              new RuntimeEnvironment(
                      jobId,
                      vertexId,
                      executionId,
                      executionConfig,
                      taskInfo,
                      jobConfiguration,
                      taskConfiguration,
                      userCodeClassLoader,
                      memoryManager,
                      ioManager,
                      broadcastVariableManager,
                      taskStateManager,
                      aggregateManager,
                      accumulatorRegistry,
                      kvStateRegistry,
                      inputSplitProvider,
                      distributedCacheEntries,
                      consumableNotifyingPartitionWriters,
                      inputGates,
                      taskEventDispatcher,
                      checkpointResponder,
                      operatorCoordinatorEventGateway,
                      taskManagerConfig,
                      metrics,
                      this,
                      externalResourceInfoProvider);

            // now load and instantiate the task's invokable code
            invokable =
                    loadAndInstantiateInvokable(
                            userCodeClassLoader.asClassLoader(), nameOfInvokableClass, env);

            // run the invokable
            invokable.invoke();

            // remaining code omitted ...

}
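How a class name plus an environment becomes a task instance can be sketched with plain reflection. This is a simplified illustration under stated assumptions, not Flink's implementation: Env, Invokable, PrintTask and InvokableLoader.loadAndInstantiate are hypothetical stand-ins for Environment, AbstractInvokable, a concrete task and the real loadAndInstantiateInvokable.

```java
import java.lang.reflect.Constructor;

/** Hypothetical stand-in for Flink's Environment. */
final class Env {
    final String taskName;

    Env(String taskName) {
        this.taskName = taskName;
    }
}

/** Hypothetical stand-in for AbstractInvokable. */
abstract class Invokable {
    protected final Env env;

    protected Invokable(Env env) {
        this.env = env;
    }

    abstract String invoke();
}

/** A concrete task that follows the single-argument constructor convention. */
final class PrintTask extends Invokable {
    public PrintTask(Env env) {
        super(env);
    }

    @Override
    String invoke() {
        return "running " + env.taskName;
    }
}

final class InvokableLoader {
    /** Loads the named class, checks its type, and calls its (Env) constructor. */
    static Invokable loadAndInstantiate(ClassLoader classLoader, String className, Env env) {
        try {
            Class<? extends Invokable> clazz =
                    Class.forName(className, true, classLoader).asSubclass(Invokable.class);
            Constructor<? extends Invokable> ctor = clazz.getDeclaredConstructor(Env.class);
            ctor.setAccessible(true);
            return ctor.newInstance(env);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Could not instantiate " + className, e);
        }
    }
}
```

Because the constructor is looked up by reflection, a class without the expected constructor only fails at run time, which is why the sketch wraps every reflective failure in one exception.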

Here are the main types that nameOfInvokableClass can refer to: four primary StreamTask types plus two special iteration tasks.

Name	Description
org.apache.flink.streaming.runtime.tasks.SourceStreamTask	StreamTask for sources
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask	StreamTask with a single input
org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask	StreamTask with two inputs
org.apache.flink.streaming.runtime.tasks.MultipleInputStreamTask	StreamTask with multiple inputs
org.apache.flink.streaming.runtime.tasks.StreamIterationHead	A special StreamTask that is used for executing feedback edges.
org.apache.flink.streaming.runtime.tasks.StreamIterationTail	A special StreamTask that is used for executing feedback edges.

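2. AbstractInvokable

This is the abstract base class for every task that the TaskManager can execute.
Concrete tasks, such as streaming and batch tasks, extend this class.
The TaskManager calls the invoke() method when it executes a task.
All operations of the task happen in this method (setting up input and output stream readers and writers, as well as the task's core operation).
All classes that extend it must offer a constructor MyTask(Environment, TaskStateSnapshot).
For convenience, tasks that are always stateless may instead implement only the constructor MyTask(Environment).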

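Developer note:
Although the constructors cannot be enforced at compile time, the authors have not yet ventured into introducing factories (it is only an internal API after all, and with Java 8 one can use Class::new almost like a factory lambda).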
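The remark about Class::new can be illustrated with a constructor reference used as a factory. This is only a sketch of the idea: Env, MyTask and FACTORY are hypothetical names, not Flink classes.

```java
import java.util.function.Function;

final class FactorySketch {
    /** Hypothetical stand-in for Environment. */
    static final class Env {
        final String name;

        Env(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stateless task following the MyTask(Environment) convention. */
    static final class MyTask {
        final Env env;

        MyTask(Env env) {
            this.env = env;
        }

        String describe() {
            return "MyTask@" + env.name;
        }
    }

    /** A constructor reference behaves like a factory from Env to MyTask. */
    static final Function<Env, MyTask> FACTORY = MyTask::new;

    static String demo() {
        return FACTORY.apply(new Env("tm-1")).describe();
    }
}
```

Unlike reflection, the constructor reference is checked by the compiler, so a missing MyTask(Env) constructor would be a compile error here.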

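Note:
There is no constructor that accepts an initial task state snapshot and stores it in a variable.
This is on purpose: AbstractInvokable itself does not need the state snapshot (only subclasses such as StreamTask need the state), and we do not want to hold a reference indefinitely, which would prevent the garbage collector from cleaning up the initial state structure.

Any subclass that supports recoverable state and participates in checkpointing needs to override
triggerCheckpointAsync(CheckpointMetaData, CheckpointOptions, boolean),
triggerCheckpointOnBarrier(CheckpointMetaData, CheckpointOptions, CheckpointMetricsBuilder),
abortCheckpointOnBarrier(long, Throwable) and notifyCheckpointCompleteAsync(long).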

2.1. Fields & Initialization

The abstract class AbstractInvokable has only two fields: Environment environment and shouldInterruptOnCancel (initially true).

Fields

    /** The environment assigned to this invokable. */
    private final Environment environment;

    /** Flag whether cancellation should interrupt the executing thread. */
    private volatile boolean shouldInterruptOnCancel = true;

Constructor
The constructor simply takes an Environment object.

/**
 * Create an Invokable task and set its environment.
 *
 * @param environment The environment assigned to this invokable.
 */
public AbstractInvokable(Environment environment) {
    this.environment = checkNotNull(environment);
}

2.2. Environment

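Environment is the constructor argument of the abstract class AbstractInvokable (and of its subclasses). While the Task is being set up, the environment information is wrapped in an implementation of Environment
and handed to the task implementation class (for example SourceStreamTask or OneInputStreamTask).

There is not much to say about it: it simply bundles a set of references to the runtime environment.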

private final JobID jobId;
private final JobVertexID jobVertexId;
private final ExecutionAttemptID executionId;

private final TaskInfo taskInfo;

private final Configuration jobConfiguration;
private final Configuration taskConfiguration;
private final ExecutionConfig executionConfig;

private final UserCodeClassLoader userCodeClassLoader;

private final MemoryManager memManager;
private final IOManager ioManager;
private final BroadcastVariableManager bcVarManager;
private final TaskStateManager taskStateManager;
private final GlobalAggregateManager aggregateManager;
private final InputSplitProvider splitProvider;
private final ExternalResourceInfoProvider externalResourceInfoProvider;

private final Map<String, Future<Path>> distCacheEntries;

private final ResultPartitionWriter[] writers;
private final IndexedInputGate[] inputGates;

private final TaskEventDispatcher taskEventDispatcher;

private final CheckpointResponder checkpointResponder;
private final TaskOperatorEventGateway operatorEventGateway;

private final AccumulatorRegistry accumulatorRegistry;

private final TaskKvStateRegistry kvStateRegistry;

private final TaskManagerRuntimeInfo taskManagerInfo;
private final TaskMetricGroup metrics;

private final Task containingTask;

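2.3. Method List

Core methods

Name	Description
invoke	Starts the execution. Must be overridden by the concrete task implementation. The task manager calls this method when the actual execution of the task starts.
cancel	Called when the task is canceled, either because the user aborted it or because execution failed. It can be overridden to respond by shutting down the user code properly.
setShouldInterruptOnCancel	Sets whether the thread that executes invoke() should be interrupted during cancellation. This method sets the flag for both the initial interrupt and the repeated interrupt.
dispatchOperatorEvent	Entry point for Operator Events, through which the outside can influence the task's execution.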
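The effect of the interrupt-on-cancel flag can be sketched as follows. This is a deliberately simplified model: CancelSketch and its cancel(Thread) method are hypothetical, and the real cancellation logic lives outside the invokable.

```java
final class CancelSketch {
    /** Mirrors the flag described above; true by default. */
    private volatile boolean shouldInterruptOnCancel = true;

    void setShouldInterruptOnCancel(boolean value) {
        shouldInterruptOnCancel = value;
    }

    /** Hypothetical canceller: interrupts the executing thread only if the flag allows it. */
    boolean cancel(Thread executingThread) {
        if (shouldInterruptOnCancel) {
            executingThread.interrupt();
            return true;
        }
        return false;
    }

    /** Runs both cases against the current thread and reports what happened. */
    static boolean[] demo() {
        CancelSketch task = new CancelSketch();

        boolean interruptedWhenEnabled = task.cancel(Thread.currentThread());
        boolean flagSetWhenEnabled = Thread.interrupted(); // also clears the interrupt status

        task.setShouldInterruptOnCancel(false);
        boolean interruptedWhenDisabled = task.cancel(Thread.currentThread());
        boolean flagSetWhenDisabled = Thread.interrupted();

        return new boolean[] {
            interruptedWhenEnabled, flagSetWhenEnabled, interruptedWhenDisabled, flagSetWhenDisabled
        };
    }
}
```

Switching the flag off is what a task does once it has left user code and wants its shutdown logic to run without further interrupts.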

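Checkpoint-related methods

Name	Description
triggerCheckpointAsync	Called asynchronously by the checkpoint coordinator to trigger a checkpoint.
triggerCheckpointOnBarrier	Called when a checkpoint is triggered by receiving checkpoint barriers on all input streams.
abortCheckpointOnBarrier	Aborts a checkpoint as the result of receiving some checkpoint barriers …
notifyCheckpointCompleteAsync	Notifies the task that a checkpoint has completed.
notifyCheckpointAbortAsync	Notifies the task that a checkpoint has been aborted.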

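3. StreamTask

Base class for all streaming tasks.
A task is the unit of local processing that is deployed and executed by the TaskManagers.
Each task runs one or more StreamOperators, which form the task's operator chain.
Operators that are chained together execute synchronously in the same thread and hence on the same stream partition.
A common case for these chains is a sequence of successive map/flatmap/filter tasks.

The task chain contains one "head" operator and multiple chained operators.

StreamTask is specialized for the type of the head operator:

one-input : OneInputStreamTask
two-input tasks : TwoInputStreamTask
sources : SourceStreamTask
iteration heads : StreamIterationHead
iteration tails : StreamIterationTail

The Task class deals with the setup of the streams read by the head operator and of the streams produced by the operators at the ends of the operator chain.

Note that the chain may fork and thus have multiple ends.

The life cycle of the task is set up as follows: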

 -- setInitialState -> provides state of all operators in the chain

 -- invoke()
       |
       +----> Create basic utils (config, etc) and load the chain of operators
       +----> operators.setup()
       +----> task specific init()
       +----> initialize-operator-states()
       +----> open-operators()
       +----> run()
       +----> close-operators()
       +----> dispose-operators()
       +----> common cleanup
       +----> task specific cleanup()
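The fixed ordering of the life cycle is an instance of the template-method pattern: the base class owns the sequence and subclasses fill in the task-specific steps. The sketch below only records step names in a list. LifecycleSketch and all step labels are hypothetical and merely mirror the diagram; the real StreamTask splits this work across several methods.

```java
import java.util.ArrayList;
import java.util.List;

abstract class LifecycleSketch {
    protected final List<String> trace = new ArrayList<>();

    /** Fixed skeleton; subclasses only supply init(), run() and cleanup(). */
    final List<String> invoke() {
        trace.add("load-operator-chain");
        trace.add("setup-operators");
        init();
        trace.add("initialize-operator-states");
        trace.add("open-operators");
        try {
            run();
            trace.add("close-operators");
            trace.add("dispose-operators");
        } finally {
            // cleanup runs whether or not run() failed
            trace.add("common-cleanup");
            cleanup();
        }
        return trace;
    }

    abstract void init();

    abstract void run();

    abstract void cleanup();

    static List<String> demo() {
        LifecycleSketch task =
                new LifecycleSketch() {
                    @Override
                    void init() {
                        trace.add("task-specific-init");
                    }

                    @Override
                    void run() {
                        trace.add("run");
                    }

                    @Override
                    void cleanup() {
                        trace.add("task-specific-cleanup");
                    }
                };
        return task.invoke();
    }
}
```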

StreamTask has a lock object called lock.
All calls to methods on a StreamOperator must be synchronized on this lock object to ensure that no methods are called concurrently.
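The locking rule can be sketched as an executor that runs every action while holding one shared lock, so that actions submitted from different threads never overlap. SynchronizedExecutorSketch and runUnderLock are hypothetical names chosen for this illustration; they are not Flink's API.

```java
final class SynchronizedExecutorSketch {
    private final Object lock = new Object();
    private int counter; // only touched while holding the lock

    /** Runs the action while holding the lock, so actions are mutually exclusive. */
    void runUnderLock(Runnable action) {
        synchronized (lock) {
            action.run();
        }
    }

    /** Four threads increment the counter 1000 times each through the executor. */
    static int demo() {
        SynchronizedExecutorSketch exec = new SynchronizedExecutorSketch();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] =
                    new Thread(
                            () -> {
                                for (int n = 0; n < 1000; n++) {
                                    exec.runUnderLock(() -> exec.counter++);
                                }
                            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return exec.counter;
    }
}
```

Without the synchronized block the unsynchronized increments could be lost; with it, the final count is exactly the number of submitted actions.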

3.1. Fields & Constructor

Fields

  /** The thread group that holds all trigger timer threads. */
    public static final ThreadGroup TRIGGER_THREAD_GROUP = new ThreadGroup("Triggers");

    /** The logger used by the StreamTask and its subclasses. */
    protected static final Logger LOG = LoggerFactory.getLogger(StreamTask.class);

    // ------------------------------------------------------------------------

    /**
     * All actions outside of the task {@link #mailboxProcessor mailbox}
     * (i.e. performed by another thread)
     * must be executed through this executor to ensure that we don't have concurrent method
     * calls that void consistent checkpoints.
     *
     * <p>CheckpointLock is superseded by {@link MailboxExecutor}, with {@link
     * StreamTaskActionExecutor.SynchronizedStreamTaskActionExecutor
     * SynchronizedStreamTaskActionExecutor} to provide lock to {@link SourceStreamTask}.
     */
    private final StreamTaskActionExecutor actionExecutor;

    /** The input processor. Initialized in {@link #init()} method. */
    @Nullable protected StreamInputProcessor inputProcessor;

    /** [Important] The main operator that consumes the input streams of this task. */
    protected OP mainOperator;

    /** The chain of operators executed by this task. */
    protected OperatorChain<OUT, OP> operatorChain;

    /** The configuration of this streaming task. */
    protected final StreamConfig configuration;

    /** Our state backend. We use this to create checkpoint streams and a keyed state backend. */
    protected final StateBackend stateBackend;

    /** The checkpoint coordinator of this subtask. */
    private final SubtaskCheckpointCoordinator subtaskCheckpointCoordinator;

    /**
     * The internal {@link TimerService} used to define the current processing time (default =
     * {@code System.currentTimeMillis()}) and register timers for tasks to be executed in the
     * future.
     */
    protected final TimerService timerService;

    /** The currently active background materialization threads. */
    private final CloseableRegistry cancelables = new CloseableRegistry();

    /** Exception handling. */
    private final StreamTaskAsyncExceptionHandler asyncExceptionHandler;

    /**
     * Flag to mark the task "in operation", in which case check needs to be initialized to true, so
     * that early cancel() before invoke() behaves correctly.
     */
    private volatile boolean isRunning;

    /** Flag to mark this task as canceled. */
    private volatile boolean canceled;

    /**
     * Flag to mark this task as failing, i.e. if an exception has occurred inside {@link
     * #invoke()}.
     */
    private volatile boolean failing;

    /** Purpose unclear from the declaration alone; beforeInvoke() resets it to false. */
    private boolean disposedOperators;

    /** Thread pool for async snapshot workers. */
    private final ExecutorService asyncOperationsThreadPool;

    private final RecordWriterDelegate<SerializationDelegate<StreamRecord<OUT>>> recordWriter;

    protected final MailboxProcessor mailboxProcessor;

    final MailboxExecutor mainMailboxExecutor;

    /** TODO it might be replaced by the global IO executor on TaskManager level future. */
    private final ExecutorService channelIOExecutor;

    private Long syncSavepointId = null;
    private Long activeSyncSavepointId = null;

    private long latestAsyncCheckpointStartDelayNanos;

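The constructor consists mostly of plain assignments. The line to note is this.mailboxProcessor = new MailboxProcessor(this::processInput, mailbox, actionExecutor);, which passes processInput to the MailboxProcessor as its default action.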

   protected StreamTask(
            Environment environment,
            @Nullable TimerService timerService,
            Thread.UncaughtExceptionHandler uncaughtExceptionHandler,
            StreamTaskActionExecutor actionExecutor,
            TaskMailbox mailbox)
            throws Exception {

        super(environment);

        this.configuration = new StreamConfig(getTaskConfiguration());
        this.recordWriter = createRecordWriterDelegate(configuration, environment);
        this.actionExecutor = Preconditions.checkNotNull(actionExecutor);
        this.mailboxProcessor = new MailboxProcessor(this::processInput, mailbox, actionExecutor);
        this.mailboxProcessor.initMetric(environment.getMetricGroup());
        this.mainMailboxExecutor = mailboxProcessor.getMainMailboxExecutor();
        this.asyncExceptionHandler = new StreamTaskAsyncExceptionHandler(environment);
        this.asyncOperationsThreadPool =
                Executors.newCachedThreadPool(
                        new ExecutorThreadFactory("AsyncOperations", uncaughtExceptionHandler));

        this.stateBackend = createStateBackend();

        // create the checkpoint coordinator of this subtask
        this.subtaskCheckpointCoordinator =
                new SubtaskCheckpointCoordinatorImpl(
                        stateBackend.createCheckpointStorage(getEnvironment().getJobID()),
                        getName(),
                        actionExecutor,
                        getCancelables(),
                        getAsyncOperationsThreadPool(),
                        getEnvironment(),
                        this,
                        configuration.isUnalignedCheckpointsEnabled(),
                        this::prepareInputSnapshot);

        // if the clock is not already set, then assign a default TimeServiceProvider
        if (timerService == null) {
            ThreadFactory timerThreadFactory =
                    new DispatcherThreadFactory(
                            TRIGGER_THREAD_GROUP, "Time Trigger for " + getName());
            this.timerService =
                    new SystemProcessingTimeService(this::handleTimerException, timerThreadFactory);
        } else {
            this.timerService = timerService;
        }

        this.channelIOExecutor =
                Executors.newSingleThreadExecutor(
                        new ExecutorThreadFactory("channel-state-unspilling"));

        injectChannelStateWriterIntoChannels();
    }

3.2. invoke

invoke is the core method of the task. It runs in four stages:

Before invoke: beforeInvoke();
Invoke: runMailboxLoop();
After invoke: afterInvoke();
Cleanup: cleanUpInvoke();

    // for operators such as map ...
    @Override
    public final void invoke() throws Exception {
        try {
            // initialization ...
            beforeInvoke();

            // final check to exit early before starting to run
            if (canceled) {
                throw new CancelTaskException();
            }

            // [core] run the task ...
            // let the task do its work
            runMailboxLoop();

            // if this left the run() method cleanly despite the fact that this was canceled,
            // make sure the "clean shutdown" is not attempted
            if (canceled) {
                throw new CancelTaskException();
            }

            afterInvoke();
        } catch (Throwable invokeException) {
            failing = !canceled;
            try {
                cleanUpInvoke();
            }
            // TODO: investigate why Throwable instead of Exception is used here.
            catch (Throwable cleanUpException) {
                Throwable throwable =
                        ExceptionUtils.firstOrSuppressed(cleanUpException, invokeException);
                ExceptionUtils.rethrowException(throwable);
            }
            ExceptionUtils.rethrowException(invokeException);
        }
        cleanUpInvoke();
    }
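The error handling in invoke() follows one rule: cleanup always runs, and if both the task and the cleanup fail, the task's exception stays primary while the cleanup failure is attached as suppressed. The sketch below models that rule. InvokeErrorHandlingSketch, runTask and this firstOrSuppressed are simplified, hypothetical stand-ins (runTask returns the failure instead of rethrowing it), not the Flink utilities themselves.

```java
final class InvokeErrorHandlingSketch {
    /** Keeps 'previous' as the primary exception and attaches the newer one to it. */
    static <T extends Throwable> T firstOrSuppressed(T newException, T previous) {
        if (previous == null) {
            return newException;
        }
        previous.addSuppressed(newException);
        return previous;
    }

    /** Runs body then cleanUp; returns the failure that would be rethrown, or null. */
    static Throwable runTask(Runnable body, Runnable cleanUp) {
        try {
            body.run();
        } catch (Throwable invokeException) {
            try {
                cleanUp.run();
            } catch (Throwable cleanUpException) {
                return firstOrSuppressed(cleanUpException, invokeException);
            }
            return invokeException;
        }
        // normal path: cleanup still runs, and its failure is reported on its own
        try {
            cleanUp.run();
            return null;
        } catch (Throwable cleanUpException) {
            return cleanUpException;
        }
    }
}
```

Keeping the original failure as the primary exception means the root cause is what surfaces in logs, with the cleanup problem still visible in the suppressed list.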

3.2.1. beforeInvoke

Builds the OperatorChain and runs the init method of the concrete task class.

protected void beforeInvoke() throws Exception {
    disposedOperators = false;

    // Initializing Source: Socket Stream -> Flat Map (1/1)#0.
    // Initializing Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1)#0.
    LOG.debug("Initializing {}.", getName());

    operatorChain = new OperatorChain<>(this, recordWriter);

    //    mainOperator = {StreamSource@6752}
    //        ctx = null
    //        canceledOrStopped = false
    //        hasSentMaxWatermark = false
    //        userFunction = {SocketTextStreamFunction@6754}
    //        functionsClosed = false
    //        chainingStrategy = {ChainingStrategy@6755} "HEAD"
    //        container = {SourceStreamTask@6554} "Source: Socket Stream -> Flat Map (1/1)#0"
    //        config = {StreamConfig@6756} "\n=======================Stream Config=======================\nNumber of non-chained inputs: 0\nNumber of non-chained outputs: 0\nOutput names: []\nPartitioning:\nChained subtasks: [(Source: Socket Stream-1 -> Flat Map-2, typeNumber=0, outputPartitioner=FORWARD, bufferTimeout=-1, outputTag=null)]\nOperator: SimpleUdfStreamOperatorFactory\nState Monitoring: false\n\n\n---------------------\nChained task configs\n---------------------\n{2=\n=======================Stream Config=======================\nNumber of non-chained inputs: 0\nNumber of non-chained outputs: 1\nOutput names: [(Flat Map-2 -> Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction)-4, typeNumber=0, outputPartitioner=HASH, bufferTimeout=-1, outputTag=null)]\nPartitioning:\n\t4: HASH\nChained subtasks: []\nOperator: SimpleUdfStreamOperatorFactory\nState Monitoring: false}"
    //        output = {CountingOutput@6757}
    //        runtimeContext = {StreamingRuntimeContext@6758}
    //        stateKeySelector1 = null
    //        stateKeySelector2 = null
    //        stateHandler = null
    //        timeServiceManager = null
    //        metrics = {OperatorMetricGroup@6759}
    //        latencyStats = {LatencyStats@6760}
    //        processingTimeService = {ProcessingTimeServiceImpl@6761}
    //        combinedWatermark = -9223372036854775808
    //        input1Watermark = -9223372036854775808
    //        input2Watermark = -9223372036854775808




    mainOperator = operatorChain.getMainOperator();

    // task specific initialization
    init();

    // save the work of reloading state, etc, if the task is already canceled
    if (canceled) {
        throw new CancelTaskException();
    }

    // -------- Invoke --------
    // Invoking Source: Socket Stream -> Flat Map (1/1)#0
    LOG.debug("Invoking {}", getName());

    // we need to make sure that any triggers scheduled in open() cannot be
    // executed before all operators are opened
    actionExecutor.runThrowing(
            () -> {
                SequentialChannelStateReader reader =
                        getEnvironment()
                                .getTaskStateManager()
                                .getSequentialChannelStateReader();
                // TODO: for UC rescaling, reenable notifyAndBlockOnCompletion for non-iterative
                // jobs
                reader.readOutputData(getEnvironment().getAllWriters(), false);

                operatorChain.initializeStateAndOpenOperators(
                        createStreamTaskStateInitializer());

                channelIOExecutor.execute(
                        () -> {
                            try {
                                reader.readInputData(getEnvironment().getAllInputGates());
                            } catch (Exception e) {
                                asyncExceptionHandler.handleAsyncException(
                                        "Unable to read channel state", e);
                            }
                        });

                for (InputGate inputGate : getEnvironment().getAllInputGates()) {
                    inputGate
                            .getStateConsumedFuture()
                            .thenRun(
                                    () ->
                                            mainMailboxExecutor.execute(
                                                    inputGate::requestPartitions,
                                                    "Input gate request partitions"));
                }
            });

    isRunning = true;
}
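The last step of beforeInvoke() defers requestPartitions until the recovered channel state has been consumed, and even then only enqueues it as mail instead of running it directly. A minimal model of that pattern, using a plain queue as a hypothetical mailbox (DeferredRequestSketch and the log strings are assumptions made for this sketch):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

final class DeferredRequestSketch {
    static List<String> demo() {
        Queue<Runnable> mailbox = new ArrayDeque<>();
        List<String> log = new ArrayList<>();

        // stands in for inputGate.getStateConsumedFuture()
        CompletableFuture<Void> stateConsumed = new CompletableFuture<>();

        // when the future completes, enqueue the request as mail rather than run it inline
        stateConsumed.thenRun(() -> mailbox.add(() -> log.add("requestPartitions")));

        log.add("mailbox size before: " + mailbox.size());
        stateConsumed.complete(null); // recovered channel state fully consumed
        log.add("mailbox size after: " + mailbox.size());

        // the task thread later drains the mailbox and performs the request
        Runnable mail;
        while ((mail = mailbox.poll()) != null) {
            mail.run();
        }
        return log;
    }
}
```

Routing the callback through the mailbox keeps the actual request on the task thread, regardless of which thread completes the future.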

3.2.2. runMailboxLoop

    // StreamTask#runMailboxLoop: delegates to the MailboxProcessor
    public void runMailboxLoop() throws Exception {
        mailboxProcessor.runMailboxLoop();
    }

    // MailboxProcessor#runMailboxLoop
    /** Runs the mailbox processing loop. This is where the main work is done. */
    public void runMailboxLoop() throws Exception {

        final TaskMailbox localMailbox = mailbox;

        Preconditions.checkState(
                localMailbox.isMailboxThread(),
                "Method must be executed by declared mailbox thread!");

        assert localMailbox.getState() == TaskMailbox.State.OPEN : "Mailbox must be opened!";

        final MailboxController defaultActionContext = new MailboxController(this);

        // Each iteration first processes pending mail, then runs the default action.
        while (isMailboxLoopRunning()) {
            // The blocking `processMail` call will not return until default action is available.
            processMail(localMailbox, false);

            if (isMailboxLoopRunning()) {
                // the default action is specified in the StreamTask constructor: processInput
                mailboxDefaultAction.runDefaultAction(
                        // lock is acquired inside default action as needed
                        defaultActionContext);
            }
        }
    }
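The shape of the mailbox loop can be reduced to a few lines: pending mail is processed first, then the default action handles one unit of input, and the loop ends once the default action reports that input is exhausted. The following is a single-threaded sketch under that simplification. MailboxLoopSketch, sendMail and the log strings are hypothetical, and the real MailboxProcessor additionally handles blocking, suspension and cross-thread mail.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class MailboxLoopSketch {
    private final Queue<Runnable> mailbox = new ArrayDeque<>();
    private final List<String> log = new ArrayList<>();
    private boolean running = true;
    private int remainingRecords;

    MailboxLoopSketch(int records) {
        this.remainingRecords = records;
    }

    /** Enqueues a control action, e.g. a checkpoint trigger. */
    void sendMail(String name) {
        mailbox.add(() -> log.add("mail:" + name));
    }

    /** Default action: process one record, or stop the loop when input is exhausted. */
    private void processInput() {
        if (remainingRecords == 0) {
            running = false;
            return;
        }
        log.add("record:" + remainingRecords--);
    }

    List<String> runMailboxLoop() {
        while (running) {
            // mail takes priority over the default action
            Runnable mail;
            while ((mail = mailbox.poll()) != null) {
                mail.run();
            }
            if (running) {
                processInput();
            }
        }
        return log;
    }

    static List<String> demo() {
        MailboxLoopSketch task = new MailboxLoopSketch(2);
        task.sendMail("checkpoint-1");
        return task.runMailboxLoop();
    }
}
```

Because mail and record processing run on the same thread, a checkpoint trigger never interleaves with the processing of a single record, which is the property the lock used to provide.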

3.2.3. afterInvoke

   protected void afterInvoke() throws Exception {
        LOG.debug("Finished task {}", getName());
        getCompletionFuture().exceptionally(unused -> null).join();

        final CompletableFuture<Void> timersFinishedFuture = new CompletableFuture<>();

        // close all operators in a chain effect way
        operatorChain.closeOperators(actionExecutor);

        // make sure no further checkpoint and notification actions happen.
        // at the same time, this makes sure that during any "regular" exit where still
        actionExecutor.runThrowing(
                () -> {
                    // make sure no new timers can come
                    FutureUtils.forward(timerService.quiesce(), timersFinishedFuture);

                    // let mailbox execution reject all new letters from this point
                    mailboxProcessor.prepareClose();

                    // only set the StreamTask to not running after all operators have been closed!
                    // See FLINK-7430
                    isRunning = false;
                });
        // processes the remaining mails; no new mails can be enqueued
        mailboxProcessor.drain();

        // make sure all timers finish
        timersFinishedFuture.get();

        LOG.debug("Closed operators for task {}", getName());

        // make sure all buffered data is flushed
        operatorChain.flushOutputs();

        // make an attempt to dispose the operators such that failures in the dispose call
        // still let the computation fail
        disposeAllOperators();
    }
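The order of prepareClose() and drain() in afterInvoke() matters: new mail is rejected first, and only then is the remaining mail processed, so the drain is guaranteed to terminate. A minimal model of that ordering, where DrainSketch and trySend are hypothetical and rejection is reported with a boolean instead of an exception:

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class DrainSketch {
    private final Queue<Runnable> mailbox = new ArrayDeque<>();
    private boolean acceptingMail = true;

    /** Returns false once the mailbox has stopped accepting mail. */
    boolean trySend(Runnable mail) {
        if (!acceptingMail) {
            return false;
        }
        mailbox.add(mail);
        return true;
    }

    /** From this point on, all new mail is rejected. */
    void prepareClose() {
        acceptingMail = false;
    }

    /** Runs every mail that is still queued and returns how many were run. */
    int drain() {
        int processed = 0;
        Runnable mail;
        while ((mail = mailbox.poll()) != null) {
            mail.run();
            processed++;
        }
        return processed;
    }

    static int[] demo() {
        DrainSketch box = new DrainSketch();
        box.trySend(() -> {});
        box.trySend(() -> {});
        box.prepareClose();
        boolean acceptedLate = box.trySend(() -> {});
        return new int[] {box.drain(), acceptedLate ? 1 : 0};
    }
}
```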

3.2.4. cleanUpInvoke

Releases all resources.

protected void cleanUpInvoke() throws Exception {
    getCompletionFuture().exceptionally(unused -> null).join();
    // clean up everything we initialized
    isRunning = false;

    // Now that we are outside the user code, we do not want to be interrupted further
    // upon cancellation. The shutdown logic below needs to make sure it does not issue calls
    // that block and stall shutdown.
    // Additionally, the cancellation watch dog will issue a hard-cancel (kill the TaskManager
    // process) as a backup in case some shutdown procedure blocks outside our control.
    setShouldInterruptOnCancel(false);

    // clear any previously issued interrupt for a more graceful shutdown
    Thread.interrupted();

    // stop all timers and threads
    Exception suppressedException =
            runAndSuppressThrowable(this::tryShutdownTimerService, null);

    // stop all asynchronous checkpoint threads
    suppressedException = runAndSuppressThrowable(cancelables::close, suppressedException);
    suppressedException =
            runAndSuppressThrowable(this::shutdownAsyncThreads, suppressedException);

    // we must! perform this cleanup
    suppressedException = runAndSuppressThrowable(this::cleanup, suppressedException);

    // if the operators were not disposed before, do a hard dispose
    suppressedException =
            runAndSuppressThrowable(this::disposeAllOperators, suppressedException);

    // release the output resources. this method should never fail.
    suppressedException =
            runAndSuppressThrowable(this::releaseOutputResources, suppressedException);

    suppressedException =
            runAndSuppressThrowable(channelIOExecutor::shutdown, suppressedException);

    suppressedException = runAndSuppressThrowable(mailboxProcessor::close, suppressedException);

    if (suppressedException != null) {
        throw suppressedException;
    }
}
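cleanUpInvoke() chains its shutdown steps so that one failing step never prevents the later ones from running, and all failures are reported together at the end. The sketch below shows the pattern with a simplified helper. SuppressSketch, ThrowingRunnable and runAndSuppress are hypothetical names, and unlike the method used above this version catches Exception only.

```java
import java.io.IOException;
import java.util.List;

final class SuppressSketch {
    interface ThrowingRunnable {
        void run() throws Exception;
    }

    /** Runs the step; on failure, chains the new exception onto the first one seen. */
    static Exception runAndSuppress(ThrowingRunnable step, Exception original) {
        try {
            step.run();
        } catch (Exception e) {
            if (original == null) {
                return e;
            }
            original.addSuppressed(e);
        }
        return original;
    }

    /** Three cleanup steps, two of which fail; every step still executes. */
    static Exception demo(List<String> executed) {
        Exception suppressed = null;
        suppressed =
                runAndSuppress(
                        () -> {
                            executed.add("timers");
                            throw new IllegalStateException("timer shutdown failed");
                        },
                        suppressed);
        suppressed = runAndSuppress(() -> executed.add("async threads"), suppressed);
        suppressed =
                runAndSuppress(
                        () -> {
                            executed.add("outputs");
                            throw new IOException("release failed");
                        },
                        suppressed);
        return suppressed;
    }
}
```

The caller throws the returned exception once at the end, which carries the first failure as primary and every later failure as suppressed.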

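3.3. Checkpoint-Related Methods

Checkpoint-related methods

Name	Description
triggerCheckpointAsync	Called asynchronously by the checkpoint coordinator to trigger a checkpoint.
triggerCheckpointOnBarrier	Called when a checkpoint is triggered by receiving checkpoint barriers on all input streams.
abortCheckpointOnBarrier	Aborts a checkpoint as the result of receiving some checkpoint barriers …
notifyCheckpointCompleteAsync	Notifies the task that a checkpoint has completed.
notifyCheckpointAbortAsync	Notifies the task that a checkpoint has been aborted.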

3.3.1. triggerCheckpointAsync

The checkpoint is triggered by a notification delivered via Akka.

@Override
public Future<Boolean> triggerCheckpointAsync(
        CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions) {

    CompletableFuture<Boolean> result = new CompletableFuture<>();
    mainMailboxExecutor.execute(
            () -> {
                latestAsyncCheckpointStartDelayNanos =
                        1_000_000
                                * Math.max(
                                        0,
                                        System.currentTimeMillis()
                                                - checkpointMetaData.getTimestamp());
                try {
                    // trigger the checkpoint
                    result.complete(triggerCheckpoint(checkpointMetaData, checkpointOptions));
                } catch (Exception ex) {
                    // Report the failure both via the Future result but also to the mailbox
                    result.completeExceptionally(ex);
                    throw ex;
                }
            },
            "checkpoint %s with %s",
            checkpointMetaData,
            checkpointOptions);
    return result;
}
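Two details of triggerCheckpointAsync are easy to try in isolation: the start-delay arithmetic (clamped at zero, then converted from milliseconds to nanoseconds) and the way the result future is only completed once the mailbox runs the enqueued action. The sketch assumes a plain queue as a hypothetical mailbox; TriggerAsyncSketch and its method names are not Flink API.

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.function.BooleanSupplier;

final class TriggerAsyncSketch {
    /** Same arithmetic as above: never negative, milliseconds converted to nanoseconds. */
    static long startDelayNanos(long nowMillis, long checkpointTimestampMillis) {
        return 1_000_000 * Math.max(0, nowMillis - checkpointTimestampMillis);
    }

    /** Enqueues the trigger as mail; the future completes when the mail is executed. */
    static CompletableFuture<Boolean> triggerAsync(
            Queue<Runnable> mailbox, BooleanSupplier trigger) {
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        mailbox.add(
                () -> {
                    try {
                        result.complete(trigger.getAsBoolean());
                    } catch (RuntimeException ex) {
                        // report the failure via the future and to whoever runs the mail
                        result.completeExceptionally(ex);
                        throw ex;
                    }
                });
        return result;
    }
}
```

The clamp at zero guards against clock differences between the machine that stamped the checkpoint and the machine that executes it.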

triggerCheckpoint
triggerCheckpoint calls performCheckpoint to start the checkpoint.

private boolean triggerCheckpoint(
        CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions)
        throws Exception {
    try {
        // No alignment if we inject a checkpoint
        CheckpointMetricsBuilder checkpointMetrics =
                new CheckpointMetricsBuilder()
                        .setAlignmentDurationNanos(0L)
                        .setBytesProcessedDuringAlignment(0L);

        // initialize the checkpoint
        subtaskCheckpointCoordinator.initCheckpoint(
                checkpointMetaData.getCheckpointId(), checkpointOptions);

        // perform the checkpoint ...
        boolean success =
                performCheckpoint(checkpointMetaData, checkpointOptions, checkpointMetrics);
        if (!success) {
            declineCheckpoint(checkpointMetaData.getCheckpointId());
        }
        return success;
    } catch (Exception e) {
        // propagate exceptions only if the task is still in "running" state
        if (isRunning) {
            throw new Exception(
                    "Could not perform checkpoint "
                            + checkpointMetaData.getCheckpointId()
                            + " for operator "
                            + getName()
                            + '.',
                    e);
        } else {
            LOG.debug(
                    "Could not perform checkpoint {} for operator {} while the "
                            + "invokable was not in state running.",
                    checkpointMetaData.getCheckpointId(),
                    getName(),
                    e);
            return false;
        }
    }
}

   private boolean performCheckpoint(
            CheckpointMetaData checkpointMetaData,
            CheckpointOptions checkpointOptions,
            CheckpointMetricsBuilder checkpointMetrics)
            throws Exception {

        LOG.debug(
                "Starting checkpoint ({}) {} on task {}",
                checkpointMetaData.getCheckpointId(),
                checkpointOptions.getCheckpointType(),
                getName());

        if (isRunning) {
            actionExecutor.runThrowing(
                    () -> {
                        if (checkpointOptions.getCheckpointType().isSynchronous()) {
                            setSynchronousSavepointId(
                                    checkpointMetaData.getCheckpointId(),
                                    checkpointOptions.getCheckpointType().shouldIgnoreEndOfInput());

                            if (checkpointOptions.getCheckpointType().shouldAdvanceToEndOfTime()) {
                                advanceToEndOfEventTime();
                            }
                        } else if (activeSyncSavepointId != null
                                && activeSyncSavepointId < checkpointMetaData.getCheckpointId()) {
                            activeSyncSavepointId = null;
                            operatorChain.setIgnoreEndOfInput(false);
                        }
                        // hand off to subtaskCheckpointCoordinator to perform checkpointState ...
                        subtaskCheckpointCoordinator.checkpointState(
                                checkpointMetaData,
                                checkpointOptions,
                                checkpointMetrics,
                                operatorChain,
                                this::isRunning);
                    });

            return true;
        } else {
            actionExecutor.runThrowing(
                    () -> {
                        // we cannot perform our checkpoint - let the downstream operators know that
                        // they
                        // should not wait for any input from this operator

                        // we cannot broadcast the cancellation markers on the 'operator chain',
                        // because it may not
                        // yet be created
                        final CancelCheckpointMarker message =
                                new CancelCheckpointMarker(checkpointMetaData.getCheckpointId());
                        recordWriter.broadcastEvent(message);
                    });

            return false;
        }
    }

3.3.2. triggerCheckpointOnBarrier

@Override
public void triggerCheckpointOnBarrier(
        CheckpointMetaData checkpointMetaData,
        CheckpointOptions checkpointOptions,
        CheckpointMetricsBuilder checkpointMetrics)
        throws IOException {

    try {
        // perform the checkpoint
        if (performCheckpoint(checkpointMetaData, checkpointOptions, checkpointMetrics)) {
            if (isSynchronousSavepointId(checkpointMetaData.getCheckpointId())) {
                runSynchronousSavepointMailboxLoop();
            }
        }
    } catch (CancelTaskException e) {
        LOG.info(
                "Operator {} was cancelled while performing checkpoint {}.",
                getName(),
                checkpointMetaData.getCheckpointId());
        throw e;
    } catch (Exception e) {
        throw new IOException(
                "Could not perform checkpoint "
                        + checkpointMetaData.getCheckpointId()
                        + " for operator "
                        + getName()
                        + '.',
                e);
    }
}

3.3.3. abortCheckpointOnBarrier

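Aborts the checkpoint identified by checkpointId: the synchronous savepoint id is reset and the abort is delegated to subtaskCheckpointCoordinator.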

  @Override
    public void abortCheckpointOnBarrier(long checkpointId, Throwable cause) throws IOException {
        resetSynchronousSavepointId(checkpointId, false);
        subtaskCheckpointCoordinator.abortCheckpointOnBarrier(checkpointId, cause, operatorChain);
    }

3.3.4. notifyCheckpointCompleteAsync

@Override
public Future<Void> notifyCheckpointCompleteAsync(long checkpointId) {
    return notifyCheckpointOperation(
            () -> notifyCheckpointComplete(checkpointId),
            String.format("checkpoint %d complete", checkpointId));
}

3.3.5. notifyCheckpointAbortAsync

  @Override
    public Future<Void> notifyCheckpointAbortAsync(long checkpointId) {
        return notifyCheckpointOperation(
                () -> {
                    resetSynchronousSavepointId(checkpointId, false);
                    subtaskCheckpointCoordinator.notifyCheckpointAborted(
                            checkpointId, operatorChain, this::isRunning);
                },
                String.format("checkpoint %d aborted", checkpointId));
    }

3.4. dispatchOperatorEvent

@Override
public void dispatchOperatorEvent(OperatorID operator, SerializedValue<OperatorEvent> event)
        throws FlinkException {
    try {
        mainMailboxExecutor.execute(
                () -> operatorChain.dispatchOperatorEvent(operator, event),
                "dispatch operator event");
    } catch (RejectedExecutionException e) {
        // this happens during shutdown, we can swallow this
    }
}
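
The pattern used here, handing work to the task thread and tolerating rejection during shutdown, can be tried out with a plain ExecutorService. The sketch below is only an analogy: EventDispatchSketch is a hypothetical class, a single-thread executor stands in for mainMailboxExecutor, and dispatch returns a boolean purely so the outcome is observable (the original method returns nothing and swallows the exception silently).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a task that forwards events to its single task thread. */
final class EventDispatchSketch {
    // Stand-in for mainMailboxExecutor: all events are handled on one thread.
    private final ExecutorService taskThread = Executors.newSingleThreadExecutor();
    private final List<String> handled = Collections.synchronizedList(new ArrayList<>());

    /** Returns true if the event was enqueued, false if the executor was already shut down. */
    boolean dispatch(String event) {
        try {
            taskThread.execute(() -> handled.add(event));
            return true;
        } catch (RejectedExecutionException e) {
            // Happens during shutdown; the event is dropped on purpose.
            return false;
        }
    }

    /** Shuts the executor down, waits for queued events and returns what was handled. */
    List<String> shutdownAndGetHandled() {
        taskThread.shutdown();
        try {
            taskThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(handled);
    }
}
```

An event dispatched before shutdown is handled on the task thread; one dispatched afterwards is rejected by the executor and dropped without failing the caller.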

3.5. injectChannelStateWriterIntoChannels

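Injects the ChannelStateWriter obtained from subtaskCheckpointCoordinator into every input gate and into every result partition writer that implements ChannelStateHolder, so that both the input side and the output side of the task can write channel state.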

 private void injectChannelStateWriterIntoChannels() {
        final Environment env = getEnvironment();
        final ChannelStateWriter channelStateWriter =
                subtaskCheckpointCoordinator.getChannelStateWriter();
        for (final InputGate gate : env.getAllInputGates()) {
            gate.setChannelStateWriter(channelStateWriter);
        }
        for (ResultPartitionWriter writer : env.getAllWriters()) {
            if (writer instanceof ChannelStateHolder) {
                ((ChannelStateHolder) writer).setChannelStateWriter(channelStateWriter);
            }
        }
    }

4. Processing data

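The runMailboxLoop call in the invoke method delegates to mailboxProcessor.runMailboxLoop().
That loop first processes any pending mail and then runs the default action through mailboxDefaultAction.runDefaultAction(...).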

    /** Runs the mailbox processing loop. This is where the main work is done. */
    public void runMailboxLoop() throws Exception {

        final TaskMailbox localMailbox = mailbox;

        Preconditions.checkState(
                localMailbox.isMailboxThread(),
                "Method must be executed by declared mailbox thread!");

        assert localMailbox.getState() == TaskMailbox.State.OPEN : "Mailbox must be opened!";

        final MailboxController defaultActionContext = new MailboxController(this);

        // While the loop is running, process the mail in the mailbox first. Mails are actions
        // enqueued for the task thread, such as checkpoint triggers or timer callbacks.
        while (isMailboxLoopRunning()) {
            // The blocking `processMail` call will not return until default action is available.
            processMail(localMailbox, false);

            if (isMailboxLoopRunning()) {
                // The default action is set in the StreamTask constructor: processInput
                mailboxDefaultAction.runDefaultAction(
                        // lock is acquired inside default action as needed
                        defaultActionContext);
            }
        }
    }

This raises a question: what does runDefaultAction actually run?
The default action is supplied when the StreamTask constructor builds the MailboxProcessor:

// Key point: processInput is registered as the default action
this.mailboxProcessor = new MailboxProcessor(this::processInput, mailbox, actionExecutor);

So the default action is processInput.
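
The interplay between mail and the default action can be sketched in a few lines. This is a minimal single-threaded model, not Flink's implementation: MailboxLoopSketch, DefaultAction and putMail are hypothetical names, and unlike the real processMail the sketch never blocks, it only drains whatever is already queued.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical, simplified model of a mailbox loop (not Flink's MailboxProcessor). */
final class MailboxLoopSketch {

    /** Hypothetical counterpart of MailboxDefaultAction. */
    interface DefaultAction {
        void run(MailboxLoopSketch controller);
    }

    private final Queue<Runnable> mailbox = new ArrayDeque<>();
    private final DefaultAction defaultAction;
    private boolean running = true;

    MailboxLoopSketch(DefaultAction defaultAction) {
        this.defaultAction = defaultAction;
    }

    /** Enqueues an action to be run by the loop before the next default action. */
    void putMail(Runnable mail) {
        mailbox.add(mail);
    }

    /** Lets the default action end the loop, like controller.allActionsCompleted(). */
    void allActionsCompleted() {
        running = false;
    }

    void runMailboxLoop() {
        while (running) {
            // Mail has priority: drain everything that is queued.
            Runnable mail;
            while ((mail = mailbox.poll()) != null) {
                mail.run();
            }
            // Then run one step of the default action.
            if (running) {
                defaultAction.run(this);
            }
        }
    }

    /** Runs three "records" through the loop with one early and one late mail. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        int[] remaining = {3};
        MailboxLoopSketch loop =
                new MailboxLoopSketch(
                        controller -> {
                            if (remaining[0] == 0) {
                                controller.allActionsCompleted();
                                return;
                            }
                            int n = remaining[0]--;
                            log.add("record-" + n);
                            if (n == 2) {
                                controller.putMail(() -> log.add("mail-late"));
                            }
                        });
        loop.putMail(() -> log.add("mail-early"));
        loop.runMailboxLoop();
        return log;
    }
}
```

Calling MailboxLoopSketch.demo() shows the ordering: each queued mail runs before the next step of the default action, which is why the default action should process only a small unit of work per call.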

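processInput implements the task's default action, for example processing one event from the input. Implementations should in general be non-blocking. It returns immediately when more input is available and the record writer is available, calls controller.allActionsCompleted() at the end of input, and otherwise suspends the default action until the joint input/output future completes.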

    /**
     * This method implements the default action of the task (e.g. processing one event from the
     * input). Implementations should (in general) be non-blocking.
     *
     * @param controller controller object for collaborative interaction between the action and
     *     the stream task.
     * @throws Exception on any problems in the action.
     */
    protected void processInput(MailboxDefaultAction.Controller controller) throws Exception {
        // Delegate to the input processor. There are three implementations:
        // StreamOneInputProcessor
        // StreamTwoInputProcessor
        // StreamMultipleInputProcessor
        InputStatus status = inputProcessor.processInput();
        if (status == InputStatus.MORE_AVAILABLE && recordWriter.isAvailable()) {
            return;
        }
        if (status == InputStatus.END_OF_INPUT) {
            controller.allActionsCompleted();
            return;
        }
        CompletableFuture<?> jointFuture = getInputOutputJointFuture(status);
        MailboxDefaultAction.Suspension suspendedDefaultAction = controller.suspendDefaultAction();
        assertNoException(jointFuture.thenRun(suspendedDefaultAction::resume));
    }
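
The suspend-and-resume step at the end of processInput can be illustrated with a plain CompletableFuture. The sketch below is an assumption-laden simplification: SuspendResumeSketch, offer, finishInput and runUntilBlocked are hypothetical names, everything runs on one thread, and only input availability is modelled (the real code also waits for the record writer to become available).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/** Hypothetical model of suspending the default action until input is available. */
final class SuspendResumeSketch {
    private final Queue<String> input = new ArrayDeque<>();
    private CompletableFuture<Void> available = CompletableFuture.completedFuture(null);
    private boolean endOfInput;

    boolean suspended;
    boolean finished;
    final List<String> processed = new ArrayList<>();

    /** New data completes the availability future, which resumes the default action. */
    void offer(String record) {
        input.add(record);
        available.complete(null);
    }

    void finishInput() {
        endOfInput = true;
        available.complete(null);
    }

    /** One step of the default action: process one record, finish, or suspend. */
    void processInput() {
        String record = input.poll();
        if (record != null) {
            processed.add(record); // more may be available, the loop simply calls again
            return;
        }
        if (endOfInput) {
            finished = true; // corresponds to controller.allActionsCompleted()
            return;
        }
        // Nothing available: suspend and register the resume callback on a fresh future.
        suspended = true;
        available = new CompletableFuture<>();
        available.thenRun(() -> suspended = false);
    }

    /** Runs the default action until it suspends itself or the input ends. */
    void runUntilBlocked() {
        while (!suspended && !finished) {
            processInput();
        }
    }

    static List<String> demo() {
        SuspendResumeSketch task = new SuspendResumeSketch();
        List<String> trace = new ArrayList<>();
        task.offer("a");
        task.runUntilBlocked(); // processes "a", then suspends
        trace.add("suspended=" + task.suspended);
        task.offer("b"); // completing the future resumes the action
        trace.add("suspended=" + task.suspended);
        task.finishInput();
        task.runUntilBlocked();
        trace.add("processed=" + task.processed + " finished=" + task.finished);
        return trace;
    }
}
```

The design point is that the task thread never blocks inside the default action: it only registers a callback, so the mailbox loop stays free to handle mail such as checkpoint triggers while no input is available.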

org.apache.flink.streaming.runtime.io.StreamOneInputProcessor#processInput
	@Override
	public InputStatus processInput() throws Exception {
	    // The input emits its data directly to the output. For network input this is
	    // org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput#emitNext
		InputStatus status = input.emitNext(output);

		if (status == InputStatus.END_OF_INPUT) {
			endOfInputAware.endInput(input.getInputIndex() + 1);
		}

		return status;
	}

org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput#emitNext
	@Override
	public InputStatus emitNext(DataOutput<T> output) throws Exception {

		while (true) {
			// get the stream element from the deserializer
			if (currentRecordDeserializer != null) {
				DeserializationResult result = currentRecordDeserializer.getNextRecord(deserializationDelegate);
				if (result.isBufferConsumed()) {
					currentRecordDeserializer.getCurrentBuffer().recycleBuffer();
					currentRecordDeserializer = null;
				}

				if (result.isFullRecord()) {
					// A full record has been deserialized: hand it to processElement
					processElement(deserializationDelegate.getInstance(), output);
					return InputStatus.MORE_AVAILABLE;
				}
			}

			Optional<BufferOrEvent> bufferOrEvent = checkpointedInputGate.pollNext();
			if (bufferOrEvent.isPresent()) {
				// return to the mailbox after receiving a checkpoint barrier to avoid processing of
				// data after the barrier before checkpoint is performed for unaligned checkpoint mode
				if (bufferOrEvent.get().isBuffer()) {
					processBuffer(bufferOrEvent.get());
				} else {
					processEvent(bufferOrEvent.get());
					return InputStatus.MORE_AVAILABLE;
				}
			} else {
				if (checkpointedInputGate.isFinished()) {
					checkState(checkpointedInputGate.getAvailableFuture().isDone(), "Finished BarrierHandler should be available");
					return InputStatus.END_OF_INPUT;
				}
				return InputStatus.NOTHING_AVAILABLE;
			}
		}
	}
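
The control flow of emitNext (emit at most one record per call, pull the next buffer only when the current one holds no complete record) can be sketched as follows. This is a hypothetical model: EmitNextSketch is an invented name, buffers are plain strings, and records are newline-delimited as a stand-in for Flink's real record serialization.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical model of emitNext: records may span several buffers. */
final class EmitNextSketch {
    enum InputStatus { MORE_AVAILABLE, NOTHING_AVAILABLE, END_OF_INPUT }

    private final Queue<String> buffers = new ArrayDeque<>();
    // Bytes received so far that do not yet form a complete record.
    private final StringBuilder partial = new StringBuilder();
    private boolean finished;

    void addBuffer(String chunk) {
        buffers.add(chunk);
    }

    void finish() {
        finished = true;
    }

    /** Emits at most one record to the output and reports the input status. */
    InputStatus emitNext(List<String> output) {
        while (true) {
            // First try to cut a full record out of what was already received.
            int end = partial.indexOf("\n");
            if (end >= 0) {
                output.add(partial.substring(0, end));
                partial.delete(0, end + 1);
                return InputStatus.MORE_AVAILABLE;
            }
            // No full record yet: pull the next buffer, if there is one.
            String buffer = buffers.poll();
            if (buffer != null) {
                partial.append(buffer);
            } else {
                return finished ? InputStatus.END_OF_INPUT : InputStatus.NOTHING_AVAILABLE;
            }
        }
    }

    static List<String> demo() {
        EmitNextSketch in = new EmitNextSketch();
        List<String> out = new ArrayList<>();
        List<String> trace = new ArrayList<>();
        in.addBuffer("ab\nc"); // record "ab" plus the first half of "cd"
        in.addBuffer("d\n"); // second half of "cd"
        trace.add(in.emitNext(out) + " " + out);
        trace.add(in.emitNext(out) + " " + out);
        trace.add(in.emitNext(out) + " " + out);
        in.finish();
        trace.add(in.emitNext(out) + " " + out);
        return trace;
    }
}
```

Returning after every record, rather than draining the whole buffer, is what gives the mailbox loop a chance to run pending mail between two records.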

org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput#processElement
    // Dispatches one stream element according to its type.
    private void processElement(StreamElement recordOrMark, DataOutput<T> output) throws Exception {


        if (recordOrMark.isRecord()) {
            // Key point: a data record is emitted to the output, see
            // OneInputStreamTask$StreamTaskNetworkOutput#emitRecord
            output.emitRecord(recordOrMark.asRecord());

        } else if (recordOrMark.isWatermark()) {
            // Watermark: passed through the statusWatermarkValve
            statusWatermarkValve.inputWatermark(
                    recordOrMark.asWatermark(), flattenedChannelIndices.get(lastChannel), output);
        } else if (recordOrMark.isLatencyMarker()) {
            // LatencyMarker: a marker used for latency measurement, not a late record
            output.emitLatencyMarker(recordOrMark.asLatencyMarker());
        } else if (recordOrMark.isStreamStatus()) {
            // StreamStatus: passed through the statusWatermarkValve
            statusWatermarkValve.inputStreamStatus(
                    recordOrMark.asStreamStatus(),
                    flattenedChannelIndices.get(lastChannel),
                    output);
        } else {
            throw new UnsupportedOperationException("Unknown type of StreamElement");
        }
    }

For a map operator, the emitRecord that gets called is the one defined in OneInputStreamTask.java (inner class StreamTaskNetworkOutput):

@Override
public void emitRecord(StreamRecord<IN> record) throws Exception {

    numRecordsIn.inc();

    operator.setKeyContextElement1(record);
    // Hand the record to the operator for transformation.
    // For a map operator, processElement is the one defined in StreamMap.java
    operator.processElement(record);

}

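org.apache.flink.streaming.api.operators.StreamMap#processElement

@Override
public void processElement(StreamRecord<IN> element) throws Exception {
    // userFunction.map(element.getValue()) calls the map method of the user-defined MapFunction.
    // The mapped value replaces the value of the record, which is then collected downstream.
    output.collect(element.replace(userFunction.map(element.getValue())));
}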
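
The call chain from emitRecord to the user's map function can be condensed into a small stand-alone model. All types here are hypothetical simplifications (MapChainSketch, MapOperator, NetworkOutput, Collector), a java.util.function.Function stands in for MapFunction, and StreamRecord wrapping, key context and metrics registration are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical model of the emitRecord -> processElement -> map -> collect chain. */
final class MapChainSketch {

    /** Stand-in for the operator's downstream output. */
    interface Collector<T> {
        void collect(T value);
    }

    /** Stand-in for StreamMap: applies the user function and collects the result. */
    static final class MapOperator<IN, OUT> {
        private final Function<IN, OUT> userFunction;
        private final Collector<OUT> output;

        MapOperator(Function<IN, OUT> userFunction, Collector<OUT> output) {
            this.userFunction = userFunction;
            this.output = output;
        }

        void processElement(IN element) {
            output.collect(userFunction.apply(element));
        }
    }

    /** Stand-in for StreamTaskNetworkOutput: counts the record and hands it to the operator. */
    static final class NetworkOutput<IN> {
        private final MapOperator<IN, ?> operator;
        long numRecordsIn;

        NetworkOutput(MapOperator<IN, ?> operator) {
            this.operator = operator;
        }

        void emitRecord(IN record) {
            numRecordsIn++;
            operator.processElement(record);
        }
    }

    /** Maps two strings to their lengths and returns what reached the sink. */
    static List<Integer> demo() {
        List<Integer> sink = new ArrayList<>();
        MapOperator<String, Integer> map = new MapOperator<>(String::length, sink::add);
        NetworkOutput<String> output = new NetworkOutput<>(map);
        output.emitRecord("flink");
        output.emitRecord("task");
        return sink;
    }
}
```

In the sketch, as in the walkthrough above, the network output knows nothing about the user function; it only forwards each record to the operator, and the operator decides what to collect downstream.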
