[Flink] Flink 1.12.2 Source Code Walkthrough: Task Data Input

This article analyzes the InputGate and InputChannel of a Task in Flink 1.12.2, covering the InputGate implementations SingleInputGate and UnionInputGate, and the interfaces InputGate implements, such as PullingAsyncDataInput and ChannelStateHolder. An InputGate is composed of InputChannels; InputChannel has two implementations, LocalInputChannel and RemoteInputChannel, handling local and remote data exchange respectively. It also discusses the key methods and features of InputGate and InputChannel, such as data retrieval, channel state management, and checkpoint-related operations.

1. Overview

Reposted from: Flink 1.12.2 Source Code Walkthrough: Task Data Input

Within a Task, the input is encapsulated as an InputGate; InputGates correspond one-to-one with the JobEdges in the JobGraph.
In other words, an InputGate corresponds to the upstream operator this Task depends on (including all of its parallel subtasks), and each InputGate consumes one or more ResultPartitions.
An InputGate is made up of InputChannels, which correspond one-to-one with the ExecutionEdges in the ExecutionGraph; that is, each InputChannel is connected to exactly one ResultSubpartition and receives that ResultSubpartition's output. Depending on where the consumed ResultSubpartition resides, InputChannel has two different implementations: LocalInputChannel and RemoteInputChannel.

2. InputGate

A Task's input is abstracted as an InputGate, and an InputGate is composed of InputChannels; InputChannels correspond one-to-one with the ResultSubpartitions the Task consumes.

An InputGate consumes one or more partitions of a single produced intermediate result.
Each intermediate result is partitioned over its producing parallel subtasks;
each of these partitions is further divided into one or more subpartitions.

As an example, consider a map-reduce program, where the map operator produces data and the reduce operator consumes the produced data.

/**
 * An input gate consumes one or more partitions of a single produced intermediate result.
 *
 * Each intermediate result is partitioned over its producing parallel subtasks;
 *
 * each of these partitions is further divided into one or more subpartitions.
 *
 * As an example, consider a map-reduce program, where the map operator produces data and the
 * reduce operator consumes the produced data.
 *
 * <pre>{@code
 * +-----+              +---------------------+              +--------+
 * | Map | = produce => | Intermediate Result | <= consume = | Reduce |
 * +-----+              +---------------------+              +--------+
 * }</pre>
 */

When such a program is deployed in parallel, the intermediate result is partitioned over its producing parallel subtasks;
each of these partitions is further divided into one or more subpartitions.

 * <pre>{@code
 *                            Intermediate result
 *               +-----------------------------------------+
 *               |                      +----------------+ |              +-----------------------+
 * +-------+     | +-------------+  +=> | Subpartition 1 | | <=======+=== | Input Gate | Reduce 1 |
 * | Map 1 | ==> | | Partition 1 | =|   +----------------+ |         |    +-----------------------+
 * +-------+     | +-------------+  +=> | Subpartition 2 | | <==+    |
 *               |                      +----------------+ |    |    | Subpartition request
 *               |                                         |    |    |
 *               |                      +----------------+ |    |    |
 * +-------+     | +-------------+  +=> | Subpartition 1 | | <==+====+
 * | Map 2 | ==> | | Partition 2 | =|   +----------------+ |    |         +-----------------------+
 * +-------+     | +-------------+  +=> | Subpartition 2 | | <==+======== | Input Gate | Reduce 2 |
 *               |                      +----------------+ |              +-----------------------+
 *               +-----------------------------------------+
 * }</pre>
 *
 */

In the example above, two map subtasks produce the intermediate result in parallel, yielding two partitions (partition 1 and partition 2).
Each partition is further divided into two subpartitions, one for each parallel reduce subtask.
As the diagram shows, each reduce task has an input gate attached to it.
This gate provides the task's input, which consists of one subpartition from each partition of the intermediate result.

2.1. Interfaces implemented by InputGate

InputGate is an abstract class that implements three interfaces: PullingAsyncDataInput, AutoCloseable, and ChannelStateHolder.
Let's focus on the important ones.


2.1.1. PullingAsyncDataInput

This interface defines the two basic methods for asynchronous and non-blocking data polling.

For the most efficient use, the caller should invoke {@link #pollNext()} until it reports that no more elements are available.

When that happens, the caller should check {@link #isFinished()} on the input.

If it is not finished, the caller should wait for the {@link CompletableFuture} returned by {@link #getAvailableFuture()} to complete. For example:

 /**
 * <pre>{@code
 * AsyncDataInput<T> input = ...;
 * while (!input.isFinished()) {
 * 	Optional<T> next;
 *
 * 	while (true) {
 * 		next = input.pollNext();
 * 		if (!next.isPresent()) {
 * 			break;
 * 		}
 * 		// do something with next
 * 	}
 *
 * 	input.getAvailableFuture().get();
 * }
 * }</pre>
 */

Methods

PullingAsyncDataInput declares just two methods:

- Optional<T> pollNext() throws Exception — fetches the next element; this method must be non-blocking.
- boolean isFinished() — whether the input has finished.

2.1.2. ChannelStateHolder

Implemented by entities that hold channel state of any kind and need a reference to the {@link ChannelStateWriter}.

It has only a single method, which must only be called once:

/** Injects the {@link ChannelStateWriter}. Must only be called once. */
    void setChannelStateWriter(ChannelStateWriter channelStateWriter);

2.1.3. AutoCloseable

It has just one method: void close() throws Exception.

2.2. Method list

- setChannelStateWriter(ChannelStateWriter channelStateWriter) — injects the ChannelStateWriter.
- abstract int getNumberOfInputChannels() — returns the number of input channels.
- abstract Optional<BufferOrEvent> getNext() — blocking call waiting for the next {@link BufferOrEvent}. (The previously returned buffer should have been recycled before the next one is fetched.)
- abstract Optional<BufferOrEvent> pollNext() — non-blocking poll for the next {@link BufferOrEvent}.
- abstract void sendTaskEvent(TaskEvent event) — sends a task event.
- abstract void resumeConsumption(InputChannelInfo channelInfo) — resumes consumption on the given channel.
- abstract InputChannel getChannel(int channelIndex) — returns this gate's channel at the given index.
- List<InputChannelInfo> getChannelInfos() — returns the channel infos of this gate.
- CompletableFuture<?> getPriorityEventAvailableFuture() — notifies when a priority event has been enqueued. If this future is queried from the task thread, it is guaranteed that the priority event is available and can be retrieved via {@link #getNext()}.
- abstract void setup() — sets up the gate; a potentially heavy-weight, blocking operation compared with just creating it.
- abstract void requestPartitions() — requests the consumed ResultPartitions.

A minimal sketch of a typical consumption loop combining these methods follows.
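The sketch below is not Flink's actual task loop; gate stands for any InputGate implementation such as SingleInputGate, and process() is a hypothetical hook standing in for the operator chain.

// Minimal consumption-loop sketch; Flink classes (InputGate, BufferOrEvent) are
// assumed to be on the classpath, and process() is a hypothetical operator hook.
void consume(InputGate gate) throws Exception {
    gate.setup();               // potentially heavy-weight: allocates the buffer pool
    gate.requestPartitions();   // each channel requests its subpartition

    while (!gate.isFinished()) {
        java.util.Optional<BufferOrEvent> next = gate.getNext(); // blocking variant
        if (next.isPresent()) {
            process(next.get()); // hand the buffer or event to the operator chain
        }
    }
    gate.close();
}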

3. InputGate implementations

A Task fetches its input by repeatedly calling InputGate.getNextBufferOrEvent and hands the fetched data to the operator it wraps for processing; this constitutes the basic run loop of a Task.

InputGate has two concrete implementations, SingleInputGate and UnionInputGate; a UnionInputGate is composed of multiple SingleInputGates.

3.1. SingleInputGate

3.1.1. IndexedInputGate

The parent class of SingleInputGate is IndexedInputGate, which defines some checkpoint-related methods:

- abstract int getGateIndex() — returns the index of this input gate.
- void checkpointStarted(CheckpointBarrier barrier) — a checkpoint has started.
- void checkpointStopped(long cancelledCheckpointId) — a checkpoint has been stopped/cancelled.
- getInputGateIndex() — returns the index of this input gate.
- void blockConsumption(InputChannelInfo channelInfo) — unused; the network stack blocks consumption automatically by revoking credits.

3.1.2. Fields

/** Lock object to guard partition requests and runtime channel updates. */
private final Object requestLock = new Object();

/**
 * The name of the owning task, for logging purposes,
 * e.g. owningTaskName = "Flat Map (2/4)#0 (0ef8b3d70af60be8633af8af4e1c0698)"
 */
private final String owningTaskName;

private final int gateIndex;

/**
 * The ID of the consumed intermediate result. Each input gate consumes partitions of the
 * intermediate result specified by this ID. This ID also identifies the input gate at the
 * consuming task.
 *
 * e.g. {IntermediateDataSetID@8214} "5eba1007ad48ad2243891e1eff29c32b"
 */
private final IntermediateDataSetID consumedResultId;

/**
 * The type of the partition the input gate is consuming,
 * e.g. {ResultPartitionType@7380} "PIPELINED_BOUNDED".
 */
private final ResultPartitionType consumedPartitionType;

/**
 * The index of the consumed subpartition of each consumed partition. This index depends on the
 * {@link DistributionPattern} and the subtask indices of the producing and consuming task.
 */
private final int consumedSubpartitionIndex;

/** The number of input channels (equivalent to the number of consumed partitions). */
private final int numberOfInputChannels;

/**
 * All input channels of this gate. There is one input channel for each consumed intermediate
 * result partition. We store them in a map, keyed by result partition, for runtime updates of
 * single channels, e.g.:
 *
 * inputChannels = {HashMap@8215}  size = 1
 *         {IntermediateResultPartitionID@8237} "5eba1007ad48ad2243891e1eff29c32b#0" -> {LocalRecoveredInputChannel@8238}
 */
private final Map<IntermediateResultPartitionID, InputChannel> inputChannels;

/**
 * All input channels of this gate, indexed by channel index, e.g.:
 *        channels = {InputChannel[1]@8216}
 *              0 = {LocalRecoveredInputChannel@8238}
 */
@GuardedBy("requestLock")
private final InputChannel[] channels;

/**
 * Queue of the InputChannels that have notified this input gate about available data,
 * e.g. inputChannelsWithData = {PrioritizedDeque@8217} "[]"
 */
private final PrioritizedDeque<InputChannel> inputChannelsWithData = new PrioritizedDeque<>();

/**
 * Field guaranteeing uniqueness for the inputChannelsWithData queue. These two fields should
 * eventually be unified into one.
 *
 * e.g. enqueuedInputChannelsWithData = {BitSet@8218} "{}"
 */
@GuardedBy("inputChannelsWithData")
private final BitSet enqueuedInputChannelsWithData;

// Channels that have delivered an EndOfPartitionEvent
private final BitSet channelsWithEndOfPartitionEvents;

// The last priority sequence number seen per channel
@GuardedBy("inputChannelsWithData")
private int[] lastPrioritySequenceNumber;

/** The partition producer state listener. */
private final PartitionProducerStateProvider partitionProducerStateProvider;

/**
 * Buffer pool for incoming buffers (a LocalBufferPool). Incoming data from remote channels is
 * copied to buffers from this pool, e.g.:
 * {LocalBufferPool@8221} "[size: 8, required: 1, requested: 1, available: 1, max: 8, listeners: 0,subpartitions: 0, maxBuffersPerChannel: 2147483647, destroyed: false]"
 */
private BufferPool bufferPool;

private boolean hasReceivedAllEndOfPartitionEvents;

/** Flag indicating whether partitions have been requested. */
private boolean requestedPartitionsFlag;

/** Task events that are pending until all channels have been initialized. */
private final List<TaskEvent> pendingEvents = new ArrayList<>();

// Number of channels that have not yet been initialized
private int numberOfUninitializedChannels;

/** A timer to retrigger local partition requests. Only initialized if actually needed. */
private Timer retriggerLocalRequestTimer;

// Factory for the bufferPool,
// e.g. {SingleInputGateFactory$lambda@8223}
private final SupplierWithException<BufferPool, IOException> bufferPoolFactory;

private final CompletableFuture<Void> closeFuture;

@Nullable private final BufferDecompressor bufferDecompressor;

// {NetworkBufferPool@7512}
private final MemorySegmentProvider memorySegmentProvider;

/**
 * The segment used by the local input channel to read data from the file region of a bounded
 * blocking partition, e.g. {HybridMemorySegment@8225}.
 */
private final MemorySegment unpooledSegment;

3.1.3. setup

During the InputGate's setup phase, dedicated memory is assigned to all input channels. Looking at SingleInputGate's setup method:

it essentially allocates a LocalBufferPool; all input channels within the same InputGate share this single LocalBufferPool.

@Override
    public void setup() throws IOException {
        checkState(
                this.bufferPool == null,
                "Bug in input gate setup logic: Already registered buffer pool.");

        // Assign exclusive buffers to all InputChannels; the rest serve as floating buffers
        setupChannels();

        // Obtain the bufferPool used to allocate floating buffers
        BufferPool bufferPool = bufferPoolFactory.get();

        // register the buffer pool with this gate
        setBufferPool(bufferPool);
    }

	    /** Assign the exclusive buffers to all remote input channels directly for credit-based mode. */
    @VisibleForTesting
    public void setupChannels() throws IOException {
        synchronized (requestLock) {
            for (InputChannel inputChannel : inputChannels.values()) {
                // Call setup() on each InputChannel of this SingleInputGate.
                inputChannel.setup();
            }
        }
    }

3.1.4. requestPartitions

   // Request partitions
    @Override
    public void requestPartitions() {
        synchronized (requestLock) {

            // Partitions may only be requested once; this flag is set to true after the first call
            if (!requestedPartitionsFlag) {
                if (closeFuture.isDone()) {
                    throw new IllegalStateException("Already released.");
                }

                // Sanity checks
                if (numberOfInputChannels != inputChannels.size()) {
                    throw new IllegalStateException(
                            String.format(
                                    "Bug in input gate setup logic: mismatch between "
                                            + "number of total input channels [%s] and the currently set number of input "
                                            + "channels [%s].",
                                    inputChannels.size(), numberOfInputChannels));
                }

                convertRecoveredInputChannels();

                // request the partition data
                internalRequestPartitions();
            }

            // set the flag once the call completes, preventing repeated requests
            requestedPartitionsFlag = true;
        }
    }

  // Request the data of each channel's subpartition
    private void internalRequestPartitions() {
        for (InputChannel inputChannel : inputChannels.values()) {
            try {
                // each channel requests its corresponding subpartition
                inputChannel.requestSubpartition(consumedSubpartitionIndex);
            } catch (Throwable t) {
                inputChannel.setError(t);
                return;
            }
        }
    }

3.1.5. getChannel(int channelIndex)

Returns the InputChannel at the given channelIndex.

@Override
public InputChannel getChannel(int channelIndex) {
    return channels[channelIndex];
}

3.1.6. updateInputChannel

Depending on whether the exchange is local, converts an UnknownInputChannel into a LocalInputChannel or a RemoteInputChannel.

public void updateInputChannel(
            ResourceID localLocation, NettyShuffleDescriptor shuffleDescriptor)
            throws IOException, InterruptedException {
        synchronized (requestLock) {
            if (closeFuture.isDone()) {
                // There was a race with a task failure/cancel
                return;
            }

            IntermediateResultPartitionID partitionId =
                    shuffleDescriptor.getResultPartitionID().getPartitionId();

            InputChannel current = inputChannels.get(partitionId);

            // the InputChannel's location is not yet known...
            if (current instanceof UnknownInputChannel) {
                UnknownInputChannel unknownChannel = (UnknownInputChannel) current;
                boolean isLocal = shuffleDescriptor.isLocalTo(localLocation);
                InputChannel newChannel;
                if (isLocal) {
                    // LocalInputChannel
                    newChannel = unknownChannel.toLocalInputChannel();
                } else {
                    // RemoteInputChannel
                    RemoteInputChannel remoteInputChannel =
                            unknownChannel.toRemoteInputChannel(
                                    shuffleDescriptor.getConnectionId());
                    remoteInputChannel.setup();
                    newChannel = remoteInputChannel;
                }
                LOG.debug("{}: Updated unknown input channel to {}.", owningTaskName, newChannel);

                inputChannels.put(partitionId, newChannel);
                channels[current.getChannelIndex()] = newChannel;

                if (requestedPartitionsFlag) {
                    newChannel.requestSubpartition(consumedSubpartitionIndex);
                }

                for (TaskEvent event : pendingEvents) {
                    newChannel.sendTaskEvent(event);
                }

                if (--numberOfUninitializedChannels == 0) {
                    pendingEvents.clear();
                }
            }
        }
    }

3.1.7. retriggerPartitionRequest

Retriggers a partition request. In effect, it triggers retriggerSubpartitionRequest on the corresponding input channel.

/** Retriggers a partition request. */
public void retriggerPartitionRequest(IntermediateResultPartitionID partitionId)
        throws IOException {
    synchronized (requestLock) {
        if (!closeFuture.isDone()) {
            final InputChannel ch = inputChannels.get(partitionId);

            checkNotNull(ch, "Unknown input channel with ID " + partitionId);

            LOG.debug(
                    "{}: Retriggering partition request {}:{}.",
                    owningTaskName,
                    ch.partitionId,
                    consumedSubpartitionIndex);

            if (ch.getClass() == RemoteInputChannel.class) {

                // RemoteInputChannel
                final RemoteInputChannel rch = (RemoteInputChannel) ch;
                rch.retriggerSubpartitionRequest(consumedSubpartitionIndex);
            } else if (ch.getClass() == LocalInputChannel.class) {

                // LocalInputChannel
                final LocalInputChannel ich = (LocalInputChannel) ch;

                if (retriggerLocalRequestTimer == null) {
                    retriggerLocalRequestTimer = new Timer(true);
                }

                ich.retriggerSubpartitionRequest(
                        retriggerLocalRequestTimer, consumedSubpartitionIndex);
            } else {
                throw new IllegalStateException(
                        "Unexpected type of channel to retrigger partition: " + ch.getClass());
            }
        }
    }
}

3.1.8. close

Close: releases the resources of every InputChannel, lazily destroys the LocalBufferPool, and, once released, notifies all threads waiting on inputChannelsWithData.

@Override
public void close() throws IOException {
    boolean released = false;
    synchronized (requestLock) {
        if (!closeFuture.isDone()) {
            try {
                LOG.debug("{}: Releasing {}.", owningTaskName, this);

                if (retriggerLocalRequestTimer != null) {
                    retriggerLocalRequestTimer.cancel();
                }

                for (InputChannel inputChannel : inputChannels.values()) {
                    try {
                        // release the channel's resources
                        inputChannel.releaseAllResources();
                    } catch (IOException e) {
                        LOG.warn(
                                "{}: Error during release of channel resources: {}.",
                                owningTaskName,
                                e.getMessage(),
                                e);
                    }
                }

                // The buffer pool can actually be destroyed immediately after the
                // reader received all of the data from the input channels.
                if (bufferPool != null) {
                   // destroy the bufferPool
                    bufferPool.lazyDestroy();
                }
            } finally {
                released = true;
                closeFuture.complete(null);
            }
        }
    }

    if (released) {
        synchronized (inputChannelsWithData) {
            // notify all threads waiting on inputChannelsWithData
            inputChannelsWithData.notifyAll();
        }
    }
}

3.1.9. getNextBufferOrEvent

A Task fetches its input data by repeatedly calling InputGate.getNextBufferOrEvent,
which in turn calls the waitAndGetNextData method,
and hands the fetched data to the operator it wraps for processing.
This constitutes the basic run loop of a Task.

   
/**
 * A Task fetches its input by repeatedly calling InputGate.getNextBufferOrEvent and hands the
 * data to the operator it wraps; this is the basic run loop of a Task.
 *
 * @param blocking whether to block until data is available
 * @return
 * @throws IOException
 * @throws InterruptedException
 */
private Optional<BufferOrEvent> getNextBufferOrEvent(boolean blocking)
        throws IOException, InterruptedException {
    // if EndOfPartitionEvents have been received from all partitions, return empty
    if (hasReceivedAllEndOfPartitionEvents) {
        return Optional.empty();
    }

    // if the input gate has been closed
    if (closeFuture.isDone()) {
        throw new CancelTaskException("Input gate is already closed.");
    }

    // read data in blocking or non-blocking mode, per the blocking flag
    Optional<InputWithData<InputChannel, BufferAndAvailability>> next =
            waitAndGetNextData(blocking);
    if (!next.isPresent()) {
        return Optional.empty();
    }
	// data obtained
    InputWithData<InputChannel, BufferAndAvailability> inputWithData = next.get();
	// determine whether the buffer holds an event or data
    return Optional.of(
            transformToBufferOrEvent(
                    inputWithData.data.buffer(),
                    inputWithData.moreAvailable,
                    inputWithData.input,
                    inputWithData.morePriorityEvents));
}

There are two fetch variants.
Blocking fetch of the next buffer:

@Override
public Optional<BufferOrEvent> getNext() throws IOException, InterruptedException {
    return getNextBufferOrEvent(true);
}

Non-blocking fetch of the next buffer:

@Override
public Optional<BufferOrEvent> pollNext() throws IOException, InterruptedException {
    return getNextBufferOrEvent(false);
}

3.1.10. waitAndGetNextData

Waits for and fetches the next piece of data; the blocking parameter decides whether the call blocks.

private Optional<InputWithData<InputChannel, BufferAndAvailability>> waitAndGetNextData(
        boolean blocking) throws IOException, InterruptedException {
    while (true) {
        synchronized (inputChannelsWithData) {


            Optional<InputChannel> inputChannelOpt = getChannel(blocking);
            if (!inputChannelOpt.isPresent()) {
                return Optional.empty();
            }

            // a channel is available (obtained blocking or not, per the flag); poll its next buffer
            final InputChannel inputChannel = inputChannelOpt.get();
            Optional<BufferAndAvailability> bufferAndAvailabilityOpt =
                    inputChannel.getNextBuffer();

            if (!bufferAndAvailabilityOpt.isPresent()) {
                checkUnavailability();
                continue;
            }

            final BufferAndAvailability bufferAndAvailability = bufferAndAvailabilityOpt.get();
            if (bufferAndAvailability.moreAvailable()) {
                // enqueue the inputChannel at the end to avoid starvation
                queueChannelUnsafe(inputChannel, bufferAndAvailability.morePriorityEvents());
            }

            final boolean morePriorityEvents =
                    inputChannelsWithData.getNumPriorityElements() > 0;
            if (bufferAndAvailability.hasPriority()) {
                lastPrioritySequenceNumber[inputChannel.getChannelIndex()] =
                        bufferAndAvailability.getSequenceNumber();
                if (!morePriorityEvents) {
                    priorityAvailabilityHelper.resetUnavailable();
                }
            }

            // if inputChannelsWithData is empty, mark the gate unavailable
            checkUnavailability();

            // return the wrapped result
            return Optional.of(
                    new InputWithData<>(
                            inputChannel,
                            bufferAndAvailability,
                            !inputChannelsWithData.isEmpty(),
                            morePriorityEvents));
        }
    }
}
3.1.11. transformToBufferOrEvent

Determines whether the buffer carries data or an event and returns the corresponding BufferOrEvent instance.

	private BufferOrEvent transformToBufferOrEvent(
        Buffer buffer,
        boolean moreAvailable,
        InputChannel currentChannel,
        boolean morePriorityEvents)
        throws IOException, InterruptedException {
    // decide whether the buffer carries data or an event
    if (buffer.isBuffer()) {
        return transformBuffer(buffer, moreAvailable, currentChannel, morePriorityEvents);
    } else {
        return transformEvent(buffer, moreAvailable, currentChannel, morePriorityEvents);
    }
}
  private BufferOrEvent transformBuffer(
            Buffer buffer,
            boolean moreAvailable,
            InputChannel currentChannel,
            boolean morePriorityEvents) {
        return new BufferOrEvent(
                decompressBufferIfNeeded(buffer),
                currentChannel.getChannelInfo(),
                moreAvailable,
                morePriorityEvents);
    }

    private BufferOrEvent transformEvent(
            Buffer buffer,
            boolean moreAvailable,
            InputChannel currentChannel,
            boolean morePriorityEvents)
            throws IOException, InterruptedException {
        final AbstractEvent event;
        try {
            event = EventSerializer.fromBuffer(buffer, getClass().getClassLoader());
        } finally {
            buffer.recycleBuffer();
        }

        // If this is an EndOfPartitionEvent: once all InputChannels have received it,
        // hasReceivedAllEndOfPartitionEvents is set to true and no more data can be fetched
        if (event.getClass() == EndOfPartitionEvent.class) {
            channelsWithEndOfPartitionEvents.set(currentChannel.getChannelIndex());

            if (channelsWithEndOfPartitionEvents.cardinality() == numberOfInputChannels) {

                // Because of race condition between:
                // 1. releasing inputChannelsWithData lock in this method and reaching this place
                // 2. empty data notification that re-enqueues a channel
                // we can end up with moreAvailable flag set to true, while we expect no more data.
                checkState(!moreAvailable || !pollNext().isPresent());
                moreAvailable = false;
                hasReceivedAllEndOfPartitionEvents = true;
                markAvailable();
            }

            currentChannel.releaseAllResources();
        }

        return new BufferOrEvent(
                event,
                buffer.getDataType().hasPriority(),
                currentChannel.getChannelInfo(),
                moreAvailable,
                buffer.getSize(),
                morePriorityEvents);
    }

3.1.12. sendTaskEvent

@Override
public void sendTaskEvent(TaskEvent event) throws IOException {
    synchronized (requestLock) {
        // iterate over all InputChannels, calling sendTaskEvent on each
        for (InputChannel inputChannel : inputChannels.values()) {
            inputChannel.sendTaskEvent(event);
        }

        // if some channels are not yet initialized, add the event to the pending queue
        if (numberOfUninitializedChannels > 0) {
            pendingEvents.add(event);
        }
    }
}

3.1.13. resumeConsumption

@Override
public void resumeConsumption(InputChannelInfo channelInfo) throws IOException {
    checkState(!isFinished(), "InputGate already finished.");
    // BEWARE: consumption resumption only happens for streaming jobs in which all slots
    // are allocated together so there should be no UnknownInputChannel. As a result, it
    // is safe to not synchronize the requestLock here. We will refactor the code to not
    // rely on this assumption in the future.
    channels[channelInfo.getInputChannelIdx()].resumeConsumption();
}

3.1.14. Channel notification methods

- void notifyChannelNonEmpty(InputChannel channel) — callback invoked when an InputChannel has data available.
- void notifyPriorityEvent(InputChannel inputChannel, int prioritySequenceNumber) — notifies a priority event.
- void triggerPartitionStateCheck(ResultPartitionID partitionId) — triggers a check of the partition's state.
- void queueChannel(InputChannel channel, @Nullable Integer prioritySequenceNumber) — enqueues a newly available channel.
- boolean queueChannelUnsafe(InputChannel channel, boolean priority) — enqueues the channel if it is not yet enqueued, potentially raising its priority.

3.2. InputGateWithMetrics

InputGateWithMetrics is a subclass of InputGate that essentially just adds a Counter for byte accounting; DataSinkTask, among others, uses it.

It has two fields:

private final IndexedInputGate inputGate;

private final Counter numBytesIn;

The IndexedInputGate field is in practice a SingleInputGate, UnionInputGate, or similar implementation. A minimal sketch of the delegation follows.
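This sketch assumes the wrapper simply forwards to the inner gate and counts the bytes of every buffer or event it hands out; only pollNext() is shown, while the real class wraps all InputGate methods.

// Sketch: delegate to the wrapped gate and update the byte counter.
public java.util.Optional<BufferOrEvent> pollNext() throws IOException, InterruptedException {
    java.util.Optional<BufferOrEvent> next = inputGate.pollNext();
    next.ifPresent(bufferOrEvent -> numBytesIn.inc(bufferOrEvent.getSize()));
    return next;
}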

3.3. UnionInputGate

A UnionInputGate is composed of multiple SingleInputGates and internally keeps an inputGatesWithData queue.
It is an input gate wrapper that unions the input from multiple input gates.
Each input gate has input channels attached, from which it reads data.
At each input gate, the input channels have unique IDs from 0 (inclusive) to the number of input channels (exclusive).

 *
 * <pre>
 * +---+---+      +---+---+---+
 * | 0 | 1 |      | 0 | 1 | 2 |
 * +--------------+--------------+
 * | Input gate 0 | Input gate 1 |
 * +--------------+--------------+
 * </pre>

The union input gate remaps these IDs from 0 to the total number of input channels across all unioned input gates:
the channels of input gate 0 keep their original indices,
while the channel indices of gate 1 are offset by 2, becoming 2 to 4.

 * 
 * <pre>
 * +---+---++---+---+---+
 * | 0 | 1 || 2 | 3 | 4 |
 * +--------------------+
 * | Union input gate   |
 * +--------------------+
 * </pre>
 *
 /**
     * Gates, which notified this input gate about available data. We are using it as a FIFO queue
     * of {@link InputGate}s to avoid starvation and provide some basic fairness.
     */
    private final PrioritizedDeque<IndexedInputGate> inputGatesWithData = new PrioritizedDeque<>();
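A minimal sketch of the index mapping described above; channelsPerGate is a hypothetical array holding the channel count of each unioned gate.

// Each gate's channels keep their relative order, shifted by the total channel
// count of all gates before it (gate 0 -> 0..1, gate 1 -> 2..4 in the example).
static int[] computeChannelIndexOffsets(int[] channelsPerGate) {
    int[] offsets = new int[channelsPerGate.length];
    int totalChannels = 0;
    for (int i = 0; i < channelsPerGate.length; i++) {
        offsets[i] = totalChannels;          // first union-wide index of gate i
        totalChannels += channelsPerGate[i];
    }
    return offsets;
}

// computeChannelIndexOffsets(new int[] {2, 3}) returns [0, 2]:
// channel 1 of gate 1 maps to union-wide index 2 + 1 = 3.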

4. InputChannel

An InputGate contains multiple InputChannel implementations.

The basic logic of an InputChannel is fairly simple:
its lifecycle follows the order requestSubpartition(int subpartitionIndex),
getNextBuffer(), and releaseAllResources(), as sketched below.

Depending on where the consumed ResultPartition resides,
InputChannel has two different implementations, LocalInputChannel and RemoteInputChannel,
for local and remote data exchange respectively.

InputChannel has one more implementation, UnknownInputChannel,
which acts as a placeholder while the ResultPartition's location is not yet determined;
it is eventually updated to a LocalInputChannel or a RemoteInputChannel.
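A minimal sketch of that lifecycle, assuming a hypothetical handle() consumer and omitting error handling:

// 1. connect the channel to the subpartition it consumes
channel.requestSubpartition(subpartitionIndex);

// 2. consume buffers and events while data is present
java.util.Optional<BufferAndAvailability> next;
while ((next = channel.getNextBuffer()).isPresent()) {
    handle(next.get()); // hypothetical consumer hook
}

// 3. release the channel's resources when consumption ends
channel.releaseAllResources();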

4.1. Fields

// The ID of the consumed target ResultPartition
protected final ResultPartitionID partitionId;

// The InputGate this channel belongs to
protected final SingleInputGate inputGate;

// - Asynchronous error notification --------------------------------------

private final AtomicReference<Throwable> cause = new AtomicReference<Throwable>();

// - Partition request backoff --------------------------------------------

/** The initial backoff (in ms). */
protected final int initialBackoff;

/** The maximum backoff (in ms). */
protected final int maxBackoff;

// Counter for the number of bytes received
protected final Counter numBytesIn;

// Counter for the number of buffers received
protected final Counter numBuffersIn;

/** The current backoff (in ms). */
private int currentBackoff;

4.2. Constructor

Just field assignments:

protected InputChannel(
        SingleInputGate inputGate,
        int channelIndex,
        ResultPartitionID partitionId,
        int initialBackoff,
        int maxBackoff,
        Counter numBytesIn,
        Counter numBuffersIn) {

    checkArgument(channelIndex >= 0);

    int initial = initialBackoff;
    int max = maxBackoff;

    checkArgument(initial >= 0 && initial <= max);

    this.inputGate = checkNotNull(inputGate);
    this.channelInfo = new InputChannelInfo(inputGate.getGateIndex(), channelIndex);
    this.partitionId = checkNotNull(partitionId);

    this.initialBackoff = initial;
    this.maxBackoff = max;
    this.currentBackoff = initial == 0 ? -1 : 0;

    this.numBytesIn = numBytesIn;
    this.numBuffersIn = numBuffersIn;
}

4.3. Methods

- void setup() — initialization.
- abstract void resumeConsumption() — resumes data consumption.
- void notifyChannelNonEmpty() — callback telling the InputGate that this channel has data.
- void notifyPriorityEvent(int priorityBufferNumber) — callback telling the InputGate that this channel has a priority event.
- void notifyBufferAvailable(int numAvailableBuffers) — notifies the number of available buffers; related to backpressure.
- abstract void requestSubpartition(int subpartitionIndex) — requests the ResultSubpartition.
- void checkpointStarted(CheckpointBarrier barrier) — a checkpoint has started.
- void checkpointStopped(long checkpointId) — a checkpoint has been stopped.
- abstract void sendTaskEvent(TaskEvent event) — sends a TaskEvent.
- abstract boolean isReleased() — whether the resources have been released.
- abstract void releaseAllResources() — releases all resources.

5. LocalInputChannel

LocalInputChannel is a subclass of InputChannel used to exchange data between different threads within the same process.

If an InputChannel and the Task owning the ResultPartition it consumes run in the same TaskManager,
the data exchange between them happens between threads of the same JVM process, without going through the network.

LocalInputChannel implements the InputChannel interface as well as the BufferAvailabilityListener interface.

LocalInputChannel asks the ResultPartitionManager to create the ResultSubpartitionView associated with the specified ResultSubpartition,
registering itself as the view's callback.
This way, as soon as the ResultSubpartition produces data, the ResultSubpartitionView gets notified,
and LocalInputChannel's callback is invoked as well,
so the consumer side learns about newly produced data in time and can consume it promptly.

5.1. Fields

// ------------------------------------------------------------------------

private final Object requestLock = new Object();

/**
 * The local partition manager, holding all registered partitions, e.g.:
 *    partitionManager = {ResultPartitionManager@7381}
 *            registeredPartitions = {HashMap@7405}  size = 8
 *                    {ResultPartitionID@7416} "6b3e5e999219f9532114514c4bdbb773#0@51ad11521e991efaad6349cdf2accda7" -> {PipelinedResultPartition@7417} "PipelinedResultPartition 6b3e5e999219f9532114514c4bdbb773#0@51ad11521e991efaad6349cdf2accda7 [PIPELINED_BOUNDED, 1 subpartitions, 1 pending consumptions]"
 *                    {ResultPartitionID@7418} "6b3e5e999219f9532114514c4bdbb773#2@aecbd0682c0973976efe563eca747cc0" -> {PipelinedResultPartition@7419} "PipelinedResultPartition 6b3e5e999219f9532114514c4bdbb773#2@aecbd0682c0973976efe563eca747cc0 [PIPELINED_BOUNDED, 1 subpartitions, 1 pending consumptions]"
 *                    {ResultPartitionID@7420} "e07667949eeb5fe115288459d1d137f1#1@0ef8b3d70af60be8633af8af4e1c0698" -> {PipelinedResultPartition@7421} "PipelinedResultPartition e07667949eeb5fe115288459d1d137f1#1@0ef8b3d70af60be8633af8af4e1c0698 [PIPELINED_BOUNDED, 4 subpartitions, 4 pending consumptions]"
 *                    {ResultPartitionID@7422} "e07667949eeb5fe115288459d1d137f1#2@5e3aaeed65818bcfeb1485d0fd22d1ac" -> {PipelinedResultPartition@7423} "PipelinedResultPartition e07667949eeb5fe115288459d1d137f1#2@5e3aaeed65818bcfeb1485d0fd22d1ac [PIPELINED_BOUNDED, 4 subpartitions, 4 pending consumptions]"
 *                    {ResultPartitionID@7424} "e07667949eeb5fe115288459d1d137f1#3@30e457019371f01a403bd06cf3041eeb" -> {PipelinedResultPartition@7425} "PipelinedResultPartition e07667949eeb5fe115288459d1d137f1#3@30e457019371f01a403bd06cf3041eeb [PIPELINED_BOUNDED, 4 subpartitions, 4 pending consumptions]"
 *                    {ResultPartitionID@7426} "e07667949eeb5fe115288459d1d137f1#0@bfbc34d8314d506a39528d9c86f16859" -> {PipelinedResultPartition@7427} "PipelinedResultPartition e07667949eeb5fe115288459d1d137f1#0@bfbc34d8314d506a39528d9c86f16859 [PIPELINED_BOUNDED, 4 subpartitions, 4 pending consumptions]"
 *                    {ResultPartitionID@7428} "6b3e5e999219f9532114514c4bdbb773#1@bcf3be98463b672ea899cee1290423a2" -> {PipelinedResultPartition@7429} "PipelinedResultPartition 6b3e5e999219f9532114514c4bdbb773#1@bcf3be98463b672ea899cee1290423a2 [PIPELINED_BOUNDED, 1 subpartitions, 1 pending consumptions]"
 *                    {ResultPartitionID@7379} "5eba1007ad48ad2243891e1eff29c32b#0@db0c587a67c31a83cff5fd8be9496e5d" -> {PipelinedResultPartition@7371} "PipelinedResultPartition 5eba1007ad48ad2243891e1eff29c32b#0@db0c587a67c31a83cff5fd8be9496e5d [PIPELINED_BOUNDED, 4 subpartitions, 4 pending consumptions]"
 */
private final ResultPartitionManager partitionManager;

/**
 * Task event dispatcher for backwards events,
 * e.g. taskEventPublisher = {TaskEventDispatcher@6289}
 */
private final TaskEventPublisher taskEventPublisher;

/** The consumed subpartition. */
@Nullable private volatile ResultSubpartitionView subpartitionView;

private volatile boolean isReleased;

private final ChannelStatePersister channelStatePersister;

5.2. BufferAvailabilityListener

LocalInputChannel implements the BufferAvailabilityListener interface.

This interface has two methods:

- void notifyDataAvailable() — called when the ResultSubpartition has data, notifying the InputChannel that data has arrived and can be consumed.
- void notifyPriorityEvent(int prioritySequenceNumber) — called when the first priority event is added to the head of the buffer queue.

The most important one is notifyDataAvailable. Because Flink's stream processing values timeliness, it uses a push model: when data arrives at the ResultSubpartition, notifyDataAvailable is called to tell the InputChannel that data has arrived and can be consumed. A minimal sketch of this callback follows.
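The sketch uses a simplified stand-in for the Flink interface; the real LocalInputChannel enqueues itself at its SingleInputGate via notifyChannelNonEmpty.

// Simplified stand-in for Flink's BufferAvailabilityListener.
interface BufferAvailabilityListener {
    void notifyDataAvailable();
    default void notifyPriorityEvent(int prioritySequenceNumber) {}
}

// Sketch of a consumer-side channel reacting to the producer's push callback.
class SketchLocalChannel implements BufferAvailabilityListener {
    private final Runnable notifyGateNonEmpty; // e.g. () -> inputGate.notifyChannelNonEmpty(this)

    SketchLocalChannel(Runnable notifyGateNonEmpty) {
        this.notifyGateNonEmpty = notifyGateNonEmpty;
    }

    @Override
    public void notifyDataAvailable() {
        // Push model: the producer signals that data exists; this channel enqueues
        // itself at the gate so the task thread picks it up without busy polling.
        notifyGateNonEmpty.run();
    }
}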

5.3. ChannelStatePersister

Checkpoint persistence is delegated to the ChannelStatePersister. The key methods are public void checkpointStarted(CheckpointBarrier barrier) and public void checkpointStopped(long checkpointId).

5.3.1. CheckpointBarrier

Checkpoint barriers are used to align checkpoints throughout the streaming topology.
Sources emit barriers when instructed to by the JobManager.
When an operator receives a CheckpointBarrier on one of its inputs, it knows that this is the point between pre-checkpoint and post-checkpoint data.
Once an operator has received the checkpoint barrier from all of its input channels, it knows that a certain checkpoint is complete; it can trigger the operator-specific checkpoint behavior and broadcast the barrier to downstream operators.

Depending on the semantic guarantees, post-checkpoint data may be delayed until the checkpoint completes (exactly once).

Checkpoint barrier IDs are strictly monotonically increasing.

CheckpointBarrier has only three fields:

private final long id;
private final long timestamp;
private final CheckpointOptions checkpointOptions;

So far, CheckpointBarrier itself does not carry read/write methods; all behavior is controlled by the CheckpointOptions parameter.
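To make the alignment rule described above concrete, here is a minimal sketch (not Flink's actual barrier handler) that marks a checkpoint complete once a barrier with the same id has arrived on every input channel:

// Minimal barrier-alignment sketch; barrier ids are strictly increasing.
class BarrierTracker {
    private final boolean[] barrierSeen; // one flag per input channel
    private int seenCount;
    private long currentCheckpointId = -1;

    BarrierTracker(int numChannels) {
        this.barrierSeen = new boolean[numChannels];
    }

    /** Returns true once the barrier has been received on all channels. */
    boolean onBarrier(long checkpointId, int channelIndex) {
        if (checkpointId > currentCheckpointId) {
            // a newer checkpoint starts: reset the alignment state
            currentCheckpointId = checkpointId;
            java.util.Arrays.fill(barrierSeen, false);
            seenCount = 0;
        }
        if (checkpointId == currentCheckpointId && !barrierSeen[channelIndex]) {
            barrierSeen[channelIndex] = true;
            seenCount++;
        }
        return seenCount == barrierSeen.length;
    }
}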

5.3.2. CheckpointOptions fields

CheckpointOptions defines properties such as isExactlyOnceMode (exactly-once mode) and the CheckpointType.

/** Type of the checkpoint. */
private final CheckpointType checkpointType;

/** Target location for the checkpoint. */
private final CheckpointStorageLocationReference targetLocation;

// whether exactly-once mode is used
private final boolean isExactlyOnceMode;

// whether this is an unaligned checkpoint
private final boolean isUnalignedCheckpoint;

// alignment timeout
private final long alignmentTimeout;

5.3.3. CheckpointType

CheckpointType is an enum of checkpoint types.

It has four values:

- CHECKPOINT(false, PostCheckpointAction.NONE, "Checkpoint") — a regular checkpoint.
- SAVEPOINT(true, PostCheckpointAction.NONE, "Savepoint") — a savepoint.
- SAVEPOINT_SUSPEND(true, PostCheckpointAction.SUSPEND, "Suspend Savepoint") — a savepoint that suspends the job.
- SAVEPOINT_TERMINATE(true, PostCheckpointAction.TERMINATE, "Terminate Savepoint") — a savepoint that terminates the job.

Each CheckpointType value boils down to three fields:

// whether this is a savepoint
private final boolean isSavepoint;
// the action to take after the checkpoint
private final PostCheckpointAction postCheckpointAction;
// the name of this CheckpointType
private final String name;

5.3.4. CheckpointStatus

CheckpointStatus has three states:

private enum CheckpointStatus {
    // completed
    COMPLETED,
    // a barrier is pending
    BARRIER_PENDING,
    // the barrier has been received
    BARRIER_RECEIVED
}

5.3.5. checkpointStarted

Starts the checkpoint operation:

protected void startPersisting(long barrierId, List<Buffer> knownBuffers)
        throws CheckpointException {
    logEvent("startPersisting", barrierId);

    // Fail if a barrier for a newer checkpoint has already been received, i.e. the status is BARRIER_RECEIVED and lastSeenBarrier > barrierId
    if (checkpointStatus == CheckpointStatus.BARRIER_RECEIVED && lastSeenBarrier > barrierId) {
        throw new CheckpointException(
                String.format(
                        "Barrier for newer checkpoint %d has already been received compared to the requested checkpoint %d",
                        lastSeenBarrier, barrierId),
                CheckpointFailureReason
                        .CHECKPOINT_SUBSUMED); // currently, at most one active unaligned
    }

    if (lastSeenBarrier < barrierId) {

        // Regardless of the current checkpointStatus, if we are notified about a more recent
        // checkpoint then we have seen so far, always mark that this more recent barrier is
        // pending.
        // BARRIER_RECEIVED status can happen if we have seen an older barrier, that probably
        // has not yet been processed by the task, but task is now notifying us that checkpoint
        // has started for even newer checkpoint. We should spill the knownBuffers and mark that
        // we are waiting for that newer barrier to arrive
        checkpointStatus = CheckpointStatus.BARRIER_PENDING;
        lastSeenBarrier = barrierId;
    }
    if (knownBuffers.size() > 0) {
        channelStateWriter.addInputData(
                barrierId,
                channelInfo,
                ChannelStateWriter.SEQUENCE_NUMBER_UNKNOWN,
                CloseableIterator.fromList(knownBuffers, Buffer::recycleBuffer));
    }
}

5.3.6. checkpointStopped

Stops the checkpoint operation:

   protected void stopPersisting(long id) {
        logEvent("stopPersisting", id);
        if (id >= lastSeenBarrier) {
            checkpointStatus = CheckpointStatus.COMPLETED;
            lastSeenBarrier = id;
        }
    }

5.4. requestSubpartition

Requests consumption of the corresponding subpartition:

// request consumption of the corresponding subpartition
@Override
protected void requestSubpartition(int subpartitionIndex) throws IOException {

    boolean retriggerRequest = false;
    boolean notifyDataAvailable = false;

    // The lock is required to request only once in the presence of retriggered requests.
    synchronized (requestLock) {
        checkState(!isReleased, "LocalInputChannel has been released already");

        if (subpartitionView == null) {
            LOG.debug(
                    "{}: Requesting LOCAL subpartition {} of partition {}. {}",
                    this,
                    subpartitionIndex,
                    partitionId,
                    channelStatePersister);

            try {
                // LOCAL: no network communication is needed; a ResultSubpartitionView is
                // created via the ResultPartitionManager. LocalInputChannel implements
                // BufferAvailabilityListener: when data is available, notifyDataAvailable is
                // invoked, adding this channel to the InputGate's queue of available channels
                ResultSubpartitionView subpartitionView =
                        partitionManager.createSubpartitionView(
                                partitionId, subpartitionIndex, this);

                if (subpartitionView == null) {
                    throw new IOException("Error requesting subpartition.");
                }

                // make the subpartition view visible
                this.subpartitionView = subpartitionView;

                // check if the channel was released in the meantime
                if (isReleased) {
                    subpartitionView.releaseAllResources();
                    this.subpartitionView = null;
                } else {
                    notifyDataAvailable = true;
                }
            } catch (PartitionNotFoundException notFound) {
                if (increaseBackoff()) {
                    retriggerRequest = true;
                } else {
                    throw notFound;
                }
            }
        }
    }

    if (notifyDataAvailable) {
        notifyDataAvailable();
    }

    // Do this outside of the lock scope as this might lead to a
    // deadlock with a concurrent release of the channel via the
    // input gate.
    if (retriggerRequest) {
        inputGate.retriggerPartitionRequest(partitionId.getPartitionId());
    }
}

5.5. ResultSubpartitionView

The following code in the requestSubpartition method builds a ResultSubpartitionView, which is used to read the data of the ResultSubpartition.

5.5.1. Construction

// LOCAL: no network communication is needed; a ResultSubpartitionView is created via the
// ResultPartitionManager. LocalInputChannel implements BufferAvailabilityListener:
// when data is available, notifyDataAvailable is invoked,
// adding this channel to the InputGate's queue of available channels
ResultSubpartitionView subpartitionView =
partitionManager.createSubpartitionView(
        partitionId, subpartitionIndex, this);

The creation call chain:

ResultPartitionManager#createSubpartitionView
-->    ResultPartition#createSubpartitionView
    -->    BufferWritingResultPartition#createSubpartitionView
        -->    PipelinedSubpartition#createReadView
            -->    readView = new PipelinedSubpartitionView(this, availabilityListener)

5.5.2. Methods

ResultSubpartitionView is an interface defining a series of methods:

- BufferAndBacklog getNextBuffer() — gets the next {@link Buffer} instance from the queue.
- void notifyDataAvailable() — notifies that the ResultSubpartition has data available for consumption.
- default void notifyPriorityEvent(int priorityBufferNumber) — notifies that a priority event has been added to the ResultSubpartition.
- void releaseAllResources() throws IOException — releases all resources.
- boolean isReleased() — whether the resources have been released.
- void resumeConsumption() — resumes consumption.
- Throwable getFailureCause() — gets the failure cause.
- boolean isAvailable(int numCreditsAvailable) — whether the view is available, given the number of credits.
- int unsynchronizedGetNumberOfQueuedBuffers() — gets the number of queued buffers without synchronization.

5.6. PipelinedSubpartitionView

PipelinedSubpartitionView is an implementation of ResultSubpartitionView.

5.6.1. Fields

/** The subpartition this view belongs to. */
private final PipelinedSubpartition parent;

/**
 * When data is available, notifies the LocalInputChannel (or, for a RemoteInputChannel, the
 * CreditBasedSequenceNumberingViewReader) through this listener that data has arrived and can
 * be consumed.
 */
private final BufferAvailabilityListener availabilityListener;

/** Flag indicating whether this view has been released. */
final AtomicBoolean isReleased;

5.6.2. Methods


The methods essentially delegate to the owning PipelinedSubpartition to handle the data:

 @Nullable
    @Override
    public BufferAndBacklog getNextBuffer() {
        return parent.pollBuffer();
    }

    @Override
    public void notifyDataAvailable() {
        // callback notifying the input channel that data has arrived
        availabilityListener.notifyDataAvailable();
    }

    @Override
    public void notifyPriorityEvent(int priorityBufferNumber) {
        // callback notifying the input channel that a priority event has arrived
        availabilityListener.notifyPriorityEvent(priorityBufferNumber);
    }

    // release all resources
    @Override
    public void releaseAllResources() {
        if (isReleased.compareAndSet(false, true)) {
            // The view doesn't hold any resources and the parent cannot be restarted. Therefore,
            // it's OK to notify about consumption as well.
            parent.onConsumedSubpartition();
        }
    }

    @Override
    public boolean isReleased() {
        return isReleased.get() || parent.isReleased();
    }

    @Override
    public void resumeConsumption() {
        parent.resumeConsumption();
    }

    @Override
    public boolean isAvailable(int numCreditsAvailable) {
        return parent.isAvailable(numCreditsAvailable);
    }

    @Override
    public Throwable getFailureCause() {
        return parent.getFailureCause();
    }

    @Override
    public int unsynchronizedGetNumberOfQueuedBuffers() {
        return parent.unsynchronizedGetNumberOfQueuedBuffers();
    }

5.7. retriggerSubpartitionRequest

  /** Retriggers a subpartition request. */
    void retriggerSubpartitionRequest(Timer timer, final int subpartitionIndex) {
        synchronized (requestLock) {
            checkState(subpartitionView == null, "already requested partition");

            timer.schedule(
                    new TimerTask() {
                        @Override
                        public void run() {
                            try {
                                requestSubpartition(subpartitionIndex);
                            } catch (Throwable t) {
                                setError(t);
                            }
                        }
                    },
                    getCurrentBackoff());
        }
    }

5.8. notifyDataAvailable

Callback invoked when the ResultSubpartition notifies the ResultSubpartitionView that data is available for consumption:

// callback: the ResultSubpartition notifies the ResultSubpartitionView that data is available
@Override
public void notifyDataAvailable() {
    // the LocalInputChannel notifies the InputGate
    notifyChannelNonEmpty();
}

5.9. resumeConsumption

@Override
public void resumeConsumption() {
    checkState(!isReleased, "Channel released.");

    subpartitionView.resumeConsumption();

    if (subpartitionView.isAvailable(Integer.MAX_VALUE)) {
        notifyChannelNonEmpty();
    }
}

5.10. sendTaskEvent

@Override
void sendTaskEvent(TaskEvent event) throws IOException {
    checkError();
    checkState(
            subpartitionView != null,
            "Tried to send task event to producer before requesting the subpartition.");

    // dispatch the event
    if (!taskEventPublisher.publish(partitionId, event)) {
        throw new IOException(
                "Error while publishing event "
                        + event
                        + " to producer. The producer could not be found.");
    }
}

6. RemoteInputChannel

The data receiving side mainly refers to RemoteInputChannel, the channel used to receive and process data read from other nodes.
Its counterpart is LocalInputChannel, which reads a local subpartition and needs no receive-side buffering.

When a RemoteInputChannel requests a remote ResultSubpartition, it creates a PartitionRequestClient and sends a PartitionRequest via Netty,
carrying the current InputChannel's id and the initial credit:

CreditBasedPartitionRequestClientHandler reads data off the network and hands it to the RemoteInputChannel, which enqueues the received buffers and, based on the producer's backlog, requests floating buffers.


6.1. Fields

private static final int NONE = -1;

/** ID to distinguish this channel from other channels sharing the same TCP connection. */
private final InputChannelID id = new InputChannelID();

/** The connection to use to request the remote partition. */
private final ConnectionID connectionId;

/** The connection manager to use connect to the remote partition provider. */
private final ConnectionManager connectionManager;

/**
 * The received buffers. Received buffers are enqueued by the network I/O thread and the queue
 * is consumed by the receiving task thread.
 */
private final PrioritizedDeque<SequenceBuffer> receivedBuffers = new PrioritizedDeque<>();

/**
 * Flag indicating whether this channel has been released. Either called by the receiving task
 * thread or the task manager actor.
 */
private final AtomicBoolean isReleased = new AtomicBoolean();

/**
 * Client to establish a (possibly shared) TCP connection and request the partition.
 *
 * When requesting a remote ResultSubpartition, a PartitionRequestClient is created and a
 * PartitionRequest is sent via Netty, carrying this InputChannel's id and the initial credit.
 */
private volatile PartitionRequestClient partitionRequestClient;

/** The next expected sequence number for the next buffer. */
private int expectedSequenceNumber = 0;

/** The initial number of exclusive buffers assigned to this channel (the initial credit). */
private final int initialCredit;

/** The number of available buffers that have not been announced to the producer yet. */
private final AtomicInteger unannouncedCredit = new AtomicInteger(0);

private final BufferManager bufferManager;

@GuardedBy("receivedBuffers")
private int lastBarrierSequenceNumber = NONE;

@GuardedBy("receivedBuffers")
private long lastBarrierId = NONE;

private final ChannelStatePersister channelStatePersister;

6.2. SequenceBuffer

Among the fields is the receivedBuffers queue, which stores the received SequenceBuffers:

/**
 * The received buffers. Received buffers are enqueued by the network I/O thread and the queue
 * is consumed by the receiving task thread.
 */
private final PrioritizedDeque<SequenceBuffer> receivedBuffers = new PrioritizedDeque<>();

SequenceBuffer is just a wrapper holding a Buffer and its sequenceNumber:

private static final class SequenceBuffer {
        final Buffer buffer;
        final int sequenceNumber;

        private SequenceBuffer(Buffer buffer, int sequenceNumber) {
            this.buffer = buffer;
            this.sequenceNumber = sequenceNumber;
        }
    }

6.3. PartitionRequestClient

When a RemoteInputChannel requests a remote ResultSubpartition,
it creates a PartitionRequestClient and sends a PartitionRequest via Netty, carrying the current InputChannel's id and the initial credit.


6.4. Constructor

Mostly simple assignments. Worth noting is the BufferManager:
each RemoteInputChannel has exactly one BufferManager for memory management,
but the ultimate owner of the memory, the MemorySegmentProvider (the globalPool), is still the NetworkBufferPool.

   public RemoteInputChannel(
            SingleInputGate inputGate,
            int channelIndex,
            ResultPartitionID partitionId,
            ConnectionID connectionId,
            ConnectionManager connectionManager,
            int initialBackOff,
            int maxBackoff,
            int networkBuffersPerChannel,
            Counter numBytesIn,
            Counter numBuffersIn,
            ChannelStateWriter stateWriter) {

        super(
                inputGate,
                channelIndex,
                partitionId,
                initialBackOff,
                maxBackoff,
                numBytesIn,
                numBuffersIn);

        this.initialCredit = networkBuffersPerChannel;
        this.connectionId = checkNotNull(connectionId);
        this.connectionManager = checkNotNull(connectionManager);

        // Build the BufferManager: each RemoteInputChannel has exactly one BufferManager for
        // memory management, but the memory itself is still owned by the NetworkBufferPool
        // acting as the MemorySegmentProvider (globalPool).
        this.bufferManager = new BufferManager(inputGate.getMemorySegmentProvider(), this, 0);
        this.channelStatePersister = new ChannelStatePersister(stateWriter, getChannelInfo());
    }

6.5. setup

/**
 * Setup includes assigning exclusive buffers to this input channel, and this method should be
 * called only once after this input channel is created.
 */
@Override
void setup() throws IOException {
    checkState(
            bufferManager.unsynchronizedGetAvailableExclusiveBuffers() == 0,
            "Bug in input channel setup logic: exclusive buffers have already been set for this input channel.");

    // request MemorySegments according to the initial credit
    bufferManager.requestExclusiveBuffers(initialCredit);
}

6.6. requestSubpartition

Requests the subpartition data:

/** Requests a remote subpartition. */
@VisibleForTesting
@Override
public void requestSubpartition(int subpartitionIndex)
        throws IOException, InterruptedException {

    if (partitionRequestClient == null) {
        LOG.debug(
                "{}: Requesting REMOTE subpartition {} of partition {}. {}",
                this,
                subpartitionIndex,
                partitionId,
                channelStatePersister);
        // Create a client and request the partition
        try {


            // Build a client and request the partition.
            // REMOTE: network communication is required; the connection is established via the
            // ConnectionManager, which creates the PartitionRequestClient used to issue the request.
            partitionRequestClient =
                    connectionManager.createPartitionRequestClient(connectionId);
        } catch (IOException e) {
            // IOExceptions indicate that we could not open a connection to the remote
            // TaskExecutor
            throw new PartitionConnectionException(partitionId, e);
        }

        // request the subpartition; the request is sent via Netty
        partitionRequestClient.requestSubpartition(partitionId, subpartitionIndex, this, 0);
    }
}

6.7. getNextBuffer

Fetches a buffer from the receivedBuffers queue:

    @Override
    Optional<BufferAndAvailability> getNextBuffer() throws IOException {
        checkPartitionRequestQueueInitialized();

        final SequenceBuffer next;
        final DataType nextDataType;

        // take the next buffer from the receivedBuffers queue
        synchronized (receivedBuffers) {
            next = receivedBuffers.poll();
            nextDataType =
                    receivedBuffers.peek() != null
                            ? receivedBuffers.peek().buffer.getDataType()
                            : DataType.NONE;
        }

        if (next == null) {
            if (isReleased.get()) {
                throw new CancelTaskException(
                        "Queried for a buffer after channel has been released.");
            }
            return Optional.empty();
        }

        numBytesIn.inc(next.buffer.getSize());
        numBuffersIn.inc();
        return Optional.of(
                new BufferAndAvailability(next.buffer, nextDataType, 0, next.sequenceNumber));
    }

6.8. sendTaskEvent

@Override
void sendTaskEvent(TaskEvent event) throws IOException {
    checkState(
            !isReleased.get(),
            "Tried to send task event to producer after channel has been released.");
    checkPartitionRequestQueueInitialized();

    partitionRequestClient.sendTaskEvent(partitionId, event, this);
}

6.9. notifyCreditAvailable

Notifies the producer that this channel has new credit:

/**
 * Enqueue this input channel in the pipeline for notifying the producer of unannounced credit.
 */
private void notifyCreditAvailable() throws IOException {
    checkPartitionRequestQueueInitialized();
    // notify the producer that this channel has new credit
    partitionRequestClient.notifyCreditAvailable(this);
}

6.10. requestBuffer

 /**
     * Takes a buffer from the bufferQueue and returns it.
     *
     * Requests buffer from input channel directly for receiving network data. It should always
     * return an available buffer in credit-based mode unless the channel has been released.
     *
     * @return The available buffer.
     */
    @Nullable
    public Buffer requestBuffer() {
        return bufferManager.requestBuffer();
    }

6.11. onSenderBacklog

The backlog is the number of buffers accumulated on the sender side. If the bufferQueue does not hold enough buffers, floating buffers must be requested from the LocalBufferPool;
after new buffers have been requested, the producer is notified of the available credit.
Memory is allocated ahead of time based on the backlog: if the backlog plus the initial credit exceeds the number of available buffers, floating buffers need to be allocated.
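As a worked example (hypothetical numbers): with initialCredit = 2 and a reported backlog of 5, the channel wants backlog + initialCredit = 7 buffers at hand; if only 4 are currently available, it requests roughly 3 floating buffers from the LocalBufferPool and then announces the newly gained credit to the producer.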

/**
 * Receives the backlog from the producer's buffer response. If the number of available buffers
 * is less than backlog + initialCredit, it will request floating buffers from the buffer
 * manager, and then notify unannounced credits to the producer.
 *
 * @param backlog The number of unsent buffers in the producer's sub partition.
 */
void onSenderBacklog(int backlog) throws IOException {

    // request floating buffers
    int numRequestedBuffers = bufferManager.requestFloatingBuffers(backlog + initialCredit);

    if (numRequestedBuffers > 0 && unannouncedCredit.getAndAdd(numRequestedBuffers) == 0) {

        notifyCreditAvailable();
    }
}

6.12. onBuffer

Invoked when a Buffer sent by the remote ResultSubpartition is received;
called from CreditBasedPartitionRequestClientHandler#decodeBufferOrEvent.

/**
 * Receives a buffer sent by the remote ResultSubpartition.
 * @param buffer
 * @param sequenceNumber
 * @param backlog
 * @throws IOException
 */
public void onBuffer(Buffer buffer, int sequenceNumber, int backlog) throws IOException {

    // whether this buffer still needs to be recycled
    boolean recycleBuffer = true;

    try {

        // the sequence number must match the expected one
        if (expectedSequenceNumber != sequenceNumber) {
            onError(new BufferReorderingException(expectedSequenceNumber, sequenceNumber));
            return;
        }

        final boolean wasEmpty;
        boolean firstPriorityEvent = false;

        synchronized (receivedBuffers) {

            NetworkActionsLogger.traceInput(
                    "RemoteInputChannel#onBuffer",
                    buffer,
                    inputGate.getOwningTaskName(),
                    channelInfo,
                    channelStatePersister,
                    sequenceNumber);
            // Similar to notifyBufferAvailable(), make sure that we never add a buffer
            // after releaseAllResources() released all buffers from receivedBuffers
            // (see above for details).
            if (isReleased.get()) {
                return;
            }

            // check whether the queue was empty before adding this buffer
            wasEmpty = receivedBuffers.isEmpty();

            SequenceBuffer sequenceBuffer = new SequenceBuffer(buffer, sequenceNumber);
            DataType dataType = buffer.getDataType();
            if (dataType.hasPriority()) {
                firstPriorityEvent = addPriorityBuffer(sequenceBuffer);
            } else {
                //  enqueue the filled buffer into receivedBuffers; downstream operators read
                //  the channel's buffered data via implementations of StreamTaskNetworkInput
                //  (e.g. StreamOneInputProcessor)
                receivedBuffers.add(sequenceBuffer);
                channelStatePersister.maybePersist(buffer);
                if (dataType.requiresAnnouncement()) {
                    firstPriorityEvent = addPriorityBuffer(announce(sequenceBuffer));
                }
            }

            // advance the expected sequence number
            ++expectedSequenceNumber;
        }
        // the buffer has been handed over; it must not be recycled here
        recycleBuffer = false;


        if (firstPriorityEvent) {

            notifyPriorityEvent(sequenceNumber);
        }
        // if the queue was empty before the add, notify the inputGate that data is now available
        if (wasEmpty) {
            // notify the InputGate that this channel has new data
            notifyChannelNonEmpty();
        }

        if (backlog >= 0) {
            // request floating buffers based on the sender's backlog
            onSenderBacklog(backlog);
        }
    } finally {
        // recycle the buffer
        if (recycleBuffer) {
            buffer.recycleBuffer();
        }
    }
}