Understanding Flink Task Data Exchange: The Data Read Path (Source Code Walkthrough)
This article analyzes the data read path on the Reduce side and the Reduce-side data model. For now it only covers how the Reduce-side task processing thread reads data; the network exchange and data requests to upstream tasks are out of scope. Data exchange between Tasks is mainly credit-based Netty network communication, which will be analyzed in a later article.
1. OneInputStreamTask
Let's start from the downstream task type and its execution flow. The downstream task type is mainly OneInputStreamTask. Executing the task means calling its run() method, which simply calls inputProcessor.processInput() in a loop to process data.
// OneInputStreamTask
protected void run() throws Exception {
// cache processor reference on the stack, to make the code more JIT friendly
final StreamInputProcessor<IN> inputProcessor = this.inputProcessor;
while (running && inputProcessor.processInput()) {
// all the work happens in the "processInput" method
}
}
// StreamTask (the mailbox-based processing loop in newer Flink versions)
protected void processInput(MailboxDefaultAction.Controller controller) throws Exception {
// this calls inputProcessor.processInput
InputStatus status = inputProcessor.processInput();
if (status == InputStatus.MORE_AVAILABLE && recordWriter.isAvailable()) {
return;
}
if (status == InputStatus.END_OF_INPUT) {
controller.allActionsCompleted();
return;
}
CompletableFuture<?> jointFuture = getInputOutputJointFuture(status);
MailboxDefaultAction.Suspension suspendedDefaultAction = controller.suspendDefaultAction();
jointFuture.thenRun(suspendedDefaultAction::resume);
}
Inside inputProcessor.processInput(), the data-fetching logic is: get a buffer from the barrierHandler, then pull records out of that buffer one by one and process them; once the buffer is fully consumed, fetch the next buffer from the barrierHandler, and so on in a loop.
// StreamOneInputProcessor
@Override
public InputStatus processInput() throws Exception {
// this calls the input.emitNext method
InputStatus status = input.emitNext(output);
if (status == InputStatus.END_OF_INPUT) {
operatorChain.endHeadOperatorInput(1);
}
return status;
}
// StreamTaskNetworkInput
@Override
public InputStatus emitNext(DataOutput<T> output) throws Exception {
while (true) {
// get the stream element from the deserializer
if (currentRecordDeserializer != null) {
// first deserialize a record from the buffer fetched earlier
DeserializationResult result = currentRecordDeserializer.getNextRecord(deserializationDelegate);
if (result.isBufferConsumed()) {
currentRecordDeserializer.getCurrentBuffer().recycleBuffer();
currentRecordDeserializer = null;
}
if (result.isFullRecord()) {
// process the record here
processElement(deserializationDelegate.getInstance(), output);
return InputStatus.MORE_AVAILABLE;
}
}
// if the previous buffer has been fully consumed, fetch a new one from the barrier handler
Optional<BufferOrEvent> bufferOrEvent = checkpointedInputGate.pollNext();
if (bufferOrEvent.isPresent()) {
// return to the mailbox after receiving a checkpoint barrier to avoid processing of
// data after the barrier before checkpoint is performed for unaligned checkpoint mode
if (bufferOrEvent.get().isEvent() && bufferOrEvent.get().getEvent() instanceof CheckpointBarrier) {
return InputStatus.MORE_AVAILABLE;
}
processBufferOrEvent(bufferOrEvent.get());
} else {
if (checkpointedInputGate.isFinished()) {
checkState(checkpointedInputGate.getAvailableFuture().isDone(), "Finished BarrierHandler should be available");
return InputStatus.END_OF_INPUT;
}
return InputStatus.NOTHING_AVAILABLE;
}
}
}
private void processElement(StreamElement recordOrMark, DataOutput<T> output) throws Exception {
if (recordOrMark.isRecord()) {
// the actual record processing
output.emitRecord(recordOrMark.asRecord());
} else if (recordOrMark.isWatermark()) {
// handle Watermark elements
statusWatermarkValve.inputWatermark(recordOrMark.asWatermark(), lastChannel);
} else if (recordOrMark.isLatencyMarker()) {
output.emitLatencyMarker(recordOrMark.asLatencyMarker());
} else if (recordOrMark.isStreamStatus()) {
statusWatermarkValve.inputStreamStatus(recordOrMark.asStreamStatus(), lastChannel);
} else {
throw new UnsupportedOperationException("Unknown type of StreamElement");
}
}
2. CheckpointBarrierHandler
Now we need to look deeper into how the barrierHandler fetches data. We already met this component when analyzing checkpoints: the barrierHandler is the barrier processor, and what it processes are checkpoint barriers. Depending on the checkpoint mode, a different barrierHandler is created: for EXACTLY_ONCE a BarrierBuffer is created, which performs barrier alignment; for AT_LEAST_ONCE a BarrierTracker is created, which does not align barriers. If no checkpointing is configured in the program, the default is also AT_LEAST_ONCE, so the barrierHandler is a BarrierTracker instance. We won't go into the details of checkpoint handling here; we only care about how data buffers are fetched.
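Before that, here is a minimal sketch of the mode-to-handler selection just described (the shape loosely follows InputProcessorUtil.createCheckpointBarrierHandler in Flink 1.8; constructor details are simplified and createBufferBlocker is a hypothetical stand-in for the real buffer-caching setup):
// Hedged sketch: how the checkpoint mode picks the barrier handler type
import org.apache.flink.runtime.io.network.partition.consumer.InputGate;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.runtime.io.BarrierBuffer;
import org.apache.flink.streaming.runtime.io.BarrierTracker;
import org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler;
public static CheckpointBarrierHandler createBarrierHandler(CheckpointingMode mode, InputGate inputGate) throws java.io.IOException {
    switch (mode) {
        case EXACTLY_ONCE:
            // aligns barriers: a channel whose barrier has already arrived is blocked,
            // and its buffers are cached, until the alignment completes
            return new BarrierBuffer(inputGate, createBufferBlocker(inputGate)); // createBufferBlocker is hypothetical
        case AT_LEAST_ONCE:
            // only counts barriers per checkpoint id, never blocks a channel
            return new BarrierTracker(inputGate);
        default:
            throw new UnsupportedOperationException("Unrecognized CheckpointingMode: " + mode);
    }
}
With the handler type settled, let's see how each one fetches buffers.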
// BarrierBuffer
public BufferOrEvent getNextNonBlocked() throws Exception {
while (true) {
// process buffered BufferOrEvents before grabbing new ones
Optional<BufferOrEvent> next;
//currentBuffered holds the buffers of blocked channels that were cached while aligning checkpoint barriers
//currentBuffered becomes non-null right after a checkpoint completes, when the buffers in bufferBlocker are moved into currentBuffered
if (currentBuffered == null) {
//in the common case, fetch a buffer from the inputGate
next = inputGate.getNextBufferOrEvent();
}
else {
next = Optional.ofNullable(currentBuffered.getNext());
...
}
...
BufferOrEvent bufferOrEvent = next.get();
if (isBlocked(bufferOrEvent.getChannelIndex())) {
// if the channel is blocked, we just store the BufferOrEvent
//if a channel is blocked (its checkpoint barrier has already arrived), cache the BufferOrEvent in bufferBlocker; it is not put into currentBuffered yet
bufferBlocker.add(bufferOrEvent);
checkSizeLimit();
}
else if (bufferOrEvent.isBuffer()) {
return bufferOrEvent;
}
...//other checkpoint event handling
}
}
// BarrierTracker
public BufferOrEvent getNextNonBlocked() throws Exception {
while (true) {
//much simpler: fetch directly from the inputGate
Optional<BufferOrEvent> next = inputGate.getNextBufferOrEvent();
if (!next.isPresent()) {
// buffer or input exhausted
return null;
}
BufferOrEvent bufferOrEvent = next.get();
if (bufferOrEvent.isBuffer()) {
return bufferOrEvent;
}
...//other checkpoint events
}
}
As the code above shows, both BarrierBuffer and BarrierTracker fetch their buffers from the inputGate. This brings us to the most important model of the Reduce-side input path: the InputGate. Before analyzing inputGate.getNextBufferOrEvent(), let's first look at the InputGate data structures.
3. SingleInputGate
There are two implementations of InputGate: SingleInputGate and UnionInputGate. The common one is SingleInputGate; a UnionInputGate joins multiple SingleInputGates together, e.g. a join operator, which takes input from two streams, reads through a UnionInputGate. Here it is enough to analyze SingleInputGate.
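As a quick user-level illustration (a minimal sketch using the standard DataStream API): when a downstream operator consumes the union of two upstream streams, its task reads the two upstream intermediate results through one logical input, which the runtime wires up as a UnionInputGate over multiple SingleInputGates.
// Minimal example: the map task below consumes two upstream results through one union-ed input
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public class UnionGateExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> a = env.generateSequence(0, 1000);
        DataStream<Long> b = env.generateSequence(1001, 2000);
        // the downstream map reads from both upstream partitions
        a.union(b).map(v -> v * 2).print();
        env.execute("union-input-gate-example");
    }
}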
public class SingleInputGate implements InputGate {
/** The type of the partition the input gate is consuming. */
private final ResultPartitionType consumedPartitionType;
/**
* The index of the consumed subpartition of each consumed partition. This index depends on the
* {@link DistributionPattern} and the subtask indices of the producing and consuming task.
*/
private final int consumedSubpartitionIndex;
/** The number of input channels (equivalent to the number of consumed partitions). */
private final int numberOfInputChannels;
/**
* Input channels. There is one input channel for each consumed intermediate result partition.
* We store this in a map for runtime updates of single channels.
*/
private final Map<IntermediateResultPartitionID, InputChannel> inputChannels;
/** Channels, which notified this input gate about available data. */
private final ArrayDeque<InputChannel> inputChannelsWithData = new ArrayDeque<>();
/**
* Buffer pool for incoming buffers. Incoming data from remote channels is copied to buffers
* from this pool.
*/
private BufferPool bufferPool;
/** Global network buffer pool to request and recycle exclusive buffers (only for credit-based). */
private NetworkBufferPool networkBufferPool;
private final boolean isCreditBased;
/** Flag indicating whether partitions have been requested. */
private boolean requestedPartitionsFlag;
/** Number of network buffers to use for each remote input channel. */
private int networkBuffersPerChannel;
//other non-core members omitted
...
}
Key members:
consumedPartitionType: the ResultPartitionType, i.e. the type of data exchange: BLOCKING, PIPELINED, or PIPELINED_BOUNDED. The exact semantics were explained in 《Task数据交互之数据写》; in streaming jobs it is always PIPELINED_BOUNDED, a bounded pipelined mode in which the downstream consumes while the upstream produces, with a limited number of buffers caching the data in between.
consumedSubpartitionIndex: as described in 《Task数据交互之数据写》, each ResultPartition has multiple ResultSubPartitions, one per downstream task, and each ResultSubPartition is consumed by exactly one downstream task. consumedSubpartitionIndex identifies which upstream ResultSubPartition this downstream task consumes.
numberOfInputChannels: the number of InputChannels. An InputChannel is the data channel to one upstream task, so there is one InputChannel per upstream task; e.g. with 10 upstream Map tasks, every Reduce task has 10 InputChannels. If a map task and the reduce task run on the same node, the channel is a LocalInputChannel, otherwise a RemoteInputChannel. If the InputGate is the receiving counterpart of a ResultPartition, then an InputChannel is the counterpart of a ResultSubPartition.
inputChannels: all InputChannels belonging to this InputGate.
inputChannelsWithData: the InputChannels that have notified this gate of available data from upstream. If an upstream task produces no data for a long time, its channel will not sit in this queue.
bufferPool: a LocalBufferPool; InputChannels can take floating buffers from this pool to cache data arriving from upstream tasks.
isCreditBased: whether data transfer is credit-based; true by default.
networkBuffersPerChannel: the number of buffers used to receive data on each InputChannel, treated as the channel's exclusive buffers; by default each InputChannel caches incoming data with two exclusive buffers.
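Both of these sizes are configurable (a minimal sketch; the two keys below are the Flink 1.8-era option names, normally set in flink-conf.yaml rather than in code):
// Hedged sketch: the two network-buffer knobs discussed above
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
Configuration config = new Configuration();
// exclusive buffers per incoming channel (default 2)
config.setInteger("taskmanager.network.memory.buffers-per-channel", 2);
// floating buffers shared by all channels of one gate (default 8)
config.setInteger("taskmanager.network.memory.floating-buffers-per-gate", 8);
StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(4, config);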
So where is this SingleInputGate created?
When deploying a Task, the JobMaster builds a TaskDeploymentDescriptor (from the ExecutionGraph), which contains the InputGate description, an InputGateDeploymentDescriptor. After the JobMaster sends the TaskDeploymentDescriptor to the TaskManager, the TaskManager constructs the Task from it, and that is when the InputGates are created:
//Task constructor
public Task(
...
Collection<ResultPartitionDeploymentDescriptor> resultPartitionDeploymentDescriptors,
Collection<InputGateDeploymentDescriptor> inputGateDeploymentDescriptors,
...) {
...
counter = 0;
for (InputGateDeploymentDescriptor inputGateDeploymentDescriptor: inputGateDeploymentDescriptors) {
SingleInputGate gate = SingleInputGate.create(
taskNameWithSubtaskAndId,
jobId,
executionId,
inputGateDeploymentDescriptor,
networkEnvironment,
this,
metricGroup.getIOMetricGroup());
inputGates[counter] = gate;
inputGatesById.put(gate.getConsumedResultId(), gate);
++counter;
}
...
}
Within the structure of SingleInputGate, the most important members are inputChannels and bufferPool. The bufferPool is created much like on the Map side. In 《Task数据交互之数据写》 we saw that on the map side the maximum number of buffers in the LocalBufferPool = the number of downstream tasks * buffers per downstream task + extra buffers. On the reduce side, however, the maximum number of buffers in the LocalBufferPool = the extra buffers alone; by default 8 extra buffers are allocated, and they serve as the floating buffers. If a job has a large number of tasks, this parameter should be increased.
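A back-of-the-envelope sketch of that budget (my own illustration, assuming the defaults of 2 exclusive buffers per channel and 8 floating buffers per gate):
// Toy calculation: buffer budget of one reduce task with 100 upstream map tasks
int upstreamTasks = 100;        // number of InputChannels
int exclusivePerChannel = 2;    // networkBuffersPerChannel (default)
int floatingPerGate = 8;        // max buffers of the reduce-side LocalBufferPool (default)
// exclusive buffers come from the NetworkBufferPool, one batch per channel
int exclusiveTotal = upstreamTasks * exclusivePerChannel; // 200 buffers
// the 8 floating buffers are shared by all 100 channels of this gate,
// which is why this parameter should grow with the number of tasks
System.out.println("exclusive=" + exclusiveTotal + ", floating=" + floatingPerGate);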
4. InputChannel
Next, let's look at the InputChannel.
As noted above, each InputChannel corresponds to one upstream task. If a map task and the reduce task run on the same node, the channel is a LocalInputChannel, otherwise a RemoteInputChannel. In production, RemoteInputChannel is the common case, because every downstream task node has to exchange data with many upstream nodes. So let's look at RemoteInputChannel first.
4.1 RemoteInputChannel
public class RemoteInputChannel extends InputChannel implements BufferRecycler, BufferListener {
/** The connection manager to use connect to the remote partition provider. */
// manages the connections to remote partition providers (other nodes)
private final ConnectionManager connectionManager;
/**
* The received buffers. Received buffers are enqueued by the network I/O thread and the queue
* is consumed by the receiving task thread.
*/
// queue of buffers received from the upstream task node; filled by the network I/O thread, consumed by the task processing thread
private final ArrayDeque<Buffer> receivedBuffers = new ArrayDeque<>();
/** Client to establish a (possibly shared) TCP connection and request the partition.
*/
// the client for talking to the upstream node, i.e. the netty client
private volatile PartitionRequestClient partitionRequestClient;
/** The initial number of exclusive buffers assigned to this channel. */
// the initial credit; data transfer between flink nodes is credit-based by default, see 《Flink基于Credit的数据传输和背压》 for the concept
private int initialCredit;
/** The available buffer queue wraps both exclusive and requested floating buffers. */
/**
The queue of available (free) buffers. When this channel receives data from the upstream task, it takes a free buffer
from this queue to cache the data and enqueues it into receivedBuffers. bufferQueue wraps two kinds of buffers:
exclusive buffers owned by this RemoteInputChannel, as many as networkBuffersPerChannel; they are private to the
channel and are returned to the channel's own available queue when recycled. The other kind are buffers shared among
multiple RemoteInputChannels, called floating buffers; their total equals the number of buffers in the LocalBufferPool.
When a RemoteInputChannel runs out of free buffers (e.g. when the processing thread is slow), it can request floating
buffers from the LocalBufferPool to cache incoming data; floating buffers go back to the LocalBufferPool when recycled.
When receiving data, floating buffers are used before exclusive buffers.
*/
private final AvailableBufferQueue bufferQueue = new AvailableBufferQueue();
/** The number of required buffers that equals to sender's backlog plus initial credit. */
@GuardedBy("bufferQueue")
/**
The number of buffers this RemoteInputChannel needs for receiving, equal to the sender's backlog + initialCredit,
where initialCredit equals the channel's number of exclusive buffers. numRequiredBuffers is therefore larger
than the sender's backlog, which leaves a safety margin.
*/
private int numRequiredBuffers;
/** The tag indicates whether this channel is waiting for additional floating buffers from the buffer pool. */
@GuardedBy("bufferQueue")
/**
Whether this channel is waiting for floating buffers. When the RemoteInputChannel requests a floating buffer and
finds that the LocalBufferPool has none left either, this flag marks the channel as waiting for a free buffer;
once the LocalBufferPool recycles one, it is handed to this RemoteInputChannel.
*/
private boolean isWaitingForFloatingBuffers;
//other non-core members omitted
...
}
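The floating-first policy mentioned above can be modeled with two deques (a toy sketch of AvailableBufferQueue, entirely my own simplification; Object stands in for Flink's Buffer type):
// Toy model: floating buffers are handed out before exclusive ones,
// and recycling routes each kind back to where it came from
import java.util.ArrayDeque;
class AvailableBufferQueueSketch {
    private final ArrayDeque<Object> floatingBuffers = new ArrayDeque<>();
    private final ArrayDeque<Object> exclusiveBuffers = new ArrayDeque<>();
    // take a free buffer for incoming data, preferring the shared floating buffers
    Object takeBuffer() {
        return !floatingBuffers.isEmpty() ? floatingBuffers.poll() : exclusiveBuffers.poll();
    }
    // an exclusive buffer is private to the channel: recycling returns it right here
    void recycleExclusiveBuffer(Object buffer) {
        exclusiveBuffers.add(buffer);
    }
    // a floating buffer would instead be returned to the LocalBufferPool,
    // where any channel of the gate can grab it again (not modeled here)
    void addFloatingBuffer(Object buffer) {
        floatingBuffers.add(buffer);
    }
    int availableBuffers() {
        return floatingBuffers.size() + exclusiveBuffers.size();
    }
}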
4.2 LocalInputChannel
Having analyzed RemoteInputChannel, let's now look at LocalInputChannel, which represents an upstream task running on the same node (same JVM) as the downstream task.
LocalInputChannel's structure is comparatively simple; there is no local buffer queue or the like, because within the same node and JVM it can read the data produced by the upstream task directly.
public class LocalInputChannel extends InputChannel implements BufferAvailabilityListener {
private final Object requestLock = new Object();
/** The local partition manager. */
private final ResultPartitionManager partitionManager;
/** Task event dispatcher for backwards events. */
private final TaskEventDispatcher taskEventDispatcher;
/** The consumed subpartition. */
// a view of the consumed ResultSubpartition; through it the channel directly reads the buffers
// the upstream task wrote into the ResultSubPartition, with no further data exchange needed
private volatile ResultSubpartitionView subpartitionView;
private volatile boolean isReleased;
}
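The direct read works because, on its first partition request, the channel asks the ResultPartitionManager for an in-JVM reader view instead of opening a network connection (a hedged sketch; the shape follows LocalInputChannel.requestSubpartition in Flink 1.8, with error handling and retries elided):
// Hedged sketch of LocalInputChannel.requestSubpartition
void requestSubpartition(int subpartitionIndex) throws java.io.IOException {
    synchronized (requestLock) {
        if (subpartitionView == null) {
            // registers this channel as the BufferAvailabilityListener, so the producer
            // can notify it when new buffers become available in the subpartition
            subpartitionView = partitionManager.createSubpartitionView(partitionId, subpartitionIndex, this);
        }
    }
}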
5. How SingleInputGate fetches data from the InputChannels
Back to the main thread: the task processing thread ultimately reads data via inputGate.getNextBufferOrEvent(). The reason so much space was spent on SingleInputGate and InputChannel is that knowing these two components makes the architecture of the receiving side much easier to understand. Now let's look at the implementation of inputGate.getNextBufferOrEvent().
//SingleInputGate
public Optional<BufferOrEvent> getNextBufferOrEvent() throws IOException, InterruptedException {
return getNextBufferOrEvent(true);
}
private Optional<BufferOrEvent> getNextBufferOrEvent(boolean blocking) throws IOException, InterruptedException {
...
//request the partitions, i.e. send data requests to the upstream task nodes (the servers)
requestPartitions();
InputChannel currentChannel;
boolean moreAvailable;
Optional<BufferAndAvailability> result = Optional.empty();
do {
synchronized (inputChannelsWithData) {
while (inputChannelsWithData.size() == 0) {
if (isReleased) {
throw new IllegalStateException("Released");
}
if (blocking) {
//if no InputChannel has received data yet, the thread blocks here
inputChannelsWithData.wait();
}
else {
return Optional.empty();
}
}
//dequeue an InputChannel from inputChannelsWithData, the queue of channels that have received data
currentChannel = inputChannelsWithData.remove();
enqueuedInputChannelsWithData.clear(currentChannel.getChannelIndex());
moreAvailable = !inputChannelsWithData.isEmpty();
}
//fetch a buffer from that InputChannel
result = currentChannel.getNextBuffer();
} while (!result.isPresent());
// this channel was now removed from the non-empty channels queue
// we re-add it in case it has more data, because in that case no "non-empty" notification
// will come for that channel
//if the InputChannel still has more data, put it back into the inputChannelsWithData queue
if (result.get().moreAvailable()) {
queueChannel(currentChannel);
moreAvailable = true;
}
final Buffer buffer = result.get().buffer();
if (buffer.isBuffer()) {
//wrap the buffer into a BufferOrEvent and return it
return Optional.of(new BufferOrEvent(buffer, currentChannel.getChannelIndex(), moreAvailable));
}
else {
... //events
}
}
The source shows roughly the following logic:
1. First request the upstream ResultPartitions. This usually happens on the first fetch: the task sends data requests to the upstream task nodes and establishes TCP connections, which then stay open.
2. Dequeue an InputChannel from inputChannelsWithData, the queue of channels that have received data, and fetch a buffer from that channel. If no InputChannel has received any data, the task processing thread blocks until one does.
3. If the InputChannel from step 2 still has more data, it is re-enqueued into inputChannelsWithData so that its remaining data can be fetched later; the notification side of this queue is sketched below.
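For completeness, here is the other half of that hand-off: when a channel is notified of new data, queueChannel enqueues it and wakes the task thread up (a hedged sketch; the shape follows SingleInputGate.queueChannel in Flink 1.8, with the deduplication BitSet handling and error paths simplified):
// Hedged sketch of SingleInputGate.queueChannel, called from the network I/O thread
private void queueChannel(InputChannel channel) {
    synchronized (inputChannelsWithData) {
        // skip channels that are already enqueued (tracked in a BitSet)
        if (enqueuedInputChannelsWithData.get(channel.getChannelIndex())) {
            return;
        }
        inputChannelsWithData.add(channel);
        enqueuedInputChannelsWithData.set(channel.getChannelIndex());
        if (inputChannelsWithData.size() == 1) {
            // first non-empty channel: wake up the task thread blocked in wait()
            inputChannelsWithData.notifyAll();
        }
    }
}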
Next, let's see how an InputChannel hands out buffers.
5.1 RemoteInputChannel
//RemoteInputChannel
Optional<BufferAndAvailability> getNextBuffer() throws IOException {
final Buffer next;
final boolean moreAvailable;
synchronized (receivedBuffers) {
next = receivedBuffers.poll();
moreAvailable = !receivedBuffers.isEmpty();
}
numBytesIn.inc(next.getSizeUnsafe());
numBuffersIn.inc();
return Optional.of(new BufferAndAvailability(next, moreAvailable, getSenderBacklog()));
}
The logic is straightforward: just poll a buffer from the queue of received buffers.
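The buffers in receivedBuffers are enqueued by the Netty I/O thread when data arrives from upstream (a hedged sketch of that producer side; the shape follows RemoteInputChannel.onBuffer in Flink 1.8, with sequence-number checks and credit bookkeeping elided):
// Hedged sketch of RemoteInputChannel.onBuffer, running on the network I/O thread
public void onBuffer(Buffer buffer, int sequenceNumber, int backlog) throws java.io.IOException {
    boolean wasEmpty;
    synchronized (receivedBuffers) {
        // the real code also verifies that sequenceNumber matches the expected one
        wasEmpty = receivedBuffers.isEmpty();
        receivedBuffers.add(buffer);
    }
    if (wasEmpty) {
        // ends up calling SingleInputGate.queueChannel(this) for this channel
        notifyChannelNonEmpty();
    }
    // the sender's backlog is used to decide how many floating buffers to request (elided)
}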
5.2 LocalInputChannel
//LocalInputChannel
Optional<BufferAndAvailability> getNextBuffer() throws IOException, InterruptedException {
checkError();
ResultSubpartitionView subpartitionView = this.subpartitionView;
if (subpartitionView == null) {
...
subpartitionView = checkAndWaitForSubpartitionView();
}
//fetch through the subpartitionView, locally within the JVM rather than remotely
BufferAndBacklog next = subpartitionView.getNextBuffer();
...
return Optional.of(new BufferAndAvailability(next.buffer(), next.isMoreAvailable(), next.buffersInBacklog()));
}
//PipelinedSubpartitionView
public BufferAndBacklog getNextBuffer() {
return parent.pollBuffer();
}
//PipelinedSubpartition
BufferAndBacklog pollBuffer() {
synchronized (buffers) {
Buffer buffer = null;
if (buffers.isEmpty()) {
flushRequested = false;
}
while (!buffers.isEmpty()) {
//peek at the head buffer of the PipelinedSubpartition's buffers queue
BufferConsumer bufferConsumer = buffers.peek();
buffer = bufferConsumer.build();
checkState(bufferConsumer.isFinished() || buffers.size() == 1,
"When there are multiple buffers, an unfinished bufferConsumer can not be at the head of the buffers queue.");
if (buffers.size() == 1) {
// turn off flushRequested flag if we drained all of the available data
flushRequested = false;
}
//if the buffer has been completely filled (rather than only half written), it can be removed from the buffers queue
if (bufferConsumer.isFinished()) {
buffers.pop().close();
decreaseBuffersInBacklogUnsafe(bufferConsumer.isBuffer());
}
if (buffer.readableBytes() > 0) {
break;
}
buffer.recycleBuffer();
buffer = null;
if (!bufferConsumer.isFinished()) {
break;
}
}
if (buffer == null) {
return null;
}
//update the PipelinedSubpartition's statistics
updateStatistics(buffer);
// Do not report last remaining buffer on buffers as available to read (assuming it's unfinished).
// It will be reported for reading either on flush or when the number of buffers in the queue
// will be 2 or more.
return new BufferAndBacklog(
buffer,
isAvailableUnsafe(),
getBuffersInBacklog(),
nextBufferIsEventUnsafe());
}
}
The overall logic is also straightforward: fetch a buffer from the ResultSubPartition produced by the Map task; everything the Map task produces is put into the ResultSubPartition's buffers queue.
One subtle detail: the buffer the Reduce task fetches is not necessarily removed from the ResultSubPartition's buffers queue; the fetch may only return part of the buffer. The reason is that when the Map task produces data slowly and a buffer is not filled within 100ms (the default buffer timeout), the data is flushed so that the downstream task can read it (or it is pushed downstream). In that case not a whole buffer is returned, but only the data produced into that buffer so far; the Map side keeps writing into the same buffer until it is full, and only then moves on to a new one.
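That flush interval is a user-facing knob (a minimal example; setBufferTimeout is the standard DataStream API, and 0 would flush after every record at the cost of throughput):
// Example: trade some throughput for lower latency by flushing buffers more often
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public class BufferTimeoutExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // flush partially filled output buffers every 50 ms instead of the default 100 ms
        env.setBufferTimeout(50);
        env.fromElements(1, 2, 3).map(i -> i * 10).print();
        env.execute("buffer-timeout-example");
    }
}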
Back to the earlier discussion: after the task thread fetches a buffer from the inputGate, it deserializes records from it and processes them. To summarize, the data read path is roughly OneInputStreamTask -> StreamInputProcessor -> CheckpointBarrierHandler -> InputGate -> InputChannel.
With that, the Task data read path is essentially covered. A later article will analyze the data transfer and interaction between the Reduce side and the Map side, i.e. how a RemoteInputChannel receives its data, and how the data in a ResultSubPartition is sent downstream.