Notes on the Flink runtime network stack

Article link: https://blog.csdn.net/zhoucs86/article/details/91356628

* An input gate consumes one or more partitions of a single produced intermediate result.
*
* <p>Each intermediate result is partitioned over its producing parallel subtasks; each of these
* partitions is furthermore partitioned into one or more subpartitions.
*
* <p>As an example, consider a map-reduce program, where the map operator produces data and the
* reduce operator consumes the produced data.
*
* <pre>{@code
* +-----+              +---------------------+              +--------+
* | Map | = produce => | Intermediate Result | <= consume = | Reduce |
* +-----+              +---------------------+              +--------+
* }</pre>
*
* <p>When deploying such a program in parallel, the intermediate result will be partitioned over its
* producing parallel subtasks; each of these partitions is furthermore partitioned into one or more
* subpartitions.
*
* <pre>{@code
*                            Intermediate result
*               +-----------------------------------------+
*               |                      +----------------+ |              +-----------------------+
* +-------+     | +-------------+  +=> | Subpartition 1 | | <=======+=== | Input Gate | Reduce 1 |
* | Map 1 | ==> | | Partition 1 | =|   +----------------+ |         |    +-----------------------+
* +-------+     | +-------------+  +=> | Subpartition 2 | | <==+    |
*               |                      +----------------+ |    |    | Subpartition request
*               |                                         |    |    |
*               |                      +----------------+ |    |    |
* +-------+     | +-------------+  +=> | Subpartition 1 | | <==+====+
* | Map 2 | ==> | | Partition 2 | =|   +----------------+ |    |         +-----------------------+
* +-------+     | +-------------+  +=> | Subpartition 2 | | <==+======== | Input Gate | Reduce 2 |
*               |                      +----------------+ |              +-----------------------+
*               +-----------------------------------------+
* }</pre>
*
* <p>In the above example, two map subtasks produce the intermediate result in parallel, resulting
* in two partitions (Partition 1 and 2). Each of these partitions is further partitioned into two
* subpartitions -- one for each parallel reduce subtask.
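
The mapping described in the Javadoc above can be sketched in a few lines. This is only a toy model, not Flink code: the class `SubpartitionLayout` and its method `channelsFor` are hypothetical names, and the 1-based numbering simply follows the diagram. It shows that reduce subtask i reads subpartition i of every partition.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not a Flink class): which subpartitions one consumer reads. */
final class SubpartitionLayout {

    /** Returns the (partition, subpartition) pairs read by one reduce subtask, 1-based. */
    static List<String> channelsFor(int reduceIndex, int numMapSubtasks) {
        List<String> channels = new ArrayList<>();
        for (int partition = 1; partition <= numMapSubtasks; partition++) {
            // Consumer i reads subpartition i of every partition.
            channels.add("Partition " + partition + " / Subpartition " + reduceIndex);
        }
        return channels;
    }

    public static void main(String[] args) {
        // Two map subtasks and two reduce subtasks, as in the diagram.
        for (int reduce = 1; reduce <= 2; reduce++) {
            System.out.println("Reduce " + reduce + " <- " + channelsFor(reduce, 2));
        }
    }
}
```

With two map subtasks, each input gate therefore ends up with two input channels, one per consumed partition.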

========================================================
/** The number of input channels (equivalent to the number of consumed partitions). */
private final int numberOfInputChannels;

/**
* Input channels. There is one input channel for each consumed intermediate result partition.
* We store this in a map for runtime updates of single channels.
*/
private final Map<IntermediateResultPartitionID, InputChannel> inputChannels;

A single InputGate can read data from multiple InputChannels and process it locally.
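
To make the "map for runtime updates of single channels" remark concrete, here is a minimal sketch. It is an illustration under assumptions, not the SingleInputGate implementation: `ToyInputGate`, `setInputChannel` and `channelFor` are made-up names, and channels are represented by plain strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy input gate (hypothetical): one channel per consumed partition, keyed by partition ID. */
final class ToyInputGate {
    private final int numberOfInputChannels;
    // Partition ID -> channel description; a map lets us replace one channel at run time.
    private final Map<String, String> inputChannels = new LinkedHashMap<>();

    ToyInputGate(int numberOfInputChannels) {
        this.numberOfInputChannels = numberOfInputChannels;
    }

    /** Registers a channel, or replaces the one already stored for this partition. */
    void setInputChannel(String partitionId, String channel) {
        boolean isNew = !inputChannels.containsKey(partitionId);
        if (isNew && inputChannels.size() == numberOfInputChannels) {
            throw new IllegalStateException("gate already has " + numberOfInputChannels + " channels");
        }
        inputChannels.put(partitionId, channel);
    }

    String channelFor(String partitionId) {
        return inputChannels.get(partitionId);
    }

    public static void main(String[] args) {
        ToyInputGate gate = new ToyInputGate(2);
        gate.setInputChannel("partition-1", "unknown");
        gate.setInputChannel("partition-2", "local");
        gate.setInputChannel("partition-1", "remote"); // runtime update of a single channel
        System.out.println(gate.channelFor("partition-1"));
    }
}
```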

/**
   * Buffer pool for incoming buffers. Incoming data from remote channels is copied to buffers
   * from this pool.
   */
   private BufferPool bufferPool;

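This buffer pool is the memory that was obtained from the NetworkBufferPool.

In practice, the partition requests buffers from its local buffer pool and then writes them to the channel.

The owner is in fact the partition.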

=============
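The setupInputGate method of NetworkEnvironment is used during TaskManager / JobManager initialization.

A BufferOrEvent is what a consumer may receive; it contains either a NetworkBuffer or an Event. When the NetworkBufferPool (the BufferPoolFactory implementation) is initialized, it fills its pool with:
availableMemorySegments.add(MemorySegmentFactory.allocateUnpooledOffHeapMemory(segmentSize, null));

private final Set<LocalBufferPool> allBufferPools = new HashSet<>(); keeps references to all LocalBufferPool instances.

redistributeBuffers redistributes buffers among the different LocalBufferPool instances.

requestBuffer / requestMemorySegment are the functions a LocalBufferPool uses to request resources.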
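
The relationship between the global pool and the local pools can be sketched as follows. This is a deliberately simplified model, not Flink's implementation: `ToyNetworkBufferPool` and `ToyLocalBufferPool` are hypothetical classes, segments are plain `byte[]` arrays, and the redistribution rule (an equal share for every local pool) is an assumption chosen for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy global pool (hypothetical): owns a fixed number of segments shared by local pools. */
final class ToyNetworkBufferPool {
    private final Deque<byte[]> availableMemorySegments = new ArrayDeque<>();
    private final List<ToyLocalBufferPool> allBufferPools = new ArrayList<>();
    private final int totalSegments;

    ToyNetworkBufferPool(int numSegments, int segmentSize) {
        this.totalSegments = numSegments;
        for (int i = 0; i < numSegments; i++) {
            availableMemorySegments.add(new byte[segmentSize]);
        }
    }

    ToyLocalBufferPool createBufferPool() {
        ToyLocalBufferPool pool = new ToyLocalBufferPool(this);
        allBufferPools.add(pool);
        redistributeBuffers();
        return pool;
    }

    /** Assumed rule: every local pool gets the same quota. */
    private void redistributeBuffers() {
        int share = totalSegments / allBufferPools.size();
        for (ToyLocalBufferPool pool : allBufferPools) {
            pool.quota = share;
        }
    }

    byte[] requestMemorySegment() { return availableMemorySegments.poll(); } // null if empty
    void recycle(byte[] segment) { availableMemorySegments.add(segment); }
    int availableSegments() { return availableMemorySegments.size(); }
}

/** Toy local pool (hypothetical): hands out segments up to its current quota. */
final class ToyLocalBufferPool {
    private final ToyNetworkBufferPool global;
    int quota;          // set by the global pool during redistribution
    private int inUse;  // segments currently handed out by this pool

    ToyLocalBufferPool(ToyNetworkBufferPool global) { this.global = global; }

    /** Returns a segment, or null if the quota or the global pool is exhausted. */
    byte[] requestBuffer() {
        if (inUse >= quota) {
            return null;
        }
        byte[] segment = global.requestMemorySegment();
        if (segment != null) {
            inUse++;
        }
        return segment;
    }

    void recycle(byte[] segment) {
        inUse--;
        global.recycle(segment);
    }
}
```

With four segments and two local pools, each pool can hold at most two segments at a time; a third request returns null until a segment is recycled.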

==============================

Serialization essentially means writing bytes from the Segment into the actual buffer.

The Writer interface has two implementations: one is ResultPartition, the other is RecordWriter.

SpanningRecordSerializer holds a serializationBuffer, which is a DataOutputSerializer; inside it there is a wrapper that is allocated from startBuffer.
An IOReadableWritable calls the corresponding write methods of DataOutputSerializer to write its values into that buffer.
wrapAsByteBuffer exposes the contents of the DataOutputSerializer as a ByteBuffer.
segment.put writes the contents of the wrapper into the segment.

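The serializeRecord method of SpanningRecordSerializer serializes an IOReadableWritable into serializationBuffer (a DataOutputSerializer).
The hasSerializedData method of SpanningRecordSerializer checks whether dataBuffer or lengthBuffer still has bytes remaining.

lengthBuffer holds the length of the content currently in the serializer.
dataBuffer gives access to the content itself.
These two buffers determine which result a write produces; once everything has been written, the SpanningRecordSerializer holds no more content.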

The write target is a BufferBuilder: the length of the record is written first, followed by the actual content.
copyToBufferBuilder returns one of the following results:
FULL_RECORD
FULL_RECORD_MEMORY_SEGMENT_FULL
PARTIAL_RECORD_MEMORY_SEGMENT_FULL

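Judging from the test case, an 8-byte BufferBuilder is full after a single Int has been stored: 4 bytes for the length prefix plus 4 bytes for the value.
The reset method sets position back to zero, so the same data can be written once more.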
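
The length-prefixed copy and its three possible results can be reproduced with plain java.nio. The sketch below is not the Flink class: `ToySpanningSerializer` and `copyTo` are hypothetical names, a `ByteBuffer` stands in for the BufferBuilder, and records are raw byte arrays. Only the names lengthBuffer, dataBuffer and the three result constants are taken from the notes above.

```java
import java.nio.ByteBuffer;

/** Toy serializer (hypothetical): copies a length-prefixed record into fixed-size targets. */
final class ToySpanningSerializer {
    enum Result { FULL_RECORD, FULL_RECORD_MEMORY_SEGMENT_FULL, PARTIAL_RECORD_MEMORY_SEGMENT_FULL }

    private final ByteBuffer lengthBuffer = ByteBuffer.allocate(4);
    private ByteBuffer dataBuffer = ByteBuffer.allocate(0);

    ToySpanningSerializer() {
        lengthBuffer.position(4); // nothing pending until a record is serialized
    }

    /** Stores the record and its 4-byte length, both ready to be copied out. */
    void serializeRecord(byte[] payload) {
        lengthBuffer.clear();
        lengthBuffer.putInt(payload.length);
        lengthBuffer.flip();
        dataBuffer = ByteBuffer.wrap(payload);
    }

    boolean hasSerializedData() {
        return lengthBuffer.hasRemaining() || dataBuffer.hasRemaining();
    }

    /** Copies as much as fits: the length first, then the content. */
    Result copyTo(ByteBuffer target) {
        copy(lengthBuffer, target);
        copy(dataBuffer, target);
        if (hasSerializedData()) {
            return Result.PARTIAL_RECORD_MEMORY_SEGMENT_FULL;
        }
        return target.hasRemaining() ? Result.FULL_RECORD : Result.FULL_RECORD_MEMORY_SEGMENT_FULL;
    }

    /** Rewinds both buffers so the same record can be copied again. */
    void reset() {
        lengthBuffer.position(0);
        dataBuffer.position(0);
    }

    private static void copy(ByteBuffer source, ByteBuffer target) {
        int n = Math.min(source.remaining(), target.remaining());
        for (int i = 0; i < n; i++) {
            target.put(source.get());
        }
    }

    public static void main(String[] args) {
        ToySpanningSerializer serializer = new ToySpanningSerializer();
        serializer.serializeRecord(ByteBuffer.allocate(4).putInt(42).array());
        // 4 bytes of length + 4 bytes of data exactly fill an 8-byte target.
        System.out.println(serializer.copyTo(ByteBuffer.allocate(8)));
    }
}
```

A record that does not fit into the current target yields PARTIAL_RECORD_MEMORY_SEGMENT_FULL, and the caller continues copying into a fresh target until hasSerializedData returns false.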

The emit and copyFromSerializerToTargetChannel methods of RecordWriter write the serialized data into the individual SubPartitions of the ResultPartition.
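
As a rough illustration of that flow, the sketch below serializes each record and appends it to one target subpartition. It is a toy under assumptions: `ToyRecordWriter` is a hypothetical class, subpartitions are plain lists of byte arrays, and the round-robin choice of target channel is a stand-in for whatever channel selection the real writer is configured with.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy record writer (hypothetical): serializes records and routes them to subpartitions. */
final class ToyRecordWriter {
    private final List<List<byte[]>> subpartitions = new ArrayList<>();
    private int nextChannel; // assumed round-robin channel selection

    ToyRecordWriter(int numberOfSubpartitions) {
        for (int i = 0; i < numberOfSubpartitions; i++) {
            subpartitions.add(new ArrayList<>());
        }
    }

    /** Serializes the record, picks a target channel and copies the bytes there. */
    void emit(String record) {
        byte[] serialized = record.getBytes(StandardCharsets.UTF_8);
        int targetChannel = nextChannel;
        nextChannel = (nextChannel + 1) % subpartitions.size();
        copyToTargetChannel(serialized, targetChannel);
    }

    private void copyToTargetChannel(byte[] serialized, int targetChannel) {
        subpartitions.get(targetChannel).add(serialized);
    }

    int recordsIn(int channel) {
        return subpartitions.get(channel).size();
    }

    String recordAt(int channel, int index) {
        return new String(subpartitions.get(channel).get(index), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ToyRecordWriter writer = new ToyRecordWriter(2);
        writer.emit("a");
        writer.emit("b");
        writer.emit("c");
        System.out.println(writer.recordsIn(0) + " / " + writer.recordsIn(1));
    }
}
```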

The DAG shows many result partitions; at run time each of them is an actual ResultPartition object.
Each ResultPartition owns one BufferPool, which is shared by the ResultSubPartitions it contains.

addBufferConsumer adds a BufferConsumer to one of the subPartitions of the ResultPartition.

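onConsumedSubpartition tracks how many references to this ResultPartition are still pending; when that count drops to 0, the partition has been fully consumed and can be released.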
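
A reference count of this kind is usually kept in an atomic counter. The following sketch assumes that the partition starts with one pending reference per subpartition and is released when the last one is consumed; `ToyResultPartition` and `isReleased` are hypothetical names, not the Flink API.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy result partition (hypothetical): released once every subpartition was consumed. */
final class ToyResultPartition {
    private final AtomicInteger pendingReferences;
    private volatile boolean released;

    ToyResultPartition(int numberOfSubpartitions) {
        // Assumption: one pending reference per subpartition.
        this.pendingReferences = new AtomicInteger(numberOfSubpartitions);
    }

    /** Called when one subpartition has been fully consumed. */
    void onConsumedSubpartition() {
        int remaining = pendingReferences.decrementAndGet();
        if (remaining == 0) {
            released = true; // last consumer is done, resources can be freed
        } else if (remaining < 0) {
            throw new IllegalStateException("all references released already");
        }
    }

    boolean isReleased() {
        return released;
    }

    public static void main(String[] args) {
        ToyResultPartition partition = new ToyResultPartition(2);
        partition.onConsumedSubpartition();
        partition.onConsumedSubpartition();
        System.out.println(partition.isReleased());
    }
}
```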

requestSubpartition: requests the ResultSubpartition;
getNextBuffer: returns the next Buffer;
releaseAllResources: releases all associated resources.

PartitionRequestClient
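The onBuffer method runs on Netty's I/O thread, whereas getNextBuffer of RemoteInputChannel is not called on the Netty I/O thread, so a container for sharing data between the two is required: the receivedBuffers queue. getNextBuffer simply dequeues one element from the receivedBuffers queue and returns it.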
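
The hand-off between the two threads can be sketched with a queue guarded by a lock. This is a minimal model, not the RemoteInputChannel code: `ToyRemoteInputChannel` is a hypothetical class, buffers are plain strings, and a simple thread plays the role of the Netty I/O thread.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy remote channel (hypothetical): a queue shared by the I/O thread and the task thread. */
final class ToyRemoteInputChannel {
    private final Queue<String> receivedBuffers = new ArrayDeque<>();

    /** Called by the I/O thread when data arrives. */
    void onBuffer(String buffer) {
        synchronized (receivedBuffers) {
            receivedBuffers.add(buffer);
        }
    }

    /** Called by the consuming task thread; returns null if nothing is queued. */
    String getNextBuffer() {
        synchronized (receivedBuffers) {
            return receivedBuffers.poll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToyRemoteInputChannel channel = new ToyRemoteInputChannel();
        Thread ioThread = new Thread(() -> {
            channel.onBuffer("buffer-1");
            channel.onBuffer("buffer-2");
        });
        ioThread.start();
        ioThread.join(); // wait until the "I/O thread" has delivered everything
        String next;
        while ((next = channel.getNextBuffer()) != null) {
            System.out.println(next);
        }
    }
}
```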

InputChannel contains the following method:
@VisibleForTesting
@Override
public void requestSubpartition(int subpartitionIndex) throws IOException, InterruptedException {
    if (partitionRequestClient == null) {
        // Create a client and request the partition
        partitionRequestClient = connectionManager
            .createPartitionRequestClient(connectionId);

        partitionRequestClient.requestSubpartition(partitionId, subpartitionIndex, this, 0);
    }
}

PartitionRequestClientHandler holds a ConcurrentMap<InputChannelID, RemoteInputChannel> inputChannels,
which maintains the mapping from channel ID to channel.
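
Such a map lets the handler route each incoming message to the channel that asked for it. The sketch below is a simplified stand-in, not the Flink handler: `ToyClientHandler` and its nested `Channel` interface are hypothetical, IDs and buffers are strings, and returning false for an unknown receiver is an assumption made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Toy client handler (hypothetical): dispatches received buffers by channel ID. */
final class ToyClientHandler {
    /** Minimal stand-in for an input channel. */
    interface Channel {
        void onBuffer(String buffer);
    }

    private final ConcurrentMap<String, Channel> inputChannels = new ConcurrentHashMap<>();

    void addInputChannel(String channelId, Channel channel) {
        inputChannels.putIfAbsent(channelId, channel);
    }

    void removeInputChannel(String channelId) {
        inputChannels.remove(channelId);
    }

    /** Routes a message to its receiver; returns false if the receiver is unknown. */
    boolean channelRead(String receiverId, String buffer) {
        Channel channel = inputChannels.get(receiverId);
        if (channel == null) {
            return false; // e.g. the channel was released in the meantime
        }
        channel.onBuffer(buffer);
        return true;
    }

    public static void main(String[] args) {
        ToyClientHandler handler = new ToyClientHandler();
        handler.addInputChannel("channel-1", buffer -> System.out.println("got " + buffer));
        handler.channelRead("channel-1", "buffer-1");
    }
}
```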

=================
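A TaskManager may run many task instances at the same time. Some tasks need to consume result partitions produced by remote tasks,
while others produce result partitions for other tasks to consume. A TaskManager therefore
does not have a single role: it may act as a client and as a server.
For this reason, one NettyConnectionManager manages both a Netty client (NettyClient) and a Netty server (NettyServer) instance.
In addition there is a Netty buffer pool (NettyBufferPool) and a partition request client factory (PartitionRequestClientFactory, which creates PartitionRequestClient instances); all of these objects are initialized in the NettyConnectionManager constructor.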

NettyBufferPool mainly overrides Netty's PooledByteBufAllocator, replacing its behavior with Flink's own.

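NettyProtocol is what actually gets registered with Netty; the client code shows it being used to register the pipeline and so on.
NettyClient is not used to issue remote result subpartition requests; that work is done by PartitionRequestClient.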

The request is actually issued by PartitionRequestClient: its method ChannelFuture requestSubpartition is what really sends the message to the remote side.

### Walking through the whole sequence: the function registered with Netty is getClientChannelHandlers, which provides three handlers
messageEncoder: a NettyMessageEncoder, which uses the encoding method provided by NettyMessage; outbound
NettyMessageDecoder: a NettyMessageDecoder; inbound
networkClientHandler: the handler that actually processes the messages, usually a PartitionRequestClientHandler; inbound

From here it is natural to look at the channelRead method of PartitionRequestClientHandler.