In Flink stream processing, the flow of data between operators is driven by a partitioner, namely StreamPartitioner. It is an abstract class with several concrete implementations. Of these, BinaryHashPartitioner comes from the blink planner (flink-table-planner-blink); the rest are part of Flink's core dependencies.
Let's go through them one by one:
GlobalPartitioner
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    return 0;
}
All upstream records are sent to the first channel (index 0) of the downstream operator.
ShufflePartitioner
private Random random = new Random();

@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    return random.nextInt(numberOfChannels);
}
Records are sent to a random downstream channel, using Random.nextInt bounded by the downstream parallelism (numberOfChannels).
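To see the effect, here is a standalone sketch (plain java.util.Random, not Flink's class) that pushes a batch of records through nextInt(numberOfChannels) and counts where they land; the channels come out roughly even:

```java
import java.util.Random;

// Simulates ShufflePartitioner: each record picks a uniformly random
// downstream channel in [0, numberOfChannels).
public class ShuffleDemo {
    public static int[] distribute(int records, int numberOfChannels, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[numberOfChannels];
        for (int i = 0; i < records; i++) {
            counts[random.nextInt(numberOfChannels)]++;
        }
        return counts;
    }

    public static int sum(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] counts = distribute(100_000, 4, 42L);
        for (int c = 0; c < counts.length; c++) {
            System.out.println("channel " + c + ": " + counts[c]);
        }
    }
}
```

Random shuffling gives good balance on average, but unlike round-robin it offers no per-record guarantee.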
RebalancePartitioner
private int nextChannelToSendTo;

@Override
public void setup(int numberOfChannels) {
    super.setup(numberOfChannels);
    nextChannelToSendTo = ThreadLocalRandom.current().nextInt(numberOfChannels);
}

@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    nextChannelToSendTo = (nextChannelToSendTo + 1) % numberOfChannels;
    return nextChannelToSendTo;
}
setup() runs at initialization time; the method is declared on the ChannelSelector interface, which StreamPartitioner implements. It picks a random starting channel index, and selectChannel then cycles through the channels round-robin, which gives very good load balancing.
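The round-robin behavior can be sketched outside Flink like this (a stand-in class, not the real partitioner). Wherever the random start index falls, consecutive records advance one channel at a time, so channel counts never differ by more than one:

```java
import java.util.concurrent.ThreadLocalRandom;

// Mimics RebalancePartitioner: random starting channel, then strict round-robin.
public class RebalanceDemo {
    private final int numberOfChannels;
    private int nextChannelToSendTo;

    public RebalanceDemo(int numberOfChannels) {
        this.numberOfChannels = numberOfChannels;
        // random starting point, as in setup()
        this.nextChannelToSendTo = ThreadLocalRandom.current().nextInt(numberOfChannels);
    }

    public int selectChannel() {
        nextChannelToSendTo = (nextChannelToSendTo + 1) % numberOfChannels;
        return nextChannelToSendTo;
    }

    // Send `records` records and count how many each channel receives.
    public int[] distribute(int records) {
        int[] counts = new int[numberOfChannels];
        for (int i = 0; i < records; i++) {
            counts[selectChannel()]++;
        }
        return counts;
    }

    // Difference between the busiest and the idlest channel.
    public static int spread(int[] counts) {
        int min = counts[0];
        int max = counts[0];
        for (int c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return max - min;
    }

    public static void main(String[] args) {
        int[] counts = new RebalanceDemo(4).distribute(10);
        for (int c = 0; c < counts.length; c++) {
            System.out.println("channel " + c + ": " + counts[c]);
        }
    }
}
```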
KeyGroupStreamPartitioner
Used for a KeyedStream.
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    K key;
    try {
        key = keySelector.getKey(record.getInstance().getValue());
    } catch (Exception e) {
        throw new RuntimeException("Could not extract key from " + record.getInstance().getValue(), e);
    }
    return KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, numberOfChannels);
}
The helpers it delegates to in KeyGroupRangeAssignment:
public static int assignToKeyGroup(Object key, int maxParallelism) {
    Preconditions.checkNotNull(key, "Assigned key must not be null!");
    return computeKeyGroupForKeyHash(key.hashCode(), maxParallelism);
}

public static int computeKeyGroupForKeyHash(int keyHash, int maxParallelism) {
    return MathUtils.murmurHash(keyHash) % maxParallelism;
}

public static int computeOperatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroupId) {
    return keyGroupId * parallelism / maxParallelism;
}
This is fairly clear: the key is hashed twice. First key.hashCode() is taken, then MathUtils.murmurHash is applied to that value; the result modulo maxParallelism yields the keyGroupId, and finally keyGroupId * parallelism / maxParallelism yields the channel index.
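The arithmetic can be sketched outside Flink. Here `mix` is a stand-in for MathUtils.murmurHash (assumed behavior: an integer mixing function that returns a non-negative value); the surrounding formulas follow the snippets above:

```java
// Sketch of the two-stage key-group assignment. `mix` is a murmur3-style
// stand-in for Flink's MathUtils.murmurHash; the rest mirrors the
// KeyGroupRangeAssignment snippets quoted in the text.
public class KeyGroupDemo {
    // Stand-in integer hash finalizer, forced non-negative so that the
    // modulo below stays in [0, maxParallelism).
    public static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h & Integer.MAX_VALUE;
    }

    // First hash: key.hashCode(), then mixed, then modulo maxParallelism.
    public static int assignToKeyGroup(Object key, int maxParallelism) {
        return mix(key.hashCode()) % maxParallelism;
    }

    // Second step: map the key group to an operator (channel) index.
    public static int operatorIndex(int maxParallelism, int parallelism, int keyGroupId) {
        return keyGroupId * parallelism / maxParallelism;
    }

    public static int selectChannel(Object key, int maxParallelism, int parallelism) {
        return operatorIndex(maxParallelism, parallelism, assignToKeyGroup(key, maxParallelism));
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        int parallelism = 4;
        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            System.out.println(key + " -> key group " + assignToKeyGroup(key, maxParallelism)
                    + " -> channel " + selectChannel(key, maxParallelism, parallelism));
        }
    }
}
```

Note that each operator ends up owning a contiguous range of key groups, which is what makes state redistribution on rescaling cheap.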
BroadcastPartitioner
/**
 * Note: Broadcast mode could be handled directly for all the output channels
 * in record writer, so there is no need to select channels via this method.
 */
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    throw new UnsupportedOperationException("Broadcast partitioner does not support select channels.");
}
BroadcastPartitioner needs no channel selection, because every downstream channel receives all of the upstream data.
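Conceptually, the record writer does something like the following sketch (not Flink's actual RecordWriter): instead of picking one channel per record, it hands the whole stream to every channel:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of why BroadcastPartitioner needs no selectChannel: the writer
// copies the records to every downstream channel instead of choosing one.
public class BroadcastDemo {
    public static List<List<String>> broadcastEmit(List<String> records, int numberOfChannels) {
        List<List<String>> channels = new ArrayList<>();
        for (int i = 0; i < numberOfChannels; i++) {
            channels.add(new ArrayList<>(records)); // every channel sees all records
        }
        return channels;
    }

    public static void main(String[] args) {
        List<List<String>> channels = broadcastEmit(List.of("a", "b"), 3);
        System.out.println(channels);
    }
}
```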
RescalePartitioner
private int nextChannelToSendTo = -1;

@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    if (++nextChannelToSendTo >= numberOfChannels) {
        nextChannelToSendTo = 0;
    }
    return nextChannelToSendTo;
}
This is RescalePartitioner's channel-selection method, and at first glance it looks no different from RebalancePartitioner. So let's keep digging into the source. As is well known, Flink translates an execution plan through StreamGraph -> JobGraph -> ExecutionGraph, and StreamingJobGraphGenerator is what converts the StreamGraph into a JobGraph. In that class, ForwardPartitioner and RescalePartitioner are assigned the POINTWISE (point-to-point) distribution pattern, while all the others get ALL_TO_ALL (many-to-many). The code is as follows:
if (partitioner instanceof ForwardPartitioner || partitioner instanceof RescalePartitioner) {
    jobEdge = downStreamVertex.connectNewDataSetAsInput(
        headVertex,
        DistributionPattern.POINTWISE,
        resultPartitionType);
} else {
    jobEdge = downStreamVertex.connectNewDataSetAsInput(
        headVertex,
        DistributionPattern.ALL_TO_ALL,
        resultPartitionType);
}
Then, during the JobGraph -> ExecutionGraph translation, when upstream and downstream vertices are connected, these two patterns determine which range of downstream partitions each upstream partition is wired to. The code is as follows:
switch (pattern) {
    case POINTWISE:
        edges = connectPointwise(sourcePartitions, inputNumber);
        break;
    case ALL_TO_ALL:
        edges = connectAllToAll(sourcePartitions, inputNumber);
        break;
    default:
        throw new RuntimeException("Unrecognized distribution pattern.");
}
For reasons of space the two methods are not reproduced here, but roughly: with ALL_TO_ALL, each upstream partition may be wired to any of the downstream partitions. With POINTWISE, the wiring depends on the parallelism on both sides. For example, with upstream partitions a1 and a2 and downstream partitions b1, b2, b3, b4, a1 is wired to b1 and b2 while a2 is wired to b3 and b4. With upstream partitions a1, a2, a3, a4 and downstream partitions b1 and b2, a1 and a2 are wired to b1 while a3 and a4 are wired to b2.
What is the benefit of this? Better data locality on the TaskManager and less network IO: a subtask can often fetch its input directly from a local upstream operator. Compared with RebalancePartitioner, RescalePartitioner has better data locality and lower network cost, but worse balance: RebalancePartitioner uses ALL_TO_ALL, round-robins across all downstream partitions, and is therefore a true global rebalance. With a large upstream data volume, RebalancePartitioner brings considerable network overhead.
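The POINTWISE range arithmetic can be sketched as follows. The floor/ceil formula here is an assumption that reproduces the examples above; Flink's connectPointwise handles the cases with more explicit branching, but the resulting wiring is the same for these parallelism ratios:

```java
// Sketch of POINTWISE wiring: upstream partition i connects to the downstream
// index range [floor(i * down / up), ceil((i + 1) * down / up)).
public class PointwiseDemo {
    // Returns the downstream indices wired to upstream partition i,
    // for `up` upstream and `down` downstream partitions.
    public static int[] targets(int i, int up, int down) {
        int start = i * down / up;                 // floor
        int end = ((i + 1) * down + up - 1) / up;  // ceil
        int[] range = new int[end - start];
        for (int j = start; j < end; j++) {
            range[j - start] = j;
        }
        return range;
    }

    public static void main(String[] args) {
        // 2 upstream partitions, 4 downstream partitions
        for (int i = 0; i < 2; i++) {
            System.out.println("a" + (i + 1) + " -> " + java.util.Arrays.toString(targets(i, 2, 4)));
        }
        // 4 upstream partitions, 2 downstream partitions
        for (int i = 0; i < 4; i++) {
            System.out.println("a" + (i + 1) + " -> " + java.util.Arrays.toString(targets(i, 4, 2)));
        }
    }
}
```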
ForwardPartitioner
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    return 0;
}
With the RescalePartitioner discussion above, this one is easy to understand: each record goes to the first (and, under POINTWISE wiring, only) channel of its corresponding downstream subtask. When the user specifies no partitioner, ForwardPartitioner is chosen if upstream and downstream parallelism match, and RebalancePartitioner otherwise. This shows up in the StreamGraph class:
StreamNode upstreamNode = getStreamNode(upStreamVertexID);
StreamNode downstreamNode = getStreamNode(downStreamVertexID);

if (partitioner == null && upstreamNode.getParallelism() == downstreamNode.getParallelism()) {
    partitioner = new ForwardPartitioner<Object>();
} else if (partitioner == null) {
    partitioner = new RebalancePartitioner<Object>();
}
CustomPartitionerWrapper
As the name suggests, this wraps a user-defined partitioner.
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    K key;
    try {
        key = keySelector.getKey(record.getInstance().getValue());
    } catch (Exception e) {
        throw new RuntimeException("Could not extract key from " + record.getInstance(), e);
    }
    return partitioner.partition(key, numberOfChannels);
}
For example, partitioning on the length of the first field:
dataStreamSource.partitionCustom(new Partitioner<String>() {
    @Override
    public int partition(String key, int numPartitions) {
        return key.length() % numPartitions;
    }
}, 0);
BinaryHashPartitioner
This one lives in flink-table-planner-blink.
@Override
public int selectChannel(SerializationDelegate<StreamRecord<BaseRow>> record) {
    return MathUtils.murmurHash(
        getHashFunc().hashCode(record.getInstance().getValue())) % numberOfChannels;
}

private HashFunction getHashFunc() {
    if (hashFunc == null) {
        try {
            hashFunc = genHashFunc.newInstance(Thread.currentThread().getContextClassLoader());
            genHashFunc = null;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    return hashFunc;
}
Essentially, a hash code is produced by applying the (lazily instantiated, code-generated) hash function to the record's data; MathUtils.murmurHash of that code, modulo the parallelism, then gives the channel the record is sent to.
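A standalone sketch of that flow, with `HashFunction` and `mix` as stand-ins for the generated hash function and MathUtils.murmurHash: rows whose hashed fields are equal always land on the same channel.

```java
// Sketch of BinaryHashPartitioner.selectChannel: a hash function produces a
// hash code for the row, which is then mixed and taken modulo the number of
// downstream channels. HashFunction and mix are stand-ins, not Flink classes.
public class BinaryHashDemo {
    public interface HashFunction {
        int hashCode(Object row);
    }

    // Stand-in integer hash finalizer, forced non-negative so that the
    // modulo below stays in [0, numberOfChannels).
    public static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h & Integer.MAX_VALUE;
    }

    public static int selectChannel(HashFunction hashFunc, Object row, int numberOfChannels) {
        return mix(hashFunc.hashCode(row)) % numberOfChannels;
    }

    public static void main(String[] args) {
        // Hash on the first comma-separated field of a string "row".
        HashFunction byFirstField = row -> ((String) row).split(",")[0].hashCode();
        System.out.println(selectChannel(byFirstField, "alice,42", 4));
        System.out.println(selectChannel(byFirstField, "alice,99", 4)); // same first field, same channel
    }
}
```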