In Flink, KeyGroup, KeyGroupEntry, and KeyGroupRange are concepts closely tied to Keyed State and the state backend.
KeyGroup
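A KeyGroup is the unit Flink uses to partition and redistribute Keyed State. Keyed State is split into a fixed number of key groups, and that number equals the maximum parallelism (maxParallelism), not the job's current parallelism. A key is assigned to a key group by hashing: Flink computes a Key Group ID from a hash of the key's hashCode() modulo maxParallelism, so the keys inside one KeyGroup are not contiguous in any ordering sense. Each parallel subtask of a keyed operator owns a contiguous range of Key Group IDs, and a record is routed to the subtask that owns the key group of its key.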
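The two mapping steps (key to Key Group ID, then Key Group ID to subtask) can be sketched in plain Java as follows. This is a minimal sketch under stated assumptions, not Flink's code: the class name KeyGroupAssignmentSketch and the method mixHash are hypothetical, and mixHash is a simple stand-in for the murmur hash Flink applies to key.hashCode(). The index arithmetic follows what, to my understanding, KeyGroupRangeAssignment does, so the concrete key group numbers will differ from a real job.

```java
final class KeyGroupAssignmentSketch {

    // Stand-in bit mixer, NOT Flink's actual murmur hash (assumption for illustration).
    // The final mask guarantees a non-negative result, so the modulo below is safe.
    static int mixHash(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h & 0x7fffffff;
    }

    // Step 1: key -> Key Group ID in [0, maxParallelism).
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return mixHash(key.hashCode()) % maxParallelism;
    }

    // Step 2: Key Group ID -> index of the parallel subtask that owns it.
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroupId) {
        return keyGroupId * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // number of key groups
        int parallelism = 4;      // number of parallel subtasks
        for (String key : new String[] {"user-1", "user-2", "user-3"}) {
            int keyGroup = assignToKeyGroup(key, maxParallelism);
            int subtask = operatorIndexForKeyGroup(maxParallelism, parallelism, keyGroup);
            System.out.println(key + " -> key group " + keyGroup + " -> subtask " + subtask);
        }
    }
}
```

Because the key group of a key depends only on the key and maxParallelism, changing the parallelism moves whole key groups between subtasks without re-hashing individual keys.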
KeyGroupEntry
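A KeyGroupEntry represents a single state entry inside a key group, in serialized form. It carries the ID of the state it belongs to (kvStateId) together with the serialized key and value bytes. A KeyGroup exposes its entries through an iterator, so one KeyGroup can hold many KeyGroupEntry objects. In the Flink code base these two classes appear to be used when a full snapshot (savepoint) is read back during restore, rather than describing where a live state backend physically stores its data.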
KeyGroupRange
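A KeyGroupRange is a contiguous range of KeyGroup IDs, defined by a start ID and an end ID (both inclusive). In Flink's distributed environment, a job is split into parallel tasks, and each task handles a subset of the key groups. The KeyGroupRange describes which key groups a given task is responsible for.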
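How the key groups are divided among tasks can be illustrated with a small calculation. The sketch below is a standalone approximation, assuming the same integer arithmetic that Flink's KeyGroupRangeAssignment uses as far as I know; the class name KeyGroupRangeSketch and the int[] return type are hypothetical choices made only to keep the example dependency-free.

```java
final class KeyGroupRangeSketch {

    // Returns {start, end}, both inclusive, for the subtask with the given index.
    static int[] rangeForOperatorIndex(int maxParallelism, int parallelism, int operatorIndex) {
        int start = (operatorIndex * maxParallelism + parallelism - 1) / parallelism;
        int end = ((operatorIndex + 1) * maxParallelism - 1) / parallelism;
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int maxParallelism = 10; // 10 key groups, IDs 0..9
        int parallelism = 3;     // 3 parallel tasks
        for (int i = 0; i < parallelism; i++) {
            int[] r = rangeForOperatorIndex(maxParallelism, parallelism, i);
            System.out.println("task " + i + " -> key groups " + r[0] + ".." + r[1]);
        }
        // task 0 -> key groups 0..3
        // task 1 -> key groups 4..6
        // task 2 -> key groups 7..9
    }
}
```

The ranges are disjoint and together cover every KeyGroup ID, which is what allows state to be redistributed by key group when the job is rescaled.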
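In summary, key groups split Keyed State into a fixed number of partitions, and each KeyGroup contains multiple KeyGroupEntry items. A KeyGroupRange is the contiguous range of KeyGroup IDs that a given task is responsible for. A KeyGroupEntry represents one serialized key-value state entry inside a key group.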
Abridged source code
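public class KeyGroupEntry {
    // ID of the registered state this entry belongs to
    private final int kvStateId;
    // Serialized key bytes
    private final byte[] key;
    // Serialized value bytes
    private final byte[] value;
    // ... (remaining members omitted)
}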
public class KeyGroup {
    private final int keyGroupId;
    private final ThrowingIterator<KeyGroupEntry> keyGroupEntries;

    KeyGroup(int keyGroupId, ThrowingIterator<KeyGroupEntry> keyGroupEntries) {
        this.keyGroupId = keyGroupId;
        this.keyGroupEntries = keyGroupEntries;
    }

    public int getKeyGroupId() {
        return keyGroupId;
    }

    public ThrowingIterator<KeyGroupEntry> getKeyGroupEntries() {
        return keyGroupEntries;
    }
}
public class KeyGroupRange implements KeyGroupsList, Serializable {
    private static final long serialVersionUID = 4869121477592070607L;

    /** The empty key-group */
    public static final KeyGroupRange EMPTY_KEY_GROUP_RANGE = new KeyGroupRange();

    // First key group ID in the range (inclusive)
    private final int startKeyGroup;
    // Last key group ID in the range (inclusive)
    private final int endKeyGroup;
    // ... (remaining members omitted)
}
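To make the inclusive start/end semantics of a key group range concrete, here is a small standalone stand-in. It is only a sketch: SimpleKeyGroupRange is a hypothetical class written for this note, not Flink's KeyGroupRange, and unlike the real class it does not model the empty range.

```java
final class SimpleKeyGroupRange {
    private final int startKeyGroup; // inclusive
    private final int endKeyGroup;   // inclusive

    SimpleKeyGroupRange(int startKeyGroup, int endKeyGroup) {
        if (startKeyGroup < 0 || endKeyGroup < startKeyGroup) {
            throw new IllegalArgumentException("invalid range: " + startKeyGroup + ".." + endKeyGroup);
        }
        this.startKeyGroup = startKeyGroup;
        this.endKeyGroup = endKeyGroup;
    }

    // True if the given key group ID lies inside this range.
    boolean contains(int keyGroup) {
        return keyGroup >= startKeyGroup && keyGroup <= endKeyGroup;
    }

    // Both bounds are inclusive, hence the +1.
    int getNumberOfKeyGroups() {
        return endKeyGroup - startKeyGroup + 1;
    }

    public static void main(String[] args) {
        SimpleKeyGroupRange range = new SimpleKeyGroupRange(4, 6);
        System.out.println(range.contains(4));            // true
        System.out.println(range.contains(7));            // false
        System.out.println(range.getNumberOfKeyGroups()); // 3
    }
}
```

A task can use such a membership check to decide whether a key group read from a snapshot belongs to it.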
Diagram:
+------------------------------------------------------------------+
| Keyed State |
| +----------------------------------------+ |
| | | |
| | KeyGroup 0 | |
| | +---------------------+ | |
| | | KeyGroupEntry 0 | | |
| | +---------------------+ | |
| | | KeyGroupEntry 1 | | |
| | +---------------------+ | |
| | | ... | | |
| | | ... | | |
| | | ... | | |
| | +---------------------+ | |
| | | |
| | KeyGroup 1 | |
| | +---------------------+ | |
| | | KeyGroupEntry 0 | | |
| | +---------------------+ | |
| | | KeyGroupEntry 1 | | |
| | +---------------------+ | |
| | | ... | | |
| | | ... | | |
| | | ... | | |
| | +---------------------+ | |
| | ... | |
| | | |
| | KeyGroup n-1 | |
| | +---------------------+ | |
| | | KeyGroupEntry 0 | | |
| | +---------------------+ | |
| | | KeyGroupEntry 1 | | |
| | +---------------------+ | |
| | | ... | | |
| | | ... | | |
| | | ... | | |
| | +---------------------+ | |
| | | |
| +----------------------------------------+ |
| |
| KeyGroup ID Range (0 - n-1) |
+------------------------------------------------------------------+
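In this diagram, Keyed State consists of n key groups, where n is the maximum parallelism.
Each KeyGroup is one partition of the Keyed State and contains multiple KeyGroupEntry items,
each holding one key-value pair. The KeyGroup ID Range (0 to n-1) is the full ID space;
each task handles a contiguous sub-range of it, described by a KeyGroupRange.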