Binary data structures
Source code walkthrough
org/apache/flink/table/dataformat/BinaryRow.java
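- Backed by MemorySegment, Flink's smallest unit of memory management, which greatly reduces serialization and deserialization overhead
- As the figure above shows, a binary row has two parts: a fixed-length part and a variable-length part
- Fixed-length part:
  - a one-byte header
  - the null bit set, which tracks which fields are null and is aligned to 8-byte word boundaries
  - field values, which hold primitive types and any variable-length values small enough to be stored within 8 bytes
  - for a variable-length value that does not fit, the field value slot instead stores the length and offset of the data in the variable-length part
  - always falls entirely within a single MemorySegment, which speeds up field reads and writes
  - because the fixed-length part has to fit in one segment, the number of fields in a single row is bounded by the capacity of one MemorySegment
- Variable-length part:
  - may be spread across multiple MemorySegments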
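The size arithmetic implied by this layout can be sketched in a few lines. This is a minimal illustration rather than Flink's API: the class `BinaryRowLayoutSketch` and its method names are hypothetical. It assumes, from my reading of the 1.9 source, that the one-byte header shares the first 8-byte word with the null bits, that each field gets one 8-byte slot, and that a slot pointing into the variable-length part keeps the offset in the high 32 bits and the length in the low 32 bits.

```java
/** Hypothetical helper that mirrors the layout arithmetic described above; not Flink code. */
final class BinaryRowLayoutSketch {
    // The one-byte header, counted in bits (assumed to share a word with the null bits).
    static final int HEADER_SIZE_IN_BITS = 8;

    // Header bits plus one null bit per field, rounded up to whole 8-byte words.
    static int nullBitsSizeInBytes(int arity) {
        return ((arity + 63 + HEADER_SIZE_IN_BITS) / 64) * 8;
    }

    // Fixed-length part: header and null bits, then one 8-byte slot per field.
    static int fixedPartSizeInBytes(int arity) {
        return nullBitsSizeInBytes(arity) + 8 * arity;
    }

    // Byte offset of the slot for field `pos`, relative to the start of the row.
    static int fieldOffset(int arity, int pos) {
        return nullBitsSizeInBytes(arity) + 8 * pos;
    }

    // Assumed encoding of a slot that refers to the variable-length part.
    static long packOffsetAndSize(int offset, int size) {
        return ((long) offset << 32) | (size & 0xFFFFFFFFL);
    }

    static int unpackOffset(long offsetAndSize) {
        return (int) (offsetAndSize >> 32);
    }

    static int unpackSize(long offsetAndSize) {
        return (int) offsetAndSize;
    }

    public static void main(String[] args) {
        int arity = 3;
        System.out.println("null bits bytes = " + nullBitsSizeInBytes(arity));
        System.out.println("fixed part bytes = " + fixedPartSizeInBytes(arity));
        long slot = packOffsetAndSize(fixedPartSizeInBytes(arity), 20);
        System.out.println("offset = " + unpackOffset(slot) + ", size = " + unpackSize(slot));
    }
}
```

For a 3-field row this gives 8 bytes for the header plus null bits and a 32-byte fixed-length part, so a variable-length value would start at offset 32. With 56 fields the header and null bits still fit in one word; the 57th field pushes the null bit set to 16 bytes.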
flink-release-1.9.0/flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/operators/aggregate
The following compares GroupAggFunction with MiniBatchGroupAggFunction.
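As orientation, the core difference can be modeled with a toy example. This is a simplified sketch, not Flink code: the class `MiniBatchSketch`, its fields, and the use of a plain `HashMap` in place of keyed state are all assumptions made for illustration. It relies on my understanding that the per-record function reads and writes the accumulator state once per element, while the mini-batch variant buffers the inputs per key and touches state once per key when the bundle is finished.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of per-record versus mini-batch aggregation (a SUM per key); not Flink code. */
final class MiniBatchSketch {
    // Stands in for the keyed accumulator state (hypothetical simplification).
    final Map<String, Long> state = new HashMap<>();
    // Counts every read or write of `state`, to make the cost difference visible.
    int stateAccesses = 0;
    // Inputs buffered per key until the bundle is finished.
    private final Map<String, List<Long>> bundle = new HashMap<>();

    // Per-record style: one state read and one state write for every element.
    void processElement(String key, long value) {
        Long acc = state.get(key);
        stateAccesses++;
        long updated = (acc == null ? 0L : acc) + value;
        state.put(key, updated);
        stateAccesses++;
    }

    // Mini-batch style, step 1: only buffer the input, no state access yet.
    void addInput(String key, long value) {
        bundle.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Mini-batch style, step 2: one state read and one write per distinct key.
    void finishBundle() {
        for (Map.Entry<String, List<Long>> entry : bundle.entrySet()) {
            Long acc = state.get(entry.getKey());
            stateAccesses++;
            long updated = (acc == null ? 0L : acc);
            for (long value : entry.getValue()) {
                updated += value;
            }
            state.put(entry.getKey(), updated);
            stateAccesses++;
        }
        bundle.clear();
    }
}
```

Feeding the four inputs (a, 1), (a, 2), (a, 3), (b, 10) through `processElement` costs eight state accesses; buffering the same inputs with `addInput` and calling `finishBundle` once costs four, and both paths end with the same sums. The saving grows with the number of rows per key in a bundle, at the price of added latency.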
GroupAggFunction
/**
 * Processes each element that passes through the stream.
 *
 * @param input The input row.
 * @param ctx A {@link Context} that allows querying the timestamp of the element and getting
 *     a {@link TimerService} for registering timers and querying the time. The
 *     context is only valid during the invocation of this method, do not store it.
 * @param out The collector for returning result values.
 *
 * @throws Exception if processing the element fails
 */
@Override
public void processElement(BaseRow input, Context ctx, Collector<BaseRow> out) throws Exception {
    long currentTime = ctx.timerService().currentProcessingTime();
    // register state-cleanup timer
    registerProcessingCleanupTimer(ctx, currentTime);
    BaseRow currentKey = ctx.getCurrentKey();
    boolean firstRow;
    // a null accumulator state means this is the first row seen for the current key
    BaseRow accumulators = accState.value();
    if (null == accumulators) {
        firstRow = true;
        accumulators = function.createAccumulators();
    } else {
        firstRow = false;
    }
    // set accumulators to handler first
    function.setAccumulators(accum