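1. Introduction
In Presto, functions such as sum, max, min, and count are implemented with the help of a dynamic code generator. The logic of each function is registered and managed in BuiltInFunctionNamespaceManager, but the code that the AggregationOperator actually runs is generated at runtime by AccumulatorCompiler. This article walks through the implementation of sum(long) to show how Presto generates aggregation code.
2. The logic of the sum function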
@AggregationFunction("sum")
public final class LongSumAggregation
{
    private LongSumAggregation() {}

    // Called once for every value of the column being summed; value is the column value.
    // This function accumulates the running sum into the state.
    @InputFunction
    public static void sum(@AggregationState NullableLongState state, @SqlType(StandardTypes.BIGINT) long value)
    {
        state.setNull(false);
        state.setLong(BigintOperators.add(state.getLong(), value));
    }

    // Called by the final aggregation to merge the states of the partial aggregations
    // (the states built up by the sum function above).
    @CombineFunction
    public static void combine(@AggregationState NullableLongState state, @AggregationState NullableLongState otherState)
    {
        if (state.isNull()) {
            state.setNull(false);
            state.setLong(otherState.getLong());
            return;
        }
        state.setLong(BigintOperators.add(state.getLong(), otherState.getLong()));
    }

    // Produces the final output: writes the result into a Block of the page passed downstream.
    @OutputFunction(StandardTypes.BIGINT)
    public static void output(@AggregationState NullableLongState state, BlockBuilder out)
    {
        NullableLongState.write(BigintType.BIGINT, state, out);
    }
}
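How the three annotated functions cooperate can be tried out outside Presto. The following is a minimal sketch, not Presto code: SimpleNullableLongState and SumSketch are hypothetical stand-ins for NullableLongState and LongSumAggregation, Math.addExact is assumed as a substitute for BigintOperators.add, and skipping empty partial states before combining is an assumption of this sketch.

```java
import java.util.List;

// Hypothetical, simplified stand-in for Presto's NullableLongState.
final class SimpleNullableLongState {
    private boolean isNull = true; // true until the first input arrives
    private long value;

    boolean isNull() { return isNull; }
    void setNull(boolean isNull) { this.isNull = isNull; }
    long getLong() { return value; }
    void setLong(long value) { this.value = value; }
}

// Mirrors the shape of LongSumAggregation; Math.addExact stands in for
// BigintOperators.add (an assumption of this sketch).
final class SumSketch {
    private SumSketch() {}

    // Counterpart of the @InputFunction: fold one raw value into the state.
    static void input(SimpleNullableLongState state, long value) {
        state.setNull(false);
        state.setLong(Math.addExact(state.getLong(), value));
    }

    // Counterpart of the @CombineFunction: merge another state into this one.
    static void combine(SimpleNullableLongState state, SimpleNullableLongState other) {
        if (state.isNull()) {
            state.setNull(false);
            state.setLong(other.getLong());
            return;
        }
        state.setLong(Math.addExact(state.getLong(), other.getLong()));
    }

    // Counterpart of the @OutputFunction: null means "no rows were summed".
    static Long output(SimpleNullableLongState state) {
        return state.isNull() ? null : state.getLong();
    }

    // Partial aggregation: fold the values of one split into a fresh state.
    static SimpleNullableLongState partial(long... values) {
        SimpleNullableLongState state = new SimpleNullableLongState();
        for (long value : values) {
            input(state, value);
        }
        return state;
    }

    // Final aggregation: merge the partial states, skipping empty ones.
    static Long finalSum(List<SimpleNullableLongState> partials) {
        SimpleNullableLongState result = new SimpleNullableLongState();
        for (SimpleNullableLongState partial : partials) {
            if (!partial.isNull()) {
                combine(result, partial);
            }
        }
        return output(result);
    }

    public static void main(String[] args) {
        Long total = finalSum(List.of(partial(1, 2, 3), partial(), partial(10, 20)));
        System.out.println(total); // prints 36
    }
}
```

An empty partial state stays null, so a query over zero rows yields NULL rather than 0 in this sketch.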
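3. Dynamically generating the aggregation operator
The code generator lives in AccumulatorCompiler::generateAccumulatorClass, which emits the following bytecode fragments:
3.1 generateConstructor generates the constructor. The generated code looks roughly like this: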
public final class io.prestosql.$gen.BigintBigintSumGroupedAccumulator_20220901_221627_11 implements io.prestosql.operator.aggregation.GroupedAccumulator
{
    private final io.prestosql.spi.function.AccumulatorStateSerializer stateSerializer_0;
    private final io.prestosql.spi.function.AccumulatorStateFactory stateFactory_0;
    private final io.prestosql.$gen.GroupedNullableLongState_20220901_221626_6 state_0;
    private final List<Integer> inputChannels;
    private final Optional<Integer> maskChannel;

    public BigintBigintSumGroupedAccumulator_20220901_221627_11(List<AccumulatorStateDescriptor> stateDescriptors, List<Integer> inputChannels,
            Optional<Integer> maskChannel, List<LambdaProvider> lambdaProviders)
    {
        super();
        this.stateSerializer_0 = stateDescriptors.get(0).getSerializer();
        this.stateFactory_0 = stateDescriptors.get(0).getFactory();
        this.inputChannels = inputChannels;
        this.maskChannel = maskChannel;
        this.state_0 = this.stateFactory_0.createGroupedState();
    }
}
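The constructor above obtains its state from createGroupedState(). One way to picture such a grouped state is a pair of arrays indexed by group id, with a cursor that selects the current group. The sketch below works under that assumption only: GroupedLongStateSketch and GroupedSumDemo are hypothetical names, and nothing here is claimed to match the generated GroupedNullableLongState class.

```java
import java.util.Arrays;

// Hypothetical sketch of a grouped state: one slot per group id.
final class GroupedLongStateSketch {
    private long[] values = new long[4];
    private boolean[] nonNull = new boolean[4]; // false means "still null"
    private int groupId;

    // Grow the backing arrays so that every group id below size is addressable.
    void ensureCapacity(int size) {
        if (size > values.length) {
            int newSize = Math.max(size, values.length * 2);
            values = Arrays.copyOf(values, newSize);
            nonNull = Arrays.copyOf(nonNull, newSize);
        }
    }

    // Select the group that the following getter and setter calls refer to.
    void setGroupId(int groupId) { this.groupId = groupId; }

    boolean isNull() { return !nonNull[groupId]; }
    void setNull(boolean isNull) { nonNull[groupId] = !isNull; }
    long getLong() { return values[groupId]; }
    void setLong(long value) { values[groupId] = value; }
}

final class GroupedSumDemo {
    private GroupedSumDemo() {}

    // Sum values per group; groupIds[i] is the group that values[i] belongs to.
    static GroupedLongStateSketch aggregate(int[] groupIds, long[] values, int groupCount) {
        GroupedLongStateSketch state = new GroupedLongStateSketch();
        state.ensureCapacity(groupCount);
        for (int i = 0; i < values.length; i++) {
            state.setGroupId(groupIds[i]);
            state.setNull(false);
            state.setLong(Math.addExact(state.getLong(), values[i]));
        }
        return state;
    }

    public static void main(String[] args) {
        GroupedLongStateSketch state = aggregate(new int[] {0, 1, 0, 5}, new long[] {10, 20, 30, 7}, 6);
        for (int group = 0; group < 6; group++) {
            state.setGroupId(group);
            System.out.println(group + " -> " + (state.isNull() ? "null" : String.valueOf(state.getLong())));
        }
    }
}
```

Because the input function only ever talks to the currently selected slot, the same sum logic can serve both the single-state and the grouped case.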
3.2 generateAddInput generates the addInput function. The generated code looks roughly like this:
public void addInput(Page page)
{
    Block masksBlock = maskChannel.map(AggregationUtils.pageBlockGetter(page)).orElse(null);
    Block block0 = page.getBlock(inputChannels.get(0));
    int rows = page.getPositionCount();
    for (int i = 0; i < rows; i++) {
        // mask and null checks are omitted here for brevity
        long value = BIGINT.getLong(block0, i);
        InputFunction(state_0, value); // InputFunction here is the sum method of LongSumAggregation shown above
    }
}
3.3 generateAddInputWindowIndex generates the addInput overload used by window functions. The generated code looks roughly like this:
public void addInput(io.prestosql.spi.function.WindowIndex index, java.util.List<Integer> channels, int startPosition, int endPosition)
{
    for (int i = startPosition; i < endPosition; i++) {
        // read row i of the window partition and pass it to the input function
        InputFunction(state_0, index.getLong(channels.get(0), i));
    }
}
3.4 generateAddIntermediateAsCombine generates the addIntermediate function. Its implementation looks roughly like this:
public void addIntermediate(io.prestosql.operator.GroupByIdBlock groupIdsBlock, io.prestosql.spi.block.Block block)
{
    NullableLongState scratchState_0 = stateFactory_0.createSingleState();
    int rows = block.getPositionCount();
    for (int i = 0; i < rows; i++) {
        stateSerializer_0.deserialize(block, i, scratchState_0);
        CombineFunction(state_0, scratchState_0); // CombineFunction here is the combine method of LongSumAggregation shown above
    }
}
3.5 generateEvaluateFinal generates the evaluateFinal function. Its implementation looks roughly like this:
public void evaluateFinal(io.prestosql.spi.block.BlockBuilder out)
{
    OutputFunction(state_0, out); // OutputFunction here is the output method of LongSumAggregation shown above
}
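At this point, the implementation class of the aggregation operator has been generated dynamically. The generated class is then used by Aggregator.
4. Using the dynamically generated operator to perform the aggregation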
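The core idea of section 3, locating the annotated static methods of a function definition and wiring them into an accumulator at runtime, can be imitated without emitting bytecode. The sketch below deliberately swaps in a different technique: it uses reflection and java.lang.invoke.MethodHandle instead of a bytecode generator. InputFn, OutputFn, LongBox, SumFunctions, and BoundAccumulator are hypothetical names that are not part of Presto.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;

// Hypothetical marker annotations, standing in for @InputFunction and @OutputFunction.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface InputFn {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface OutputFn {}

// Hypothetical state holder.
final class LongBox {
    long value;
}

// The function definition: plain static methods tagged with the markers.
final class SumFunctions {
    private SumFunctions() {}

    @InputFn
    static void sum(LongBox state, long value) {
        state.value = Math.addExact(state.value, value);
    }

    @OutputFn
    static long output(LongBox state) {
        return state.value;
    }
}

// Binds the annotated methods once, then calls them through method handles.
final class BoundAccumulator {
    private final MethodHandle input;
    private final MethodHandle output;
    private final LongBox state = new LongBox();

    BoundAccumulator(Class<?> definition) {
        MethodHandle in = null;
        MethodHandle out = null;
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            for (Method method : definition.getDeclaredMethods()) {
                if (method.isAnnotationPresent(InputFn.class)) {
                    in = lookup.unreflect(method);
                }
                if (method.isAnnotationPresent(OutputFn.class)) {
                    out = lookup.unreflect(method);
                }
            }
        }
        catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        if (in == null || out == null) {
            throw new IllegalArgumentException("missing annotated method");
        }
        this.input = in;
        this.output = out;
    }

    // Calls the bound input function once per value of the column.
    void addInput(long[] column) {
        try {
            for (long value : column) {
                input.invoke(state, value);
            }
        }
        catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Calls the bound output function on the accumulated state.
    long evaluateFinal() {
        try {
            return (long) output.invoke(state);
        }
        catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        BoundAccumulator accumulator = new BoundAccumulator(SumFunctions.class);
        accumulator.addInput(new long[] {1, 2, 3});
        System.out.println(accumulator.evaluateFinal()); // prints 6
    }
}
```

A generated class avoids the indirection of a handle call by invoking the static methods directly, which is presumably one motivation for compiling a dedicated accumulator per function.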
public void processPage(Page page)
{
    if (step.isInputRaw()) {
        aggregation.addInput(page); // calls the addInput function generated in 3.2
    }
    else {
        aggregation.addIntermediate(page.getBlock(intermediateChannel));
    }
}

public void evaluate(BlockBuilder blockBuilder)
{
    if (step.isOutputPartial()) {
        aggregation.evaluateIntermediate(blockBuilder);
    }
    else {
        aggregation.evaluateFinal(blockBuilder); // calls the evaluateFinal function generated in 3.5
    }
}
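The step-based dispatch above can be exercised end to end with a small model. This is a sketch under stated assumptions rather than Presto code: StepSketch, PartialSum, SumAccumulatorSketch, and AggregatorSketch are hypothetical, the flag values given to each step are assumptions, and a "page" is reduced to either a long[] of raw values or a single PartialSum.

```java
// Hypothetical model of the aggregation step and its two flags.
enum StepSketch {
    PARTIAL(true, true),
    INTERMEDIATE(false, true),
    FINAL(false, false),
    SINGLE(true, false);

    private final boolean inputRaw;
    private final boolean outputPartial;

    StepSketch(boolean inputRaw, boolean outputPartial) {
        this.inputRaw = inputRaw;
        this.outputPartial = outputPartial;
    }

    boolean isInputRaw() { return inputRaw; }
    boolean isOutputPartial() { return outputPartial; }
}

// Hypothetical intermediate format: a nullable running sum.
final class PartialSum {
    final boolean isNull;
    final long value;

    PartialSum(boolean isNull, long value) {
        this.isNull = isNull;
        this.value = value;
    }
}

// Hand-written counterpart of the generated accumulator.
final class SumAccumulatorSketch {
    private boolean isNull = true;
    private long sum;

    void addInput(long[] values) {
        for (long value : values) {
            isNull = false;
            sum = Math.addExact(sum, value);
        }
    }

    void addIntermediate(PartialSum partial) {
        if (partial.isNull) {
            return; // an empty partial state contributes nothing
        }
        isNull = false;
        sum = Math.addExact(sum, partial.value);
    }

    PartialSum evaluateIntermediate() { return new PartialSum(isNull, sum); }

    Long evaluateFinal() { return isNull ? null : sum; }
}

final class AggregatorSketch {
    private final StepSketch step;
    private final SumAccumulatorSketch aggregation = new SumAccumulatorSketch();

    AggregatorSketch(StepSketch step) { this.step = step; }

    // The page is raw values (long[]) or a PartialSum, depending on the step.
    void processPage(Object page) {
        if (step.isInputRaw()) {
            aggregation.addInput((long[]) page);
        }
        else {
            aggregation.addIntermediate((PartialSum) page);
        }
    }

    Object evaluate() {
        return step.isOutputPartial() ? aggregation.evaluateIntermediate() : aggregation.evaluateFinal();
    }

    public static void main(String[] args) {
        AggregatorSketch first = new AggregatorSketch(StepSketch.PARTIAL);
        AggregatorSketch second = new AggregatorSketch(StepSketch.PARTIAL);
        first.processPage(new long[] {1, 2, 3});
        second.processPage(new long[] {4, 5});

        AggregatorSketch last = new AggregatorSketch(StepSketch.FINAL);
        last.processPage(first.evaluate());
        last.processPage(second.evaluate());
        System.out.println(last.evaluate()); // prints 15
    }
}
```

In this model the output of a PARTIAL aggregator is exactly what a FINAL aggregator accepts as input, which is why the two flags are enough to chain the stages.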
This completes one full aggregation.