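Presto does support spilling to disk when memory runs short during computation. However, it is designed above all as a fast, purely in-memory engine for short interactive queries, so this feature has received comparatively little attention. Only a few operators support spilling, and the implementation is less complete than in MapReduce-style engines such as Hive and Spark. As a result, heavy SQL queries on Presto frequently fail with out-of-memory errors.
In Presto, spilling happens on the workers, inside the operators during the computation phase. The operators that support spilling to disk are: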
1. HashAggregationOperator
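The spill-to-disk logic of HashAggregationOperator is in:
public void addInput(Page page)
{
    // ...
    // Partial aggregation steps, disabled spilling, and aggregations with ORDER BY or DISTINCT
    // always use the purely in-memory builder.
    if (step.isOutputPartial() || !spillEnabled || hasOrderBy() || hasDistinct()) {
        aggregationBuilder = new InMemoryHashAggregationBuilder(...);
    }
    else {
        // Otherwise use the builder that can spill its state to disk under memory pressure.
        aggregationBuilder = new SpillableHashAggregationBuilder(...);
    }
    // ...
    // Report the builder's current memory usage.
    aggregationBuilder.updateMemory();
}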
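One way to picture what a spillable aggregation builder does is sort-and-merge: keep groups in memory until a limit is reached, write them out as a sorted run, and at the end merge all runs while combining partial results for the same group. The Java sketch below assumes a single SUM(value) GROUP BY key over long values. The class SpillableSumAggregation, the maxGroupsInMemory limit and the in-memory list standing in for spill files are all hypothetical. Presto's SpillableHashAggregationBuilder works on pages and its spill ordering may differ, so treat this as an illustration of the technique rather than its code. The canSpill helper simply restates the condition from addInput above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical sketch (not Presto code): SUM(value) GROUP BY key with bounded in-memory groups.
// "Disk" is simulated by an in-memory list of sorted runs.
class SpillableSumAggregation
{
    private final int maxGroupsInMemory;
    // TreeMap keeps groups ordered by key, so every spilled run is already sorted.
    private TreeMap<Long, Long> inMemory = new TreeMap<>();
    // Stand-in for spill files: each run is a sorted list of {key, partialSum}.
    private final List<List<long[]>> spilledRuns = new ArrayList<>();

    SpillableSumAggregation(int maxGroupsInMemory)
    {
        this.maxGroupsInMemory = maxGroupsInMemory;
    }

    // Same shape as the condition in addInput: only a non-partial step with spill enabled and
    // no ORDER BY / DISTINCT aggregations may use a spillable builder.
    static boolean canSpill(boolean outputPartial, boolean spillEnabled, boolean hasOrderBy, boolean hasDistinct)
    {
        return !(outputPartial || !spillEnabled || hasOrderBy || hasDistinct);
    }

    void addInput(long key, long value)
    {
        inMemory.merge(key, value, Long::sum);
        if (inMemory.size() > maxGroupsInMemory) {
            spill();
        }
    }

    // Writes all current groups out as one sorted run and releases the in-memory state.
    private void spill()
    {
        List<long[]> run = new ArrayList<>();
        for (Map.Entry<Long, Long> e : inMemory.entrySet()) {
            run.add(new long[] {e.getKey(), e.getValue()});
        }
        spilledRuns.add(run);
        inMemory = new TreeMap<>();
    }

    int spillCount()
    {
        return spilledRuns.size();
    }

    // Merges the sorted runs (after flushing what is left in memory) and adds up
    // partial sums that belong to the same key.
    List<long[]> getOutput()
    {
        if (!inMemory.isEmpty()) {
            spill();
        }
        // Heap entries: {key, partialSum, runIndex, positionInRun}, smallest key first.
        PriorityQueue<long[]> heap = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        for (int r = 0; r < spilledRuns.size(); r++) {
            long[] first = spilledRuns.get(r).get(0);
            heap.add(new long[] {first[0], first[1], r, 0});
        }
        List<long[]> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            long[] top = heap.poll();
            long[] last = result.isEmpty() ? null : result.get(result.size() - 1);
            if (last != null && last[0] == top[0]) {
                last[1] += top[1];  // same group appeared in another run
            }
            else {
                result.add(new long[] {top[0], top[1]});
            }
            int runIndex = (int) top[2];
            int next = (int) top[3] + 1;
            List<long[]> run = spilledRuns.get(runIndex);
            if (next < run.size()) {
                heap.add(new long[] {run.get(next)[0], run.get(next)[1], runIndex, next});
            }
        }
        return result;
    }
}
```

For example, with maxGroupsInMemory = 2, feeding (3,1), (1,10), (2,5), (3,4), (1,1), (4,7) spills two runs, and getOutput returns the groups 1→11, 2→5, 3→5, 4→7 in key order. Sorting each run is what lets the final step merge runs in a single streaming pass instead of reloading every group at once.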
2. OrderByOperator
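The spill-to-disk logic of OrderByOperator is in:
public void finish()
{
    // ...
    if (state == State.NEEDS_INPUT) {
        // ...
        // Spill the rows currently held in memory to disk and wait for the spill to complete.
        getFutureValue(spillToDisk());
        // Mark the memory revocation as finished.
        finishMemoryRevoke.run();
        // ...
    }
    // Sort the rows that remain in memory by the ORDER BY keys.
    pageIndex.sort(sortChannels, sortOrder);
    // ...
}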
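The snippet shows two halves of an external sort: rows are spilled when memory has to be released, and at finish the rows still in memory are sorted. To produce globally ordered output, a spilling sort also has to k-way merge the in-memory rows with every spilled run. The hypothetical ExternalSorter below sketches that whole pattern; the maxRowsInMemory row limit replaces Presto's memory accounting, and an in-memory list stands in for the spill files. It is an illustration, not OrderByOperator's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of an external merge sort for ORDER BY (not Presto's OrderByOperator).
// Spilled runs are kept in an in-memory list that stands in for spill files.
class ExternalSorter
{
    private final int maxRowsInMemory;
    private final List<Integer> buffer = new ArrayList<>();
    private final List<List<Integer>> spilledRuns = new ArrayList<>();

    ExternalSorter(int maxRowsInMemory)
    {
        this.maxRowsInMemory = maxRowsInMemory;
    }

    void addInput(int value)
    {
        buffer.add(value);
        if (buffer.size() >= maxRowsInMemory) {
            spillToDisk();
        }
    }

    // Sorts the buffered rows, "writes" them as one sorted run and frees the buffer.
    private void spillToDisk()
    {
        List<Integer> run = new ArrayList<>(buffer);
        Collections.sort(run);
        spilledRuns.add(run);
        buffer.clear();
    }

    int spillCount()
    {
        return spilledRuns.size();
    }

    // Called once all input has arrived: sorts the rows still in memory, then k-way merges
    // them with every spilled run.
    List<Integer> finish()
    {
        List<List<Integer>> runs = new ArrayList<>(spilledRuns);
        List<Integer> inMemory = new ArrayList<>(buffer);
        Collections.sort(inMemory);
        runs.add(inMemory);

        // Heap entries: {value, runIndex, positionInRun}, smallest value first.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> sorted = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            sorted.add(top[0]);
            List<Integer> run = runs.get(top[1]);
            int next = top[2] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next), top[1], next});
            }
        }
        return sorted;
    }
}
```

With maxRowsInMemory = 3 and input 5, 1, 4, 2, 8, 7, 3, two runs are spilled ([1, 4, 5] and [2, 7, 8]), the value 3 stays in memory, and finish returns 1, 2, 3, 4, 5, 7, 8.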
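3. LookupJoinOperator (covers join types such as INNER JOIN, OUTER JOIN and FULL JOIN)
The spill-to-disk logic of LookupJoinOperator is in:
private void addInput(Page page, SpillInfoSnapshot spillInfoSnapshot)
{
    // ...
    if (spillInfoSnapshot.hasSpilled()) {
        // Probe rows that belong to spilled partitions are spilled too and masked out of the page.
        page = spillAndMaskSpilledPositions(page, spillInfoSnapshot.getSpillMask());
        if (page.getPositionCount() == 0) {
            // Every row was spilled; nothing is left to probe right now.
            return;
        }
    }
    // ...
}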
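The idea behind spillAndMaskSpilledPositions can be pictured with hash partitioning: the build side is split into partitions, some of which have been spilled. A probe row whose partition is on disk cannot be joined yet, so it is set aside (spilled) and masked out of the page, while the remaining rows are joined immediately. The sketch below assumes a partition function of floorMod(hash, partitionCount) and models a page as a list of join keys; PartitionedProbeSpiller, spillAndMask, spilledRowsFor and innerJoin are hypothetical names, not Presto's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of probe-side masking in a hash-partitioned, spill-capable join.
// Names are illustrative; this is not Presto's LookupJoinOperator API.
class PartitionedProbeSpiller
{
    private final int partitionCount;
    // Build-side partitions that have already been spilled to disk.
    private final Set<Integer> spilledPartitions;
    // Stand-in for probe-side spill files, one list of join keys per spilled partition.
    private final Map<Integer, List<Long>> spilledProbeRows = new HashMap<>();

    PartitionedProbeSpiller(int partitionCount, Set<Integer> spilledPartitions)
    {
        this.partitionCount = partitionCount;
        this.spilledPartitions = spilledPartitions;
    }

    int partitionOf(long joinKey)
    {
        return Math.floorMod(Long.hashCode(joinKey), partitionCount);
    }

    // Probe rows that fall into a spilled partition are set aside and masked out;
    // only the remaining rows are returned for joining right away.
    List<Long> spillAndMask(List<Long> probeKeys)
    {
        List<Long> kept = new ArrayList<>();
        for (long key : probeKeys) {
            int partition = partitionOf(key);
            if (spilledPartitions.contains(partition)) {
                spilledProbeRows.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
            }
            else {
                kept.add(key);
            }
        }
        return kept;
    }

    // Probe rows waiting for the given partition to be read back from disk.
    List<Long> spilledRowsFor(int partition)
    {
        return spilledProbeRows.getOrDefault(partition, List.of());
    }

    // Inner join of probe keys against a build-side hash table (join key -> build value).
    static List<String> innerJoin(List<Long> probeKeys, Map<Long, String> buildTable)
    {
        List<String> out = new ArrayList<>();
        for (long key : probeKeys) {
            String match = buildTable.get(key);
            if (match != null) {
                out.add(key + ":" + match);
            }
        }
        return out;
    }
}
```

With 4 partitions and partition 1 spilled, probing keys 1, 2, 5, 6, 8 keeps 2, 6, 8 for immediate joining and sets aside 1 and 5. Once partition 1's build rows are read back, joining spilledRowsFor(1) against them completes the matches that were deferred.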
4. WindowOperator
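The spill-to-disk logic of WindowOperator is in its constructor:
public WindowOperator(...)
{
    // ...
    if (spillEnabled) {
        // ...
        // Spill-capable path: pages go through spillablePagesToPagesIndexes.
        this.outputPages = pageBuffer.pages()
                .flatTransform(spillablePagesToPagesIndexes.get());
    }
    else {
        // In-memory path: pages are turned into PagesIndexes without any spilling.
        this.outputPages = pageBuffer.pages()
                .transform(new PagesToPagesIndexes(inMemoryPagesIndexWithHashStrategies, orderChannels, ordering));
    }
    // ...
}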
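The two branches above differ in transform versus flatTransform. A plausible reading, stated here as an assumption rather than a description of Presto's code, is that on the spill path the buffered rows are sorted and spilled in runs, merged back, and then cut into groups of one or more window partitions, so one input stream yields many outputs, which is what a flat (one-to-many) transformation expresses. The hypothetical SpillableWindowSketch below shows that idea for ROW_NUMBER() OVER (PARTITION BY p ORDER BY o), with rows modeled as int pairs and an in-memory list standing in for spill files.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch (not Presto code): ROW_NUMBER() OVER (PARTITION BY p ORDER BY o)
// over input that is spilled as sorted runs. Rows are {partitionKey, orderKey}; the list of
// runs stands in for spill files.
class SpillableWindowSketch
{
    // Sort by partition key first, then by the ORDER BY key inside the partition.
    private static final Comparator<int[]> BY_PARTITION_THEN_ORDER =
            Comparator.<int[]>comparingInt(r -> r[0]).thenComparingInt(r -> r[1]);

    private final int maxRowsInMemory;
    private final List<int[]> buffer = new ArrayList<>();
    private final List<List<int[]>> spilledRuns = new ArrayList<>();

    SpillableWindowSketch(int maxRowsInMemory)
    {
        this.maxRowsInMemory = maxRowsInMemory;
    }

    void addInput(int partitionKey, int orderKey)
    {
        buffer.add(new int[] {partitionKey, orderKey});
        if (buffer.size() >= maxRowsInMemory) {
            spill();
        }
    }

    // Sorts the buffered rows by (partition, order) and "writes" them as one run.
    private void spill()
    {
        List<int[]> run = new ArrayList<>(buffer);
        run.sort(BY_PARTITION_THEN_ORDER);
        spilledRuns.add(run);
        buffer.clear();
    }

    int spillCount()
    {
        return spilledRuns.size();
    }

    // Merges all runs in (partition, order) order and cuts the merged stream at partition
    // boundaries; each group holds {partitionKey, orderKey, rowNumber} rows.
    List<List<int[]>> outputPartitions()
    {
        if (!buffer.isEmpty()) {
            spill();
        }
        // Heap entries: {partitionKey, orderKey, runIndex, positionInRun}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(BY_PARTITION_THEN_ORDER);
        for (int r = 0; r < spilledRuns.size(); r++) {
            int[] first = spilledRuns.get(r).get(0);
            heap.add(new int[] {first[0], first[1], r, 0});
        }
        List<List<int[]>> partitions = new ArrayList<>();
        List<int[]> current = null;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            if (current == null || current.get(0)[0] != top[0]) {
                current = new ArrayList<>();  // partition boundary: start a new group
                partitions.add(current);
            }
            // ROW_NUMBER restarts at 1 in every partition.
            current.add(new int[] {top[0], top[1], current.size() + 1});
            List<int[]> run = spilledRuns.get(top[2]);
            int next = top[3] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next)[0], run.get(next)[1], top[2], next});
            }
        }
        return partitions;
    }
}
```

Feeding (2,5), (1,3), (2,1), (1,1), (3,9), (1,2) with maxRowsInMemory = 3 spills two runs and yields three partition groups: partition 1 with row numbers 1 to 3 for order keys 1, 2, 3; partition 2 with order keys 1 and 5; partition 3 with order key 9. Because the merged stream arrives already grouped by partition, only one partition at a time needs to be materialized to evaluate the window function.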