Presto Memory Management

codealy

Tags: java, database

Article link: https://blog.csdn.net/yueny/article/details/131417463


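Presto is a distributed query engine. It does not store data itself; instead, it connects to multiple data sources and supports queries that span those sources.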

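Presto's architecture consists of the following components:

Coordinator: parses SQL statements, generates execution plans, and dispatches tasks to Worker nodes.
Discovery Server: each Worker registers with the Discovery Server on startup, and the Coordinator obtains the list of healthy Workers from it.
Worker: executes the actual query tasks and accesses the underlying storage systems.
Storage: the data Presto queries can reside in HDFS or OBS; hot data is best kept in HDFS and cold data in OBS.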
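
The interaction between the three roles above can be illustrated with a toy model. This is only a sketch: the class names, the `register` and `assignTasks` methods, and the round-robin assignment are all assumptions made for illustration, and Presto's real scheduler is considerably more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the roles listed above; none of these classes are Presto APIs.
final class ClusterSketch {
    // Discovery Server: keeps the set of workers that have registered.
    static final class DiscoveryServer {
        private final List<String> activeWorkers = new ArrayList<>();

        void register(String workerId) {
            if (!activeWorkers.contains(workerId)) {
                activeWorkers.add(workerId);
            }
        }

        List<String> getActiveWorkers() {
            return new ArrayList<>(activeWorkers);
        }
    }

    // Coordinator: asks discovery for live workers and spreads tasks round-robin
    // (round-robin is an assumption chosen for simplicity).
    static Map<String, List<String>> assignTasks(DiscoveryServer discovery, List<String> tasks) {
        List<String> workers = discovery.getActiveWorkers();
        if (workers.isEmpty()) {
            throw new IllegalStateException("no active workers registered");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String worker : workers) {
            plan.put(worker, new ArrayList<>());
        }
        for (int i = 0; i < tasks.size(); i++) {
            plan.get(workers.get(i % workers.size())).add(tasks.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        DiscoveryServer discovery = new DiscoveryServer();
        discovery.register("worker-1");
        discovery.register("worker-2");
        System.out.println(assignTasks(discovery, List.of("scan", "join", "aggregate")));
    }
}
```

The point of the sketch is the direction of the calls: Workers push their presence to the Discovery Server, while the Coordinator only reads from it before dispatching work.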

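Memory management

Presto distinguishes only two kinds of memory: user memory and system memory.
System memory is used for input/output/exchange buffers and similar structures; user memory is used by operations such as hash joins and aggregations.

Presto manages both user memory and system memory through memory pools, which avoids the performance cost of repeatedly allocating and releasing memory.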
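
To make the pool idea concrete, here is a minimal bookkeeping sketch. It assumes a single fixed-capacity pool that tracks user and system reservations separately; `SimpleMemoryPool` and its methods are hypothetical names and do not reproduce Presto's `MemoryPool` class.

```java
// Hypothetical fixed-capacity pool that only does bookkeeping; not Presto's MemoryPool.
final class SimpleMemoryPool {
    private final long maxBytes;
    private long userBytes;    // e.g. hash join and aggregation state
    private long systemBytes;  // e.g. input/output/exchange buffers

    SimpleMemoryPool(long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        this.maxBytes = maxBytes;
    }

    // Reserve user memory; returns false instead of overflowing the pool.
    synchronized boolean tryReserveUser(long bytes) {
        if (!fits(bytes)) {
            return false;
        }
        userBytes += bytes;
        return true;
    }

    // Reserve system memory; returns false instead of overflowing the pool.
    synchronized boolean tryReserveSystem(long bytes) {
        if (!fits(bytes)) {
            return false;
        }
        systemBytes += bytes;
        return true;
    }

    synchronized void freeUser(long bytes) {
        if (bytes < 0 || bytes > userBytes) {
            throw new IllegalArgumentException("cannot free " + bytes + " user bytes");
        }
        userBytes -= bytes;
    }

    synchronized void freeSystem(long bytes) {
        if (bytes < 0 || bytes > systemBytes) {
            throw new IllegalArgumentException("cannot free " + bytes + " system bytes");
        }
        systemBytes -= bytes;
    }

    synchronized long getFreeBytes() {
        return maxBytes - userBytes - systemBytes;
    }

    private boolean fits(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        return bytes <= getFreeBytes();
    }

    public static void main(String[] args) {
        SimpleMemoryPool pool = new SimpleMemoryPool(100);
        System.out.println(pool.tryReserveUser(60));   // fits
        System.out.println(pool.tryReserveSystem(50)); // would exceed capacity
        System.out.println(pool.getFreeBytes());
    }
}
```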

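Memory pool types

<= v0.200

Before v0.201, Presto had three memory pools: GENERAL_POOL, RESERVED_POOL, and SYSTEM_POOL.

GENERAL_POOL: used by the physical operators of ordinary queries. Its size is total memory (the Xmx value) - the reserved pool (query.max-memory-per-node) - the system pool (0.4 * Xmx).

SYSTEM_POOL: memory reserved for the system, used for read/write buffers, worker initialization, and the memory required to run tasks. Its size is set by resources.reserved-system-memory in config.properties; the default is JVM max memory * 0.4.

RESERVED_POOL: does not take part in computation most of the time. Only when GENERAL_POOL is full is a query moved here: the query using the most memory among all running queries is selected and then runs in RESERVED_POOL. Note that RESERVED_POOL can serve only one query at a time. Its size is set by query.max-memory-per-node in config.properties; the default is JVM max memory * 0.1.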
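
As a worked example of the sizing rules above, the following sketch computes the three legacy pool sizes for a given heap size. It assumes the default ratios quoted in the text (0.4 of the heap for SYSTEM_POOL, 0.1 for RESERVED_POOL); the class and method names are illustrative only and are not part of Presto.

```java
// Worked arithmetic for the pre-v0.201 pool layout, using the default ratios from the text.
final class LegacyPoolSizes {
    // SYSTEM_POOL default: 0.4 * Xmx (integer arithmetic avoids rounding noise).
    static long systemPoolBytes(long xmxBytes) {
        return xmxBytes * 4 / 10;
    }

    // RESERVED_POOL default: 0.1 * Xmx.
    static long reservedPoolBytes(long xmxBytes) {
        return xmxBytes / 10;
    }

    // GENERAL_POOL: whatever is left after the other two pools are carved out.
    static long generalPoolBytes(long xmxBytes) {
        return xmxBytes - reservedPoolBytes(xmxBytes) - systemPoolBytes(xmxBytes);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long xmx = 10 * gib;
        System.out.println("SYSTEM_POOL   = " + systemPoolBytes(xmx) / gib + " GiB");
        System.out.println("RESERVED_POOL = " + reservedPoolBytes(xmx) / gib + " GiB");
        System.out.println("GENERAL_POOL  = " + generalPoolBytes(xmx) / gib + " GiB");
    }
}
```

With a 10 GiB heap and the defaults, this gives 4 GiB for SYSTEM_POOL, 1 GiB for RESERVED_POOL, and 5 GiB for GENERAL_POOL, so only half of the heap is available to ordinary queries.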

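In practice, however, RESERVED_POOL is rarely used, for two reasons: 1. Workloads typically use Presto to speed up SQL and do not enable the spill-to-disk feature. 2. For service stability, operators cap the maximum memory and configure a kill policy, so a query is usually killed by the system before it is ever assigned to RESERVED_POOL.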
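
The promotion rule for RESERVED_POOL (when GENERAL_POOL is full, move the single largest query, and only if the reserved pool is free) can be sketched as follows. This is a simplified illustration under stated assumptions: the method name, the map of per-query usage, and the "free bytes is zero" test for fullness are all hypothetical and do not mirror Presto's cluster memory manager.

```java
import java.util.Map;
import java.util.Optional;

// Simplified selection rule; names and inputs are illustrative, not Presto APIs.
final class ReservedPoolPromotion {
    // Returns the query that should move to the reserved pool, if any.
    static Optional<String> pickQueryToPromote(
            Map<String, Long> generalPoolUsageByQuery,
            long generalPoolFreeBytes,
            boolean reservedPoolInUse) {
        // Nothing to do while the general pool still has room,
        // and the reserved pool can host only one query at a time.
        if (generalPoolFreeBytes > 0 || reservedPoolInUse) {
            return Optional.empty();
        }
        // Otherwise promote the query with the largest memory footprint.
        return generalPoolUsageByQuery.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        Map<String, Long> usage = Map.of("q1", 100L, "q2", 300L, "q3", 200L);
        System.out.println(pickQueryToPromote(usage, 0, false));
    }
}
```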

>= v0.201

System memory pool is now unused by default and it will eventually be removed completely. All memory allocations will now be served from the general/user memory pool. The old behavior can be restored with the deprecated.legacy-system-pool-enabled config option.

From v0.201 onward, SYSTEM_POOL is disabled by default. This is controlled by the deprecated.legacy-system-pool-enabled parameter, whose default value is false.

Walking through the code of SqlTaskManager#createQueryContext:

private QueryContext createQueryContext(
        QueryId queryId,
        LocalMemoryManager localMemoryManager,
        NodeMemoryConfig nodeMemoryConfig,
        LocalSpillManager localSpillManager,
        GcMonitor gcMonitor,
        DataSize maxQueryUserMemoryPerNode,
        DataSize maxQueryTotalMemoryPerNode,
        DataSize maxQuerySpillPerNode)
{
    // Legacy path: a separate system pool is passed alongside the general pool.
    if (nodeMemoryConfig.isLegacySystemPoolEnabled()) {
        Optional<MemoryPool> systemPool = localMemoryManager.getSystemPool();
        verify(systemPool.isPresent(), "systemPool must be present");
        return new LegacyQueryContext(
                queryId,
                maxQueryUserMemoryPerNode,
                localMemoryManager.getGeneralPool(),
                systemPool.get(),
                gcMonitor,
                taskNotificationExecutor,
                driverYieldExecutor,
                maxQuerySpillPerNode,
                localSpillManager.getSpillSpaceTracker());
    }

    // Default path: only the general pool is passed, plus a per-node total memory limit.
    return new DefaultQueryContext(
            queryId,
            maxQueryUserMemoryPerNode,
            maxQueryTotalMemoryPerNode,
            localMemoryManager.getGeneralPool(),
            gcMonitor,
            taskNotificationExecutor,
            driverYieldExecutor,
            maxQuerySpillPerNode,
            localSpillManager.getSpillSpaceTracker());
}

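As the code shows, the default path no longer passes systemPool.get(); only localMemoryManager.getGeneralPool() is supplied. GENERAL_POOL therefore takes over the roles previously split between GENERAL_POOL and SYSTEM_POOL, providing both user memory and system memory.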

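Memory tuning parameters

[Figure: presto_memory_config]

Parallelism tuning

Increase the number of threads to raise task concurrency and improve throughput.

[Figure: presto_task_config]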

Common OOM errors

Query exceeded per-node total memory limit of xx

Increase query.max-total-memory-per-node as appropriate.

Query exceeded distributed user memory limit of xx

Increase query.max-memory as appropriate.

Could not communicate with the remote task. The node may have crashed or be under too much load

The node ran out of memory and crashed; check /var/log/messages on that node.
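
The three cases above amount to a small lookup from error text to a first action. A minimal sketch, assuming simple substring matching on the messages quoted above; the `OomHints` class is hypothetical and is not a Presto utility.

```java
// Hypothetical triage helper: maps the error messages quoted above to the suggested action.
final class OomHints {
    static String suggest(String errorMessage) {
        if (errorMessage.contains("exceeded per-node total memory limit")) {
            return "increase query.max-total-memory-per-node";
        }
        if (errorMessage.contains("exceeded distributed user memory limit")) {
            return "increase query.max-memory";
        }
        if (errorMessage.contains("Could not communicate with the remote task")) {
            return "check the system log on the worker for a crash";
        }
        return "no hint for this message";
    }

    public static void main(String[] args) {
        System.out.println(suggest("Query exceeded per-node total memory limit of xx"));
        System.out.println(suggest("Query exceeded distributed user memory limit of xx"));
    }
}
```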

Memory allocation

In Presto, memory is requested inside the Operator implementations.

For example, TableScanOperator requests system memory:

public class TableScanOperator
        implements SourceOperator, Closeable
{
    @Override
    public Page getOutput()
    {
        ...

        // updating system memory usage should happen after page is loaded.
        systemMemoryContext.setBytes(source.getSystemMemoryUsage());

        return page;
    }
    
}

For example, AggregationOperator requests user memory:

/**
 * Group input data and produce a single block for each sequence of identical values.
 */
public class AggregationOperator
        implements Operator
{
      
    @Override
    public void addInput(Page page)
    {
        ...

        long memorySize = 0;
        for (Aggregator aggregate : aggregates) {
            aggregate.processPage(page);
            memorySize += aggregate.getEstimatedSize();
        }
        
        if (useSystemMemory) {
            systemMemoryContext.setBytes(memorySize);
        }
        else {
            userMemoryContext.setBytes(memorySize);
        }
    }

}
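
Both operators report memory by calling setBytes with their current total footprint rather than an increment. A minimal sketch of how such a call could translate into accounting against a per-query limit is shown below. It is a toy model under stated assumptions: `QueryBudget` and `OperatorMemoryContext` are hypothetical classes, and the real memory context hierarchy in Presto is more elaborate.

```java
// Toy model of absolute-value memory reporting; these classes are not Presto APIs.
final class MemoryContextSketch {
    // Shared per-query budget on one node (stand-in for a query-level limit).
    static final class QueryBudget {
        private final long limitBytes;
        private long usedBytes;

        QueryBudget(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        // Apply a positive or negative change; reject changes that exceed the limit.
        synchronized void update(long deltaBytes) {
            long next = usedBytes + deltaBytes;
            if (next > limitBytes) {
                throw new IllegalStateException(
                        "Query exceeded per-node total memory limit of " + limitBytes + " bytes");
            }
            usedBytes = next;
        }

        synchronized long getUsedBytes() {
            return usedBytes;
        }
    }

    // One context per operator: setBytes records the operator's current footprint.
    static final class OperatorMemoryContext {
        private final QueryBudget budget;
        private long currentBytes;

        OperatorMemoryContext(QueryBudget budget) {
            this.budget = budget;
        }

        void setBytes(long bytes) {
            if (bytes < 0) {
                throw new IllegalArgumentException("bytes must be non-negative");
            }
            // Only the difference from the previous report is charged to the budget.
            budget.update(bytes - currentBytes);
            currentBytes = bytes;
        }

        // Release everything this operator holds.
        void close() {
            setBytes(0);
        }
    }

    public static void main(String[] args) {
        QueryBudget budget = new QueryBudget(100);
        OperatorMemoryContext scan = new OperatorMemoryContext(budget);
        OperatorMemoryContext agg = new OperatorMemoryContext(budget);
        scan.setBytes(60);
        agg.setBytes(30);
        scan.setBytes(20); // shrinking frees 40 bytes back to the budget
        System.out.println(budget.getUsedBytes());
    }
}
```

Reporting absolute values keeps each operator simple: it recomputes its estimated size after every page, as AggregationOperator does above, and the context works out what changed.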