PostgreSQL Parallel Query Selection

Tags

PostgreSQL, parallel, cost, extra startup cost


Background

Parallel queries in PostgreSQL incur extra startup cost, and the cost-based optimizer (CBO) always prefers the cheaper plan, so why does it still end up choosing parallel execution?

For the underlying algorithm, see:

《PostgreSQL 并行计算 优化器算法浅析》

When estimating the cost of a parallel plan, PostgreSQL accounts for the memory-copy overhead of passing tuples from the worker processes to the leader process, as well as the extra overhead of setting up shared memory when launching workers:

src/backend/optimizer/path/costsize.c

  • parallel_tuple_cost: cost of CPU time to pass a tuple from a worker to the leader backend
  • parallel_setup_cost: cost of setting up shared memory for parallelism

For example, this extra cost shows up in the Gather node's costing function, cost_gather:

/*  
 * cost_gather  
 *        Determines and returns the cost of gather path.  
 *  
 * 'rel' is the relation to be operated upon  
 * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL  
 * 'rows' may be used to point to a row estimate; if non-NULL, it overrides  
 * both 'rel' and 'param_info'.  This is useful when the path doesn't exactly  
 * correspond to any particular RelOptInfo.  
 */  
void  
cost_gather(GatherPath *path, PlannerInfo *root,  
                        RelOptInfo *rel, ParamPathInfo *param_info,  
                        double *rows)  
{  
        Cost            startup_cost = 0;  
        Cost            run_cost = 0;  
  
        /* Mark the path with the correct row estimate */  
        if (rows)  
                path->path.rows = *rows;  
        else if (param_info)  
                path->path.rows = param_info->ppi_rows;  
        else  
                path->path.rows = rel->rows;  
  
        startup_cost = path->subpath->startup_cost;  
  
        run_cost = path->subpath->total_cost - path->subpath->startup_cost;  
  
        /* Parallel setup and communication cost. */  
        startup_cost += parallel_setup_cost;  
        run_cost += parallel_tuple_cost * path->path.rows;  
  
        path->path.startup_cost = startup_cost;  
        path->path.total_cost = (startup_cost + run_cost);  
}  
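
To get a feel for the size of this overhead, here is a minimal standalone sketch (not PostgreSQL code), assuming the stock defaults parallel_setup_cost = 1000 and parallel_tuple_cost = 0.1, and a hypothetical estimate of 1,000,000 rows flowing through the Gather:

#include <stdio.h>

int main(void)
{
    /* GUC defaults (assumed; check SHOW parallel_setup_cost / parallel_tuple_cost) */
    double parallel_setup_cost = 1000.0;
    double parallel_tuple_cost = 0.1;

    double rows = 1000000.0;    /* hypothetical row estimate for the Gather */

    /* Same shape as cost_gather(): setup is added to startup, per-tuple to run */
    double extra_startup = parallel_setup_cost;
    double extra_run     = parallel_tuple_cost * rows;

    printf("extra startup cost: %.1f\n", extra_startup);   /* 1000.0 */
    printf("extra run cost:     %.1f\n", extra_run);       /* 100000.0 */
    return 0;
}

In other words, with the defaults a Gather over a million rows adds roughly 101,000 cost units on top of its subpath's cost, which the parallel plan has to win back elsewhere.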

However, when estimating a node's CPU cost, the planner does not use the sum of the costs across all workers; it uses the per-worker cost, i.e. the serial CPU cost divided among the workers.

/*  
 * cost_seqscan  
 *        Determines and returns the cost of scanning a relation sequentially.  
 *  
 * 'baserel' is the relation to be scanned  
 * 'param_info' is the ParamPathInfo if this is a parameterized path, else NULL  
 */  
  
        /* Adjust costing for parallelism, if used. */  
        if (path->parallel_workers > 0)  
        {  
                double          parallel_divisor = get_parallel_divisor(path);  
  
                /* The CPU cost is divided among all the workers. */  
                cpu_run_cost /= parallel_divisor;   
  
                /*  
                 * It may be possible to amortize some of the I/O cost, but probably  
                 * not very much, because most operating systems already do aggressive  
                 * prefetching.  For now, we assume that the disk run cost can't be  
                 * amortized at all.  
                 */  
  
                /*  
                 * In the case of a parallel plan, the row count needs to represent  
                 * the number of tuples processed per worker.  
                 */  
                path->rows = clamp_row_est(path->rows / parallel_divisor);  
        }  
/*  
 * Estimate the fraction of the work that each worker will do given the  
 * number of workers budgeted for the path.  
 */  
static double  
get_parallel_divisor(Path *path)  
{  
        double          parallel_divisor = path->parallel_workers;  
  
        /*  
         * Early experience with parallel query suggests that when there is only  
         * one worker, the leader often makes a very substantial contribution to  
         * executing the parallel portion of the plan, but as more workers are  
         * added, it does less and less, because it's busy reading tuples from the  
         * workers and doing whatever non-parallel post-processing is needed.  By  
         * the time we reach 4 workers, the leader no longer makes a meaningful  
         * contribution.  Thus, for now, estimate that the leader spends 30% of  
         * its time servicing each worker, and the remainder executing the  
         * parallel plan.  
         */  
        if (parallel_leader_participation)  
        {  
                double          leader_contribution;  
  
                leader_contribution = 1.0 - (0.3 * path->parallel_workers);  
                if (leader_contribution > 0)  
                        parallel_divisor += leader_contribution;  
        }  
  
        return parallel_divisor;  
}  
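
To make the divisor concrete, here is a small standalone sketch (not part of PostgreSQL) that replays the logic of get_parallel_divisor() above for 1 through 4 workers, assuming parallel_leader_participation is on:

#include <stdio.h>

/* Mirrors get_parallel_divisor() with parallel_leader_participation = on */
static double parallel_divisor(int parallel_workers)
{
    double divisor = parallel_workers;
    double leader_contribution = 1.0 - 0.3 * parallel_workers;

    if (leader_contribution > 0)
        divisor += leader_contribution;
    return divisor;
}

int main(void)
{
    for (int w = 1; w <= 4; w++)
        printf("%d workers -> divisor %.1f\n", w, parallel_divisor(w));
    /* 1 -> 1.7, 2 -> 2.4, 3 -> 3.1, 4 -> 4.0 */
    return 0;
}

So with 2 workers the per-worker CPU cost is the serial CPU cost divided by 2.4 rather than by 2, because the leader is expected to execute part of the parallel plan itself; from 4 workers on, the leader's contribution is estimated as zero.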

Note: the disk (page scan) cost is counted in full; unlike the CPU tuple cost, it is not divided by the number of workers. Why? The comment in cost_seqscan above gives the reason: the developers assume the disk run cost cannot be amortized across workers, because most operating systems already do aggressive prefetching, so parallelism offers little additional I/O benefit.

        /*  
         * disk costs  
         */  
        disk_run_cost = spc_seq_page_cost * baserel->pages;  
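
Using the divisor values from the previous sketch, a minimal illustration with hypothetical figures (10,000 pages at spc_seq_page_cost = 1.0, and 10,000 units of serial CPU run cost) shows that adding workers shrinks only the CPU component:

#include <stdio.h>

int main(void)
{
    /* divisors for 1..4 workers, taken from the sketch above */
    double divisors[]    = {1.7, 2.4, 3.1, 4.0};
    double disk_run_cost = 1.0 * 10000;   /* spc_seq_page_cost * pages, never divided */
    double serial_cpu    = 10000.0;       /* hypothetical serial CPU run cost */

    for (int i = 0; i < 4; i++)
        printf("%d workers: cpu %.0f, disk %.0f\n",
               i + 1, serial_cpu / divisors[i], disk_run_cost);
    /* cpu shrinks: 5882, 4167, 3226, 2500; disk stays at 10000 */
    return 0;
}

The disk component is the same no matter how many workers are budgeted, matching the assumption in the comment above that the disk run cost cannot be amortized.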

To sum up: PostgreSQL's CBO simply picks whichever plan it estimates to be faster, so the cost model is effectively a model of execution time. Since parallel execution can finish sooner, the cost model has to reflect that: when computing the CPU tuple cost, PostgreSQL divides by the effective number of workers, i.e. it estimates the per-worker cost, which matches the wall-clock behaviour of parallel execution.

So although a parallel plan pays extra for setting up shared memory when launching workers and for copying tuples from the workers to the leader, its total estimated cost can still come out lower, in which case the planner chooses the parallel plan.
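
Finally, a back-of-the-envelope sketch (all figures are hypothetical, not taken from any real plan) showing how the trade-off can land in favour of the parallel plan: a scan with a fairly selective filter, costed once serially and once with 2 workers plus the Gather overhead, using the formulas above and the default parallel_setup_cost = 1000 and parallel_tuple_cost = 0.1.

#include <stdio.h>

int main(void)
{
    /* Hypothetical inputs */
    double disk_run_cost   = 50000.0;   /* 50,000 pages * seq_page_cost 1.0 */
    double serial_cpu_cost = 12500.0;   /* per-tuple CPU work over 1,000,000 rows */
    double returned_rows   = 10000.0;   /* rows surviving the filter */

    /* Defaults (assumed): parallel_setup_cost = 1000, parallel_tuple_cost = 0.1 */
    double parallel_setup_cost = 1000.0;
    double parallel_tuple_cost = 0.1;

    /* Serial seq scan: disk + CPU */
    double serial_total = disk_run_cost + serial_cpu_cost;

    /* Parallel seq scan with 2 workers: divisor = 2 + (1 - 0.3*2) = 2.4.
     * Only the CPU part is divided; the disk part is not. */
    double divisor = 2.4;
    double partial_total = disk_run_cost + serial_cpu_cost / divisor;

    /* Gather on top: setup cost plus per-tuple transfer cost */
    double gather_total = partial_total
        + parallel_setup_cost
        + parallel_tuple_cost * returned_rows;

    printf("serial:   %.0f\n", serial_total);   /* 62500 */
    printf("parallel: %.0f\n", gather_total);   /* ~57208 */
    return 0;
}

The CPU savings from splitting the work (about 7,300 cost units here) outweigh the roughly 2,000 units of Gather overhead, so the parallel path is estimated as cheaper and gets chosen; if the filter returned most of the table, the parallel_tuple_cost term would grow and the serial plan would win instead.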
