PostgreSQL查询代价估算(六)

1.1.1.1     哈希连接代价:final_cost_hashjoin

函数功能:

计算两个关系使用hash连接算法的花费。内外表的元组数和Hash运算的选择率乘积得到了参与hash连接的元组数,这个hash连接的元组数和hash创建的花费决定了CPU的花费。

对于hashjoin花费的计算,分为两个函数:

1.         一是initial_cost_hashjoin函数,初步估计hashjoin算法的花费,形成的结果会作为final_cost_hashjoin的入口参数(JoinCostWorkspace *workspace)传入final_cost_hashjoin函数;

2.         二是final_cost_hashjoin函数,对hashjoin算法的花费进行全面的估计(如:CPU/IO的花费、内表是否唯一、不同连接类型的差别等)。

 

代码分析:

void

final_cost_hashjoin(PlannerInfo *root, HashPath *path,

                                   JoinCostWorkspace *workspace,

                                   SpecialJoinInfo *sjinfo,

                                   SemiAntiJoinFactors *semifactors)

{......

       virtualbuckets = (double) numbuckets *(double) numbatches;

      

    //hash算法,首先为内表构造hash链表,外表的连接的对象,根据hash函数计算值向内表形成的hash链表上映射;所以,需要求解构造hash的花费,即求解构造内表所在的hash的花费

 

       //如果唯一,则没有重复的,平均分布即可,所以每个bucket的概率是相同的

       if (IsA(inner_path, UniquePath))

              innerbucketsize = 1.0 / virtualbuckets;

       else

       {

              innerbucketsize = 1.0;

              foreach(hcl, hashclauses)//遍历每条hash子句(参数传入的限制条件)

              {......

                     //hash子句被内表涵盖

                     if (bms_is_subset(restrictinfo->right_relids, inner_path->parent->relids))

                     {

                            thisbucketsize = restrictinfo->right_bucketsize;

                            if (thisbucketsize < 0)//如果小于零,则缓存中不存在,需要重新计算

                            {

                                   thisbucketsize = estimate_hash_bucketsize(root,

                                                               get_rightop(restrictinfo->clause), virtualbuckets);

                                   restrictinfo->right_bucketsize = thisbucketsize;

                            }

                     }

                     else//否则,hash子句被外表涵盖;处理过程类似if语句

                     {......}

 

                     if (innerbucketsize > thisbucketsize)//保存最小值

                            innerbucketsize = thisbucketsize;

              }

       }

 

       //计算限制等类型的条件的CPU花费

       cost_qual_eval(&hash_qual_cost, hashclauses, root);

       cost_qual_eval(&qp_qual_cost, path->jpath.joinrestrictinfo, root);

......

  //如果是半连接或反半连接,注意:

  //1 半连接或反半连接语义,找到一个满足连接条件的,即可停止对同类型的进行扫描,所以不用匹配很多元组,所以花费应该少

  //2 不管是何种方式,计算花费的主要因子通常都是在元组数、匹配度上,所以如下计算都有这样的考虑

       if (path->jpath.jointype == JOIN_SEMI || path->jpath.jointype == JOIN_ANTI)

       {......

              //只要有一个匹配,就可以停止扫描其他同类项

              outer_matched_rows = rint(outer_path_rows * semifactors->outer_match_frac);

              inner_scan_frac = 2.0 / (semifactors->match_count + 1.0);

 

              startup_cost += hash_qual_cost.startup;

              run_cost += hash_qual_cost.per_tuple * outer_matched_rows *

                     clamp_row_est(inner_path_rows * innerbucketsize * inner_scan_frac) * 0.5;

 

              //为不匹配的情况,计算花费

              run_cost += hash_qual_cost.per_tuple *

                     (outer_path_rows - outer_matched_rows) *

                     clamp_row_est(inner_path_rows / virtualbuckets) * 0.05;

 

              if (path->jpath.jointype == JOIN_SEMI)

                     hashjointuples = outer_matched_rows;

              else

                     hashjointuples = outer_path_rows - outer_matched_rows;

       }

       else//普通的连接类型,认为约有0.5的匹配度(这算是一种启发式规则)

       {

              startup_cost += hash_qual_cost.startup;

              run_cost += hash_qual_cost.per_tuple * outer_path_rows *

                     clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

 

              hashjointuples = approx_tuple_count(root, &path->jpath, hashclauses);

       }

 

       //计算CPU花费

       startup_cost += qp_qual_cost.startup;

       cpu_per_tuple = cpu_tuple_cost + qp_qual_cost.per_tuple;

       run_cost += cpu_per_tuple * hashjointuples;

 

       path->jpath.path.startup_cost = startup_cost;

       path->jpath.path.total_cost = startup_cost + run_cost;

}

 

调用关系图解:

PostgreSQL查询代价估算(六) - 那海蓝蓝 - 那海蓝蓝的博客
 


final_cost_hashjoin 函数被 create_hashjoin_path 函数调用,直至被 add_paths_to_joinrel 函数调用,完成两表连接的 hashjoin 连接算法的花费估算。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值