1.1.1.1 哈希连接代价:final_cost_hashjoin
函数功能:
计算两个关系使用hash连接算法的花费。内外表的元组数和Hash运算的选择率乘积得到了参与hash连接的元组数,这个hash连接的元组数和hash创建的花费决定了CPU的花费。
对于hashjoin花费的计算,分为两个函数:
1. 一是initial_cost_hashjoin函数,初步估计hashjoin算法的花费,形成的结果会作为final_cost_hashjoin的入口参数(JoinCostWorkspace *workspace)传入final_cost_hashjoin函数;
2. 二是final_cost_hashjoin函数,对hashjoin算法的花费进行全面的估计(如:CPU/IO的花费、内表是否唯一、不同连接类型的差别等)。
代码分析:
void
final_cost_hashjoin(PlannerInfo *root, HashPath *path,
JoinCostWorkspace *workspace,
SpecialJoinInfo *sjinfo,
SemiAntiJoinFactors *semifactors)
{......
virtualbuckets = (double) numbuckets *(double) numbatches;
//hash算法,首先为内表构造hash链表,外表的连接的对象,根据hash函数计算值向内表形成的hash链表上映射;所以,需要求解构造hash的花费,即求解构造内表所在的hash的花费
//如果唯一,则没有重复的,平均分布即可,所以每个bucket的概率是相同的
if (IsA(inner_path, UniquePath))
innerbucketsize = 1.0 / virtualbuckets;
else
{
innerbucketsize = 1.0;
foreach(hcl, hashclauses)//遍历每条hash子句(参数传入的限制条件)
{......
//hash子句被内表涵盖
if (bms_is_subset(restrictinfo->right_relids, inner_path->parent->relids))
{
thisbucketsize = restrictinfo->right_bucketsize;
if (thisbucketsize < 0)//如果小于零,则缓存中不存在,需要重新计算
{
thisbucketsize = estimate_hash_bucketsize(root,
get_rightop(restrictinfo->clause), virtualbuckets);
restrictinfo->right_bucketsize = thisbucketsize;
}
}
else//否则,hash子句被外表涵盖;处理过程类似if语句
{......}
if (innerbucketsize > thisbucketsize)//保存最小值
innerbucketsize = thisbucketsize;
}
}
//计算限制等类型的条件的CPU花费
cost_qual_ev
cost_qual_ev
......
//如果是半连接或反半连接,注意:
//1 半连接或反半连接语义,找到一个满足连接条件的,即可停止对同类型的进行扫描,所以不用匹配很多元组,所以花费应该少
//2 不管是何种方式,计算花费的主要因子通常都是在元组数、匹配度上,所以如下计算都有这样的考虑
if (path->jpath.jointype == JOIN_SEMI || path->jpath.jointype == JOIN_ANTI)
{......
//只要有一个匹配,就可以停止扫描其他同类项
outer_matched_rows = rint(outer_path_rows * semifactors->outer_match_frac);
inner_scan_frac = 2.0 / (semifactors->match_count + 1.0);
startup_cost += hash_qual_cost.startup;
run_cost += hash_qual_cost.per_tuple * outer_matched_rows *
clamp_row_est(inner_path_rows * innerbucketsize * inner_scan_frac) * 0.5;
//为不匹配的情况,计算花费
run_cost += hash_qual_cost.per_tuple *
(outer_path_rows - outer_matched_rows) *
clamp_row_est(inner_path_rows / virtualbuckets) * 0.05;
if (path->jpath.jointype == JOIN_SEMI)
hashjointuples = outer_matched_rows;
else
hashjointuples = outer_path_rows - outer_matched_rows;
}
else//普通的连接类型,认为约有0.5的匹配度(这算是一种启发式规则)
{
startup_cost += hash_qual_cost.startup;
run_cost += hash_qual_cost.per_tuple * outer_path_rows *
clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;
hashjointuples = approx_tuple_count(root, &path->jpath, hashclauses);
}
//计算CPU花费
startup_cost += qp_qual_cost.startup;
cpu_per_tuple = cpu_tuple_cost + qp_qual_cost.per_tuple;
run_cost += cpu_per_tuple * hashjointuples;
path->jpath.path.startup_cost = startup_cost;
path->jpath.path.total_cost = startup_cost + run_cost;
}
调用关系图解: