openGauss Database Source Code Analysis | SQL Engine Source Code Analysis (14)


openGauss Assistant (openGauss小助手)



Tags: sql, database, openGauss

Article link: https://blog.csdn.net/weixin_53596073/article/details/137034028


2) Building join paths

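At this point two RelOptInfos that satisfy the join conditions have been selected, so the next step is to build physical join relationships between their paths. There are three common kinds of physical join path, NestLoop, Merge Join and Hash Join, and they are built mainly by the functions sort_inner_and_outer, match_unsorted_outer and hash_inner_and_outer.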

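sort_inner_and_outer mainly generates MergeJoin paths. It assumes that the paths of both the inner and the outer relation are unsorted, so both must be explicitly sorted; on each side it simply takes the path with the lowest total cost. match_unsorted_outer, by contrast, handles the case where the outer path is already sorted: only the inner side needs an explicit sort to produce a MergeJoin path, and the function can also produce NestLoop paths and parameterized paths. The last option, hash_inner_and_outer, builds a HashJoin path for the two relations, which requires building a hash table.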
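To make the "explicitly sort both inputs, then merge" idea concrete, here is a minimal C sketch of a merge equi-join on integer keys. It is a toy, executor-level illustration that assumes a single int join key; the names cmp_int and toy_merge_join are hypothetical, and this is not how openGauss builds or executes a MergeJoin path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for int keys */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Toy merge equi-join (hypothetical helper): explicitly sorts both inputs
 * in place, then walks them in lockstep. Returns the number of matching
 * (outer, inner) pairs; runs of duplicate keys produce their cross product.
 */
static size_t toy_merge_join(int *outer, size_t n_outer, int *inner, size_t n_inner)
{
    size_t i = 0, j = 0, matches = 0;

    assert(outer != NULL && inner != NULL);

    qsort(outer, n_outer, sizeof(int), cmp_int); /* explicit sort of the outer input */
    qsort(inner, n_inner, sizeof(int), cmp_int); /* explicit sort of the inner input */

    while (i < n_outer && j < n_inner) {
        if (outer[i] < inner[j]) {
            i++;
        } else if (outer[i] > inner[j]) {
            j++;
        } else {
            int key = outer[i];
            size_t run_outer = 0, run_inner = 0;
            while (i < n_outer && outer[i] == key) { i++; run_outer++; }
            while (j < n_inner && inner[j] == key) { j++; run_inner++; }
            matches += run_outer * run_inner;
        }
    }
    return matches;
}
```

For example, joining outer keys {3, 1, 2, 2} with inner keys {2, 4, 3, 2} yields 5 pairs: four for key 2 and one for key 3. If one input were already sorted, as match_unsorted_outer assumes for the outer side, its qsort call could be skipped.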

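To simplify building MergeJoin paths, the restriction clauses are processed first: the clauses usable for a MergeJoin are filtered out (function select_mergejoin_clauses), so that both sort_inner_and_outer and match_unsorted_outer can use these mergejoinable join clauses. The code is as follows: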

// Extract the join clauses that can be used for a Merge Join
foreach (l, restrictlist) {
    RestrictInfo* restrictinfo = (RestrictInfo*)lfirst(l);

    // For an outer join, ignore clauses that were pushed down (filter conditions)
    if (isouterjoin && restrictinfo->is_pushed_down)
        continue;

    // Preliminary check of whether the clause can be used for a Merge Join;
    // restrictinfo->can_join and restrictinfo->mergeopfamilies are set in distribute_qual_to_rels
    if (!restrictinfo->can_join || restrictinfo->mergeopfamilies == NIL) {
        // A constant clause (the FULL JOIN ON FALSE case) is not flagged as non-mergeable
        if (!restrictinfo->clause || !IsA(restrictinfo->clause, Const))
            have_nonmergeable_joinclause = true;
        continue; /* not mergejoinable */
    }

    // Check that the clause has the form "outer op inner" or "inner op outer"
    if (!clause_sides_match_join(restrictinfo, outerrel, innerrel)) {
        have_nonmergeable_joinclause = true;
        continue; /* no good for these input relations */
    }

    // Update the clause to use the final (canonical) equivalence classes,
    // so that it can be matched against canonical pathkeys
    update_mergeclause_eclasses(root, restrictinfo);

    if (EC_MUST_BE_REDUNDANT(restrictinfo->left_ec) || EC_MUST_BE_REDUNDANT(restrictinfo->right_ec)) {
        have_nonmergeable_joinclause = true;
        continue; /* can't handle redundant eclasses */
    }

    result_list = lappend(result_list, restrictinfo);
}
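The filtering loop above can be modeled outside the planner with plain booleans. The sketch below assumes a hypothetical ToyClause struct whose flags stand in for the RestrictInfo fields and helper calls used in the real code (mergeopfamilies, clause_sides_match_join, EC_MUST_BE_REDUNDANT); it reproduces only the control flow, not openGauss's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for RestrictInfo */
typedef struct ToyClause {
    bool is_pushed_down;      /* filter condition pushed down to this join */
    bool can_join;            /* clause compares expressions from both sides */
    bool has_mergeopfamilies; /* stand-in for mergeopfamilies != NIL */
    bool is_const;            /* clause is a constant, e.g. ON FALSE */
    bool sides_match;         /* stand-in for clause_sides_match_join() */
    bool redundant_ec;        /* stand-in for EC_MUST_BE_REDUNDANT(left/right) */
} ToyClause;

/*
 * Writes the indexes of mergejoinable clauses to out[] and returns how many
 * were kept. *have_nonmergeable is set when a usable join clause had to be
 * rejected. Mirrors the control flow of the select_mergejoin_clauses loop.
 */
static size_t toy_select_mergejoin_clauses(const ToyClause *clauses, size_t n, bool isouterjoin,
                                           size_t *out, bool *have_nonmergeable)
{
    size_t kept = 0;

    assert(have_nonmergeable != NULL);
    *have_nonmergeable = false;

    for (size_t k = 0; k < n; k++) {
        const ToyClause *c = &clauses[k];

        if (isouterjoin && c->is_pushed_down)
            continue;                      /* pushed-down filter on an outer join */
        if (!c->can_join || !c->has_mergeopfamilies) {
            if (!c->is_const)
                *have_nonmergeable = true; /* constants (FULL JOIN ON FALSE) are tolerated */
            continue;
        }
        if (!c->sides_match || c->redundant_ec) {
            *have_nonmergeable = true;     /* wrong sides or redundant eclass */
            continue;
        }
        out[kept++] = k;
    }
    return kept;
}
```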

(1) The sort_inner_and_outer function.

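sort_inner_and_outer mainly generates MergeJoin paths. Because it explicitly sorts both child RelOptInfos, it only needs to consider the paths in each child's cheapest_total_path list. It generates pathkeys from the mergejoinable join clauses (those usable for a Merge Join), then repeatedly changes the order of the pathkeys in that list to obtain different pathkey orderings, and for each ordering determines the inner sort keys (innerkeys) and the outer sort keys (outerkeys). The code is as follows: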

// Try to join each path in the outer rel's cheapest_total_path list with each path in the inner rel's list
foreach (lc1, outerrel->cheapest_total_path) {
    Path* outer_path_orig = (Path*)lfirst(lc1);
    Path* outer_path = NULL;
    j = 0;
    foreach (lc2, innerrel->cheapest_total_path) {
        Path* inner_path = (Path*)lfirst(lc2);
        outer_path = outer_path_orig;

        // Parameterized paths cannot be used to build a MergeJoin path
        if (PATH_PARAM_BY_REL(outer_path, innerrel) ||
            PATH_PARAM_BY_REL(inner_path, outerrel))
            return;

        // If neither side is the first (cheapest) total path, only try pairs marked in join_used
        if (outer_path != linitial(outerrel->cheapest_total_path) &&
            inner_path != linitial(innerrel->cheapest_total_path)) {
            if (!join_used[(i - 1) * num_inner + j - 1]) {
                j++;
                continue;
            }
        }

        // Build a unique-ified path when the join type requires it
        jointype = save_jointype;
        if (jointype == JOIN_UNIQUE_OUTER) {
            outer_path = (Path*)create_unique_path(root, outerrel, outer_path, sjinfo);
            jointype = JOIN_INNER;
        } else if (jointype == JOIN_UNIQUE_INNER) {
            inner_path = (Path*)create_unique_path(root, innerrel, inner_path, sjinfo);
            jointype = JOIN_INNER;
        }

        // From the merge clauses extracted earlier, determine the pathkeys usable for a MergeJoin path
        all_pathkeys = select_outer_pathkeys_for_merge(root, mergeclause_list, joinrel);

        // For each pathkey in that set, try to build a MergeJoin path
        foreach (l, all_pathkeys) {
            ...
            // Build the inner pathkeys
            innerkeys = make_inner_pathkeys_for_merge(root, cur_mergeclauses, outerkeys);

            // Build the pathkeys describing the sort order of the join result
            merge_pathkeys = build_join_pathkeys(root, joinrel, jointype, outerkeys);

            // Build a MergeJoin path from the pathkeys and the inner and outer paths
            try_mergejoin_path(root, ..., innerkeys);
        }
        j++;
    }
    i++;
}
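The excerpt elides the body of the pathkey loop. In PostgreSQL, from which openGauss derives, sort_inner_and_outer tries each candidate pathkey as the leading sort key, moving it to the front while keeping the others in their original order. Assuming openGauss follows the same scheme, the reordering step can be sketched as below; pathkeys are represented by hypothetical int ids, and toy_pathkey_orderings is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS 8

/*
 * For n candidate pathkeys (hypothetical int ids), produce n orderings:
 * ordering k puts key k first and keeps the remaining keys in their
 * original relative order. Each ordering would serve as the outerkeys
 * of one MergeJoin attempt.
 */
static size_t toy_pathkey_orderings(const int *keys, size_t n, int out[][MAX_KEYS])
{
    assert(n <= MAX_KEYS);
    for (size_t front = 0; front < n; front++) {
        size_t pos = 0;
        out[front][pos++] = keys[front]; /* chosen leading key */
        for (size_t k = 0; k < n; k++)
            if (k != front)
                out[front][pos++] = keys[k];
    }
    return n;
}
```

With keys {10, 20, 30} this produces {10, 20, 30}, {20, 10, 30} and {30, 10, 20}. Trying several leading keys matters because a different leading key may match a sort order that some upper plan node can reuse.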

(2) The match_unsorted_outer function.

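The overall logic of match_unsorted_outer is largely the same as that of sort_inner_and_outer. The main difference is that sort_inner_and_outer derives the inner and outer pathkeys from the join clauses, whereas match_unsorted_outer assumes that the outer path is already sorted and works the other way around: it orders the join clauses according to the outer path's pathkeys. In other words, the outer path's pathkeys are used directly as outerkeys; the join clauses that match the current pathkeys are selected, and the innerkeys, which require an explicit sort, are then built from those matched clauses.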
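The clause-matching step can be sketched as a simplified model, loosely following the approach of PostgreSQL's planner: walk the outer path's pathkeys in order, collect the merge clauses whose outer side uses each key, and stop at the first key that no clause matches, because only a leading prefix of an existing sort order is useful to a merge join. Pathkeys are hypothetical int ids, and toy_match_outer_pathkeys is an illustrative name; the innerkeys would then be derived from the selected clauses.

```c
#include <assert.h>
#include <stddef.h>

/*
 * outer_keys[]: sort order the outer path already provides (hypothetical ids).
 * clause_outer_key[c]: pathkey on the outer side of merge clause c.
 * Writes the indexes of the selected clauses to out[] (sized n_clauses) and
 * returns how many were selected. Assumes pathkeys are distinct, as
 * canonical pathkeys are.
 */
static size_t toy_match_outer_pathkeys(const int *outer_keys, size_t n_keys,
                                       const int *clause_outer_key, size_t n_clauses,
                                       size_t *out)
{
    size_t n_out = 0;

    for (size_t k = 0; k < n_keys; k++) {
        size_t matched = 0;
        for (size_t c = 0; c < n_clauses; c++) {
            if (clause_outer_key[c] == outer_keys[k]) {
                assert(n_out < n_clauses); /* holds when pathkeys are distinct */
                out[n_out++] = c;
                matched++;
            }
        }
        if (matched == 0)
            break; /* gap in the prefix: later sort keys cannot be used */
    }
    return n_out;
}
```

For an outer order {1, 2, 3} and clauses on keys {2, 1, 5}, the clauses on keys 1 and 2 are selected and matching stops at key 3; if the outer order starts with a key that no clause uses, nothing is selected.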

(3) The hash_inner_and_outer function.

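As the name suggests, hash_inner_and_outer builds HashJoin paths. Whether a restriction clause is suitable for a hash join has already been determined in distribute_restrictinfo_to_rels. Because a hash join has to build a hash table, it can only be used when at least one hashjoinable join clause exists; without one, no hash table can be built.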
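The requirement for a hashjoinable clause follows from how a hash join executes: the inner input is hashed on the join key in a build phase and then probed with each outer row. The toy C sketch below assumes a single int equality key and a fixed-size chained hash table; the has_hash_clause flag is a hypothetical stand-in for "at least one hashjoinable clause exists", and none of this is openGauss's executor code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 16

typedef struct Entry {
    int key;
    struct Entry *next;
} Entry;

static unsigned bucket_of(int key)
{
    return (unsigned)key % NBUCKETS;
}

/*
 * Returns the number of (outer, inner) pairs with equal keys, or -1 when
 * there is no hashjoinable clause (nothing to hash the inner rows on).
 */
static long toy_hash_join(const int *outer, size_t n_outer,
                          const int *inner, size_t n_inner, int has_hash_clause)
{
    Entry *buckets[NBUCKETS] = {0};
    Entry *pool;
    long matches = 0;

    if (!has_hash_clause)
        return -1; /* HashJoin is not an option */

    pool = malloc((n_inner ? n_inner : 1) * sizeof(Entry));
    assert(pool != NULL);

    /* Build phase: insert every inner key into its bucket chain */
    for (size_t i = 0; i < n_inner; i++) {
        unsigned b = bucket_of(inner[i]);
        pool[i].key = inner[i];
        pool[i].next = buckets[b];
        buckets[b] = &pool[i];
    }

    /* Probe phase: look up each outer key in its bucket */
    for (size_t o = 0; o < n_outer; o++)
        for (Entry *e = buckets[bucket_of(outer[o])]; e != NULL; e = e->next)
            if (e->key == outer[o])
                matches++;

    free(pool);
    return matches;
}
```

Unlike the merge join sketch, neither input needs to be sorted; the price is the memory for the hash table built on the inner side.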

3) Path filtering

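At this point the physical join paths (HashJoin, NestLoop and MergeJoin) have been generated. The next step is to use the cost computed while generating each path to decide whether it is worth keeping. The join-path stage produces many paths, some of them clearly inferior, so filtering them early acts as a basic check that saves planning time: if generating a plan takes too long, even a "very good" plan is not acceptable.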

add_path is the main path-filtering function. The code is as follows:

switch (costcmp) {
    case COSTS_EQUAL:
        outercmp = bms_subset_compare(PATH_REQ_OUTER(new_path), PATH_REQ_OUTER(old_path));
        if (keyscmp == PATHKEYS_BETTER1) {
            // Costs are similar and the new path's pathkeys are longer; if it also needs
            // no more parameters and returns no more rows, accept the new path and
            // discard the old one
            if ((outercmp == BMS_EQUAL || outercmp == BMS_SUBSET1) &&
                new_path->rows <= old_path->rows)
                remove_old = true; /* new dominates old */
        } else if (keyscmp == PATHKEYS_BETTER2) {
            // Costs are similar and the new path's pathkeys are shorter; if it also needs
            // no fewer parameters and returns no fewer rows, reject the new path and
            // keep the old one
            if ((outercmp == BMS_EQUAL || outercmp == BMS_SUBSET2) &&
                new_path->rows >= old_path->rows)
                accept_new = false; /* old dominates new */
        } else {
            if (outercmp == BMS_EQUAL) {
                // Here the cost, pathkeys and parameterization of both paths are equal or similar.
                // If the new path returns fewer rows, accept it and discard the old one
                if (new_path->rows < old_path->rows)
                    remove_old = true; /* new dominates old */
                // If the new path returns more rows, reject it and keep the old one
                else if (new_path->rows > old_path->rows)
                    accept_new = false; /* old dominates new */
                // Cost, pathkeys, parameterization and row count are all similar:
                // compare costs again with a stricter fuzz factor and keep the new
                // path only if it is still better
                else {
                    small_fuzzy_factor_is_used = true;
                    if (compare_path_costs_fuzzily(new_path, old_path, SMALL_FUZZY_FACTOR) ==
                        COSTS_BETTER1)
                        remove_old = true; /* new dominates old */
                    else
                        accept_new = false; /* old equals or
                                             * dominates new */
                }
            } else if (outercmp == BMS_SUBSET1 &&
                       new_path->rows <= old_path->rows) {
                // Cost and pathkeys are similar; the new path needs a subset of the old
                // path's parameters and returns no more rows, so it wins
                remove_old = true; /* new dominates old */
            } else if (outercmp == BMS_SUBSET2 &&
                       new_path->rows >= old_path->rows) {
                // Cost and pathkeys are similar; the old path needs a subset of the new
                // path's parameters and returns no fewer rows, so it wins
                accept_new = false; /* old dominates new */
            }
            /* else different parameterizations, keep both */
        }
        break;
    case COSTS_BETTER1:
        // The new path is cheaper. Unless its pathkeys are worse, accept it and discard
        // the old path, provided it is also no worse in parameterization and row count
        if (keyscmp != PATHKEYS_BETTER2) {
            outercmp = bms_subset_compare(PATH_REQ_OUTER(new_path), PATH_REQ_OUTER(old_path));
            if ((outercmp == BMS_EQUAL || outercmp == BMS_SUBSET1) &&
                new_path->rows <= old_path->rows)
                remove_old = true; /* new dominates old */
        }
        break;
    case COSTS_BETTER2:
        // The old path is cheaper. Unless the new path's pathkeys are better, reject the
        // new path and keep the old one, provided the old path is also no worse in
        // parameterization and row count
        if (keyscmp != PATHKEYS_BETTER1) {
            outercmp = bms_subset_compare(PATH_REQ_OUTER(new_path), PATH_REQ_OUTER(old_path));
            if ((outercmp == BMS_EQUAL || outercmp == BMS_SUBSET2) &&
                new_path->rows >= old_path->rows)
                accept_new = false; /* old dominates new */
        }
        break;
    default:

        /*
         * can't get here, but keep this case to keep compiler
         * quiet
         */
        break;
}
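The dominance rules in the switch above can be condensed into a small standalone model. In the sketch below, ToyPath, KeysCmp and toy_add_path_verdict are hypothetical names; cost is reduced to a single total cost compared with a multiplicative fuzz factor, parameterization to a bitmask of required outer rels, and pathkeys to a precomputed comparison. Incomparable pathkeys are treated as "keep both", on the assumption (as in PostgreSQL) that the switch is only reached when the pathkeys are comparable. As a further simplification, a complete tie is resolved by rejecting the new path instead of re-comparing with SMALL_FUZZY_FACTOR.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { COST_EQUAL, COST_BETTER1, COST_BETTER2 } CostCmp;
typedef enum { KEYS_EQUAL, KEYS_BETTER1, KEYS_BETTER2, KEYS_DIFFERENT } KeysCmp;
typedef enum { KEEP_BOTH, REMOVE_OLD, REJECT_NEW } Verdict;

typedef struct ToyPath {
    double total_cost;
    unsigned req_outer; /* bit i set: needs a parameter from rel i */
    double rows;
} ToyPath;

/* Path 1 vs path 2: "better" only if cheaper by more than the fuzz factor */
static CostCmp fuzzy_cost_cmp(double c1, double c2, double fuzz)
{
    assert(fuzz >= 1.0);
    if (c1 > c2 * fuzz)
        return COST_BETTER2;
    if (c2 > c1 * fuzz)
        return COST_BETTER1;
    return COST_EQUAL;
}

/* true if a's required outer rels are a subset of b's (BMS_EQUAL or BMS_SUBSET1) */
static bool subset(unsigned a, unsigned b)
{
    return (a & ~b) == 0;
}

static Verdict toy_add_path_verdict(const ToyPath *newp, const ToyPath *oldp,
                                    KeysCmp keyscmp, double fuzz)
{
    CostCmp costcmp;
    bool new_ok, old_ok;

    if (keyscmp == KEYS_DIFFERENT)
        return KEEP_BOTH; /* incomparable sort orders */

    costcmp = fuzzy_cost_cmp(newp->total_cost, oldp->total_cost, fuzz);

    /* new path is no worse on cost, pathkeys, parameterization and rows */
    new_ok = costcmp != COST_BETTER2 && keyscmp != KEYS_BETTER2 &&
             subset(newp->req_outer, oldp->req_outer) && newp->rows <= oldp->rows;
    /* old path is no worse on every criterion */
    old_ok = costcmp != COST_BETTER1 && keyscmp != KEYS_BETTER1 &&
             subset(oldp->req_outer, newp->req_outer) && newp->rows >= oldp->rows;

    if (new_ok && old_ok)
        return REJECT_NEW; /* full tie (simplified tie-break) */
    if (new_ok)
        return REMOVE_OLD;
    if (old_ok)
        return REJECT_NEW;
    return KEEP_BOTH;
}
```

For instance, a new path with the same cost and row count as an existing one but requiring an extra outer parameter is rejected, while a clearly cheaper path with worse pathkeys is kept alongside the old one.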