本文简单介绍了PG查询逻辑优化中的子查询链接(subLink),以ANY子链接为例介绍了子查询链接上拉主函数处理逻辑以及使用gdb跟踪分析。
一、子链接上拉简介
PG尝试对ANY_SUBLINK和EXISTS_SUBLINK两种类型的子链接尝试提升,ANY_SUBLINK表示IN/ANY子句,EXISTS_SUBLINK表示EXISTS子句.子链接上拉(pull up)的目的是为了提升性能,把查询一条记录对比一条记录的逻辑实现变换为半连接或反半连接实现.
上拉样例:
-- 原始SQL
select *
from t_dwxx a
where dwbh > any (select b.dwbh from t_grxx b);
-- 上拉后的SQL(示意,实际不能执行)
select a.*
from t_dwxx a semi join select b.dwbh from t_grxx b on a.dwbh > pullup.dwbh;
执行了子链接上拉后的执行计划:
testdb=# explain select *
testdb-# from t_dwxx a
testdb-# where dwbh > any (select b.dwbh from t_grxx b);
QUERY PLAN
------------------------------------------------------------------------
Nested Loop Semi Join (cost=0.00..815.76 rows=53 width=474)
Join Filter: ((a.dwbh)::text > (b.dwbh)::text)
-> Seq Scan on t_dwxx a (cost=0.00..11.60 rows=160 width=474)
-> Materialize (cost=0.00..17.35 rows=490 width=38)
-> Seq Scan on t_grxx b (cost=0.00..14.90 rows=490 width=38)
(5 rows)
没有上拉的执行计划:
testdb=# explain select *
testdb-# from t_dwxx a
testdb-# where dwbh > any (select b.dwbh from t_grxx b where a.dwbh = b.dwbh);
QUERY PLAN
------------------------------------------------------------------
Seq Scan on t_dwxx a (cost=0.00..1302.40 rows=80 width=474)
Filter: (SubPlan 1)
SubPlan 1
-> Seq Scan on t_grxx b (cost=0.00..16.12 rows=2 width=38)
Filter: ((a.dwbh)::text = (dwbh)::text)
(5 rows)
二、源码解读
pull_up_sublinks
/*
* pull_up_sublinks
* Attempt to pull up ANY and EXISTS SubLinks to be treated as
* semijoins or anti-semijoins.
* 尝试上拉(pull up) ANY/IN 和 EXISTS 子链接,变换为半连接或反半连接
*
* A clause "foo op ANY (sub-SELECT)" can be processed by pulling the
* sub-SELECT up to become a rangetable entry and treating the implied
* comparisons as quals of a semijoin. However, this optimization *only*
* works at the top level of WHERE or a JOIN/ON clause, because we cannot
* distinguish whether the ANY ought to return FALSE or NULL in cases
* involving NULL inputs. Also, in an outer join's ON clause we can only
* do this if the sublink is degenerate (ie, references only the nullable
* side of the join). In that case it is legal to push the semijoin
* down into the nullable side of the join. If the sublink references any
* nonnullable-side variables then it would have to be evaluated as part
* of the outer join, which makes things way too complicated.
*
* Under similar conditions, EXISTS and NOT EXISTS clauses can be handled
* by pulling up the sub-SELECT and creating a semijoin or anti-semijoin.
*
* express op ANY (sub-SELECT) 语句可以通过上拉子查询为RTE并且把对比操作变换
* 为半连接操作,这种优化只能在最上层的WHERE语句中实现
* EXISTS/NOT EXISTS语句类似
*
*
* This routine searches for such clauses and does the necessary parsetree
* transformations if any are found.
*
* This routine has to run before preprocess_expression(), so the quals
* clauses are not yet reduced to implicit-AND format, and are not guaranteed
* to be AND/OR-flat either. That means we need to recursively search through
* explicit AND clauses. We stop as soon as we hit a non-AND item.
*/
void
pull_up_sublinks(PlannerInfo *root)
{
Node *jtnode;
Relids relids;
/* Begin recursion through the jointree */
jtnode = pull_up_sublinks_jointree_recurse(root,
(Node *) root->parse->jointree,
&relids);//执行上拉操作
/*
* root->parse->jointree must always be a FromExpr, so insert a dummy one
* if we got a bare RangeTblRef or JoinExpr out of the recursion.
*/
if (IsA(jtnode, FromExpr))//jointree要求FromExpr类型
root->parse->jointree = (FromExpr *) jtnode;
else
root->parse->jointree = makeFromExpr(list_make1(jtnode), NULL);
}
pull_up_sublinks_jointree_recurse
处理流程如下图所示:
查询树结构如下:
/*
* Recurse through jointree nodes for pull_up_sublinks()
*
* In addition to returning the possibly-modified jointree node, we return
* a relids set of the contained rels into *relids.
*/
static Node *
pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
Relids *relids)
{
if (jtnode == NULL)
{
*relids = NULL;
}
else if (IsA(jtnode, RangeTblRef))//如为RangeTblRef类型
{
int varno = ((RangeTblRef *) jtnode)->rtindex;
*relids = bms_make_singleton(varno);
/* jtnode is returned unmodified */
}
else if (IsA(jtnode, FromExpr))//如为FromExpr类型
{
FromExpr *f = (FromExpr *) jtnode;
List *newfromlist = NIL;
Relids frelids = NULL;
FromExpr *newf;
Node *jtlink;
ListCell *l;
/* First, recurse to process children and collect their relids */
foreach(l, f->fromlist)//
{
Node *newchild;
Relids childrelids;
//对fromlist中的元素执行上拉操作
//如能够上拉,则把子查询从WHERE子句中提升到FROM子句,newchild作为连接的一部分
newchild = pull_up_sublinks_jointree_recurse(root,
lfirst(l),
&childrelids);
newfromlist = lappend(newfromlist, newchild);
frelids = bms_join(frelids, childrelids);
}
/* Build the replacement FromExpr; no quals yet */
newf = makeFromExpr(newfromlist, NULL);//创建新的FromExpr
/* Set up a link representing the rebuilt jointree */
jtlink = (Node *) newf;
/* Now process qual --- all children are available for use */
//处理子链接中的表达式
//newf(指针,相当于jtlink)
newf->quals = pull_up_sublinks_qual_recurse(root, f->quals,
&jtlink, frelids,
NULL, NULL);//
/*
* Note that the result will be either newf, or a stack of JoinExprs
* with newf at the base. We rely on subsequent optimization steps to
* flatten this and rearrange the joins as needed.
*
* Although we could include the pulled-up subqueries in the returned
* relids, there's no need since upper quals couldn't refer to their
* outputs anyway.
*/
*relids = frelids;//设置相关的relids
jtnode = jtlink;//返回值
}
else if (IsA(jtnode, JoinExpr))
{
JoinExpr *j;
Relids leftrelids;
Relids rightrelids;
Node *jtlink;
/*