An Analysis of analyze.cpp (Part 1)

奔走的月光

Last modified: 2023-04-18 11:09:27

First published: 2022-08-29 22:11:22

Article link: https://blog.csdn.net/m0_60340015/article/details/126569695

Source code link

https://www.gitlink.org.cn/Eao3piq4e/openGauss-server/tree/master/src%2Fcommon%2Fbackend%2Fparser%2Fanalyze.cpp

Overview

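This file defines many of the functions used for semantic analysis and is indispensable to building the query tree. The most important one is parse_analyze(), the entry point of semantic analysis, which processes the abstract syntax tree produced by the grammar parser. Following the call chain down from there eventually leads to transformStmt(); this post explains how that function works.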

Analysis

transformStmt()

// Listing 1
//src/common/backend/parser/analyze.cpp
Query* transformStmt(ParseState* pstate, Node* parseTree, bool isFirstNode, bool isCreateView)
{
    Query* result = NULL;
    AnalyzerRoutine *analyzerRoutineHook = (AnalyzerRoutine*)u_sess->hook_cxt.analyzerRoutineHook;

    switch (nodeTag(parseTree)) {
            /*
             * Optimizable statements
             */
        case T_InsertStmt:
            result = transformInsertStmt(pstate, (InsertStmt*)parseTree);
            break;

        case T_DeleteStmt:
            result = transformDeleteStmt(pstate, (DeleteStmt*)parseTree);
            break;

        case T_UpdateStmt:
            result = transformUpdateStmt(pstate, (UpdateStmt*)parseTree);
            break;

        case T_MergeStmt:
            result = transformMergeStmt(pstate, (MergeStmt*)parseTree);
            break;

        case T_SelectStmt: {
            SelectStmt* n = (SelectStmt*)parseTree;
            if (n->valuesLists) {
                result = transformValuesClause(pstate, n);
            } else if (n->op == SETOP_NONE) {
                if (analyzerRoutineHook == NULL || analyzerRoutineHook->transSelect == NULL) {
                    result = transformSelectStmt(pstate, n, isFirstNode, isCreateView);
                } else {
                    result = analyzerRoutineHook->transSelect(pstate, n, isFirstNode, isCreateView);
                }
            } else {
                result = transformSetOperationStmt(pstate, n);
            }
        } break;

            /*
             * Special cases
             */
        case T_DeclareCursorStmt:
            result = transformDeclareCursorStmt(pstate, (DeclareCursorStmt*)parseTree);
            break;

        case T_ExplainStmt:
            result = transformExplainStmt(pstate, (ExplainStmt*)parseTree);
            break;

#ifdef PGXC
        case T_ExecDirectStmt:
            result = transformExecDirectStmt(pstate, (ExecDirectStmt*)parseTree);
            break;
#endif

        case T_CreateTableAsStmt:
            result = transformCreateTableAsStmt(pstate, (CreateTableAsStmt*)parseTree);
            break;

        case T_CreateModelStmt:
            result = transformCreateModelStmt(pstate, (CreateModelStmt*) parseTree);
            break;


        default:

            /*
             * other statements don't require any transformation; just return
             * the original parsetree with a Query node plastered on top.
             */
            result = makeNode(Query);
            result->commandType = CMD_UTILITY;
            result->utilityStmt = (Node*)parseTree;
            break;
    }

    /* Mark as original query until we learn differently */
    result->querySource = QSRC_ORIGINAL;
    result->canSetTag = true;

    /* Mark whether synonym object is in rtables or not. */
    result->hasSynonyms = pstate->p_hasSynonyms;

    return result;
}

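This function recursively converts the abstract syntax tree into a query tree; the mechanism is a simple match on the statement type. Its first parameter, pstate, is a ParseState* pointer to the memory area that holds intermediate state for semantic analysis, such as the original SQL command, the range table, join expressions, the raw WINDOW clauses, and FOR UPDATE/FOR SHARE clauses. The ParseState struct is defined in src/include/parser/parse_node.h.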

Its second parameter, parseTree, is a Node* pointer. The Node struct has just one member, type, whose type is the NodeTag enum:

// Listing 2
//src/include/nodes/nodes.h
/*
 * The first field of a node of any type is guaranteed to be the NodeTag.
 * Hence the type of any node can be gotten by casting it to Node. Declaring
 * a variable to be of Node * (instead of void *) can also facilitate
 * debugging.
 */
typedef struct Node {
    NodeTag type;
} Node;

Because NodeTag is an enum, type can only take one of the predefined values. NodeTag is also defined in src/include/nodes/nodes.h; it has too many values to list here, so see the header for the full set.

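The third and fourth parameters are used only when the statement is a query: in Listing 1 they are passed on only in the T_SelectStmt branch, to transformSelectStmt() or to the transSelect hook. In every other branch they are ignored.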

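Now for the body of the function. Line 8 of Listing 1 (counting the two leading comment lines) calls nodeTag(), which is actually a function-like macro: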

// Listing 3
//src/include/nodes/nodes.h
#define nodeTag(nodeptr) (((const Node*)(nodeptr))->type)
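
To see why the tag can be read through a generic pointer, here is a minimal, self-contained sketch. It is not openGauss code: MiniNodeTag, MiniInsertStmt, MiniSelectStmt, and tagOf() are hypothetical names invented for illustration. openGauss goes through a Node* cast as in Listing 3; the sketch reads the leading field directly, which relies on the same guarantee that the tag is always the first member.

```cpp
#include <cassert>

// Hypothetical, reduced tag set; the real NodeTag enum in nodes.h is far larger.
enum MiniNodeTag { T_MiniInvalid = 0, T_MiniInsertStmt, T_MiniSelectStmt };

// Hypothetical statement structs. Each one starts with the tag field,
// which is the only layout rule this sketch depends on.
struct MiniInsertStmt {
    MiniNodeTag type;
    const char* relname;
};

struct MiniSelectStmt {
    MiniNodeTag type;
    bool distinct;
};

// Returns the tag of any node struct whose first member is a MiniNodeTag.
// The caller does not need to know the concrete struct type.
inline MiniNodeTag tagOf(const void* anyNode)
{
    assert(anyNode != nullptr);
    return *static_cast<const MiniNodeTag*>(anyNode);
}
```

Given `MiniSelectStmt s = {T_MiniSelectStmt, true};`, the call `tagOf(&s)` yields T_MiniSelectStmt even though tagOf() never sees the MiniSelectStmt type.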

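So nodeTag() returns the type of the statement that parseTree represents, that is, which kind of statement this syntax tree was built for. With that in hand, the tag is matched against the known statement kinds: T_InsertStmt, T_DeleteStmt, T_UpdateStmt, T_MergeStmt, T_SelectStmt, and a few special statement types. The default case, lines 69-78 of Listing 1, uses makeNode(), which is defined as follows: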

// Listing 4
//src/include/nodes/nodes.h
/*
 * newNode -
 *	  create a new node of the specified size and tag the node with the
 *	  specified tag.
 *
 * !WARNING!: Avoid using newNode directly. You should be using the
 *	  macro makeNode.  eg. to create a Query node, use makeNode(Query)
 *
 * Note: the size argument should always be a compile-time constant, so the
 * apparent risk of multiple evaluation doesn't matter in practice.
 */
#ifdef __GNUC__
/* newNode for GNU-compatible compilers, using a statement expression */
#ifndef FRONTEND_PARSER
#define newNode(size, tag)                                                \
    ({                                                                    \
        Node* _result;                                                    \
        AssertMacro((size) >= sizeof(Node)); /* requested size must be at least sizeof(Node) */ \
        _result = (Node*)palloc0fast(size);  /* allocate zero-filled memory */ \
        _result->type = (tag);               /* set the NodeTag */        \
        _result;                             /* value of the whole expression */ \
    })
#else // !FRONTEND_PARSER
#define newNode(size, tag)                                                \
    ({                                                                    \
        Node *_result;                                                    \
        AssertMacro((size) >= sizeof(Node)); /* need the tag, at least */ \
        _result = (Node *)feparser_malloc0(size);                         \
        _result->type = (tag);                                            \
        _result;                                                          \
    })
#endif // !FRONTEND_PARSER
#else
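/* newNode for compilers without GNU statement expressions; the result pointer is held in t_thrd.utils_cxt.newNodeMacroHolder instead of a local variable */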
#define newNode(size, tag)                                              \
    (AssertMacro((size) >= sizeof(Node)), /* need the tag, at least */  \
        t_thrd.utils_cxt.newNodeMacroHolder = (Node*)palloc0fast(size), \
        t_thrd.utils_cxt.newNodeMacroHolder->type = (tag),              \
        t_thrd.utils_cxt.newNodeMacroHolder)
#endif /* __GNUC__ */

#define makeNode(_type_) ((_type_*)newNode(sizeof(_type_), T_##_type_))

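Calling makeNode() really means calling newNode(), which allocates a new node of the given size and marks it with the given tag; the result is a pointer of the Node type discussed above. Note the _type_ parameter of makeNode() in Listing 4: the Node* returned by newNode() is cast to _type_*. Back in Listing 1, Query is passed as _type_, and the comment in the default branch tells us that statement types other than those handled explicitly need no transformation: the function returns the original parse tree with a Query node placed on top.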
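
The allocate-then-tag pattern, together with the T_##_type_ token pasting, can be sketched in a few lines. This is an illustration under stated assumptions, not the openGauss implementation: std::calloc stands in for palloc0fast(), a function template replaces the GNU statement expression, and MiniNodeTag, MiniQuery, MiniSelectStmt, newMiniNode(), makeMiniNode(), and freeMiniNode() are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical tag set; each tag is named T_<StructName> so that
// token pasting can derive the tag from the type name.
enum MiniNodeTag { T_MiniInvalid = 0, T_MiniQuery, T_MiniSelectStmt };

struct MiniQuery {
    MiniNodeTag type;
    int commandType;
    void* utilityStmt;
};

struct MiniSelectStmt {
    MiniNodeTag type;
    bool distinct;
};

// Stand-in for newNode(): allocate zero-filled memory, then stamp the tag.
template <typename T>
T* newMiniNode(MiniNodeTag tag)
{
    void* mem = std::calloc(1, sizeof(T)); // stand-in for palloc0fast()
    assert(mem != nullptr);
    T* node = new (mem) T();               // value-initialization keeps every field zero
    node->type = tag;
    return node;
}

// Same token-pasting trick as makeNode(): MiniQuery becomes T_MiniQuery.
#define makeMiniNode(_type_) (newMiniNode<_type_>(T_##_type_))

// The sketch uses the C heap, so nodes are released with std::free.
inline void freeMiniNode(void* node)
{
    std::free(node);
}
```

With this in place, `makeMiniNode(MiniQuery)` expands to `newMiniNode<MiniQuery>(T_MiniQuery)`, so the caller names the type once and gets back a zeroed node that already carries the matching tag.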

Besides Node, the other important struct is Query. Here is its definition:

// Listing 5
//src/include/nodes/parsenodes_common.h
/*
 * Query -
 * 	  Parse analysis turns all statements into a Query tree
 * 	  for further processing by the rewriter and planner.
 *
 * 	  Utility statements (i.e. non-optimizable statements) have the
 * 	  utilityStmt field set, and the Query itself is mostly dummy.
 * 	  DECLARE CURSOR is a special case: it is represented like a SELECT,
 * 	  but the original DeclareCursorStmt is stored in utilityStmt.
 *
 * 	  Planning converts a Query tree into a Plan tree headed by a PlannedStmt
 * 	  node --- the Query structure is not used by the executor.
 */
typedef struct Query {
    NodeTag type;

    CmdType commandType; /* select|insert|update|delete|merge|utility */

    QuerySource querySource; /* where did I come from? */

    uint64 queryId; /* query identifier (can be set by plugins) */

    bool canSetTag; /* do I set the command result tag? */

    Node* utilityStmt; /* non-null if this is DECLARE CURSOR or a
                        * non-optimizable statement */

    int resultRelation; /* rtable index of target relation for
                         * INSERT/UPDATE/DELETE/MERGE; 0 for SELECT */
  ······
    List* rtable;       /* list of range table entries */
    FromExpr* jointree; /* table join tree (FROM and WHERE clauses) */
    List* targetList; /* target list (of TargetEntry) */
  ······
    bool can_push;
    bool        unique_check;               /* true if the subquery is generated by general
                                             * sublink pullup, and scalar output is needed */
    Oid* fixed_paramTypes; /* For plpy CTAS query. CTAS is a recursive call.CREATE query is the first rewrited.
                            * thd 2nd rewrited query is INSERT SELECT.whithout this attribute, DB will have
                            * an error that has no idea about $x when INSERT SELECT query is analyzed. */
    int fixed_numParams;
} Query;

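type, of type NodeTag, is its first member. Node has exactly one member, type, also a NodeTag, so a Query begins with the same layout as a Node and a Query* can be treated as a Node*. In addition, rtable lists the tables the query refers to (RangeTblEntry), jointree stores the FROM ... WHERE ... part of the SQL statement, and targetList lists the columns to be returned (TargetEntry). These fields matter a great deal when building the query tree, and I will cover them in later posts.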
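
As a rough illustration of how a Query can act as a wrapper around an untransformed statement, the sketch below mimics the default branch of Listing 1. All names (MiniNodeTag, MiniCmdType, MiniIndexStmt, MiniQuery, tagOf(), wrapUtility()) are hypothetical, and the struct keeps only a handful of fields; the real Query has many more.

```cpp
#include <cassert>

enum MiniNodeTag { T_MiniInvalid = 0, T_MiniQuery, T_MiniIndexStmt };
enum MiniCmdType { MINI_CMD_UNKNOWN = 0, MINI_CMD_SELECT, MINI_CMD_UTILITY };

// Hypothetical utility statement, e.g. a CREATE INDEX parse tree.
struct MiniIndexStmt {
    MiniNodeTag type;
    const char* idxname;
};

// Heavily reduced stand-in for Query; the tag comes first, as in Listing 5.
struct MiniQuery {
    MiniNodeTag type;
    MiniCmdType commandType;
    bool canSetTag;
    const void* utilityStmt; // the untouched parse tree for utility statements
};

// Reads the leading tag of any node struct that starts with a MiniNodeTag.
inline MiniNodeTag tagOf(const void* anyNode)
{
    assert(anyNode != nullptr);
    return *static_cast<const MiniNodeTag*>(anyNode);
}

// Mirrors the default branch of Listing 1: no transformation, just a wrapper.
inline MiniQuery wrapUtility(const void* parseTree)
{
    assert(parseTree != nullptr);
    MiniQuery q = {};
    q.type = T_MiniQuery;
    q.commandType = MINI_CMD_UTILITY;
    q.utilityStmt = parseTree;
    q.canSetTag = true;
    return q;
}
```

The wrapper is itself a node (its tag is T_MiniQuery), while the original statement stays reachable, unchanged, through utilityStmt.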

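Finally, where does the recursion come from? Reading through transformStmt(), there is no place where it calls itself directly, so I looked at the functions called from its switch branches. Take the T_CreateModelStmt branch: it calls transformCreateModelStmt(), shown here with the less relevant parts omitted: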

// Listing 6
//src/common/backend/parser/analyze.cpp
Query* transformCreateModelStmt(ParseState* pstate, CreateModelStmt* stmt)
{
    SelectStmt* select_stmt = (SelectStmt*) stmt->select_query;

    stmt->algorithm     = get_algorithm_ml(stmt->architecture);
······

    // Transform the select query that we prepared for the training operator
    Query* select_query = transformStmt(pstate, (Node*) select_stmt);
    stmt->select_query  = (Node*) select_query;

    /* represent the command as a utility Query */
    Query* result = makeNode(Query);
    result->commandType = CMD_UTILITY;
    result->utilityStmt = (Node*)stmt;

    return result;
}

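Line 11 of Listing 6 shows transformCreateModelStmt() calling transformStmt() again. So transformStmt() is recursive, but indirectly: it re-enters itself through the functions it dispatches to.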
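
The shape of this indirect recursion can be reproduced in a small sketch. It is only an illustration: MiniStmt, miniTransformStmt(), and miniTransformCreateModel() are hypothetical names, and the return value (the number of dispatcher invocations) exists purely to make the re-entry observable.

```cpp
#include <cassert>

enum MiniNodeTag { T_MiniSelectStmt = 1, T_MiniCreateModelStmt };

// Hypothetical statement node that may embed another statement.
struct MiniStmt {
    MiniNodeTag type;
    MiniStmt* inner;   // embedded statement, or nullptr
    bool transformed;
};

int miniTransformStmt(MiniStmt* stmt);

// Plays the role of transformCreateModelStmt(): it hands its embedded
// SELECT back to the dispatcher, then marks itself as done.
int miniTransformCreateModel(MiniStmt* stmt)
{
    int innerCalls = (stmt->inner != nullptr) ? miniTransformStmt(stmt->inner) : 0;
    stmt->transformed = true;
    return innerCalls;
}

// Plays the role of transformStmt(). Returns how many times the
// dispatcher ran for this statement, nested statements included.
int miniTransformStmt(MiniStmt* stmt)
{
    assert(stmt != nullptr);
    switch (stmt->type) {
        case T_MiniCreateModelStmt:
            return 1 + miniTransformCreateModel(stmt);
        case T_MiniSelectStmt:
        default:
            stmt->transformed = true;
            return 1;
    }
}
```

For a create-model statement that embeds one SELECT, the dispatcher runs twice: once for the outer statement and once more when the handler passes the inner SELECT back to it.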

Summary

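In essence, transformStmt() matches the type of the input statement against the known statement types and runs the corresponding handler. Once it knows which kind of statement the syntax tree was built from, the next step is to convert that tree into a query tree. The table below lists each branch and the function it calls: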

NodeTag value	Semantic analysis function
T_InsertStmt	transformInsertStmt()
T_DeleteStmt	transformDeleteStmt()
T_UpdateStmt	transformUpdateStmt()
T_MergeStmt	transformMergeStmt()
T_SelectStmt	transformSelectStmt(), transformValuesClause(), or transformSetOperationStmt()
T_DeclareCursorStmt	transformDeclareCursorStmt() // cursor declaration statement
T_ExplainStmt	transformExplainStmt() // EXPLAIN statement, shows the query's execution plan
T_ExecDirectStmt	transformExecDirectStmt() // this branch exists only when PGXC is defined
T_CreateTableAsStmt	transformCreateTableAsStmt()
T_CreateModelStmt	transformCreateModelStmt()
Other	Handled as a utility statement (CREATE TABLE, CREATE INDEX, and so on); a Query node is wrapped around the parse tree and returned
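
Part of the table above, including the three-way split inside the T_SelectStmt branch, can be sketched as a plain switch. The sketch only returns the name of the handler that Listing 1 would call; it ignores the analyzerRoutineHook path, and MiniNodeTag, MiniSetOp, MiniSelectStmt, and handlerFor() are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <string>

enum MiniNodeTag { T_MiniInsertStmt = 1, T_MiniSelectStmt, T_MiniIndexStmt };
enum MiniSetOp { MINI_SETOP_NONE = 0, MINI_SETOP_UNION };

// Reduced stand-in for SelectStmt: only the two fields the dispatch looks at.
struct MiniSelectStmt {
    MiniNodeTag type;
    bool hasValuesLists; // stands in for a non-empty valuesLists
    MiniSetOp op;
};

// Returns the name of the handler that the dispatch would pick.
// sel is required only when tag is T_MiniSelectStmt.
std::string handlerFor(MiniNodeTag tag, const MiniSelectStmt* sel = nullptr)
{
    switch (tag) {
        case T_MiniInsertStmt:
            return "transformInsertStmt";
        case T_MiniSelectStmt:
            assert(sel != nullptr);
            if (sel->hasValuesLists) {
                return "transformValuesClause";
            }
            if (sel->op == MINI_SETOP_NONE) {
                return "transformSelectStmt";
            }
            return "transformSetOperationStmt";
        default:
            // Everything else is wrapped in a utility Query without transformation.
            return "utility";
    }
}
```

Note the order of the checks in the SELECT branch: a VALUES list is tested first, then the absence of a set operation, and only what remains is treated as a set operation such as UNION.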