对execProcnode.cpp的解析

奔走的月光

已于 2023-04-18 11:05:13 修改

阅读量177

点赞数

分类专栏： openGauss 文章标签：服务器开发语言

于 2022-09-24 16:32:09 首次发布

本文链接：https://blog.csdn.net/m0_60340015/article/details/126921590

版权

openGauss 专栏收录该内容

26 篇文章 2 订阅

订阅专栏

源码链接

https://www.gitlink.org.cn/Eao3piq4e/openGauss-server/tree/master/src%2Fgausskernel%2Fruntime%2Fexecutor%2FexecProcnode.cpp

概述

在前一篇博客对execMain.cpp的解析(三)里，我们解析了 Plan 模块中三个比较重要的函数，而在这三个函数中，又分别调用了 ExecInitNode()、ExecProcNode()、ExecEndNode() 函数，这三个函数均在该文件中，接下来我会解析这三个函数以及其它相关的东西。

解析

ExecInitNode()

//代码清单1
//src/gausskernel/runtime/executor/execProcnode.cpp
PlanState* ExecInitNode(Plan* node, EState* estate, int e_flags)
{
    PlanState* result = NULL;
······
    if (unlikely(IS_PGXC_DATANODE && NeedStubExecution(node))) 
    {
        result = (PlanState*)ExecInitNodeStubNorm(node, estate, e_flags);
    }
    else
    {       
        result = ExecInitNodeByType(node, estate, e_flags);
    }
······
    ExecInitNodeSubPlan(node, estate, result);
······
    return result;
}

该函数用来初始化计划树上的节点，并得到对应的计划状态树，其中计划状态树上的每个状态节点都连接着计划树上对应的计划节点，输入参数 node 指向该计划树的根节点。在代码清单1第7~14行为了初始化计划树上的节点该函数一般会调用 ExecInitNodeByType() ，它的作用是通过判断计划节点的类型来调用专门的初始化函数，每个初始化函数的名字都是由 ExecInit 前缀加该计划节点的类型构成的。而且，因为计划状态树是依据计划树构建的，所以他们有着一样的树形结构，我们看一看 Plan 结构体和 PlanState 结构体就能窥一二：

//代码清单2
//src/include/nodes/plannodes.h
typedef struct Plan {
    NodeTag type;
······
    /*
     * Common structural data for all Plan types.
     */
    List* targetlist;      /* target list to be computed at this node */
    List* qual;            /* implicitly-ANDed qual conditions */
    struct Plan* lefttree; /* input plan tree(s) */
    struct Plan* righttree;
······
} Plan;

//代码清单3
//src/include/nodes/execnodes.h
typedef struct PlanState {
    NodeTag type;

    Plan* plan; /* associated Plan node */

    EState* state; /* at execution time, states of individual
                    * nodes point to one EState for the whole
                    * top-level plan */

    Instrumentation* instrument; /* Optional runtime stats for this node */

    /*
     * Common structural data for all Plan types.  These links to subsidiary
     * state trees parallel links in the associated plan tree (except for the
     * subPlan list, which does not exist in the plan tree).
     */
    List* targetlist;           /* target list to be computed at this node */
    List* qual;                 /* implicitly-ANDed qual conditions */
    struct PlanState* lefttree; /* input plan tree(s) */
    struct PlanState* righttree;
    List* initPlan; /* Init SubPlanState nodes (un-correlated expr subselects) */
    List* subPlan;  /* SubPlanState nodes in my expressions */
······
} PlanState;

前面提到，计划状态树上的状态节点会连接对应的计划树上的计划节点，PlanState 结构体的 plan 成员变量正是用来做这件事的。另外可以看到，它们有两个相同名称的成员变量 lefttree 和 righttree ，虽然在不同的结构体中它们的类型不同，但是它们有着相同的含义，都代表着一棵树的左子树和右子树，这说明了什么？说明计划树和计划状态树都是二叉树。

既然计划树是二叉树，那不可避免地，当我们对根节点初始化完成后，我们还需要对它的左右子树进行初始化操作，不过首先需要执行 ExecInitNodeByType() 中的 switch case 分支结构来选择对应的函数，因为并不是所有种类的计划树节点既有左子树又有右子树，看一看简略的 ExecInitNodeByType() 函数：

//代码清单4
//src/gausskernel/runtime/executor/execProcnode.cpp
PlanState* ExecInitNodeByType(Plan* node, EState* estate, int eflags)
{
    switch (nodeTag(node)) {
        case T_BaseResult:
            return (PlanState*)ExecInitResult((BaseResult*)node, estate, eflags);
        case T_ModifyTable:
            return (PlanState*)ExecInitModifyTable((ModifyTable*)node, estate, eflags);
        case T_Append:
            return (PlanState*)ExecInitAppend((Append*)node, estate, eflags);
······
        case T_NestLoop:
            return (PlanState*)ExecInitNestLoop((NestLoop*)node, estate, eflags);
        case T_MergeJoin:
            return (PlanState*)ExecInitMergeJoin((MergeJoin*)node, estate, eflags);
······
        default:
            ereport(ERROR, (errmodule(MOD_EXECUTOR), errcode(ERRCODE_UNRECOGNIZED_NODE_TYPE), errmsg("unrecognized node type: %d when initializing executor.", (int)nodeTag(node))));
            return NULL; /* keep compiler quiet */
    }
}

例如，一个计划节点是 BaseResult 类型的，那么就调用 ExecInitResult() 函数，这个函数简略表示一下：

//代码清单5
//src/gausskernel/runtime/executor/nodeResult.cpp
ResultState* ExecInitResult(BaseResult* node, EState* estate, int eflags)
{
    /* check for unsupported flags */
    Assert(!(eflags & (EXEC_FLAG_MARK | EXEC_FLAG_BACKWARD)) || outerPlan(node) != NULL);

    /*
     * create state structure
     */
    ResultState* resstate = makeNode(ResultState);
    resstate->ps.plan = (Plan*)node;
    resstate->ps.state = estate;
······
    outerPlanState(resstate) = ExecInitNode(outerPlan(node), estate, eflags);
······
    return resstate;
}

该函数中很重要的一条语句便是代码清单5中第15行的语句，该条语句调用了 outerPlanState() 宏函数以及 ExecInitNode() 函数，与 outerPlanState() 功能相似的函数还有 innerPlanState() 函数：

//代码清单6
//src/include/nodes/execnodes.h
#define innerPlanState(node) (((PlanState*)(node))->righttree)
#define outerPlanState(node) (((PlanState*)(node))->lefttree)

而 ExecInitNode() 函数的第一个参数是 outerPlan(node) ，outerPlan() 函数是什么样的呢？如下：

//代码清单7
//src/include/nodes/plannodes.h
#define innerPlan(node) (((Plan*)(node))->righttree)
#define outerPlan(node) (((Plan*)(node))->lefttree)

可以看到，innerPlan(node)、outerPlan(node) 是指向计划树 node 节点的左子树和右子树的指针。可问题来了，resstate 它不是 PlanState* 类型的指针啊，为什么也可以用 outerPlanState() 函数呢？针对这个问题，可以看一下 ResultState 结构体：

//代码清单8
//src/include/nodes/execnodes.h
typedef struct ResultState {
    PlanState ps; /* its first field is NodeTag */
    ExprState* resconstantqual;
    bool rs_done;      /* are we done? */
    bool rs_checkqual; /* do we need to check the qual? */
} ResultState;

因为 ResultState 结构体的第一个成员变量是 PlanState 类型的，所以在 outerPlanState() 里将resstate 强转为 PlanState* 类型的指针，然后直接引用成员变量 ps 的成员变量 lefttree 而不引用 ps 是可行的。

因此，代码清单5中第15行的语句的作用就很明显了。由于该计划节点 node 是 BaseResult 类型的，所以它没有右子树，因此我们将调用 ExecInitNode() 对 node 的左子树进行处理，之后当该函数返回指向相应的状态节点的指针时，我们用 resstate 的 lefttree 域去接收，最后将 resstate 返回给调用者。直白地说，计划状态树是在遍历计划树的过程中动态地递归地被创建出来的，递归性体现在 ExecInitNode() 上。

当然，前面提到并不是所有种类的计划树节点既有左子树又有右子树，所以有的种类的计划树节点是既有左子树又有右子树的，以代码清单4中 NestLoop 类型的计划节点为例，此时需要调用 ExecInitNestLoop() 函数：

//代码清单9
//src/gausskernel/runtime/executor/nodeNestloop.cpp
NestLoopState* ExecInitNestLoop(NestLoop* node, EState* estate, int eflags)
{
······
    /*
     * create state structure
     */
    NestLoopState* nlstate = makeNode(NestLoopState);
    nlstate->js.ps.plan = (Plan*)node;
    nlstate->js.ps.state = estate;
······
    outerPlanState(nlstate) = ExecInitNode(outerPlan(node), estate, eflags);
······
    innerPlanState(nlstate) = ExecInitNode(innerPlan(node), estate, eflags);
······
    return nlstate;
}

从代码清单9第10行，我们可以简略地看出我们是先对根节点 node 进行转化的，当我们得到对应的状态节点 nlstate 后，我们再在第13行调用 ExecInitNode() 对 node 节点的左子树进行转化，在第15行调用 ExecInitNode() 对 node 节点的右子树进行转化，这样一看，我们遍历计划树创建计划状态树的遍历顺序是这样的：根节点-->左子树-->右子树，妥妥的先序遍历，所以我们是先序创建了计划状态树。

ExecProcNode()

//代码清单10
//src/gausskernel/runtime/executor/execProcnode.cpp
TupleTableSlot* ExecProcNode(PlanState* node)
{
    TupleTableSlot* result = NULL;
······
#ifdef ENABLE_MULTIPLE_NODES
    if (unlikely(planstate_need_stub(node))) {
        result = ExecProcNodeStub(node);
    } else
#endif
    {
        int index = (int)(nodeTag(node))-T_ResultState;
        Assert(index >= 0 && index <= T_StreamState - T_ResultState);
        result = g_execProcFuncTable[index](node);
    }

    if (node->instrument != NULL) {
        ExecProcNodeInstr(node, result);
    }
······
    return result;
}

ExecProcNode() 的作用很简单，它用来执行计划状态树的节点并按要求返回一个元组。然而，具体实现这个功能的是 ExecProcNodeStub() 或 g_execProcFuncTable[]() ，后者是装有多个函数指针的数组，通过指定的下标就可以调用特定的处理函数。我目前还没有找到这两个函数在哪个文件里，但通过查看 PostgreSQL 的 ExecProcNode() 的源码以及它的介绍，我觉得 openGauss 的 ExecProcNode() 虽然与 PostgreSQL 的有较大差别，但机理应该是一样的，都是从计划状态树地根节点获取数据，上层节点为了能够完成自己的处理将会递归调用 ExecProcNode() 来从下层节点获取输入数据，最后进行选择条件的运算和投影运算，并向更上层的节点返回指向结果元组的指针。另外，我们也可以通过 ExecProcNodeByType() 推测 openGauss 也采用这种方法来处理计划状态树地方法。

ExecProcNodeByType() 是这样的：

//代码清单11
//src/gausskernel/runtime/executor/execProcnode.cpp
TupleTableSlot* ExecProcNodeByType(PlanState* node)
{
    TupleTableSlot* result = NULL;
    switch (nodeTag(node)) {
        case T_ResultState:
            return ExecResult((ResultState*)node);
······
        case T_NestLoopState:
            return ExecNestLoop((NestLoopState*)node);
        case T_MergeJoinState:
            return ExecMergeJoin((MergeJoinState*)node);
······
        default:
            ereport(ERROR, (errmodule(MOD_EXECUTOR), errcode(ERRCODE_UNRECOGNIZED_NODE_TYPE), errmsg("unrecognized node type: %d when executing executor node.", (int)nodeTag(node))));
            return NULL;
    }
}

它在结构上简直和 ExecInitNodeByType() 一模一样，不过 ExecProcNode() 中并没有出现这个函数，然而，一种可能的情况是 ExecProcNodeStub() 或 g_execProcFuncTable[]() 中调用了这个函数。这使得 ExecProcNode() 仍然可以通过判断计划状态树上的状态节点的类型来选择特定的执行函数，另外，还记得前面提到的计划树与有它而构造出来的计划状态树有一样的二叉树结构吗？在以上假设均成立的情况下，可以推测，ResultState 类型的状态节点只有左子树，NestLoopState 类型的状态节点有左右子树，这里只是举这两个例子。之后，在 ExecResult() 和 ExecNestLoop() 中又将调用 ExecProcNode() 函数去处理 node 的左子树或及右子树上的所有状态节点。这里采用的是一种中序遍历的方法，而不是像 ExecInitNode() 那样的先序遍历方法，因为在我们这种假设中结果是一层层沿树枝向上传递的，并且只有当遍历到叶子节点时这种遍历才会停下，同时左边的叶子节点是更先遍历到的，毕竟有些节点没有右子树，所以默认的顺序应当是左节点要比右节点更先被遍历到。

ExecEndNode()

//代码清单12
//src/gausskernel/runtime/executor/execProcnode.cpp
void ExecEndNode(PlanState* node)
{
······
    ExecEndNodeByType(node);
}

ExecEndNode() 的作用是对计划状态树进行清理，当执行器处理完所有能够获得的元组之后，最终由这个函数来负责善后工作。当然，最后的担子还是落到了 ExecEndNodeByType() 上：

//代码清单13
//src/gausskernel/runtime/executor/execProcnode.cpp
static void ExecEndNodeByType(PlanState* node)
{
    switch (nodeTag(node)) {
        case T_ResultState:
            ExecEndResult((ResultState*)node);
            break;
······
        case T_NestLoopState:
            ExecEndNestLoop((NestLoopState*)node);
            break;

        case T_MergeJoinState:
            ExecEndMergeJoin((MergeJoinState*)node);
            break;
······
        default:
            ereport(ERROR, (errmodule(MOD_EXECUTOR), errcode(ERRCODE_UNRECOGNIZED_NODE_TYPE), errmsg("unrecognized node type: %d when ending executor.", (int)nodeTag(node))));
            break;
    }
}

可以看到，ExecEndNodeByType() 也是一个选择函数，具体的清理工作是交给其它特定的函数的。通过在这些特定函数里又再次调用 ExecEndNode() 函数，这就完成了一个遍历过程，按理说应该是中序遍历。因为按中序遍历的方式来释放节点会更加地方便，不必像先序遍历那样要多定义几个变量来保存它的左子树和右子树。

总结

ExecInitNode()、ExecProcNode()、ExecEndNode() 函数可以说是执行模块中十分接近元组操作的接口函数了，利用 ExecInitNode() ，我们将由前一个模块传递过来的计划树转化得到了对应的计划状态树，在之后的主调函数中又调用到了 ExecProcNode() ，用它来执行计划状态树中的状态节点并返回其中存储的元组，待将所有状态节点执行完毕后，在后续调用过程中调用了 ExecEndNode() 来释放所有的状态节点。这三个函数也是 Node 模块相对于 Plan 模块的接口函数，Plan 模块又和 Executor 模块有着紧密联系，它们都是整个查询执行模块的子模块。