An Analysis of execMain.cpp (Part 1)

奔走的月光

Last modified: 2023-03-31 23:12:24

Category: openGauss. Tags: java, database frontend

First published: 2022-09-08 16:31:00

Article link: https://blog.csdn.net/m0_60340015/article/details/126682236

Source link

https://www.gitlink.org.cn/Eao3piq4e/openGauss-server/tree/master/src%2Fgausskernel%2Fruntime%2Fexecutor%2FexecMain.cpp

Overview

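The Executor module is the core of query execution. It covers expression evaluation, data-definition processing, and the row-level execution operators. The Executor's start, run, and end functions are all defined in this file, so in this post I will walk through several of the more important functions in it.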

Analysis

ExecutorRun()

// Listing 1
//src/gausskernel/runtime/executor/execMain.cpp
void ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
{
    /* sql active feature, opeartor history statistics */
    int instrument_option = 0;
    bool has_track_operator = false;
    char* old_stmt_name = u_sess->pcache_cxt.cur_stmt_name;
    u_sess->statement_cxt.executer_run_level++;
    if (u_sess->SPI_cxt._connected >= 0) {
        u_sess->pcache_cxt.cur_stmt_name = NULL;
    }
    exec_explain_plan(queryDesc);
······
    if (ExecutorRun_hook) {
        (*ExecutorRun_hook)(queryDesc, direction, count);
    } else {
        standard_ExecutorRun(queryDesc, direction, count);
    }
······
}

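This function is the entry point for running the Executor module. The if-else statement on lines 15~19 of Listing 1 is its core, so let us look at that block. What is ExecutorRun_hook? It is a function pointer that holds the address of a hook function, if one has been installed. When it is non-NULL, the hook is used as the body of the executor run; otherwise the default body, standard_ExecutorRun(), is called. The pointer variable ExecutorRun_hook is defined in the same file: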

// Listing 2
//src/gausskernel/runtime/executor/execMain.cpp
THR_LOCAL ExecutorRun_hook_type ExecutorRun_hook = NULL;

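openGauss adds THR_LOCAL to what were process-wide global variables in PostgreSQL, turning them into thread-local variables so that threads cannot misuse each other's copies. THR_LOCAL therefore keeps each thread independent: when several threads execute several plans, they do not conflict. ExecutorRun_hook_type, the type of the variable that stores the hook function's address, is defined as: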

// Listing 3
//src/include/executor/executor.h
/* Hook for plugins to get control in ExecutorRun() */
typedef void (*ExecutorRun_hook_type)(QueryDesc* queryDesc, ScanDirection direction, long count);
extern THR_LOCAL PGDLLIMPORT ExecutorRun_hook_type ExecutorRun_hook;

PGDLLIMPORT is a macro, defined as:

// Listing 4
//src/include/port/win32.h
/* defines for dynamic linking on Win32 platform */
#if defined(WIN32) || defined(__CYGWIN__)

#if __GNUC__ && !defined(__declspec)
#error You need egcs 1.1 or newer for compiling!
#endif

#ifdef BUILDING_DLL
#define PGDLLIMPORT __declspec(dllexport)
#else /* not BUILDING_DLL */
#define PGDLLIMPORT __declspec(dllimport)
#endif

#ifdef _MSC_VER
#define PGDLLEXPORT __declspec(dllexport)
#else
#define PGDLLEXPORT
#endif
#else /* not CYGWIN, not MSVC, not MingW */
#define PGDLLIMPORT
#define PGDLLEXPORT
#endif

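Start with line 4 of Listing 4. It is a compile-time check for whether the code is being built for Windows (or Cygwin). If so, the block on lines 6~20 is compiled; otherwise the block on lines 22~23 is. openGauss currently supports only Linux, not Windows; because its code was adapted from the PostgreSQL source, it still retains much of the original style. The macros that take effect are therefore the ones defined on lines 22~23. In other words, PGDLLIMPORT expands to nothing in openGauss and can be ignored.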

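One question remains: what is PGDLLIMPORT meant for? Breaking the name apart, "PG" stands for PostgreSQL, "DLL" for dynamic-link library, and "IMPORT" for importing. On Windows the macro marks a symbol as exported (__declspec(dllexport), when BUILDING_DLL is defined) or as imported from another binary (__declspec(dllimport), otherwise), which is what allows a plugin DLL to see the variable ExecutorRun_hook. On Linux the macro is empty, and a loaded plugin can typically reference the variable without any such annotation. Listing 3 therefore tells us that ExecutorRun_hook has the function pointer type ExecutorRun_hook_type, not void*, and Listing 2 tells us that its initial value is NULL. It stays NULL until a plugin assigns its own function to it; the empty PGDLLIMPORT has no bearing on that.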

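Back in Listing 1, the behavior of the if-else statement should now be clear. ExecutorRun_hook is NULL by default, so unless a plugin has installed a hook, ExecutorRun() calls standard_ExecutorRun(). That settles the questions about the variables referenced by ExecutorRun().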
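The dispatch described above can be sketched in a few lines. This is a minimal illustration under my own assumptions, not openGauss code: ToyQueryDesc, toy_run_hook, toy_run(), toy_standard_run(), plugin_run() and plugin_install() are all hypothetical names, and the "work" is just a counter. It shows the NULL check with fallback, and the usual way a plugin saves the previous hook before installing its own.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for QueryDesc.
struct ToyQueryDesc {
    int calls_seen;  // bumped by each layer that handles the call
};

using toy_run_hook_type = void (*)(ToyQueryDesc* desc);

// Plays the role of "THR_LOCAL ExecutorRun_hook_type ExecutorRun_hook = NULL".
thread_local toy_run_hook_type toy_run_hook = nullptr;

// Default body, playing the role of standard_ExecutorRun().
void toy_standard_run(ToyQueryDesc* desc) {
    assert(desc != nullptr);
    desc->calls_seen += 1;
}

// Entry point, playing the role of ExecutorRun().
void toy_run(ToyQueryDesc* desc) {
    if (toy_run_hook) {
        (*toy_run_hook)(desc);   // a plugin took over
    } else {
        toy_standard_run(desc);  // nobody installed a hook
    }
}

// What a plugin might do: remember the previous hook, then chain to it.
static thread_local toy_run_hook_type prev_hook = nullptr;

void plugin_run(ToyQueryDesc* desc) {
    desc->calls_seen += 10;      // the plugin's own extra work
    if (prev_hook) {
        (*prev_hook)(desc);
    } else {
        toy_standard_run(desc);  // fall through to the default body
    }
}

void plugin_install() {
    prev_hook = toy_run_hook;
    toy_run_hook = plugin_run;
}
```

Before plugin_install() is called, toy_run() goes straight to the default body; afterwards the plugin runs first and then falls through to it. Note that the pointer is never "imported" by a macro: it is an ordinary variable that some code assigns.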

For more on hooks, I recommend this article: The PostgreSQL hook mechanism

standard_ExecutorRun()

This is the function that is actually called, by default, when the executor runs. Its source is as follows:

// Listing 5
//src/gausskernel/runtime/executor/execMain.cpp
void standard_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
{
    EState *estate = NULL;
    CmdType operation;
    DestReceiver *dest = NULL;
    bool send_tuples = false;
    MemoryContext old_context;
    instr_time starttime;
    double totaltime = 0;

    /* sanity checks */
    Assert(queryDesc != NULL);
    estate = queryDesc->estate;
    Assert(estate != NULL);
    Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));

    /*
     * Switch into per-query memory context
     */
    old_context = MemoryContextSwitchTo(estate->es_query_cxt);
······
    /*
     * extract information from the query descriptor and the query feature.
     */
    operation = queryDesc->operation;
    dest = queryDesc->dest;
······
    /*
     * run plan
     */
    if (!ScanDirectionIsNoMovement(direction)) {
        if (queryDesc->planstate->vectorized) {
            ExecuteVectorizedPlan(estate, queryDesc->planstate, operation, send_tuples, count, direction, dest);
        } else {
#ifdef ENABLE_MOT
            ExecutePlan(estate, queryDesc->planstate, operation, send_tuples,
                count, direction, dest, queryDesc->mot_jit_context);
#else
            ExecutePlan(estate, queryDesc->planstate, operation, send_tuples, count, direction, dest);
#endif
        }
    }
······
    (void)MemoryContextSwitchTo(old_context);
}

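The function's first parameter, queryDesc, is a pointer of type QueryDesc*. It points to the structure that holds all the information the executor needs to run the query, and it is the most important variable during executor run. The QueryDesc structure looks like this: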

// Listing 6
//src/include/executor/exec/execdesc.h
typedef struct QueryDesc {
    CmdType operation;            /* CMD_SELECT, CMD_UPDATE, etc. */
    PlannedStmt* plannedstmt;     /* planner's output, or null if utility */
    Node* utilitystmt;            /* utility statement, or null */
    const char* sourceText;       /* source text of the query */
    Snapshot snapshot;            /* snapshot to use for query */
    Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
    DestReceiver* dest;           /* the destination for tuple output */
    ParamListInfo params;         /* param values being passed in */
    int instrument_options;       /* OR of InstrumentOption flags */
    TupleDesc tupDesc;    /* descriptor for result tuples */
    EState* estate;       /* executor's query-wide state */
    PlanState* planstate; /* tree of per-plan-node state */
    struct Instrumentation* totaltime; /* total time spent in ExecutorRun */
    bool executed;                     /* if the query already executed */
#ifdef ENABLE_MOT
    JitExec::JitContext* mot_jit_context;   /* MOT JIT context required for executing LLVM jitted code */
#endif
} QueryDesc;

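In Listing 6, QueryDesc stands for "query descriptor". The members on lines 4~12 are supplied by CreateQueryDesc(), which initializes them when it creates a QueryDesc. Among them, plannedstmt points to the planner's output, the plan tree to be executed, which drives the select, insert, update, or delete over the table's tuples (it is NULL for utility statements). The members on lines 13~15 are set by ExecutorStart(). The member on line 16 is always set to NULL by the core system, but plugins may change it. The member on line 17 records whether the query has already been executed. The member on line 19 exists only when openGauss is built with ENABLE_MOT; it is a pointer to a type declared in the JitExec namespace, and it points to the MOT JIT context required for executing LLVM-jitted code. The JitExec namespace is declared as follows: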

// Listing 7
//src/include/executor/exec/execdesc.h
#ifdef ENABLE_MOT
// forward declaration
namespace JitExec
{
    struct JitContext;
}
#endif
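Listing 7 is only a forward declaration: it names JitContext without defining it, which is enough for QueryDesc to hold a pointer to it. The sketch below illustrates the same C++ technique with hypothetical names of my own (ToyJit, ToyDesc, toy_has_jit(), toy_compiled_count()); it says nothing about what the real JitContext contains.

```cpp
#include <cassert>

// Forward declaration only: the layout of Context is unknown at this point.
namespace ToyJit {
struct Context;
}

// A struct may still hold a pointer to the incomplete type, as QueryDesc does.
struct ToyDesc {
    bool executed;
    ToyJit::Context* jit_context;
};

// Code that only passes the pointer around never needs the full definition.
bool toy_has_jit(const ToyDesc& desc) {
    return desc.jit_context != nullptr;
}

// The full definition can live elsewhere (here: later in the same file).
namespace ToyJit {
struct Context {
    int compiled_functions;  // hypothetical field, for illustration only
};
}

// Only code that dereferences the pointer must see the definition.
int toy_compiled_count(const ToyDesc& desc) {
    assert(desc.jit_context != nullptr);
    return desc.jit_context->compiled_functions;
}
```

The benefit is that a header like execdesc.h does not have to include the JIT headers just to declare one pointer member.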

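Back in Listing 5, the variable estate defined on line 5 is assigned, on line 15, the executor state stored in the query descriptor. The EState structure is as follows: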

// Listing 8
//src/include/nodes/execnodes.h
typedef struct EState {
    NodeTag type;
    /* Basic state for all query types: */
    ScanDirection es_direction;      /* current scan direction */
    Snapshot es_snapshot;            /* time qual to use */
    Snapshot es_crosscheck_snapshot; /* crosscheck time qual for RI */
    List* es_range_table;            /* List of RangeTblEntry */
    PlannedStmt* es_plannedstmt;     /* link to top of plan tree */
    JunkFilter* es_junkFilter; /* top-level junk filter, if any */
······
    /* Stuff used for firing triggers: */
    List* es_trig_target_relations;      /* trigger-only ResultRelInfos */
    TupleTableSlot* es_trig_tuple_slot;  /* for trigger output tuples */
    TupleTableSlot* es_trig_oldtup_slot; /* for TriggerEnabled */
    TupleTableSlot* es_trig_newtup_slot; /* for TriggerEnabled */
······
#ifdef ENABLE_MOT
    JitExec::JitContext* mot_jit_context;   /* MOT JIT context required for executing LLVM jitted code */
#endif
    PruningResult* pruningResult;
} EState;

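The second parameter of standard_ExecutorRun(), direction, is eventually passed to ExecutePlan(), where it is assigned to estate's member es_direction to record the current scan direction. The third parameter, count, is also passed on to ExecutePlan(). It is not the number of tuples in the target table; it is the maximum number of tuples to process, and, as in PostgreSQL, a count of 0 means no limit, so the plan runs to completion.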
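To make the roles of direction and count concrete, here is a toy loop written under my own assumptions. ToyDirection, ToyState and toy_execute_plan() are hypothetical, the "plan" is just a vector of integers, and the real ExecutePlan() does far more. The sketch only shows that the direction is recorded in the state, that a "no movement" direction yields nothing, and that count == 0 is treated as "no limit".

```cpp
#include <cassert>
#include <vector>

enum ToyDirection { TOY_BACKWARD = -1, TOY_NO_MOVEMENT = 0, TOY_FORWARD = 1 };

// Hypothetical stand-in for the two EState fields this sketch touches.
struct ToyState {
    ToyDirection es_direction;
    long es_processed;
};

// Returns the tuples "sent"; count == 0 means run to completion.
std::vector<int> toy_execute_plan(ToyState* state, const std::vector<int>& source,
                                  long count, ToyDirection direction) {
    assert(state != nullptr);
    std::vector<int> out;
    state->es_direction = direction;  // mirrors estate->es_direction = direction
    state->es_processed = 0;
    if (direction == TOY_NO_MOVEMENT) {
        return out;                   // nothing to fetch
    }
    const long n = static_cast<long>(source.size());
    for (long i = 0; i < n; ++i) {
        const long idx = (direction == TOY_FORWARD) ? i : n - 1 - i;
        out.push_back(source[idx]);
        state->es_processed++;
        if (count != 0 && state->es_processed >= count) {
            break;                    // the caller asked for at most count tuples
        }
    }
    return out;
}
```

With five source rows, a forward run with count 2 yields the first two rows, while a backward run with count 0 yields all five in reverse order.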

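The variable operation defined on line 6 of Listing 5 is assigned, on line 27, the type of the statement to be executed (INSERT, SELECT, and so on), as stored in the query descriptor. The CmdType type is: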

// Listing 9
//src/include/nodes/nodes.h
typedef enum CmdType {
    CMD_UNKNOWN,
    CMD_SELECT,  /* select stmt */
    CMD_UPDATE,  /* update stmt */
    CMD_INSERT,  /* insert stmt */
    CMD_DELETE,  /* delete stmt */
    CMD_MERGE,   /* merge stmt */
    CMD_UTILITY, /* cmds like create, destroy, copy, vacuum,
                  * etc. */
    CMD_PREPARE,  /* prepare stmt */
    CMD_DEALLOCATE,  /* deallocate stmt*/
    CMD_EXECUTE,  /* execure stmt*/
    CMD_TRUNCATE,  /* truncate table*/
    CMD_REINDEX,  /* reindex table/index*/
    CMD_NOTHING,  /* dummy command for instead nothing rules
                   * with qual */
    CMD_DDL,
    CMD_DCL,
    CMD_DML,
    CMD_TCL
} CmdType;

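On line 7, dest is a pointer of type DestReceiver*. A DestReceiver is a tuple receiver object: a set of function pointers specialized for one particular destination, whose job is to deliver tuples there. Whenever the executor runs a query that returns tuples, the results have to go somewhere. That destination may be stdout (for example, the screen), a remote process, or nowhere at all, in which case the results are simply discarded. The DestReceiver structure is as follows: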

// Listing 10
//src/include/tcop/dest.h
typedef struct _DestReceiver DestReceiver;
struct TupleTableSlot;
struct _DestReceiver {
    /* Called for each tuple to be output: */
    void (*receiveSlot)(TupleTableSlot* slot, DestReceiver* self);
    /* Per-executor-run initialization and shutdown: */
    void (*rStartup)(DestReceiver* self, int operation, TupleDesc typeinfo);
    void (*rShutdown)(DestReceiver* self);
    /* Destroy the receiver object itself (if dynamically allocated) */
    void (*rDestroy)(DestReceiver* self);
    /* CommandDest code for this receiver */
    CommandDest mydest;
    /* Private fields might appear beyond this point... */

    /* Send batch*/
    void (*sendBatch)(VectorBatch* batch, DestReceiver* self);

    void (*finalizeLocalStream)(DestReceiver* self);

    /* send sample tuple to coordinator for analyze */
    bool forAnalyzeSampleTuple;

    MemoryContext tmpContext;
};

As you can see, most members of the DestReceiver structure are function pointers; this is the set of functions mentioned above.
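The "struct of function pointers" idea can be sketched as follows. Everything here is hypothetical and simplified (ToyReceiver, the collect_* and discard_* callbacks, toy_send_all()): tuples are plain strings, and the receiver's private state sits directly in the struct rather than in a larger struct that embeds it. The point is only that the sending side calls through the pointers and never needs to know which destination it is talking to.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyReceiver {
    void (*rStartup)(ToyReceiver* self);
    void (*receiveSlot)(const std::string& tuple, ToyReceiver* self);
    void (*rShutdown)(ToyReceiver* self);
    // Private state of the receiver (simplified: kept in the same struct).
    std::vector<std::string> log;
    long received;
};

// Destination 1: keep every tuple, bracketed by startup/shutdown markers.
static void collect_startup(ToyReceiver* self) { self->log.push_back("startup"); }
static void collect_receive(const std::string& tuple, ToyReceiver* self) {
    self->received++;
    self->log.push_back(tuple);
}
static void collect_shutdown(ToyReceiver* self) { self->log.push_back("shutdown"); }

// Destination 2: throw the results away.
static void discard_startup(ToyReceiver*) {}
static void discard_receive(const std::string&, ToyReceiver*) {}
static void discard_shutdown(ToyReceiver*) {}

ToyReceiver make_collecting_receiver() {
    ToyReceiver r = {collect_startup, collect_receive, collect_shutdown, {}, 0};
    return r;
}

ToyReceiver make_discarding_receiver() {
    ToyReceiver r = {discard_startup, discard_receive, discard_shutdown, {}, 0};
    return r;
}

// The sending side only sees the function pointers.
void toy_send_all(const std::vector<std::string>& tuples, ToyReceiver* dest) {
    assert(dest != nullptr);
    (*dest->rStartup)(dest);
    for (const std::string& t : tuples) {
        (*dest->receiveSlot)(t, dest);
    }
    (*dest->rShutdown)(dest);
}
```

Swapping the destination means passing a different ToyReceiver; toy_send_all() itself does not change.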

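The bool variable send_tuples defined on line 8 of Listing 5 indicates whether the resulting tuples need to be sent to a destination. The variable old_context defined on line 9 is used on lines 22 and 46, together with the MemoryContextSwitchTo() function: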

// Listing 11
//src/include/utils/palloc.h
static inline MemoryContext MemoryContextSwitchTo(MemoryContext context)
{
    MemoryContext old = CurrentMemoryContext;

    CurrentMemoryContext = context;
    return old;
}

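So line 22 switches to the per-query memory context and saves the previous one, and line 46 restores it. If you want to learn more about MemoryContext, see this blog post: MemoryContext memory management.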
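The save-and-restore pattern around MemoryContextSwitchTo() can be tried in isolation. The sketch below uses hypothetical stand-ins (ToyContext, ToyCurrentContext, ToySwitchTo(), toy_run_in_context()); a "context" here is only a name, with no allocation behind it, so it shows the switching discipline and nothing else.

```cpp
#include <cassert>

// Hypothetical stand-in: a context is just a label in this sketch.
struct ToyContext {
    const char* name;
};
typedef ToyContext* ToyMemoryContext;

static ToyContext toy_top_context = {"top"};

// Plays the role of CurrentMemoryContext.
thread_local ToyMemoryContext ToyCurrentContext = &toy_top_context;

// Same shape as MemoryContextSwitchTo(): install the new context, return the old one.
inline ToyMemoryContext ToySwitchTo(ToyMemoryContext context) {
    assert(context != nullptr);
    ToyMemoryContext old = ToyCurrentContext;
    ToyCurrentContext = context;
    return old;
}

// Mirrors lines 22 and 46 of Listing 5: switch in, do the work, switch back.
// Returns the name of the context that was current while the work ran.
const char* toy_run_in_context(ToyMemoryContext per_query) {
    ToyMemoryContext old_context = ToySwitchTo(per_query);
    const char* active = ToyCurrentContext->name;  // the "work"
    (void)ToySwitchTo(old_context);
    return active;
}
```

Because the function returns the old context, the caller needs no other bookkeeping to restore it, which is why the pattern appears as a matched pair at the top and bottom of standard_ExecutorRun().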

Lines 33~44 of Listing 5 are the core of standard_ExecutorRun(): they call ExecutePlan() or ExecuteVectorizedPlan() to fetch up to count tuples.

Summary

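The external entry point to the executor is ExecutorRun(), which in turn calls standard_ExecutorRun() (or an installed hook). That function relies on many structures: QueryDesc, which stores all the information the executor needs to run a query; EState, which records the executor's state; DestReceiver, the tuple receiver object; and others not covered in this post. These structures often contain one another, which is loosely reminiscent of inheritance between objects in C++. Once the necessary information has been gathered, standard_ExecutorRun() calls ExecutePlan() or ExecuteVectorizedPlan() to execute the query plan. Internally, ExecProcNode() is then called to perform the appropriate operation for each node of the plan tree according to its type (control nodes, scan nodes, join nodes, materialization nodes).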
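The last step, dispatching on the node type and pulling tuples up the tree one at a time, can be illustrated with a toy plan. This is a sketch under my own assumptions, not the real ExecProcNode(): the node kinds (scan, filter, limit), the ToyPlanState layout, and the functions toy_exec_proc_node() and toy_run_plan() are all hypothetical, and a "tuple" is a single int.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum ToyNodeTag { TOY_SCAN, TOY_FILTER, TOY_LIMIT };

// One struct for all node kinds keeps the sketch short; unused fields stay zero.
struct ToyPlanState {
    ToyNodeTag tag;
    ToyPlanState* child;           // input node, nullptr for a scan
    const std::vector<int>* rows;  // scan: the table
    std::size_t pos;               // scan: next row to return
    int min_value;                 // filter: keep rows >= min_value
    long limit;                    // limit: maximum rows to emit
    long emitted;                  // limit: rows emitted so far
};

bool toy_exec_proc_node(ToyPlanState* node, int* out);

static bool exec_scan(ToyPlanState* n, int* out) {
    if (n->pos >= n->rows->size()) return false;  // end of table
    *out = (*n->rows)[n->pos++];
    return true;
}

static bool exec_filter(ToyPlanState* n, int* out) {
    int v;
    while (toy_exec_proc_node(n->child, &v)) {    // pull from the child
        if (v >= n->min_value) { *out = v; return true; }
    }
    return false;
}

static bool exec_limit(ToyPlanState* n, int* out) {
    if (n->emitted >= n->limit) return false;
    if (!toy_exec_proc_node(n->child, out)) return false;
    n->emitted++;
    return true;
}

// Returns true and fills *out when the node produced a tuple.
bool toy_exec_proc_node(ToyPlanState* node, int* out) {
    assert(node != nullptr);
    switch (node->tag) {
        case TOY_SCAN:   return exec_scan(node, out);
        case TOY_FILTER: return exec_filter(node, out);
        case TOY_LIMIT:  return exec_limit(node, out);
    }
    return false;
}

// Plays the role of the loop in ExecutePlan(): pull from the root until exhausted.
std::vector<int> toy_run_plan(ToyPlanState* root) {
    std::vector<int> result;
    int v;
    while (toy_exec_proc_node(root, &v)) result.push_back(v);
    return result;
}
```

For a table {5, 1, 7, 3, 9} with a filter keeping values of at least 4 under a limit of 2, pulling from the limit node yields 5 and then 7, and the scan never reads past the third row.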