Query Code: Analysis of index_advisor

shmilyyiyi

Tags: database, open source

Article link: https://blog.csdn.net/shmilyyiyi/article/details/135623281

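This article describes the index_advisor module of the OpenGauss database in detail, focusing on its role in providing index optimization suggestions, performance analysis, and concurrency control, and stresses the importance of optimizing query performance and making use of intelligent indexing.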

File path:

src/gausskernel/dbmind/kernel/index_advisor.cpp

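1 Introduction

Database systems play a central role in modern software development. OpenGauss is an open-source relational database management system that offers a high-performance, scalable, secure, and reliable database solution. Its index_advisor module is an important component that provides targeted index optimization suggestions to improve query performance.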

1.1 OpenGauss

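OpenGauss is an open-source database project initiated by Huawei, aiming to deliver first-class performance and reliability. It is compatible with PostgreSQL and adds enhancements and optimizations in many areas on top of it. Its design goals are high performance, high availability, high compatibility, high scalability, and high security.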

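1.2 The index_advisor module

Query performance often directly determines application efficiency, and indexes are the key to improving it. The index_advisor module focuses on optimizing the performance of query statements, in particular by suggesting indexes that speed up queries. It analyzes SQL query statements, identifies potential index optimization opportunities, and generates corresponding suggestions. These suggestions can include creating new indexes, dropping unused indexes, and modifying existing indexes.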

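2 Module overview

2.1 Code structure

Global variables:

These include t_thrd.index_advisor_cxt.stmt_target_list and t_thrd.index_advisor_cxt.stmt_table_list, which store the query's target list and table information.

TargetCell struct:

Represents a column in the query's target list. It holds information such as the column name and alias.

IndexCell struct:

Represents index information, including the index name and the list of index key columns.

TableCell struct:

Represents table information, including the table name, aliases, and index list.

Index-related functions:

These include find_or_create_clmncell, add_index_from_field, and check_joined_tables. They create target and table entries, add indexes, and check joined tables.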

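2.2 Core features

Index suggestion generation: by analyzing the query statement and the table structure, the module generates index suggestions that improve query performance.
Table and column information handling: by parsing the query statement, the module obtains table and column metadata that feeds into the index suggestions.
Driving table selection: in multi-table join queries, the module selects the table best suited to act as the driving table, improving overall query efficiency.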

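3 Code analysis (excerpts)

TargetCell

A struct that represents a target column in a database query. Because a target may have no alias, check that alias_name is not null before using it, to avoid a null pointer dereference.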

typedef struct {
    char *alias_name;
    char *column_name;
} TargetCell;
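
The null check mentioned above can be sketched as follows. This is only an illustration: display_name is a hypothetical helper that does not exist in index_advisor.cpp, and the struct is repeated so that the example compiles on its own.

```cpp
#include <string>

// Same layout as the struct shown above; either pointer may be null.
typedef struct {
    char *alias_name;
    char *column_name;
} TargetCell;

// Hypothetical helper (not part of index_advisor.cpp): returns the name a
// query would use for this target, preferring the alias when one exists.
std::string display_name(const TargetCell *cell)
{
    if (cell == nullptr) {
        return "";
    }
    if (cell->alias_name != nullptr) {   // the check the text asks for
        return cell->alias_name;
    }
    if (cell->column_name != nullptr) {
        return cell->column_name;
    }
    return "";
}
```

For a target written as name AS n, display_name yields "n"; for a target without an alias it falls back to the column name instead of dereferencing a null alias_name.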

calculate_field_cardinality

A function that estimates the cardinality of a field by running a query over a sample of the table's rows.

uint4 calculate_field_cardinality(char *schema_name, char *table_name, const char *field_expr)
{
    const int sample_factor = 2;

    uint4 cardinality;
    uint4 table_count = get_table_count(schema_name, table_name);
    uint4 sample_rows = (table_count / sample_factor) > MAX_SAMPLE_ROWS ? MAX_SAMPLE_ROWS : (table_count / sample_factor);
    StringInfoData query_string;
    initStringInfo(&query_string);
    appendStringInfo(&query_string, "select count(*) from ( select * from %s.%s limit %d) where %s",
        schema_name, table_name, sample_rows, field_expr);
    StmtResult *result = execute_stmt(query_string.data, true);
    pfree_ext(query_string.data);
    uint4 row = tuple_to_uint(result->tuples);
    (*result->pub.rDestroy)((DestReceiver *)result);
    cardinality = row == 0 ? sample_rows : (sample_rows / row);
    return cardinality;
}

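1) Sampling factor and table row count

The constant sample_factor scales down the number of rows to sample. get_table_count returns the total row count of the given table, table_count.

2) Determining the number of sampled rows

sample_rows is table_count / sample_factor, capped at the predefined maximum MAX_SAMPLE_ROWS. The cap keeps the sampling query cheap on large tables.

3) Building the query string

A StringInfoData buffer holds the SQL text. The inner subquery uses a LIMIT clause to read sample_rows rows from the table, and the outer query counts the sampled rows that satisfy field_expr.

4) Executing the query and handling the result

execute_stmt runs the query and returns the result. tuple_to_uint converts the result into a uint4 row count, row. The result's destroy callback then releases the associated resources.

5) Computing the field cardinality

If row is 0, the function uses sample_rows as the cardinality, which avoids a division by zero. Otherwise the cardinality is sample_rows / row. Both operands are unsigned, so the result cannot be negative.

6) Returning the result

The function returns cardinality.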
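
The arithmetic in steps 2 and 5 can be tried in isolation with the sketch below. It leaves out the database calls entirely. The cap of 10000 is an assumed value chosen for illustration (the real MAX_SAMPLE_ROWS is defined in the OpenGauss source), and the function names are hypothetical.

```cpp
#include <cstdint>

// Assumed values for illustration only.
constexpr uint32_t kSampleFactor = 2;
constexpr uint32_t kMaxSampleRows = 10000;  // stand-in for MAX_SAMPLE_ROWS

// Step 2: sample a fraction of the table, but never more than the cap.
uint32_t sample_row_count(uint32_t table_count)
{
    uint32_t scaled = table_count / kSampleFactor;
    return scaled > kMaxSampleRows ? kMaxSampleRows : scaled;
}

// Step 5: row is the count returned by the sampling query.
// A zero count falls back to sample_rows instead of dividing by zero.
uint32_t estimate_cardinality(uint32_t sample_rows, uint32_t row)
{
    return row == 0 ? sample_rows : sample_rows / row;
}
```

With a 100-row table the sample is 50 rows; if the query counts 5 rows, the estimate is 50 / 5 = 10. Because the division is an integer division, a count larger than the sample size yields 0.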

get_partition_key_name

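Collects the partition keys, and for subpartitioned tables also the subpartition keys, and stores them in the corresponding lists of the table structure.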

void get_partition_key_name(Relation rel, TableCell *table, bool is_subpartition)
{
    int partkey_column_n = 0;
    int2vector *partkey_column = NULL;
    partkey_column = GetPartitionKey(rel->partMap);
    partkey_column_n = partkey_column->dim1;
    for (int i = 0; i < partkey_column_n; i++) {
        table->partition_key_list =
            lappend(table->partition_key_list, get_attname(rel->rd_id, partkey_column->values[i]));
    }
    if (is_subpartition) {
        List *partOidList = relationGetPartitionOidList(rel);
        Assert(list_length(partOidList) != 0);
        Partition subPart = partitionOpen(rel, linitial_oid(partOidList), NoLock);
        Relation subPartRel = partitionGetRelation(rel, subPart);
        int subpartkey_column_n = 0;
        int2vector *subpartkey_column = NULL;
        subpartkey_column = GetPartitionKey(subPartRel->partMap);
        subpartkey_column_n = subpartkey_column->dim1;
        for (int i = 0; i < subpartkey_column_n; i++) {
            table->subpartition_key_list =
                lappend(table->subpartition_key_list, get_attname(rel->rd_id, subpartkey_column->values[i]));
        }
        releaseDummyRelation(&subPartRel);
        partitionClose(rel, subPart, NoLock);
    }
}

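1) Getting the partition key information:

The function declares partkey_column_n to hold the number of partition key columns and the pointer partkey_column to hold the partition key columns. GetPartitionKey returns the partition key columns, and the dim1 member gives their count.

2) Iterating over the partition key:

The loop visits each partition key column, obtains its name with get_attname, and appends the name with lappend to table->partition_key_list, the list that stores the partition key.

3) Handling the subpartition key:

If is_subpartition is set, the function fetches the OID list of the partitions and opens the first one. It then gets the subpartition key columns, iterates over them, and appends each column name to table->subpartition_key_list. Finally it releases the relation built for that partition and closes the partition.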
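
The control flow of the three steps can be modelled without the OpenGauss internals, as in the sketch below. All types here are stand-ins invented for illustration: KeyColumns plays the role of int2vector, AttrNames replaces the catalog lookup done by get_attname, and a null pointer for the subpartition columns corresponds to is_subpartition being false.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using KeyColumns = std::vector<int16_t>;          // attribute numbers of key columns
using AttrNames = std::map<int16_t, std::string>; // attribute number -> column name

struct TableKeys {
    std::vector<std::string> partition_key_list;
    std::vector<std::string> subpartition_key_list;
};

// Resolve each attribute number to its name and append it to out.
static void append_names(const KeyColumns &cols, const AttrNames &names,
                         std::vector<std::string> &out)
{
    for (int16_t attno : cols) {
        auto it = names.find(attno);
        if (it != names.end()) {
            out.push_back(it->second);
        }
    }
}

// Partition keys are always collected; subpartition keys only when present.
TableKeys collect_partition_keys(const AttrNames &names, const KeyColumns &part_cols,
                                 const KeyColumns *subpart_cols)
{
    TableKeys keys;
    append_names(part_cols, names, keys.partition_key_list);
    if (subpart_cols != nullptr) {
        append_names(*subpart_cols, names, keys.subpartition_key_list);
    }
    return keys;
}
```

The sketch does not model opening and closing the first partition, which the real function needs in order to reach the subpartition key.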

is_tmp_table

Checks whether the given table name appears in the temporary table list by comparing it against each entry in turn.

bool is_tmp_table(const char *table_name)
{
    ListCell *item = NULL;

    foreach (item, g_tmp_table_list) {
        char *tmp_table_name = (char *)lfirst(item);
        if (strcasecmp(tmp_table_name, table_name) == 0) {
            return true;
        }
    }

    return false;
}

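1) Iterating over the temporary table list:

A foreach loop visits every table name in the global variable g_tmp_table_list.

2) Comparing table names:

Each name in the list is compared with the given table name using strcasecmp, which ignores case. If a match is found, the function returns true, meaning the table is a temporary table. If the loop finishes without a match, it returns false.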
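
The same lookup can be reproduced outside the database with standard containers. The sketch below assumes a POSIX system, since strcasecmp comes from <strings.h>; the function name and the std::vector standing in for g_tmp_table_list are illustrative choices.

```cpp
#include <strings.h>  // strcasecmp (POSIX)
#include <string>
#include <vector>

// Returns true when table_name matches any entry, ignoring case.
bool is_tmp_table_sketch(const std::vector<std::string> &tmp_tables, const char *table_name)
{
    if (table_name == nullptr) {
        return false;
    }
    for (const std::string &tmp : tmp_tables) {
        if (strcasecmp(tmp.c_str(), table_name) == 0) {
            return true;
        }
    }
    return false;
}
```

A list containing tmp_orders therefore matches TMP_ORDERS as well. The scan is linear in the number of temporary tables.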

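find_or_create_tblcell

Looks up a table entry in the global table list and creates it if it does not exist. It gives the rest of the index advisor a single point for managing and accessing table information.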

TableCell *find_or_create_tblcell(char *table_name, char *alias_name, char *schema_name, bool ispartition,
    bool issubpartition)
{
    if (!table_name) {
        return NULL;
    }
    if (is_tmp_table(table_name)) {
        ereport(WARNING, (errmsg("can not advise for table %s because it is a temporary table.", table_name)));
        return NULL;
    }

    // search the table among existing tables
    ListCell *item = NULL;
    ListCell *sub_item = NULL;

    if (t_thrd.index_advisor_cxt.stmt_table_list != NIL) {
        foreach (item, t_thrd.index_advisor_cxt.stmt_table_list) {
            TableCell *cur_table = (TableCell *)lfirst(item);
            char *cur_schema_name = cur_table->schema_name;
            char *cur_table_name = cur_table->table_name;
            if (IsSameRel(cur_schema_name, cur_table_name, schema_name, table_name)) {
                if (alias_name) {
                    foreach (sub_item, cur_table->alias_name) {
                        char *cur_alias_name = (char *)lfirst(sub_item);
                        if (strcasecmp(cur_alias_name, alias_name) == 0) {
                            return cur_table;
                        }
                    }
                    cur_table->alias_name = lappend(cur_table->alias_name, alias_name);
                }
                return cur_table;
            }
            foreach (sub_item, cur_table->alias_name) {
                char *cur_alias_name = (char *)lfirst(sub_item);
                if (IsSameRel(cur_schema_name, cur_alias_name, schema_name, table_name)) {
                    return cur_table;
                }           
            }
        }
    }

    RangeVar* rtable = makeRangeVar(schema_name, table_name, -1);
    Oid table_oid = RangeVarGetRelid(rtable, NoLock, true);
    if (table_oid == InvalidOid || check_relation_type_valid(table_oid) == false) {
        ereport(WARNING, (errmsg("can not advise for table %s due to invalid oid or irregular table.", table_name))); 
        return NULL;                                                                
    }

    // create a new table
    TableCell *new_table = NULL;
    new_table = (TableCell *)palloc0(sizeof(*new_table));
    if (schema_name == NULL) {
        new_table->schema_name = get_namespace_name(get_rel_namespace(table_oid));
    } else {
        new_table->schema_name = schema_name;
    }
    new_table->table_name = table_name;
    new_table->alias_name = NIL;
    if (alias_name) {
        new_table->alias_name = lappend(new_table->alias_name, alias_name);
    }
    new_table->index = NIL;
    new_table->join_cond = NIL;
    new_table->index_print = NIL;
    new_table->partition_key_list = NIL;
    new_table->subpartition_key_list = NIL;
    new_table->ispartition = ispartition;
    new_table->issubpartition = issubpartition;
    // set the partition key of the partition table, including partition and subpartition
    Relation rel = heap_open(table_oid, AccessShareLock);
    if RelationIsPartitioned(rel) {
        if RelationIsSubPartitioned(rel) {
            get_partition_key_name(rel, new_table, true);
        } else {
            get_partition_key_name(rel, new_table);
        }
    }
    heap_close(rel, AccessShareLock);
    t_thrd.index_advisor_cxt.stmt_table_list = lappend(t_thrd.index_advisor_cxt.stmt_table_list, new_table);
    return new_table;
}

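1) Parameter validation and initial checks

The function first checks whether table_name is null. If it is, the function returns NULL to signal an invalid argument. It then calls is_tmp_table; if the table is a temporary table, it raises a warning and also returns NULL. These checks ensure that the input is valid.

2) Searching the existing table list

The function walks stmt_table_list, the list of table entries already recorded, with a foreach loop and checks whether an entry's schema_name and table_name match the arguments. If a matching entry is found and no alias_name was passed, the entry is returned directly. If an alias_name was passed, the function scans the entry's alias list.

If the alias is already in the list, the entry is returned as is.

If the alias is not in the list, it is appended and the entry is returned. An entry also counts as a match when the given table_name equals one of the aliases recorded for it. If no entry matches, the function moves on to the next step.

3) Building the range variable and getting the table OID

makeRangeVar builds the range variable rtable, and RangeVarGetRelid resolves it to the table's OID. If the OID is invalid, or check_relation_type_valid reports an unsupported relation type, the function raises a warning and returns NULL. This step guarantees that the table to work on is valid.

4) Creating a new table entry

If none of the previous steps found a matching entry, a new one is created. palloc0 allocates the memory, and the function then sets the basic attributes such as schema_name, table_name, and alias_name. If the schema_name argument is null, the schema is taken from the table's namespace. Next, heap_open opens the table, the partition key information is collected if the table is partitioned, and the new entry is appended to stmt_table_list.

5) Returning the result

The function returns either the newly created entry or the matching entry found in the existing list.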
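
The find-or-create pattern described above can be sketched with standard containers. This is a simplified model, not the OpenGauss code: TableEntry and TableRegistry are hypothetical names, names are compared exactly rather than through IsSameRel, an empty string stands for a missing alias, and the OID and partition handling of steps 3 and 4 is left out.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct TableEntry {
    std::string schema_name;
    std::string table_name;
    std::vector<std::string> aliases;
};

class TableRegistry {
public:
    // Returns the entry for (schema, table), creating it on first use.
    TableEntry *find_or_create(const std::string &schema, const std::string &table,
                               const std::string &alias)
    {
        for (const auto &entry : entries_) {
            if (entry->schema_name == schema && entry->table_name == table) {
                add_alias(*entry, alias);  // record a new alias, ignore a known one
                return entry.get();
            }
            // The name may also be an alias recorded for an existing entry.
            if (entry->schema_name == schema && has_alias(*entry, table)) {
                return entry.get();
            }
        }
        std::unique_ptr<TableEntry> created(new TableEntry{schema, table, {}});
        add_alias(*created, alias);
        entries_.push_back(std::move(created));
        return entries_.back().get();
    }

    std::size_t size() const { return entries_.size(); }

private:
    static bool has_alias(const TableEntry &entry, const std::string &alias)
    {
        for (const auto &known : entry.aliases) {
            if (known == alias) {
                return true;
            }
        }
        return false;
    }

    static void add_alias(TableEntry &entry, const std::string &alias)
    {
        if (!alias.empty() && !has_alias(entry, alias)) {
            entry.aliases.push_back(alias);
        }
    }

    std::vector<std::unique_ptr<TableEntry>> entries_;
};
```

Calling find_or_create twice for the same table with different aliases returns the same entry and leaves both aliases recorded on it, which is the behaviour step 2 describes.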

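check_joined_tables

Checks whether more than one joined table exists, that is, whether g_drived_tables is non-empty and holds more than one entry.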

static bool check_joined_tables()
{
    return (g_drived_tables && g_drived_tables->length > 1);
}

add_index_from_group_order

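Extracts the columns referenced in a GROUP BY or ORDER BY clause and adds the corresponding indexes to the given table. Along the way it converts expressions and extracts field information. The flag_group_order parameter lets one implementation serve both the GROUP BY and the ORDER BY case.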

void add_index_from_group_order(TableCell *table, List *clause, List *target_list, bool flag_group_order)
{
    ListCell *item = NULL;
    char *schema_name = NULL, *table_name = NULL, *index_name = NULL;

    foreach (item, clause) {
        Node *node = NULL;
        List *fields = NULL;

        if (flag_group_order) {
            node = (Node *)lfirst(item);
        } else {
            node = ((SortBy *)lfirst(item))->node;
        }
        if (nodeTag(node) == T_A_Const) {
            node = transform_group_order_node(node, target_list);
        }
        if (nodeTag(node) != T_ColumnRef)
            break;
        fields = ((ColumnRef *)node)->fields;
        split_field_list(fields, &schema_name, &table_name, &index_name);
        add_index(table, index_name);
    }
}

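1) Iterating over the expression list:

A foreach loop visits each item of the given clause.

2) Getting the expression node:

The flag_group_order flag selects how the node is obtained.

For GROUP BY, the list item is the node itself; for ORDER BY, the node is taken from the SortBy structure.

3) Converting A_Const nodes:

If the node is of type A_Const, transform_group_order_node converts it, using the target list.

4) Checking the node type and extracting field information:

If the node is not a ColumnRef, the loop exits, so the remaining items of the clause are not processed.

Otherwise the fields of the ColumnRef are split into the schema name, the table name, and the column name, which the code stores in index_name.

5) Calling add_index:

add_index adds the extracted name to the table's index list.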
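
The loop above, including its early exit, can be modelled with a small stand-in for the parse nodes. Everything in the sketch is hypothetical: Node and NodeKind replace the real parse tree types, and it assumes that a constant in GROUP BY or ORDER BY is a 1-based position in the target list, which is how the conversion step is interpreted here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class NodeKind { ColumnRef, Ordinal, Other };

struct Node {
    NodeKind kind;
    std::string column;  // used when kind == ColumnRef
    int ordinal;         // 1-based target-list position when kind == Ordinal
};

// Collects column names from a clause, stopping at the first item that
// cannot be resolved to a plain column reference.
std::vector<std::string> columns_from_clause(const std::vector<Node> &clause,
                                             const std::vector<Node> &target_list)
{
    std::vector<std::string> index_columns;
    for (Node node : clause) {
        if (node.kind == NodeKind::Ordinal) {
            // Assumed meaning of the constant: a position in the target list.
            if (node.ordinal < 1 || static_cast<std::size_t>(node.ordinal) > target_list.size()) {
                break;
            }
            node = target_list[node.ordinal - 1];
        }
        if (node.kind != NodeKind::ColumnRef) {
            break;  // mirrors the break in the original loop
        }
        index_columns.push_back(node.column);
    }
    return index_columns;
}
```

For a clause equivalent to ORDER BY country, 1, lower(x), zip with city as the first target, the sketch collects country and city and then stops at the expression, so zip is never reached.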