doris查询性能优化

qq_57106792

已于 2023-12-26 16:42:45 修改

阅读量2.3k

点赞数 6

文章标签：数据库 java 服务器

于 2023-12-26 16:26:34 首次发布

本文链接：https://blog.csdn.net/qq_57106792/article/details/135225336

版权

doris查询性能优化

一、执行计划的查询（分析基础）

(1)首先要开启profile

set enable_profile=true;

(2)执行所要查询的sql

(3)查询当前sql的profile

show query profile "/";（找到对应的sql）

(4)获取QueryId,查询总的执行计划

示例:show query profile "/271dc937f9564af0-9ec7ce5755d88c66";

以下是每个节点的执行时间概览

结果示例

┌────────────────────────┐

│[-1: VDataBufferSender] │

│Fragment: 0 │

│MaxActiveTime: 264.641ms│

└────────────────────────┘

│

┌──────────────────────┐

│[6: VAGGREGATION_NODE]│

│Fragment: 0 │

└──────────────────────┘

│

┌───────────────────┐

│[5: VEXCHANGE_NODE]│

│Fragment: 0 │

└───────────────────┘

│

┌────────────────────────┐

│[5: VDataStreamSender] │

│Fragment: 1 │

│MaxActiveTime: 262.945ms│

└────────────────────────┘

│

┌──────────────────────┐

│[2: VAGGREGATION_NODE]│

│Fragment: 1 │

└──────────────────────┘

│

┌──────────────────────┐

│[4: VAGGREGATION_NODE]│

│Fragment: 1 │

└──────────────────────┘

│

┌───────────────────┐

│[3: VEXCHANGE_NODE]│

│Fragment: 1 │

└───────────────────┘

│

┌────────────────────────┐

│[3: VDataStreamSender] │

│Fragment: 2 │

│MaxActiveTime: 252.639ms│

└────────────────────────┘

│

┌──────────────────────┐

│[1: VAGGREGATION_NODE]│

│Fragment: 2 │

└──────────────────────┘

│

┌───────────────────────────────────────────────┐

│[0: VNewOlapScanNode(test_scm_employee_salary)]│

│Fragment: 2 │

└───────────────────────────────────────────────┘

│

┌───────────┐

│[VScanner] │

│Fragment: 2│

└───────────┘

│

┌─────────────────┐

│[SegmentIterator]│

│Fragment: 2 │

└─────────────────┘

(5)获取每个节点详细的执行计划

1、先获取实例的id(找到执行时间最长的一个实例)

示例:show query profile "/271dc937f9564af0-9ec7ce5755d88c66/1";

2、根据实例id查询详细的执行计划

示例:show query profile "/271dc937f9564af0-9ec7ce5755d88c66/1/271dc937f9564af0-9ec7ce5755d88c8c";

结果示例

┌──────────────────────────────────────────────┐

│[5: VDataStreamSender] │

│(Active: 68.512us, non-child: 0.02) │

│ - Counters: │

│ - BlocksSent: 2 │

│ - BrpcSendTime: 33.239us │

│ - BrpcSendTime.Wait: 136.814us │

│ - BytesSent: 0.00 │

│ - CompressTime: 0ns │

│ - IgnoreRows: 0 │

│ - LocalBytesSent: 8.00 B │

│ - LocalSendTime: 10.821us │

│ - LocalSentRows: 1 │

│ - OverallThroughput: 0.0 /sec │

│ - PeakMemoryUsage: 3.13 KB │

│ - SerializeBatchTime: 0ns │

│ - SplitBlockDistributeByChannelTime: 0ns│

│ - SplitBlockHashComputeTime: 0ns │

│ - UncompressedRowBatchSize: 0.00 │

└──────────────────────────────────────────────┘

│

┌────────────────────────────────────┐

│[2: VAGGREGATION_NODE] │

│(Active: 262.903ms, non-child: 0.18)│

│ - Counters: │

│ - BuildTime: 96.322us │

│ - DeserializeDataTime: 0ns │

│ - ExecTime: 3.641us │

│ - ExprTime: 87.430us │

│ - GetResultsTime: 0ns │

│ - HashTableComputeTime: 0ns │

│ - HashTableInputCount: 0 │

│ - HashTableIterateTime: 0ns │

│ - HashTableSize: 0 │

│ - InsertKeysToColumnTime: 0ns │

│ - MaxRowSizeInBytes: 0 │

│ - MergeTime: 0ns │

│ - PeakMemoryUsage: 0.00 │

│ - ProjectionTime: 0ns │

│ - RowsReturned: 1 │

│ - RowsReturnedRate: 3 │

│ - SerializeDataTime: 0ns │

│ - SerializeKeyTime: 0ns │

│ - SerializeResultTime: 2.613us│

│ - StreamingAggTime: 0ns │

└────────────────────────────────────┘

│

┌────────────────────────────────────────────┐

│[4: VAGGREGATION_NODE] │

│(Active: 262.324ms, non-child: 7.37) │

│ - Counters: │

│ - BuildTime: 16.985ms │

│ - DeserializeDataTime: 0ns │

│ - ExecTime: 0ns │

│ - ExprTime: 39.848us │

│ - GetResultsTime: 5.614ms │

│ - HashTableComputeTime: 16.894ms │

│ - HashTableInputCount: 44.321K (44321)│

│ - HashTableIterateTime: 289.882us │

│ - HashTableSize: 44.321K (44321) │

│ - InsertKeysToColumnTime: 5.238ms │

│ - MaxRowSizeInBytes: 71 │

│ - MergeTime: 0ns │

│ - PeakMemoryUsage: 0.00 │

│ - ProjectionTime: 0ns │

│ - RowsReturned: 44.321K (44321) │

│ - RowsReturnedRate: 168.954K /sec │

│ - SerializeDataTime: 0ns │

│ - SerializeKeyTime: 5.242ms │

│ - SerializeResultTime: 0ns │

│ - StreamingAggTime: 0ns │

└────────────────────────────────────────────┘

┌┘

│

┌───────────────────────────────────────────┐

│[3: VEXCHANGE_NODE] │

│(Active: 239.271ms, non-child: 76.51) │

│ - Counters: │

│ - BytesReceived: 584.90 KB │

│ - DataArrivalWaitTime: 238.516ms │

│ - DecompressBytes: 1.22 MB │

│ - DecompressTime: 746.301us │

│ - DeserializeRowBatchTimer: 1.965ms │

│ - FirstBatchArrivalWaitTime: 15.828ms│

│ - PeakMemoryUsage: 2.55 KB │

│ - ProjectionTime: 0ns │

│ - RowsReturned: 44.321K (44321) │

│ - RowsReturnedRate: 185.233K /sec │

│ - SendersBlockedTotalTimer(*): 0ns │

└───────────────────────────────────────────┘

二、具体优化技巧(规范)

一、服务器层面

主要考虑服务器的CPU，内核数，内存大小，磁盘IO,节点数量等。

二、技术层面

一、代码规范

1、技术使用规范

在进行技术层面的优化时首先是不能使用mybatis-plus，要使用xml的方式。

2、sql编写规范

分页查询的时候不要写成两条sql，避免当一条sql的条件变化时另一条sql没有改条件而出问题。

示例:

合并sql写法

<select id="queryEmployeeSalaryByPage" resultType="com.c2f.hmos.scm.core.employeesalary.repo.entity.EmployeeSalaryDO">
        SELECT
            period_id,
            employee_code,
            dept_code,
            emp_type_code,
            duty_code,
            SUM( amount ) AS totalSalary,
            temp.pageTotal
        FROM
            scm_employee_salary inner join
                ( SELECT count(1) as pageTotal
        FROM(
        SELECT
        1
        FROM
        scm_employee_salary
        where
        is_deleted = 0
        <include refid="common"/>
        GROUP BY
        period_id,
        employee_code,
        dept_code,
        emp_type_code,
        duty_code
        ) t
                    ) temp
        where
        is_deleted = 0
        <include refid="common"/>
        GROUP BY
        period_id,
        employee_code,
        dept_code,
        emp_type_code,
        duty_code,temp.pageTotal
        order by period_id desc ,dept_code
LIMIT #{page},#{size}
    </select>

    <sql id="common">
        <if test="!conditions.isEmpty()">
            and
            <foreach collection="conditions.entrySet()" item="val" index="key" separator=" AND ">
                <if test="val != null">
                    ${key} = #{val}
                </if>
            </foreach>
        </if>
    </sql>

通过以上写法可以将两条sql写成一条执行，性能一致。

二、数据库规范

1、首先进行sql优化

首先根据sql优化的一些规则进行。

参考链接15个常用的sql优化技巧-CSDN博客

在分页查询时可以根据id进行子查询走id的索引可以显著提高性能如：

在业务中如果需要对多个字段进行排序可以将主要的字段放在子查询里次要对字段在查询后对查询结果进行排序这样在数据量大的时候在基本满足业务要求的情况下显著提升性能如：

2、模型优化(前缀索引)

在doris的模型创建时会根据一行数据的前 36个字节作为这行数据的前缀索引，这个前缀索引是已一行数据的前36个字节为主，当遇到varchar类型时会断开已之前的为准。所以在创建模型时可以根据业务的要求来确定字段顺序来判断是否需要走前缀索引

示例:

doris中有两种数据和字段一样的表(scm_employee_salary和test_scm_employee_salary),

scm_employee_salary的前缀索引是 period_id,dept_code,id，而test_scm_employee_salary的前缀索引只是id。

sql

SELECT

period_id,

employee_code,

dept_code,

emp_type_code,

duty_code,

SUM( amount ) AS totalSalary

FROM

scm_employee_salary

where

is_deleted = 0

GROUP BY

period_id,

employee_code,

dept_code,

emp_type_code,

duty_code

order by period_id desc ,dept_code

LIMIT 40000,20;

SELECT

period_id,

employee_code,

dept_code,

emp_type_code,

duty_code,

SUM( amount ) AS totalSalary

FROM

test_scm_employee_salary

where

is_deleted = 0

GROUP BY

period_id,

employee_code,

dept_code,

emp_type_code,

duty_code

order by period_id desc ,dept_code

LIMIT 40000,20

查看执行计划

test_scm_employee_salary：

scm_employee_salary：

对比发现scm_employee_salary的hash计算时间，数据合并时间，以及序列化key时间等等都比test_scm_employee_salary的要短，通过前缀索引的调整整体可以提升25%左右的性能

3、doris索引优化

官方文档https://doris.apache.org/zh-CN/docs/data-table/index/index-overview。

doris的索引目前可以使用的只有一个BloomFilter 索引，提升效果不佳，具体使用看业务需求。

4、分区分桶(根据合适的列设置分区,分桶要根据服务器磁盘来定)

分区数据在分的区中均匀分布可以有效提升性能，在分的区中如果数据分布很不均可能会导致性能下降

官方文档https://doris.apache.org/zh-CN/docs/advanced/partition/dynamic-partition

示例:

在表创建的时候选择合适的字段设置分区:

模型创建sql

CREATE TABLE `scm_employee_salary` (

`period_id` bigint(20) NULL COMMENT '期间ID',

`dept_code` varchar(128) NULL COMMENT '成本科室编码',

`id` bigint(20) NOT NULL COMMENT 'id',

`employee_code` varchar(128) NULL COMMENT '员工工号',

`employee_name` varchar(128) NULL COMMENT '员工姓名',

`grant_dept_code` varchar(128) NULL COMMENT '所属科室编码',

`grant_dept_name` varchar(128) NULL COMMENT '所属科室名称',

`dept_name` varchar(128) NULL COMMENT '成本科室名称',

`emp_type_code` varchar(128) NULL COMMENT '员工类型编码',

`emp_type_name` varchar(128) NULL COMMENT '员工类型名称',

`duty_code` varchar(128) NULL COMMENT '职称编码编码',

`duty_name` varchar(128) NULL COMMENT '职称编码名称',

`salary_item_code` varchar(128) NULL COMMENT '工资项编码',

`salary_item_name` varchar(128) NULL COMMENT '工资项名称',

`amount` decimal(20, 8) NULL COMMENT '金额',

`org_code` varchar(128) NULL COMMENT '医疗机构代码（冗余）',

`org_name` varchar(128) NULL COMMENT '医疗机构名称（冗余）',

`hospital_code` varchar(128) NULL COMMENT '院区代码（冗余）',

`hospital_name` varchar(128) NULL COMMENT '院区名称（冗余）',

`extend` varchar(255) NULL COMMENT '扩展字段',

`create_by` bigint(20) NULL COMMENT '创建人',

`create_org_id` bigint(20) NULL COMMENT '创建人机构ID',

`gmt_create` datetime NULL COMMENT '创建时间',

`modify_by` bigint(20) NULL COMMENT '修改人',

`modify_org_id` bigint(20) NULL COMMENT '修改机构ID',

`gmt_modify` datetime NULL COMMENT '修改时间',

`remark` varchar(255) NULL COMMENT '备注',

`app_code` varchar(64) NULL COMMENT 'app编码',

`model_code` varchar(64) NULL COMMENT '模型编码',

`employee_id` bigint(20) NULL COMMENT '员工id',

`grant_dept_id` bigint(20) NULL COMMENT '所属科室id',

`dept_id` bigint(20) NULL COMMENT '成本科室id',

`duty_id` bigint(20) NULL COMMENT '职称id',

`is_deleted` smallint(6) NULL DEFAULT "0" COMMENT '是否删除 0=否， 1=是',

`org_id` bigint(20) NULL COMMENT '组织ID'

) ENGINE=OLAP

UNIQUE KEY(`period_id`, `dept_code`, `id`)

COMMENT '人员薪资'

PARTITION BY RANGE(`period_id`)

(

PARTITION period2023 VALUES [('199701'),('202312')),

PARTITION period2024 VALUES [('202401'),('202412')),

PARTITION period2025 VALUES [('202501'),('202512')),

PARTITION period2026 VALUES [('202601'),('202612')),

PARTITION period2027 VALUES [('202701'),('202712')),

PARTITION period2028 VALUES [('202801'),('202812')),

PARTITION period2029 VALUES [('202901'),('202912')),

PARTITION period2030 VALUES [('203001'),('203012')),

PARTITION period2031 VALUES [('203101'),('203112')),

PARTITION period2032 VALUES [('203201'),('203212')),

PARTITION period2033 VALUES [('203301'),('203312')),

PARTITION period2034 VALUES [('203401'),('203412')),

PARTITION period2035 VALUES [('203501'),('203512')),

PARTITION period2036 VALUES [('203601'),('209912'))

)

DISTRIBUTED BY HASH(`period_id`, `dept_code`) BUCKETS AUTO

PROPERTIES (

"replication_allocation" = "tag.location.default: 3",

"bloom_filter_columns" = "dept_code, period_id",

"in_memory" = "false",

"storage_format" = "V2",

"disable_auto_compaction" = "false"

);

通过对比查询性能提升了2倍多，主要是选择什么字段作为分区条件比较难选择，还是要看具体业务来选择。

5、物化视图(需要group或order的时候使用，现阶段版本不支持多个列)

官方文档https://doris.apache.org/zh-CN/docs/query-acceleration/materialized-view

示例:

创建物化视图sql

createtable sales_records(record_id int, seller_id int, store_id int, sale_date date, sale_amt bigint) distributedbyhash(record_id) properties("replication_num" = "1");