金仓数据库KingbaseES Group 和 Distinct 列的次序影响查询性能

金仓数据库

于 2022-03-29 10:03:43 发布

阅读量1k

点赞数

文章标签：金仓数据库函数返回值结果集

本文链接：https://blog.csdn.net/Kingbase_/article/details/123813358

版权

一、概述
优化拥有大量的分组和去重列的SQL时，这些排序列的次序，也是可以优化的地方。
测试数据结构
kingbase=# select count(distinct txt1 ) txt1, avg(length(txt1))::int ln1, count(distinct txt3 ) txt3 ,avg(length(txt3))::int ln3 from txt01;

txt1 | ln1 | txt3 | ln3
------±-----±--------±----
1000 | 1000 | 1000000 | 10
(1 行记录)
二、work_mem 满足排序情况
1、Distinct 语句
次序： txt1,txt3
kingbase=# explain (analyse ,buffers ) select /+ set(work_mem 100MB) / distinct txt1 ,txt3 from txt01 ;
QUERY PLAN

HashAggregate (cost=269287.33…269687.33 rows=40000 width=64) (actual time=1543.995…1877.527 rows=1000000 loops=1)
Group Key: txt1, txt3
Buffers: shared hit=142858
-> Seq Scan on txt01 (cost=0.00…227144.22 rows=8428622 width=64) (actual time=0.008…159.858 rows=1000000 loops=1)
Buffers: shared hit=142858
Planning Time: 0.081 ms
Execution Time: 1947.951 ms
(7 行记录)
次序： txt3,txt1
kingbase=# explain (analyse ,buffers ) select /+ set(work_mem 100MB) / distinct txt3 ,txt1 from txt01 ;
QUERY PLAN

HashAggregate (cost=269287.33…269687.33 rows=40000 width=64) (actual time=1596.040…1812.380 rows=1000000 loops=1)
Group Key: txt3, txt1
Buffers: shared hit=142858
-> Seq Scan on txt01 (cost=0.00…227144.22 rows=8428622 width=64) (actual time=0.007…163.399 rows=1000000 loops=1)
Buffers: shared hit=142858
Planning Time: 0.075 ms
Execution Time: 1884.907 ms
(7 行记录)
2、Group by 语句
次序： txt1,txt3
kingbase=# explain (analyse ,buffers ) select /+ set(work_mem 100MB) / txt1 ,txt3 from txt01 group by txt1 ,txt3 ;
QUERY PLAN

HashAggregate (cost=269287.33…269687.33 rows=40000 width=64) (actual time=1540.948…1875.917 rows=1000000 loops=1)
Group Key: txt1, txt3
Buffers: shared hit=142858
-> Seq Scan on txt01 (cost=0.00…227144.22 rows=8428622 width=64) (actual time=0.006…160.419 rows=1000000 loops=1)
Buffers: shared hit=142858
Planning Time: 0.084 ms
Execution Time: 1939.103 ms
(7 行记录)
次序： txt3,txt1
kingbase=# explain (analyse ,buffers ) select /+ set(work_mem 100MB) / txt1 ,txt3 from txt01 group by txt3 ,txt1 ;
QUERY PLAN

HashAggregate (cost=269287.33…269687.33 rows=40000 width=64) (actual time=1557.257…1780.662 rows=1000000 loops=1)
Group Key: txt3, txt1
Buffers: shared hit=142858
-> Seq Scan on txt01 (cost=0.00…227144.22 rows=8428622 width=64) (actual time=0.018…165.221 rows=1000000 loops=1)
Buffers: shared hit=142858
Planning Time: 0.330 ms
Execution Time: 1844.664 ms
(7 行记录)
三、work_mem 不满足排序情况
1、Distinct 语句
次序： txt1,txt3
kingbase=# explain (analyse ,buffers ) select /+ set(work_mem 1MB) / distinct txt1 ,txt3 from txt01 ;
QUERY PLAN

Unique (cost=2464313.08…2527527.74 rows=40000 width=64) (actual time=21031.092…22131.259 rows=1000000 loops=1)
Buffers: shared hit=142858, temp read=125368 written=125369
-> Sort (cost=2464313.08…2485384.63 rows=8428622 width=64) (actual time=21031.089…22002.850 rows=1000000 loops=1)
Sort Key: txt1, txt3
Sort Method: external merge Disk: 1002944kB
Buffers: shared hit=142858, temp read=125368 written=125369
-> Seq Scan on txt01 (cost=0.00…227144.22 rows=8428622 width=64) (actual time=0.039…272.327 rows=1000000 loops=1)
Buffers: shared hit=142858
Planning Time: 0.272 ms
Execution Time: 23648.185 ms
(10 行记录)
次序： txt3,txt1
kingbase=# explain (analyse ,buffers ) select /+ set(work_mem 1MB) / distinct txt3 ,txt1 from txt01 ;
QUERY PLAN

Unique (cost=2464313.08…2527527.74 rows=40000 width=64) (actual time=4004.641…4367.218 rows=1000000 loops=1)
Buffers: shared hit=142858, temp read=125491 written=125492
-> Sort (cost=2464313.08…2485384.63 rows=8428622 width=64) (actual time=4004.639…4239.599 rows=1000000 loops=1)
Sort Key: txt3, txt1
Sort Method: external merge Disk: 1003928kB
Buffers: shared hit=142858, temp read=125491 written=125492
-> Seq Scan on txt01 (cost=0.00…227144.22 rows=8428622 width=64) (actual time=0.011…271.572 rows=1000000 loops=1)
Buffers: shared hit=142858
Planning Time: 0.086 ms
Execution Time: 4457.751 ms
(10 行记录)
2、Group by 语句
次序： txt1,txt3
kingbase=# explain (analyse ,buffers ) select /+ set(work_mem 1MB) / txt1 ,txt3 from txt01 group by txt1 ,txt3 ;
QUERY PLAN

Group (cost=2464313.08…2527527.74 rows=40000 width=64) (actual time=21715.770…22796.166 rows=1000000 loops=1)
Group Key: txt1, txt3
Buffers: shared hit=142858, temp read=125368 written=125369
-> Sort (cost=2464313.08…2485384.63 rows=8428622 width=64) (actual time=21715.764…22658.413 rows=1000000 loops=1)
Sort Key: txt1, txt3
Sort Method: external merge Disk: 1002944kB
Buffers: shared hit=142858, temp read=125368 written=125369
-> Seq Scan on txt01 (cost=0.00…227144.22 rows=8428622 width=64) (actual time=0.029…271.335 rows=1000000 loops=1)
Buffers: shared hit=142858
Planning Time: 0.285 ms
Execution Time: 25365.012 ms
(11 行记录)
次序： txt3,txt1
kingbase=# explain (analyse ,buffers ) select /+ set(work_mem 1MB) / txt1 ,txt3 from txt01 group by txt3 ,txt1 ;
QUERY PLAN

Group (cost=2464313.08…2527527.74 rows=40000 width=64) (actual time=4156.296…4541.315 rows=1000000 loops=1)
Group Key: txt3, txt1
Buffers: shared hit=142858, temp read=125368 written=125369
-> Sort (cost=2464313.08…2485384.63 rows=8428622 width=64) (actual time=4156.291…4402.265 rows=1000000 loops=1)
Sort Key: txt3, txt1
Sort Method: external merge Disk: 1002944kB
Buffers: shared hit=142858, temp read=125368 written=125369
-> Seq Scan on txt01 (cost=0.00…227144.22 rows=8428622 width=64) (actual time=0.008…270.872 rows=1000000 loops=1)
Buffers: shared hit=142858
Planning Time: 0.081 ms
Execution Time: 4632.567 ms
(11 行记录)
四、总结
次序 txt1,txt1 txt3,txt1
work_mem满足排序 1947.951 ms 1884.907 ms
work_mem不足排序 25365.012 ms 4632.567 ms
字节少数据值多的列，处于排序列的前列，可以带来性能的提升。当work_mem满足排序时，性能差异不大，当work_mem不足时，性能提升较大。

金仓数据库

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
金仓数据库KingbaseES Group 和 Distinct 列的次序影响查询性能

一、概述优化拥有大量的分组和去重列的SQL时，这些排序列的次序，也是可以优化的地方。测试数据结构kingbase=# select count(distinct txt1 ) txt1, avg(length(txt1))::int ln1, count(distinct txt3 ) txt3 ,avg(length(txt3))::int ln3 from txt01;txt1 | ln1 | txt3 | ln3------±-----±--------±----1000 | 10
复制链接

扫一扫