A Case Study of PostgreSQL Choosing the Wrong Index

weixin_34410662

Original article: https://yq.aliyun.com/articles/149116

Generating the data

create table test(id1 int, id2 int, id3 int);

create index id1_idx on test using btree (id1);

create index id2_idx on test using btree (id2);

insert into test select t,t ,t from generate_series(10000000, 0, -1) as t;

insert into test select 10000001, 10000001 , (random()*100000)::int from generate_series(1, 5000);

analyze test;
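
Before running the test query, it can help to confirm how the data is laid out and what statistics the planner has collected. The following is only a sketch that was not part of the original write-up; it assumes the table was loaded exactly as above and uses the standard pg_stats view.

```sql
-- The 5000 rows from the second insert all share id1 = id2 = 10000001,
-- which is larger than every id1 produced by the first insert.
select count(*) as matching_rows,
       min(id1)  as min_id1,
       max(id1)  as max_id1
from test
where id2 = 10000001;

-- Per-column statistics gathered by "analyze test".
select attname, null_frac, n_distinct, correlation
from pg_stats
where tablename = 'test'
  and attname in ('id1', 'id2');
```

The first query should report 5000 matching rows, all with id1 = 10000001, which is the detail that matters for the analysis later on.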

Test SQL

explain analyze  select min(id1) from test where id2 = 10000001;

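We created separate indexes on columns id1 and id2 above, so we would expect the planner to pick whichever of id1_idx and id2_idx gives the better plan. But look at what actually happens.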

Symptom

Below is the plan that was actually executed. It uses the id1_idx index, yet the query takes almost 10 seconds.

postgres=> explain analyze  select min(id1) from test where id2 = 10000001;
                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Result  (cost=103.28..103.29 rows=1 width=0) (actual time=9860.209..9860.209 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.43..103.28 rows=1 width=4) (actual time=9860.199..9860.202 rows=1 loops=1)
          ->  Index Scan using id1_idx on test  (cost=0.43..445840.93 rows=4335 width=4) (actual time=9860.197..9860.197 rows=1 loops=1)
                Index Cond: (id1 IS NOT NULL)
                Filter: (id2 = 10000001)
                Rows Removed by Filter: 10000001
Planning time: 99.912 ms
Execution time: 9860.282 ms
(9 rows)

Time: 10069.370 ms

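Let's try forcing id2_idx instead. Rewriting min(id1) as min(id1+0) is enough, because the planner can no longer satisfy the aggregate from id1_idx. The result is striking: the query runs in about 2 ms, far faster than with id1_idx.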

postgres=> explain analyze  select min(id1+0) from test where id2 = 10000001;
                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=295.38..295.39 rows=1 width=4) (actual time=1.878..1.878 rows=1 loops=1)
  ->  Index Scan using id2_idx on test  (cost=0.43..273.70 rows=4335 width=4) (actual time=0.034..1.234 rows=5000 loops=1)
        Index Cond: (id2 = 10000001)
Planning time: 0.126 ms
Execution time: 1.931 ms
(5 rows)

Time: 6.889 ms

Analysis and summary

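The planner picks the plan with the lowest estimated cost. Start with the id1_idx plan, whose Result node shows cost=103.28..103.29. Where does that number come from? A full scan of id1_idx is estimated at 445840.93, and the planner expects 4335 rows to satisfy id2 = 10000001. Because we only need the minimum, the scan can stop at the first matching row in id1 order, so the Limit node is charged the startup cost plus 1/4335 of the remaining scan cost: 0.43 + (445840.93 - 0.43) / 4335 ≈ 103.28. This assumes the 4335 matching rows are spread evenly through the index. The id2_idx plan, by contrast, is estimated at 273.70 for the index scan and 295.39 in total, so by the planner's arithmetic the id1_idx plan really is cheaper.

Why, then, is it so slow in practice? In the data we generated, every row with id2 = 10000001 also has id1 = 10000001, the largest value in the column, so these rows all sit at the very end of id1_idx. The scan has to walk past 10000001 non-matching rows first (see "Rows Removed by Filter" in the plan), and the real cost of finding the first match is not about 103 but close to the full 445840.93.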
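
The arithmetic can be reproduced directly in psql. This is a rough check, not the planner's actual code path; the constants are copied from the EXPLAIN output above, and the row counts follow from the two insert statements.

```sql
-- Estimated cost of fetching the first matching row through id1_idx:
-- startup cost plus 1/4335 of the run cost of the full index scan.
select round(0.43 + (445840.93 - 0.43) / 4335, 2) as estimated_limit_cost;  -- 103.28

-- Fraction of the table the scan really has to walk before the first match:
-- 10000001 non-matching rows out of 10000001 + 5000 rows in total.
select round(10000001::numeric / (10000001 + 5000), 4) as fraction_scanned;  -- 0.9995
```

The planner budgets for about 1/4335 of the index, while the scan actually reads about 99.95% of it, which accounts for the gap between the estimate and the measured runtime.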

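The PostgreSQL planner does not handle this kind of case perfectly; its cost model could be refined to take the data distribution into account when building the plan. As DBAs, we should also tune SQL with the specific workload in mind so that the planner is not misled.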
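
Two ways to steer the planner here are sketched below. The first is the rewrite used earlier in this article. The second, a composite index, is an assumption on my part and was not tested in the original; the index name id2_id1_idx is made up for illustration.

```sql
-- Option 1 (from the article): wrap id1 in an expression so the
-- min() aggregate can no longer be answered from id1_idx.
explain analyze select min(id1 + 0) from test where id2 = 10000001;

-- Option 2 (assumption, not in the original): a composite index whose
-- leading column matches the filter and whose second column is the
-- aggregated one. id2_id1_idx is a hypothetical name.
create index id2_id1_idx on test using btree (id2, id1);
analyze test;
explain analyze select min(id1) from test where id2 = 10000001;
```

With an index on (id2, id1), the entries for id2 = 10000001 are already ordered by id1, so the first one found is the minimum. Whether the planner actually prefers it should be confirmed with explain analyze on your own data, and the extra index adds write and storage overhead.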
