2.5 组合索引

有时候查询中会带有多个字段的查询条件,但是其中任何单个字段的选择率都不高,而多个字段组合起来却有比较好的选择率。这种场景是bitmap索引大显身手的地方,但是bitmap索引对更新性能的影响相当大。PostgreSQL不支持bitmap索引,但是有一个临时的内存中的类似bitmap索引的东西,叫Bitmap Index Scan。

除了Bitmap Index Scan,组合索引也是一种选择。

准备测试数据

CREATE TABLE tb1 (
    c1 int,
    c2 int
);


INSERT INTO tb1
SELECT
    round(random()*100),
    round(random()*1000)
FROM
    generate_series(1,10000000);


SELECT 
   pg_size_pretty(pg_table_size('tb1'));

 pg_size_pretty
----------------
 346 MB
(1 row)

性能对比测试

全表扫描

postgres=# \timing


SELECT
    count(*)
FROM
    tb1 
WHERE
    c1=99
    AND c2=999;
    
 count
-------
    98
(1 row)

Time: 1200.874 ms

c1单索引扫描

CREATE INDEX tb1_idx1 ON tb1 (c1);


postgres=# ANALYZE;


EXPLAIN (analyze true, buffers true, timing true)
SELECT
    count(*)
FROM
    tb1 
WHERE
    c1=99
    AND c2=999;

                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=47394.85..47394.86 rows=1 width=8) (actual time=92.643..92.643 rows=1 loops=1)
   Buffers: shared hit=39883
   ->  Bitmap Heap Scan on tb1  (cost=1737.18..47394.63 rows=91 width=0) (actual time=23.775..92.613 rows=98 loops=1)
         Recheck Cond: (c1 = 99)
         Filter: (c2 = 999)
         Rows Removed by Filter: 99813
         Heap Blocks: exact=39607
         Buffers: shared hit=39883
         ->  Bitmap Index Scan on tb1_idx1  (cost=0.00..1737.16 rows=93963 width=0) (actual time=15.078..15.078 rows=99911 loops=1)
               Index Cond: (c1 = 99)
               Buffers: shared hit=276
 Planning time: 0.080 ms
 Execution time: 93.037 ms
(13 rows)

c2单索引扫描

DROP INDEX tb1_idx1;


CREATE INDEX tb1_idx2 ON tb1 (c2);


postgres=# ANALYZE;


EXPLAIN (analyze true, buffers true, timing true)
SELECT
    count(*)
FROM
    tb1
WHERE
    c1=99
    AND c2=999;
    
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=23542.41..23542.42 rows=1 width=8) (actual time=15.505..15.505 rows=1 loops=1)
   Buffers: shared hit=8880 read=30
   ->  Bitmap Heap Scan on tb1  (cost=180.77..23542.18 rows=91 width=0) (actual time=3.831..15.482 rows=98 loops=1)
         Recheck Cond: (c2 = 999)
         Filter: (c1 = 99)
         Rows Removed by Filter: 9732
         Heap Blocks: exact=8880
         Buffers: shared hit=8880 read=30
         ->  Bitmap Index Scan on tb1_idx2  (cost=0.00..180.75 rows=9642 width=0) (actual time=2.099..2.099 rows=9830 loops=1)
               Index Cond: (c2 = 999)
               Buffers: shared read=30
 Planning time: 0.244 ms
 Execution time: 15.622 ms
(13 rows)

c1+c2索引bitmap扫描

CREATE INDEX tb1_idx1 ON tb1 (c1);


postgres=# ANALYZE;


EXPLAIN (analyze true, buffers true, timing true)
SELECT
    count(*)
FROM
    tb1
WHERE
    c1=99
    AND c2=999;
    
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2420.36..2420.37 rows=1 width=8) (actual time=21.250..21.250 rows=1 loops=1)
   Buffers: shared hit=128 read=276
   ->  Bitmap Heap Scan on tb1  (cost=1964.58..2420.07 rows=118 width=0) (actual time=21.076..21.236 rows=98 loops=1)
         Recheck Cond: ((c2 = 999) AND (c1 = 99))
         Heap Blocks: exact=98
         Buffers: shared hit=128 read=276
         ->  BitmapAnd  (cost=1964.58..1964.58 rows=118 width=0) (actual time=21.041..21.041 rows=0 loops=1)
               Buffers: shared hit=30 read=276
               ->  Bitmap Index Scan on tb1_idx2  (cost=0.00..235.44 rows=12667 width=0) (actual time=2.260..2.260 rows=9830 loops=1)
                     Index Cond: (c2 = 999)
                     Buffers: shared hit=30
               ->  Bitmap Index Scan on tb1_idx1  (cost=0.00..1728.83 rows=93386 width=0) (actual time=16.887..16.887 rows=99911 loops=1)
                     Index Cond: (c1 = 99)
                     Buffers: shared read=276
 Planning time: 0.386 ms
 Execution time: 21.355 ms
(16 rows)

组合索引扫描

DROP INDEX tb1_idx1;


DROP INDEX tb1_idx2;


CREATE INDEX tb1_idx3 ON tb1(c1,c2);


postgres=# ANALYZE;


EXPLAIN (analyze true, buffers true, timing true)
SELECT
    count(*)
FROM
    tb1
WHERE
    c1=99
    AND c2=999;
    
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=358.58..358.59 rows=1 width=8) (actual time=0.276..0.277 rows=1 loops=1)
   Buffers: shared hit=98 read=3
   ->  Bitmap Heap Scan on tb1  (cost=5.37..358.35 rows=91 width=0) (actual time=0.070..0.257 rows=98 loops=1)
         Recheck Cond: ((c1 = 99) AND (c2 = 999))
         Heap Blocks: exact=98
         Buffers: shared hit=98 read=3
         ->  Bitmap Index Scan on tb1_idx3  (cost=0.00..5.34 rows=91 width=0) (actual time=0.040..0.041 rows=98 loops=1)
               Index Cond: ((c1 = 99) AND (c2 = 999))
               Buffers: shared read=3
 Planning time: 0.331 ms
 Execution time: 0.322 ms
(11 rows)

由此可见,本例中,组合索引的效率相当高。

组合索引使用效果总结

  • 组合索引多字段是有序的,并且是个完整的B-Tree索引,有最左原则。
    多列索引是先按照第一列进行排序,然后在第一列排好序的基础上再对第二列排序,如果没有第一列的话,直接访问第二列,那第二列肯定是无序的,直接访问后面的列就用不到索引了。

  • 组合索引的生效原则是:从前往后依次使用生效,如果中间某个索引没有使用,那么断点前面的索引部分起作用,断点后面的索引没有起作用。
    例如组合索引(a,b,c):
    where a=3 and b=45 and c=5,这种情况下三个索引顺序使用中间没有断点,全部发挥作用;
    where a=3 and c=5,这种情况下b就是断点,a发挥了效果,c没有效果;
    where b=3 and c=4,这种情况下a就是断点,在a后面的索引都没有发挥作用,这种写法联合索引没有发挥任何效果;
    where b=45 and a=3 and c=5,这个跟第一个一样,全部发挥作用,abc只要用上了就行,跟写的顺序无关。

组合索引使用判断

  1. select * from mytable where a=3 and b=5 and c=4;
    abc三个索引都在where条件里面用到了,而且都发挥了作用

  2. select * from mytable where a=3 and c=7;
    a用到索引,b没有用,所以c是没有用到索引效果的

  3. select * from mytable where a=3 and b>7 and c=3; --范围值就算是断点
    a用到了,b也用到了,c没有用到,这个地方b是范围值,也算断点,只不过自身用到了索引

  4. select * from mytable where b=3 and c=4;
    因为a索引没有使用,所以这里bc都没有用上索引效果

  5. select * from mytable where a>4 and b=7 and c=9;
    a用到了,b没有使用,c没有使用

  6. select * from mytable where a=3 order by b;
    a用到了索引,b在结果排序中也用到了索引的效果,前面说了,a下面任意一段的b是排好序的

  7. select * from mytable where a=3 order by c;
    a用到了索引,但是这个地方c没有发挥排序效果,因为中间断点了

  8. select * from mytable where b=3 order by a;
    b没有用到索引,排序中a也没有发挥索引效果

参考

关于PostgreSQL中的组合索引之一(b-tree篇)

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值