有时候查询中会带有多个字段的查询条件,但是其中任何单个字段的选择率都不高,而多个字段组合起来却有比较好的选择率。这种场景是bitmap索引大显身手的地方,但是bitmap索引对更新性能的影响相当大。PostgreSQL不支持bitmap索引,但是有一个临时的内存中的类似bitmap索引的东西,叫Bitmap Index Scan。
除了Bitmap Index Scan,组合索引也是一种选择。
准备测试数据
CREATE TABLE tb1 (
c1 int,
c2 int
);
INSERT INTO tb1
SELECT
round(random()*100),
round(random()*1000)
FROM
generate_series(1,10000000);
SELECT
pg_size_pretty(pg_table_size('tb1'));
pg_size_pretty
----------------
346 MB
(1 row)
性能对比测试
全表扫描
postgres=# \timing
SELECT
count(*)
FROM
tb1
WHERE
c1=99
AND c2=999;
count
-------
98
(1 row)
Time: 1200.874 ms
c1单索引扫描
CREATE INDEX tb1_idx1 ON tb1 (c1);
postgres=# ANALYZE;
EXPLAIN (analyze true, buffers true, timing true)
SELECT
count(*)
FROM
tb1
WHERE
c1=99
AND c2=999;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=47394.85..47394.86 rows=1 width=8) (actual time=92.643..92.643 rows=1 loops=1)
Buffers: shared hit=39883
-> Bitmap Heap Scan on tb1 (cost=1737.18..47394.63 rows=91 width=0) (actual time=23.775..92.613 rows=98 loops=1)
Recheck Cond: (c1 = 99)
Filter: (c2 = 999)
Rows Removed by Filter: 99813
Heap Blocks: exact=39607
Buffers: shared hit=39883
-> Bitmap Index Scan on tb1_idx1 (cost=0.00..1737.16 rows=93963 width=0) (actual time=15.078..15.078 rows=99911 loops=1)
Index Cond: (c1 = 99)
Buffers: shared hit=276
Planning time: 0.080 ms
Execution time: 93.037 ms
(13 rows)
c2单索引扫描
DROP INDEX tb1_idx1;
CREATE INDEX tb1_idx2 ON tb1 (c2);
postgres=# ANALYZE;
EXPLAIN (analyze true, buffers true, timing true)
SELECT
count(*)
FROM
tb1
WHERE
c1=99
AND c2=999;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=23542.41..23542.42 rows=1 width=8) (actual time=15.505..15.505 rows=1 loops=1)
Buffers: shared hit=8880 read=30
-> Bitmap Heap Scan on tb1 (cost=180.77..23542.18 rows=91 width=0) (actual time=3.831..15.482 rows=98 loops=1)
Recheck Cond: (c2 = 999)
Filter: (c1 = 99)
Rows Removed by Filter: 9732
Heap Blocks: exact=8880
Buffers: shared hit=8880 read=30
-> Bitmap Index Scan on tb1_idx2 (cost=0.00..180.75 rows=9642 width=0) (actual time=2.099..2.099 rows=9830 loops=1)
Index Cond: (c2 = 999)
Buffers: shared read=30
Planning time: 0.244 ms
Execution time: 15.622 ms
(13 rows)
c1+c2索引bitmap扫描
CREATE INDEX tb1_idx1 ON tb1 (c1);
postgres=# ANALYZE;
EXPLAIN (analyze true, buffers true, timing true)
SELECT
count(*)
FROM
tb1
WHERE
c1=99
AND c2=999;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2420.36..2420.37 rows=1 width=8) (actual time=21.250..21.250 rows=1 loops=1)
Buffers: shared hit=128 read=276
-> Bitmap Heap Scan on tb1 (cost=1964.58..2420.07 rows=118 width=0) (actual time=21.076..21.236 rows=98 loops=1)
Recheck Cond: ((c2 = 999) AND (c1 = 99))
Heap Blocks: exact=98
Buffers: shared hit=128 read=276
-> BitmapAnd (cost=1964.58..1964.58 rows=118 width=0) (actual time=21.041..21.041 rows=0 loops=1)
Buffers: shared hit=30 read=276
-> Bitmap Index Scan on tb1_idx2 (cost=0.00..235.44 rows=12667 width=0) (actual time=2.260..2.260 rows=9830 loops=1)
Index Cond: (c2 = 999)
Buffers: shared hit=30
-> Bitmap Index Scan on tb1_idx1 (cost=0.00..1728.83 rows=93386 width=0) (actual time=16.887..16.887 rows=99911 loops=1)
Index Cond: (c1 = 99)
Buffers: shared read=276
Planning time: 0.386 ms
Execution time: 21.355 ms
(16 rows)
组合索引扫描
DROP INDEX tb1_idx1;
DROP INDEX tb1_idx2;
CREATE INDEX tb1_idx3 ON tb1(c1,c2);
postgres=# ANALYZE;
EXPLAIN (analyze true, buffers true, timing true)
SELECT
count(*)
FROM
tb1
WHERE
c1=99
AND c2=999;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=358.58..358.59 rows=1 width=8) (actual time=0.276..0.277 rows=1 loops=1)
Buffers: shared hit=98 read=3
-> Bitmap Heap Scan on tb1 (cost=5.37..358.35 rows=91 width=0) (actual time=0.070..0.257 rows=98 loops=1)
Recheck Cond: ((c1 = 99) AND (c2 = 999))
Heap Blocks: exact=98
Buffers: shared hit=98 read=3
-> Bitmap Index Scan on tb1_idx3 (cost=0.00..5.34 rows=91 width=0) (actual time=0.040..0.041 rows=98 loops=1)
Index Cond: ((c1 = 99) AND (c2 = 999))
Buffers: shared read=3
Planning time: 0.331 ms
Execution time: 0.322 ms
(11 rows)
由此可见,本例中,组合索引的效率相当高。
组合索引使用效果总结
-
组合索引多字段是有序的,并且是个完整的B-Tree索引,有最左原则。
多列索引是先按照第一列进行排序,然后在第一列排好序的基础上再对第二列排序,如果没有第一列的话,直接访问第二列,那第二列肯定是无序的,直接访问后面的列就用不到索引了。 -
组合索引的生效原则是:从前往后依次使用生效,如果中间某个索引没有使用,那么断点前面的索引部分起作用,断点后面的索引没有起作用。
例如组合索引(a,b,c):
where a=3 and b=45 and c=5,这种情况下三个索引顺序使用中间没有断点,全部发挥作用;
where a=3 and c=5,这种情况下b就是断点,a发挥了效果,c没有效果;
where b=3 and c=4,这种情况下a就是断点,在a后面的索引都没有发挥作用,这种写法联合索引没有发挥任何效果;
where b=45 and a=3 and c=5,这个跟第一个一样,全部发挥作用,abc只要用上了就行,跟写的顺序无关。
组合索引使用判断
-
select * from mytable where a=3 and b=5 and c=4;
abc三个索引都在where条件里面用到了,而且都发挥了作用 -
select * from mytable where a=3 and c=7;
a用到索引,b没有用,所以c是没有用到索引效果的 -
select * from mytable where a=3 and b>7 and c=3; --范围值就算是断点
a用到了,b也用到了,c没有用到,这个地方b是范围值,也算断点,只不过自身用到了索引 -
select * from mytable where b=3 and c=4;
因为a索引没有使用,所以这里bc都没有用上索引效果 -
select * from mytable where a>4 and b=7 and c=9;
a用到了,b没有使用,c没有使用 -
select * from mytable where a=3 order by b;
a用到了索引,b在结果排序中也用到了索引的效果,前面说了,a下面任意一段的b是排好序的 -
select * from mytable where a=3 order by c;
a用到了索引,但是这个地方c没有发挥排序效果,因为中间断点了 -
select * from mytable where b=3 order by a;
b没有用到索引,排序中a也没有发挥索引效果