在实际生活中,我们可能会遇见这样的场景:在你打开淘宝买东西的时候,常常面对多如牛毛的商品,可能会先按照自己想要的条件进行筛选然后再慢慢选择,比方说选择某某品牌、价位、颜色等等.
这个时候就会遇到任意列组合查询的情况了,因为你不知道用户会按照哪些条件进行筛选,可能你会说那我在所有列都建立索引不就行了?的确这是一种方法,但是如果某张表有几百上千个列时候呢,PostgreSQL为我们 提供了几种不同的方法.
- gin索引(支持任意字段组合的查询)
- bloom索引(支持任意只读组合的等值查询)
- 每个单列btree索引(支持任意字段组合的查询)
1、bloom索引
bloom索引接口是PostgreSQL基于bloom filter构造的一个索引接口(不了解bloom filter算法的可以参考这篇文章:https://blog.csdn.net/hguisu/article/details/7866173), 属于lossy索引,可以收敛结果集(排除绝对不满足条件的结果,剩余的结果里再挑选满足条件的结果),因此需要二次check,bloom支持任意列组合的等值查询。
bloom存储的是签名,签名越大,耗费的空间越多,但是排除更加精准。有利有弊。
语法:
CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
WITH (length=80, col1=2, col2=2, col3=4);
签名长度 80 bit, 最大允许4096 bits
col1 - col32,分别指定每列的bits,默认长度2,最大允许4095 bits.
bill=# create table t1(c1 int, c2 int, c3 int, c4 int, c5 int);
CREATE TABLE
bill=#
bill=# create index idx_bloom on t1 using bloom(c1,c2,c3,c4,c5);
CREATE INDEX
bill=# explain select * from t1 where c1 =10;
QUERY PLAN
-------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=14.05..20.56 rows=8 width=20)
Recheck Cond: (c1 = 10)
-> Bitmap Index Scan on idx_bloom (cost=0.00..14.05 rows=8 width=0)
Index Cond: (c1 = 10)
(4 rows)
bill=# explain select * from t1 where c1 =10 and c2 =20;
QUERY PLAN
-------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=18.30..19.62 rows=1 width=20)
Recheck Cond: ((c1 = 10) AND (c2 = 20))
-> Bitmap Index Scan on idx_bloom (cost=0.00..18.30 rows=1 width=0)
Index Cond: ((c1 = 10) AND (c2 = 20))
(4 rows)
bill=# explain select * from t1 where c1 =10 and c2 =20 and c3 =30;
QUERY PLAN
-------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=22.55..23.87 rows=1 width=20)
Recheck Cond: ((c1 = 10) AND (c2 = 20) AND (c3 = 30))
-> Bitmap Index Scan on idx_bloom (cost=0.00..22.55 rows=1 width=0)
Index Cond: ((c1 = 10) AND (c2 = 20) AND (c3 = 30))
(4 rows)
不过bloom方法过滤的效果有限,建议观察使用
2、gin索引
bill=# create index idx_gin on t1 using gin(c1,c2,c3,c4,c5);
CREATE INDEX
bill=# explain select * from t1 where c1 =10 or c2 =20;
QUERY PLAN
----------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=5.34..15.59 rows=17 width=20)
Recheck Cond: ((c1 = 10) OR (c2 = 20))
-> BitmapOr (cost=5.34..5.34 rows=17 width=0)
-> Bitmap Index Scan on idx_gin (cost=0.00..2.66 rows=8 width=0)
Index Cond: (c1 = 10)
-> Bitmap Index Scan on idx_gin (cost=0.00..2.66 rows=8 width=0)
Index Cond: (c2 = 20)
(7 rows)
bill=# explain select * from t1 where c1 =10 or c2 =20 and c3 =30;
QUERY PLAN
----------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=6.57..14.07 rows=9 width=20)
Recheck Cond: ((c1 = 10) OR ((c2 = 20) AND (c3 = 30)))
-> BitmapOr (cost=6.57..6.57 rows=9 width=0)
-> Bitmap Index Scan on idx_gin (cost=0.00..2.66 rows=8 width=0)
Index Cond: (c1 = 10)
-> Bitmap Index Scan on idx_gin (cost=0.00..3.90 rows=1 width=0)
Index Cond: ((c2 = 20) AND (c3 = 30))
(7 rows)
GIN索引实现了内部bitmapAnd or bitmapOr,实际上等效于对每个字段建立单独的B-Tree索引,GIN的复合索引这种方法s使用,但是,当数据量非常庞大或者列非常多时,GIN索引会比较大。
同时GIN建议使用fastupdate和延迟合并的特性,加速插入、删除、更新操作。
3、multi-btree
bill=# create index idx_t11 on t1(c1);
CREATE INDEX
bill=# create index idx_t12 on t1(c2);
CREATE INDEX
bill=# create index idx_t13 on t1(c3);
CREATE INDEX
bill=# create index idx_t14 on t1(c4);
CREATE INDEX
bill=# create index idx_t15 on t1(c5);
CREATE INDEX
bill=# explain select * from t1 where c1 = 10 and c2 =20;
QUERY PLAN
----------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=3.28..4.59 rows=1 width=20)
Recheck Cond: ((c2 = 20) AND (c1 = 10))
-> BitmapAnd (cost=3.28..3.28 rows=1 width=0)
-> Bitmap Index Scan on idx_t12 (cost=0.00..1.51 rows=8 width=0)
Index Cond: (c2 = 20)
-> Bitmap Index Scan on idx_t11 (cost=0.00..1.51 rows=8 width=0)
Index Cond: (c1 = 10)
(7 rows)
bill=# explain select * from t1 where c1 = 10 and c2 =20 or c3 =30;
QUERY PLAN
----------------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=4.79..12.29 rows=9 width=20)
Recheck Cond: (((c2 = 20) AND (c1 = 10)) OR (c3 = 30))
-> BitmapOr (cost=4.79..4.79 rows=9 width=0)
-> BitmapAnd (cost=3.28..3.28 rows=1 width=0)
-> Bitmap Index Scan on idx_t12 (cost=0.00..1.51 rows=8 width=0)
Index Cond: (c2 = 20)
-> Bitmap Index Scan on idx_t11 (cost=0.00..1.51 rows=8 width=0)
Index Cond: (c1 = 10)
-> Bitmap Index Scan on idx_t13 (cost=0.00..1.51 rows=8 width=0)
Index Cond: (c3 = 30)
(10 rows)
这种方法在列较少的情况下使用也比较便捷.
另外说明下,在不改变表结构的情况下可以使用上面三种方法,但在实际应用中我们也可以考虑比如将多列合并成一个text,利用PostgreSQL的全文检索功能配合gin索引去查询,这也是一种优化思路.