PostgreSQL 多字段任意组合搜索

最新推荐文章于 2024-08-13 15:52:42 发布

foucus、

最新推荐文章于 2024-08-13 15:52:42 发布

阅读量3.3k

点赞数

分类专栏： PostgreSQL 文章标签： PostgreSQL

本文链接：https://blog.csdn.net/weixin_39540651/article/details/100582817

版权

PostgreSQL 专栏收录该内容

282 篇文章 169 订阅

订阅专栏

在实际生活中,我们可能会遇见这样的场景:在你打开淘宝买东西的时候,常常面对多如牛毛的商品,可能会先按照自己想要的条件进行筛选然后再慢慢选择,比方说选择某某品牌、价位、颜色等等.
这个时候就会遇到任意列组合查询的情况了,因为你不知道用户会按照哪些条件进行筛选,可能你会说那我在所有列都建立索引不就行了?的确这是一种方法,但是如果某张表有几百上千个列时候呢,PostgreSQL为我们提供了几种不同的方法.

gin索引（支持任意字段组合的查询）
bloom索引（支持任意只读组合的等值查询）
每个单列btree索引（支持任意字段组合的查询）

1、bloom索引
bloom索引接口是PostgreSQL基于bloom filter构造的一个索引接口(不了解bloom filter算法的可以参考这篇文章:https://blog.csdn.net/hguisu/article/details/7866173)，属于lossy索引，可以收敛结果集(排除绝对不满足条件的结果，剩余的结果里再挑选满足条件的结果)，因此需要二次check，bloom支持任意列组合的等值查询。
bloom存储的是签名，签名越大，耗费的空间越多，但是排除更加精准。有利有弊。
语法:

CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)  
       WITH (length=80, col1=2, col2=2, col3=4);
签名长度 80 bit, 最大允许4096 bits
col1 - col32，分别指定每列的bits，默认长度2，最大允许4095 bits.
bill=# create table t1(c1 int, c2 int, c3 int, c4 int, c5 int);  
CREATE TABLE
bill=# 
bill=# create index idx_bloom on t1 using bloom(c1,c2,c3,c4,c5);
CREATE INDEX
bill=# explain select * from t1 where c1 =10; 
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=14.05..20.56 rows=8 width=20)
   Recheck Cond: (c1 = 10)
   ->  Bitmap Index Scan on idx_bloom  (cost=0.00..14.05 rows=8 width=0)
         Index Cond: (c1 = 10)
(4 rows)


bill=# explain select * from t1 where c1 =10 and c2 =20;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=18.30..19.62 rows=1 width=20)
   Recheck Cond: ((c1 = 10) AND (c2 = 20))
   ->  Bitmap Index Scan on idx_bloom  (cost=0.00..18.30 rows=1 width=0)
         Index Cond: ((c1 = 10) AND (c2 = 20))
(4 rows)
bill=# explain select * from t1 where c1 =10 and c2 =20 and c3 =30;     
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=22.55..23.87 rows=1 width=20)
   Recheck Cond: ((c1 = 10) AND (c2 = 20) AND (c3 = 30))
   ->  Bitmap Index Scan on idx_bloom  (cost=0.00..22.55 rows=1 width=0)
         Index Cond: ((c1 = 10) AND (c2 = 20) AND (c3 = 30))
(4 rows)

不过bloom方法过滤的效果有限,建议观察使用

2、gin索引

bill=# create index idx_gin on t1 using gin(c1,c2,c3,c4,c5);
CREATE INDEX
bill=# explain select * from t1 where c1 =10 or c2 =20;
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=5.34..15.59 rows=17 width=20)
   Recheck Cond: ((c1 = 10) OR (c2 = 20))
   ->  BitmapOr  (cost=5.34..5.34 rows=17 width=0)
         ->  Bitmap Index Scan on idx_gin  (cost=0.00..2.66 rows=8 width=0)
               Index Cond: (c1 = 10)
         ->  Bitmap Index Scan on idx_gin  (cost=0.00..2.66 rows=8 width=0)
               Index Cond: (c2 = 20)
(7 rows)

bill=# explain select * from t1 where c1 =10 or c2 =20 and c3 =30;
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=6.57..14.07 rows=9 width=20)
   Recheck Cond: ((c1 = 10) OR ((c2 = 20) AND (c3 = 30)))
   ->  BitmapOr  (cost=6.57..6.57 rows=9 width=0)
         ->  Bitmap Index Scan on idx_gin  (cost=0.00..2.66 rows=8 width=0)
               Index Cond: (c1 = 10)
         ->  Bitmap Index Scan on idx_gin  (cost=0.00..3.90 rows=1 width=0)
               Index Cond: ((c2 = 20) AND (c3 = 30))
(7 rows)

GIN索引实现了内部bitmapAnd or bitmapOr，实际上等效于对每个字段建立单独的B-Tree索引,GIN的复合索引这种方法s使用,但是，当数据量非常庞大或者列非常多时，GIN索引会比较大。
同时GIN建议使用fastupdate和延迟合并的特性，加速插入、删除、更新操作。

3、multi-btree

bill=# create index idx_t11 on t1(c1);
CREATE INDEX
bill=# create index idx_t12 on t1(c2); 
CREATE INDEX
bill=# create index idx_t13 on t1(c3); 
CREATE INDEX
bill=# create index idx_t14 on t1(c4); 
CREATE INDEX
bill=# create index idx_t15 on t1(c5); 
CREATE INDEX
bill=# explain select * from t1 where  c1 = 10  and c2 =20;
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=3.28..4.59 rows=1 width=20)
   Recheck Cond: ((c2 = 20) AND (c1 = 10))
   ->  BitmapAnd  (cost=3.28..3.28 rows=1 width=0)
         ->  Bitmap Index Scan on idx_t12  (cost=0.00..1.51 rows=8 width=0)
               Index Cond: (c2 = 20)
         ->  Bitmap Index Scan on idx_t11  (cost=0.00..1.51 rows=8 width=0)
               Index Cond: (c1 = 10)
(7 rows)

bill=# explain select * from t1 where  c1 = 10  and c2 =20 or c3 =30;
                                    QUERY PLAN                                    
----------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=4.79..12.29 rows=9 width=20)
   Recheck Cond: (((c2 = 20) AND (c1 = 10)) OR (c3 = 30))
   ->  BitmapOr  (cost=4.79..4.79 rows=9 width=0)
         ->  BitmapAnd  (cost=3.28..3.28 rows=1 width=0)
               ->  Bitmap Index Scan on idx_t12  (cost=0.00..1.51 rows=8 width=0)
                     Index Cond: (c2 = 20)
               ->  Bitmap Index Scan on idx_t11  (cost=0.00..1.51 rows=8 width=0)
                     Index Cond: (c1 = 10)
         ->  Bitmap Index Scan on idx_t13  (cost=0.00..1.51 rows=8 width=0)
               Index Cond: (c3 = 30)
(10 rows)

这种方法在列较少的情况下使用也比较便捷.

另外说明下,在不改变表结构的情况下可以使用上面三种方法,但在实际应用中我们也可以考虑比如将多列合并成一个text,利用PostgreSQL的全文检索功能配合gin索引去查询,这也是一种优化思路.