PostgreSQL模糊查询-CSDN博客

本文链接：https://blog.csdn.net/weixin_39540651/article/details/100655683

说明:
数据库中常用到模糊查询基本分为三类:后模糊(abc%)、前模糊(%abc)、前后模糊(%abc%),一般数据库中都能够支持后模糊查询,但是对于前模糊和前后模糊的支持并不好,但是PostgreSQL对于模糊查询的支持可以说是相当强大,针对不同场景有不同的优化手段,在PostgreSQL中对于这三种模糊查询采用的方法大致如下:
后模糊(abc%):可以使用btree来优化
前模糊(%abc):可以利用reverse函数建立函数索引来优化
前后模糊(%abc%):常利用pg_trgm插件和gin索引来优化

前模糊和后模糊:
后模糊(abc%):
1、利用btree索引来进行优化即可,但是需要注意使用类型默认的index ops class时，仅适合于collate="C"的查询,例如:

bill=# create table t1(id int,info text);
CREATE TABLE
bill=# insert into t1 select generate_series(1,100000),hashtext(random()::text);
INSERT 0 100000

bill=# create index idx_t1 on t1 using btree(info collate "C"); 
CREATE INDEX
bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where info like 'abc%' collate "C";      
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t1 on public.t1  (cost=0.42..3.04 rows=10 width=14) (actual time=0.005..0.005 rows=0 loops=1)
   Output: id, info
   Index Cond: ((t1.info >= 'abc'::text) AND (t1.info < 'abd'::text))
   Filter: (t1.info ~~ 'abc%'::text COLLATE "C")
   Buffers: shared hit=3
 Planning Time: 0.414 ms
 Execution Time: 0.024 ms
(7 rows)

2、当collate<>"C"时,使用对应类型的pattern ops来让btree索引支持模糊查询

bill=# create index idx_t1 on t1 using btree(info text_pattern_ops);
CREATE INDEX
bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where info like 'abc%';
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t1 on public.t1  (cost=0.42..3.04 rows=10 width=14) (actual time=0.027..0.027 rows=0 loops=1)
   Output: id, info
   Index Cond: ((t1.info ~>=~ 'abc'::text) AND (t1.info ~<~ 'abd'::text))
   Filter: (t1.info ~~ 'abc%'::text)
   Buffers: shared read=3
 Planning Time: 0.335 ms
 Execution Time: 0.054 ms
(7 rows)

bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where info like 'abc%' collate "C";      
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t1 on public.t1  (cost=0.42..3.04 rows=10 width=14) (actual time=0.027..0.027 rows=0 loops=1)
   Output: id, info
   Index Cond: ((t1.info ~>=~ 'abc'::text) AND (t1.info ~<~ 'abd'::text))
   Filter: (t1.info ~~ 'abc%'::text COLLATE "C")
   Buffers: shared hit=3
 Planning Time: 0.133 ms
 Execution Time: 0.056 ms
(7 rows)

同时还能支持规则表达式的写法，如下：

bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where info ~ '^abc';  
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t1 on public.t1  (cost=0.42..3.04 rows=10 width=14) (actual time=0.009..0.009 rows=0 loops=1)
   Output: id, info
   Index Cond: ((t1.info ~>=~ 'abc'::text) AND (t1.info ~<~ 'abd'::text))
   Filter: (t1.info ~ '^abc'::text)
   Buffers: shared hit=3
 Planning Time: 0.210 ms
 Execution Time: 0.026 ms
(7 rows)

前模糊(%abc)
使用反转函数(reverse)索引，可以支持前模糊的查询。
1、同样,使用类型默认的index ops class时，仅适合于collate="C"的查询,例如:

bill=# create index idx_t1 on t1 using btree(reverse(info) collate "C");
CREATE INDEX
bill=# select * from t1 limit 1;
 id |    info     
----+-------------
  1 | -2100249117
(1 row)

bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where reverse(info) like '117%' collate "C";
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.t1  (cost=8.14..378.67 rows=500 width=14) (actual time=0.142..0.287 rows=101 loops=1)
   Output: id, info
   Filter: (reverse(t1.info) ~~ '117%'::text COLLATE "C")
   Heap Blocks: exact=94
   Buffers: shared hit=94 read=3
   ->  Bitmap Index Scan on idx_t1  (cost=0.00..8.02 rows=500 width=0) (actual time=0.118..0.118 rows=101 loops=1)
         Index Cond: ((reverse(t1.info) >= '117'::text) AND (reverse(t1.info) < '118'::text))
         Buffers: shared read=3
 Planning Time: 0.117 ms
 Execution Time: 0.324 ms
(10 rows)

2、当collate<>"C"时,使用对应类型的pattern ops来让btree索引支持模糊查询,并且也能支持规则表达式的写法

bill=# create index idx_t1 on t1 using btree(reverse(info) text_pattern_ops);
CREATE INDEX
bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where reverse(info) like '117%'; 
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.t1  (cost=8.14..378.67 rows=500 width=14) (actual time=0.084..0.212 rows=101 loops=1)
   Output: id, info
   Filter: (reverse(t1.info) ~~ '117%'::text)
   Heap Blocks: exact=94
   Buffers: shared hit=94 read=3
   ->  Bitmap Index Scan on idx_t1  (cost=0.00..8.02 rows=500 width=0) (actual time=0.062..0.063 rows=101 loops=1)
         Index Cond: ((reverse(t1.info) ~>=~ '117'::text) AND (reverse(t1.info) ~<~ '118'::text))
         Buffers: shared read=3
 Planning Time: 0.096 ms
 Execution Time: 0.242 ms
(10 rows)

bill=# explain (analyze,verbose,timing,costs,buffers) select * from t1 where reverse(info) ~ '^117';   
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.t1  (cost=8.14..378.67 rows=500 width=14) (actual time=0.060..0.283 rows=101 loops=1)
   Output: id, info
   Filter: (reverse(t1.info) ~ '^117'::text)
   Heap Blocks: exact=94
   Buffers: shared hit=97
   ->  Bitmap Index Scan on idx_t1  (cost=0.00..8.02 rows=500 width=0) (actual time=0.037..0.037 rows=101 loops=1)
         Index Cond: ((reverse(t1.info) ~>=~ '117'::text) AND (reverse(t1.info) ~<~ '118'::text))
         Buffers: shared hit=3
 Planning Time: 0.129 ms
 Execution Time: 0.320 ms
(10 rows)

前、后模糊的合体:
使用pg_trgm索引，可以支持前、后模糊的查询。
如果想要支持中文模糊查询,需要注意数据库的lc_ctype不能为"C",并且索引、查询条件的collate必须一致才能使用索引,例如:

bill=# \l+ bill
                                             List of databases
 Name |  Owner   | Encoding | Collate |    Ctype    | Access privileges | Size  | Tablespace | Description 
------+----------+----------+---------+-------------+-------------------+-------+------------+-------------
 bill | postgres | UTF8     | C       | zh_CN.UTF-8 |                   | 95 MB | pg_default | 
(1 row)

构造环境:

bill=# create table test001(c1 text);  
CREATE TABLE

bill=# create or replace function gen_hanzi(int) returns text as $$          
bill$# declare        
bill$#   res text;        
bill$# begin        
bill$#   if $1 >=1 then        
bill$#     select string_agg(chr(19968+(random()*20901)::int), '') into res from generate_series(1,$1);        
bill$#     return res;        
bill$#   end if;        
bill$#   return null;        
bill$# end;        
bill$# $$ language plpgsql strict;  
CREATE FUNCTION
bill=# insert into test001 select gen_hanzi(20) from generate_series(1,100000);      
INSERT 0 100000

bill=# create index idx_test001_1 on test001 using gin (c1 gin_trgm_ops);  
CREATE INDEX
bill=# select * from test001 limit 5;   
                    c1                    
------------------------------------------
 頺縈鈍鮆嘻頔傉澩裂驁沧蜏鉩靣鞪僗怅鏒翌豇
 傸鵾鰔骝髑熯偣孊弰瞅油禩轷贶墯佁貫阖翜蹽
 戦綷筿降峝蹮險田螻攠當剞嘌蛓獿鮯諨嬰虡纙
 錮虒弧蜖比蒞窐砫宥軡泜箏矘犍秥冴僕礫四逇
 跐藉圹縲稡霃黇兯蒟穭曣轋虫徼嗣嶹矧鳨冥肺
(5 rows)

模糊查询:

bill=# explain (analyze,verbose,timing,costs,buffers) select * from test001 where c1 like '你%';  
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test001  (cost=10.38..475.76 rows=500 width=32) (actual time=0.041..0.052 rows=5 loops=1)
   Output: c1
   Recheck Cond: (test001.c1 ~~ '你%'::text)
   Heap Blocks: exact=5
   Buffers: shared hit=9
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..10.25 rows=500 width=0) (actual time=0.031..0.032 rows=5 loops=1)
         Index Cond: (test001.c1 ~~ '你%'::text)
         Buffers: shared hit=4
 Planning Time: 0.098 ms
 Execution Time: 0.076 ms
(10 rows)

bill=# explain (analyze,verbose,timing,costs,buffers) select * from test001 where c1 like '%中国';       
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test001  (cost=6.58..19.42 rows=10 width=61) (actual time=0.023..0.023 rows=0 loops=1)
   Output: c1
   Recheck Cond: (test001.c1 ~~ '%中国'::text)
   Rows Removed by Index Recheck: 1
   Heap Blocks: exact=1
   Buffers: shared hit=5
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..6.58 rows=10 width=0) (actual time=0.015..0.015 rows=1 loops=1)
         Index Cond: (test001.c1 ~~ '%中国'::text)
         Buffers: shared hit=4
 Planning Time: 0.248 ms
 Execution Time: 0.045 ms
(11 rows)

前后模糊(%abc%):
使用pg_trgm插件，支持前后模糊的查询。
同样,需要注意数据库的lc_ctype不能为"C",并且索引、查询条件的collate必须一致才能使用索引
建议输入3个或3个以上字符，否则效果不佳

bill=# explain (analyze,verbose,timing,costs,buffers) select * from test001 where c1 like '%中国%';                   
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Seq Scan on public.test001  (cost=0.00..2387.00 rows=10 width=61) (actual time=21.432..21.432 rows=0 loops=1)
   Output: c1
   Filter: (test001.c1 ~~ '%中国%'::text)
   Rows Removed by Filter: 100000
   Buffers: shared hit=1137
 Planning Time: 0.097 ms
 Execution Time: 21.451 ms
(7 rows)

bill=# explain (analyze,verbose,timing,costs,buffers) select * from test001 where c1 like '%我是程序员%';           
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test001  (cost=16.98..29.82 rows=10 width=61) (actual time=0.041..0.042 rows=0 loops=1)
   Output: c1
   Recheck Cond: (test001.c1 ~~ '%我是程序员%'::text)
   Buffers: shared hit=10
   ->  Bitmap Index Scan on idx_test001_1  (cost=0.00..16.98 rows=10 width=0) (actual time=0.039..0.039 rows=0 loops=1)
         Index Cond: (test001.c1 ~~ '%我是程序员%'::text)
         Buffers: shared hit=10
 Planning Time: 0.112 ms
 Execution Time: 0.066 ms
(9 rows)

可以看到只有2个字符性能有明显下降

总结:

对于只有后模糊(abc%)查询需求，使用collate "C"的b-tree索引；当collate不为"C"时，可以使用类型对应的pattern ops(例如text_pattern_ops)建立b-tree索引。
对于只有前模糊(%abc)的查询需求，使用collate "C"的reverse()表达式的b-tree索引；当collate不为"C"时，可以使用类型对应的pattern ops(例如text_pattern_ops)建立b-tree索引。
对于前后均模糊(%abc%)查询需求，并且包含中文，请使用lc_ctype <> "C"的数据库，同时使用pg_trgm插件的gin索引。