Postgresql杂谈 09—Postgresql中的Gist索引的深入学习

本文链接：https://blog.csdn.net/hyman_c/article/details/119202913

本文详细介绍了PostgreSQL中的Gist索引，探讨了其存储结构、在不同类型数据上的应用，如Point和inet，以及与BTree索引的对比。重点展示了Gist索引在空间类型和网络类型查询中的优势。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

本文，我们进一步学习下Gist索引。Gist是Generalized Search Tree的意思，意思是通用搜索树，底层结构也是一种平衡树，它是一套索引模板，可以支持用户实现自定义的索引。相比于BTree索引，BTree索引可以建立在任意类型之上，但是BTree只支持<、=、>操作符，而Gist索引可以支持@>、&&等复杂运算的操作符。

一、Gist索引的存储结构

在《Postgresql杂谈 04—Postgresql中的四种常规索引》一文中，笔者曾经简单介绍过Gist索引的使用，并创建过一个包含两个列的索引：

stock_analysis_data=# create index mygistinx on test using gist(fund_code,record_time); 
CREATE INDEX

test表的结构如下：

stock_analysis_data=# \d+ test
                                                Table "public.test"
   Column    |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
 fund_code   | character varying(256)      |           |          |         | extended |              | 
 fund_name   | character varying(256)      |           |          |         | extended |              | 
 record_time | timestamp without time zone |           |          |         | plain    |              | 
Indexes:
    "myspgistinx" spgist (fund_code)

实际上，在我们创建好gist索引之后，整个索引的结构如下：

可以看到：

（1）只有在叶子节点中保存着数据的otid，上图右边()里面表示数据的otid，逗号左边表示磁盘的页号，右边表示该页中的序号。

（2）树的上层节点具有指向叶子节点的指针，并定义了叶子节点一个Page上数据的范围（对应红色字体部分），但是不含有指向数据的指针。

（3）基于上述的原因，相比于BTree索引，Gist索引所占的空间更大（BTree索引上非叶子节点也包括指向数据的指针）

二、Postgresql中支持Gist索引的操作类

Postgresql中对一些内置类型已经实现了Gist索引的操作类，我们可以直接使用在这些类型之上使用Gist索引。支持Gist索引的的操作类：

数据类型	索引操作符
box	&& &> &< &<\| >> << <<\| <@ @> @ \|&> !>> ~ ~=
circle	&& &> &< &<\| >> << <<\| <@ @> @ \|&> !>> ~ ~=
inet,cidr	&& >> >>= > >= <> << <<= < <= =
point	>> >^ << <@ <^ ~=
polygon	&& &> &< &<\| >> << <<\| <@ @> @ \|&》 \|》》 ~ ~=
range	&& &> &< >> << <@ -\|- = <@ @>
tsquery	<@ @>
tsvector	@@

下面，笔者通过实例来说明下Gist索引的用法：

Point类型上创建Gist索引：

首先，创建一个测试表：

stock_analysis_data=# create table pts(id int ,p point);
CREATE TABLE
stock_analysis_data=# \d+ pts;
                                    Table "public.pts"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 id     | integer |           |          |         | plain   |              | 
 p      | point   |           |          |         | plain   |              |

在表里面插入待测试的数据：

stock_analysis_data=# insert into pts select t.d,point(ceil(random()*1000),ceil(random()*1000)) from generate_series(1,1000000) as t(d);
INSERT 0 1000000

在没有索引的条件下进行查询，查询语句的意思是查找所有在圆形((100,100) 100)范围内的点：

stock_analysis_data=# explain (analyze,verbose,costs,buffers,timing)  select * from pts where circle '((100,100) 100)'  @> p;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..12678.33 rows=1000 width=20) (actual time=0.323..175.264 rows=31426 loops=1)
   Output: id, p
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=6370
   ->  Parallel Seq Scan on public.pts  (cost=0.00..11578.33 rows=417 width=20) (actual time=0.030..63.779 rows=10475 loops=3)
         Output: id, p
         Filter: ('<(100,100),100>'::circle @> pts.p)
         Rows Removed by Filter: 322858
         Buffers: shared hit=6370
         Worker 0: actual time=0.041..33.690 rows=8188 loops=1
           Buffers: shared hit=1675
         Worker 1: actual time=0.032..52.241 rows=8901 loops=1
           Buffers: shared hit=1791
 Planning Time: 0.060 ms
 Execution Time: 190.921 ms
(16 rows)

在没有索引的条件下，查询总共耗时190ms。接下来，创建Gist索引：

stock_analysis_data=# create index on pts using gist(p); 
CREATE INDEX

通过Gist索引进行查询：

stock_analysis_data=# explain (analyze,verbose,costs,buffers,timing)  select * from pts where circle '((100,100) 100)'  @> p;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.pts  (cost=44.03..2705.93 rows=1000 width=20) (actual time=9.080..36.787 rows=31426 loops=1)
   Output: id, p
   Recheck Cond: ('<(100,100),100>'::circle @> pts.p)
   Heap Blocks: exact=6331
   Buffers: shared hit=6689
   ->  Bitmap Index Scan on pts_p_idx  (cost=0.00..43.78 rows=1000 width=0) (actual time=8.222..8.223 rows=31426 loops=1)
         Index Cond: ('<(100,100),100>'::circle @> pts.p)
         Buffers: shared hit=358
 Planning Time: 0.620 ms
 Execution Time: 52.507 ms
(10 rows)

通过Explain中可以看到，加了索引之后，整个查询的效率提高了4倍。

inet类型的Gist索引

笔者再来列举一个使用inet类型Gist索引进行查询的列子，首先创建一个vector的表，表中包含inet类型的字段：

stock_analysis_data=# create table vector(id int,ip inet);
CREATE TABLE
stock_analysis_data=# \d+ vector;
                                  Table "public.vector"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 id     | integer |           |          |         | plain   |              | 
 ip     | inet    |           |          |         | main    |              |

插入测试数据：

stock_analysis_data=# insert into vector select t.d,inet(ceil(random()*255)||'.'||ceil(random()*255)||'.'||ceil(random()*255)||'.'||ceil(random()*255)) from generate_series(1,1000000) as t(d);
INSERT 0 1000000

在插入索引之前，我们先查询IP地址等于77.80.250.123的所有IP地址：

stock_analysis_data=# explain (analyze,verbose,costs,buffers,timing) select * from vector where ip = '77.80.250.123'::inet;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..11614.43 rows=1 width=11) (actual time=0.222..108.731 rows=1 loops=1)
   Output: id, ip
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=5406
   ->  Parallel Seq Scan on public.vector  (cost=0.00..10614.33 rows=1 width=11) (actual time=50.047..85.650 rows=0 loops=3)
         Output: id, ip
         Filter: (vector.ip = '77.80.250.123'::inet)
         Rows Removed by Filter: 333333
         Buffers: shared hit=5406
         Worker 0: actual time=81.686..81.686 rows=0 loops=1
           Buffers: shared hit=1633
         Worker 1: actual time=68.437..68.438 rows=0 loops=1
           Buffers: shared hit=1312
 Planning Time: 0.057 ms
 Execution Time: 108.759 ms
(16 rows)

在插入Gist索引之前，我们先插入BTree索引，来查看查询效率如何：

stock_analysis_data=# create index vector_btree_inx on vector using btree(ip);

CREATE INDEX

进行查询，发现虽然走了BTree索引，而且查询效率提升了不少。

stock_analysis_data=# create index vector_btree_inx on vector using btree(ip);
CREATE INDEX
stock_analysis_data=# explain (analyze,verbose,costs,buffers,timing) select * from vector where ip = '77.80.250.123'::inet;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using vector_btree_inx on public.vector  (cost=0.42..8.44 rows=1 width=11) (actual time=0.028..0.029 rows=1 loops=1)
   Output: id, ip
   Index Cond: (vector.ip = '77.80.250.123'::inet)
   Buffers: shared hit=1 read=3
 Planning Time: 0.180 ms
 Execution Time: 0.054 ms
(6 rows)

接下来，创建Gist索引并进行查询：

stock_analysis_data=# create index vector_btree_inx on vector using gist(ip inet_ops);
CREATE INDEX
stock_analysis_data=# explain (analyze,verbose,costs,buffers,timing) select * from vector where ip = '77.80.250.123'::inet;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using vector_gist_inx on public.vector  (cost=0.29..8.30 rows=1 width=11) (actual time=0.033..0.034 rows=1 loops=1)
   Output: id, ip
   Index Cond: (vector.ip = '77.80.250.123'::inet)
   Buffers: shared hit=4
 Planning Time: 0.133 ms
 Execution Time: 0.056 ms
 stock_analysis_data=#

可以看到，使用Gist索引的查询效率和使用Btree的查询效率差不多，但是需要BTree索引是不支持<<、@>这些操作符的。