Postgresql杂谈 08—Postgresql中的BTree索引存储结构和查询过程分析

最新推荐文章于 2024-02-23 22:53:56 发布

HymanLiuTS

最新推荐文章于 2024-02-23 22:53:56 发布

阅读量1.6k

点赞数

分类专栏： Postgresql原理与实战文章标签： postgresql BTree索引索引优化数据库索引

本文链接：https://blog.csdn.net/hyman_c/article/details/119137908

版权

Postgresql原理与实战专栏收录该内容

24 篇文章 35 订阅

订阅专栏

一、关于BTree

本文我们深入学习下Postgresql的BTree索引。我们知道，BTree是一种多路平衡查找树，被广泛应用于数据库的索引之中。与我们之前接触过的最多的二叉树不同，多路树的意思是每个节点不止一个子节点。而查找树是一种顺序树，也就是说对于树种每一个节点来说，它的左子树上所有的节点都不大于它，而它的右子树上所有的节点都不小于它。由于这种特性，理论上B树查找的时间复杂度可以达到二分查找的时间复杂度O(logn)。BTree的查询效率和树的高度有关，为了降低树的高度，提高查询效率，BTree上的节点往往代表一个磁盘页，也就是包含了多个Key值。

如下，是一个比较典型BTree节点的查找过程（查找10，树中的每个节点代表一个磁盘页。每次访问一个新节点代表一次磁盘IO）：

Postgresq中默认创建的索引就是BTree索引，比如下面，为users表创建一个默认的BTree索引：

create index users_btree_age_inx on users(age);

等价于：

create index users_btree_age_inx on users using btree(age);

下面，我们将从使用Btree建立单列的索引和多列的复合索引两个方面来学习下BTree索引在Postgresql中的使用。

二、使用BTree建立单列索引

单例索引的结构比较简单，就是上面BTree的基本结构，每个BTree节点包含了多个单列值得Key，笔者在这里只简单介绍下使用索引前后查询效率提升有多大。

首先，先看一下进行测试得数据表users，表结构如下：

postgres=# \d+ users;
                                                       Table "public.users"
 Column |         Type          | Collation | Nullable |              Default              | Storage  | Stats target | Description 
--------+-----------------------+-----------+----------+-----------------------------------+----------+--------------+-------------
 id     | integer               |           | not null | nextval('users_id_seq'::regclass) | plain    |              | 
 name   | character varying(32) |           |          |                                   | extended |              | 
 age    | integer               |           |          |                                   | plain    |              |

users表里面包含100W条数据，首先我们在没有建立索引得情况下进行等值查询：

postgres=# explain (analyze,verbose,timing,buffers,costs) select * from users where age=80;
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13348.27 rows=11984 width=12) (actual time=0.300..134.897 rows=9916 loops=1)
   Output: id, name, age
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=5406
   ->  Parallel Seq Scan on public.users  (cost=0.00..11149.88 rows=4993 width=12) (actual time=0.028..110.437 rows=3305 loops=3)
         Output: id, name, age
         Filter: (users.age = 80)
         Rows Removed by Filter: 330028
         Buffers: shared hit=5406
         Worker 0: actual time=0.030..107.305 rows=2626 loops=1
           Buffers: shared hit=1441
         Worker 1: actual time=0.037..99.501 rows=3266 loops=1
           Buffers: shared hit=1774
 Planning Time: 0.099 ms
 Execution Time: 140.010 m

可以看到，整个过程花费了140ms的时间。接下来，再针对age字段进行非等值查询：

postgres=# explain (analyze,verbose,timing,buffers,costs) select * from users where age>80;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.users  (cost=0.00..19191.30 rows=222734 width=12) (actual time=0.020..249.068 rows=199795 loops=1)
   Output: id, name, age
   Filter: (users.age > 80)
   Rows Removed by Filter: 800205
   Buffers: shared hit=5406
 Planning Time: 0.068 ms
 Execution Time: 366.632 ms
(7 rows)

整个过程使用了全表扫描，花费了366ms的时间。下面，笔者这对age字段加上BTree索引，加索引的语句这里不再赘述。我们只看下，加了索引后再次进行等值查询和范围查询的耗时：

postgres=# explain (analyze,verbose,timing,buffers,costs) select * from users where age=80;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.users  (cost=204.64..5746.48 rows=10867 width=12) (actual time=1.739..13.640 rows=9916 loops=1)
   Output: id, name, age
   Recheck Cond: (users.age = 80)
   Heap Blocks: exact=4537
   Buffers: shared hit=4537 read=30
   ->  Bitmap Index Scan on users_btree_age_inx  (cost=0.00..201.93 rows=10867 width=0) (actual time=1.159..1.160 rows=9916 loops=1)
         Index Cond: (users.age = 80)
         Buffers: shared read=30
 Planning Time: 0.181 ms
 Execution Time: 19.536 ms
 
 postgres=# explain (analyze,verbose,timing,buffers,costs) select * from users where age>80;
                                                                QUERY PLAN                                                             
    
---------------------------------------------------------------------------------------------------------------------------------------
----
 Bitmap Heap Scan on public.users  (cost=3785.67..11716.26 rows=201967 width=12) (actual time=13.971..141.120 rows=199795 loops=1)
   Output: id, name, age
   Recheck Cond: (users.age > 80)
   Heap Blocks: exact=5406
   Buffers: shared hit=5409 read=546
   ->  Bitmap Index Scan on users_btree_age_inx  (cost=0.00..3735.18 rows=201967 width=0) (actual time=13.295..13.295 rows=199795 loops
=1)
         Index Cond: (users.age > 80)
         Buffers: shared hit=3 read=546
 Planning Time: 0.113 ms
 Execution Time: 239.571 ms
(10 rows)

三、使用BTree建立多列复合索引

下面，我们通过创建多列复合索引，学习下Btree索引内部的存储结构，并验证复合索引在不同查询条件下的表现。在验证之前，我们需要安装好Postgresql的pageinspect拓展，用来协助分析BTree索引的内部结构。

create extension pageinspect ;

接下来，创建一个测试表mytab，并加上索引：

postgres=# create table mytab (id int,c1 int,c2 int);
CREATE TABLE
postgres=# create index on mytab using btree(c1,c2);
CREATE INDEX

然后，插入测试数据：

postgres=# insert into mytab values (1,1,2);
INSERT 0 1
postgres=# insert into mytab select 1,random()*10,t.d from generate_series(1,2000) as t(d);
INSERT 0 2000

首先，查看索引的第一个索引页，它会告诉我们索引一共有多少层，根页的编号是什么。

postgres=# select * from bt_metap('mytab_c1_c2_idx');
 magic  | version | root | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples 
--------+---------+------+-------+----------+-----------+-------------+-------------------------
 340322 |       3 |    3 |     1 |        3 |         1 |           0 |                      -1
(1 row)

上面的信息表明：这个索引的存储结构，除去ROOT节点有两层，root节点在第3号Page页。接下来，看下3号Page页的详细信息：

postgres=# select * from bt_page_items('mytab_c1_c2_idx',3);
 itemoffset |  ctid   | itemlen | nulls | vars |          data           
------------+---------+---------+-------+------+-------------------------
          1 | (1,0)   |       8 | f     | f    | 
          2 | (6,100) |      16 | f     | f    | 02 00 00 00 1c 01 00 00
          3 | (9,25)  |      16 | f     | f    | 03 00 00 00 d1 00 00 00
          4 | (4,83)  |      16 | f     | f    | 04 00 00 00 0b 01 00 00
          5 | (8,9)   |      16 | f     | f    | 05 00 00 00 c1 00 00 00
          6 | (5,118) |      16 | f     | f    | 06 00 00 00 e7 01 00 00
          7 | (7,70)  |      16 | f     | f    | 07 00 00 00 29 03 00 00
          8 | (2,122) |      16 | f     | f    | 09 00 00 00 79 00 00 00
(8 rows)

data表示当前索引页的最小值，最左边页没有最小值。data字段里面的数值从左到右数值数位越来越高，比如02 00 00 00 1c 01 00 00表示的实际上是：c1的最小值为00 00 00 02(即，十进制的2)，c2的最小值00 00 01 1c(即，十进制的284)。我们要找c1=1,c2=2的话，显然在编号为1的Page上。在去查看1的详细信息：

postgres=# select * from bt_page_items('mytab_c1_c2_idx',1);
 itemoffset |   ctid   | itemlen | nulls | vars |          data           
------------+----------+---------+-------+------+-------------------------
         ...
        114 | (10,94)  |      16 | f     | f    | 00 00 00 00 97 07 00 00
        115 | (10,99)  |      16 | f     | f    | 00 00 00 00 9c 07 00 00
        116 | (10,103) |      16 | f     | f    | 00 00 00 00 a0 07 00 00
        117 | (10,109) |      16 | f     | f    | 00 00 00 00 a6 07 00 00
        118 | (10,127) |      16 | f     | f    | 00 00 00 00 b8 07 00 00
        119 | (10,136) |      16 | f     | f    | 00 00 00 00 c1 07 00 00
        120 | (10,137) |      16 | f     | f    | 00 00 00 00 c2 07 00 00
        121 | (10,139) |      16 | f     | f    | 00 00 00 00 c4 07 00 00
        122 | (10,143) |      16 | f     | f    | 00 00 00 00 c8 07 00 00
        123 | (10,149) |      16 | f     | f    | 00 00 00 00 ce 07 00 00
        124 | (0,1)    |      16 | f     | f    | 01 00 00 00 02 00 00 00
        125 | (0,8)    |      16 | f     | f    | 01 00 00 00 07 00 00 00
        ...

我们在ctid为(0,1)，即page页为0，tid为1的位置找到了c1=1,c2=2。

前后都是等值链接的查询

下面，笔者来验证下两列都是等值查询的话，走索引的情况：

postgres=# explain (analyze,verbose,buffers,costs,timing) select * from mytab where c1=1 and c2=2;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using mytab_c1_c2_idx on public.mytab  (cost=0.28..8.30 rows=1 width=12) (actual time=0.016..0.018 rows=1 loops=1)
   Output: id, c1, c2
   Index Cond: ((mytab.c1 = 1) AND (mytab.c2 = 2))
   Buffers: shared hit=3
 Planning Time: 0.101 ms
 Execution Time: 0.038 ms
(6 rows)

显然对于这种情况，查询是走了索引的。

前面是等值连接，后面是查询范围

postgres=# explain (analyze,verbose,buffers,costs,timing)  select * from mytab where c1=2 and c2>10;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.mytab  (cost=6.20..20.02 rows=188 width=12) (actual time=0.049..0.224 rows=189 loops=1)
   Output: id, c1, c2
   Recheck Cond: ((mytab.c1 = 2) AND (mytab.c2 > 10))
   Heap Blocks: exact=11
   Buffers: shared hit=14
   ->  Bitmap Index Scan on mytab_c1_c2_idx  (cost=0.00..6.16 rows=188 width=0) (actual time=0.035..0.036 rows=189 loops=1)
         Index Cond: ((mytab.c1 = 2) AND (mytab.c2 > 10))
         Buffers: shared hit=3
 Planning Time: 0.122 ms
 Execution Time: 0.411 ms
(10 rows)

在这种情况下，我们看到查询是走了索引的，只不过采用了位图扫描的方式。

前面是查询范围，后面是等值查询

postgres=# explain (analyze,verbose,buffers,costs,timing)  select * from mytab where c1>2 and c2=10;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Seq Scan on public.mytab  (cost=0.00..41.02 rows=1 width=12) (actual time=0.014..0.175 rows=1 loops=1)
   Output: id, c1, c2
   Filter: ((mytab.c1 > 2) AND (mytab.c2 = 10))
   Rows Removed by Filter: 2000
   Buffers: shared hit=11
 Planning Time: 0.076 ms
 Execution Time: 0.194 ms
(7 rows)

在这种情况下，查询并没有走索引。

前后都是查询范围

postgres=# explain (analyze,verbose,buffers,costs,timing)  select * from mytab where c1>2 and c2>10;
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Seq Scan on public.mytab  (cost=0.00..41.02 rows=1477 width=12) (actual time=0.014..0.992 rows=1477 loops=1)
   Output: id, c1, c2
   Filter: ((mytab.c1 > 2) AND (mytab.c2 > 10))
   Rows Removed by Filter: 524
   Buffers: shared hit=11
 Planning Time: 0.073 ms
 Execution Time: 1.800 ms
(7 rows)

这种情况下，查询直接走了全表扫描。