Postgresql杂谈 08—Postgresql中的BTree索引存储结构和查询过程分析

一、关于BTree

       本文我们深入学习下Postgresql的BTree索引。我们知道,BTree是一种多路平衡查找树,被广泛应用于数据库的索引之中。与我们之前接触过的最多的二叉树不同,多路树的意思是每个节点不止一个子节点。而查找树是一种顺序树,也就是说对于树种每一个节点来说,它的左子树上所有的节点都不大于它,而它的右子树上所有的节点都不小于它。由于这种特性,理论上B树查找的时间复杂度可以达到二分查找的时间复杂度O(logn)。BTree的查询效率和树的高度有关,为了降低树的高度,提高查询效率,BTree上的节点往往代表一个磁盘页,也就是包含了多个Key值。

       如下,是一个比较典型BTree节点的查找过程(查找10,树中的每个节点代表一个磁盘页。每次访问一个新节点代表一次磁盘IO):

 

       Postgresq中默认创建的索引就是BTree索引,比如下面,为users表创建一个默认的BTree索引:

create index users_btree_age_inx on users(age);

       等价于:

create index users_btree_age_inx on users using btree(age);

       下面,我们将从使用Btree建立单列的索引和多列的复合索引两个方面来学习下BTree索引在Postgresql中的使用。

二、使用BTree建立单列索引

       单例索引的结构比较简单,就是上面BTree的基本结构,每个BTree节点包含了多个单列值得Key,笔者在这里只简单介绍下使用索引前后查询效率提升有多大。

       首先,先看一下进行测试得数据表users,表结构如下:

postgres=# \d+ users;
                                                       Table "public.users"
 Column |         Type          | Collation | Nullable |              Default              | Storage  | Stats target | Description 
--------+-----------------------+-----------+----------+-----------------------------------+----------+--------------+-------------
 id     | integer               |           | not null | nextval('users_id_seq'::regclass) | plain    |              | 
 name   | character varying(32) |           |          |                                   | extended |              | 
 age    | integer               |           |          |                                   | plain    |              | 

       users表里面包含100W条数据,首先我们在没有建立索引得情况下进行等值查询:

postgres=# explain (analyze,verbose,timing,buffers,costs) select * from users where age=80;
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..13348.27 rows=11984 width=12) (actual time=0.300..134.897 rows=9916 loops=1)
   Output: id, name, age
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=5406
   ->  Parallel Seq Scan on public.users  (cost=0.00..11149.88 rows=4993 width=12) (actual time=0.028..110.437 rows=3305 loops=3)
         Output: id, name, age
         Filter: (users.age = 80)
         Rows Removed by Filter: 330028
         Buffers: shared hit=5406
         Worker 0: actual time=0.030..107.305 rows=2626 loops=1
           Buffers: shared hit=1441
         Worker 1: actual time=0.037..99.501 rows=3266 loops=1
           Buffers: shared hit=1774
 Planning Time: 0.099 ms
 Execution Time: 140.010 m

       可以看到,整个过程花费了140ms的时间。接下来,再针对age字段进行非等值查询:

postgres=# explain (analyze,verbose,timing,buffers,costs) select * from users where age>80;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.users  (cost=0.00..19191.30 rows=222734 width=12) (actual time=0.020..249.068 rows=199795 loops=1)
   Output: id, name, age
   Filter: (users.age > 80)
   Rows Removed by Filter: 800205
   Buffers: shared hit=5406
 Planning Time: 0.068 ms
 Execution Time: 366.632 ms
(7 rows)

       整个过程使用了全表扫描,花费了366ms的时间。下面,笔者这对age字段加上BTree索引,加索引的语句这里不再赘述。我们只看下,加了索引后再次进行等值查询和范围查询的耗时:

postgres=# explain (analyze,verbose,timing,buffers,costs) select * from users where age=80;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.users  (cost=204.64..5746.48 rows=10867 width=12) (actual time=1.739..13.640 rows=9916 loops=1)
   Output: id, name, age
   Recheck Cond: (users.age = 80)
   Heap Blocks: exact=4537
   Buffers: shared hit=4537 read=30
   ->  Bitmap Index Scan on users_btree_age_inx  (cost=0.00..201.93 rows=10867 width=0) (actual time=1.159..1.160 rows=9916 loops=1)
         Index Cond: (users.age = 80)
         Buffers: shared read=30
 Planning Time: 0.181 ms
 Execution Time: 19.536 ms
 
 postgres=# explain (analyze,verbose,timing,buffers,costs) select * from users where age>80;
                                                                QUERY PLAN                                                             
    
---------------------------------------------------------------------------------------------------------------------------------------
----
 Bitmap Heap Scan on public.users  (cost=3785.67..11716.26 rows=201967 width=12) (actual time=13.971..141.120 rows=199795 loops=1)
   Output: id, name, age
   Recheck Cond: (users.age > 80)
   Heap Blocks: exact=5406
   Buffers: shared hit=5409 read=546
   ->  Bitmap Index Scan on users_btree_age_inx  (cost=0.00..3735.18 rows=201967 width=0) (actual time=13.295..13.295 rows=199795 loops
=1)
         Index Cond: (users.age > 80)
         Buffers: shared hit=3 read=546
 Planning Time: 0.113 ms
 Execution Time: 239.571 ms
(10 rows)

三、使用BTree建立多列复合索引

       下面,我们通过创建多列复合索引,学习下Btree索引内部的存储结构,并验证复合索引在不同查询条件下的表现。在验证之前,我们需要安装好Postgresql的pageinspect拓展,用来协助分析BTree索引的内部结构。

create extension pageinspect ;

       接下来,创建一个测试表mytab,并加上索引:

postgres=# create table mytab (id int,c1 int,c2 int);
CREATE TABLE
postgres=# create index on mytab using btree(c1,c2);
CREATE INDEX

       然后,插入测试数据:

postgres=# insert into mytab values (1,1,2);
INSERT 0 1
postgres=# insert into mytab select 1,random()*10,t.d from generate_series(1,2000) as t(d);
INSERT 0 2000

       首先,查看索引的第一个索引页,它会告诉我们索引一共有多少层,根页的编号是什么。

postgres=# select * from bt_metap('mytab_c1_c2_idx');
 magic  | version | root | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples 
--------+---------+------+-------+----------+-----------+-------------+-------------------------
 340322 |       3 |    3 |     1 |        3 |         1 |           0 |                      -1
(1 row)

       上面的信息表明:这个索引的存储结构,除去ROOT节点有两层,root节点在第3号Page页。接下来,看下3号Page页的详细信息:

postgres=# select * from bt_page_items('mytab_c1_c2_idx',3);
 itemoffset |  ctid   | itemlen | nulls | vars |          data           
------------+---------+---------+-------+------+-------------------------
          1 | (1,0)   |       8 | f     | f    | 
          2 | (6,100) |      16 | f     | f    | 02 00 00 00 1c 01 00 00
          3 | (9,25)  |      16 | f     | f    | 03 00 00 00 d1 00 00 00
          4 | (4,83)  |      16 | f     | f    | 04 00 00 00 0b 01 00 00
          5 | (8,9)   |      16 | f     | f    | 05 00 00 00 c1 00 00 00
          6 | (5,118) |      16 | f     | f    | 06 00 00 00 e7 01 00 00
          7 | (7,70)  |      16 | f     | f    | 07 00 00 00 29 03 00 00
          8 | (2,122) |      16 | f     | f    | 09 00 00 00 79 00 00 00
(8 rows)

       data表示当前索引页的最小值,最左边页没有最小值。data字段里面的数值从左到右数值数位越来越高,比如02 00 00 00 1c 01 00 00表示的实际上是:c1的最小值为00 00 00 02(即,十进制的2),c2的最小值00 00 01 1c(即,十进制的284)。我们要找c1=1,c2=2的话,显然在编号为1的Page上。在去查看1的详细信息:

postgres=# select * from bt_page_items('mytab_c1_c2_idx',1);
 itemoffset |   ctid   | itemlen | nulls | vars |          data           
------------+----------+---------+-------+------+-------------------------
         ...
        114 | (10,94)  |      16 | f     | f    | 00 00 00 00 97 07 00 00
        115 | (10,99)  |      16 | f     | f    | 00 00 00 00 9c 07 00 00
        116 | (10,103) |      16 | f     | f    | 00 00 00 00 a0 07 00 00
        117 | (10,109) |      16 | f     | f    | 00 00 00 00 a6 07 00 00
        118 | (10,127) |      16 | f     | f    | 00 00 00 00 b8 07 00 00
        119 | (10,136) |      16 | f     | f    | 00 00 00 00 c1 07 00 00
        120 | (10,137) |      16 | f     | f    | 00 00 00 00 c2 07 00 00
        121 | (10,139) |      16 | f     | f    | 00 00 00 00 c4 07 00 00
        122 | (10,143) |      16 | f     | f    | 00 00 00 00 c8 07 00 00
        123 | (10,149) |      16 | f     | f    | 00 00 00 00 ce 07 00 00
        124 | (0,1)    |      16 | f     | f    | 01 00 00 00 02 00 00 00
        125 | (0,8)    |      16 | f     | f    | 01 00 00 00 07 00 00 00
        ...

       我们在ctid为(0,1),即page页为0,tid为1的位置找到了c1=1,c2=2。

  • 前后都是等值链接的查询

       下面,笔者来验证下两列都是等值查询的话,走索引的情况:

postgres=# explain (analyze,verbose,buffers,costs,timing) select * from mytab where c1=1 and c2=2;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using mytab_c1_c2_idx on public.mytab  (cost=0.28..8.30 rows=1 width=12) (actual time=0.016..0.018 rows=1 loops=1)
   Output: id, c1, c2
   Index Cond: ((mytab.c1 = 1) AND (mytab.c2 = 2))
   Buffers: shared hit=3
 Planning Time: 0.101 ms
 Execution Time: 0.038 ms
(6 rows)

       显然对于这种情况,查询是走了索引的。

  • 前面是等值连接,后面是查询范围

postgres=# explain (analyze,verbose,buffers,costs,timing)  select * from mytab where c1=2 and c2>10;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.mytab  (cost=6.20..20.02 rows=188 width=12) (actual time=0.049..0.224 rows=189 loops=1)
   Output: id, c1, c2
   Recheck Cond: ((mytab.c1 = 2) AND (mytab.c2 > 10))
   Heap Blocks: exact=11
   Buffers: shared hit=14
   ->  Bitmap Index Scan on mytab_c1_c2_idx  (cost=0.00..6.16 rows=188 width=0) (actual time=0.035..0.036 rows=189 loops=1)
         Index Cond: ((mytab.c1 = 2) AND (mytab.c2 > 10))
         Buffers: shared hit=3
 Planning Time: 0.122 ms
 Execution Time: 0.411 ms
(10 rows)

       在这种情况下,我们看到查询是走了索引的,只不过采用了位图扫描的方式。

  • 前面是查询范围,后面是等值查询

postgres=# explain (analyze,verbose,buffers,costs,timing)  select * from mytab where c1>2 and c2=10;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Seq Scan on public.mytab  (cost=0.00..41.02 rows=1 width=12) (actual time=0.014..0.175 rows=1 loops=1)
   Output: id, c1, c2
   Filter: ((mytab.c1 > 2) AND (mytab.c2 = 10))
   Rows Removed by Filter: 2000
   Buffers: shared hit=11
 Planning Time: 0.076 ms
 Execution Time: 0.194 ms
(7 rows)

       在这种情况下,查询并没有走索引。

  • 前后都是查询范围

postgres=# explain (analyze,verbose,buffers,costs,timing)  select * from mytab where c1>2 and c2>10;
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Seq Scan on public.mytab  (cost=0.00..41.02 rows=1477 width=12) (actual time=0.014..0.992 rows=1477 loops=1)
   Output: id, c1, c2
   Filter: ((mytab.c1 > 2) AND (mytab.c2 > 10))
   Rows Removed by Filter: 524
   Buffers: shared hit=11
 Planning Time: 0.073 ms
 Execution Time: 1.800 ms
(7 rows)

       这种情况下,查询直接走了全表扫描。

关于BTree下索引扫描的思考

       从上面的例子可以看到,BTree索引的存储规律如下:以索引(A,B)为例,首先整个索引保持A有序,在A相同的情况下保持B有序。而结合这一存储规律,实际上可以按照以下的原则判断:

按照索引顺序从左到右分析查询条件,如果字段在where条件中是简单查询(等值查询)则会走索引,但是一旦字段的查询条件是复杂查询(范围查询、离散查询)则索引中断。

       有了上述结论,其实我们后面大概的一个索引优化的思路就可以总结为:

尽量将简单查询的字段放到复合索引的最前面。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值