一、关于BTree
本文我们深入学习下Postgresql的BTree索引。我们知道,BTree是一种多路平衡查找树,被广泛应用于数据库的索引之中。与我们之前接触过的最多的二叉树不同,多路树的意思是每个节点不止一个子节点。而查找树是一种顺序树,也就是说对于树种每一个节点来说,它的左子树上所有的节点都不大于它,而它的右子树上所有的节点都不小于它。由于这种特性,理论上B树查找的时间复杂度可以达到二分查找的时间复杂度O(logn)。BTree的查询效率和树的高度有关,为了降低树的高度,提高查询效率,BTree上的节点往往代表一个磁盘页,也就是包含了多个Key值。
如下,是一个比较典型BTree节点的查找过程(查找10,树中的每个节点代表一个磁盘页。每次访问一个新节点代表一次磁盘IO):
Postgresq中默认创建的索引就是BTree索引,比如下面,为users表创建一个默认的BTree索引:
create index users_btree_age_inx on users(age);
等价于:
create index users_btree_age_inx on users using btree(age);
下面,我们将从使用Btree建立单列的索引和多列的复合索引两个方面来学习下BTree索引在Postgresql中的使用。
二、使用BTree建立单列索引
单例索引的结构比较简单,就是上面BTree的基本结构,每个BTree节点包含了多个单列值得Key,笔者在这里只简单介绍下使用索引前后查询效率提升有多大。
首先,先看一下进行测试得数据表users,表结构如下:
postgres=# \d+ users;
Table "public.users"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+-----------------------+-----------+----------+-----------------------------------+----------+--------------+-------------
id | integer | | not null | nextval('users_id_seq'::regclass) | plain | |
name | character varying(32) | | | | extended | |
age | integer | | | | plain | |
users表里面包含100W条数据,首先我们在没有建立索引得情况下进行等值查询:
postgres=# explain (analyze,verbose,timing,buffers,costs) select * from users where age=80;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..13348.27 rows=11984 width=12) (actual time=0.300..134.897 rows=9916 loops=1)
Output: id, name, age
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=5406
-> Parallel Seq Scan on public.users (cost=0.00..11149.88 rows=4993 width=12) (actual time=0.028..110.437 rows=3305 loops=3)
Output: id, name, age
Filter: (users.age = 80)
Rows Removed by Filter: 330028
Buffers: shared hit=5406
Worker 0: actual time=0.030..107.305 rows=2626 loops=1
Buffers: shared hit=1441
Worker 1: actual time=0.037..99.501 rows=3266 loops=1
Buffers: shared hit=1774
Planning Time: 0.099 ms
Execution Time: 140.010 m
可以看到,整个过程花费了140ms的时间。接下来,再针对age字段进行非等值查询:
postgres=# explain (analyze,verbose,timing,buffers,costs) select * from users where age>80;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on public.users (cost=0.00..19191.30 rows=222734 width=12) (actual time=0.020..249.068 rows=199795 loops=1)
Output: id, name, age
Filter: (users.age > 80)
Rows Removed by Filter: 800205
Buffers: shared hit=5406
Planning Time: 0.068 ms
Execution Time: 366.632 ms
(7 rows)
整个过程使用了全表扫描,花费了366ms的时间。下面,笔者这对age字段加上BTree索引,加索引的语句这里不再赘述。我们只看下,加了索引后再次进行等值查询和范围查询的耗时:
postgres=# explain (analyze,verbose,timing,buffers,costs) select * from users where age=80;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.users (cost=204.64..5746.48 rows=10867 width=12) (actual time=1.739..13.640 rows=9916 loops=1)
Output: id, name, age
Recheck Cond: (users.age = 80)
Heap Blocks: exact=4537
Buffers: shared hit=4537 read=30
-> Bitmap Index Scan on users_btree_age_inx (cost=0.00..201.93 rows=10867 width=0) (actual time=1.159..1.160 rows=9916 loops=1)
Index Cond: (users.age = 80)
Buffers: shared read=30
Planning Time: 0.181 ms
Execution Time: 19.536 ms
postgres=# explain (analyze,verbose,timing,buffers,costs) select * from users where age>80;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
----
Bitmap Heap Scan on public.users (cost=3785.67..11716.26 rows=201967 width=12) (actual time=13.971..141.120 rows=199795 loops=1)
Output: id, name, age
Recheck Cond: (users.age > 80)
Heap Blocks: exact=5406
Buffers: shared hit=5409 read=546
-> Bitmap Index Scan on users_btree_age_inx (cost=0.00..3735.18 rows=201967 width=0) (actual time=13.295..13.295 rows=199795 loops
=1)
Index Cond: (users.age > 80)
Buffers: shared hit=3 read=546
Planning Time: 0.113 ms
Execution Time: 239.571 ms
(10 rows)
三、使用BTree建立多列复合索引
下面,我们通过创建多列复合索引,学习下Btree索引内部的存储结构,并验证复合索引在不同查询条件下的表现。在验证之前,我们需要安装好Postgresql的pageinspect拓展,用来协助分析BTree索引的内部结构。
create extension pageinspect ;
接下来,创建一个测试表mytab,并加上索引:
postgres=# create table mytab (id int,c1 int,c2 int);
CREATE TABLE
postgres=# create index on mytab using btree(c1,c2);
CREATE INDEX
然后,插入测试数据:
postgres=# insert into mytab values (1,1,2);
INSERT 0 1
postgres=# insert into mytab select 1,random()*10,t.d from generate_series(1,2000) as t(d);
INSERT 0 2000
首先,查看索引的第一个索引页,它会告诉我们索引一共有多少层,根页的编号是什么。
postgres=# select * from bt_metap('mytab_c1_c2_idx');
magic | version | root | level | fastroot | fastlevel | oldest_xact | last_cleanup_num_tuples
--------+---------+------+-------+----------+-----------+-------------+-------------------------
340322 | 3 | 3 | 1 | 3 | 1 | 0 | -1
(1 row)
上面的信息表明:这个索引的存储结构,除去ROOT节点有两层,root节点在第3号Page页。接下来,看下3号Page页的详细信息:
postgres=# select * from bt_page_items('mytab_c1_c2_idx',3);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------------------
1 | (1,0) | 8 | f | f |
2 | (6,100) | 16 | f | f | 02 00 00 00 1c 01 00 00
3 | (9,25) | 16 | f | f | 03 00 00 00 d1 00 00 00
4 | (4,83) | 16 | f | f | 04 00 00 00 0b 01 00 00
5 | (8,9) | 16 | f | f | 05 00 00 00 c1 00 00 00
6 | (5,118) | 16 | f | f | 06 00 00 00 e7 01 00 00
7 | (7,70) | 16 | f | f | 07 00 00 00 29 03 00 00
8 | (2,122) | 16 | f | f | 09 00 00 00 79 00 00 00
(8 rows)
data表示当前索引页的最小值,最左边页没有最小值。data字段里面的数值从左到右数值数位越来越高,比如02 00 00 00 1c 01 00 00表示的实际上是:c1的最小值为00 00 00 02(即,十进制的2),c2的最小值00 00 01 1c(即,十进制的284)。我们要找c1=1,c2=2的话,显然在编号为1的Page上。在去查看1的详细信息:
postgres=# select * from bt_page_items('mytab_c1_c2_idx',1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+----------+---------+-------+------+-------------------------
...
114 | (10,94) | 16 | f | f | 00 00 00 00 97 07 00 00
115 | (10,99) | 16 | f | f | 00 00 00 00 9c 07 00 00
116 | (10,103) | 16 | f | f | 00 00 00 00 a0 07 00 00
117 | (10,109) | 16 | f | f | 00 00 00 00 a6 07 00 00
118 | (10,127) | 16 | f | f | 00 00 00 00 b8 07 00 00
119 | (10,136) | 16 | f | f | 00 00 00 00 c1 07 00 00
120 | (10,137) | 16 | f | f | 00 00 00 00 c2 07 00 00
121 | (10,139) | 16 | f | f | 00 00 00 00 c4 07 00 00
122 | (10,143) | 16 | f | f | 00 00 00 00 c8 07 00 00
123 | (10,149) | 16 | f | f | 00 00 00 00 ce 07 00 00
124 | (0,1) | 16 | f | f | 01 00 00 00 02 00 00 00
125 | (0,8) | 16 | f | f | 01 00 00 00 07 00 00 00
...
我们在ctid为(0,1),即page页为0,tid为1的位置找到了c1=1,c2=2。
-
前后都是等值链接的查询
下面,笔者来验证下两列都是等值查询的话,走索引的情况:
postgres=# explain (analyze,verbose,buffers,costs,timing) select * from mytab where c1=1 and c2=2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Index Scan using mytab_c1_c2_idx on public.mytab (cost=0.28..8.30 rows=1 width=12) (actual time=0.016..0.018 rows=1 loops=1)
Output: id, c1, c2
Index Cond: ((mytab.c1 = 1) AND (mytab.c2 = 2))
Buffers: shared hit=3
Planning Time: 0.101 ms
Execution Time: 0.038 ms
(6 rows)
显然对于这种情况,查询是走了索引的。
-
前面是等值连接,后面是查询范围
postgres=# explain (analyze,verbose,buffers,costs,timing) select * from mytab where c1=2 and c2>10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.mytab (cost=6.20..20.02 rows=188 width=12) (actual time=0.049..0.224 rows=189 loops=1)
Output: id, c1, c2
Recheck Cond: ((mytab.c1 = 2) AND (mytab.c2 > 10))
Heap Blocks: exact=11
Buffers: shared hit=14
-> Bitmap Index Scan on mytab_c1_c2_idx (cost=0.00..6.16 rows=188 width=0) (actual time=0.035..0.036 rows=189 loops=1)
Index Cond: ((mytab.c1 = 2) AND (mytab.c2 > 10))
Buffers: shared hit=3
Planning Time: 0.122 ms
Execution Time: 0.411 ms
(10 rows)
在这种情况下,我们看到查询是走了索引的,只不过采用了位图扫描的方式。
-
前面是查询范围,后面是等值查询
postgres=# explain (analyze,verbose,buffers,costs,timing) select * from mytab where c1>2 and c2=10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on public.mytab (cost=0.00..41.02 rows=1 width=12) (actual time=0.014..0.175 rows=1 loops=1)
Output: id, c1, c2
Filter: ((mytab.c1 > 2) AND (mytab.c2 = 10))
Rows Removed by Filter: 2000
Buffers: shared hit=11
Planning Time: 0.076 ms
Execution Time: 0.194 ms
(7 rows)
在这种情况下,查询并没有走索引。
-
前后都是查询范围
postgres=# explain (analyze,verbose,buffers,costs,timing) select * from mytab where c1>2 and c2>10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Seq Scan on public.mytab (cost=0.00..41.02 rows=1477 width=12) (actual time=0.014..0.992 rows=1477 loops=1)
Output: id, c1, c2
Filter: ((mytab.c1 > 2) AND (mytab.c2 > 10))
Rows Removed by Filter: 524
Buffers: shared hit=11
Planning Time: 0.073 ms
Execution Time: 1.800 ms
(7 rows)
这种情况下,查询直接走了全表扫描。
关于BTree下索引扫描的思考
从上面的例子可以看到,BTree索引的存储规律如下:以索引(A,B)为例,首先整个索引保持A有序,在A相同的情况下保持B有序。而结合这一存储规律,实际上可以按照以下的原则判断:
按照索引顺序从左到右分析查询条件,如果字段在where条件中是简单查询(等值查询)则会走索引,但是一旦字段的查询条件是复杂查询(范围查询、离散查询)则索引中断。
有了上述结论,其实我们后面大概的一个索引优化的思路就可以总结为:
尽量将简单查询的字段放到复合索引的最前面。