pg中一条查询语句返回多列记录时,如果索引包含全部的返回列则不用回表获取数据,例如select c1,c2,c3 from t1 where c1=10这样一条查询如果索引包含c1,c2,c3三列则会用到index only scan。除此之外pg中还有一种方法可以实现类似的效果,即include子句。
index include即在索引中,放入额外属性内容,搜索时不需要回表,例如:
create index idx_t1_1 on t1 (id) include(c1,c2,c3,info,crt_time);
create index idx_t2_1 on t2 (id,c1,c2,c3,info,crt_time);
上面第二个创建索引的语句是我们比较常见的多列索引,第一种则是include index,两者有什么区别呢?
索引1,KEY是ID,在叶子节点中,存入KEY与(c1,c2,c3,info,crt_time)的内容。
索引2,KEY是(id,c1,c2,c3,info,crt_time),在所有节点中,存储的都是所有字段的值,比索引1要重,包括空间,索引维护,更新等。
下面测试下三种索引:普通索引、多列索引、include index三者的读写性能。
–创建实验环境
bill=# create table t1 (id int, c1 int, c2 int, c3 int, info text, crt_time timestamp);
CREATE TABLE
bill=# create table t2(like t1);
CREATE TABLE
bill=# create table t3(like t1);
CREATE TABLE
bill=# create index idx_t1_1 on t1 (id) include(c1,c2,c3,info,crt_time);
CREATE INDEX
bill=# create index idx_t2_1 on t2 (id,c1,c2,c3,info,crt_time);
CREATE INDEX
bill=# create index idx_t3_1 on t3(id);
CREATE INDEX
–写入性能对比
1、include index
bill=# insert into t1 select (1000*random())::int,1,1,1,'test',now() from generate_series(1,10000000);
INSERT 0 10000000
Time: 47103.110 ms (00:47.103)
2、多列索引
bill=# insert into t2 select (1000*random())::int,1,1,1,'test',now() from generate_series(1,10000000);
INSERT 0 10000000
Time: 62311.384 ms (01:02.311)
3、普通索引
bill=# insert into t3 select (1000*random())::int,1,1,1,'test',now() from generate_series(1,10000000);
INSERT 0 10000000
Time: 42390.336 ms (00:42.390)
–查询性能对比
1、include index
bill=# explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t1 where id=1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using idx_t1_1 on public.t1 (cost=0.43..12356.25 rows=9943 width=29) (actual time=0.050..35.200 rows=10001 loops=1)
Output: id, c1, c2, c3, info, crt_time
Index Cond: (t1.id = 1)
Heap Fetches: 10001
Buffers: shared read=9418 dirtied=5564
Planning Time: 0.252 ms
Execution Time: 36.004 ms
(7 rows)
Time: 36.771 ms
2、多列索引
bill=# explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t2 where id=1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using idx_t2_1 on public.t2 (cost=0.56..12358.05 rows=9943 width=29) (actual time=0.054..33.694 rows=9933 loops=1)
Output: id, c1, c2, c3, info, crt_time
Index Cond: (t2.id = 1)
Heap Fetches: 9933
Buffers: shared read=9343 dirtied=5457
Planning Time: 0.218 ms
Execution Time: 34.503 ms
(7 rows)
Time: 35.180 ms
3、普通索引
bill=# explain (analyze,verbose,timing,costs,buffers) select id,c1,c2,c3,info,crt_time from t3 where id=1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.t3 (cost=111.16..11331.86 rows=9926 width=29) (actual time=3.203..14.420 rows=9955 loops=1)
Output: id, c1, c2, c3, info, crt_time
Recheck Cond: (t3.id = 1)
Heap Blocks: exact=9330
Buffers: shared hit=9358
-> Bitmap Index Scan on idx_t3_1 (cost=0.00..108.68 rows=9926 width=0) (actual time=1.662..1.662 rows=9955 loops=1)
Index Cond: (t3.id = 1)
Buffers: shared hit=28
Planning Time: 0.083 ms
Execution Time: 15.113 ms
(10 rows)
Time: 15.600 ms
总结:
- index include技术,将key值以外的数据存储在index leaf
page中,不需要回表就可以查询到这些数据,提高整体性能(同时又不需要将所有属性都放在KEY中,使索引变得非常大)。 - 目前只支持btree和gist索引。
- include子句中不能使用表达式。