PostgreSQL index-only scans详解及常见误区

最新推荐文章于 2025-02-08 14:08:11 发布

foucus、

最新推荐文章于 2025-02-08 14:08:11 发布

阅读量3.2k

点赞数 4

分类专栏： PostgreSQL 文章标签： sql 数据库 postgresql

本文链接：https://blog.csdn.net/weixin_39540651/article/details/115696390

版权

PostgreSQL 专栏收录该内容

282 篇文章

订阅专栏

在oracle中，类似select id from t1 where id<10这样一个查询，当select语句的所有目标列都在索引中时，为了减少IO，就不需要再回表获取数据了。

这便是index only scan，从pg9.2开始引入了这种索引扫描方法，顾名思义即只扫描索引列。

接下来我们通过一个例子来看下什么是index only scan：
创建测试表：

bill=# create table tbl(id int,name text,data text);
CREATE TABLE

插入数据：

bill=# insert into tbl select generate_series(1,100),md5(random()::text),md5(random()::text);
INSERT 0 100

建立索引：

bill=# create index idx_tbl on tbl(id,name);
CREATE INDEX

接下来我们来看看执行下面select查询时，pg是如何获取数据的。

bill=# explain analyze select id,name from tbl where id between 18 and 19;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_tbl on tbl  (cost=0.14..3.35 rows=2 width=37) (actual time=0.077..0.080 rows=2 loops=1)
   Index Cond: ((id >= 18) AND (id <= 19))
   Heap Fetches: 2
 Planning Time: 0.598 ms
 Execution Time: 0.107 ms
(5 rows)

可以看到，上面的查询我们从表tbl中读取id和name两列，因为这两列都包含在idx_tbl索引中，所以似乎没有必要去回表访问，而且看执行计划也是走的index only scan，但实时并非如此，仔细观察可以发现还是访问了2个heap pages。

这便是pg中index only scan的一个误区：
只要select的查询列中只有索引列，执行计划都是显示index only scan，但并不能保证不回表查询！

为什么呢？因为在pg中索引元组并没有多版本的信息，也就是说我们通过索引无法判断其对应的数据元组中数据的可见性。所以上面这种情况才会需要回表检查数据元组的可见性。

面对这种情况，pg通过可见性映射表来解决。如果某一个page中存储的所有元组数据都是可见的，那么便不需要回表。

我们可以查看该表数据元组的可见性：

bill=# select relname,relpages,relallvisible from pg_class where relname='tbl';
 relname | relpages | relallvisible
---------+----------+---------------
 tbl     |        2 |             0
(1 row)

relallvisible的值为0，难怪还需要回表读取数据块。

对该表进行vacuum再查看：

bill=# vacuum tbl;
VACUUM
bill=# select relname,relpages,relallvisible from pg_class where relname='tbl';
 relname | relpages | relallvisible
---------+----------+---------------
 tbl     |        2 |             2
(1 row)

接下来再去查询：
可以看到这次heap fetches为0了，这才是真正的index only scan！

bill=# explain analyze select id,name from tbl where id between 18 and 19;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_tbl on tbl  (cost=0.14..1.48 rows=2 width=37) (actual time=0.025..0.027 rows=2 loops=1)
   Index Cond: ((id >= 18) AND (id <= 19))
   Heap Fetches: 0
 Planning Time: 0.283 ms
 Execution Time: 0.053 ms
(5 rows)