Understanding PostgreSQL's CREATE INDEX CONCURRENTLY (CIC) process through an index that cannot be created

Hehuyi_In

Article link: https://blog.csdn.net/Hehuyi_In/article/details/109268806

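CREATE INDEX CONCURRENTLY (CIC) is probably one of the statements DBAs use most often: the build only takes a level-4 lock (ShareUpdateExclusiveLock), so it does not block DML. That sounds wonderful, but on a system with many large or long-running transactions it can sit blocked for a whole lunchtime without finishing a single index. This post starts from exactly such an index that cannot be created and works through the CIC process, the principles behind it, and the things to watch out for.

I. The index that cannot be created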

create table test(id int);
INSERT INTO test(id) VALUES (generate_series(1, 10000));

create table tmp02(a int);
insert into tmp02 values(1);

Session 1

select count(*) from test a, test b;

Session 2

create index concurrently ind_01 on tmp02(a);

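As you can see, even though test and tmp02 are different tables, and the statement running against test is not even DML, the index build on tmp02 is still blocked. If session 1 were a query that runs for several hours, the index build in session 2 would stay blocked the whole time.

Check what is being waited on (ungranted locks first, then granted ones):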

SELECT pid, locktype,virtualxid,relation::regclass, mode FROM pg_locks where granted='f' order by pid;

SELECT pid, locktype,virtualxid,relation::regclass, mode FROM pg_locks where granted='t' order by pid;

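We know that a query acquires a virtualxid when it runs (so does a DML statement), but why would the index build request a lock on that very same virtualxid? Puzzling.

Looking at the call stack of the blocked backend, DefineIndex calls a function named WaitForOlderSnapshots: it is waiting for older snapshots.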
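The same blocker can usually be found from SQL instead of a stack trace. The query below is only a sketch to run in a third session while session 2 is stuck: pg_stat_activity and pg_blocking_pids() are standard, but the ILIKE filter on the query text is my own shortcut for picking out the CIC session and may need adjusting.

```sql
-- For each session running CREATE INDEX CONCURRENTLY, list the sessions
-- it is currently waiting behind, together with what they are running.
SELECT w.pid             AS waiting_pid,
       w.wait_event_type,
       w.wait_event,
       b.pid             AS blocking_pid,
       b.state           AS blocking_state,
       b.xact_start      AS blocking_xact_start,
       b.query           AS blocking_query
FROM pg_stat_activity AS w
CROSS JOIN LATERAL unnest(pg_blocking_pids(w.pid)) AS blk(pid)
JOIN pg_stat_activity AS b ON b.pid = blk.pid
WHERE w.query ILIKE 'create index concurrently%';  -- assumption: the statement starts this way
```

In the example above, the blocking row should be session 1 with its long-running select.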

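II. Background

1. HOT (heap-only tuples)

CIC is closely tied to HOT: once a new index exists, HOT updates have to obey the corresponding rules. For more on HOT, see:

"PostgreSQL Interview Questions": study notes and answers (Hehuyi_In's blog, CSDN)

postgresql_internals-14 study notes (part 1) (CSDN blog)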

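Broken HOT chain: a HOT chain whose tuple versions do not all carry the same key value for some index. For indexes that already exist this cannot happen: when an update changes an indexed column, or the new version does not fit in the same page, PG does not do a HOT update and instead adds a separate index entry pointing at the new tuple. It can happen with respect to a newly created index, which is the case CIC has to guard against.
HOT-safe: an update that changes none of the indexed columns (it only becomes an actual HOT update if the new version also fits in the same page).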
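To get a feel for how much a table relies on HOT, the cumulative statistics views expose the counters. A small sketch, using the two demo tables from section I as the filter (adjust the names for a real system):

```sql
-- Share of updates that were done as HOT updates, per table.
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
WHERE relname IN ('test', 'tmp02');
```

A table with a high HOT ratio on a column that is about to be indexed will see that ratio drop once the index exists, since updates to that column are no longer HOT-safe.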

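2. Flags in pg_index

indislive = true: the index is visible; new transactions know that it exists
indisready = true: the index is writable; DML in new transactions must maintain it
indisvalid = true: the index is readable; new transactions may use it for queries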
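These three flags can be read straight from the catalog. A minimal query, using the tmp02 table from section I:

```sql
-- State flags of every index on tmp02, including one that a CIC is still building.
SELECT c.relname AS index_name,
       i.indislive,
       i.indisready,
       i.indisvalid
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
WHERE i.indrelid = 'tmp02'::regclass;
```

While session 2 is blocked in the example above, I would expect ind_01 to show up with indisready true and indisvalid false, because the build has already passed its second scan and is only waiting to mark the index valid.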

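III. The CIC build process

Pulling together the official documentation and various articles online, a CIC build can be summarized as: three phases, two table scans, and three waits.

Initial table state: the index has not been created yet.

1. Phase 1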

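Parsing and a batch of pre-checks
Build the catalog entries, mainly relcache, pg_class and pg_index (indislive=true: the index is visible; indisready=false: it cannot be written; indisvalid=false: it cannot be queried)
Take a session-level lock (ShareUpdateExclusiveLock) so that the table or the new index cannot be dropped by another transaction after the first transaction commits
Commit the current transaction so that the new index becomes visible, then start transaction 1

After this phase, new transactions see an invalid index on the table (neither readable nor writable). From here on they have to respect HOT-safety with respect to it: an update that changes one of the new index's key columns must not be done as a HOT update, otherwise the HOT chain would be broken for the new index.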

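2. Phase 2

Wait for every transaction that holds a lock on this table conflicting with ShareLock, i.e. in-flight DML, to finish (wait 1: only DML on this table matters, so the impact is small)

Why wait: although the new index is not yet readable or writable, new transactions can already see that it exists, and any later change to the table must keep HOT chains consistent with the new index's definition. In other words, an update that touches a column of the new index has to start a new HOT chain. Transactions that began before phase 1 committed cannot see the new index and would still do HOT updates under the old rules, which does not meet that requirement.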

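Take a snapshot and scan the whole table, building index entries for every tuple visible to that snapshot (table scan 1). During this phase other transactions writing to the table do not maintain the index (it is not writable yet); they only keep HOT updates consistent with the new index definition. The index and the table can therefore disagree (for example, in the original figure the table value b1 is updated to b2 while the index is not updated).

Set pg_index.indisready=true: the index is now writable but still not queryable, and from here on other transactions that modify the table must maintain it (index_concurrently_build -> index_set_state_flags)
Commit transaction 1, start transaction 2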

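After this phase the index is writable but not queryable (its data is not yet consistent), and other transactions that modify the table must maintain it.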
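The first wait is easy to reproduce with three sessions. The script below is a sketch under a few assumptions: PostgreSQL 12 or later (for pg_stat_progress_create_index), the tmp02 table from section I, and ind_02 as a made-up index name.

```sql
-- Session 1: an open transaction that has written to tmp02
BEGIN;
INSERT INTO tmp02 VALUES (2);
-- (leave it uncommitted for now)

-- Session 2: registers the index in the catalogs, then waits for session 1
CREATE INDEX CONCURRENTLY ind_02 ON tmp02(a);

-- Session 3: ask the progress view which phase the build is in
SELECT pid,
       relid::regclass AS table_name,
       phase,
       lockers_total,
       lockers_done,
       current_locker_pid
FROM pg_stat_progress_create_index;
-- The documented phase name for this first wait is
-- 'waiting for writers before build'.

-- Session 1: let the build continue
COMMIT;
```

Unlike the example in section I, the blocker here is a writer on the same table, so the build stops at wait 1 rather than at the final wait for old snapshots.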

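3. Phase 3

Phase 3 is essentially backfilling: it brings the index in line with the table.

Again wait for every transaction that holds a lock on this table conflicting with ShareLock, i.e. in-flight DML, to finish (wait 2: only DML on this table matters, so the impact is small)

Why wait: transactions that started before the phase 2 transaction committed cannot see that the new index has become writable, so they do not maintain it when they modify the table.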

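Take a snapshot again and do another full table scan, adding to the index the tuples it is missing since the phase 2 transaction started; the process resembles a merge (table scan 2)
Record the xmin of this snapshot, commit transaction 2, start transaction 3
Collect the vxids of all transactions whose snapshots are older than that xmin and wait for all of them to finish, read-only queries included (wait 3: this one can be affected by activity on other tables and has the largest impact; our example is stuck at this step)

Why wait: an old transaction's snapshot can see rows older than those visible to the snapshot used to build the index. If such a transaction used the new index for a query, the index might not contain the old data it expects to see, giving inconsistent results (for example, in the original figure the index has no entry for value b, yet an old transaction may still see that value). Phase 3 must therefore wait until all of these older transactions have finished before marking the new index readable.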

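Set pg_index.indisvalid=true; from now on the index can be used by queries
Invalidate the cache and release the session lock

At this point the index is available to all transactions.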
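Before starting a CIC it can help to check who is holding an old snapshot, since those are the sessions the third wait will get stuck behind. A rough sketch using pg_stat_activity; the 60-character truncation is arbitrary, and my reading of WaitForOlderSnapshots is that only sessions in the same database count, which is why datname is shown.

```sql
-- Sessions that currently hold a snapshot, oldest first.
SELECT pid,
       datname,
       state,
       xact_start,
       backend_xid,
       backend_xmin,
       age(backend_xmin) AS xmin_age,
       left(query, 60)   AS query_start
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
  AND pid <> pg_backend_pid()
ORDER BY age(backend_xmin) DESC;
```

A session that appears near the top with an old xact_start, such as the cross join in session 1, is a likely candidate for blocking the final step.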

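IV. Things to watch out for with CIC

Do not run it while long transactions or slow queries are active, otherwise it will wait for a very long time
CIC scans the table twice, so it takes longer and uses more resources
CIC blocks itself: two CICs cannot run on the same table at the same time
CIC on the parent of a partitioned table is not supported (running it on each partition individually works)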
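One more practical point that follows from the waits: if a blocked CIC is cancelled or fails partway through, the official documentation notes that it leaves an invalid index behind, which still has to be maintained by writes. A sketch of finding and rebuilding such leftovers, using ind_01 and tmp02 from section I as the example names:

```sql
-- List indexes that are not valid (for example left behind by an aborted CIC).
SELECT n.nspname            AS schema_name,
       c.relname            AS index_name,
       i.indrelid::regclass AS table_name
FROM pg_index AS i
JOIN pg_class AS c     ON c.oid = i.indexrelid
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;

-- Drop the leftover and try again once the long transactions are gone.
DROP INDEX CONCURRENTLY IF EXISTS ind_01;
CREATE INDEX CONCURRENTLY ind_01 ON tmp02(a);
```

Note that an index which is still being built by a running CIC also shows up as not valid, so check that no build is in progress before dropping anything.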

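V. Reading the source

Now let's go through the CIC process again at the source level. The DefineIndex function lives in indexcmds.c; only the excerpts that matter for each phase are shown here.

1. Phase 1

Parsing and a batch of pre-checks (not the focus)

Here we can see the lock mode:

    lockmode = concurrent ? ShareUpdateExclusiveLock : ShareLock;
    rel = table_open(relationId, lockmode);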

Also, CIC is not supported on partitioned tables:

    partitioned = rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;
    if (partitioned)
    {
        /*
         * Note: we check 'stmt->concurrent' rather than 'concurrent', so that
         * the error is thrown also for temporary tables.  Seems better to be
         * consistent, even though we could do it on temporary table because
         * we're not actually doing it concurrently.
         */
        if (stmt->concurrent)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("cannot create index on partitioned table \"%s\" concurrently",
                            RelationGetRelationName(rel))));
…
    }
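The usual way around this error, described in the official partitioning documentation, is to create the parent index with ON ONLY, build each partition's index concurrently, and attach them. The sketch below uses a hypothetical events table; all object names are made up for illustration.

```sql
-- Hypothetical partitioned table with a single partition.
CREATE TABLE events (id int, created date) PARTITION BY RANGE (created);
CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- 1. Parent index only; it stays invalid until every partition has an attached index.
CREATE INDEX events_id_idx ON ONLY events (id);

-- 2. Build the index on each partition without blocking writes.
CREATE INDEX CONCURRENTLY events_2024_id_idx ON events_2024 (id);

-- 3. Attach the partition index; once all partitions are attached the parent becomes valid.
ALTER INDEX events_id_idx ATTACH PARTITION events_2024_id_idx;
```

Each per-partition build still goes through the three phases and three waits described above, so the advice about long transactions applies to every one of them.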

index_create is then called to build the catalog entries, mainly relcache, pg_class and pg_index (setting indislive=true, indisready=false, indisvalid=false):

    indexRelationId =
        index_create(rel, indexRelationName, indexRelationId, parentIndexId,
                     parentConstraintId,
                     stmt->oldNode, indexInfo, indexColNames,
                     accessMethodId, tablespaceId,
                     collationObjectId, classObjectId,
                     coloptions, reloptions,
                     flags, constr_flags,
                     allowSystemTableMods, !check_rights,
                     &createdConstraintId);

The index_create function:

    /*
     * store index's pg_class entry
     */
    InsertPgClassTuple(pg_class, indexRelation,
                       RelationGetRelid(indexRelation),
                       (Datum) 0,
                       reloptions);

    /* ----------------
     *    update pg_index
     *    (append INDEX tuple)
     *
     *    Note that this stows away a representation of "predicate".
     *    (Or, could define a rule to maintain the predicate) --Nels, Feb '92
     * ----------------
     */
    UpdateIndexRelation(indexRelationId, heapRelationId, parentIndexRelid,
                        indexInfo,
                        collationObjectId, classObjectId, coloptions,
                        isprimary, is_exclusion,
                        (constr_flags & INDEX_CONSTR_CREATE_DEFERRABLE) == 0,
                        !concurrent && !invalid,
                        !concurrent);

The UpdateIndexRelation function:

    /*
     * Build a pg_index tuple
     */
…
    values[Anum_pg_index_indisvalid - 1] = BoolGetDatum(isvalid);
    values[Anum_pg_index_indisready - 1] = BoolGetDatum(isready);
    values[Anum_pg_index_indislive - 1] = BoolGetDatum(true);
…

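Take a session-level lock (ShareUpdateExclusiveLock) so that the table or the new index cannot be dropped by another transaction after the first transaction commits:

    LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);

Commit the current transaction so that the new index becomes visible, and start transaction 1: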

    PopActiveSnapshot();
    CommitTransactionCommand();
    StartTransactionCommand();

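After this phase, new transactions see an invalid index on the table (neither readable nor writable), so from here on updates must respect HOT-safety with respect to it: changing one of its key columns must not be done as a HOT update, or the HOT chain would be broken for the new index.

2. Phase 2

Wait for every transaction that holds a lock on this table conflicting with ShareLock, i.e. in-flight DML, to finish (wait 1: only DML on this table matters, so the impact is small):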

WaitForLockers(heaplocktag, ShareLock, true);

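Take a snapshot and scan the whole table, building index entries for every tuple visible to that snapshot (table scan 1). During this phase other transactions writing to the table do not maintain the index; they only keep HOT updates consistent with the new index definition: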

    /* Set ActiveSnapshot since functions in the indexes may need it */
    PushActiveSnapshot(GetTransactionSnapshot());

    /* Perform concurrent build of index */
    index_concurrently_build(relationId, indexRelationId);

index_concurrently_build calls index_build:

    /* Now build the index */
    index_build(heapRel, indexRelation, indexInfo, false, true);

index_concurrently_build then calls index_set_state_flags to set pg_index.indisready=true (index writable); from here on, other transactions that modify the table must maintain the new index:

    /*
     * Update the pg_index row to mark the index as ready for inserts. Once we
     * commit this transaction, any new transactions that open the table must
     * insert new entries into the index for insertions and non-HOT updates.
     */
    index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);

The index_set_state_flags function:

    /* Perform the requested state change on the copy */
    switch (action)
    {
        case INDEX_CREATE_SET_READY:
            /* Set indisready during a CREATE INDEX CONCURRENTLY sequence */
            Assert(indexForm->indislive);
            Assert(!indexForm->indisready);
            Assert(!indexForm->indisvalid);
            indexForm->indisready = true;
            break;
  …
    }

Commit transaction 1 and start transaction 2:

    /*
     * Commit this transaction to make the indisready update visible.
     */
    CommitTransactionCommand();
    StartTransactionCommand();

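After this phase the index is writable but not queryable (its data is not yet consistent), and other transactions that modify the table must maintain it.

3. Phase 3

Phase 3 is essentially backfilling: it brings the index in line with the table.

Again wait for every transaction that holds a lock on this table conflicting with ShareLock to finish (wait 2: only DML on this table matters, so the impact is small):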

    WaitForLockers(heaplocktag, ShareLock, true);

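Take a snapshot again and do another full table scan, adding to the index the tuples it is missing since the phase 2 transaction started; the process resembles a merge (table scan 2). The scan itself is done by validate_index, which is called right after the snapshot is pushed and is not shown in this excerpt:

    snapshot = RegisterSnapshot(GetTransactionSnapshot());
    PushActiveSnapshot(snapshot);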

Record the xmin of this snapshot, commit transaction 2, and start transaction 3:

    limitXmin = snapshot->xmin;
    PopActiveSnapshot();
    UnregisterSnapshot(snapshot);
    CommitTransactionCommand();
    StartTransactionCommand();

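Collect the vxids of all transactions whose snapshots are older than that xmin and wait for all of them to finish, read-only queries included (wait 3: it can be affected by activity on other tables and has the largest impact; this is the function we caught in the call stack):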

WaitForOlderSnapshots(limitXmin, true);

Set pg_index.indisvalid=true; from now on the index can be used by queries:

index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);

The index_set_state_flags function:

    /* Perform the requested state change on the copy */
    switch (action)
    {
        case INDEX_CREATE_SET_VALID:
            /* Set indisvalid during a CREATE INDEX CONCURRENTLY sequence */
            Assert(indexForm->indislive);
            Assert(indexForm->indisready);
            Assert(!indexForm->indisvalid);
            indexForm->indisvalid = true;
            break;
…
    }

Invalidate the cache and release the session lock:

    CacheInvalidateRelcacheByRelid(heaprelid.relId);
    UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock); 
    pgstat_progress_end_command();
    return address;
}

References

An analysis of how PostgreSQL create index concurrently works - 数据库内核研究 (Database Kernel Research)

Explaining CREATE INDEX CONCURRENTLY - 2ndQuadrant | PostgreSQL

http://mysql.taobao.org/monthly/2020/09/05/

PostgreSQL: Documentation: 14: CREATE INDEX

https://developer.aliyun.com/article/590359