Start a transaction, then attach GDB to the backend and set a breakpoint on PrepareTransaction:
postgres=# begin;
BEGIN
postgres=*# select pg_backend_pid();
pg_backend_pid
----------------
226252
(1 row)
postgres=*# insert into wp_shy values(11,'redacancy');
INSERT 0 1
postgres=*# prepare transaction 'prepare_12';
PrepareTransaction() executes the following steps:
1 Get the transaction ID of the current transaction; here xid = 779:
(gdb) p xid
$21 = 779
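2 Check the current transaction state; if it is not TRANS_INPROGRESS, a warning is emitted, i.e. a transaction in any other state cannot be prepared;
3 Fire all deferred triggers and close any open portals;
4 Close large objects and run the validity check for serializable transactions;
5 Set the transaction state to TRANS_PREPARE;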
6 Tell the buffer manager and the storage manager (bufmgr and smgr) to prepare for commit;
7 Take a gxact slot from the free list TwoPhaseState->freeGXacts and fill in its key identifiers such as xid and gid; TwoPhaseStateLock is held in LW_EXCLUSIVE mode while this is done:
(gdb) p *gxact
$20 = {next = 0x7f7c7f8f71e0, pgprocno = 1034, dummyBackendId = 1030, prepared_at = 706697115158785,
prepare_start_lsn = 0, prepare_end_lsn = 0, xid = 779, owner = 10, locking_backend = 3, valid = false, ondisk = false,
inredo = false, gid = "prepare_12", '\000' <repeats 189 times>}
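8 Call StartPrepare to initialize the two-phase commit data structures and build the 2PC header record;
1) Initialize variables, and use the pgprocno field of the gxact from step 7 to locate the corresponding PGPROC in the ProcGlobal->allProcs array;
2) Create a TwoPhaseFileHeader and fill in the fields that are already known: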
(gdb) p hdr
$22 = {magic = 1475953972, total_len = 0, xid = 779, database = 13835, prepared_at = 706697115158785, owner = 10,
nsubxacts = 0, ncommitrels = 0, nabortrels = 0, ninvalmsgs = 0, initfileinval = false, gidlen = 11,
origin_lsn = 59423552209248, origin_timestamp = 706697115158785}
(gdb)
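Notes: database = 13835 is the OID of the database the transaction ran in;
prepared_at is the timestamp taken when the prepare operation started;
gidlen is the length of the GID string given in the PREPARE TRANSACTION command, including the terminating '\0': strlen("prepare_12") + 1 == 11
3) Set up the environment for the rest of the prepare: initialize the two-phase state for locks, predicate locks, and the relation map;
4) Call EndPrepare to finalize the 2PC prepare state data and write it to WAL (WAL is covered in detail later):
- Register the end marker in the 2PC record list;
- Fill in the record header, as follows: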
typedef struct xl_xact_prepare
{
uint32 magic; /* format identifier; TWOPHASE_MAGIC */
uint32 total_len; /* actual file length */
TransactionId xid; /* original transaction XID; 779 here */
Oid database; /* OID of database it was in; 13835 here */
TimestampTz prepared_at; /* time of preparation */
Oid owner; /* user running the transaction */
int32 nsubxacts; /* number of following subxact XIDs; 0 here */
int32 ncommitrels; /* number of delete-on-commit rels; 0 here */
int32 nabortrels; /* number of delete-on-abort rels; 0 here */
int32 ninvalmsgs; /* number of cache invalidation messages; 0 here */
bool initfileinval; /* does relcache init file need invalidation? */
uint16 gidlen; /* length of the GID - GID follows the header; 11 here */
XLogRecPtr origin_lsn; /* lsn of this record at origin node; InvalidXLogRecPtr here */
TimestampTz origin_timestamp; /* time of prepare at origin node; 0 here */
} xl_xact_prepare;
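- Insert the 2PC record into WAL: the registered rdata chunks are assembled into a complete log record, XLogInsert places it in the WAL buffers, and the record is then flushed to disk;
- Update the related fields (including the XLogCtl structure), mark the transaction as prepared, and add its GXACT to the global ProcArray;
- Remove the transaction's own PGPROC from the global ProcArray [the transaction still counts as running: a scan of the ProcArray can obtain the xid through the GXACT entry, and if that entry is valid the transaction is treated as still in progress];
- Release what the current transaction still holds: resource owner resources and pinned buffers, relation cache entries, locks, and so on;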
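The header values that GDB printed in step 8 can be sanity-checked outside the server. The following is a minimal standalone sketch, not PostgreSQL code: the helper names are hypothetical, and it assumes that TimestampTz counts microseconds since 2000-01-01 00:00:00 UTC and that gidlen counts the terminating '\0'.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Seconds from the Unix epoch (1970-01-01) to 2000-01-01, the assumed TimestampTz epoch. */
#define PG_EPOCH_OFFSET_SECS 946684800LL
#define USECS_PER_SEC 1000000LL

/* The magic value printed by GDB (1475953972), written in hexadecimal. */
#define OBSERVED_MAGIC 0x57F94534u

/* Hypothetical helper: TimestampTz (microseconds since 2000-01-01 UTC) to Unix seconds. */
static int64_t timestamptz_to_unix_secs(int64_t ts)
{
    return ts / USECS_PER_SEC + PG_EPOCH_OFFSET_SECS;
}

/* Hypothetical helper: the GID length as stored in the header, counting the '\0'. */
static uint16_t gid_header_len(const char *gid)
{
    assert(gid != NULL);
    return (uint16_t) (strlen(gid) + 1);
}

/* Hypothetical helper: print the decoded fields in human-readable form. */
static void print_decoded_header(int64_t prepared_at, const char *gid)
{
    time_t secs = (time_t) timestamptz_to_unix_secs(prepared_at);
    struct tm *utc = gmtime(&secs);
    char when[32] = "unknown";

    if (utc != NULL)
        strftime(when, sizeof(when), "%Y-%m-%d %H:%M:%S", utc);
    printf("magic=0x%08X prepared_at=%s UTC gidlen=%u\n",
           OBSERVED_MAGIC, when, (unsigned) gid_header_len(gid));
}
```

Calling print_decoded_header(706697115158785LL, "prepare_12") prints the three decoded values on one line, which makes it easy to compare prepared_at with the wall-clock time of the session.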
After a successful prepare, the pg_twophase directory holds the state file of the prepared transaction:
[root@node199 pg_twophase]# ll
-rw------- 1 postgres postgres 236 5月 25 20:37 00000312
[root@node199 pg_twophase]# pwd
/home/postgres/data/pg_twophase
(gdb) p path
$8 = "pg_twophase/00000312...
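The file name in the listing is the xid in hexadecimal. A minimal sketch of that mapping follows, assuming the release traced here names the file with 8 uppercase hex digits, as the listing suggests; the function names are hypothetical. Note that 00000312 decodes to xid 786, whereas xid 779 from the GDB session would map to 0000030B, so the listing presumably belongs to a different prepared transaction.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TWOPHASE_DIR "pg_twophase"

/* Build "pg_twophase/XXXXXXXX": the xid as 8 uppercase hex digits (assumed naming rule). */
static int twophase_file_path(char *path, size_t size, uint32_t xid)
{
    assert(path != NULL);
    return snprintf(path, size, TWOPHASE_DIR "/%08X", (unsigned int) xid);
}

/* Reverse mapping: parse an 8-digit hex file name into an xid; returns 0 on bad input. */
static uint32_t xid_from_state_file_name(const char *name)
{
    assert(name != NULL);
    if (strlen(name) != 8)
        return 0;
    for (size_t i = 0; i < 8; i++)
    {
        if (!isxdigit((unsigned char) name[i]))
            return 0;
    }
    return (uint32_t) strtoul(name, NULL, 16);
}
```

With these helpers, xid_from_state_file_name("00000312") returns 786, and twophase_file_path() with xid 779 yields pg_twophase/0000030B.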
/*
* Header for each record in a state file
*
* NOTE: len counts only the rmgr data, not the TwoPhaseRecordOnDisk header.
* The rmgr data will be stored starting on a MAXALIGN boundary.
*/
typedef struct TwoPhaseRecordOnDisk
{
uint32 len; /* length of rmgr data */
TwoPhaseRmgrId rmid; /* resource manager for this record */
uint16 info; /* flag bits for use by rmgr */
} TwoPhaseRecordOnDisk;
/*
* During prepare, the state file is assembled in memory before writing it
* to WAL and the actual state file. We use a chain of StateFileChunk blocks
* for that.
*/
typedef struct StateFileChunk
{
char *data;
uint32 len;
struct StateFileChunk *next;
} StateFileChunk;
static struct xllist
{
StateFileChunk *head; /* first data block in the chain */
StateFileChunk *tail; /* last block in chain */
uint32 num_chunks;
uint32 bytes_free; /* free bytes left in tail block */
uint32 total_len; /* total data bytes in chain */
} records;
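To make the chunk chain concrete, here is a simplified standalone sketch of how state data could be appended to such a list. It is not the server's code: it uses malloc-style allocation instead of memory contexts, the type and function names are hypothetical, and the 8-byte alignment and 512-byte minimum chunk size are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGNMENT 8u        /* assumed MAXALIGN boundary */
#define MIN_CHUNK_SIZE 512u /* assumed minimum size of a chunk */
#define ALIGN_UP(n) (((n) + (ALIGNMENT - 1u)) & ~(ALIGNMENT - 1u))

typedef struct Chunk
{
    char *data;
    uint32_t len;           /* bytes used in this chunk, padding included */
    struct Chunk *next;
} Chunk;

typedef struct ChunkList
{
    Chunk *head;
    Chunk *tail;
    uint32_t num_chunks;
    uint32_t bytes_free;    /* free bytes left in the tail chunk */
    uint32_t total_len;     /* total padded bytes in the chain */
} ChunkList;

/* Allocate one zero-filled chunk with the given capacity; NULL on failure. */
static Chunk *chunk_new(uint32_t size)
{
    Chunk *c = calloc(1, sizeof(Chunk));

    if (c == NULL)
        return NULL;
    c->data = calloc(1, size);
    if (c->data == NULL)
    {
        free(c);
        return NULL;
    }
    return c;
}

/* Start the chain with a single empty chunk. Returns 0 on success, -1 on failure. */
static int chunk_list_init(ChunkList *list)
{
    assert(list != NULL);
    list->head = list->tail = chunk_new(MIN_CHUNK_SIZE);
    if (list->head == NULL)
        return -1;
    list->num_chunks = 1;
    list->bytes_free = MIN_CHUNK_SIZE;
    list->total_len = 0;
    return 0;
}

/* Append len bytes, padded to the alignment; open a new chunk when the tail is too small. */
static int chunk_list_append(ChunkList *list, const void *data, uint32_t len)
{
    uint32_t padlen = ALIGN_UP(len);

    assert(list != NULL && list->tail != NULL);
    if (padlen > list->bytes_free)
    {
        uint32_t size = padlen > MIN_CHUNK_SIZE ? padlen : MIN_CHUNK_SIZE;
        Chunk *c = chunk_new(size);

        if (c == NULL)
            return -1;
        list->tail->next = c;
        list->tail = c;
        list->num_chunks++;
        list->bytes_free = size;
    }
    if (len > 0)
        memcpy(list->tail->data + list->tail->len, data, len);
    list->tail->len += padlen;
    list->bytes_free -= padlen;
    list->total_len += padlen;
    return 0;
}

/* Free every chunk in the chain. */
static void chunk_list_free(ChunkList *list)
{
    Chunk *c = list->head;

    while (c != NULL)
    {
        Chunk *next = c->next;

        free(c->data);
        free(c);
        c = next;
    }
    list->head = list->tail = NULL;
}
```

Padding each entry to the alignment boundary is what allows a fixed-size record header to be followed by payload that starts on an aligned address, as the NOTE in the TwoPhaseRecordOnDisk comment describes.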
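FinishPreparedTransaction
1 Initialize variables and enter the critical section;
2 Validate the gid, making sure that two backends cannot try to commit the same gid at the same time, and obtain the transaction's xid;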
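3 If the first-phase prepare succeeded and its state was persisted to disk, call ReadTwoPhaseFile to read the 2PC state data; otherwise read it back from WAL (XlogReadTwoPhaseData);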
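4 Call RecordTransactionCommitPrepared to build the XLOG_XACT_COMMIT_PREPARED record and write it to WAL, then update CLOG to mark the transaction as committed;
5 Remove the transaction's PGPROC slot from the global ProcArray and clean up the file information of the relations involved;
6 Acquire TwoPhaseStateLock in exclusive mode and run the callback for each 2PC record, so that two backends cannot use the same gid and conflict;
7 Release the locks and predicate locks acquired during 2PC, and remove the transaction's PGXACT slot from the global ProcArray;
8 Release TwoPhaseStateLock, remove the persisted 2PC state file, and free the allocated memory;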
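The gid check in step 2 can be pictured as a lookup that also marks the entry as taken. The sketch below is a single-threaded illustration with hypothetical names and a hypothetical "no backend" marker; in the server the scan runs while TwoPhaseStateLock is held, and further checks (owner, database) apply that are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GID_SIZE 200    /* matches the 200-byte gid array in the GDB dump */
#define NO_BACKEND (-1) /* hypothetical marker: nobody is finishing this entry */

typedef enum { LOCK_OK, LOCK_BUSY, LOCK_NOT_FOUND } LockResult;

typedef struct PreparedXact
{
    unsigned int xid;
    int valid;              /* nonzero once the prepare has completed */
    int locking_backend;    /* backend currently finishing it, or NO_BACKEND */
    char gid[GID_SIZE];
} PreparedXact;

/*
 * Find the valid entry with the given gid and claim it for my_backend.
 * A second caller asking for the same gid gets LOCK_BUSY, so two backends
 * cannot finish the same prepared transaction.
 */
static LockResult lock_prepared_xact(PreparedXact *xacts, size_t count,
                                     const char *gid, int my_backend,
                                     unsigned int *xid_out)
{
    assert(xacts != NULL && gid != NULL && xid_out != NULL);
    for (size_t i = 0; i < count; i++)
    {
        PreparedXact *x = &xacts[i];

        if (!x->valid || strcmp(x->gid, gid) != 0)
            continue;       /* not yet prepared, or a different gid */
        if (x->locking_backend != NO_BACKEND)
            return LOCK_BUSY;
        x->locking_backend = my_backend;
        *xid_out = x->xid;
        return LOCK_OK;
    }
    return LOCK_NOT_FOUND;
}
```

Returning the xid together with the claim mirrors the step above, where validating the gid and obtaining the transaction's xid happen together.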
[root@node199 pg_twophase]# ll