CLOG在MVCC中的作用

pg_xact是事务提交日志(Commit Log)的存储目录,事务提交日志默认256KB,文件名形如NNNN,系统初始化后从0000开始递增至FFFF。PG 10及之后的高版本改目录名为pg_xact,10之前目录名称是pg_clog。

事务提交日志存储数据库的单个事务运行状态。Commit Log由共享内存中一组8KB的page组成,每个page包含一列数组,每个数组元素包含XID和该事物的实时状态。当page不足时,创建新的page来存储新的事务。

Once a transaction has been completed successfully, its status has to be stored somehow—it must be registered that the transaction is committed. For this purpose, PostgreSQL employs a special CLOG (commit log) structure.It is stored as files in the $PGDATA/pg_xact directory rather than as a system catalog table.

CLOG is split into several files solely for convenience. These files are accessed page by page via buffers in the server’s shared memory.

Just like a tuple header, CLOG contains two bits for each transaction: committed and aborted. Once committed, a transaction is marked in CLOG with the committed bit. When any other transaction accesses a heap page, it has to answer the question: has the xmin transaction already finished?
• If not, then the created tuple must not be visible.
To check whether the transaction is still active, PostgreSQL uses yet another structure located in the shared memory of the instance; it is called ProcArray.This structure contains the list of all the active processes, with the corresponding current (active) transaction specified for each process.
• If yes, was it committed or aborted? In the latter case, the corresponding tuple cannot be visible either.
It is this check that requires CLOG. But even though the most recent CLOG pages are stored in memory buffers, it is still expensive to perform this check every time. Once determined, the transaction status is written into the tuple header—more specifically, into xmin_committed and xmin_aborted information bits, which are also called hint bits. If one of these bits is set, then the xmin transaction status is considered to be already known, and the next transaction will have to access neither CLOG nor ProcArray.
—如果插入数据的事物已经是完成的,是通过CLOG来判断这个事物是提交还是回滚。如果一旦这个事物的状态被确认了,那这些事物状态信息就会写到tuple header中的hint bits中去。

Why aren’t these bits set by the transaction that performs row insertion? The problem is that it is not known yet at that time whether this transaction will complete successfully. And when it is committed, it is already unclear which tuples and pages have been changed. If a transaction affects many pages, it may be too expensive to track them. Besides, some of these pages may be not in the cache anymore; reading them again to simply update the hint bits would seriously slow down the commit.

The flip side of this cost reduction is that any transaction (even a read-only SELECT command) can start setting hint bits, thus leaving a trail of dirtied pages in the buffer cache.
—只读事物也可以设置tuple header中的hint bits。

测试过程:

测试数据库版本:PostgreSQL 14.5

终端1

[postgres@duqk04 ~]$ psql
psql (14.5)
Type "help" for help.

postgres=# create extension pageinspect;
CREATE EXTENSION
postgres=# CREATE TABLE t(
postgres(# id integer GENERATED ALWAYS AS IDENTITY,
postgres(# s text
postgres(# );
CREATE TABLE
postgres=# CREATE INDEX ON t(s);
CREATE INDEX
postgres=# BEGIN;
BEGIN
postgres=*# INSERT INTO t(s) VALUES ('FOO');
INSERT 0 1
postgres=*# SELECT pg_current_xact_id();
 pg_current_xact_id 
--------------------
                738
(1 row)

postgres=*# 

–事物738发起插入数据动作

终端2

postgres=# SELECT * FROM heap_page_items(get_raw_page('t',0)) \gx
-[ RECORD 1 ]-------------------
lp          | 1
lp_off      | 8160
lp_flags    | 1
lp_len      | 32
t_xmin      | 738
t_xmax      | 0
t_field3    | 0
t_ctid      | (0,1)
t_infomask2 | 2
t_infomask  | 2050
t_hoff      | 24
t_bits      | 
t_oid       | 
t_data      | \x0100000009464f4f

postgres=# 
postgres=# SELECT '(0,'||lp||')' AS ctid,
postgres-# CASE lp_flags
postgres-# WHEN 0 THEN 'unused'
postgres-# WHEN 1 THEN 'normal'
postgres-# WHEN 2 THEN 'redirect to '||lp_off
postgres-# WHEN 3 THEN 'dead'
postgres-# END AS state,
postgres-# t_xmin as xmin,
postgres-# t_xmax as xmax,
postgres-# (t_infomask & 256) > 0 AS xmin_committed,
postgres-# (t_infomask & 512) > 0 AS xmin_aborted,
postgres-# (t_infomask & 1024) > 0 AS xmax_committed,
postgres-# (t_infomask & 2048) > 0 AS xmax_aborted
postgres-# FROM heap_page_items(get_raw_page('t',0)) \gx
-[ RECORD 1 ]--+-------
ctid           | (0,1)
state          | normal
xmin           | 738
xmax           | 0
xmin_committed | f
xmin_aborted   | f
xmax_committed | f
xmax_aborted   | t

postgres=# 

xmax_aborted :表示删除的事物被丢弃,刚开始设置为true,是因为删除这行的事物没有发生,在mvcc中和abort具有相同影响。

把以上查询包装成函数,以便多次查询,创建函数语句:

CREATE FUNCTION heap_page(relname text, pageno integer)
RETURNS TABLE(ctid tid, state text, xmin text, xmax text)
AS $$
SELECT (pageno,lp)::text::tid AS ctid,
CASE lp_flags
WHEN 0 THEN 'unused'
WHEN 1 THEN 'normal'
WHEN 2 THEN 'redirect to '||lp_off
WHEN 3 THEN 'dead'
END AS state,
t_xmin || CASE
WHEN (t_infomask & 256) > 0 THEN ' c'
WHEN (t_infomask & 512) > 0 THEN ' a'
ELSE ''
END AS xmin,
t_xmax || CASE
WHEN (t_infomask & 1024) > 0 THEN ' c'
WHEN (t_infomask & 2048) > 0 THEN ' a'
ELSE ''
END AS xmax
FROM heap_page_items(get_raw_page(relname,pageno))
ORDER BY lp;
$$ LANGUAGE sql;

查询结果如下:

postgres=# SELECT * FROM heap_page('t',0);
 ctid  | state  | xmin | xmax 
-------+--------+------+------
 (0,1) | normal | 738  | 0 a
(1 row)

postgres=# 

终端1

postgres=*# commit;
COMMIT
postgres=# 

–终端1提交事物

终端2

postgres=# SELECT * FROM heap_page('t',0);
 ctid  | state  | xmin | xmax 
-------+--------+------+------
 (0,1) | normal | 738  | 0 a
(1 row)

postgres=# SELECT '(0,'||lp||')' AS ctid,
postgres-# CASE lp_flags
postgres-# WHEN 0 THEN 'unused'
postgres-# WHEN 1 THEN 'normal'
postgres-# WHEN 2 THEN 'redirect to '||lp_off
postgres-# WHEN 3 THEN 'dead'
postgres-# END AS state,
postgres-# t_xmin as xmin,
postgres-# t_xmax as xmax,
postgres-# (t_infomask & 256) > 0 AS xmin_committed,
postgres-# (t_infomask & 512) > 0 AS xmin_aborted,
postgres-# (t_infomask & 1024) > 0 AS xmax_committed,
postgres-# (t_infomask & 2048) > 0 AS xmax_aborted
postgres-# FROM heap_page_items(get_raw_page('t',0)) \gx
-[ RECORD 1 ]--+-------
ctid           | (0,1)
state          | normal
xmin           | 738
xmax           | 0
xmin_committed | f
xmin_aborted   | f
xmax_committed | f
xmax_aborted   | t

postgres=#

Now the first transaction that accesses the page (in a “standard” way, without using pageinspect) has to determine the status of the xmin transaction and update the hint bits:
第一次访问数据,会到$PGDATA/pg_xact目录去找到CLOG信息,来确定数据是否可见。这之后,xmin的相关信息,会存储到 tuple header中去。

postgres=# select * from t;
 id |  s  
----+-----
  1 | FOO
(1 row)

postgres=# SELECT * FROM heap_page('t',0);
 ctid  | state  | xmin  | xmax 
-------+--------+-------+------
 (0,1) | normal | 738 c | 0 a
(1 row)

postgres=# 

终端1

postgres=# begin;
BEGIN
postgres=*# delete from t;
DELETE 1
postgres=*# SELECT pg_current_xact_id();
 pg_current_xact_id 
--------------------
                740
(1 row)

postgres=*# 

–事物740发起删除数据操作

终端2

postgres=# SELECT * FROM heap_page('t',0);
 ctid  | state  | xmin  | xmax 
-------+--------+-------+------
 (0,1) | normal | 738 c | 740
(1 row)

postgres=# 

–由于已经发起了delete动作,所以xmax 有了具体的值。

终端1

postgres=*# rollback;
ROLLBACK
postgres=# 

—回滚删除事物

终端2

postgres=# SELECT * FROM heap_page('t',0);
 ctid  | state  | xmin  | xmax 
-------+--------+-------+------
 (0,1) | normal | 738 c | 740
(1 row)

postgres=#

第一次访问数据之后,才会写到 tuple header中。
When the page is accessed, the transaction status is checked, and the tuple receives the xmax_aborted hint bit. The xmax number itself still remains in the page, but no one is going to pay attention to it anymore:

postgres=# select xmin,xmax,* from t;
 xmin | xmax | id |  s  
------+------+----+-----
  738 |  740 |  1 | FOO
(1 row)

postgres=# SELECT * FROM heap_page('t',0);
 ctid  | state  | xmin  | xmax  
-------+--------+-------+-------
 (0,1) | normal | 738 c | 740 a
(1 row)

postgres=# 

终端1

postgres=# BEGIN;
BEGIN
postgres=*# UPDATE t SET s = 'BAR';
UPDATE 1
postgres=*# SELECT pg_current_xact_id();
 pg_current_xact_id 
--------------------
                741
(1 row)

postgres=*# SELECT * FROM t;
 id |  s  
----+-----
  1 | BAR
(1 row)

postgres=*# 

–事物741发起update操作

终端2

postgres=# SELECT * FROM heap_page('t',0);
 ctid  | state  | xmin  | xmax 
-------+--------+-------+------
 (0,1) | normal | 738 c | 741
 (0,2) | normal | 741   | 0 a
(2 rows)

postgres=# 

终端1

postgres=*# commit;
COMMIT
postgres=# 

终端2

postgres=# SELECT * FROM heap_page('t',0);
 ctid  | state  | xmin  | xmax 
-------+--------+-------+------
 (0,1) | normal | 738 c | 741
 (0,2) | normal | 741   | 0 a
(2 rows)

postgres=# select * from t;
 id |  s  
----+-----
  1 | BAR
(1 row)

postgres=# SELECT * FROM heap_page('t',0);
 ctid  | state  | xmin  | xmax  
-------+--------+-------+-------
 (0,1) | normal | 738 c | 741 c
 (0,2) | normal | 741 c | 0 a
(2 rows)

postgres=# 

–也是需要先通过select访问数据,到clog中去读取事物状态信息之后,相应的信息才会写到tuple header中。

终端1

postgres=# vacuum full t;
VACUUM
postgres=# SELECT * FROM index_page('t_s_idx',1);
 itemoffset | htid  
------------+-------
          1 | (0,1)
(1 row)

postgres=# SELECT * FROM heap_page('t',0);
 ctid  | state  | xmin  | xmax 
-------+--------+-------+------
 (0,1) | normal | 741 c | 0 a
(1 row)

postgres=# 
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值