PG数据库内核分析学习笔记_MultiXact日志管理器

PG数据库内核分析学习笔记_MultiXact日志管理器

MultiXact日志是PG系统用来记录组合事务ID的一种日志. 由于PG采用了多版本并发控制, 因此同一个元组相关的事务ID可能有多个, 为了在加锁(行共享锁)的时候统一操作, PG将与该元组相关联的多个事务ID组合起来用一个MultiXactID代替来管理.同CLOG, SubTrans日志一样, MultiXact日志也是利用SLRU缓冲区来实现.

typedef uint32 TransactionId;

/* MultiXactId must be equivalent to TransactionId, to fit in t_xmax */
typedef TransactionId MultiXactId;

1 MultiXact日志管理器相关数据结构

MultiXact是一个多对一的映射关系, 需要在事务ID数组中标记哪一段映射到一个MultiXactID(如图). 所以在映射的过程中需要存储两种信息, 即需要标识一段事务ID的偏移量(Offset), 还需要记录这段偏移量的大小(NMembers).

图 MultiXactID组合关系
在这里插入图片描述
由于需要对MultiXactID的分配进行维护, 于是定义了数据结构MultiXactStateData(数据结构如图所示).

数据结构 MultiXactStateData

// multixact.c
/*
 * MultiXact state shared across all backends.  All this state is protected
 * by MultiXactGenLock.  (We also use MultiXactOffsetControlLock and
 * MultiXactMemberControlLock to guard accesses to the two sets of SLRU
 * buffers.  For concurrency's sake, we avoid holding more than one of these
 * locks at a time.)
 */
typedef struct MultiXactStateData
{
	/* 下一个可分配的 MultiXactId */
	MultiXactId nextMXact;

	/* 下一个对应MultiXactId的起始偏移量 */
	MultiXactOffset nextOffset;

	/* Have we completed multixact startup? */
	bool		finishedStartup;

	/*
	 * Oldest multixact that is still potentially referenced by a relation.
	 * Anything older than this should not be consulted.  These values are
	 * updated by vacuum.
	 */
	MultiXactId oldestMultiXactId;
	Oid			oldestMultiXactDB;

	/*
	 * Oldest multixact offset that is potentially referenced by a multixact
	 * referenced by a relation.  We don't always know this value, so there's
	 * a flag here to indicate whether or not we currently do.
	 */
	MultiXactOffset oldestOffset;
	bool		oldestOffsetKnown;

	/* support for anti-wraparound measures */
	MultiXactId multiVacLimit;
	MultiXactId multiWarnLimit;
	MultiXactId multiStopLimit;
	MultiXactId multiWrapLimit;

	/* support for members anti-wraparound measures */
	MultiXactOffset offsetStopLimit;	/* known if oldestOffsetKnown */

	/*
	 * Per-backend data starts here.  We have two arrays stored in the area
	 * immediately following the MultiXactStateData struct. Each is indexed by
	 * BackendId.
	 *
	 * In both arrays, there's a slot for all normal backends (1..MaxBackends)
	 * followed by a slot for max_prepared_xacts prepared transactions. Valid
	 * BackendIds start from 1; element zero of each array is never used.
	 *
	 * OldestMemberMXactId[k] is the oldest MultiXactId each backend's current
	 * transaction(s) could possibly be a member of, or InvalidMultiXactId
	 * when the backend has no live transaction that could possibly be a
	 * member of a MultiXact.  Each backend sets its entry to the current
	 * nextMXact counter just before first acquiring a shared lock in a given
	 * transaction, and clears it at transaction end. (This works because only
	 * during or after acquiring a shared lock could an XID possibly become a
	 * member of a MultiXact, and that MultiXact would have to be created
	 * during or after the lock acquisition.)
	 *
	 * OldestVisibleMXactId[k] is the oldest MultiXactId each backend's
	 * current transaction(s) think is potentially live, or InvalidMultiXactId
	 * when not in a transaction or not in a transaction that's paid any
	 * attention to MultiXacts yet.  This is computed when first needed in a
	 * given transaction, and cleared at transaction end.  We can compute it
	 * as the minimum of the valid OldestMemberMXactId[] entries at the time
	 * we compute it (using nextMXact if none are valid).  Each backend is
	 * required not to attempt to access any SLRU data for MultiXactIds older
	 * than its own OldestVisibleMXactId[] setting; this is necessary because
	 * the checkpointer could truncate away such data at any instant.
	 *
	 * The oldest valid value among all of the OldestMemberMXactId[] and
	 * OldestVisibleMXactId[] entries is considered by vacuum as the earliest
	 * possible value still having any live member transaction.  Subtracting
	 * vacuum_multixact_freeze_min_age from that value we obtain the freezing
	 * point for multixacts for that table.  Any value older than that is
	 * removed from tuple headers (or "frozen"; see FreezeMultiXactId.  Note
	 * that multis that have member xids that are older than the cutoff point
	 * for xids must also be frozen, even if the multis themselves are newer
	 * than the multixid cutoff point).  Whenever a full table vacuum happens,
	 * the freezing point so computed is used as the new pg_class.relminmxid
	 * value.  The minimum of all those values in a database is stored as
	 * pg_database.datminmxid.  In turn, the minimum of all of those values is
	 * stored in pg_control and used as truncation point for pg_multixact.  At
	 * checkpoint or restartpoint, unneeded segments are removed.
	 */
	MultiXactId perBackendXactIds[FLEXIBLE_ARRAY_MEMBER];
} MultiXactStateData;

MultiXactStateData用来管理和维护MultiXactID, 另外还需要定义一个统一的接口来操作MultiXactID, 即存储这种多对一的映射关系, PG定义mXactCacheEnt来完成此工作.

/*
 * Definitions for the backend-local MultiXactId cache.
 *
 * We use this cache to store known MultiXacts, so we don't need to go to
 * SLRU areas every time.
 *
 * The cache lasts for the duration of a single transaction, the rationale
 * for this being that most entries will contain our own TransactionId and
 * so they will be uninteresting by the time our next transaction starts.
 * (XXX not clear that this is correct --- other members of the MultiXact
 * could hang around longer than we did.  However, it's not clear what a
 * better policy for flushing old cache entries would be.)	FIXME actually
 * this is plain wrong now that multixact's may contain update Xids.
 *
 * We allocate the cache entries in a memory context that is deleted at
 * transaction end, so we don't need to do retail freeing of entries.
 */
typedef struct mXactCacheEnt
{
	MultiXactId multi;
	int			nmembers;
	dlist_node	node;
	MultiXactMember members[FLEXIBLE_ARRAY_MEMBER];
} mXactCacheEnt;

MultiXact日志存储在PGDATA/pg_multixact/目录下, 其中会有两个子目录members和offset, 分别存储XactID成员和偏移量.

[postgres@localhost ~/db1_normal/pg_multixact]$ ls
members  offsets
[postgres@localhost ~/db1_normal/pg_multixact]$ 

从一个MultiXactID映射到具体的存储位置是通过下面的变换完成的:

/* We need four bytes per offset */
#define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))

#define MultiXactIdToOffsetPage(xid) \
	((xid) / (MultiXactOffset) MULTIXACT_OFFSETS_PER_PAGE)
#define MultiXactIdToOffsetEntry(xid) \
	((xid) % (MultiXactOffset) MULTIXACT_OFFSETS_PER_PAGE)
#define MultiXactIdToOffsetSegment(xid) (MultiXactIdToOffsetPage(xid) / SLRU_PAGES_PER_SEGMENT)
/* page in which a member is to be found */
#define MXOffsetToMemberPage(xid) ((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_PAGE)
#define MXOffsetToMemberSegment(xid) (MXOffsetToMemberPage(xid) / SLRU_PAGES_PER_SEGMENT)

/* Location (byte offset within page) of flag word for a given member */
#define MXOffsetToFlagsOffset(xid) \
	((((xid) / (TransactionId) MULTIXACT_MEMBERS_PER_MEMBERGROUP) % \
	  (TransactionId) MULTIXACT_MEMBERGROUPS_PER_PAGE) * \
	 (TransactionId) MULTIXACT_MEMBERGROUP_SIZE)
#define MXOffsetToFlagsBitShift(xid) \
	(((xid) % (TransactionId) MULTIXACT_MEMBERS_PER_MEMBERGROUP) * \
	 MXACT_MEMBER_BITS_PER_XACT)

/* Location (byte offset within page) of TransactionId of given member */
#define MXOffsetToMemberOffset(xid) \
	(MXOffsetToFlagsOffset(xid) + MULTIXACT_FLAGBYTES_PER_GROUP + \
	 ((xid) % MULTIXACT_MEMBERS_PER_MEMBERGROUP) * sizeof(TransactionId))

MultiXact日志的缓冲区用两个SLRU缓冲区来实现, 分别是MultiXactOffsetCtl和MultiXactMemberCtl, 分别记录Members和Offsets. 在Postmaster启动后, 就注册在共享内存中, 管理全局的MultiXactID.

// multixact.c
/*
 * Links to shared-memory data structures for MultiXact control
 */
static SlruCtlData MultiXactOffsetCtlData;
static SlruCtlData MultiXactMemberCtlData;

#define MultiXactOffsetCtl	(&MultiXactOffsetCtlData)
#define MultiXactMemberCtl	(&MultiXactMemberCtlData)
// slru.h
/*
 * SlruCtlData is an unshared structure that points to the active information
 * in shared memory.
 */
typedef struct SlruCtlData
{
	SlruShared	shared;

	/*
	 * This flag tells whether to fsync writes (true for pg_xact and multixact
	 * stuff, false for pg_subtrans and pg_notify).
	 */
	bool		do_fsync;

	/*
	 * Decide which of two page numbers is "older" for truncation purposes. We
	 * need to use comparison of TransactionIds here in order to do the right
	 * thing with wraparound XID arithmetic.
	 */
	bool		(*PagePrecedes) (int, int);

	/*
	 * Dir is set during SimpleLruInit and does not change thereafter. Since
	 * it's always the same, it doesn't need to be in shared memory.
	 */
	char		Dir[64];
} SlruCtlData;

参考

<PG数据库内核分析> 7.11.4 MultiXact日志管理器

  • 1
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值