PostgreSQL Fundamentals: How Tables and Tuples Are Organized

obvious__

Article link: https://blog.csdn.net/obvious__/article/details/109328425

References

《PostgreSQL数据库内核分析》 (Analysis of the PostgreSQL Database Kernel), Peng Zhiyong and Peng Yuwei, pp. 58-60

Overview

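PostgreSQL stores tables as heap tables. Each table file consists of multiple blocks, and a block is laid out on disk as shown in the figure below:
[Figure: physical layout of a block]
A block consists of 4 parts:

1. Block header: PageHeaderData
2. Records: each record has two parts
a. Linp: a Linp is of type ItemIdData. It has a fixed length and is allocated from the front of the block toward the back. Each ItemIdData stores an offset that points to a Tuple.
b. Tuple: the Tuple holds the record's header information, and a record consists of this header plus the record data itself. The record length is not fixed, and records are allocated from the back of the block toward the front.
3. Free space: Freespace, located between the Linps and the Tuples.
4. Special space: stores data specific to the index access method (ordinary heap table pages leave it empty).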

PageHeaderData

The PageHeaderData structure is as follows:


typedef struct PageHeaderData
{
	/* XXX LSN is member of *any* block, not only page-organized ones */
	PageXLogRecPtr 	pd_lsn;								/* LSN: next byte after last byte of xlog
								 						 * record for last change to this page */
	uint16			pd_checksum;						/* checksum */
	uint16			pd_flags;							/* flag bits, see below */
	LocationIndex 	pd_lower;							/* offset to start of free space */
	LocationIndex 	pd_upper;							/* offset to end of free space */
	LocationIndex 	pd_special;							/* offset to start of special space */
	uint16			pd_pagesize_version;
	TransactionId 	pd_prune_xid; 						/* oldest prunable XID, or zero if none */
	ItemIdData		pd_linp[FLEXIBLE_ARRAY_MEMBER]; 	/* line pointer array */
} PageHeaderData;

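Where:

pd_lower, pd_upper: the start and end offsets of the free space. pd_upper - pd_lower gives the size of the free space. When a record is inserted, pd_upper moves toward lower addresses (to make room for the tuple) and pd_lower moves toward higher addresses (to make room for the new ItemIdData).
pd_linp: an array of ItemIdData. Through pd_linp, a record can be located easily.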

ItemIdData

The ItemIdData structure is as follows:

typedef struct ItemIdData
{
	unsigned	lp_off:15,		/* offset to tuple (from start of page) */
				lp_flags:2,		/* state of item pointer, see below */
				lp_len:15;		/* byte length of tuple */
} ItemIdData;

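Where:

lp_off: the offset of the tuple (relative to the start of the block).
lp_flags: the state of the record: unused, normal, HOT redirect, or dead.
lp_len: the length of the record in bytes.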

HeapTupleHeader

The HeapTupleHeader structure is as follows:

struct HeapTupleHeaderData
{
	union
	{
		HeapTupleFields t_heap;
		DatumTupleFields t_datum;
	}			t_choice;

	ItemPointerData t_ctid;		/* current TID of this or newer tuple (or a
								 * speculative insertion token) */

	/* Fields below here must match MinimalTupleData! */

	uint16		t_infomask2;	/* number of attributes + various flags */

	uint16		t_infomask;		/* various flag bits, see below */

	uint8		t_hoff;			/* sizeof header incl. bitmap, padding */

	/* ^ - 23 bytes - ^ */

	bits8		t_bits[FLEXIBLE_ARRAY_MEMBER];	/* bitmap of NULLs */

	/* 
	 * MORE DATA FOLLOWS AT END OF STRUCT 
	 * The tuple's actual data follows this struct
	 */
};
typedef struct HeapTupleHeaderData HeapTupleHeaderData;
typedef HeapTupleHeaderData *HeapTupleHeader;

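Where:

t_choice: a union of two members:
- t_heap: records the transaction IDs and command ID of the insert/delete operations performed on the tuple.
- t_datum: when a new tuple is formed in memory, transaction visibility does not matter yet, so t_choice only needs DatumTupleFields to record information such as the tuple length. This is temporary information. When the tuple is inserted into the table file, the tuple header must record the transaction ID and command ID of the insert, so t_choice is converted to the HeapTupleFields layout and filled in before the tuple is inserted.

t_ctid: records the physical location of the current tuple or of a newer tuple. If the tuple has been updated (the old version is deleted and a new version is inserted), it records the physical location of the new version.

t_ctid is a struct of type ItemPointerData, defined as follows: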

typedef struct ItemPointerData
{
	BlockIdData 	ip_blkid;
	OffsetNumber 	ip_posid;
}
/* If compiler understands packed and aligned pragmas, use those */
#if defined(pg_attribute_packed) && defined(pg_attribute_aligned)
pg_attribute_packed()
pg_attribute_aligned(2)
#endif
ItemPointerData;

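Where:

ip_blkid: the block number within the file.
ip_posid: the index of this tuple's entry in the ItemIdData array (for the ItemIdData array, see PageHeaderData). Offsets are counted from 1.

Note that ip_blkid is a BlockIdData struct, defined as follows: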

typedef struct BlockIdData
{
	uint16		bi_hi;
	uint16		bi_lo;
} BlockIdData;

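At first glance this definition looks as if bi_hi and bi_lo serve some special purpose, but they do not. BlockIdData can simply be treated as a uint32. The uint32 is split into two uint16 fields so that ItemPointerData wastes no space on alignment: a uint32 member would force 4-byte alignment and pad the struct to 8 bytes, whereas three uint16 members need only 2-byte alignment and take 6 bytes.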

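/* sizeof(ItemPointerData1) = 8: the uint32 forces 4-byte alignment, so 2 bytes of padding are added */
typedef struct ItemPointerData1
{
	uint32 	ip_blkid;
	uint16 	ip_posid;
} ItemPointerData1;

/* sizeof(ItemPointerData2) = 6: only 2-byte alignment is needed, so there is no padding */
typedef struct ItemPointerData2
{
	uint16 	ip_blkid_hi;
	uint16 	ip_blkid_lo;
	uint16 	ip_posid;
} ItemPointerData2;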

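Note: when a tuple is inserted, t_ctid is not set when the tuple is built. It is set only after the tuple has been written to the shared-memory buffer page that corresponds to the file block. The code is as follows: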

void
RelationPutHeapTuple(Relation relation,
					 Buffer buffer,
					 HeapTuple tuple,
					 bool token)
{
	Page		pageHeader;
	OffsetNumber offnum;
    
	Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));

	pageHeader = BufferGetPage(buffer);

	/*
	 * Add the tuple to the page.
	 * At this point t_ctid in the tuple still holds an invalid value.
	 */
	offnum = PageAddItem(pageHeader, (Item) tuple->t_data,
						 tuple->t_len, InvalidOffsetNumber, false, true);

	if (offnum == InvalidOffsetNumber)
		elog(PANIC, "failed to add tuple to page");

	/* Update tuple->t_self to the actual position where it was stored */
	ItemPointerSet(&(tuple->t_self), BufferGetBlockNumber(buffer), offnum);

	/*
	 * Insert the correct position into CTID of the stored tuple, too (unless
	 * this is a speculative insertion, in which case the token is held in
	 * CTID field instead)
	 */
	if (!token)
	{
		/*
		 * Set t_ctid in the copy of the tuple stored on the page
		 */
		ItemId		itemId = PageGetItemId(pageHeader, offnum);
		Item		item = PageGetItem(pageHeader, itemId);

		((HeapTupleHeader) item)->t_ctid = tuple->t_self;
	}
}

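t_infomask2: the low 11 bits hold the number of attributes in the tuple. The remaining bits hold flags used by HOT and by tuple visibility checks.
t_infomask: identifies the current state of the tuple, for example whether it has an OID (in versions before PostgreSQL 12) and whether it has null attributes. Each bit of t_infomask corresponds to a different state, for 16 flag bits in total.
t_hoff: the size of the tuple header, including the null bitmap and padding. It is therefore also the offset at which the tuple's data begins.
t_bits: a bitmap that marks which fields of the tuple are null.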

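The HOT technique

PostgreSQL uses multi-versioning for tuples: every update of a tuple produces a new version, and the versions form a version chain from oldest to newest. In addition, an update produces not only a new tuple version in the table file but also a new index entry in each of the table's indexes. Even if the update does not modify any indexed attribute, a new entry is still created in every index, which wastes storage space.

To solve this problem, PostgreSQL uses a mechanism called HOT (Heap-Only Tuple). An updated tuple becomes a HOT tuple when both of the following conditions hold:

No indexed attribute was modified.
The new version of the updated tuple is in the same file block as the old version.

Updating a HOT tuple does not introduce a new entry in the indexes. When a tuple is fetched through an index, the lookup first finds the oldest version in the block and then follows the version chain forward until it reaches the HOT tuple it needs, that is, the version visible to the reading transaction. (This is why the new and old versions must be in the same file block: if they were in different blocks, following the chain would cost extra I/O.)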
