Finally Understanding the PostgreSQL Page Structure

License: CC 4.0 BY-SA

Tags:

#database #postgresql #kernel #source-code-exploration

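In PostgreSQL, a page is one of the fixed-length blocks into which a data file is divided, similar to a data block in Oracle. The default page size is 8 kB; it can be changed when building the database with the --with-blocksize configure option.

Pages in a file are numbered sequentially from 0. When the existing pages are full, a new 8 kB page is appended to the end of the file. This is also why a single table in PostgreSQL is limited to 32 TB: block numbers are 32 bits wide, so the data files of one table can hold at most 2^32 = 4294967296 pages, and 2^32 × 8 kB = 32 TB.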

With that background out of the way, let's see what a page really looks like.

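1. Overall page structure

The figure below shows the overall layout of a page:
[Figure: overall page layout]
From the figure, a page can be divided roughly into five parts, of which the following three matter most: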

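head info: the page header, 24 bytes in size, which records metadata about the page;
line pointers: the line pointer array; each line pointer is 4 bytes and points to one row (tuple) in the page;
heap tuples: the heap tuple data, which is where the page stores the actual rows.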

2. Page header structure

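First, let's look at the structure of the page header, which is defined in src/include/storage/bufpage.h:
[Figure: PageHeaderData definition]
Each field is roughly as follows:

[Figure: table of PageHeaderData fields]
pd_lsn is the LSN of the last change made to this page (see the article "PostgreSQL LSN详解").

pd_checksum is the page checksum. It is disabled by default in the releases discussed here and can be enabled by passing -k to initdb (see the article "PostgreSQL checksum与Data Corruption").

pd_flags holds flag bits, which can take the following values: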

#define PD_HAS_FREE_LINES   0x0001  /* are there any unused line pointers? */

#define PD_PAGE_FULL        0x0002  /* not enough free space for new tuple? */

#define PD_ALL_VISIBLE      0x0004  /* all tuples on page are visible to everyone */

#define PD_VALID_FLAG_BITS  0x0007  /* OR of all valid pd_flags bits */

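pd_lower and pd_upper are the offsets of the start and the end of the free space in the page; we will come back to them in more detail later.

pd_special is the offset of the start of the special space, which holds data specific to an index access method. Ordinary heap (table data) pages have no special space, so for them pd_special equals the page size.

pd_pagesize_version stores the page size together with the page layout version; the layout version has been 4 since PostgreSQL 8.3.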

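pd_prune_xid is the ID of the oldest transaction that deleted or updated a tuple on this page (zero if there is none); it is used as a hint when the page is cleaned up, for example during VACUUM.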

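3. Line pointer structure

The line pointer structure is defined in src/include/storage/itemid.h:
[Figure: ItemIdData definition]
lp_off: the offset of the tuple from the start of the page. For example, in an ordinary table page the special space starts at 8192; if we insert a 40-byte tuple, its offset is 8192 - 40 = 8152. We will come back to how this is calculated later.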

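lp_flags: the state of the line pointer, which takes one of the following values:

#define LP_UNUSED		0		/* unused (lp_len is always 0) */
#define LP_NORMAL		1		/* used (lp_len is always > 0) */
#define LP_REDIRECT		2		/* HOT redirect (lp_len must be 0) */
#define LP_DEAD			3		/* dead tuple, may or may not have storage */

lp_len: the length of the tuple in bytes; we will cover how it is calculated later.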

4. HeapTupleHeaderData structure

Next comes the heap tuple itself, which holds the row data. Here we focus on the tuple header structure, defined in src/include/access/htup_details.h:

typedef struct HeapTupleFields
{
	TransactionId t_xmin;		/* inserting xact ID */
	TransactionId t_xmax;		/* deleting or locking xact ID */

	union
	{
		CommandId	t_cid;		/* inserting or deleting command ID, or both */
		TransactionId t_xvac;	/* old-style VACUUM FULL xact ID */
	}			t_field3;
} HeapTupleFields;

typedef struct DatumTupleFields
{
	int32		datum_len_;		/* varlena header (do not touch directly!) */

	int32		datum_typmod;	/* -1, or identifier of a record type */

	Oid			datum_typeid;	/* composite type OID, or RECORDOID */

	/*
	 * datum_typeid cannot be a domain over composite, only plain composite,
	 * even if the datum is meant as a value of a domain-over-composite type.
	 * This is in line with the general principle that CoerceToDomain does not
	 * change the physical representation of the base type value.
	 *
	 * Note: field ordering is chosen with thought that Oid might someday
	 * widen to 64 bits.
	 */
} DatumTupleFields;

struct HeapTupleHeaderData
{
	union
	{
		HeapTupleFields t_heap;
		DatumTupleFields t_datum;
	}			t_choice;

	ItemPointerData t_ctid;		/* current TID of this or newer tuple (or a
								 * speculative insertion token) */

	/* Fields below here must match MinimalTupleData! */

#