rewriteheap.cpp Analysis -- Storage Engine (7)

Article link: https://blog.csdn.net/m0_66594489/article/details/133874184

Introduction

In the previous posts we finished studying heapam.cpp. Next we turn to rewriteheap.cpp, which lives in the same directory. File path: opengauss-server\src\gausskernel\storage\access\heap\rewriteheap.cpp

1. File Overview

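This file provides support functions for rewriting tables. They completely rewrite a heap while preserving visibility information and update chains. The figure below shows the overall structure of the file, with brief notes on what some of the functions do.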

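The caller is responsible for creating the new heap, making all catalog changes, supplying the tuples to be written to the new heap, and rebuilding the indexes. The caller must hold an exclusive lock on the target table, because we assume that no one else is writing into it.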

To use the facility:
    begin_heap_rewrite
    while (fetch next tuple)
    {
        if (tuple is dead)
            rewrite_heap_dead_tuple
        else
        {
            // do any transformations here if required
            rewrite_heap_tuple
        }
    }
    end_heap_rewrite
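The protocol above can be sketched in C as follows. This is a toy model, not openGauss code: ToyTuple, ToyRewriteState and the toy_* functions are hypothetical stand-ins that only count calls. The real begin_heap_rewrite, rewrite_heap_tuple, rewrite_heap_dead_tuple and end_heap_rewrite operate on a RewriteState and HeapTuple and do far more work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuple: only a "dead" flag and a payload. */
typedef struct {
    bool dead;   /* would the caller classify this tuple as dead? */
    int value;   /* payload; a caller could transform it before rewriting */
} ToyTuple;

/* Hypothetical rewrite state: counts what the loop handed over. */
typedef struct {
    int rewritten;  /* tuples passed to toy_rewrite_heap_tuple */
    int dead;       /* tuples passed to toy_rewrite_heap_dead_tuple */
    bool finished;  /* set by toy_end_heap_rewrite */
} ToyRewriteState;

static void toy_begin_heap_rewrite(ToyRewriteState *st)
{
    st->rewritten = 0;
    st->dead = 0;
    st->finished = false;
}

static void toy_rewrite_heap_dead_tuple(ToyRewriteState *st, const ToyTuple *t)
{
    (void)t;
    st->dead++;
}

static void toy_rewrite_heap_tuple(ToyRewriteState *st, const ToyTuple *t)
{
    (void)t;
    st->rewritten++;
}

static void toy_end_heap_rewrite(ToyRewriteState *st)
{
    st->finished = true;
}

/* Drives the begin / loop / end protocol from the file header comment. */
static void toy_rewrite_all(ToyRewriteState *st, const ToyTuple *tuples, size_t n)
{
    toy_begin_heap_rewrite(st);
    for (size_t i = 0; i < n; i++) {
        if (tuples[i].dead) {
            toy_rewrite_heap_dead_tuple(st, &tuples[i]);
        } else {
            /* any per-tuple transformation would happen here */
            toy_rewrite_heap_tuple(st, &tuples[i]);
        }
    }
    toy_end_heap_rewrite(st);
}
```

With three tuples of which one is dead, the state ends up with two rewritten tuples, one dead tuple, and the finished flag set.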

2. Terminology

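- TOAST: short for "The Oversized-Attribute Storage Technique", the mechanism openGauss uses to handle large values that would not otherwise fit in a page. In openGauss the page (or block) is the basic unit of storage in data files. Its size is fixed and can only be chosen at compile time, after which it cannot be changed; the default is 8KB. Because of this limit a row cannot span pages, so for very wide rows openGauss uses TOAST to compress large fields or split them into chunks stored as multiple physical rows in a separate table (the TOAST table). This is called out-of-line storage. The TOAST code recognizes four strategies for storing TOAST-able columns on disk: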

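Strategy	Prevents compression and out-of-line storage	Single-byte varlena header	Default for TOAST-able types	Allows out-of-line storage	Allows compression
PLAIN	Yes	Yes	No	No	No
EXTENDED	No	No	Yes (most types)	Yes	Yes
EXTERNAL	No	No	No	Yes	No
MAIN	No	Yes	No	No (but still done as a last resort)	Yes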
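The two right-hand columns of the table can be read as a pair of predicates. The sketch below encodes them in C. The ToastStrategy enum and both functions are hypothetical names for illustration, not openGauss identifiers, and the last_resort flag models the note that MAIN values are still moved out of line when nothing else makes the row fit.

```c
#include <assert.h>
#include <stdbool.h>

/* The four strategies from the table (illustrative enum, not the catalog encoding). */
typedef enum { STRAT_PLAIN, STRAT_EXTENDED, STRAT_EXTERNAL, STRAT_MAIN } ToastStrategy;

/* "Allows compression" column. */
static bool strategy_allows_compression(ToastStrategy s)
{
    return s == STRAT_EXTENDED || s == STRAT_MAIN;
}

/* "Allows out-of-line storage" column. MAIN is normally kept in line,
 * but is moved out as a last resort when the row still does not fit. */
static bool strategy_allows_out_of_line(ToastStrategy s, bool last_resort)
{
    switch (s) {
    case STRAT_EXTENDED:
    case STRAT_EXTERNAL:
        return true;
    case STRAT_MAIN:
        return last_resort;
    default:
        return false;  /* PLAIN never goes out of line */
    }
}
```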

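- HOT: The Heap Only Tuple (HOT) feature eliminates redundant index entries and allows the space taken by DELETEd or obsoleted UPDATEd tuples to be reused without performing a table-wide vacuum. It does this by allowing single-page vacuuming, also called "defragmentation".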

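- Update chain: A chain of updated tuples in which each tuple's ctid points to the next tuple in the chain. A HOT update chain is an update chain (or a portion of one) that consists of a root tuple and one or more heap-only tuples. A complete update chain can contain both HOT and non-HOT (cold) updated tuples.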

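In a database, the update chain is the mechanism that tracks a row's versions. When a row is updated, the database does not modify the original tuple in place; it creates a new version and links the old version to the new one through the update chain. The database can then reach the row's versions by following the chain.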
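Following an update chain amounts to chasing ctid links until you reach a tuple whose ctid points to itself, which marks the newest version in PostgreSQL-style heaps. The sketch below models the heap as an array in which ctid is simply an array index. ToyVersion and follow_update_chain are hypothetical. A real implementation also checks that each next tuple's xmin matches the previous tuple's xmax before it trusts the link.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tuple version: ctid is the array index of the next (newer) version.
 * The newest version's ctid points to itself. */
typedef struct {
    size_t ctid;
    int value;
} ToyVersion;

/* Follows ctid links from start to the newest version and returns its index.
 * *hops_out receives the number of versions visited, including start. */
static size_t follow_update_chain(const ToyVersion *heap, size_t start, size_t *hops_out)
{
    size_t cur = start;
    size_t hops = 1;
    while (heap[cur].ctid != cur) {
        cur = heap[cur].ctid;
        hops++;
    }
    *hops_out = hops;
    return cur;
}
```

For a heap where slot 0 points to slot 2 and slot 2 points to slot 3 (which points to itself), starting at slot 0 visits three versions and ends at slot 3.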

- Cold update: A normal, non-HOT update, in which index entries are made for the new version of the tuple.

- HOT update: An UPDATE in which the new tuple becomes a heap-only tuple and no new index entries are made.
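Whether an update is HOT or cold comes down to a simplified rule: an update can be HOT only when it changes no indexed column and the new version fits on the same page as the old one. Here is a sketch of that decision. The names are hypothetical, and page free space is reduced to a single number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { UPDATE_HOT, UPDATE_COLD } UpdateKind;

/* Simplified rule: HOT requires no change to indexed columns and room for
 * the new version on the same page. */
static UpdateKind classify_update(bool indexed_column_changed,
                                  size_t new_tuple_len, size_t page_free_space)
{
    if (!indexed_column_changed && new_tuple_len <= page_free_space)
        return UPDATE_HOT;   /* heap-only tuple, no new index entries */
    return UPDATE_COLD;      /* new index entries point at the new version */
}
```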

3. Code Walkthrough

This section mainly analyzes raw_heap_insert and rewrite_write_one_page, and also looks at cmpr_heap_insert. Start with the comment at the top of rewriteheap.cpp:

 * We can't use the normal heap_insert function to insert into the new
 * heap, because heap_insert overwrites the visibility information.......

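In other words: we cannot use the normal heap_insert function to insert into the new heap, because heap_insert overwrites the visibility information. Instead we use a special-purpose raw_heap_insert function, which is optimized for bulk inserting many tuples and relies on our having exclusive access to the heap. raw_heap_insert builds new pages in local storage. When a page is full, or at the end of the process, we write it to WAL as a single record and then write it to disk directly through smgr. Note, however, that any data sent to the new heap's TOAST table still goes through the normal bufmgr. raw_heap_insert is analyzed in detail below.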
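The point about visibility information can be illustrated with a toy model. Assume a tuple header reduced to xmin and xmax; ToyHeader, toy_normal_insert and toy_raw_insert are hypothetical. A normal insert stamps the tuple with the current transaction. The rewrite path instead copies the header unchanged, so older transactions keep seeing exactly the versions they saw before.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ToyXid;

/* Toy tuple header: only the two fields that drive visibility. */
typedef struct {
    ToyXid xmin;  /* inserting transaction */
    ToyXid xmax;  /* deleting/updating transaction, 0 if none */
} ToyHeader;

/* Normal insert: the tuple now belongs to the current transaction,
 * so the original xmin/xmax are lost. */
static ToyHeader toy_normal_insert(ToyHeader src, ToyXid current_xid)
{
    ToyHeader out = src;
    out.xmin = current_xid;
    out.xmax = 0;
    return out;
}

/* Rewrite-style insert: the visibility fields are copied as-is. */
static ToyHeader toy_raw_insert(ToyHeader src)
{
    return src;
}
```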

Function name: raw_heap_insert

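raw_heap_insert inserts a tuple into the new heap during a rewrite and handles special cases such as values that need TOAST and tuples larger than the threshold. It also checks for error conditions to protect data integrity. The full source with comments: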

/*
 * This function is mainly used to insert tuples during heap rewrite and handles some special cases,
 * such as TOAST values and tuples greater than the threshold.
 * At the same time, it also handles some error situations and ensures data integrity.
 */
static void raw_heap_insert(RewriteState state, HeapTuple tup)
{
    Page page = state->rs_buffer;
    Size page_free_space, save_free_space;
    Size len;
    OffsetNumber newoff;
    HeapTuple heaptup;
    TransactionId xmin, xmax;

    // If tup is not NULL, assert that it is a heap tuple. If tup is NULL,
    // log a DEBUG5 message and return without inserting anything.
    if (tup != NULL)
        Assert(TUPLE_IS_HEAP_TUPLE(tup));
    else {
        ereport(DEBUG5, (errmodule(MOD_TBLSPC), errmsg("tuple is null")));
        return;
    }

    /*
     * If the new tuple is too big for storage or contains already toasted
     * out-of-line attributes from some other relation, invoke the toaster.
     *
     * Note: below this point, heaptup is the data we actually intend to store
     * into the relation; tup is the caller's original untoasted data.
     */
    if (state->rs_new_rel->rd_rel->relkind == RELKIND_TOASTVALUE) {
        /* toast table entries should never be recursively toasted */
        Assert(!HeapTupleHasExternal(tup));
        heaptup = tup;
    } else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)
        heaptup = toast_insert_or_update(state->rs_new_rel, tup, NULL,
            HEAP_INSERT_SKIP_FSM | (state->rs_use_wal ? 0 : HEAP_INSERT_SKIP_WAL), NULL);
    else
        heaptup = tup;

    len = MAXALIGN(heaptup->t_len); /* be conservative */

    /*
     * If we're gonna fail for oversize tuple, do it right away
     */
    if (len > MaxHeapTupleSize)
        ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("row is too big: size %lu, maximum size %lu",
                (unsigned long)len, (unsigned long)MaxHeapTupleSize)));

    /* Compute desired extra freespace due to fillfactor option */
    save_free_space = RelationGetTargetPageFreeSpace(state->rs_new_rel, HEAP_DEFAULT_FILLFACTOR);

    /* Now we can check to see if there's enough free space already. */
    if (state->rs_buffer_valid) {
        page_free_space = PageGetHeapFreeSpace(page);
        if (len + save_free_space > page_free_space) {
            rewrite_write_one_page(state, page);
            state->rs_blockno++;
            state->rs_buffer_valid = false;
        }
    }

    // If the buffer is invalid, initialize a new page and set its basic attributes
    if (!state->rs_buffer_valid) {
        HeapPageHeader phdr = (HeapPageHeader)page;
        /* Initialize a new empty page */
        PageInit(page, BLCKSZ, 0, true);
        phdr->pd_xid_base = u_sess->utils_cxt.RecentXmin - FirstNormalTransactionId;
        phdr->pd_multi_base = 0;
        state->rs_buffer_valid = true;
        const char* algo = RelationGetAlgo(state->rs_new_rel);
        if (RelationisEncryptEnable(state->rs_new_rel) || (algo && *algo != '\0')) {
            /*
             * For the reason of saving TdeInfo,
             * we need to move the pointer(pd_special) forward by the length of TdeInfo.
             */
            phdr->pd_upper -= sizeof(TdePageInfo);
            phdr->pd_special -= sizeof(TdePageInfo);
            PageSetTDE(page);
        }
    }

    // Make sure the page's xid base can represent xmin and xmax, then store them in the tuple
    xmin = HeapTupleGetRawXmin(heaptup);
    xmax = HeapTupleGetRawXmax(heaptup);
    rewrite_page_prepare_for_xid(page, xmin, false);
    (void)rewrite_page_prepare_for_xid(page, xmax, (heaptup->t_data->t_infomask & HEAP_XMAX_IS_MULTI) ? true : false);
    HeapTupleCopyBaseFromPage(heaptup, page);
    HeapTupleSetXmin(heaptup, xmin);
    HeapTupleSetXmax(heaptup, xmax);

    /* And now we can insert the tuple into the page */
    newoff = PageAddItem(page, (Item)heaptup->t_data, heaptup->t_len, InvalidOffsetNumber, false, true);
    if (newoff == InvalidOffsetNumber)
        ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), errmsg("failed to add tuple")));

    /* Update caller's t_self to the actual position where it was stored */
    ItemPointerSet(&(tup->t_self), state->rs_blockno, newoff);

    /*
     * Insert the correct position into CTID of the stored tuple, too, if the
     * caller didn't supply a valid CTID.
     */
    if (!ItemPointerIsValid(&tup->t_data->t_ctid)) {
        ItemId newitemid;
        HeapTupleHeader onpage_tup;
        newitemid = PageGetItemId(page, newoff);
        onpage_tup = (HeapTupleHeader)PageGetItem(page, newitemid);
        onpage_tup->t_ctid = tup->t_self;
    }

    /* If heaptup is a private copy, release it. */
    if (heaptup != tup)
        heap_freetuple(heaptup);
}
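The three-way TOAST branch at the top of raw_heap_insert can be isolated as a pure function. In this sketch the threshold parameter stands in for TOAST_TUPLE_THRESHOLD, whose actual value depends on the block size. ToastPath and choose_toast_path are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { STORE_AS_IS, STORE_VIA_TOASTER } ToastPath;

/* Mirrors the branch in raw_heap_insert: TOAST tables are never toasted
 * again; otherwise external attributes or an oversized tuple send the
 * tuple through the toaster (toast_insert_or_update in the real code). */
static ToastPath choose_toast_path(bool target_is_toast_rel, bool has_external,
                                   size_t t_len, size_t threshold)
{
    if (target_is_toast_rel)
        return STORE_AS_IS;
    if (has_external || t_len > threshold)
        return STORE_VIA_TOASTER;
    return STORE_AS_IS;
}
```

When the toaster path is taken, the real function works on a private copy (heaptup), which is why it frees heaptup at the end when heaptup != tup.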

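raw_heap_insert inserts one tuple into the new heap. Step by step:

- The function takes two parameters: a RewriteState `state` and a HeapTuple `tup`.
- If tup is not NULL, it asserts that tup is a heap tuple. If tup is NULL, it logs a DEBUG5 message and returns without inserting anything.
- Next, it checks whether the new relation is itself a TOAST table (relkind RELKIND_TOASTVALUE). If so, it asserts that tup has no external (out-of-line) attributes, because TOAST entries are never toasted recursively, and uses tup as heaptup. Otherwise, if tup has external attributes or its length exceeds TOAST_TUPLE_THRESHOLD, it calls toast_insert_or_update and uses the result as heaptup. In all other cases heaptup is simply tup.
- It then MAXALIGNs the tuple length and raises an error if the result exceeds MaxHeapTupleSize.
- It computes save_free_space, the space the fillfactor option asks to leave free on each page.
- If the buffer holds a valid page and len plus save_free_space exceeds the page's free space, it writes the current page with rewrite_write_one_page, increments the block number, and marks the buffer invalid.
- If the buffer is invalid, it initializes a new empty page and sets pd_xid_base and pd_multi_base. When the relation has encryption enabled or an algorithm option (RelationGetAlgo) set, it also reserves room for TdePageInfo at the end of the page.
- It reads the tuple's raw xmin and xmax, lets rewrite_page_prepare_for_xid adjust the page so it can hold them, and stores them back into the tuple relative to the page's base.
- It adds the tuple to the page with PageAddItem, raising an error on failure, and sets tup->t_self to (rs_blockno, newoff).
- If the caller did not supply a valid t_ctid, it sets the on-page tuple's t_ctid to t_self.
- Finally, if heaptup is a private copy (heaptup != tup), it frees heaptup.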
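The page-switch test is plain arithmetic. The sketch below assumes 8KB pages and 8-byte maximum alignment. It computes the reserved space the way PostgreSQL's RelationGetTargetPageFreeSpace does, as BLCKSZ * (100 - fillfactor) / 100. TOY_BLCKSZ, TOY_MAXALIGN, target_free_space and needs_new_page are hypothetical names. For example, with fillfactor 90 about 819 bytes per page are kept free. A 100-byte tuple (104 bytes after alignment) therefore no longer fits once the page has only 900 bytes left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BLCKSZ 8192  /* assumed default page size */
/* Assumes MAXIMUM_ALIGNOF is 8: round len up to a multiple of 8. */
#define TOY_MAXALIGN(len) (((size_t)(len) + 7) & ~(size_t)7)

/* Bytes to leave free on each page for a fillfactor given in percent. */
static size_t target_free_space(int fillfactor)
{
    return (size_t)TOY_BLCKSZ * (size_t)(100 - fillfactor) / 100;
}

/* Same comparison as raw_heap_insert: start a new page when the aligned
 * tuple plus the reserved space exceeds the current page's free space. */
static bool needs_new_page(size_t t_len, int fillfactor, size_t page_free_space)
{
    size_t len = TOY_MAXALIGN(t_len);
    return len + target_free_space(fillfactor) > page_free_space;
}
```

The sketch leaves out the separate MaxHeapTupleSize check, which raw_heap_insert performs before this comparison.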


Function name: rewrite_write_one_page

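rewrite_write_one_page writes one finished page of the new heap to storage during a rewrite. Depending on the relation, it WAL-logs the page, encrypts it, and sets its checksum. The full source with comments: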

/*
 * Function name:rewrite_write_one_page
 * Description: Writes one finished page of the new heap to storage during a
 * heap rewrite. Depending on the relation, this may involve WAL logging,
 * encryption and checksum calculation.
 */
static void rewrite_write_one_page(RewriteState state, Page page)
{
    /*
     * If the new relation (`state->rs_new_rel`) has encryption enabled,
     * get the TDE information from this relation.
     */
    TdeInfo tde_info = {0};
    if (RelationisEncryptEnable(state->rs_new_rel)) {
        GetTdeInfoFromRel(state->rs_new_rel, &tde_info);
    }

    if (IsSegmentFileNode(state->rs_new_rel->rd_node)) {
        // Segment-page storage: the new page goes through a shared buffer
        Assert(state->rs_use_wal);
        // P_NEW extends the new relation by one block and returns a buffer for it
        Buffer buf = ReadBuffer(state->rs_new_rel, P_NEW);
#ifdef USE_ASSERT_CHECKING
        // With USE_ASSERT_CHECKING defined, check that the new block is rs_blockno
        BufferDesc *buf_desc = GetBufferDescriptor(buf - 1);
        Assert(buf_desc->tag.blockNum == state->rs_blockno);
#endif
        // Lock the buffer exclusively, then WAL-log a full image of the page
        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
        XLogRecPtr xlog_ptr = log_newpage(&state->rs_new_rel->rd_node, MAIN_FORKNUM, state->rs_blockno, page, true,
            &tde_info);
        // Copy the locally built page into the buffer block
        errno_t rc = memcpy_s(BufferGetBlock(buf), BLCKSZ, page, BLCKSZ);
        securec_check(rc, "\0", "\0");
        PageSetLSN(BufferGetPage(buf), xlog_ptr); // set the page LSN to the WAL record's position
        MarkBufferDirty(buf);
        UnlockReleaseBuffer(buf);
    } else {
        /* check tablespace size limitation when extending new file. */
        STORAGE_SPACE_OPERATION(state->rs_new_rel, BLCKSZ);
        // If WAL is in use, log a full-page image of the new page
        if (state->rs_use_wal) {
            log_newpage(&state->rs_new_rel->rd_node, MAIN_FORKNUM, state->rs_blockno, page, true, &tde_info);
        }
        // Open the storage manager of the new relation
        RelationOpenSmgr(state->rs_new_rel);
        // Encrypt the page data if the new relation has encryption enabled
        char *bufToWrite = NULL;
        if (RelationisEncryptEnable(state->rs_new_rel)) {
            bufToWrite = PageDataEncryptIfNeed(page, &tde_info, true);
        } else {
            bufToWrite = page;
        }
        // Compute and set the page checksum in place for block rs_blockno
        PageSetChecksumInplace((Page)bufToWrite, state->rs_blockno);
        // Hand the page to rewrite_flush_page to be written out
        rewrite_flush_page(state, (Page)bufToWrite);
    }
}
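The two branches of rewrite_write_one_page differ mainly in the order and kind of steps they take. The sketch below records that order as a comma-separated string. StepLog, step and plan_write_one_page are hypothetical, and the step labels are descriptions rather than openGauss functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Collects step labels in order; 256 bytes is ample for the labels below. */
typedef struct {
    char buf[256];
} StepLog;

static void step(StepLog *log, const char *name)
{
    if (log->buf[0] != '\0')
        strcat(log->buf, ",");
    strcat(log->buf, name);
}

/* Records the main steps rewrite_write_one_page takes for a relation. */
static void plan_write_one_page(StepLog *log, bool segment_storage, bool use_wal, bool encrypt)
{
    log->buf[0] = '\0';
    if (segment_storage) {
        assert(use_wal);  /* the real code asserts rs_use_wal on this path */
        step(log, "new shared buffer");
        step(log, "lock");
        step(log, "log_newpage");
        step(log, "copy");
        step(log, "set LSN");
        step(log, "mark dirty");
        step(log, "unlock");
        return;
    }
    step(log, "space check");
    if (use_wal)
        step(log, "log_newpage");  /* WAL record precedes the data write */
    step(log, "open smgr");
    if (encrypt)
        step(log, "encrypt");
    step(log, "checksum");         /* checksum covers the bytes actually written */
    step(log, "flush");
}
```

In both branches the page is WAL-logged before its data reaches storage. On the non-segment path, the checksum is computed after encryption.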

Function name: cmpr_heap_insert

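cmpr_heap_insert inserts a tuple into the compressed page buffer (state->rs_cmprBuffer) during a rewrite. Unlike raw_heap_insert, it never calls the toaster. Instead it asserts that the target is not a TOAST table, that the tuple has no external attributes and is no longer than TOAST_TUPLE_THRESHOLD, and that the page is compressed. The full source with comments: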

/*
 * Function name:cmpr_heap_insert
 * Description: Inserts a tuple into the compressed page buffer (rs_cmprBuffer)
 * during a heap rewrite. Unlike raw_heap_insert, it never invokes the toaster,
 * so the caller must pass a tuple that needs no toasting.
 */
static void cmpr_heap_insert(RewriteState state, HeapTuple tup)
{
    Page page = state->rs_cmprBuffer;
    Size len;
    OffsetNumber newoff;
    TransactionId xmin, xmax;

    Assert(state->rs_new_rel->rd_rel->relkind != RELKIND_TOASTVALUE);
    Assert(!HeapTupleHasExternal(tup) && !(tup->t_len > TOAST_TUPLE_THRESHOLD));

    // Get the tuple's xmin and xmax and prepare the page to handle these transaction IDs
    xmin = HeapTupleGetRawXmin(tup);
    xmax = HeapTupleGetRawXmax(tup);
    rewrite_page_prepare_for_xid(page, xmin, false);
    (void)rewrite_page_prepare_for_xid(page, xmax, (tup->t_data->t_infomask & HEAP_XMAX_IS_MULTI) ? true : false);
    HeapTupleCopyBaseFromPage(tup, page);
    HeapTupleSetXmin(tup, xmin);
    HeapTupleSetXmax(tup, xmax);

    // Report an error if the aligned tuple length exceeds the maximum heap tuple size
    len = MAXALIGN(tup->t_len);
    if (len > MaxHeapTupleSize)
        ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("row is too big: size %lu, maximum size %lu",
                (unsigned long)len, (unsigned long)MaxHeapTupleSize)));

    // The target page must be a compressed page
    Assert(PageIsCompressed(page));

    /* And now we can insert the tuple into the page */
    newoff = PageAddItem(page, (Item)tup->t_data, tup->t_len, InvalidOffsetNumber, false, true);
    Assert(newoff != InvalidOffsetNumber);

    /* Update caller's t_self to the actual position where it was stored */
    ItemPointerSet(&(tup->t_self), state->rs_blockno, newoff);

    // If the stored tuple's CTID is invalid, point the on-page tuple's CTID at its actual location
    if (!ItemPointerIsValid(&tup->t_data->t_ctid)) {
        ItemId newitemid = PageGetItemId(page, newoff);
        HeapTupleHeader onpage_tup = (HeapTupleHeader)PageGetItem(page, newitemid);
        onpage_tup->t_ctid = tup->t_self;
    }
}
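Both raw_heap_insert and cmpr_heap_insert finish by recording where the tuple landed. When the caller supplied no valid ctid, they also point the stored tuple's ctid at that location. The sketch below models this step together with the preconditions that cmpr_heap_insert asserts. The types and functions are hypothetical simplifications. Offset 0 plays the role of InvalidOffsetNumber, and the sketch updates a single tuple struct, whereas the real code writes t_ctid into the on-page copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t block;
    uint16_t offset;  /* 0 means invalid, like InvalidOffsetNumber */
} ToyItemPointer;

static bool toy_ip_is_valid(ToyItemPointer ip)
{
    return ip.offset != 0;
}

typedef struct {
    ToyItemPointer t_self;  /* where the tuple was stored */
    ToyItemPointer t_ctid;  /* next version, or itself for the newest one */
} ToyStoredTuple;

/* Records the storage location and, if no valid ctid was supplied,
 * makes the tuple's ctid point at itself. */
static void toy_set_location(ToyStoredTuple *tup, uint32_t blockno, uint16_t newoff)
{
    tup->t_self.block = blockno;
    tup->t_self.offset = newoff;
    if (!toy_ip_is_valid(tup->t_ctid))
        tup->t_ctid = tup->t_self;
}

/* Preconditions asserted by cmpr_heap_insert: no TOAST target,
 * no external attributes, and not longer than the TOAST threshold. */
static bool toy_cmpr_insert_allowed(bool target_is_toast_rel, bool has_external,
                                    size_t t_len, size_t threshold)
{
    return !target_is_toast_rel && !has_external && t_len <= threshold;
}
```

A tuple that already carries a valid ctid (for example, one pointing at a newer version in the same update chain) keeps it; only tuples without one get a self-pointing ctid.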