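Introduction
In the previous post we finished studying heapam.cpp. Next we turn to rewriteheap.cpp, which lives in the same folder. File path: opengauss-server\src\gausskernel\storage\access\heap\rewriteheap.cpp
I. File Overview
This file provides the functions for rewriting a table. They make it possible to rewrite a heap completely while preserving visibility information and keeping update chains intact. The overall structure of the file is shown in the figure below, with brief explanations of some of the functions.
The caller is responsible for creating the new heap, making all catalog changes, supplying the tuples to be written to the new heap, and rebuilding indexes. The caller must hold AccessExclusiveLock on the target table, because we assume no one else is writing into it. The overall usage is as follows: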
To use the facility:
begin_heap_rewrite
while (fetch next tuple)
{
    if (tuple is dead)
        rewrite_heap_dead_tuple
    else
    {
        // do any transformations here if required
        rewrite_heap_tuple
    }
}
end_heap_rewrite
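To make the calling protocol concrete, here is a minimal, self-contained C++ sketch of the same loop. Everything prefixed with `toy_` or `Toy` is a hypothetical stand-in invented for illustration; the real RewriteState and HeapTuple carry far more state, and the real functions take different arguments.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; the real RewriteState and HeapTuple are far richer.
struct ToyTuple {
    int id;
    bool dead;
};

struct ToyRewriteState {
    std::vector<int> written;  // ids copied into the "new heap"
    int deadSeen = 0;          // dead tuples reported to the rewriter
    bool finished = false;
};

ToyRewriteState toy_begin_heap_rewrite() { return ToyRewriteState{}; }

void toy_rewrite_heap_tuple(ToyRewriteState& st, const ToyTuple& tup) {
    assert(!st.finished);
    st.written.push_back(tup.id);
}

void toy_rewrite_heap_dead_tuple(ToyRewriteState& st, const ToyTuple&) {
    assert(!st.finished);
    ++st.deadSeen;
}

void toy_end_heap_rewrite(ToyRewriteState& st) { st.finished = true; }

// Drives the begin / per-tuple / end protocol over an in-memory "old heap".
ToyRewriteState toy_rewrite_all(const std::vector<ToyTuple>& oldHeap) {
    ToyRewriteState st = toy_begin_heap_rewrite();
    for (const ToyTuple& tup : oldHeap) {
        if (tup.dead) {
            toy_rewrite_heap_dead_tuple(st, tup);
        } else {
            // any per-tuple transformation would go here
            toy_rewrite_heap_tuple(st, tup);
        }
    }
    toy_end_heap_rewrite(st);
    return st;
}
```

The point of the sketch is only the shape of the loop: live tuples are copied, dead tuples are still reported to the rewriter, and the rewrite is closed exactly once at the end.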
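II. Glossary
TOAST: short for "The Oversized-Attribute Storage Technique", the mechanism openGauss uses to handle large field values so that they fit into page buffers. In openGauss, a page (also called a block) is the basic unit of data in file storage. Its size is fixed, can only be set at compile time, and cannot be changed afterwards; the default is 8 KB. Because of this limit a row cannot span pages, so for very long rows openGauss invokes TOAST, which compresses large field values and/or slices them into multiple physical rows stored in a separate table (the TOAST table). This is called out-of-line storage. The TOAST code recognizes four strategies for storing TOAST-able columns on disk: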
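Strategy | Prevents compression and out-of-line storage | Disables single-byte varlena headers | Default for TOAST-able types | Allows out-of-line storage | Allows compression |
---|---|---|---|---|---|
PLAIN | Yes | Yes | No | No | No |
EXTENDED | No | No | Yes (most types) | Yes | Yes |
EXTERNAL | No | No | No | Yes | No |
MAIN | No | No | No | No (but performed as a last resort) | Yes |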
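The four strategies can also be restated as a small lookup, which some readers find easier to scan than the table. This is a sketch, not openGauss code: the single-letter codes follow PostgreSQL's pg_attribute.attstorage convention, which I assume openGauss keeps, and `needs_toaster` is a hypothetical helper that only mirrors the condition raw_heap_insert uses to decide whether to call the toaster.

```cpp
#include <cassert>
#include <cstddef>

// Storage strategy codes in the style of pg_attribute.attstorage (assumed).
enum class ToastStrategy : char { Plain = 'p', External = 'e', Extended = 'x', Main = 'm' };

struct StrategyTraits {
    bool allowsCompression;
    bool allowsOutOfLine;  // as a regular choice, not as a last resort
};

StrategyTraits traits_of(ToastStrategy s) {
    switch (s) {
        case ToastStrategy::Plain:    return {false, false};
        case ToastStrategy::External: return {false, true};
        case ToastStrategy::Extended: return {true, true};
        case ToastStrategy::Main:     return {true, false};
    }
    assert(false && "unknown strategy");
    return {false, false};
}

// Hypothetical helper: does a row need to be handed to the toaster at all?
// `threshold` stands in for TOAST_TUPLE_THRESHOLD; its value is not modelled.
bool needs_toaster(std::size_t tupleLen, bool hasExternalAttrs, std::size_t threshold) {
    return hasExternalAttrs || tupleLen > threshold;
}
```

For example, `traits_of(ToastStrategy::Main)` reports compression as allowed and out-of-line storage as not allowed, matching the MAIN row (ignoring the last-resort case).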
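- HOT: The heap-only tuple (HOT) feature eliminates redundant index entries and allows the space taken by DELETEd or obsoleted UPDATEd tuples to be reused without a table-wide vacuum. It does this by allowing single-page vacuuming, also called "defragmentation".
- Update chain: An update chain consists of several tuples, each of whose ctid points to the next tuple in the chain. A HOT update chain (or partial update chain) consists of a root tuple and one or more heap-only tuples. A complete update chain can contain both HOT and non-HOT (cold) updated tuples.
The update chain is the mechanism the database uses to track the historical versions of a row. When a row is updated, the database does not modify the original tuple in place. It creates a new version and links the old version to the new one through the update chain, so the versions of a row can be reached by walking the chain.
- Cold update: A normal, non-HOT update, in which index entries are created for the new version of the tuple.
- HOT update: An UPDATE in which the new tuple becomes a heap-only tuple and no new index entries are created.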
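As a toy illustration of an update chain, the sketch below models a heap as a vector in which each version's `ctid` is simply the index of the next version. The `ToyVersion` type and both helpers are hypothetical; a real ctid is a (block, offset) pair, and real chain walking also checks xmin/xmax, which is omitted here. The sketch assumes the usual convention that the newest version's ctid points to itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy tuple version: `ctid` is the index of the next version. A version whose
// ctid equals its own index is the newest one in its chain.
struct ToyVersion {
    std::size_t ctid;
    bool heapOnly;  // true if this version was created by a HOT update
};

// Follows ctid links from `start` and returns the index of the newest version.
std::size_t latest_version(const std::vector<ToyVersion>& heap, std::size_t start) {
    std::size_t cur = start;
    while (heap[cur].ctid != cur) {
        assert(heap[cur].ctid < heap.size());
        cur = heap[cur].ctid;
    }
    return cur;
}

// Counts how many versions reachable from `start` are heap-only.
std::size_t count_heap_only(const std::vector<ToyVersion>& heap, std::size_t start) {
    std::size_t n = 0;
    for (std::size_t cur = start;; cur = heap[cur].ctid) {
        if (heap[cur].heapOnly) ++n;
        if (heap[cur].ctid == cur) break;
    }
    return n;
}
```

With a heap of `{{1, false}, {2, true}, {2, true}, {3, false}}`, index 0 is a root tuple followed by two heap-only versions, and index 3 is a row that was never updated.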
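III. Code Walkthrough
The walkthrough focuses on two functions, raw_heap_insert and rewrite_write_one_page, followed by a brief look at cmpr_heap_insert. We start with a comment near the top of rewriteheap.cpp: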
* We can't use the normal heap_insert function to insert into the new
* heap, because heap_insert overwrites the visibility information.......
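What it tells us: we cannot use the normal heap_insert function to insert into the new heap, because heap_insert overwrites the visibility information. Instead a special-purpose raw_heap_insert function is used, which is optimized for bulk-inserting a large number of tuples, knowing that we have exclusive access to the heap. raw_heap_insert builds new pages in local storage. When a page is full, or at the end of the process, it is inserted into WAL as a single record and then written to disk directly through smgr. Note, however, that any data sent to the new heap's TOAST table goes through the normal bufmgr. Next we analyze raw_heap_insert in detail.
Function name: raw_heap_insert
The function inserts tuples while a heap is being rewritten and handles special cases such as TOAST tables and tuples above the TOAST threshold. It also raises errors for conditions such as oversized rows. The full source with comments follows: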
/*
 * Inserts one tuple into the new heap during a rewrite. Handles special
 * cases such as TOAST tables and tuples above the TOAST threshold, and
 * raises an error for tuples that are too big or cannot be added to the page.
 */
static void raw_heap_insert(RewriteState state, HeapTuple tup)
{
    Page page = state->rs_buffer;
    Size page_free_space, save_free_space;
    Size len;
    OffsetNumber newoff;
    HeapTuple heaptup;
    TransactionId xmin, xmax;

    // A non-NULL tup must be a heap tuple. A NULL tup is only logged at
    // DEBUG5 level, and the function returns without inserting anything.
    if (tup != NULL)
        Assert(TUPLE_IS_HEAP_TUPLE(tup));
    else {
        ereport(DEBUG5, (errmodule(MOD_TBLSPC), errmsg("tuple is null")));
        return;
    }

    /*
     * If the new tuple is too big for storage or contains already toasted
     * out-of-line attributes from some other relation, invoke the toaster.
     *
     * Note: below this point, heaptup is the data we actually intend to store
     * into the relation; tup is the caller's original untoasted data.
     */
    if (state->rs_new_rel->rd_rel->relkind == RELKIND_TOASTVALUE) {
        /* toast table entries should never be recursively toasted */
        Assert(!HeapTupleHasExternal(tup));
        heaptup = tup;
    } else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)
        heaptup = toast_insert_or_update(state->rs_new_rel, tup, NULL,
            HEAP_INSERT_SKIP_FSM | (state->rs_use_wal ? 0 : HEAP_INSERT_SKIP_WAL), NULL);
    else
        heaptup = tup;

    len = MAXALIGN(heaptup->t_len); /* be conservative */

    /*
     * If we're gonna fail for oversize tuple, do it right away
     */
    if (len > MaxHeapTupleSize)
        ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("row is too big: size %lu, maximum size %lu",
                (unsigned long)len, (unsigned long)MaxHeapTupleSize)));

    /* Compute desired extra freespace due to fillfactor option */
    save_free_space = RelationGetTargetPageFreeSpace(state->rs_new_rel, HEAP_DEFAULT_FILLFACTOR);

    /* Now we can check to see if there's enough free space already. */
    if (state->rs_buffer_valid) {
        page_free_space = PageGetHeapFreeSpace(page);
        if (len + save_free_space > page_free_space) {
            // Not enough room: write the current page out and move to the next block
            rewrite_write_one_page(state, page);
            state->rs_blockno++;
            state->rs_buffer_valid = false;
        }
    }

    // If the buffer is not valid, initialize a new page and set its basic attributes
    if (!state->rs_buffer_valid) {
        HeapPageHeader phdr = (HeapPageHeader)page;

        /* Initialize a new empty page */
        PageInit(page, BLCKSZ, 0, true);
        phdr->pd_xid_base = u_sess->utils_cxt.RecentXmin - FirstNormalTransactionId;
        phdr->pd_multi_base = 0;
        state->rs_buffer_valid = true;

        const char* algo = RelationGetAlgo(state->rs_new_rel);
        if (RelationisEncryptEnable(state->rs_new_rel) || (algo && *algo != '\0')) {
            /*
             * For the reason of saving TdeInfo,
             * we need to move the pointer(pd_special) forward by the length of TdeInfo.
             */
            phdr->pd_upper -= sizeof(TdePageInfo);
            phdr->pd_special -= sizeof(TdePageInfo);
            PageSetTDE(page);
        }
    }

    // Read the tuple's raw xmin and xmax, prepare the page for them,
    // then store them back relative to the page
    xmin = HeapTupleGetRawXmin(heaptup);
    xmax = HeapTupleGetRawXmax(heaptup);
    rewrite_page_prepare_for_xid(page, xmin, false);
    (void)rewrite_page_prepare_for_xid(page, xmax, (heaptup->t_data->t_infomask & HEAP_XMAX_IS_MULTI) ? true : false);

    HeapTupleCopyBaseFromPage(heaptup, page);
    HeapTupleSetXmin(heaptup, xmin);
    HeapTupleSetXmax(heaptup, xmax);

    /* And now we can insert the tuple into the page */
    newoff = PageAddItem(page, (Item)heaptup->t_data, heaptup->t_len, InvalidOffsetNumber, false, true);
    if (newoff == InvalidOffsetNumber)
        ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), errmsg("failed to add tuple")));

    /* Update caller's t_self to the actual position where it was stored */
    ItemPointerSet(&(tup->t_self), state->rs_blockno, newoff);

    /*
     * Insert the correct position into CTID of the stored tuple, too, if the
     * caller didn't supply a valid CTID.
     */
    if (!ItemPointerIsValid(&tup->t_data->t_ctid)) {
        ItemId newitemid;
        HeapTupleHeader onpage_tup;

        newitemid = PageGetItemId(page, newoff);
        onpage_tup = (HeapTupleHeader)PageGetItem(page, newitemid);
        onpage_tup->t_ctid = tup->t_self;
    }

    /* If heaptup is a private copy, release it. */
    if (heaptup != tup)
        heap_freetuple(heaptup);
}
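This code is the function raw_heap_insert, whose main purpose is to insert one tuple into the new heap. A step-by-step reading follows:
- The function takes two parameters: state, of type RewriteState, and tup, of type HeapTuple.
- If tup is not NULL, the function asserts that it is a heap tuple. If tup is NULL, it logs a DEBUG5-level message and returns.
- Next, the function checks whether the new relation is a TOAST table (RELKIND_TOASTVALUE). If so, it asserts that tup has no external attributes and assigns tup to heaptup. Otherwise, if tup has external attributes or its length exceeds TOAST_TUPLE_THRESHOLD, it calls toast_insert_or_update and assigns the result to heaptup. If neither applies, it assigns tup to heaptup directly.
- The function then checks whether the MAXALIGN-ed tuple length exceeds MaxHeapTupleSize. If it does, it raises an error.
- Next, the function computes how much free space should be kept on each page of the new relation because of fillfactor.
- If the buffer is valid and the page's free space is smaller than the tuple length plus the space to keep free, it writes the page out with rewrite_write_one_page, increments the block number, and marks the buffer invalid.
- If the buffer is invalid, it initializes a new page and sets its basic attributes, including the xid base and, for encrypted relations, room for the TDE information.
- The function then reads the tuple's xmin and xmax, prepares the page for them with rewrite_page_prepare_for_xid, and stores them back into the tuple.
- Next, the function adds the tuple to the page and sets the caller's t_self to the position where it was stored. If the tuple cannot be added, it raises an error.
- If the caller's t_ctid is invalid, it sets t_ctid of the on-page tuple to that position as well.
- Finally, if heaptup is not the same as tup, that is, it is a private copy produced by the toaster, the function frees heaptup.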
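To summarize, raw_heap_insert inserts tuples during a heap rewrite, handles special cases such as TOAST tables and tuples above the TOAST threshold, and raises an error for rows that are too big or that cannot be added to the page.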
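The page-filling logic of raw_heap_insert can be modelled in a few lines. The sketch below is a simplification under stated assumptions: `ToyPageWriter` and `toy_maxalign` are hypothetical, page headers and line pointers are not modelled, and MAXALIGN is assumed to round up to 8 bytes as on typical 64-bit builds. What it does preserve is the order of the checks: the fillfactor reserve is only considered when a page already holds data, so a fresh page always accepts its first tuple.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round up to a multiple of 8 (assumed MAXALIGN on 64-bit builds).
constexpr std::size_t toy_maxalign(std::size_t len) {
    return (len + 7) & ~static_cast<std::size_t>(7);
}

struct ToyPageWriter {
    std::size_t pageSize;      // usable bytes per page (header not modelled)
    std::size_t reserve;       // bytes to keep free because of fillfactor
    std::size_t used = 0;      // bytes used in the current page
    bool bufferValid = false;  // mirrors rs_buffer_valid
    std::size_t blockNo = 0;   // mirrors rs_blockno
    std::vector<std::size_t> flushedPages;  // bytes used in each page written out

    ToyPageWriter(std::size_t size, std::size_t keepFree) : pageSize(size), reserve(keepFree) {}

    // Inserts a tuple of `tupleLen` bytes and returns the block it landed in.
    std::size_t insert(std::size_t tupleLen) {
        const std::size_t len = toy_maxalign(tupleLen);
        assert(len <= pageSize);  // stands in for the "row is too big" error
        if (bufferValid && len + reserve > pageSize - used) {
            flushedPages.push_back(used);  // stands in for rewrite_write_one_page
            ++blockNo;
            bufferValid = false;
        }
        if (!bufferValid) {  // start a fresh, empty page
            used = 0;
            bufferValid = true;
        }
        used += len;
        return blockNo;
    }
};
```

With a 100-byte page and a 20-byte reserve, two 30-byte tuples (32 bytes each after alignment) share block 0, and the third is pushed to block 1 because 32 + 20 exceeds the 36 bytes still free.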
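Function name: rewrite_write_one_page
This function writes out one finished page of the new heap during a rewrite. For a segment file node, the page is WAL-logged and copied into a buffer obtained from the buffer manager. Otherwise it is WAL-logged if WAL is in use, encrypted if the relation has encryption enabled, given a checksum, and flushed through rewrite_flush_page. The full source with comments follows: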
/*
 * Function name: rewrite_write_one_page
 * Description: Writes one finished page of the new heap to storage during
 * a heap rewrite. Depending on the relation, this may involve WAL logging,
 * encryption, and checksum calculation.
 */
static void rewrite_write_one_page(RewriteState state, Page page)
{
    TdeInfo tde_info = {0};

    // If the new relation (state->rs_new_rel) has encryption enabled,
    // get the TDE information from this relation.
    if (RelationisEncryptEnable(state->rs_new_rel)) {
        GetTdeInfoFromRel(state->rs_new_rel, &tde_info);
    }

    // Check whether the new relation is a segment file node
    if (IsSegmentFileNode(state->rs_new_rel->rd_node)) {
        Assert(state->rs_use_wal);
        // Get a buffer for a new block (P_NEW) of the new relation
        Buffer buf = ReadBuffer(state->rs_new_rel, P_NEW);
#ifdef USE_ASSERT_CHECKING
        // If USE_ASSERT_CHECKING is defined, check that the new block is rs_blockno
        BufferDesc *buf_desc = GetBufferDescriptor(buf - 1);
        Assert(buf_desc->tag.blockNum == state->rs_blockno);
#endif
        // Lock the buffer exclusively
        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
        // WAL-log the new page
        XLogRecPtr xlog_ptr = log_newpage(&state->rs_new_rel->rd_node, MAIN_FORKNUM, state->rs_blockno, page, true,
            &tde_info);
        // Copy the page data into the buffer block
        errno_t rc = memcpy_s(BufferGetBlock(buf), BLCKSZ, page, BLCKSZ);
        securec_check(rc, "\0", "\0");
        // Set the LSN of the buffer page
        PageSetLSN(BufferGetPage(buf), xlog_ptr);
        MarkBufferDirty(buf);
        UnlockReleaseBuffer(buf);
    } else {
        /* check tablespace size limitation when extending new file. */
        STORAGE_SPACE_OPERATION(state->rs_new_rel, BLCKSZ);

        // If WAL is in use, log the new page
        if (state->rs_use_wal) {
            log_newpage(&state->rs_new_rel->rd_node, MAIN_FORKNUM, state->rs_blockno, page, true, &tde_info);
        }

        // Open the storage manager of the new relation
        RelationOpenSmgr(state->rs_new_rel);

        // Encrypt the page data if the new relation has encryption enabled
        char *bufToWrite = NULL;
        if (RelationisEncryptEnable(state->rs_new_rel)) {
            bufToWrite = PageDataEncryptIfNeed(page, &tde_info, true);
        } else {
            bufToWrite = page;
        }

        // Set the page checksum on the buffer that will actually be written
        PageSetChecksumInplace((Page)bufToWrite, state->rs_blockno);

        // Flush the page
        rewrite_flush_page(state, (Page)bufToWrite);
    }
}
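The two branches of rewrite_write_one_page are easier to compare when written down as ordered step lists. The following sketch does no I/O at all: `ToyWriteOptions` and `toy_write_steps` are hypothetical names, and the step strings are just labels for the calls visible in the listing above.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyWriteOptions {
    bool segmentFileNode;  // stands in for IsSegmentFileNode(...)
    bool useWal;           // stands in for state->rs_use_wal
    bool encrypted;        // stands in for RelationisEncryptEnable(...)
};

// Returns the ordered list of steps a page goes through for the given options.
std::vector<std::string> toy_write_steps(const ToyWriteOptions& opt) {
    std::vector<std::string> steps;
    if (opt.segmentFileNode) {
        assert(opt.useWal);  // the real code asserts this as well
        steps = {"get new buffer", "lock buffer", "WAL-log page",
                 "copy page into buffer", "set LSN", "mark dirty", "unlock and release"};
        return steps;
    }
    steps.push_back("check tablespace limit");
    if (opt.useWal) steps.push_back("WAL-log page");
    steps.push_back("open smgr");
    if (opt.encrypted) steps.push_back("encrypt page");
    steps.push_back("set checksum");
    steps.push_back("flush page");
    return steps;
}
```

One detail the list makes visible: in the non-segment branch the checksum is set after encryption, on the buffer that is actually flushed.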
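Function name: cmpr_heap_insert
This function inserts a tuple into the compressed page buffer (state->rs_cmprBuffer) during a heap rewrite. Unlike raw_heap_insert, it never invokes the toaster and never starts a new page: it asserts that the target is not a TOAST table, that the tuple does not need toasting, and that the page is compressed. The full source with comments follows: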
/*
 * Function name: cmpr_heap_insert
 * Description: Inserts a tuple into the compressed page buffer
 * (state->rs_cmprBuffer) during a heap rewrite. The tuple must not need
 * toasting and the page must already be marked as compressed.
 */
static void cmpr_heap_insert(RewriteState state, HeapTuple tup)
{
    Page page = state->rs_cmprBuffer;
    Size len;
    OffsetNumber newoff;
    TransactionId xmin, xmax;

    Assert(state->rs_new_rel->rd_rel->relkind != RELKIND_TOASTVALUE);
    Assert(!HeapTupleHasExternal(tup) && !(tup->t_len > TOAST_TUPLE_THRESHOLD));

    // Get the xmin and xmax transaction IDs of the tuple and prepare the page to hold them
    xmin = HeapTupleGetRawXmin(tup);
    xmax = HeapTupleGetRawXmax(tup);
    rewrite_page_prepare_for_xid(page, xmin, false);
    (void)rewrite_page_prepare_for_xid(page, xmax, (tup->t_data->t_infomask & HEAP_XMAX_IS_MULTI) ? true : false);

    HeapTupleCopyBaseFromPage(tup, page);
    HeapTupleSetXmin(tup, xmin);
    HeapTupleSetXmax(tup, xmax);

    // If the tuple length exceeds the maximum heap tuple size, report an error
    len = MAXALIGN(tup->t_len);
    if (len > MaxHeapTupleSize)
        ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("row is too big: size %lu, maximum size %lu",
                (unsigned long)len, (unsigned long)MaxHeapTupleSize)));

    // Ensure the page is a compressed page
    Assert(PageIsCompressed(page));

    /* And now we can insert the tuple into the page */
    newoff = PageAddItem(page, (Item)tup->t_data, tup->t_len, InvalidOffsetNumber, false, true);
    Assert(newoff != InvalidOffsetNumber);

    /* Update caller's t_self to the actual position where it was stored */
    ItemPointerSet(&(tup->t_self), state->rs_blockno, newoff);

    // If the caller did not supply a valid CTID, set the CTID of the
    // on-page tuple to its actual location
    if (!ItemPointerIsValid(&tup->t_data->t_ctid)) {
        ItemId newitemid = PageGetItemId(page, newoff);
        HeapTupleHeader onpage_tup = (HeapTupleHeader)PageGetItem(page, newitemid);
        onpage_tup->t_ctid = tup->t_self;
    }
}
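Both raw_heap_insert and cmpr_heap_insert call rewrite_page_prepare_for_xid before storing xmin and xmax, and raw_heap_insert sets pd_xid_base to RecentXmin - FirstNormalTransactionId when it initializes a page. The listing does not show how the base is used, so the sketch below rests on an assumption: that each page holds a 64-bit base and each tuple stores its xids as 32-bit offsets from that base, with offsets below 3 (FirstNormalTransactionId in PostgreSQL) reserved for special xids. All `Toy`/`toy_` names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a page stores a 64-bit base, tuples store 32-bit offsets from it.
struct ToyXidPage {
    std::uint64_t xidBase;
};

// True if `xid` can be stored on the page as a 32-bit offset from the base.
// The lower bound of base + 3 keeps small offsets free for special xids (assumed).
bool toy_xid_fits(const ToyXidPage& page, std::uint64_t xid) {
    return xid >= page.xidBase + 3 && xid - page.xidBase <= UINT32_MAX;
}

// Converts a full xid into the short form stored in a tuple.
std::uint32_t toy_to_short(const ToyXidPage& page, std::uint64_t xid) {
    assert(toy_xid_fits(page, xid));
    return static_cast<std::uint32_t>(xid - page.xidBase);
}

// Converts a stored short xid back into the full 64-bit xid.
std::uint64_t toy_to_full(const ToyXidPage& page, std::uint32_t shortXid) {
    return page.xidBase + shortXid;
}
```

Under this model, choosing the base as RecentXmin - 3 means RecentXmin itself maps to the smallest normal offset. When an xid does not fit, a real implementation would have to adjust the page, which is presumably what "preparing the page for an xid" refers to; that part is not modelled here.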
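IV. Summary
This post analyzed the structure and purpose of rewriteheap.cpp, explained the related terminology, and annotated several of its functions.
Next we will continue with rewriteheap.cpp, annotate the remaining functions, and look more closely at the techniques behind these terms and how they are implemented.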