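rewriteheap.cpp Analysis -- Storage Engine (7)

Introduction

In the previous posts we finished studying heapam.cpp. Next we study and interpret rewriteheap.cpp, which lives in the same directory. File path: opengauss-server\src\gausskernel\storage\access\heap\rewriteheap.cpp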

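I. File Overview

The file provides support functions for rewriting a table. They completely rewrite a heap while preserving visibility information and update chains. The figure below shows the overall structure of the file, with brief explanations of some of the functions:

The caller is responsible for several things: creating the new heap, making all catalog changes, supplying the tuples to be written to the new heap, and rebuilding the indexes. The caller must hold an exclusive lock (AccessExclusiveLock) on the target table, because the code assumes that no one else is writing into it. The overall usage pattern is: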

 
```
To use the facility:

begin_heap_rewrite
while (fetch next tuple)
{
    if (tuple is dead)
        rewrite_heap_dead_tuple
    else
    {
        // do any transformations here if required
        rewrite_heap_tuple
    }
}
end_heap_rewrite
```
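The calling protocol above can be illustrated with a small C++ sketch. The Sketch* types and sketch_* functions are hypothetical stand-ins. The real begin_heap_rewrite, rewrite_heap_tuple, rewrite_heap_dead_tuple and end_heap_rewrite take relations, heap tuples and transaction horizons, and they perform real I/O. Only the begin / loop / end calling pattern is modeled here.

```cpp
#include <vector>

// Hypothetical stand-ins: the real rewrite functions take Relation, HeapTuple
// and transaction-horizon arguments that are omitted here.
struct SketchTuple {
    int id;
    bool dead;  // in real code this comes from a visibility check
};

struct SketchRewriteState {
    int live_written = 0;  // tuples handed to the rewrite_heap_tuple step
    int dead_seen = 0;     // tuples handed to the rewrite_heap_dead_tuple step
    bool open = false;     // true between begin and end
};

static void sketch_begin_heap_rewrite(SketchRewriteState &st)
{
    st = SketchRewriteState{};
    st.open = true;
}

static void sketch_rewrite_heap_tuple(SketchRewriteState &st, const SketchTuple &)
{
    st.live_written++;  // a real implementation copies the tuple into the new heap
}

static void sketch_rewrite_heap_dead_tuple(SketchRewriteState &st, const SketchTuple &)
{
    st.dead_seen++;  // dead tuples are not copied into the new heap
}

static void sketch_end_heap_rewrite(SketchRewriteState &st)
{
    st.open = false;  // a real implementation writes out the last page here
}

// Drives the begin / loop / end protocol over an in-memory list of tuples.
static SketchRewriteState sketch_rewrite_all(const std::vector<SketchTuple> &old_heap)
{
    SketchRewriteState st;
    sketch_begin_heap_rewrite(st);
    for (const SketchTuple &t : old_heap) {  // while (fetch next tuple)
        if (t.dead)
            sketch_rewrite_heap_dead_tuple(st, t);
        else
            sketch_rewrite_heap_tuple(st, t);  // transformations would go here
    }
    sketch_end_heap_rewrite(st);
    return st;
}
```

For example, four input tuples of which one is dead lead to three calls of the live-tuple step and one call of the dead-tuple step.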

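II. Terminology

TOAST: short for "The Oversized-Attribute Storage Technique". It is the mechanism openGauss uses to handle large values that would not fit into a page buffer. In openGauss, a page (also called a block) is the basic unit of data in file storage. Its size is fixed, can only be set at compile time, cannot be changed afterwards, and defaults to 8KB. Because of this limit, a row cannot span pages. For very long rows openGauss therefore applies TOAST: large fields are compressed and/or split into multiple physical rows stored in a separate system table (the TOAST table). This is called out-of-line storage. The TOAST code recognizes four different strategies for storing TOAST-able columns on disk:

| Strategy | Out-of-line storage | Compression | Notes |
|---|---|---|---|
| PLAIN | No | No | Avoids both compression and out-of-line storage; also disables single-byte headers for varlena types |
| EXTENDED | Yes | Yes | Default strategy for most TOAST-able data types |
| EXTERNAL | Yes | No | |
| MAIN | No (but actually performed as a last resort) | Yes | |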
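The four strategies listed above can be restated as two predicates. This is a minimal sketch. The enum and function names are hypothetical. In PostgreSQL-derived systems the catalog records the strategy per column as an attstorage code ('p', 'x', 'e', 'm') rather than as a C++ enum.

```cpp
// Hypothetical names for the four TOAST storage strategies.
enum class ToastStrategy { Plain, Extended, External, Main };

// Compression is allowed for EXTENDED and MAIN only.
constexpr bool allows_compression(ToastStrategy s)
{
    return s == ToastStrategy::Extended || s == ToastStrategy::Main;
}

// Out-of-line storage is allowed for EXTENDED and EXTERNAL. MAIN uses it only
// as a last resort (when nothing else makes the row fit); PLAIN never does.
constexpr bool allows_out_of_line(ToastStrategy s, bool last_resort)
{
    switch (s) {
    case ToastStrategy::Extended:
    case ToastStrategy::External:
        return true;
    case ToastStrategy::Main:
        return last_resort;
    case ToastStrategy::Plain:
        return false;
    }
    return false;
}
```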

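- HOT (Heap-Only Tuple): the HOT feature eliminates redundant index entries. It also allows the space taken by DELETEd or obsolete UPDATEd tuples to be reused without performing a table-wide vacuum. It achieves this by allowing single-page vacuuming, also called "defragmentation".

- Update chain: a chain of updated tuples, in which each tuple's ctid points to the next tuple in the chain. A HOT update chain is an update chain (or a portion of one) that consists of a root tuple and one or more heap-only tuples. A complete update chain can contain both HOT and non-HOT (cold) updated tuples.

In a database, an update chain is the mechanism that tracks the versions of a row. An update does not modify the original row in place. Instead, the database creates a new version and links the old version to it through the update chain: the old version's ctid points to the new one. The database can then traverse the chain to reach the row's versions.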
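Walking an update chain can be sketched with an array standing in for the heap and an index standing in for t_ctid. This is a hypothetical model. It assumes the usual PostgreSQL-style convention that the newest version's ctid points to itself, which is the same convention raw_heap_insert applies when it sets t_ctid to t_self.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of an update chain: next_idx plays the role of t_ctid.
// An updated version points to its successor; the newest version points to itself.
struct SketchVersion {
    int value;
    std::size_t next_idx;
};

// Follows the chain from `start` to the newest version and reports how many
// versions were visited. Assumes the chain is acyclic apart from the final
// self-pointer.
static std::size_t follow_update_chain(const std::vector<SketchVersion> &heap,
                                       std::size_t start, std::size_t &length)
{
    std::size_t cur = start;
    length = 1;
    while (heap.at(cur).next_idx != cur) {
        cur = heap.at(cur).next_idx;
        ++length;
    }
    return cur;
}
```

With versions 0 -> 2 -> 3, starting at slot 0 visits three versions and ends at slot 3. A row that was never updated is a chain of length one.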

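- Cold update: a normal, non-HOT update, in which index entries are created for the new version of the tuple.

- HOT update: an UPDATE in which the new tuple becomes a heap-only tuple and no new index entries are created.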
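The difference between the two kinds of update can be reduced to a rule of thumb. An update can be HOT only if no indexed column changes and the new version fits on the same page as the old one. The sketch below encodes just that simplified rule. The function name is hypothetical, and the real update path checks more conditions than these two.

```cpp
#include <cstddef>

// Simplified, hypothetical HOT eligibility check.
constexpr bool sketch_can_hot_update(bool indexed_column_changed,
                                     std::size_t same_page_free_space,
                                     std::size_t new_tuple_len)
{
    // Changing an indexed column needs new index entries, so it is a cold update.
    // Otherwise the new version must fit on the same page as the old one.
    return !indexed_column_changed && new_tuple_len <= same_page_free_space;
}
```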

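III. Code Walkthrough

The walkthrough focuses on raw_heap_insert and rewrite_write_one_page, followed by a shorter look at cmpr_heap_insert. Start with the comment at the top of rewriteheap.cpp:

```cpp
 * We can't use the normal heap_insert function to insert into the new
 * heap, because heap_insert overwrites the visibility information.......
```

The comment explains the following. We cannot use the normal heap_insert function to insert into the new heap, because heap_insert overwrites the visibility information. Instead, a special-purpose raw_heap_insert function is used. It is optimized for bulk-inserting many tuples, relying on the fact that we have exclusive access to the heap. raw_heap_insert builds new pages in local storage. When a page is full, or at the end of the process, the page is inserted into WAL as a single record and then written to disk directly through smgr. Note, however, that any data sent to the new heap's TOAST table still goes through the normal bufmgr. The next part analyzes raw_heap_insert in detail.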
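The "build pages locally, write one out when it is full or at the end" pattern can be modeled as follows. This is a minimal sketch under stated assumptions. It assumes an 8KB page and tracks only the bytes used, ignoring the page header and line pointers. "Flushing" just counts pages instead of calling log_newpage and writing through smgr. SketchLoader and the sketch_* functions are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kSketchPageSize = 8192;  // assumed BLCKSZ (the 8KB default)

// Hypothetical model of raw_heap_insert's local page.
struct SketchLoader {
    std::size_t used = 0;      // bytes used on the page being filled
    bool page_valid = false;   // mirrors rs_buffer_valid
    unsigned blockno = 0;      // mirrors rs_blockno
    unsigned pages_flushed = 0;
};

static void sketch_flush(SketchLoader &l)
{
    l.pages_flushed++;  // real code: WAL-log the page, then write it out
    l.blockno++;
    l.page_valid = false;
}

// Adds one tuple, writing out the current page first if the tuple does not fit.
static void sketch_insert(SketchLoader &l, std::size_t tuple_len)
{
    if (l.page_valid && l.used + tuple_len > kSketchPageSize)
        sketch_flush(l);
    if (!l.page_valid) {
        l.used = 0;
        l.page_valid = true;
    }
    l.used += tuple_len;
}

// At the end of the rewrite, the last partially filled page is written too.
static void sketch_finish(SketchLoader &l)
{
    if (l.page_valid)
        sketch_flush(l);
}
```

Consider three 3000-byte tuples. The third one does not fit (9000 > 8192), so the first page is written out and the third tuple starts a new page. sketch_finish then writes that second page.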

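Function name: raw_heap_insert

The function inserts one tuple into the new heap during a rewrite. It handles special cases such as TOAST tables and tuples that exceed the TOAST threshold. It also raises errors for problems such as oversized rows, which keeps the new heap consistent. The full source with comments is as follows: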

 
```cpp
/*
 * This function is mainly used to insert tuples during heap rewrite and handles some special cases,
 * such as TOAST values and tuples greater than the threshold.
 * At the same time, it also handles some error situations and ensures data integrity.
 */
static void raw_heap_insert(RewriteState state, HeapTuple tup)
{
    Page page = state->rs_buffer;
    Size page_free_space, save_free_space;
    Size len;
    OffsetNumber newoff;
    HeapTuple heaptup;
    TransactionId xmin, xmax;

    /* A non-NULL tup must be a heap tuple; a NULL tup is logged at DEBUG5 and skipped. */
    if (tup != NULL)
        Assert(TUPLE_IS_HEAP_TUPLE(tup));
    else {
        ereport(DEBUG5, (errmodule(MOD_TBLSPC), errmsg("tuple is null")));
        return;
    }

    /*
     * If the new tuple is too big for storage or contains already toasted
     * out-of-line attributes from some other relation, invoke the toaster.
     *
     * Note: below this point, heaptup is the data we actually intend to store
     * into the relation; tup is the caller's original untoasted data.
     */
    if (state->rs_new_rel->rd_rel->relkind == RELKIND_TOASTVALUE) {
        /* toast table entries should never be recursively toasted */
        Assert(!HeapTupleHasExternal(tup));
        heaptup = tup;
    } else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)
        heaptup = toast_insert_or_update(state->rs_new_rel, tup, NULL,
            HEAP_INSERT_SKIP_FSM | (state->rs_use_wal ? 0 : HEAP_INSERT_SKIP_WAL), NULL);
    else
        heaptup = tup;

    len = MAXALIGN(heaptup->t_len); /* be conservative */

    /*
     * If we're gonna fail for oversize tuple, do it right away
     */
    if (len > MaxHeapTupleSize)
        ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("row is too big: size %lu, maximum size %lu",
                (unsigned long)len, (unsigned long)MaxHeapTupleSize)));

    /* Compute desired extra freespace due to fillfactor option */
    save_free_space = RelationGetTargetPageFreeSpace(state->rs_new_rel, HEAP_DEFAULT_FILLFACTOR);

    /* Now we can check to see if there's enough free space already. */
    if (state->rs_buffer_valid) {
        page_free_space = PageGetHeapFreeSpace(page);
        if (len + save_free_space > page_free_space) {
            /* Not enough room: write out the current page and move to the next block. */
            rewrite_write_one_page(state, page);
            state->rs_blockno++;
            state->rs_buffer_valid = false;
        }
    }

    /* If the buffer is invalid, initialize a new empty page and set its basic attributes. */
    if (!state->rs_buffer_valid) {
        HeapPageHeader phdr = (HeapPageHeader)page;
        /* Initialize a new empty page */
        PageInit(page, BLCKSZ, 0, true);
        phdr->pd_xid_base = u_sess->utils_cxt.RecentXmin - FirstNormalTransactionId;
        phdr->pd_multi_base = 0;
        state->rs_buffer_valid = true;
        const char* algo = RelationGetAlgo(state->rs_new_rel);
        if (RelationisEncryptEnable(state->rs_new_rel) || (algo && *algo != '\0')) {
            /*
             * For the reason of saving TdeInfo,
             * we need to move the pointer(pd_special) forward by the length of TdeInfo.
             */
            phdr->pd_upper -= sizeof(TdePageInfo);
            phdr->pd_special -= sizeof(TdePageInfo);
            PageSetTDE(page);
        }
    }

    /*
     * Fetch the tuple's raw xmin/xmax, prepare the page for both xids
     * (rewrite_page_prepare_for_xid), then store them in the tuple again
     * relative to the page's base.
     */
    xmin = HeapTupleGetRawXmin(heaptup);
    xmax = HeapTupleGetRawXmax(heaptup);
    rewrite_page_prepare_for_xid(page, xmin, false);
    (void)rewrite_page_prepare_for_xid(page, xmax, (heaptup->t_data->t_infomask & HEAP_XMAX_IS_MULTI) ? true : false);
    HeapTupleCopyBaseFromPage(heaptup, page);
    HeapTupleSetXmin(heaptup, xmin);
    HeapTupleSetXmax(heaptup, xmax);

    /* And now we can insert the tuple into the page */
    newoff = PageAddItem(page, (Item)heaptup->t_data, heaptup->t_len, InvalidOffsetNumber, false, true);
    if (newoff == InvalidOffsetNumber)
        ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), errmsg("failed to add tuple")));

    /* Update caller's t_self to the actual position where it was stored */
    ItemPointerSet(&(tup->t_self), state->rs_blockno, newoff);

    /*
     * Insert the correct position into CTID of the stored tuple, too, if the
     * caller didn't supply a valid CTID.
     */
    if (!ItemPointerIsValid(&tup->t_data->t_ctid)) {
        ItemId newitemid;
        HeapTupleHeader onpage_tup;
        newitemid = PageGetItemId(page, newoff);
        onpage_tup = (HeapTupleHeader)PageGetItem(page, newitemid);
        onpage_tup->t_ctid = tup->t_self;
    }

    /* If heaptup is a private copy, release it. */
    if (heaptup != tup)
        heap_freetuple(heaptup);
}
```

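The function raw_heap_insert inserts one tuple into the new heap. It works step by step as follows:

  • The function takes two parameters: state, of type RewriteState, and tup, of type HeapTuple.

  • If tup is not NULL, it asserts that tup is a heap tuple. If tup is NULL, it logs a DEBUG5-level message and returns without inserting anything.

  • Next, it checks whether the new relation is a TOAST table (relkind RELKIND_TOASTVALUE). If it is, the function asserts that tup has no external (out-of-line) attributes, since TOAST entries are never toasted recursively, and uses tup as heaptup. Otherwise, if tup has external attributes or its length exceeds TOAST_TUPLE_THRESHOLD, it calls toast_insert_or_update and uses the result as heaptup. In all other cases heaptup is simply tup.

  • It then checks whether the MAXALIGN-ed tuple length exceeds MaxHeapTupleSize, and raises an error if it does.

  • It computes save_free_space, the space that must be left free on each page according to the relation's fillfactor.

  • If the buffer holds a valid page whose free space is less than len + save_free_space, it writes that page out with rewrite_write_one_page, increments rs_blockno, and marks the buffer invalid.

  • If the buffer is invalid, it initializes a new empty page and sets pd_xid_base and pd_multi_base. When TDE encryption is enabled or an algorithm is configured (RelationGetAlgo), it also reserves room for TdePageInfo at the end of the page.

  • It reads the tuple's raw xmin and xmax and prepares the page for both with rewrite_page_prepare_for_xid. It then copies the page's base into the tuple and sets xmin and xmax again.

  • It adds the tuple to the page with PageAddItem, raising an error if that fails. It then sets the caller's t_self to the new location (rs_blockno, newoff).

  • If the tuple's t_ctid is invalid, it sets the t_ctid of the copy on the page to that same location, so the tuple points to itself.

  • Finally, if heaptup is a private copy made by the toaster (heaptup != tup), it frees heaptup.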
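The page-full test in raw_heap_insert combines two pieces of arithmetic: MAXALIGN rounding and the fillfactor reserve. The sketch below reproduces both under stated assumptions. It assumes an 8KB BLCKSZ and 8-byte MAXIMUM_ALIGNOF, which is typical on 64-bit platforms. It also assumes that RelationGetTargetPageFreeSpace follows the PostgreSQL definition BLCKSZ * (100 - fillfactor) / 100. In PostgreSQL, HEAP_DEFAULT_FILLFACTOR is 100, so nothing is reserved by default. The function names are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kSketchBlcksz = 8192;  // assumed default BLCKSZ
constexpr std::size_t kSketchMaxAlign = 8;   // assumed MAXIMUM_ALIGNOF

// Rounds len up to a multiple of kSketchMaxAlign, as MAXALIGN does.
constexpr std::size_t sketch_maxalign(std::size_t len)
{
    return (len + (kSketchMaxAlign - 1)) & ~(kSketchMaxAlign - 1);
}

// Bytes to keep free per page for a fillfactor given in percent (10..100).
constexpr std::size_t sketch_target_free_space(unsigned fillfactor)
{
    return kSketchBlcksz * (100 - fillfactor) / 100;
}

// The raw_heap_insert condition: write out the current page when the aligned
// tuple plus the reserved space exceeds the page's free space.
constexpr bool sketch_needs_new_page(std::size_t tuple_len,
                                     std::size_t page_free_space,
                                     unsigned fillfactor)
{
    return sketch_maxalign(tuple_len) + sketch_target_free_space(fillfactor) > page_free_space;
}
```

As a worked example, with fillfactor 70 the reserve is 8192 * 30 / 100 = 2457 bytes (integer division). A 100-byte tuple is aligned to 104 bytes, so it needs 2561 bytes of free space before a new page is started.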


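Function name: rewrite_write_one_page

This function writes one completed page of the new heap during the rewrite. Once raw_heap_insert has filled a page in local memory, that page must be written to disk. Depending on the relation, the function may WAL-log the page, encrypt it (TDE), and set its checksum along the way. The full source with comments is as follows: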

 
```cpp
/*
 * Function name: rewrite_write_one_page
 * Description: writes one completed page of the new heap during a rewrite.
 * Depending on the relation, this may involve WAL logging, TDE encryption
 * and checksum computation.
 */
static void rewrite_write_one_page(RewriteState state, Page page)
{
    TdeInfo tde_info = {0};

    /* If the new relation (state->rs_new_rel) has encryption enabled, get its TDE information. */
    if (RelationisEncryptEnable(state->rs_new_rel)) {
        GetTdeInfoFromRel(state->rs_new_rel, &tde_info);
    }

    if (IsSegmentFileNode(state->rs_new_rel->rd_node)) {
        /* Segment-page storage: the page goes through a shared buffer, and WAL must be enabled. */
        Assert(state->rs_use_wal);
        /* Extend the new relation by one block and get its buffer. */
        Buffer buf = ReadBuffer(state->rs_new_rel, P_NEW);
#ifdef USE_ASSERT_CHECKING
        /* With assertions enabled, check that the new block is rs_blockno. */
        BufferDesc *buf_desc = GetBufferDescriptor(buf - 1);
        Assert(buf_desc->tag.blockNum == state->rs_blockno);
#endif
        /* Lock the buffer exclusively. */
        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
        /* WAL-log the full page image. */
        XLogRecPtr xlog_ptr = log_newpage(&state->rs_new_rel->rd_node, MAIN_FORKNUM, state->rs_blockno, page, true,
            &tde_info);
        /* Copy the page data into the buffer block. */
        errno_t rc = memcpy_s(BufferGetBlock(buf), BLCKSZ, page, BLCKSZ);
        securec_check(rc, "\0", "\0");
        /* Set the page LSN to the WAL record, mark the buffer dirty and release it. */
        PageSetLSN(BufferGetPage(buf), xlog_ptr);
        MarkBufferDirty(buf);
        UnlockReleaseBuffer(buf);
    } else {
        /* check tablespace size limitation when extending new file. */
        STORAGE_SPACE_OPERATION(state->rs_new_rel, BLCKSZ);
        /* If WAL is used, log the full page image. */
        if (state->rs_use_wal) {
            log_newpage(&state->rs_new_rel->rd_node, MAIN_FORKNUM, state->rs_blockno, page, true, &tde_info);
        }
        /* Open the storage manager of the new relation. */
        RelationOpenSmgr(state->rs_new_rel);
        /* Encrypt the page data if the new relation has encryption enabled. */
        char *bufToWrite = NULL;
        if (RelationisEncryptEnable(state->rs_new_rel)) {
            bufToWrite = PageDataEncryptIfNeed(page, &tde_info, true);
        } else {
            bufToWrite = page;
        }
        /* Set the checksum on the bytes that will actually be written. */
        PageSetChecksumInplace((Page)bufToWrite, state->rs_blockno);
        /* Write the page out to the new relation. */
        rewrite_flush_page(state, (Page)bufToWrite);
    }
}
```
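The two branches of rewrite_write_one_page differ mainly in the order of their steps. The sketch below records that order as a list of step names instead of doing any I/O. The step names and sketch_write_one_page are hypothetical labels for the calls in the source above, not real openGauss functions.

```cpp
#include <string>
#include <vector>

// Hypothetical trace of the order of operations in rewrite_write_one_page.
static std::vector<std::string> sketch_write_one_page(bool segment_storage,
                                                      bool use_wal,
                                                      bool encrypted)
{
    std::vector<std::string> steps;
    if (encrypted)
        steps.push_back("get_tde_info");
    if (segment_storage) {
        // Segment-page storage goes through a shared buffer (WAL is asserted on).
        steps.push_back("read_new_buffer");
        steps.push_back("lock_buffer");
        steps.push_back("log_newpage");
        steps.push_back("copy_page_into_buffer");
        steps.push_back("set_lsn_mark_dirty_release");
    } else {
        // Ordinary storage: optional WAL record, then a direct write.
        steps.push_back("check_tablespace_limit");
        if (use_wal)
            steps.push_back("log_newpage");
        steps.push_back("open_smgr");
        if (encrypted)
            steps.push_back("encrypt_copy");
        steps.push_back("set_checksum");  // computed over the bytes that reach disk
        steps.push_back("flush_page");
    }
    return steps;
}
```

The order shows one design choice in the source. The checksum is set on bufToWrite, which is the encrypted copy when TDE is on, so the checksum covers the bytes stored on disk rather than the plaintext page.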
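
Function name: cmpr_heap_insert

This function inserts a tuple into the page buffer used for compressed pages (state->rs_cmprBuffer). It is a stripped-down counterpart of raw_heap_insert. It asserts that the target is not a TOAST table and that the tuple needs no toasting. It then sets xmin/xmax relative to the page's base, checks the tuple size, asserts that the page is a compressed page, and adds the tuple. Unlike raw_heap_insert, it contains no free-space check or page write-out, and it only asserts that PageAddItem succeeded. The full source with comments is as follows: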

 
```cpp
/*
 * Function name: cmpr_heap_insert
 * Description: inserts a tuple into the compressed-page buffer
 * (state->rs_cmprBuffer) during a heap rewrite. Unlike raw_heap_insert it
 * never invokes the toaster: the tuple must have no external attributes
 * and must not exceed TOAST_TUPLE_THRESHOLD.
 */
static void cmpr_heap_insert(RewriteState state, HeapTuple tup)
{
    Page page = state->rs_cmprBuffer;
    Size len;
    OffsetNumber newoff;
    TransactionId xmin, xmax;

    Assert(state->rs_new_rel->rd_rel->relkind != RELKIND_TOASTVALUE);
    Assert(!HeapTupleHasExternal(tup) && !(tup->t_len > TOAST_TUPLE_THRESHOLD));

    /* Get the tuple's xmin and xmax and prepare the page to hold these transaction IDs. */
    xmin = HeapTupleGetRawXmin(tup);
    xmax = HeapTupleGetRawXmax(tup);
    rewrite_page_prepare_for_xid(page, xmin, false);
    (void)rewrite_page_prepare_for_xid(page, xmax, (tup->t_data->t_infomask & HEAP_XMAX_IS_MULTI) ? true : false);
    HeapTupleCopyBaseFromPage(tup, page);
    HeapTupleSetXmin(tup, xmin);
    HeapTupleSetXmax(tup, xmax);

    /* Raise an error if the aligned tuple length exceeds the maximum heap tuple size. */
    len = MAXALIGN(tup->t_len);
    if (len > MaxHeapTupleSize)
        ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("row is too big: size %lu, maximum size %lu",
                (unsigned long)len, (unsigned long)MaxHeapTupleSize)));

    /* The target page must be a compressed page. */
    Assert(PageIsCompressed(page));

    /* And now we can insert the tuple into the page */
    newoff = PageAddItem(page, (Item)tup->t_data, tup->t_len, InvalidOffsetNumber, false, true);
    Assert(newoff != InvalidOffsetNumber);

    /* Update caller's t_self to the actual position where it was stored */
    ItemPointerSet(&(tup->t_self), state->rs_blockno, newoff);

    /* If the stored tuple's CTID is invalid, point it at the tuple's actual location on the page. */
    if (!ItemPointerIsValid(&tup->t_data->t_ctid)) {
        ItemId newitemid = PageGetItemId(page, newoff);
        HeapTupleHeader onpage_tup = (HeapTupleHeader)PageGetItem(page, newitemid);
        onpage_tup->t_ctid = tup->t_self;
    }
}
```
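Both raw_heap_insert and cmpr_heap_insert call rewrite_page_prepare_for_xid before HeapTupleSetXmin/HeapTupleSetXmax. One plausible reading rests on an assumption about openGauss's 64-bit transaction IDs: the page header carries a 64-bit pd_xid_base, and each tuple stores only a 32-bit offset from it. Under that assumption, the page's base must be chosen so that both xmin and xmax can be represented. The sketch below models only that representability check and the conversions. All names are hypothetical, and the real rewrite_page_prepare_for_xid may adjust the page base rather than merely test it.

```cpp
#include <cstdint>
#include <limits>

using SketchXid = std::uint64_t;       // 64-bit transaction id
using SketchShortXid = std::uint32_t;  // assumed 32-bit value kept in the tuple header

// Assumed constants: xids 0..2 are special (invalid, bootstrap, frozen) and
// 3 is the first normal xid, matching FirstNormalTransactionId.
constexpr SketchXid kSketchFirstNormalXid = 3;
constexpr SketchXid kSketchMaxShortXid = std::numeric_limits<SketchShortXid>::max();

// Can xid be stored as a 32-bit offset on a page whose base is xid_base?
constexpr bool sketch_xid_fits(SketchXid xid_base, SketchXid xid)
{
    // Special xids are stored unchanged and always fit.
    return xid < kSketchFirstNormalXid ||
           (xid >= xid_base + kSketchFirstNormalXid && xid - xid_base <= kSketchMaxShortXid);
}

// 64-bit xid -> stored value (the caller is expected to check sketch_xid_fits first).
constexpr SketchShortXid sketch_to_short(SketchXid xid_base, SketchXid xid)
{
    return xid < kSketchFirstNormalXid ? static_cast<SketchShortXid>(xid)
                                       : static_cast<SketchShortXid>(xid - xid_base);
}

// Stored value -> 64-bit xid.
constexpr SketchXid sketch_to_normal(SketchXid xid_base, SketchShortXid stored)
{
    return stored < kSketchFirstNormalXid ? static_cast<SketchXid>(stored)
                                          : static_cast<SketchXid>(stored) + xid_base;
}
```

raw_heap_insert initializes a fresh page with pd_xid_base = RecentXmin - FirstNormalTransactionId. In this model, RecentXmin is therefore stored as the first normal value (3), and any xid older than RecentXmin would not fit. Presumably, keeping such xids representable is part of what rewrite_page_prepare_for_xid has to handle.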

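IV. Summary

This post analyzed the structure and functionality of rewriteheap.cpp, explained the related terminology, and annotated some of its functions.

Next, we will continue with rewriteheap.cpp and annotate the remaining functions. We will also look more deeply at the technologies behind these terms and how they are implemented.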
