PostgreSQL source code analysis 17: updating a tuple in place with heap_inplace_update

Background

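  PostgreSQL uses MVCC so that reads do not block writes and writes do not block reads, which greatly improves concurrency. The principle: when a row is deleted or updated, the old tuple is kept and marked dead, and for an update the new data is inserted into the table as a new tuple version. Combined with snapshot visibility and transaction isolation levels, this gives the behavior described above. While reading the source recently, I found that PostgreSQL also updates tuples in place in a few cases, which plainly breaks this rule. Let's study how it works.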
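
To make the contrast concrete, here is a minimal toy model of the two update styles. It is only a sketch: `ToyTable`, `ToyVersion`, `toy_mvcc_update` and `toy_inplace_update` are hypothetical names invented for illustration, and real PostgreSQL visibility rules are far more involved than a pair of `xmin`/`xmax` fields.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in PostgreSQL. */
#define TOY_MAX_VERSIONS 8
#define TOY_INVALID_XID 0u

typedef struct ToyVersion
{
	unsigned	xmin;		/* transaction that created this version */
	unsigned	xmax;		/* transaction that deleted it, or TOY_INVALID_XID */
	int			value;		/* the row's payload */
} ToyVersion;

typedef struct ToyTable
{
	ToyVersion	versions[TOY_MAX_VERSIONS];
	size_t		nversions;
} ToyTable;

/*
 * MVCC-style update: stamp the old version as deleted and append a new
 * version. Returns the index of the new version, or -1 on failure.
 */
int
toy_mvcc_update(ToyTable *table, size_t old_idx, unsigned xid, int new_value)
{
	size_t		new_idx = table->nversions;

	if (old_idx >= table->nversions || new_idx >= TOY_MAX_VERSIONS)
		return -1;
	table->versions[old_idx].xmax = xid;	/* old version stays readable */
	table->versions[new_idx].xmin = xid;
	table->versions[new_idx].xmax = TOY_INVALID_XID;
	table->versions[new_idx].value = new_value;
	table->nversions++;
	return (int) new_idx;
}

/*
 * In-place update: overwrite the payload of an existing version. No old
 * version survives, so there is nothing left for an older snapshot to read.
 * Returns 0 on success, -1 if idx is out of range.
 */
int
toy_inplace_update(ToyTable *table, size_t idx, int new_value)
{
	if (idx >= table->nversions)
		return -1;
	table->versions[idx].value = new_value;
	return 0;
}
```

After `toy_mvcc_update` the table holds both the old and the new value, while `toy_inplace_update` leaves the version count unchanged and the previous value is gone.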

Source code analysis

1 The heap_inplace_update interface function

/*
 * heap_inplace_update - update a tuple "in place" (ie, overwrite it)
 *
 * Overwriting violates both MVCC and transactional safety, so the uses
 * of this function in Postgres are extremely limited.  Nonetheless we
 * find some places to use it.
 *
 * The tuple cannot change size, and therefore it's reasonable to assume
 * that its null bitmap (if any) doesn't change either.  So we just
 * overwrite the data portion of the tuple without touching the null
 * bitmap or any of the header fields.
 *
 * tuple is an in-memory tuple structure containing the data to be written
 * over the target tuple.  Also, tuple->t_self identifies the target tuple.
 *
 * Note that the tuple updated here had better not come directly from the
 * syscache if the relation has a toast relation as this tuple could
 * include toast values that have been expanded, causing a failure here.
 */

2 Source code walkthrough
2.1 Validity checks

void
heap_inplace_update(Relation relation, HeapTuple tuple)
{
	Buffer		buffer;
	Page		page;
	OffsetNumber offnum;
	ItemId		lp = NULL;
	HeapTupleHeader htup;
	uint32		oldlen;
	uint32		newlen;

	/*
	 * For now, we don't allow parallel updates.  Unlike a regular update,
	 * this should never create a combo CID, so it might be possible to relax
	 * this restriction, but not without more thought and testing.  It's not
	 * clear that it would be useful, anyway.
	 */
	// Updating tuples is not allowed in parallel mode
	if (IsInParallelMode())
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
				 errmsg("cannot update tuples during a parallel operation")));
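	// Read the physical page that holds the target tuple into a shared buffer
	// (this may evict another page), then take an exclusive buffer lock to
	// keep out concurrent access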
	buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&(tuple->t_self)));
	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
	page = (Page) BufferGetPage(buffer);
	// Validity checks: the offset must exist on the page and point to a normal line pointer
	offnum = ItemPointerGetOffsetNumber(&(tuple->t_self));
	if (PageGetMaxOffsetNumber(page) >= offnum)
		lp = PageGetItemId(page, offnum);

	if (PageGetMaxOffsetNumber(page) < offnum || !ItemIdIsNormal(lp))
		elog(ERROR, "invalid lp");
	// Get the HeapTupleHeader of the on-page tuple
	htup = (HeapTupleHeader) PageGetItem(page, lp);
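	// Compute the length of the data portion two ways: from the on-page line
	// pointer (oldlen) and from the in-memory tuple (newlen). Both the data
	// length and the header size must match.
	oldlen = ItemIdGetLength(lp) - htup->t_hoff;
	newlen = tuple->t_len - tuple->t_data->t_hoff;
	if (oldlen != newlen || htup->t_hoff != tuple->t_data->t_hoff)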
		elog(ERROR, "wrong tuple length");
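
The checks above can be sketched outside PostgreSQL as a small standalone function. This is a simplified model under stated assumptions: `ToyItemId`, `ToyCheck` and `toy_check_inplace` are hypothetical names, and the header size is stored next to the line pointer here purely for brevity, whereas the real code reads `t_hoff` from the tuple header on the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a line pointer; not PostgreSQL's ItemIdData. */
typedef struct ToyItemId
{
	uint16_t	len;		/* total stored length: header + data */
	uint8_t		hoff;		/* header size, playing the role of t_hoff */
	int			is_normal;	/* nonzero if the slot holds a regular tuple */
} ToyItemId;

typedef enum ToyCheck
{
	TOY_OK,
	TOY_INVALID_LP,
	TOY_WRONG_LENGTH
} ToyCheck;

/*
 * Decide whether a new tuple image (new_len bytes, of which new_hoff are
 * header) may overwrite the tuple at 1-based offset offnum.
 */
ToyCheck
toy_check_inplace(const ToyItemId *items, size_t max_offnum, size_t offnum,
				  uint16_t new_len, uint8_t new_hoff)
{
	const ToyItemId *lp;

	/* The offset must exist on the page and refer to a normal slot. */
	if (offnum < 1 || offnum > max_offnum)
		return TOY_INVALID_LP;
	lp = &items[offnum - 1];
	if (!lp->is_normal)
		return TOY_INVALID_LP;

	/* Data length and header size must both be unchanged. */
	if (lp->len - lp->hoff != new_len - new_hoff || lp->hoff != new_hoff)
		return TOY_WRONG_LENGTH;
	return TOY_OK;
}
```

Note the second half of the length test: two tuples can have the same data length and still differ in header size (for example if a null bitmap were added), and in that case the data would be copied to the wrong place.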

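2.2 Copying the tuple in place
Note: tuple is the modified data, an in-memory tuple in backend-local memory (per the header comment, it had better be a copy rather than a tuple taken directly from the syscache);
    htup is the tuple data before modification, located in the buffer pool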

	/* NO EREPORT(ERROR) from here till changes are logged */
	START_CRIT_SECTION();
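	// Critical section: any ERROR raised in here is promoted to PANIC.
	// Copy the new tuple data over the old data portion, in place.
	memcpy((char *) htup + htup->t_hoff,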
		   (char *) tuple->t_data + tuple->t_data->t_hoff,
		   newlen);
	// Mark the buffer dirty; a later checkpoint (or the background writer) flushes it to disk
	MarkBufferDirty(buffer);

	// Write the corresponding WAL (XLOG) record
	/* XLOG stuff */
	if (RelationNeedsWAL(relation))
	{
		xl_heap_inplace xlrec;
		XLogRecPtr	recptr;

		xlrec.offnum = ItemPointerGetOffsetNumber(&tuple->t_self);

		XLogBeginInsert();
		XLogRegisterData((char *) &xlrec, SizeOfHeapInplace);

		XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
		XLogRegisterBufData(0, (char *) htup + htup->t_hoff, newlen);

		/* inplace updates aren't decoded atm, don't log the origin */

		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_INPLACE);

		PageSetLSN(page, recptr);
	}

	END_CRIT_SECTION();
	// Release the buffer lock taken earlier and drop the buffer pin
	UnlockReleaseBuffer(buffer);

	/*
	 * Send out shared cache inval if necessary.  Note that because we only
	 * pass the new version of the tuple, this mustn't be used for any
	 * operations that could change catcache lookup keys.  But we aren't
	 * bothering with index updates either, so that's true a fortiori.
	 */
	// Tell other backends that their cached copy of this tuple is invalid
	if (!IsBootstrapProcessingMode())
		CacheInvalidateHeapTuple(relation, tuple, NULL);
}
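
The core of section 2.2 is a single `memcpy` that starts past the header. A minimal sketch, assuming a toy buffer instead of a real shared buffer page (`ToyBuffer` and `toy_overwrite_data` are hypothetical names, and the `dirty` flag only imitates what `MarkBufferDirty` records):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a buffer holding one stored tuple. */
typedef struct ToyBuffer
{
	unsigned char bytes[64];	/* header bytes followed by data bytes */
	int			dirty;			/* set once the contents were modified */
} ToyBuffer;

/*
 * Overwrite only the data portion of the stored tuple. Both tuple images
 * are assumed to have already passed the length checks, so they share the
 * same header size (hoff) and total length (len).
 */
void
toy_overwrite_data(ToyBuffer *buf, size_t hoff,
				   const unsigned char *new_tuple, size_t len)
{
	/* Skip hoff bytes on both sides: the stored header is never touched. */
	memcpy(buf->bytes + hoff, new_tuple + hoff, len - hoff);
	buf->dirty = 1;
}
```

Whatever the header of the in-memory tuple contains is ignored; only the bytes after `hoff` reach the page. This matches the header comment's statement that the null bitmap and header fields are left alone.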

Contents of the WAL record: the tuple's offset number within its page, followed by the replacement data.

/* This is what we need to know about in-place update */
typedef struct xl_heap_inplace
{
	OffsetNumber offnum;		/* updated tuple's offset on page */
	/* TUPLE DATA FOLLOWS AT END OF STRUCT */
} xl_heap_inplace;

#define SizeOfHeapInplace	(offsetof(xl_heap_inplace, offnum) + sizeof(OffsetNumber))
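
As a rough illustration of that layout, the sketch below packs a fixed part and the new data into one byte array and then replays it. It is a deliberate simplification: `toy_xl_inplace`, `ToySizeOfInplace`, `toy_build_record` and `toy_replay_record` are hypothetical names, and real WAL records register the fixed part and the buffer data separately (`XLogRegisterData` versus `XLogRegisterBufData`) rather than concatenating them by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t ToyOffsetNumber;

/* Toy counterpart of the fixed part of the record. */
typedef struct toy_xl_inplace
{
	ToyOffsetNumber offnum;		/* updated tuple's offset on page */
} toy_xl_inplace;

/* Same offsetof + sizeof idiom: size up to the end of the last field. */
#define ToySizeOfInplace \
	(offsetof(toy_xl_inplace, offnum) + sizeof(ToyOffsetNumber))

/*
 * Serialize the fixed part followed by the tuple data.
 * Returns the number of bytes written, or 0 if out is too small.
 */
size_t
toy_build_record(unsigned char *out, size_t out_cap, ToyOffsetNumber offnum,
				 const unsigned char *data, size_t data_len)
{
	toy_xl_inplace rec;

	if (out_cap < ToySizeOfInplace + data_len)
		return 0;
	rec.offnum = offnum;
	memcpy(out, &rec, ToySizeOfInplace);
	memcpy(out + ToySizeOfInplace, data, data_len);
	return ToySizeOfInplace + data_len;
}

/*
 * Replay: copy the logged data over target_data and return the offset
 * number stored in the record. target_data must have room for
 * rec_len - ToySizeOfInplace bytes.
 */
ToyOffsetNumber
toy_replay_record(const unsigned char *rec_bytes, size_t rec_len,
				  unsigned char *target_data)
{
	toy_xl_inplace rec;

	memcpy(&rec, rec_bytes, ToySizeOfInplace);
	memcpy(target_data, rec_bytes + ToySizeOfInplace,
		   rec_len - ToySizeOfInplace);
	return rec.offnum;
}
```

Because the record carries only an offset and raw bytes, replay has to trust that the tuple on the page has the same data length, which is why the length checks run before anything is logged.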

3 Where the interface is called

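Reading the PostgreSQL source shows that this interface is called mainly when modifying system catalog data; ordinary table data still goes through MVCC, keeping the old version and inserting a new one. In-place modification has strict preconditions (the tuple's size and header must not change, and the change is not transactional), but it skips creating a new tuple version and updating indexes, so it is considerably cheaper than a regular update. That offers another idea for later optimization work.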

1) index_update_stats       --- update pg_class entry after CREATE INDEX or REINDEX
2) vac_update_relstats      --- update statistics for one relation
3) vac_update_datfrozenxid  --- update pg_database.datfrozenxid for our DB
4) create_toast_table       --- While bootstrapping, we cannot UPDATE, so overwrite 
                                in-place 
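
Callers such as vac_update_relstats, as far as I can tell from the source, work on a private copy of the catalog row, change only fixed-width fields, and write back only when something actually changed. A toy sketch of that pattern, where `ToyRelStats`, `toy_inplace_write`, `toy_update_relstats` and the write counter are all hypothetical and stand in for the pg_class row and heap_inplace_update:

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the fixed-width statistics fields of a catalog row. */
typedef struct ToyRelStats
{
	int32_t		relpages;
	double		reltuples;
} ToyRelStats;

/* Counts how many in-place writes were issued (illustration only). */
int			toy_inplace_writes = 0;

/* Stand-in for the in-place write: same size before and after. */
void
toy_inplace_write(ToyRelStats *stored, const ToyRelStats *copy)
{
	*stored = *copy;
	toy_inplace_writes++;
}

/*
 * Update the statistics on a private copy and write back only if a field
 * changed. Returns 1 if a write happened, 0 otherwise.
 */
int
toy_update_relstats(ToyRelStats *stored, int32_t num_pages, double num_tuples)
{
	ToyRelStats copy = *stored;	/* never modify the stored row directly */
	int			dirty = 0;

	if (copy.relpages != num_pages)
	{
		copy.relpages = num_pages;
		dirty = 1;
	}
	if (copy.reltuples != num_tuples)
	{
		copy.reltuples = num_tuples;
		dirty = 1;
	}
	if (dirty)
		toy_inplace_write(stored, &copy);
	return dirty;
}
```

Skipping the write when nothing changed avoids dirtying the page and emitting a WAL record for a no-op.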