PostgreSQL Source Code Analysis 8: The SLRU Buffer Pool, Part 2

Introduction

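  This part focuses on the read and write operations of the SLRU buffer pool. For how the pool is created and initialized, see the previous post: PostgreSQL Source Code Analysis: Implementation of the SLRU Buffer Pool, Part 1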

Reading pages from the buffer pool

SimpleLruReadPage
   |-- SlruSelectLRUPage
   |-- SlruPhysicalReadPage

1 SimpleLruReadPage
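1) First call SlruSelectLRUPage to obtain the SLRU slot for the requested page: either the slot that already holds the page or a slot that can be reused for it;
2) If the page is found in a non-empty slot, check whether the slot is undergoing I/O. If the page is being read in, or it is being written out and the caller passed write_ok = false, wait for the I/O to finish via SimpleLruWaitIO and retry from the top; otherwise update the slot's LRU usage information (SlruRecentlyUsed) and return the slot number;
3) If the page is not in the buffer pool, mark the chosen slot SLRU_PAGE_READ_IN_PROGRESS, release ControlLock for the duration of the I/O, and call SlruPhysicalReadPage to read the page from disk into that slot; then re-acquire the lock, update the slot's status and LRU usage information, and return the slot number.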

2 SlruSelectLRUPage
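1) Scan the SLRU slots in order; if a slot holds the requested page and its status is not SLRU_PAGE_EMPTY, the page is already in the buffer pool and that slot number is returned directly;
2) If the page is not held by any slot, a slot in state SLRU_PAGE_EMPTY is sufficient;
3) If neither case yields a slot, the replacement policy evicts the least recently used page so that its slot can be reused for the requested page, and that slot number is returned. As the assertion in SimpleLruReadPage shows, the slot handed back is either empty or valid and not dirty.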

3 SlruPhysicalReadPage
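1) Compute from the page number the segment file that holds the page and the page's offset within that segment;
2) Open the segment file by its path and seek to that offset;
3) Call the read system function to read the page into the given SLRU slot, then close the segment file;
4) Return true on success. On failure the function returns false instead of raising the error itself, so that SimpleLruReadPage can reset the slot state before calling SlruReportIOError.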
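The page-to-file mapping in step 1) is plain integer arithmetic. A minimal sketch, assuming the default block size of 8192 bytes, 32 pages per segment, and segment files named with four upper-case hex digits; the helper names page_to_segno, page_to_offset and segment_path are made up for this example:

```c
#include <assert.h>
#include <stdio.h>

#define BLCKSZ 8192                 /* assumed default block size */
#define SLRU_PAGES_PER_SEGMENT 32   /* assumed pages per segment file */

/* Segment file number that contains the given page. */
static int
page_to_segno(int pageno)
{
    assert(pageno >= 0);
    return pageno / SLRU_PAGES_PER_SEGMENT;
}

/* Byte offset of the page inside its segment file. */
static long
page_to_offset(int pageno)
{
    assert(pageno >= 0);
    return (long) (pageno % SLRU_PAGES_PER_SEGMENT) * BLCKSZ;
}

/* Build "<dir>/<segno as 4 hex digits>" into buf. */
static void
segment_path(char *buf, size_t buflen, const char *dir, int segno)
{
    assert(buf != NULL && dir != NULL);
    snprintf(buf, buflen, "%s/%04X", dir, segno);
}
```

With these assumed values, page 33 falls in segment 1 at byte offset 8192, so for a directory named pg_xact the path would be pg_xact/0001.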

/*
 * Find a page in a shared buffer, reading it in if necessary.
 * The page number must correspond to an already-initialized page.
 *
 * If write_ok is true then it is OK to return a page that is in
 * WRITE_IN_PROGRESS state; it is the caller's responsibility to be sure
 * that modification of the page is safe.  If write_ok is false then we
 * will not return the page until it is not undergoing active I/O.
 *
 * The passed-in xid is used only for error reporting, and may be
 * InvalidTransactionId if no specific xid is associated with the action.
 *
 * Return value is the shared-buffer slot number now holding the page.
 * The buffer's LRU access info is updated.
 *
 * Control lock must be held at entry, and will be held at exit.
 */
int
SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
				  TransactionId xid)
{
	SlruShared	shared = ctl->shared;

	/* Outer loop handles restart if we must wait for someone else's I/O */
	for (;;)
	{
		int			slotno;
		bool		ok;

		/* See if page already is in memory; if not, pick victim slot */
		slotno = SlruSelectLRUPage(ctl, pageno);

		/* Did we find the page in memory? */
		if (shared->page_number[slotno] == pageno &&
			shared->page_status[slotno] != SLRU_PAGE_EMPTY)
		{
			/*
			 * If page is still being read in, we must wait for I/O.  Likewise
			 * if the page is being written and the caller said that's not OK.
			 */
			if (shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS ||
				(shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
				 !write_ok))
			{
				SimpleLruWaitIO(ctl, slotno);
				/* Now we must recheck state from the top */
				continue;
			}
			/* Otherwise, it's ready to use */
			SlruRecentlyUsed(shared, slotno);

			/* update the stats counter of pages found in the SLRU */
			pgstat_count_slru_page_hit(shared->slru_stats_idx);

			return slotno;
		}

		/* We found no match; assert we selected a freeable slot */
		Assert(shared->page_status[slotno] == SLRU_PAGE_EMPTY ||
			   (shared->page_status[slotno] == SLRU_PAGE_VALID &&
				!shared->page_dirty[slotno]));

		/* Mark the slot read-busy */
		shared->page_number[slotno] = pageno;
		shared->page_status[slotno] = SLRU_PAGE_READ_IN_PROGRESS;
		shared->page_dirty[slotno] = false;

		/* Acquire per-buffer lock (cannot deadlock, see notes at top) */
		LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);

		/* Release control lock while doing I/O */
		LWLockRelease(shared->ControlLock);

		/* Do the read */
		ok = SlruPhysicalReadPage(ctl, pageno, slotno);

		/* Set the LSNs for this newly read-in page to zero */
		SimpleLruZeroLSNs(ctl, slotno);

		/* Re-acquire control lock and update page state */
		LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

		Assert(shared->page_number[slotno] == pageno &&
			   shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
			   !shared->page_dirty[slotno]);

		shared->page_status[slotno] = ok ? SLRU_PAGE_VALID : SLRU_PAGE_EMPTY;

		LWLockRelease(&shared->buffer_locks[slotno].lock);

		/* Now it's okay to ereport if we failed */
		if (!ok)
			SlruReportIOError(ctl, pageno, xid);

		SlruRecentlyUsed(shared, slotno);

		/* update the stats counter of pages not found in SLRU */
		pgstat_count_slru_page_read(shared->slru_stats_idx);

		return slotno;
	}
}

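Flushing pages from the buffer pool

This operation is implemented by SimpleLruFlush, which is called at checkpoints and at database shutdown to write the dirty pages in the buffer pool to disk.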

CheckPointCLOG
  |-- SimpleLruFlush // writes the dirty CLOG SLRU pages in memory to the physical files
       |-- SlruInternalWritePage
            |-- SlruPhysicalWritePage

1 SimpleLruFlush
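1) Acquire ControlLock in LW_EXCLUSIVE mode and set up a flush bookkeeping structure, fdata (of type SlruFlushData), which records the file descriptors opened by the subsequent writes;
2) Iterate over all SLRU slots, calling SlruInternalWritePage to write every dirty buffer page to the CLOG files;
3) Release ControlLock and, if the do_fsync flag of the control structure is set, call pg_fsync on each opened file to force the written pages to disk;
4) Close all files opened along the way. The function returns nothing; if any fsync or close fails, the error is reported through SlruReportIOError once the loop has finished.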

2 SlruInternalWritePage
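1) Inspect the state of the given SLRU slot. If a write of the page is already in progress, wait for it to finish. If the slot is then not dirty, or no longer holds a valid copy of the same page, there is nothing to write and the function returns; a dirty page means the slot's contents have changed and must be written out;
2) Mark the slot as write-in-progress and clear its dirty flag, acquire the exclusive per-buffer lock for the slot, release the ControlLock held on entry, and call SlruPhysicalWritePage to write the dirty buffer to disk;
3) Re-acquire ControlLock, set the slot's status back to SLRU_PAGE_VALID (marking it dirty again if the write failed), and finally release the per-buffer lock.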

3 SlruPhysicalWritePage
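1) Determine from the page number the page's physical location in the CLOG files (segment number and offset);
2) Check whether group_lsn is in use (it is for SLRUs that support asynchronous commit). If so, find the largest log sequence number recorded for the page, max_lsn, and flush the WAL up to that point to disk before writing the page; otherwise go straight to step 3);
3) When called as part of a flush, check whether the target segment file is already open in fdata and, if so, reuse its file descriptor; otherwise build the path from the segment file name and open the segment file, recording the new descriptor in fdata when a flush is in progress;
4) Seek to the page's offset in the file and call the write system function to write the slot's page out;
5) When the call is not part of a flush, call pg_fsync if the do_fsync flag is set and close the file right away; during a flush the files stay open so that SimpleLruFlush can sync and close them all at the end.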

/*
 * Flush dirty pages to disk during checkpoint or database shutdown
 */
void
SimpleLruFlush(SlruCtl ctl, bool allow_redirtied)
{
	SlruShared	shared = ctl->shared;
	SlruFlushData fdata;
	int			slotno;
	int			pageno = 0;
	int			i;
	bool		ok;

	/*
	 * Find and write dirty pages
	 */
	fdata.num_files = 0;

	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

	for (slotno = 0; slotno < shared->num_slots; slotno++)
	{
		SlruInternalWritePage(ctl, slotno, &fdata);

		/*
		 * In some places (e.g. checkpoints), we cannot assert that the slot
		 * is clean now, since another process might have re-dirtied it
		 * already.  That's okay.
		 */
		Assert(allow_redirtied ||
			   shared->page_status[slotno] == SLRU_PAGE_EMPTY ||
			   (shared->page_status[slotno] == SLRU_PAGE_VALID &&
				!shared->page_dirty[slotno]));
	}

	LWLockRelease(shared->ControlLock);

	/*
	 * Now fsync and close any files that were open
	 */
	ok = true;
	for (i = 0; i < fdata.num_files; i++)
	{
		pgstat_report_wait_start(WAIT_EVENT_SLRU_FLUSH_SYNC);
		if (ctl->do_fsync && pg_fsync(fdata.fd[i]))
		{
			slru_errcause = SLRU_FSYNC_FAILED;
			slru_errno = errno;
			pageno = fdata.segno[i] * SLRU_PAGES_PER_SEGMENT;
			ok = false;
		}
		pgstat_report_wait_end();

		if (CloseTransientFile(fdata.fd[i]))
		{
			slru_errcause = SLRU_CLOSE_FAILED;
			slru_errno = errno;
			pageno = fdata.segno[i] * SLRU_PAGES_PER_SEGMENT;
			ok = false;
		}
	}
	if (!ok)
		SlruReportIOError(ctl, pageno, InvalidTransactionId);
}

Truncating pages from the buffer pool

TruncateCLOG
 |-- SimpleLruTruncate
      |-- SlruScanDirectory
1 SimpleLruTruncate
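1) Acquire the buffer pool's control lock (ControlLock) in exclusive mode;
2) Perform a safety check: if the latest page number precedes the cutoff page, meaning the current endpoint page itself would be eligible for removal, the request looks like a wraparound, so release the control lock, log a message, and return;
3) Scan all buffer slots. Skip a slot if its status is SLRU_PAGE_EMPTY or if its page does not precede the cutoff page. If the page is SLRU_PAGE_VALID and not dirty, just set the status to SLRU_PAGE_EMPTY. If the page is valid but dirty, call SlruInternalWritePage to write it to its physical file page; if it is being read in or written out, call SimpleLruWaitIO to wait for the I/O. In both of these cases the scan then restarts from the beginning;
4) Release the exclusive control lock;
5) Finally call SlruScanDirectory to carry out the actual deletion, physically removing the obsolete segment files.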

2 SlruScanDirectory
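This function is used here to delete the obsolete segment files in the SLRU directory.
1) Compute from the argument the segment that contains the cutoff page;
2) Obtain the data directory from the control structure and read the entries of all segment files in it;
3) Check the file names one by one; if both the first and the last page of a segment precede the cutoff page, physically delete the segment file at that path with the unlink system call;
4) Release the resources acquired along the way and close the directory.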
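The per-file decision in step 3) can be illustrated in isolation. This is a sketch under stated assumptions rather than the actual callback: segment names are taken to be 4 to 6 upper-case hex digits, a segment to hold 32 pages, and toy_page_precedes is a plain integer comparison, whereas the real PagePrecedes callback of an SLRU such as CLOG must also cope with wraparound. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SLRU_PAGES_PER_SEGMENT 32   /* assumed pages per segment file */

/* Hypothetical ordering: plain comparison, no wraparound handling. */
static bool
toy_page_precedes(int page1, int page2)
{
    return page1 < page2;
}

/* Parse a segment file name; returns -1 if it does not look like one. */
static int
segment_name_to_segno(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len < 4 || len > 6 || strspn(name, "0123456789ABCDEF") != len)
        return -1;
    return (int) strtol(name, NULL, 16);
}

/* A segment may go only if its first and last pages both precede the cutoff. */
static bool
toy_may_delete_segment(const char *name, int cutoff_page)
{
    int segno = segment_name_to_segno(name);
    int first_page;

    if (segno < 0)
        return false;
    first_page = segno * SLRU_PAGES_PER_SEGMENT;
    return toy_page_precedes(first_page, cutoff_page) &&
        toy_page_precedes(first_page + SLRU_PAGES_PER_SEGMENT - 1, cutoff_page);
}
```

Checking the last page as well as the first keeps the segment that contains the cutoff page: with a cutoff of 40, segment 0001 starts at page 32 but ends at page 63, so it is retained.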

/*
 * Remove all segments before the one holding the passed page number
 *
 * All SLRUs prevent concurrent calls to this function, either with an LWLock
 * or by calling it only as part of a checkpoint.  Mutual exclusion must begin
 * before computing cutoffPage.  Mutual exclusion must end after any limit
 * update that would permit other backends to write fresh data into the
 * segment immediately preceding the one containing cutoffPage.  Otherwise,
 * when the SLRU is quite full, SimpleLruTruncate() might delete that segment
 * after it has accrued freshly-written data.
 */
void
SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
{
	SlruShared	shared = ctl->shared;
	int			slotno;

	/* update the stats counter of truncates */
	pgstat_count_slru_truncate(shared->slru_stats_idx);

	/*
	 * Scan shared memory and remove any pages preceding the cutoff page, to
	 * ensure we won't rewrite them later.  (Since this is normally called in
	 * or just after a checkpoint, any dirty pages should have been flushed
	 * already ... we're just being extra careful here.)
	 */
	LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

restart:;

	/*
	 * While we are holding the lock, make an important safety check: the
	 * current endpoint page must not be eligible for removal.
	 */
	if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
	{
		LWLockRelease(shared->ControlLock);
		ereport(LOG,
				(errmsg("could not truncate directory \"%s\": apparent wraparound",
						ctl->Dir)));
		return;
	}

	for (slotno = 0; slotno < shared->num_slots; slotno++)
	{
		if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
			continue;
		if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
			continue;

		/*
		 * If page is clean, just change state to EMPTY (expected case).
		 */
		if (shared->page_status[slotno] == SLRU_PAGE_VALID &&
			!shared->page_dirty[slotno])
		{
			shared->page_status[slotno] = SLRU_PAGE_EMPTY;
			continue;
		}

		/*
		 * Hmm, we have (or may have) I/O operations acting on the page, so
		 * we've got to wait for them to finish and then start again. This is
		 * the same logic as in SlruSelectLRUPage.  (XXX if page is dirty,
		 * wouldn't it be OK to just discard it without writing it?
		 * SlruMayDeleteSegment() uses a stricter qualification, so we might
		 * not delete this page in the end; even if we don't delete it, we
		 * won't have cause to read its data again.  For now, keep the logic
		 * the same as it was.)
		 */
		if (shared->page_status[slotno] == SLRU_PAGE_VALID)
			SlruInternalWritePage(ctl, slotno, NULL);
		else
			SimpleLruWaitIO(ctl, slotno);
		goto restart;
	}

	LWLockRelease(shared->ControlLock);

	/* Now we can remove the old segment(s) */
	(void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
}
