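Introduction
This section covers the read and write operations of the SLRU buffer pool. For how the SLRU buffer pool is created and initialized, see the previous post: postgres源码分析 Slru缓冲池的实现-1 (PostgreSQL source code analysis: implementation of the SLRU buffer pool, part 1).
Reading a buffer pool page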
SimpleLruReadPage
|-- SlruSelectLRUPage
|-- SlruPhysicalReadPage
1 SimpleLruReadPage
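1) First call SlruSelectLRUPage to get the SLRU slot for the requested page: either the slot that already holds the page or a slot that can be reused for it.
2) If the slot holds the requested page and is not empty, check whether the page is undergoing I/O. If it is still being read in, or it is being written out and the caller passed write_ok = false, wait in SimpleLruWaitIO for the I/O to finish and then retry from the top. Otherwise update the slot's LRU access information (SlruRecentlyUsed), count a page hit, and return the slot number.
3) If the page is not in the buffer, mark the slot SLRU_PAGE_READ_IN_PROGRESS, take the slot's buffer lock, release the control lock, and call SlruPhysicalReadPage to read the page from disk into the slot. Then re-acquire the control lock, set the slot to SLRU_PAGE_VALID (or SLRU_PAGE_EMPTY if the read failed, in which case an error is reported), update the LRU access information, and return the slot number.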
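The wait-or-return decision in step 2 can be sketched as a small predicate. This is a minimal model, not the slru.c code: the enum and the function name toy_must_wait_for_io are hypothetical stand-ins for the SLRU_PAGE_* states and the condition tested inside SimpleLruReadPage.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SLRU_PAGE_* status values. */
typedef enum
{
    TOY_READ_EMPTY,
    TOY_READ_READ_IN_PROGRESS,
    TOY_READ_VALID,
    TOY_READ_WRITE_IN_PROGRESS
} ToyReadStatus;

/*
 * Returns true if a reader that found its page in the slot has to wait
 * for I/O before using it.  The slot is assumed to be non-empty.
 */
static bool
toy_must_wait_for_io(ToyReadStatus status, bool write_ok)
{
    assert(status != TOY_READ_EMPTY);

    /* A page still being read in is never usable yet. */
    if (status == TOY_READ_READ_IN_PROGRESS)
        return true;

    /* A page being written out is usable only if the caller allows it. */
    return status == TOY_READ_WRITE_IN_PROGRESS && !write_ok;
}
```

A caller that gets true here would wait and then recheck the slot from scratch, since the slot may have been reused for another page in the meantime.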
2 SlruSelectLRUPage
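1) Scan the SLRU slots in order. If a slot holds the requested page and its status is not SLRU_PAGE_EMPTY, the page is already in the buffer, so return that slot number.
2) If the page is not in the buffer, any slot whose status is SLRU_PAGE_EMPTY will do; return it.
3) If neither search finds a slot, apply the page replacement algorithm: evict the least recently used page to make room for the requested page and return that slot number.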
3 SlruPhysicalReadPage
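1) From the page number, compute the segment file number and the page's offset within that segment.
2) Build the segment file's path, open the file, and seek to the page's offset.
3) Call the read system call to read the page into the given SLRU slot, then close the segment file.
4) Return true on success. On failure the function records the error cause and returns false, so that the caller can report the error once it holds the control lock again.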
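Step 1 is plain integer arithmetic. The sketch below assumes PostgreSQL's default values of 32 pages per segment and an 8 kB block size; SlruPageLocation and slru_locate_page are hypothetical names introduced for illustration and do not exist in slru.c.

```c
#include <assert.h>

/* Assumed defaults: 32 pages per segment file, 8 kB pages. */
#define SLRU_PAGES_PER_SEGMENT 32
#define BLCKSZ 8192

/* Hypothetical helper type for this sketch. */
typedef struct SlruPageLocation
{
    int  segno;     /* segment file number */
    int  rpageno;   /* page index inside the segment */
    long offset;    /* byte offset inside the segment file */
} SlruPageLocation;

static SlruPageLocation
slru_locate_page(int pageno)
{
    SlruPageLocation loc;

    assert(pageno >= 0);

    loc.segno = pageno / SLRU_PAGES_PER_SEGMENT;
    loc.rpageno = pageno % SLRU_PAGES_PER_SEGMENT;
    loc.offset = (long) loc.rpageno * BLCKSZ;
    return loc;
}
```

For example, page 33 is the second page of segment 1, so it starts 8192 bytes into that segment file.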
/*
 * Find a page in a shared buffer, reading it in if necessary.
 * The page number must correspond to an already-initialized page.
 *
 * If write_ok is true then it is OK to return a page that is in
 * WRITE_IN_PROGRESS state; it is the caller's responsibility to be sure
 * that modification of the page is safe. If write_ok is false then we
 * will not return the page until it is not undergoing active I/O.
 *
 * The passed-in xid is used only for error reporting, and may be
 * InvalidTransactionId if no specific xid is associated with the action.
 *
 * Return value is the shared-buffer slot number now holding the page.
 * The buffer's LRU access info is updated.
 *
 * Control lock must be held at entry, and will be held at exit.
 */
int
SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
                  TransactionId xid)
{
    SlruShared  shared = ctl->shared;

    /* Outer loop handles restart if we must wait for someone else's I/O */
    for (;;)
    {
        int         slotno;
        bool        ok;

        /* See if page already is in memory; if not, pick victim slot */
        slotno = SlruSelectLRUPage(ctl, pageno);

        /* Did we find the page in memory? */
        if (shared->page_number[slotno] == pageno &&
            shared->page_status[slotno] != SLRU_PAGE_EMPTY)
        {
            /*
             * If page is still being read in, we must wait for I/O. Likewise
             * if the page is being written and the caller said that's not OK.
             */
            if (shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS ||
                (shared->page_status[slotno] == SLRU_PAGE_WRITE_IN_PROGRESS &&
                 !write_ok))
            {
                SimpleLruWaitIO(ctl, slotno);
                /* Now we must recheck state from the top */
                continue;
            }
            /* Otherwise, it's ready to use */
            SlruRecentlyUsed(shared, slotno);

            /* update the stats counter of pages found in the SLRU */
            pgstat_count_slru_page_hit(shared->slru_stats_idx);

            return slotno;
        }

        /* We found no match; assert we selected a freeable slot */
        Assert(shared->page_status[slotno] == SLRU_PAGE_EMPTY ||
               (shared->page_status[slotno] == SLRU_PAGE_VALID &&
                !shared->page_dirty[slotno]));

        /* Mark the slot read-busy */
        shared->page_number[slotno] = pageno;
        shared->page_status[slotno] = SLRU_PAGE_READ_IN_PROGRESS;
        shared->page_dirty[slotno] = false;

        /* Acquire per-buffer lock (cannot deadlock, see notes at top) */
        LWLockAcquire(&shared->buffer_locks[slotno].lock, LW_EXCLUSIVE);

        /* Release control lock while doing I/O */
        LWLockRelease(shared->ControlLock);

        /* Do the read */
        ok = SlruPhysicalReadPage(ctl, pageno, slotno);

        /* Set the LSNs for this newly read-in page to zero */
        SimpleLruZeroLSNs(ctl, slotno);

        /* Re-acquire control lock and update page state */
        LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

        Assert(shared->page_number[slotno] == pageno &&
               shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS &&
               !shared->page_dirty[slotno]);

        shared->page_status[slotno] = ok ? SLRU_PAGE_VALID : SLRU_PAGE_EMPTY;

        LWLockRelease(&shared->buffer_locks[slotno].lock);

        /* Now it's okay to ereport if we failed */
        if (!ok)
            SlruReportIOError(ctl, pageno, xid);

        SlruRecentlyUsed(shared, slotno);

        /* update the stats counter of pages not found in SLRU */
        pgstat_count_slru_page_read(shared->slru_stats_idx);

        return slotno;
    }
}
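Flushing buffer pool pages
This operation is implemented by SimpleLruFlush, which is called at a checkpoint or at database shutdown to write the buffer pool's dirty pages to disk.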
CheckPointCLOG
|-- SimpleLruFlush // flush the dirty clog SLRU pages in memory to their physical files
|-- SlruInternalWritePage
|-- SlruPhysicalWritePage
1 SimpleLruFlush
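1) Initialize a flush bookkeeping structure fdata (SlruFlushData), which records the file descriptors opened by the writes that follow, and acquire ControlLock in LW_EXCLUSIVE mode.
2) Iterate over all SLRU slots and call SlruInternalWritePage on each, which writes every dirty buffer page to its clog segment file.
3) Release ControlLock. For each file recorded in fdata, call pg_fsync to force its pages to disk if the control structure's do_fsync flag is set, then close the file.
4) If any fsync or close failed, report the error with SlruReportIOError. The function itself returns nothing.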
2 SlruInternalWritePage
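1) Check the state of the given slot. If a write of this page is already in progress, wait for it to finish. If the page is then no longer dirty (or the slot no longer holds a valid copy of the same page), there is nothing to write, so return.
2) Otherwise mark the slot SLRU_PAGE_WRITE_IN_PROGRESS and clear its dirty flag, acquire the slot's buffer lock in exclusive mode, release the ControlLock held on entry, and call SlruPhysicalWritePage to write the page to disk.
3) Re-acquire the control lock, set the slot's status back to SLRU_PAGE_VALID (marking it dirty again if the write failed), and finally release the slot's buffer lock.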
3 SlruPhysicalWritePage
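1) From the page number, determine the page's actual location in the clog files: the segment number and the offset within the segment.
2) Check whether group_lsn is in use (it exists to support asynchronous WAL commit). If so, find the largest LSN recorded for the page, max_lsn, and flush WAL up to that point before writing the page. For synchronous commits there is nothing to flush, so go straight to step 3.
3) If flush bookkeeping data (fdata) was passed in, check whether the target segment file is already open there and reuse its descriptor. Otherwise build the path from the file name and open the segment file, saving the new descriptor in fdata when one was passed in.
4) Seek to the page's offset in the file and call the write system call to write the slot's page out.
5) If no fdata was passed in, call pg_fsync when the do_fsync flag is set and close the file right away. When fdata is in use, the fsync and close are left to SimpleLruFlush.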
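The max_lsn computation in step 2 is a maximum over the LSN groups that belong to one slot. The following is a minimal sketch, assuming the group LSNs are stored as one flat array with lsn_groups_per_page entries per slot; ToyLsn, TOY_INVALID_LSN and toy_max_group_lsn are hypothetical names standing in for the WAL pointer type and the loop inside SlruPhysicalWritePage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ToyLsn;                /* stands in for a WAL position */
#define TOY_INVALID_LSN ((ToyLsn) 0)    /* "no LSN recorded" */

/*
 * Return the largest LSN recorded for the page held in slot slotno,
 * or TOY_INVALID_LSN if none of its groups has an LSN set.
 */
static ToyLsn
toy_max_group_lsn(const ToyLsn *group_lsn, int slotno, int lsn_groups_per_page)
{
    const ToyLsn *page_lsns;
    ToyLsn        max_lsn;
    int           i;

    assert(group_lsn != NULL && slotno >= 0 && lsn_groups_per_page > 0);

    /* The groups of one slot are contiguous in the flat array. */
    page_lsns = group_lsn + (size_t) slotno * lsn_groups_per_page;

    max_lsn = page_lsns[0];
    for (i = 1; i < lsn_groups_per_page; i++)
    {
        if (page_lsns[i] > max_lsn)
            max_lsn = page_lsns[i];
    }
    return max_lsn;
}
```

A caller would flush WAL up to the returned position only when it is not TOY_INVALID_LSN, which preserves the rule that a commit record reaches disk before the status page that depends on it.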
/*
 * Flush dirty pages to disk during checkpoint or database shutdown
 */
void
SimpleLruFlush(SlruCtl ctl, bool allow_redirtied)
{
    SlruShared  shared = ctl->shared;
    SlruFlushData fdata;
    int         slotno;
    int         pageno = 0;
    int         i;
    bool        ok;

    /*
     * Find and write dirty pages
     */
    fdata.num_files = 0;

    LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

    for (slotno = 0; slotno < shared->num_slots; slotno++)
    {
        SlruInternalWritePage(ctl, slotno, &fdata);

        /*
         * In some places (e.g. checkpoints), we cannot assert that the slot
         * is clean now, since another process might have re-dirtied it
         * already. That's okay.
         */
        Assert(allow_redirtied ||
               shared->page_status[slotno] == SLRU_PAGE_EMPTY ||
               (shared->page_status[slotno] == SLRU_PAGE_VALID &&
                !shared->page_dirty[slotno]));
    }

    LWLockRelease(shared->ControlLock);

    /*
     * Now fsync and close any files that were open
     */
    ok = true;
    for (i = 0; i < fdata.num_files; i++)
    {
        pgstat_report_wait_start(WAIT_EVENT_SLRU_FLUSH_SYNC);
        if (ctl->do_fsync && pg_fsync(fdata.fd[i]))
        {
            slru_errcause = SLRU_FSYNC_FAILED;
            slru_errno = errno;
            pageno = fdata.segno[i] * SLRU_PAGES_PER_SEGMENT;
            ok = false;
        }
        pgstat_report_wait_end();

        if (CloseTransientFile(fdata.fd[i]))
        {
            slru_errcause = SLRU_CLOSE_FAILED;
            slru_errno = errno;
            pageno = fdata.segno[i] * SLRU_PAGES_PER_SEGMENT;
            ok = false;
        }
    }
    if (!ok)
        SlruReportIOError(ctl, pageno, InvalidTransactionId);
}
Deleting buffer pool pages
TruncateCLOG
|-- SimpleLruTruncate
|-- SlruScanDirectory
1 SimpleLruTruncate
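1) Acquire the buffer control lock (ControlLock) in exclusive mode.
2) Make a safety check: if the latest page currently in use (latest_page_number) precedes the cutoff page, it would itself be eligible for removal, which indicates apparent wraparound. In that case release the control lock, log a message, and return without deleting anything.
3) Scan all buffer slots. Skip a slot if its status is SLRU_PAGE_EMPTY or if its page does not precede the cutoff page. If the slot is SLRU_PAGE_VALID and not dirty, just set its status to SLRU_PAGE_EMPTY. If the slot is valid but dirty, call SlruInternalWritePage to write it to its physical file page; if it is being read in or written out, call SimpleLruWaitIO to wait for the I/O. In both of these cases, restart the scan from the safety check.
4) Release the control lock.
5) Finally call SlruScanDirectory to do the actual deletion, physically removing the obsolete segment files.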
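The per-slot decision in step 3 can be written as a pure function. This is a sketch only: the enums and toy_truncate_action are hypothetical, and the precedes_cutoff argument stands for the result of the SLRU's PagePrecedes callback applied to the slot's page and the cutoff page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the SLRU_PAGE_* status values. */
typedef enum
{
    TOY_TRUNC_EMPTY,
    TOY_TRUNC_READ_IN_PROGRESS,
    TOY_TRUNC_VALID,
    TOY_TRUNC_WRITE_IN_PROGRESS
} ToyTruncStatus;

typedef enum
{
    TOY_ACT_SKIP,               /* leave the slot alone */
    TOY_ACT_MARK_EMPTY,         /* clean page: just drop it */
    TOY_ACT_WRITE_AND_RESTART,  /* dirty page: write it, then rescan */
    TOY_ACT_WAIT_AND_RESTART    /* I/O in progress: wait, then rescan */
} ToyTruncAction;

static ToyTruncAction
toy_truncate_action(ToyTruncStatus status, bool dirty, bool precedes_cutoff)
{
    assert(status <= TOY_TRUNC_WRITE_IN_PROGRESS);

    if (status == TOY_TRUNC_EMPTY || !precedes_cutoff)
        return TOY_ACT_SKIP;
    if (status == TOY_TRUNC_VALID && !dirty)
        return TOY_ACT_MARK_EMPTY;
    if (status == TOY_TRUNC_VALID)
        return TOY_ACT_WRITE_AND_RESTART;
    return TOY_ACT_WAIT_AND_RESTART;
}
```

The two restart actions exist because both the write and the wait release the control lock for a while, so any slot may have changed by the time the lock is held again.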
2 SlruScanDirectory
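This function is used here to delete obsolete segment files in the SLRU's directory.
1) Compute from the arguments the segment that contains the cutoff page.
2) Get the data directory from the control structure and read the entries for all segment files in it.
3) Check each file name in turn: if both the first and the last page of the segment precede the cutoff page, physically delete the segment file with the unlink system call.
4) Release the resources acquired along the way and close the directory.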
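Step 3 can be illustrated with a small sketch that maps a segment file name to its page range. It rests on assumptions: segment files are taken to be named with 4 to 6 uppercase hexadecimal digits, 32 pages per segment is the assumed default, and pages are compared with a plain integer comparison, whereas the real code goes through the SLRU's wraparound-aware PagePrecedes callback. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SLRU_PAGES_PER_SEGMENT 32   /* assumed default */

/*
 * Return the first page number stored in the named segment file,
 * or -1 if the name does not look like a segment file name.
 */
static int
toy_segment_first_page(const char *name)
{
    size_t len;

    assert(name != NULL);

    len = strlen(name);
    if (len < 4 || len > 6 || strspn(name, "0123456789ABCDEF") != len)
        return -1;

    /* The file name is the segment number in hexadecimal. */
    return (int) strtol(name, NULL, 16) * SLRU_PAGES_PER_SEGMENT;
}

/* Return 1 if every page of the segment lies before cutoff_page, else 0. */
static int
toy_segment_is_obsolete(const char *name, int cutoff_page)
{
    int first = toy_segment_first_page(name);
    int last;

    if (first < 0)
        return 0;               /* not a segment file: never delete */

    last = first + SLRU_PAGES_PER_SEGMENT - 1;

    /* Simplification: no wraparound handling in this comparison. */
    return first < cutoff_page && last < cutoff_page;
}
```

With a cutoff page of 64, segment 0001 (pages 32 to 63) is obsolete, while segment 0002, which holds the cutoff page itself, is kept.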
/*
 * Remove all segments before the one holding the passed page number
 *
 * All SLRUs prevent concurrent calls to this function, either with an LWLock
 * or by calling it only as part of a checkpoint. Mutual exclusion must begin
 * before computing cutoffPage. Mutual exclusion must end after any limit
 * update that would permit other backends to write fresh data into the
 * segment immediately preceding the one containing cutoffPage. Otherwise,
 * when the SLRU is quite full, SimpleLruTruncate() might delete that segment
 * after it has accrued freshly-written data.
 */
void
SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
{
    SlruShared  shared = ctl->shared;
    int         slotno;

    /* update the stats counter of truncates */
    pgstat_count_slru_truncate(shared->slru_stats_idx);

    /*
     * Scan shared memory and remove any pages preceding the cutoff page, to
     * ensure we won't rewrite them later. (Since this is normally called in
     * or just after a checkpoint, any dirty pages should have been flushed
     * already ... we're just being extra careful here.)
     */
    LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

restart:;

    /*
     * While we are holding the lock, make an important safety check: the
     * current endpoint page must not be eligible for removal.
     */
    if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
    {
        LWLockRelease(shared->ControlLock);
        ereport(LOG,
                (errmsg("could not truncate directory \"%s\": apparent wraparound",
                        ctl->Dir)));
        return;
    }

    for (slotno = 0; slotno < shared->num_slots; slotno++)
    {
        if (shared->page_status[slotno] == SLRU_PAGE_EMPTY)
            continue;
        if (!ctl->PagePrecedes(shared->page_number[slotno], cutoffPage))
            continue;

        /*
         * If page is clean, just change state to EMPTY (expected case).
         */
        if (shared->page_status[slotno] == SLRU_PAGE_VALID &&
            !shared->page_dirty[slotno])
        {
            shared->page_status[slotno] = SLRU_PAGE_EMPTY;
            continue;
        }

        /*
         * Hmm, we have (or may have) I/O operations acting on the page, so
         * we've got to wait for them to finish and then start again. This is
         * the same logic as in SlruSelectLRUPage. (XXX if page is dirty,
         * wouldn't it be OK to just discard it without writing it?
         * SlruMayDeleteSegment() uses a stricter qualification, so we might
         * not delete this page in the end; even if we don't delete it, we
         * won't have cause to read its data again. For now, keep the logic
         * the same as it was.)
         */
        if (shared->page_status[slotno] == SLRU_PAGE_VALID)
            SlruInternalWritePage(ctl, slotno, NULL);
        else
            SimpleLruWaitIO(ctl, slotno);
        goto restart;
    }

    LWLockRelease(shared->ControlLock);

    /* Now we can remove the old segment(s) */
    (void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
}