背景
full_page_write该参数可以根据需要在控制文件中设定,当这个参数打开时,PG服务器在一个检查点之后的页面的第一次修改期间将每个页面的全部内容写到wal中,这么做是因为在操作系统崩溃恢复期间正在处理的一次页写入可能只有部分完成,从而导致在一个磁盘页面中混合有新旧数据。在崩溃后的恢复期间,通常存储在wal中的行级改变数据不足以恢复这样一个页面。存储完整的页面映像可以保证页面被正确的存储,但是代价是增加了必须被写入wal的数据量。
补充:怎么理解上面说的“在操作系统崩溃恢复期间正在处理的一次页写入可能只有部分完成,从而导致一个磁盘页面中混合有新旧数据”?
PG数据页写入是以page为单位,默认是使用8KB的页面大小。但是存储堆栈的其他部分可能会使用不同的块大小。Linux文件系统通常使用4KB的大小,在硬件级别上,旧的驱动器使用512B扇区,新设备可能会以更大的块(4KB或者8KB)写入数据。
因此,当PostgreSQL写8KB页面时,存储堆栈的其他层可能会把它分成更小的块,分别管理。
这样就产生了原子性的问题。8KB的postgreSQL页面可以分成两个4KB的文件系统页面,然后再分成512B扇区。如果在写入期间,服务器崩溃(电源故障、内核错误)那么就可能造成数据库写入了8KB的数据页,但是在崩溃前只有部分数据写入了磁盘。
实现
pg规定在checkpoint后页面的第一次修改/更新,需要将该页完整的写入WAL日志中,即全页写 Full Page Write。
逻辑如下数据页面中的PageHeader记录此页最近一次改动的对用LSN号:pd_lsn, 控制文件中记录了数据库最近checkpoint对应的LSN:CheckPointRedo LSN,通过比较两者的前后关系即可判断此页是否需要做FPW。
场景
发生时机:
1 对于修改操作,当启用全页写时,pg会在每个检查点之后、每个页面第一次发生变更时,将头数据和整个页面作为一条WAL记录写入WAL缓冲区。
2 对于备份操作,强制启用全页写,只要块发生变化,就会被整块写入WAL文件(不管是不是第一次,也不管有没有检查点)。因此,它写入的量是更大的。
相关源码
XLogRecordAssemble函数中涉及是否进行FPW操作
/*
* Assemble a WAL record from the registered data and buffers into an
* XLogRecData chain, ready for insertion with XLogInsertRecord().
*
* The record header fields are filled in, except for the xl_prev field. The
* calculated CRC does not include the record header yet.
*
* If there are any registered buffers, and a full-page image was not taken
* of all of them, *fpw_lsn is set to the lowest LSN among such pages. This
* signals that the assembled record is only good for insertion on the
* assumption that the RedoRecPtr and doPageWrites values were up-to-date.
*/
static XLogRecData *
XLogRecordAssemble(RmgrId rmid, uint8 info,
XLogRecPtr RedoRecPtr, bool doPageWrites,
XLogRecPtr *fpw_lsn, int *num_fpi)
{
XLogRecData *rdt;
uint32 total_len = 0;
int block_id;
pg_crc32c rdata_crc;
registered_buffer *prev_regbuf = NULL;
XLogRecData *rdt_datas_last;
XLogRecord *rechdr;
char *scratch = hdr_scratch;
/*
* Note: this function can be called multiple times for the same record.
* All the modifications we do to the rdata chains below must handle that.
*/
/* The record begins with the fixed-size header */
rechdr = (XLogRecord *) scratch;
scratch += SizeOfXLogRecord;
hdr_rdt.next = NULL;
rdt_datas_last = &hdr_rdt;
hdr_rdt.data = hdr_scratch;
/*
* Enforce consistency checks for this record if user is looking for it.
* Do this before at the beginning of this routine to give the possibility
* for callers of XLogInsert() to pass XLR_CHECK_CONSISTENCY directly for
* a record.
*/
if (wal_consistency_checking[rmid])
info |= XLR_CHECK_CONSISTENCY;
/*
* Make an rdata chain containing all the data portions of all block
* references. This includes the data for full-page images. Also append
* the headers for the block references in the scratch buffer.
*/
*fpw_lsn = InvalidXLogRecPtr;
for (block_id = 0; block_id < max_registered_block_id; block_id++)
{
registered_buffer *regbuf = ®istered_buffers[block_id];
bool needs_backup;
bool needs_data;
XLogRecordBlockHeader bkpb;
XLogRecordBlockImageHeader bimg;
XLogRecordBlockCompressHeader cbimg = {0};
bool samerel;
bool is_compressed = false;
bool include_image;
if (!regbuf->in_use)
continue;
/* Determine if this block needs to be backed up */
if (regbuf->flags & REGBUF_FORCE_IMAGE)
needs_backup = true;
else if (regbuf->flags & REGBUF_NO_IMAGE)
needs_backup = false;
else if (!doPageWrites)
needs_backup = false;
else
{
/*
* We assume page LSN is first data on *every* page that can be
* passed to XLogInsert, whether it has the standard page layout
* or not.
*/
// 判断两者逻辑决定是否进行全页写 FPW
XLogRecPtr page_lsn = PageGetLSN(regbuf->page);
needs_backup = (page_lsn <= RedoRecPtr);
if (!needs_backup)
{
if (*fpw_lsn == InvalidXLogRecPtr || page_lsn < *fpw_lsn)
*fpw_lsn = page_lsn;
}
}
/* Determine if the buffer data needs to included */
if (regbuf->rdata_len == 0)
needs_data = false;
else if ((regbuf->flags & REGBUF_KEEP_DATA) != 0)
needs_data = true;
else
needs_data = !needs_backup;
bkpb.id = block_id;
bkpb.fork_flags = regbuf->forkno;
bkpb.data_length = 0;
if ((regbuf->flags & REGBUF_WILL_INIT) == REGBUF_WILL_INIT)
bkpb.fork_flags |= BKPBLOCK_WILL_INIT;
/*
* If needs_backup is true or WAL checking is enabled for current
* resource manager, log a full-page write for the current block.
*/
include_image = needs_backup || (info & XLR_CHECK_CONSISTENCY) != 0;
...
...
/*
* Calculate CRC of the data
*
* Note that the record header isn't added into the CRC initially since we
* don't know the prev-link yet. Thus, the CRC will represent the CRC of
* the whole record in the order: rdata, then backup blocks, then record
* header.
*/
INIT_CRC32C(rdata_crc);
COMP_CRC32C(rdata_crc, hdr_scratch + SizeOfXLogRecord, hdr_rdt.len - SizeOfXLogRecord);
for (rdt = hdr_rdt.next; rdt != NULL; rdt = rdt->next)
COMP_CRC32C(rdata_crc, rdt->data, rdt->len);
/*
* Fill in the fields in the record header. Prev-link is filled in later,
* once we know where in the WAL the record will be inserted. The CRC does
* not include the record header yet.
*/
rechdr->xl_xid = GetCurrentTransactionIdIfAny();
rechdr->xl_tot_len = total_len;
rechdr->xl_info = info;
rechdr->xl_rmid = rmid;
rechdr->xl_prev = InvalidXLogRecPtr;
rechdr->xl_crc = rdata_crc;
return &hdr_rdt;
}
全页写的优缺点
优点:
提高数据库的安全性,解决块不一致问题。
缺点:
导致WAL文件膨胀
导致额外的磁盘IO,从而影响数据库整体性能
导致主备延迟变大