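I. Obtaining the recovery starting point
1. The default recovery starting point
Normally, crash recovery starts from the most recent checkpoint, whose location is stored in the control file. As we saw earlier in the checkpoint-creation function, the control file is refreshed every time a checkpoint is created.
The following code is from the StartupXLOG function (xlog.c):
/* Get the last valid checkpoint record.
Take the checkpoint and redo locations from the control file, then read the checkpoint record from the XLOG.
*/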
checkPointLoc = ControlFile->checkPoint;
RedoStartLSN = ControlFile->checkPointCopy.redo;
record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, true);
/* If the record was read, emit a debug message */
if (record != NULL)
{
ereport(DEBUG1,
(errmsg_internal("checkpoint record is at %X/%X",
LSN_FORMAT_ARGS(checkPointLoc))));
}
else
{
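/* If it cannot be read, fail immediately. Older versions, when not in standby mode, tried to fall back to the checkpoint before it; newer versions dropped that fallback for simplicity */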
ereport(PANIC,
(errmsg("could not locate a valid checkpoint record")));
}
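The "%X/%X" in the debug message above is the usual textual form of an LSN: the high 32 bits and the low 32 bits of the 64-bit WAL position, each printed in hexadecimal. The following is a minimal standalone sketch of that formatting. The helper name format_lsn and the local XLogRecPtr typedef are assumptions made for illustration; they are not taken from the PostgreSQL source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for PostgreSQL's XLogRecPtr: a 64-bit byte position in the WAL. */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper (not a PostgreSQL function): write lsn into buf in the
 * "%X/%X" layout used by the log message, i.e. the high 32 bits, a slash,
 * then the low 32 bits, both in upper-case hex without padding.
 * Returns what snprintf returns: the length of the formatted text.
 */
int
format_lsn(char *buf, size_t buflen, XLogRecPtr lsn)
{
    assert(buf != NULL && buflen > 0);

    return snprintf(buf, buflen, "%X/%X",
                    (unsigned int) (lsn >> 32),
                    (unsigned int) (lsn & 0xFFFFFFFFu));
}
```

Splitting the value this way is also why an LSN such as 1/16B3A4D8 can be compared numerically once both halves are recombined into a single 64-bit integer.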
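2. Special case
As mentioned earlier, an exclusive-mode backup creates a backup_label file. If that file exists, the checkpoint information is taken from it in preference to the control file and used as the recovery starting point.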
if (read_backup_label(&checkPointLoc, &backupEndRequired,
&backupFromStandby))
{
List *tablespaces = NIL;
/*
* Archive recovery was requested, and thanks to the backup label file, we know how far we need to replay to reach consistency. Enter archive recovery directly.
*/
InArchiveRecovery = true;
if (StandbyModeRequested)
StandbyMode = true;
/*
* When a backup_label file is present, we want to roll forward from the checkpoint it identifies, rather than using pg_control.
That is, whenever backup_label exists, the checkpoint location comes from that file and not from the control file.
*/
record = ReadCheckpointRecord(xlogreader, checkPointLoc, 0, true);
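/* backup_label records the checkpoint location, but the WAL segment holding it may already have been removed (especially when the backup ran for a long time), so the record may not be readable from the XLOG */
/* If the checkpoint record was read */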
if (record != NULL)
{
memcpy(&checkPoint, XLogRecGetData(xlogreader), sizeof(CheckPoint));
wasShutdown = ((record->xl_info & ~XLR_INFO_MASK) == XLOG_CHECKPOINT_SHUTDOWN);
ereport(DEBUG1,
(errmsg_internal("checkpoint record is at %X/%X",
LSN_FORMAT_ARGS(checkPointLoc))));
InRecovery = true; /* force recovery even if SHUTDOWNED */
/*
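* Make sure that REDO location exists. This may not be the case if there was a crash during an online backup, which left a backup_label around that references a WAL segment that's already been archived. Even when the checkpoint record itself can be read, the redo location (slightly earlier than the checkpoint) may be unreadable.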
*/
if (checkPoint.redo < checkPointLoc)
{
XLogBeginRead(xlogreader, checkPoint.redo);
if (!ReadRecord(xlogreader, LOG, false))
ereport(FATAL,
(errmsg("could not find redo location referenced by checkpoint record"),
errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
"If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
"Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
DataDir, DataDir, DataDir)));
}
}
/* If the checkpoint record could not be read */
else
{
ereport(FATAL,
(errmsg("could not locate required checkpoint record"),
errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
"If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
"Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
DataDir, DataDir, DataDir)));
wasShutdown = false; /* keep compiler quiet */
}
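For context on what read_backup_label() hands back: backup_label is a plain-text file whose first two lines carry the redo location and the checkpoint location. The sketch below parses those two lines from an in-memory string with sscanf. The function name parse_backup_label is hypothetical, the sample values in the usage note are made up, and the line layout is assumed to be the usual "START WAL LOCATION" / "CHECKPOINT LOCATION" pair; treat it as an illustration, not as the actual PostgreSQL implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for PostgreSQL's XLogRecPtr. */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper: parse the first two lines of a backup_label held in
 * memory. On success, stores the redo location (START WAL LOCATION) and the
 * checkpoint location (CHECKPOINT LOCATION) and returns true.
 */
bool
parse_backup_label(const char *text, XLogRecPtr *redo, XLogRecPtr *checkpoint)
{
    unsigned int hi, lo, tli;
    char        walfile[17];    /* 16 hex digits after the timeline, plus NUL */
    char        ch;
    int         used = 0;

    assert(text != NULL && redo != NULL && checkpoint != NULL);

    /* Line 1: LSN as hi/lo, then the 24-character WAL segment file name. */
    if (sscanf(text, "START WAL LOCATION: %X/%X (file %8X%16s)%c%n",
               &hi, &lo, &tli, walfile, &ch, &used) != 5 || ch != '\n')
        return false;
    *redo = ((XLogRecPtr) hi << 32) | lo;

    /* Line 2 starts right after the newline consumed above. */
    if (sscanf(text + used, "CHECKPOINT LOCATION: %X/%X%c",
               &hi, &lo, &ch) != 3 || ch != '\n')
        return false;
    *checkpoint = ((XLogRecPtr) hi << 32) | lo;

    return true;
}
```

Given a label beginning with "START WAL LOCATION: 0/2000028 (file 000000010000000000000002)" followed by "CHECKPOINT LOCATION: 0/2000060", the sketch yields a redo location that is earlier than the checkpoint location, which is exactly the situation the checkPoint.redo < checkPointLoc test above guards against.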
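If a backup_label file exists but the checkpoint record or the redo location it points to cannot be read, database startup fails with an error.
According to the error hint:
- If you are restoring from a backup, create the recovery.signal file and add the required recovery options
- If you are not restoring from a backup, remove the backup_label file
- Note: once backup_label is removed, the corresponding exclusive backup can no longer be used for recovery (restoring from it would produce a corrupt cluster, as the hint warns), so that backup has effectively failed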
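The two paths described in this part can be condensed into a small decision function. This is only a simplified model of the control flow shown above: the enum, the function name choose_recovery_start and its boolean inputs are all assumptions introduced for illustration, and the real code works on WAL records, not on flags.

```c
#include <stdbool.h>

/* Possible outcomes when StartupXLOG looks for its starting checkpoint. */
typedef enum StartOutcome
{
    START_FROM_CONTROL_FILE,    /* default: checkpoint location from pg_control */
    START_FROM_BACKUP_LABEL,    /* backup_label present and usable */
    START_FAILS_PANIC,          /* pg_control checkpoint record unreadable */
    START_FAILS_FATAL           /* backup_label checkpoint or redo unreadable */
} StartOutcome;

/*
 * Hypothetical model of the decision:
 *   have_backup_label      - read_backup_label() found the file
 *   checkpoint_readable    - the checkpoint record could be read from WAL
 *   redo_before_checkpoint - checkPoint.redo < checkPointLoc
 *   redo_readable          - a record could be read at the redo location
 */
StartOutcome
choose_recovery_start(bool have_backup_label,
                      bool checkpoint_readable,
                      bool redo_before_checkpoint,
                      bool redo_readable)
{
    if (have_backup_label)
    {
        /* backup_label takes precedence over pg_control. */
        if (!checkpoint_readable)
            return START_FAILS_FATAL;

        /* The redo record is only checked when it precedes the checkpoint. */
        if (redo_before_checkpoint && !redo_readable)
            return START_FAILS_FATAL;

        return START_FROM_BACKUP_LABEL;
    }

    /* No backup_label: rely on pg_control, with no fallback checkpoint. */
    return checkpoint_readable ? START_FROM_CONTROL_FILE : START_FAILS_PANIC;
}
```

Note the difference in severity: a missing checkpoint on the default path is a PANIC, while the backup_label path reports FATAL together with the hint about recovery.signal and backup_label.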
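3. Important variables
Before moving on, let us look closely at a few variables that appear frequently in the code that follows:
- InRecovery: when true, read it as "this process is replaying WAL records", not as "the system is in recovery mode". The latter should be checked with RecoveryInProgress().
- ArchiveRecoveryRequested: archive recovery has been requested
- InArchiveRecovery: when true, recovery is currently using archived WAL (typically PITR or a standby); when false, recovery is currently using only the WAL files in the pg_wal directory (typically the crash-recovery phase)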
/*
* Are we doing recovery from XLOG?
*
* This is only ever true in the startup process; it should be read as meaning
* "this process is replaying WAL records", rather than "the system is in
* recovery mode". It should be examined primarily by functions that need
* to act differently when called from a WAL redo function (e.g., to skip WAL
* logging). To check whether the system is in recovery regardless of which
* process you're running in, use RecoveryInProgress() but only after shared
* memory startup and lock initialization.
*/
bool InRecovery = false;
/*
* When ArchiveRecoveryRequested is set, archive recovery was requested,
* ie. signal files were present. When InArchiveRecovery is set, we are
* currently recovering using offline XLOG archives. These variables are only
* valid in the startup process.
*
* When ArchiveRecoveryRequested is true, but InArchiveRecovery is false, we're
* currently performing crash recovery using only XLOG files in pg_wal, but
* will switch to using offline XLOG archives as soon as we reach the end of
* WAL in pg_wal.
*/
bool ArchiveRecoveryRequested = false;
bool InArchiveRecovery = false;
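The source comments above imply a small state table for these three flags. Here is a minimal sketch of that table, assuming a hypothetical classify_recovery function and a made-up RecoveryPhase enum; neither exists in PostgreSQL, they only restate the comments in executable form.

```c
#include <stdbool.h>

/* Hypothetical labels for the combinations described in the comments. */
typedef enum RecoveryPhase
{
    PHASE_NOT_REPLAYING,        /* InRecovery is false in this process */
    PHASE_CRASH_ONLY,           /* crash recovery from pg_wal, nothing more */
    PHASE_CRASH_THEN_ARCHIVE,   /* crash recovery now, archive recovery after
                                 * the end of WAL in pg_wal is reached */
    PHASE_ARCHIVE               /* recovering from archived WAL right now */
} RecoveryPhase;

/*
 * Map the three startup-process flags to a phase. The combination
 * in_archive == true with archive_requested == false is not expected in
 * practice; this sketch simply reports PHASE_ARCHIVE for it.
 */
RecoveryPhase
classify_recovery(bool in_recovery, bool archive_requested, bool in_archive)
{
    if (!in_recovery)
        return PHASE_NOT_REPLAYING;
    if (in_archive)
        return PHASE_ARCHIVE;
    return archive_requested ? PHASE_CRASH_THEN_ARCHIVE : PHASE_CRASH_ONLY;
}
```

The third phase is the easy one to overlook: signal files were present, yet the startup process first replays whatever is in pg_wal as ordinary crash recovery before it switches to the archive.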
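II. Entering recovery mode
WAL replay proper begins at the if (InRecovery) block. It first updates the control file to record that the system has entered recovery, and stores the checkpoint information just read into the control file as well.
/* REDO starts here */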
if (InRecovery)
{
int rmid;
/*
* Update pg_control to show that we are recovering and to show the
* selected checkpoint as the place we are starting from. We also mark
* pg_control with any minimum recovery stop point obtained from a
* backup history file.
*/
/* Save the state currently in the control file, then update it */
dbstate_at_startup = ControlFile->state;
/* If recovering from archived WAL (PITR or a standby), set the state accordingly */
if (InArchiveRecovery)
{
ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
SpinLockAcquire(&XLogCtl->info_lck);
XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
SpinLockRelease(&XLogCtl->info_lck);
}
/* Otherwise, recovery uses only the local WAL files (crash recovery) */
else
{
ereport(LOG,
(errmsg("database system was not properly shut down; "
"automatic recovery in progress")));
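/* If the recovery target timeline is ahead of the timeline recorded in the control file's checkpoint copy, log that fact. This branch only reports it; it does not change the timeline */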
if (recoveryTargetTLI > ControlFile->checkPointCopy.ThisTimeLineID)
ereport(LOG,
(errmsg("crash recovery starts in timeline %u "
"and has target timeline %u",
ControlFile->checkPointCopy.ThisTimeLineID,
recoveryTargetTLI)));
/* Set the control file state to show that crash recovery is in progress */
ControlFile->state = DB_IN_CRASH_RECOVERY;
SpinLockAcquire(&XLogCtl->info_lck);
XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
SpinLockRelease(&XLogCtl->info_lck);
}
/* Update the checkpoint information in the control file */
ControlFile->checkPoint = checkPointLoc;
ControlFile->checkPointCopy = checkPoint;
if (InArchiveRecovery)
{
/* initialize minRecoveryPoint if not set yet; it must not be earlier than the redo point */
if (ControlFile->minRecoveryPoint < checkPoint.redo)
{
ControlFile->minRecoveryPoint = checkPoint.redo;
ControlFile->minRecoveryPointTLI = checkPoint.ThisTimeLineID;
}
}
/* If a backup_label file was found */
if (haveBackupLabel)
{
ControlFile->backupStartPoint = checkPoint.redo;
ControlFile->backupEndRequired = backupEndRequired;
/* If the backup was taken from a standby */
if (backupFromStandby)
{
/* Only these two startup states are valid here; anything else is an error */
if (dbstate_at_startup != DB_IN_ARCHIVE_RECOVERY &&
dbstate_at_startup != DB_SHUTDOWNED_IN_RECOVERY)
ereport(FATAL,
(errmsg("backup_label contains data inconsistent with control file"),
errhint("This means that the backup is corrupted and you will "
"have to use another backup for recovery.")));
/* The state is valid, so set the backup end point */
ControlFile->backupEndPoint = ControlFile->minRecoveryPoint;
}
}
/* Update the timestamp */
ControlFile->time = (pg_time_t) time(NULL);
/* No need to hold ControlFileLock yet, we aren't up far enough */
UpdateControlFile();
…
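To summarize the control-file bookkeeping shown in this part, here is a toy model of the update. Everything prefixed with Toy or toy_ is hypothetical and heavily simplified (no timelines, no locking, no write to disk, no backup-from-standby check); it only mirrors the order of assignments in the excerpt above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's XLogRecPtr. */
typedef uint64_t XLogRecPtr;

/* Toy counterparts of the DBState values used in the excerpt. */
typedef enum ToyDBState
{
    TOY_IN_PRODUCTION,
    TOY_IN_CRASH_RECOVERY,
    TOY_IN_ARCHIVE_RECOVERY
} ToyDBState;

/* Toy control file holding only the fields touched in the excerpt. */
typedef struct ToyControlFile
{
    ToyDBState  state;
    XLogRecPtr  checkPoint;         /* location of the checkpoint record */
    XLogRecPtr  redo;               /* redo point of that checkpoint */
    XLogRecPtr  minRecoveryPoint;
    XLogRecPtr  backupStartPoint;
    bool        backupEndRequired;
} ToyControlFile;

/* Apply the updates made on entering recovery, in the same order. */
void
toy_enter_recovery(ToyControlFile *cf, bool in_archive_recovery,
                   XLogRecPtr checkpoint_loc, XLogRecPtr redo,
                   bool have_backup_label, bool backup_end_required)
{
    /* 1. Record which kind of recovery is running. */
    cf->state = in_archive_recovery ? TOY_IN_ARCHIVE_RECOVERY
                                    : TOY_IN_CRASH_RECOVERY;

    /* 2. Record the checkpoint chosen as the starting point. */
    cf->checkPoint = checkpoint_loc;
    cf->redo = redo;

    /* 3. In archive recovery, minRecoveryPoint never lags the redo point. */
    if (in_archive_recovery && cf->minRecoveryPoint < redo)
        cf->minRecoveryPoint = redo;

    /* 4. With a backup_label, remember where the backup started. */
    if (have_backup_label)
    {
        cf->backupStartPoint = redo;
        cf->backupEndRequired = backup_end_required;
    }
}
```

In the toy model, as in the excerpt, crash recovery leaves minRecoveryPoint untouched; only the archive-recovery path raises it to the redo point.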
References
《PostgreSQL技术内幕:事务处理深度探索》 (PostgreSQL Internals: A Deep Dive into Transaction Processing), Chapter 4
https://blog.csdn.net/asmartkiller/article/details/121245772
https://blog.nowcoder.net/n/a21fd782200e4f9f9054e66898bbccf4