Author: Yang Xiangbo
I. Introduction to Checkpoints
The official PostgreSQL documentation describes checkpoints as follows:
Checkpoints are points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before that checkpoint.
At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. (The change records were previously flushed to the WAL files.)
In the event of a crash, the crash recovery procedure looks at the latest checkpoint record to determine the point in the log (known as the redo record) from which it should start the REDO operation.
Any changes made to data files before that point are guaranteed to be already on disk.
Hence, after a checkpoint, log segments preceding the one containing the redo record are no longer needed and can be recycled or removed. (When WAL archiving is being done, the log segments must be archived before being recycled or removed.)
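Put simply, a checkpoint is a marker in the transaction sequence. Its main job is flushing dirty pages, and during redo, log replay uses the latest checkpoint to decide where to start. Besides flushing dirty pages, a checkpoint also updates position (LSN) information, such as the checkpoint location recorded in pg_control, and removes or recycles WAL segments that are no longer needed.
The figure below is divided into four parts, Part 1 through Part 4, which describe the conditions that trigger a checkpoint and the operations performed once it is triggered.
II. Checkpoint Trigger Conditions
As shown in Part 1 of the figure: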
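In PostgreSQL, checkpoints are performed by the checkpointer process. Roughly, it works like this: the checkpointer's main routine is an unconditional for loop. While no checkpoint has been triggered, the process sleeps in WaitLatch, which on Linux means blocking in epoll_wait until one of the watched event handles becomes ready (that is, until some condition triggers a checkpoint) or the sleep times out.
When a ready event exists, the process wakes up (SetLatch wakes it by writing to a pipe) and performs the checkpoint.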
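To make the WaitLatch/SetLatch interplay concrete, here is a minimal self-pipe sketch. It is not PostgreSQL's latch implementation: SimpleLatch and the simple_* functions are hypothetical names, and it uses poll() rather than epoll so that it stays portable. The point it illustrates is that the pipe byte only wakes the sleeper; the is_set flag is what says whether there is real work to do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical, simplified latch; not PostgreSQL's struct Latch. */
typedef struct {
    int read_fd;                  /* watched by the sleeping process */
    int write_fd;                 /* written by whoever sets the latch */
    volatile sig_atomic_t is_set; /* the actual wakeup condition */
} SimpleLatch;

static bool simple_latch_init(SimpleLatch *latch)
{
    int fds[2];

    assert(latch != NULL);
    if (pipe(fds) != 0)
        return false;
    /* Non-blocking ends: draining never hangs, a full pipe never blocks a setter. */
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    fcntl(fds[1], F_SETFL, O_NONBLOCK);
    latch->read_fd = fds[0];
    latch->write_fd = fds[1];
    latch->is_set = 0;
    return true;
}

/* SetLatch analogue: set the flag, then write one byte to wake a poll() sleeper. */
static void simple_set_latch(SimpleLatch *latch)
{
    char byte = 0;
    ssize_t n;

    if (latch->is_set)
        return;                           /* already set: the waiter will see it */
    latch->is_set = 1;
    n = write(latch->write_fd, &byte, 1); /* EAGAIN means a wakeup is already queued */
    (void) n;
}

static void simple_reset_latch(SimpleLatch *latch)
{
    latch->is_set = 0;
}

/* WaitLatch analogue (latch-set or timeout): true if set, false on timeout or spurious wakeup. */
static bool simple_wait_latch(SimpleLatch *latch, int timeout_ms)
{
    struct pollfd pfd;
    char buf[16];

    if (latch->is_set)
        return true;                      /* no need to sleep at all */
    pfd.fd = latch->read_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN)) {
        while (read(latch->read_fd, buf, sizeof buf) > 0)
            ;                             /* drain queued wakeup bytes */
    }
    /* Trust the flag, not the pipe: a stale byte alone is a spurious wakeup. */
    return latch->is_set != 0;
}
```

In a real server the setter is another process (or a signal handler) sharing the latch; here both sides run in one process, so the sequence init, wait (times out), set, wait (returns true), reset, wait (drains the stale byte and returns false) exercises the logic without forking.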
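Which conditions trigger a checkpoint?
Checkpoints are triggered through flags. The flags rarely act alone: in most scenarios several of them are combined with a bitwise OR into ckpt_flags.
By how they are raised, the flags fall into two groups:
1. Raised by the checkpointer process itself when checkpoint_timeout elapses: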
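A small sketch of how flags combine. The #define values are copied from this article; ckpt_merge and ckpt_has are hypothetical helpers, and the combination in the usage note is illustrative rather than a claim about any particular PostgreSQL call site.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as quoted in this article. */
#define CHECKPOINT_IS_SHUTDOWN     0x0001
#define CHECKPOINT_END_OF_RECOVERY 0x0002
#define CHECKPOINT_IMMEDIATE       0x0004
#define CHECKPOINT_FORCE           0x0008
#define CHECKPOINT_FLUSH_ALL       0x0010
#define CHECKPOINT_CAUSE_XLOG      0x0040
#define CHECKPOINT_CAUSE_TIME      0x0100

/* Requests accumulate: a new request is ORed into whatever is already pending. */
static unsigned int ckpt_merge(unsigned int pending, unsigned int request)
{
    return pending | request;
}

/* True if every bit of `flag` is present in `flags`. */
static bool ckpt_has(unsigned int flags, unsigned int flag)
{
    assert(flag != 0);
    return (flags & flag) == flag;
}
```

For example, merging CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE with a later CHECKPOINT_FLUSH_ALL request leaves 0x001C pending. Because OR is idempotent, repeating a request that is already pending changes nothing, so duplicate requests collapse into one checkpoint.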
#define CHECKPOINT_CAUSE_TIME 0x0100 /* Elapsed time */
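The time-based trigger can be sketched as a pure decision function, modeled loosely on the checkpointer's main loop: compare the time since the last checkpoint with checkpoint_timeout, add CHECKPOINT_CAUSE_TIME if it is due, and work out how long to sleep in WaitLatch. checkpointer_decide is a hypothetical name, and treating "checkpoint now" as "sleep a full interval next" is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define CHECKPOINT_CAUSE_TIME 0x0100 /* value as quoted in this article */

/*
 * Hypothetical decision step of a checkpointer loop.
 * pending_flags: request flags already read out of shared memory.
 * Returns the flags for this round (0 means no checkpoint now) and stores in
 * *sleep_s how long to sleep in WaitLatch before re-checking.
 */
static unsigned int checkpointer_decide(time_t now, time_t last_checkpoint_time,
                                        int checkpoint_timeout_s,
                                        unsigned int pending_flags, int *sleep_s)
{
    unsigned int flags = pending_flags;
    double elapsed;

    assert(checkpoint_timeout_s > 0 && sleep_s != NULL);
    elapsed = difftime(now, last_checkpoint_time);
    if (elapsed >= checkpoint_timeout_s)
        flags |= CHECKPOINT_CAUSE_TIME;                     /* checkpoint_timeout has elapsed */

    if (flags != 0)
        *sleep_s = checkpoint_timeout_s;                    /* checkpoint now: full interval next */
    else
        *sleep_s = (int) (checkpoint_timeout_s - elapsed);  /* time left until due */
    return flags;
}
```

With checkpoint_timeout = 300 s and the last checkpoint 200 s ago, nothing fires and the loop sleeps 100 s; at 300 s CHECKPOINT_CAUSE_TIME is set. A pending request from another process fires immediately without the time flag.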
2. Raised by other processes that signal the checkpointer:
#define CHECKPOINT_IS_SHUTDOWN 0x0001 /* Checkpoint is for shutdown */
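Main scenario: database shutdown.
Other processes trigger a checkpoint by calling RequestCheckpoint, which sends a SIGINT signal to the checkpointer process.
As shown in Part 2 of the figure:
Step 1: OR the requested flags into the shared-memory field CheckpointerShmem->ckpt_flags.
Step 2: Send SIGINT to the checkpointer process to wake it up.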
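Steps 1 and 2 can be sketched as follows, assuming a single process that signals itself so the example runs standalone. FakeCheckpointerShmem, request_checkpoint and install_request_handler are hypothetical names. PostgreSQL protects ckpt_flags with a spinlock; this sketch uses a C11 atomic OR for the same effect, and the real SIGINT handler also sets the checkpointer's latch so that WaitLatch returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for CheckpointerShmem. */
typedef struct {
    _Atomic unsigned int ckpt_flags; /* pending request flags */
    pid_t checkpointer_pid;          /* process to wake */
} FakeCheckpointerShmem;

/* Set by the handler; the real handler also sets the checkpointer's latch. */
static volatile sig_atomic_t checkpoint_requested = 0;

static void req_checkpoint_handler(int signo)
{
    (void) signo;
    checkpoint_requested = 1;
}

/* Run in the checkpointer: route SIGINT to the request handler. */
static int install_request_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = req_checkpoint_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Run in the requester: Step 1 ORs in the flags, Step 2 sends SIGINT. */
static int request_checkpoint(FakeCheckpointerShmem *shmem, unsigned int flags)
{
    assert(shmem != NULL);
    atomic_fetch_or(&shmem->ckpt_flags, flags);   /* Step 1 */
    return kill(shmem->checkpointer_pid, SIGINT); /* Step 2 */
}
```

Because the flags are written before the signal is sent, the checkpointer always finds them in shared memory once it wakes; several requests that arrive before it wakes are merged into one set of flags.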
#define CHECKPOINT_END_OF_RECOVERY 0x0002 /* Like shutdown checkpoint, but issued at end of WAL recovery */
Main scenario: when the startup process finishes StartupXLOG.
#define CHECKPOINT_IMMEDIATE 0x0004 /* Do it without delays */
Main scenarios: a checkpoint requested while postgres runs as a standalone backend; a base backup taken with the fast checkpoint option.
#define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */
Main scenarios: running the CHECKPOINT command manually; promoting a standby instance.
#define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those belonging to unlogged tables */
Main scenario: after DROP DATABASE or CREATE DATABASE.
#define CHECKPOINT_CAUSE_XLOG 0x0040 /* XLOG consumption */
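Main scenario: the number of WAL segments written since the last checkpoint's redo point reaches CheckPointSegments - 1. With the default parameters, CheckPointSegments works out to roughly 42.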
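The WAL-volume trigger reduces to a segment-number comparison. Below is a sketch modeled on the check in xlog.c (XLogCheckpointNeeded), where redo_segno stands for the segment holding the last checkpoint's redo pointer; the function and parameter names here are illustrative, not PostgreSQL's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True once new_segno is at least CheckPointSegments - 1 segments past the redo segment. */
static bool xlog_checkpoint_needed(uint64_t redo_segno, uint64_t new_segno,
                                   int checkpoint_segments)
{
    assert(checkpoint_segments >= 1);
    return new_segno >= redo_segno + (uint64_t) (checkpoint_segments - 1);
}
```

With CheckPointSegments = 42 and the redo pointer in segment 100, finishing segment 140 does not yet request a checkpoint, while segment 141 does.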
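Since 9.5, CheckPointSegments is no longer a standalone parameter (the old checkpoint_segments setting was removed); it is derived from max_wal_size (max_wal_size_mb internally) together with checkpoint_completion_target.
In PostgreSQL 11 and later, the function CalculateCheckpointSegments computes it as:
CheckPointSegments = max_wal_size_mb / (wal_segment_size / (1024*1024)) / (1.0 + CheckPointCompletionTarget)
The result is truncated to an integer, with a minimum of 1.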
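The formula is easy to check numerically. This is a sketch of the computation as it appears in PostgreSQL 11 and later; releases 9.5 and 10 divided by (2.0 + checkpoint_completion_target) instead, because they also kept WAL back to the prior checkpoint. calculate_checkpoint_segments is a hypothetical lowercase rendering, and the integer division mirrors the megabytes-to-segments conversion.

```c
#include <assert.h>

static int calculate_checkpoint_segments(int max_wal_size_mb, int wal_segment_size_bytes,
                                         double completion_target)
{
    int wal_segment_mb;
    int max_wal_segs;
    double target;
    int segs;

    assert(wal_segment_size_bytes >= 1024 * 1024);
    wal_segment_mb = wal_segment_size_bytes / (1024 * 1024);
    max_wal_segs = max_wal_size_mb / wal_segment_mb;          /* MB -> segments (integer division) */
    target = (double) max_wal_segs / (1.0 + completion_target);
    segs = (int) target;                                      /* truncated toward zero */
    return segs < 1 ? 1 : segs;                               /* never below one segment */
}
```

With max_wal_size = 1GB (1024 MB), 16MB segments and checkpoint_completion_target = 0.5 (the default before PostgreSQL 14), 64 segments / 1.5 gives 42, the figure quoted above; with 0.9, the default from PostgreSQL 14 on, the same settings give 33.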