WAL in PostgreSQL: 3. Checkpoint

Last time we got acquainted with the structure of the buffer cache (one of the main objects of the shared memory) and concluded that to recover after a failure, when all the RAM contents are lost, a write-ahead log (WAL) must be maintained.

The problem yet unaddressed, where we left off last time, is that we do not know where to start playing back WAL records during recovery. To begin from the beginning, as the King in Lewis Carroll's Alice advised, is not an option: it is impossible to keep all the WAL records from the server start, as this means both a potentially huge storage size and an equally huge recovery time. We need a point that gradually moves forward and from which we can start the recovery (and, accordingly, can safely remove all the WAL records before it). And this is the checkpoint, to be discussed below.

Checkpoint

What features must the checkpoint have? We must be sure that all the WAL records starting with the checkpoint will be applied to the pages flushed to disk. If it were not the case, during recovery, we could read from disk a version of the page that is too old, apply the WAL record to it and by doing so, irreversibly hurt the data.

How can we get a checkpoint? The simplest option is to suspend the work of the system from time to time and flush all the dirty pages of the buffer and other caches to disk. (Note that pages are only written, not evicted from the cache.) Such points would meet the above condition, but nobody will be happy to work with a system that regularly freezes for an indefinite, yet quite noticeable, amount of time.

So in reality this is a bit more complicated: the checkpoint turns from a point into an interval. First we start the checkpoint. After that, without interrupting the work and avoiding peak loads where possible, we quietly flush dirty buffers to disk.

When all the buffers that are dirty at the checkpoint start time are on disk, the checkpoint is considered to be complete. Now (but not earlier) we can use the start time as the point to start recovery at. And we no longer need WAL records created up to this point in time.

A specialized background process called checkpointer performs a checkpoint.

The duration of writing dirty buffers is defined by the checkpoint_completion_target parameter. It specifies the fraction of the time between two neighboring checkpoints during which the writes are spread. The default value is 0.5, that is, the writes take half the time between checkpoints. This value is usually increased up to 1.0 for more uniformity.
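
For example, here is a minimal sketch of checking the current value and raising it (assuming sufficient privileges to run ALTER SYSTEM; reloading the configuration is enough for the change to take effect):

=> SHOW checkpoint_completion_target;
=> ALTER SYSTEM SET checkpoint_completion_target = 1.0;
=> SELECT pg_reload_conf();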

Let's consider in more detail what happens when a checkpoint is performed.

First the checkpointer flushes XACT buffers to disk. Since there are few of them (only 128), they get written immediately.

Then the main task starts: flushing dirty pages from the buffer cache. As already mentioned, not all the pages can be flushed instantaneously, since the buffer cache can be large. Therefore, all pages that are dirty at that moment are marked in the buffer cache with a special flag in their headers.

And then the checkpointer gradually walks through all the buffers and flushes those flagged to disk. To remind you, pages are not evicted from the cache, but are only written to disk; therefore, you need not pay attention either to the usage count of a buffer or to whether it is pinned.

The flagged buffers can also be written by server processes — whichever reaches the buffer first. Anyway, a write resets the flag set earlier, so (for the purpose of a checkpoint) the buffer will be written only once.

Naturally, while a checkpoint is performed, pages continue to change in the buffer cache. But new dirty buffers are not flagged, and the checkpointer will not write them.

At the end of its work, the process creates a WAL record marking the end of the checkpoint. This record contains the LSN of the checkpoint start. Since the checkpoint writes nothing to WAL at its start, a log record of any kind can be located at this LSN.

Besides, the location of the last completed checkpoint is updated in the $PGDATA/global/pg_control file. Until the checkpoint completes, pg_control points to the previous checkpoint.

To watch the work of a checkpoint, let's create a table; its pages will get into the buffer cache and will be dirty:

=> CREATE TABLE chkpt AS SELECT * FROM generate_series(1,10000) AS g(n);
=> CREATE EXTENSION pg_buffercache;
=> SELECT count(*) FROM pg_buffercache WHERE isdirty;
count
-------
    78
(1 row)

Let's remember the current WAL location:

=> SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
 0/3514A048
(1 row)

Now let's perform the checkpoint manually to make sure that no dirty pages are left in the cache (as we already mentioned, new dirty pages can occur, but in the above situation there were no changes while the checkpoint was performed):

=> CHECKPOINT;
=> SELECT count(*) FROM pg_buffercache WHERE isdirty;
count
-------
     0
(1 row)

Let's look at how the checkpoint is reflected in WAL:

=> SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
 0/3514A0E4
(1 row)
postgres$ /usr/lib/postgresql/11/bin/pg_waldump -p /var/lib/postgresql/11/main/pg_wal -s 0/3514A048 -e 0/3514A0E4
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 0/3514A048, prev 0/35149CEC, desc: RUNNING_XACTS nextXid 101105 latestCompletedXid 101104 oldestRunningXid 101105
rmgr: XLOG        len (rec/tot):    102/   102, tx:          0, lsn: 0/3514A07C, prev 0/3514A048, desc: CHECKPOINT_ONLINE redo 0/3514A048; tli 1; prev tli 1; fpw true; xid 0:101105; oid 74081; multi 1; offset 0; oldest xid 561 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 101105; online

We see two records here. The last one is the record of the checkpoint completion (CHECKPOINT_ONLINE). The LSN of the checkpoint start is output after the word «redo», and this location corresponds to the latest WAL record at the moment the checkpoint started.

We will find the same information in the control file:

postgres$ /usr/lib/postgresql/11/bin/pg_controldata -D /var/lib/postgresql/11/main | egrep 'Latest.*location'
Latest checkpoint location:           0/3514A07C
Latest checkpoint's REDO location:    0/3514A048

Recovery

We are now ready to state more exactly the recovery algorithm outlined in the previous article.

If the server fails, at the subsequent start the startup process detects this by looking into the pg_control file and finding a state other than «shut down». In this case automatic recovery is performed.

First the recovery process will read the checkpoint start location from the same pg_control file. (To complete the picture, note that if the backup_label file is available, the checkpoint record is read from there — this is needed to restore from backups, but it is a topic for a separate series.)

Then the process will read WAL starting at the location found and apply WAL records to the pages one by one (if the need arises, as we discussed last time).

In the end, all unlogged tables are emptied using their initialization forks.

This is where the startup process finishes its work, and the checkpointer process immediately performs the checkpoint in order to secure the restored state on disk.

We can simulate a failure by forcibly shutting down the server in immediate mode:

student$ sudo pg_ctlcluster 11 main stop -m immediate --skip-systemctl-redirect

(The --skip-systemctl-redirect option is needed here since we use PostgreSQL installed on Ubuntu from the package. It is controlled by the pg_ctlcluster command, which actually calls systemctl, which, in its turn, calls pg_ctl. Because of all this wrapping, the name of the mode gets lost on the way. But the --skip-systemctl-redirect option enables us to do without systemctl and retain the important information.)

Let's check the state of the cluster:

postgres$ /usr/lib/postgresql/11/bin/pg_controldata -D /var/lib/postgresql/11/main | grep state
Database cluster state:               in production

At the start, PostgreSQL understands that a failure occurred and recovery is required.

student$ sudo pg_ctlcluster 11 main start
postgres$ tail -n 7 /var/log/postgresql/postgresql-11-main.log
2019-07-17 15:27:49.441 MSK [8865] LOG:  database system was interrupted; last known up at 2019-07-17 15:27:48 MSK
2019-07-17 15:27:49.801 MSK [8865] LOG:  database system was not properly shut down; automatic recovery in progress
2019-07-17 15:27:49.804 MSK [8865] LOG:  redo starts at 0/3514A048
2019-07-17 15:27:49.804 MSK [8865] LOG:  invalid record length at 0/3514A0E4: wanted 24, got 0
2019-07-17 15:27:49.804 MSK [8865] LOG:  redo done at 0/3514A07C
2019-07-17 15:27:49.824 MSK [8864] LOG:  database system is ready to accept connections
2019-07-17 15:27:50.409 MSK [8872] [unknown]@[unknown] LOG:  incomplete startup packet

The need for recovery is reported in the message log: database system was not properly shut down; automatic recovery in progress. Then playback of WAL records starts at the «redo starts at» location and continues as long as further WAL records can be read. Recovery completes at the «redo done at» location, and the DBMS starts serving clients (database system is ready to accept connections).

And what happens at a normal shutdown of the server? To flush dirty pages to disk, PostgreSQL disconnects all the clients and then performs the final checkpoint.

Let's remember the current WAL location:

=> SELECT pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
 0/3514A14C
(1 row)

Now we shut down the server in a regular way:

student$ sudo pg_ctlcluster 11 main stop

Let's check the cluster state:

postgres$ /usr/lib/postgresql/11/bin/pg_controldata -D /var/lib/postgresql/11/main | grep state
Database cluster state:               shut down

And the only new record in WAL is that of the final checkpoint (CHECKPOINT_SHUTDOWN):

postgres$ /usr/lib/postgresql/11/bin/pg_waldump -p /var/lib/postgresql/11/main/pg_wal -s 0/3514A14C
rmgr: XLOG        len (rec/tot):    102/   102, tx:          0, lsn: 0/3514A14C, prev 0/3514A0E4, desc: CHECKPOINT_SHUTDOWN redo 0/3514A14C; tli 1; prev tli 1; fpw true; xid 0:101105; oid 74081; multi 1; offset 0; oldest xid 561 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
pg_waldump: FATAL:  error in WAL record at 0/3514A14C: invalid record length at 0/3514A1B4: wanted 24, got 0

(With this fatal error message, pg_waldump just informs us that it has read WAL all the way to the end.)

Let's run the instance again.

student$ sudo pg_ctlcluster 11 main start

Background write

As we figured out, the checkpointer is one of the processes that flush dirty pages from the buffer cache to disk. But it is not the only one.

If a backend process needs to evict a page from a buffer, but the page turns out to be dirty, the process has to write the page to disk on its own. This situation is no good, since it entails waiting; it is much better when writes are done asynchronously, in the background.

Therefore, in addition to the checkpointer process, there is also the background writer process (also referred to as bgwriter or just writer). This process uses the same buffer-search algorithm as eviction does. There are essentially two differences:

  1. The writer's own pointer is used rather than the pointer to the «next victim». It can be ahead of the «victim» pointer, but never behind it.
  2. When walking through the buffers, the usage count is not decreased.

The buffers that get written are those for which all of the following is true:

  • They contain changed data (dirty).
  • They are not pinned (pin count = 0).
  • They have usage count = 0.

So, the background writer process runs ahead of eviction, as it were, and finds those buffers that are likely to be evicted soon. Ideally, thanks to this, backend processes should find that the buffers they choose can be reused without wasting time on writes.

Tuning

The checkpointer is usually tuned based on the following reasoning.

First, we need to determine how much WAL we can afford to keep between two subsequent checkpoints (and what recovery time is acceptable to us). The more the better, but for obvious reasons this value is limited.

Then we can measure how long it takes to generate this amount of WAL under normal load. We've already discussed how to do this (we need to remember the WAL locations at two moments and subtract one from the other).
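
For instance, a rough sketch of such a measurement: remember the starting LSN in a psql variable with \gset, wait for a representative stretch of normal load, and then compute the difference:

=> SELECT pg_current_wal_insert_lsn() AS start_lsn \gset
=> SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_insert_lsn(), :'start_lsn'));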

This time will be our usual interval between checkpoints, and we write it into the checkpoint_timeout parameter. The default value is 5 minutes, which is evidently too little; it is usually increased, say, to half an hour. To reiterate: the less frequently we can afford to perform checkpoints, the better, since this reduces the overhead.

But it is possible (and even likely) that the load will sometimes be higher than usual and too much WAL will be generated within the time specified by the parameter. In this case it is desirable to perform the checkpoint more often. To this end, we specify the permissible total size of WAL files in the max_wal_size parameter. If the actual size gets larger, the server initiates an unscheduled checkpoint.

The server needs to keep the WAL files accumulated since the last completed checkpoint, plus the files accumulated during the current one. So the total size can be estimated as the amount generated in one checkpoint cycle multiplied by (1 + checkpoint_completion_target). Before version 11 we should multiply by (2 + checkpoint_completion_target) instead, since PostgreSQL also retained the files from the checkpoint before last.
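
For illustration, suppose a checkpoint cycle generates about 1 GB of WAL and checkpoint_completion_target = 0.9 (purely hypothetical numbers); then roughly 1 GB * (1 + 0.9), that is, about 2 GB of WAL have to be kept, and the parameters could be set, for example, like this:

=> ALTER SYSTEM SET checkpoint_timeout = '30min';
=> ALTER SYSTEM SET max_wal_size = '2GB';
=> SELECT pg_reload_conf();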

So, most checkpoints are performed on schedule: once every checkpoint_timeout units of time. But under increased load, checkpoints are performed more often, whenever the WAL size reaches max_wal_size.

It is important to understand that the value of the max_wal_size parameter may be exceeded:

  • The value of the max_wal_size parameter is only what's desirable rather than a strict limit. The actual size may happen to be greater.

  • The server is not allowed to erase WAL files that have not yet been consumed through replication slots or, if continuous archiving is in place, have not yet been archived. If this functionality is used, continuous monitoring is needed to keep the disk from filling up; a query sketch follows this list.
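
For example, here is a sketch of a query that shows how much WAL each replication slot is currently holding back (assuming replication slots are in use):

=> SELECT slot_name, active,
          pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
   FROM pg_replication_slots;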

To complete the picture: not only the maximum, but also the minimum size can be specified, with the min_wal_size parameter. As long as the total size of the WAL files fits in min_wal_size, the server does not delete them, but only renames and reuses them. This saves the cost of continuously creating and deleting files.

It makes sense to tune the background writer only after the checkpointer is tuned. Together, these processes must manage to write dirty buffers before backend processes need them.

The background writer works in rounds, writing at most bgwriter_lru_maxpages pages per round, and sleeps between rounds for bgwriter_delay units of time.

The number of pages to write in one round is determined from the average number of buffers requested by backend processes since the previous round (a moving average is used here to smooth out the non-uniformity between rounds without depending on the distant past). The computed number of buffers is then multiplied by the bgwriter_lru_multiplier coefficient (but will not exceed bgwriter_lru_maxpages anyway).

The default values are: bgwriter_delay = 200 ms (most likely too long: a lot of water will flow under the bridge in a fifth of a second), bgwriter_lru_maxpages = 100, bgwriter_lru_multiplier = 2.0 (trying to meet the demand in advance).
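
The current values are easy to look up, for example, with a simple query against pg_settings:

=> SELECT name, setting, unit FROM pg_settings WHERE name LIKE 'bgwriter%';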

If the process finds no dirty buffers at all (that is, nothing is happening in the system), it «falls asleep» and is woken up when a server process requests a buffer. After waking up, the process works as usual again.

Monitoring

You can, and should, adjust the checkpointer and the background writer based on monitoring feedback.

The checkpoint_warning parameter makes the server log a warning if checkpoints caused by exceeding the WAL size limit are performed too often. Its default value is 30 seconds, and we need to adjust it to the value of checkpoint_timeout.

The log_checkpoints parameter (turned off by default) lets us get information on the checkpoints being performed from the server's message log. Let's turn it on.

=> ALTER SYSTEM SET log_checkpoints = on;
=> SELECT pg_reload_conf();

Now let's change something in the data and perform the checkpoint.

=> UPDATE chkpt SET n = n + 1;
=> CHECKPOINT;

In the message log we will see something like this:

postgres$ tail -n 2 /var/log/postgresql/postgresql-11-main.log
2019-07-17 15:27:55.248 MSK [8962] LOG:  checkpoint starting: immediate force wait
2019-07-17 15:27:55.274 MSK [8962] LOG:  checkpoint complete: wrote 79 buffers (0.5%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.013 s, total=0.025 s; sync files=2, longest=0.011 s, average=0.006 s; distance=1645 kB, estimate=1645 kB

Here we can see how many buffers were written, how the set of WAL files changed after the checkpoint, how long the checkpoint took, and the distance (in bytes) between the neighboring checkpoints.

But probably the most useful information is the statistics for the checkpointer and background writer processes in the pg_stat_bgwriter view. The view is shared by the two because a single process formerly performed both tasks; the responsibilities were later split, but the view remained.

=> SELECT * FROM pg_stat_bgwriter \gx
-[ RECORD 1 ]---------+------------------------------
checkpoints_timed     | 0
checkpoints_req       | 1
checkpoint_write_time | 1
checkpoint_sync_time  | 13
buffers_checkpoint    | 79
buffers_clean         | 0
maxwritten_clean      | 0
buffers_backend       | 42
buffers_backend_fsync | 0
buffers_alloc         | 363
stats_reset           | 2019-07-17 15:27:49.826414+03

Here, among other things, we see the number of checkpoints performed:

  • checkpoints_timed — on schedule (on reaching checkpoint_timeout).

  • checkpoints_req — on demand (including those performed on reaching max_wal_size).

A large value of checkpoints_req (compared to checkpoints_timed) indicates that checkpoints occur more often than expected.

The following is important information on the number of pages written:

  • buffers_checkpoint — by the checkpointer.

  • buffers_backend — by backend processes.

  • buffers_clean — by the background writer.

In a well-tuned system, the value of buffers_backend must be much less than the sum of buffers_checkpoint and buffers_clean.

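For example, a sketch of a query that computes the share of writes done by backend processes (what counts as «much less» is a judgment call):

=> SELECT buffers_checkpoint, buffers_clean, buffers_backend,
          round(100.0 * buffers_backend /
                nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1) AS backend_pct
   FROM pg_stat_bgwriter;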

The value of maxwritten_clean will also help to tune the background writer. It shows how many times the process stopped a round because it exceeded bgwriter_lru_maxpages.

You can reset the collected statistics with the call:

=> SELECT pg_stat_reset_shared('bgwriter');
Read on.

Source: https://habr.com/en/company/postgrespro/blog/494464/
