pg_rewind

duxiaohua15

Name
pg_rewind – synchronize a PostgreSQL data directory with another data directory that was forked from the first one
Synopsis
pg_rewind [option…] {-D | --target-pgdata} directory {--source-pgdata=directory | --source-server=connstr}
Description

pg_rewind is a tool for synchronizing a PostgreSQL cluster with another copy of the same cluster after the clusters' timelines have diverged. A typical scenario is to bring an old master server back online after failover, as a standby that follows the new master.

The result is equivalent to replacing the target data directory with the source one. Relation files have only their changed blocks copied; all other files, including configuration files, are copied in full. The advantage of pg_rewind over taking a new base backup, or over tools like rsync, is that pg_rewind does not need to read through all the unchanged files in the cluster. That makes it much faster when the database is large and only a small portion of it differs between the clusters.

pg_rewind examines the timeline histories of the source and target clusters to determine the point where they diverged, and expects to find WAL in the target cluster's pg_xlog directory (named pg_wal since PostgreSQL 10) reaching all the way back to the point of divergence. In the typical failover scenario, where the target cluster was shut down soon after the divergence, that is not a problem. If the target cluster ran for a long time after the divergence, however, the old WAL files might no longer be present. In that case, they can be copied manually from the WAL archive to the pg_xlog directory. Fetching missing files from a WAL archive automatically is currently not supported.
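
The manual copy described above can be scripted. The following is a minimal sketch, not part of pg_rewind: the function name `copy_missing_wal`, its parameters, and the idea of passing in the first segment you need are all assumptions made for illustration. It relies only on the fact that WAL segment files are named with 24 hexadecimal digits, the first 8 of which are the timeline ID, so names on one timeline sort in the order the segments were written. It assumes the needed WAL lies on a single timeline.

```python
import re
import shutil
from pathlib import Path

# WAL segment files are named with 24 hexadecimal digits:
# 8 for the timeline ID, then 16 identifying the segment.
SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def copy_missing_wal(archive_dir, wal_dir, first_needed):
    """Copy archived segments that the target's WAL directory lacks.

    Only segments on the same timeline as first_needed, and not older
    than it, are considered. Returns the names that were copied.
    """
    archive, wal = Path(archive_dir), Path(wal_dir)
    timeline = first_needed[:8]
    copied = []
    for src in sorted(archive.iterdir()):
        name = src.name
        if not SEGMENT_NAME.fullmatch(name):
            continue  # skip .history, .backup and other non-segment files
        if name[:8] != timeline or name < first_needed:
            continue  # other timeline, or older than the divergence point
        if not (wal / name).exists():
            shutil.copy2(src, wal / name)
            copied.append(name)
    return copied
```

Files already present in the WAL directory are left untouched, so running the function twice copies nothing the second time.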

When the target server is started for the first time after running pg_rewind, it will go into recovery mode and replay all WAL generated in the source server after the point of divergence. If some of the WAL was no longer available in the source server when pg_rewind was run, and therefore could not be copied by the pg_rewind session, it needs to be made available when the target server is started. That can be done by creating a recovery.conf file in the target data directory with a suitable restore_command.
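
As an illustration of the recovery.conf step, here is a small sketch that renders and writes such a file. The helper names, the archive path, and the connection string are hypothetical. Only restore_command comes from the text above; standby_mode and primary_conninfo are added on the assumption that the rewound server should follow the new master as a standby. The `cp` command assumes a locally mounted archive and a path without spaces, and the quoting helper assumes values contain no backslashes.

```python
from pathlib import Path


def quote_conf_value(value):
    """Quote a string for recovery.conf: wrap in single quotes, double any inside."""
    return "'" + value.replace("'", "''") + "'"


def render_recovery_conf(archive_dir, primary_conninfo):
    """Return recovery.conf text for a standby that can also read the WAL archive."""
    # In restore_command, %f is replaced by the WAL file name
    # and %p by the path the file should be copied to.
    restore_command = 'cp {}/%f "%p"'.format(archive_dir)
    settings = [
        ("standby_mode", "on"),                  # assumption: run as a standby
        ("primary_conninfo", primary_conninfo),  # assumption: follow the new master
        ("restore_command", restore_command),
    ]
    return "".join("{} = {}\n".format(k, quote_conf_value(v)) for k, v in settings)


def write_recovery_conf(target_pgdata, archive_dir, primary_conninfo):
    """Write recovery.conf into the target data directory and return its path."""
    path = Path(target_pgdata) / "recovery.conf"
    path.write_text(render_recovery_conf(archive_dir, primary_conninfo))
    return path
```

Keeping the rendering separate from the writing makes it easy to inspect the generated text before placing it in the data directory.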

pg_rewind requires that the target server either has the wal_log_hints option enabled in postgresql.conf or had data checksums enabled when the cluster was initialized with initdb. Neither of these is currently on by default. full_page_writes must also be enabled; that is the default.
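
The prerequisite rule can be written down as a simple check. This is a rough sketch under stated assumptions: `read_settings` and `can_rewind` are hypothetical helpers, the parser handles only plain `name = value` lines (it ignores include directives and settings changed by other means), it recognizes only the full boolean words rather than abbreviations, and whether data checksums are enabled is passed in by the caller rather than detected.

```python
TRUE_WORDS = {"on", "true", "yes", "1"}

# Defaults as stated in the text: wal_log_hints is off, full_page_writes is on.
DEFAULTS = {"wal_log_hints": "off", "full_page_writes": "on"}


def read_settings(conf_text):
    """Parse simple 'name = value' lines from postgresql.conf text."""
    settings = dict(DEFAULTS)
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "=" in line:
            name, value = line.split("=", 1)
        else:
            parts = line.split(None, 1)  # the equals sign is optional
            if len(parts) != 2:
                continue
            name, value = parts
        settings[name.strip().lower()] = value.strip().strip("'").lower()
    return settings


def can_rewind(conf_text, data_checksums_enabled):
    """True if full_page_writes is on and wal_log_hints or checksums are enabled."""
    settings = read_settings(conf_text)
    hints = settings["wal_log_hints"] in TRUE_WORDS
    full_page_writes = settings["full_page_writes"] in TRUE_WORDS
    return bool(full_page_writes and (hints or data_checksums_enabled))
```

For example, `can_rewind("wal_log_hints = on\n", False)` is true, while an empty configuration with checksums disabled is not sufficient.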
Options

pg_rewind accepts the following command-line arguments:

-D directory
--target-pgdata=directory

This option specifies the target data directory that is synchronized with the source. The target server must be shut down cleanly before running pg_rewind.

--source-pgdata=directory

Specifies the path to the data directory of the source server to synchronize the target with. When --source-pgdata is used, the source server must be cleanly shut down.

--source-server=connstr

Specifies a libpq connection string to connect to the source PostgreSQL server to synchronize the target with. The connection must be a normal (non-replication) connection with superuser access. The server must be up and running, and must not be in recovery mode.

-n
--dry-run

Do everything except actually modifying the target directory.

-P
--progress

Enables progress reporting. Turning this on will deliver an approximate progress report while copying data over from the source cluster.

--debug

Print verbose debugging output that is mostly useful for developers debugging pg_rewind.

-V
--version

Display version information, then exit.

-?
--help

Show help, then exit.
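
To show how the options combine, here is a small sketch that assembles a pg_rewind command line from the flags listed above. The function `build_pg_rewind_args` and its parameter names are made up for this example. It enforces the rule from the synopsis that exactly one of --source-pgdata and --source-server is given, and it only builds the argument list without running anything.

```python
def build_pg_rewind_args(target_pgdata, source_pgdata=None, source_server=None,
                         dry_run=False, progress=False, debug=False):
    """Return the argument list for a pg_rewind invocation."""
    # The synopsis requires exactly one of the two source options.
    if (source_pgdata is None) == (source_server is None):
        raise ValueError("specify exactly one of source_pgdata or source_server")
    args = ["pg_rewind", "--target-pgdata=" + target_pgdata]
    if source_pgdata is not None:
        args.append("--source-pgdata=" + source_pgdata)
    else:
        args.append("--source-server=" + source_server)
    if dry_run:
        args.append("--dry-run")   # report what would happen, change nothing
    if progress:
        args.append("--progress")  # approximate progress report
    if debug:
        args.append("--debug")     # verbose debugging output
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)`. Running once with `dry_run=True` before the real run is a cautious way to confirm that the rewind is possible.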

Environment

When the --source-server option is used, pg_rewind also uses the environment variables supported by libpq (see Section 31.14 of the PostgreSQL documentation).
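
Because libpq falls back to environment variables for any connection parameter missing from the connection string, the host, port, and user can be supplied through the environment instead. The sketch below builds such an environment. The helper name `libpq_environment` is hypothetical; PGHOST, PGPORT, and PGUSER are standard libpq variables.

```python
import os


def libpq_environment(host, port, user, base_env=None):
    """Return a copy of the environment with libpq connection defaults set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "PGHOST": host,
        "PGPORT": str(port),  # environment values must be strings
        "PGUSER": user,
    })
    return env
```

With this environment, a call such as `subprocess.run(["pg_rewind", "--target-pgdata=/path/to/data", "--source-server=dbname=postgres"], env=env)` would pick up the host, port, and user from the variables. The data directory path here is a placeholder.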
Notes
How it works

The basic idea is to copy everything from the new cluster to the old cluster, except for the blocks that we know to be the same.

1. Scan the WAL of the old cluster, starting from the last checkpoint before the point where the new cluster's timeline history forked off from the old cluster. For each WAL record, make a note of the data blocks that were touched. This yields a list of all the data blocks that were changed in the old cluster after the new cluster forked off.

2. Copy all those changed blocks from the new cluster to the old cluster.

3. Copy all other files, such as clog and configuration files, from the new cluster to the old cluster: everything except the relation files.

4. Apply the WAL from the new cluster, starting from the checkpoint created at failover. (Strictly speaking, pg_rewind doesn't apply the WAL; it just creates a backup label file indicating that when PostgreSQL is started, it will start replay from that checkpoint and apply all the required WAL.)
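
The steps above can be made concrete with a toy model. This is a simplified sketch, not pg_rewind's actual implementation: clusters are plain dictionaries, a relation file is a list of block contents, WAL records are small dictionaries, and both clusters are assumed to have the same files with the same number of blocks. All names are invented for the example.

```python
def blocks_touched(wal_records):
    """Step 1: collect every (file, block) the old cluster changed after the fork."""
    return {(rec["file"], rec["block"]) for rec in wal_records}


def rewind(old, new, old_wal_since_fork, new_wal_since_failover):
    """Bring the old cluster in line with the new one, block by block."""
    # Step 2: overwrite only the changed blocks with the new cluster's copy.
    for file, block in sorted(blocks_touched(old_wal_since_fork)):
        old["relations"][file][block] = new["relations"][file][block]
    # Step 3: copy everything that is not a relation file wholesale.
    old["other"] = dict(new["other"])
    # Step 4: replay the new cluster's WAL from the failover checkpoint.
    # (pg_rewind itself only arranges for the server to do this at startup.)
    for rec in new_wal_since_failover:
        old["relations"][rec["file"]][rec["block"]] = rec["value"]
    return old
```

Note that a block changed only on the new cluster is never copied in step 2. It reaches the old cluster through WAL replay in step 4, which is why unchanged relation data does not have to be read at all.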