pg 备份恢复(一)—— 热备份

pg通常使用以下几种方法进行备份,下面分别记录主要原理和用法:

  • 使用低级API备份
  • pg_basebackup工具
  • pg_rman工具

一、 使用低级API备份

pg 9.1前最常用的方法,使用pg_start_backup和pg_stop_backup这些低级API进行备份。如果希望用更灵活的方式(例如rsync、scp等)创建基础备份, 依然可以使用该方法 。 同时,使用低级API创建基础备份也是理解PIRT的关键和基础。

主要有三个步骤

  • 执行 pg_start_backup 命令开始执行备份
  • 创建数据目录的副本(通常为cp或scp)
  • 执行 pg stop_backup 命令结束备份

1. pg_start_backup

pg_start_backup进行创建基础备份的准备工作,包括 :

  • 判断 WAL归档是否已开启 

如果没有开启,备份依然会进行, 但在备份结束后会显示提醒信息:

NOTICE: WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup

意思是WAL归档未启用,必须确保通过其他方式复制所有必需的WAL以完成备份。对于较大的、写入频繁的生产环境数据库来说等pg_start_backup命令结束再去复制必需的WAL是不现实的,很可能那些文件已经被重用了,务必提前开启归档 。

  • 强制进入全页写模式

判断当前配置是否为全页写模式,如果当前full_page_writes设置为off,则强制更改为on

  • 创建一个检查点
  • 排他基础备份的情况下还会创建backup_label文件,backup_label文件包含以下几项 :
START WAL LOCATION: WAL起始位置
CHECKPOINT LOCATION:记录由命令创建的检查点的LSN位置
BACKUP METHOD:      值为pg_start_backup或pg_basebackup,若只是配置流复制则为streamed
BACKUP FROM:        是在主库还是从库做的基础备份
STARTTIME:          执行pg_start_backup的时间戳
LABEL:              在pg_start_backup中指定的标签

pg_start_backup函数定义如下

pg_start_backup(label text [,fast boolean[,exclusive boolean]])

字段含义:
label:    用户定义的备份标签,一般使用备份文件名加日期
fast:     是否尽快开始备份,默认值是false;若设置为true,会使用尽可能多的I/O执行Checkpoint
exclusive:指定是否为排他基础备份,即是否允许其他并发的备份同时进行,由于排他基础备份已经被废弃,通常设置为false

pg_is_in_backup函数

在系统管理函数中 ,还有一个pg_is_in_backup函数,用于检查当前是否在执行一个排他的备份,即是否有exclusive参数设置为TRUE的备份,不能用它检查是否有非排他的备份在进行

 

2. 使用命令创建数据目录的副本

使用rsync,tar,cp,scp等命令都可以创建数据目录的副本。在创建过程中可以排除pg_wal和pg_replslot目录、postmaster.opts文件、postmaster.pid文件,这些目录和文件对恢复并没有帮助。

 

3. 执行pg_stop_backup命令

执行pg_stop_backup 命令时,执行五个操作来结束备份:

  • 如果在执行pg_start_backup命令时,full-page-writes的值曾被强制修改,则恢复到之前的值
  • 写一个备份结束的XLOG记录
  • 切换WAL文件
  • 创建一个备份历史文件,该文件包含 backup_label文件的内容及执行pg_stop_backup的时间戳
  • 删除backup_label文件,backup_label文件对于从基本备份进行恢复是必需的,一旦进行复制,就不需要该文件了 。

下面是创建非排他基础备份的例子

--执行pg_start_backup开始备份
SELECT pg_start_backup('pgdata backup',false,false);

#创建数据目录的副本
cd /pgdata/10/backups
tar -cvf pgdata.tar.gz $PGDATA --exclude=$PGDATA/pg_wal

#若用于流复制,则远程拷贝至slave端并在备机端解压
scp pgdata.tar.gz  postgres@ip:/data/PRD/postgres/10.7/pg5432
tar xzvf pgdata.tar.gz

--执行pg_stop_backup结束备份
SELECT pg_stop_backup(false);

这样就完成了一个基础备份

 

二、 pg_basebackup工具

pg 9.1开始提供的基础备份工具,支持对主库发起一个在线基准备份,并自动进入备份模式进行数据库备份,备份完成后自动从备份模式退出,不需执行pg_start_backup()和 pg_stop_backup()显式声明进入和退出备份模式,极大简化了原有方式。

pg_basebackup依然是物理备份,会拷贝整个实例的数据。它使用replication协议连接到DB实例,因此使用该工具备份需要超级用户权限或REPLICATION权限。另外pg_basebackup将消耗至少一个WAL发送进程需要 ,注意max_wal_senders参数

pg_hba.conf必须允许replication连接,即需添加类似内容:

local replication  repuser                trust
local replication  repuser                ident
host replication  repuser   10.0.0.0/8    md5

使用pg_basebackup创建基础备份,只要一条命令,大大简化了操作步骤

pg_basebackup -Ft -Pv -Xf -z -ZS -D /pgdata/10/backups/

[postgres@pghostl ~]$ll -h /pgdata/10/backups/
total 3.8M
-rw-r--r-- 1 postgres postgres 3.8M Feb 11 01:26 base.tar.gz

使用pg_basebackup创建用于流复制的基础备份,只要在从库执行一条命令(主库提前配置好pg_hba.conf)

pg_basebackup -h主库ip -p5432 -U repuser -F p -P -v -Xfetch -R -D $PGDATA -l Replication

pg 9.2 开始支持在从库做备份,但有一些注意事项:

  • 不会创建备份历史文件
  • 不保证备份需要的所有WAL文件在备份结束时被归档。如果计划将该备份用于归档恢复且想要确保所需文件在那个时刻都可用,需要通过-x参数将它们也包括在备份中
  • 如果在在线备份期间后备库被提升为主库,备份会失败
  • 主库必须启用full_page_writes,且WAL文件不能被类似pg_compresslog的工具移除全页写信息

主要参数如下

-D directory或--pgdata=directory
指定备份输出目录(必须为空,若目录不存在会自动创建)

-F format或--format=format
选择输出格式(p为照主库原样输出,t或tar将打包输出到tar文件中)

-l label或--label=label
为备份设置标签,默认为"pg_basebackup base backup"。

-r rate或--max-rate=rate
从该服务器传输数据的最大速度,用于限制在运行服务器上的pg_basebackup产生的影响。单位是KB/s,加后缀M表示MB/s(范围:32KB/s-1024MB/s)。

-R或--write-recovery-conf
在输出目录中自动生成一个最小配置的recovery.conf以简化流复制配置。

-x或--xlog
备份时会将备份中产生的wal文件也自动备份出来,使用前需要设置wal_keep_segments参数,保证备份过程中需要的wal文件不会被覆盖。这个选项等效于-X fetch

-X method
method可取值为f(或fetch),s(或stream)。-X s表示备份开始后启动另一个流复制连接从主库接收wal日志,这种方式需要与主库建立两个连接。

-z或--gzip
输出tar文件时才可用,启用gzip压缩,生成.tar.gz文件

-Z level
与-z一起用,指定gzip压缩级别(1-9,9最大)

-P
实时打印备份进度

-v或--verbose
详细模式,与-P一起使用会显示当前正在被备份的文件信息。

-w或--no-password
指定不提示输入密码,这个选项对于批处理任务和脚本非常有用。

-W或--password
强制pg_basebackup提示输入密码。非必须,如果服务器要求密码认证,pg_basebackup将自动提示输入密码,但默认情况pg_basebackup需要浪费一次连接尝试来发现服务器需要输入密码,在某些情况下值得用-W来避免额外的连接尝试。

所有可用参数参考
http://www.postgres.cn/docs/9.6/app-pgbasebackup.html
 

三、 pg_rman工具

pg_rman是由C语言开发的备份工具,有三种备份模式:全备、增备、仅归档备份

使用前提

  • 开启归档     archive_mode = on
  • 配置csvlog  log_destination = 'stderr,csvlog'

 

1. 安装步骤

源码下载

下载版本需要对pg对应的版本。如pg版本是10.x则使用REL10_stable版
        https://github.com/ossc-db/pg_rman/tree/REL10_STABLE

帮助文档  https://github.com/ossc-db/pg_rman/blob/REL_10_STABLE/docs/index.html

解压安装文件

su - postgres
unzip pg_rman-REL10_STABLE.zip

编译安装

make & make install   # postgres用户操作

make日志最后输出:
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 backup.o catalog.o data.o delete.o dir.o init.o parray.o pg_rman.o restore.o show.o util.o validate.o xlog.o pgsql_src/pg_ctl.o pgut/pgut.o pgut/pgut-port.o -L/data/PRD/postgres/base/10.7/lib -lpgcommon -lpgport -L/data/PRD/postgres/base/10.7/lib -lpq -L/data/PRD/postgres/base/10.7/lib -Wl,--as-needed -Wl,-rpath,'/data/PRD/postgres/base/10.7/lib',--enable-new-dtags -Wl,--build-id  -lpgcommon -lpgport -lz -lreadline -lrt -lcrypt -ldl -lm -o pg_rman

make install日志最后输出:
/bin/mkdir -p '/data/PRD/postgres/base/10.7/bin'
/bin/install -c  pg_rman '/data/PRD/postgres/base/10.7/bin'

2. 初始化文件目录

pg_rman init -B /data/PRD/pgbackup

日志输出:
INFO: ARCLOG_PATH is set to '/software/PRD/postgres/10.7/pg5432/pg_archlog'
INFO: SRVLOG_PATH is set to '/data/PRD/postgres/10.7/pg5432/data/pg_log'

会在backup路径下生成一些文件,这些文件组成了pg的backup catalog:
$ ll
total 8
drwx------. 4 postgres dba  35 Aug  3 16:08 backup
-rw-r--r--. 1 postgres dba 121 Aug  3 16:08 pg_rman.ini
-rw-r--r--. 1 postgres dba  40 Aug  3 16:08 system_identifier
drwx------. 2 postgres dba   6 Aug  3 16:08 timeline_history

配置文件内容如下,可以把将来要用的配置写在这个配置文件中或者直接写在命令行里

cat pg_rman.ini

ARCLOG_PATH='/software/PRD/postgres/10.7/pg5432/pg_archlog'
SRVLOG_PATH='/data/PRD/postgres/10.7/pg5432/data/pg_log'

system_identifier与控制文件中存储的system id一致,用于区分备份的数据库是不是一个数据库,防止被冲。

$ cat system_identifier   
SYSTEM_IDENTIFIER='6318621837015461309'  

 

3. 将备份路径添加到postgres用户环境变量

vi .bash_profile
#添加:
BACKUP_PATH=/data/PRD/postgres/pgbackup/pgbk5432
export BACKUP_PATH

#执行
source .bash_profile

4. 常用备份命令

全量备份命令

pg_rman backup -b full -s -C -Z --keep-data-days=10 --keep-arclog-files=15 --keep-arclog-days=10 --keep-srvlog-files=10 --keep-srvlog-days=15 -h 127.0.0.1 -p 5432 -U postgres -d postgres

输出:
INFO: copying database files
INFO: copying archived WAL files
INFO: copying server log files
INFO: backup complete
INFO: Please execute 'pg_rman validate' to verify the files are correctly copied.
INFO: start deleting old archived WAL files from ARCLOG_PATH (keep files = 15, keep days = 10)
INFO: the threshold timestamp calculated by keep days is "2019-10-07 00:00:00"
INFO: start deleting old server files from SRVLOG_PATH (keep files = 10, keep days = 15)
INFO: the threshold timestamp calculated by keep days is "2019-10-02 00:00:00"
INFO: start deleting old backup (keep after = 2019-10-07 00:00:00)
INFO: does not include the backup just taken
INFO: backup "2019-10-17 15:00:01" should be kept
DETAIL: This is taken after "2019-10-07 00:00:00".
INFO: backup "2019-10-17 14:59:44" should be kept
DETAIL: This is taken after "2019-10-07 00:00:00".

备份校验

每次备份完,必须要做一次校验,否则备份集不可用来恢复,增量备份时也不会用它来做增量比较。

-bash-4.2$ pg_rman validate

输出:
INFO: validate: "2019-10-16 06:44:17" backup, archive log files and server log files by CRC
INFO: backup "2019-10-16 06:44:17" is valid

备份时pg_rman会记录每个备份文件的crc,以便validate进行校验。

例如某个备份集如下,各列含义:路径,文件类型,大小,CRC校验值,权限,时间,第四列即CRC校验值

$ less /data05/digoal/pgbbk/20160826/195809/file_database.txt  

.s.PGSQL.1921 ? 0 0 0777 2016-08-26 19:27:05  
.s.PGSQL.1921.lock f 55 590164837 0600 2016-08-26 19:27:05  
PG_VERSION f 12 3872055064 0600 2016-07-28 10:03:42  
backup_label f 167 2985542389 0600 2016-08-26 19:58:42  
backup_label.old f 155 4273989468 0600 2016-08-23 19:43:32  
base d 0 0 0700 2016-08-23 10:28:32  
base/1 d 0 0 0700 2016-08-24 16:17:02  
base/1/112 f 57 1147028285 0600 2016-07-28 10:03:42  
base/1/113 f 57 1147028285 0600 2016-07-28 10:03:42  
base/1/1247 F 8178 1875285513 0600 2016-07-29 13:51:29  
base/1/1247_fsm f 139 3668812536 0600 2016-07-28 10:03:43  

每个备份集都包含一个备份状态文件,如下

cat /data05/digoal/pgbbk/20160826/201955/backup.ini

# configuration
BACKUP_MODE=INCREMENTAL
FULL_BACKUP_ON_ERROR=false
WITH_SERVERLOG=true
COMPRESS_DATA=true
# result
TIMELINEID=1
START_LSN=46/df000108
STOP_LSN=46/df000210
START_TIME='2016-08-26 20:19:55'
END_TIME='2016-08-26 20:20:48'
RECOVERY_XID=3896508593
RECOVERY_TIME='2016-08-26 20:20:47'
TOTAL_DATA_BYTES=6196524307
READ_DATA_BYTES=3199287520
READ_ARCLOG_BYTES=33554754
READ_SRVLOG_BYTES=0
WRITE_BYTES=125955
BLOCK_SIZE=8192
XLOG_BLOCK_SIZE=8192
STATUS=OK

这个文件包含很重要的信息,比如LSN将在增量备份时用于对比数据块的LSN是否发生了变化,是否需要备份。

查看备份信息

-bash-4.2$ pg_rman show
=====================================================================
 StartTime           EndTime              Mode    Size   TLI  Status 
=====================================================================
2019-10-17 15:15:25  2019-10-17 15:15:28  FULL  4570kB     1  DONE  <--- 校验前
2019-10-17 15:00:01  2019-10-17 15:00:02  ARCH    31kB     1  OK
2019-10-17 14:59:44  2019-10-17 14:59:47  FULL  6210kB     1  OK

-bash-4.2$ pg_rman show
=====================================================================
 StartTime           EndTime              Mode    Size   TLI  Status 
=====================================================================
2019-10-17 15:15:25  2019-10-17 15:15:28  FULL  4570kB     1  OK  <--- 校验后
2019-10-17 15:00:01  2019-10-17 15:00:02  ARCH    31kB     1  OK
2019-10-17 14:59:44  2019-10-17 14:59:47  FULL  6210kB     1  OK

查看详细备份信息

-bash-4.2$ pg_rman show detail
======================================================================================================================
 StartTime           EndTime              Mode    Data  ArcLog  SrvLog   Total  Compressed  CurTLI  ParentTLI  Status 
======================================================================================================================
2019-10-17 15:15:25  2019-10-17 15:15:28  FULL    39MB    33MB   8936B  4570kB        true       1          0  OK
2019-10-17 15:00:01  2019-10-17 15:00:02  ARCH    ----    16MB   4612B    31kB        true       1          0  OK
2019-10-17 14:59:44  2019-10-17 14:59:47  FULL    39MB    33MB   3332B  6210kB        true       1          0  OK

增备命令

pg_rman backup  -b incremental -s -C -Z --keep-data-days=10  --keep-arclog-files=15 --keep-arclog-days=10 --keep-srvlog-files=10 --keep-srvlog-days=15 -h 127.0.0.1 -p 5432 -U postgres -d postgres

输出:
INFO: copying database files
INFO: copying archived WAL files
INFO: copying server log files
INFO: backup complete
INFO: Please execute 'pg_rman validate' to verify the files are correctly copied.
INFO: start deleting old archived WAL files from ARCLOG_PATH (keep files = 15, keep days = 10)
INFO: the threshold timestamp calculated by keep days is "2019-10-07 00:00:00"
INFO: start deleting old server files from SRVLOG_PATH (keep files = 10, keep days = 15)
INFO: the threshold timestamp calculated by keep days is "2019-10-02 00:00:00"
INFO: start deleting old backup (keep after = 2019-10-07 00:00:00)
INFO: does not include the backup just taken
INFO: backup "2019-10-17 15:15:25" should be kept
DETAIL: This is taken after "2019-10-07 00:00:00".
INFO: backup "2019-10-17 15:00:01" should be kept
DETAIL: This is taken after "2019-10-07 00:00:00".
INFO: backup "2019-10-17 14:59:44" should be kept
DETAIL: This is taken after "2019-10-07 00:00:00".

同样要做一次备份校验

-bash-4.2$ pg_rman validate

输出:
INFO: validate: "2019-10-17 15:20:50" backup, archive log files and server log files by CRC
INFO: backup "2019-10-17 15:20:50" is valid

仅归档备份命令

pg_rman backup -b archive -s -C -Z --keep-data-days=10 --keep-arclog-files=15 --keep-arclog-days=10 --keep-srvlog-files=10 --keep-srvlog-days=15 -h 127.0.0.1 -p 5432 -U postgres -d postgres

#可以看到几个命令就只有pg_rman backup -b 后的备份模式不同

备份校验

pg_rman validate

另外也可以利用pg_rman自己制作定期备份脚本

 

按指定时间从catalog删除备份集

例如只需要备份集能恢复到2016-08-26 19:59:00,在这个时间点以前,不需要用来恢复到这个时间点的备份全删掉。

$ /home/digoal/pgsql9.5/bin/pg_rman delete "2016-08-26 19:59:00" -B /data05/digoal/pgbbk  
WARNING: cannot delete backup with start time "2016-08-26 19:58:09"  
DETAIL: This is the latest full backup necessary for successful recovery.  
INFO: delete the backup with start time: "2016-08-26 19:56:54"  
INFO: delete the backup with start time: "2016-08-26 19:43:20"  
INFO: delete the backup with start time: "2016-08-26 19:39:50"  
INFO: delete the backup with start time: "2016-08-26 19:39:32"  

保留的备份集合可以将数据库恢复到2016-08-26 19:59:00

$ /home/digoal/pgsql9.5/bin/pg_rman show -B /data05/digoal/pgbbk  
==========================================================  
 StartTime           Mode  Duration    Size   TLI  Status   
==========================================================  
2016-08-26 20:19:55  INCR        0m   125kB     1  OK  
2016-08-26 19:58:09  FULL       11m  3094MB     1  OK  

物理删除已从catalog删除的备份集

$ /home/digoal/pgsql9.5/bin/pg_rman purge -B /data05/digoal/pgbbk  
INFO: DELETED backup "2016-08-26 19:56:54" is purged  
INFO: DELETED backup "2016-08-26 19:43:20" is purged  
INFO: DELETED backup "2016-08-26 19:39:50" is purged  
INFO: DELETED backup "2016-08-26 19:39:32" is purged  

如何利用备份进行完整恢复和基于时间点的恢复,参考下篇

 

5. pg_rman 软件限制

pg_rman的使用限制

  • Requires to read database cluster directory and write backup catalog directory.

    For example, you need to mount the disk where backup catalog is placed with NFS from database server.

    实际上不是必须的,如果没有指定原来的$PGDATA,则使用备份集的元数据。

  • Block sizes of pg_rman and server should be matched. BLCKSZ and XLOG_BLCKSZ also should be matched.

    编译pg_rman时,最好使用启动数据集的集群软件的pg_config。 确保块大小一致。

    因为需要做块的校验。 读取LSN等,都与块大小有关。

  • If there are some unreadable files/directories in database cluster directory, WAL directory or archived WAL directory, the backup or restore would be failed.

  • When taking an incremental backup, pg_rman check the timeline ID of the target database whether it is the same with the one of the full backup in backup list.

    But, pg_rman does not check whether the data itself is same with the full backup in backup list.

    So, you can take an incremental backup over the full backup against the database which has the same timeline ID but has different data.

从standby备份时的软件限制

Getting backup from standby-site, pg_rman has the follow restrictions too.

  • The environment of replication should be built right, or the backup will not finish.

  • You can’t get backups on master and standby at the same time.

    因为pg_rman使用的是exclusive bakcup,(pg_start_backup),所以同一时间,只能跑一次pg_start_backup。

    pg_basebackup则使用的是shared backup,可以跑多个。

  • You can’t get backups on multi standbys at the same time too.

    道理同上

  • Basically, the backup from standby-site is used for restoring on MASTER.

    pg_rman doesn’t treat the backup as restoring on standby automatically.

  • If you want to restore the backup on STANDBY, you have to manage archive logs with your self.

    因为备库不归档,所以从standby备份时,需要解决归档备份的问题。

    我在前面的文档中已经提及,包括解决思路。

使用快照备份限制

When using storage snapshot, pg_rman has the following restrictions too.

  • If your snapshot does not have any file update time, incremental backup is same with full backup.

  • Because pg_rman judges performing full backup or incremental backup by update time for files.

    If files don’t have update time because of storage snapshot specification, pg_rman performs full backup every time.

  • You can’t backup for one side works storage with split mirror snapshot.

  • Before you execute pg_rman, you should perform storage “RESYNC”.

  • After pg_rman performs backup with split mirror snapshot, storeage will be “SPLITTED”(works on one side).
    pg_rman perform SPLIT command for getting snapshot, but doesn’t perform RESYNC command.
  • You cant’t get snapshot from different vendor storages in a time.

  • You cant’t use some vendor storages which have different commands for getting snapshot.

  • The script and commands for getting storage snapshot should be executable.

  • It’s expected to have authority of root for getting snapshot or mounting volumes.

    So a user, performs pg_rman, is granted to execute any commands in the script.

  • If you use LVM(Logical Volume Manager), it’s needed root authority for mount, umount, lvcreate, lvremove, lvscan commands.
    You should granted to these commands with sudo command to non-password executable.

参考

http://mysql.taobao.org/monthly/2016/09/05/

  • 0
    点赞
  • 6
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

Hehuyi_In

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值