pg 备份恢复（二）—— 利用pg_rman进行完整恢复及基于时间点的恢复

最新推荐文章于 2024-05-16 05:46:54 发布

Hehuyi_In

最新推荐文章于 2024-05-16 05:46:54 发布

阅读量2.9k

点赞数 1

分类专栏：备份还原 PostgreSQL

本文链接：https://blog.csdn.net/Hehuyi_In/article/details/102644517

版权

PostgreSQL 同时被 2 个专栏收录

140 篇文章 113 订阅

订阅专栏

备份还原

31 篇文章 4 订阅

订阅专栏

利用上一篇提到的pg_rman来做备份恢复测试，相对而言它的命令更简单，在紧急恢复时更方便。

一、完整恢复测试

1. 在users库创建一个测试表

create table test1017(a int);
insert into test1017 values(1);  
insert into test1017 values(2);
insert into test1017 values(3);
select * from test1017;
 a 
---
 1
 2
 3
(3 rows)

2. 全备users库

pg_rman backup  -b full -s -C -Z --keep-data-days=10 --keep-arclog-files=15 --keep-arclog-days=10 --keep-srvlog-files=10 --keep-srvlog-days=15 -h 127.0.0.1 -p 5432 -U postgres -d users

输出：
INFO: copying database files
INFO: copying archived WAL files
INFO: copying server log files
INFO: backup complete
INFO: Please execute 'pg_rman validate' to verify the files are correctly copied.
INFO: start deleting old archived WAL files from ARCLOG_PATH (keep files = 15, keep days = 10)
INFO: the threshold timestamp calculated by keep days is "2019-10-07 00:00:00"
INFO: start deleting old server files from SRVLOG_PATH (keep files = 10, keep days = 15)
INFO: the threshold timestamp calculated by keep days is "2019-10-02 00:00:00"
INFO: start deleting old backup (keep after = 2019-10-07 00:00:00)
INFO: does not include the backup just taken
INFO: backup "2019-10-17 15:00:01" should be kept
DETAIL: This is taken after "2019-10-07 00:00:00".
INFO: backup "2019-10-17 14:59:44" should be kept
DETAIL: This is taken after "2019-10-07 00:00:00".

3. 备份验证

pg_rman validate;

输出：
INFO: validate: "2019-10-16 06:44:17" backup, archive log files and server log files by CRC
INFO: backup "2019-10-16 06:44:17" is valid

4. 再插入两条数据

insert into test1017 values(444);
insert into test1017 values(555);

users=# select * from test1017;
  a  
-----
   1
   2
   3
 444
 555
(5 rows)

5. 增备users库

pg_rman backup -b incremental -s -C -Z --keep-data-days=10 --keep-arclog-files=15 --keep-arclog-days=10 --keep-srvlog-files=10 --keep-srvlog-days=15 -h 127.0.0.1 -p 5432 -U postgres -d users 

输出：
INFO: copying database files
INFO: copying archived WAL files
INFO: copying server log files
INFO: backup complete
INFO: Please execute 'pg_rman validate' to verify the files are correctly copied.
INFO: start deleting old archived WAL files from ARCLOG_PATH (keep files = 15, keep days = 10)
INFO: the threshold timestamp calculated by keep days is "2019-10-07 00:00:00"
INFO: start deleting old server files from SRVLOG_PATH (keep files = 10, keep days = 15)
INFO: the threshold timestamp calculated by keep days is "2019-10-02 00:00:00"
INFO: start deleting old backup (keep after = 2019-10-07 00:00:00)
INFO: does not include the backup just taken
INFO: backup "2019-10-17 15:15:25" should be kept
DETAIL: This is taken after "2019-10-07 00:00:00".
INFO: backup "2019-10-17 15:00:01" should be kept
DETAIL: This is taken after "2019-10-07 00:00:00".
INFO: backup "2019-10-17 14:59:44" should be kept
DETAIL: This is taken after "2019-10-07 00:00:00".

6. 备份验证

pg_rman validate;
INFO: validate: "2019-10-17 15:20:50" backup, archive log files and server log files by CRC
INFO: backup "2019-10-17 15:20:50" is valid

7. 检查备份信息

务必确认备份状态是ok的，万一是corrupt的后面恢复不了就悲剧了

-bash-4.2$ pg_rman show
=====================================================================
 StartTime           EndTime              Mode    Size   TLI  Status 
=====================================================================
2019-10-17 15:20:50  2019-10-17 15:20:52  INCR    66kB     1  OK  <-- 测试的增备
2019-10-17 15:15:25  2019-10-17 15:15:28  FULL  4570kB     1  OK  <-- 测试的全备
2019-10-17 15:00:01  2019-10-17 15:00:02  ARCH    31kB     1  OK
2019-10-17 14:59:44  2019-10-17 14:59:47  FULL  6210kB     1  OK

查看备份文件内容

-bash-4.2$ cd /data/PRD/pgbackup/
-bash-4.2$ ls
20191017  backup  pg_rman.ini  pg_rman_log  system_identifier  test.file  timeline_history
-bash-4.2$ cd 20191017
-bash-4.2$ ls
145944  150001  151525  152050
-bash-4.2$ cd 151525   <--- 对应备份开始时间2019-10-17 15:15:25 
-bash-4.2$ ls
arclog  backup.ini  database  file_arclog.txt  file_database.txt  file_srvlog.txt  mkdirs.sh  srvlog
-bash-4.2$ cd database/
-bash-4.2$ ls
backup_label      pg_dynshmem    pg_multixact  pg_stat      PG_VERSION            postmaster.opts
base              pg_hba.conf    pg_notify     pg_stat_tmp  pg_wal                postmaster.pid
current_logfiles  pg_ident.conf  pg_replslot   pg_subtrans  pg_xact               repmgr
global            pg_log         pg_serial     pg_tblspc    postgresql.auto.conf
pg_commit_ts      pg_logical     pg_snapshots  pg_twophase  postgresql.conf

从输出信息及备份文件可以看出，这种备份方式也是copy文件

8. 完整恢复测试

模拟误删除os文件导致整个数据目录丢失，此时希望恢复到最近时间点（前提是增备之后的都归档没丢），故障前test1017有5条数据，恢复后也应该5条

关闭pg服务

pg_ctl stop -m fast

挪走$PGDATA目录模拟删除（不要真的删，万一真恢复不回来就要重装了...）

-bash-4.2$ pwd
/data/PRD/postgres/10.7/pg5432
-bash-4.2$ mv data data.bak
-bash-4.2$ ls
data.bak  pg_archlog  pgbackup

再起pg，肯定挂了

-bash-4.2$ postgres -D $PGDATA &
[1] 5259
-bash-4.2$ postgres: could not access directory "/data/PRD/postgres/10.7/pg5432/data": No such file or directory
Run initdb or pg_basebackup to initialize a PostgreSQL data directory.

[1]+  Exit 2                  postgres -D $PGDATA

开始恢复数据

mkdir data                 # postgres用户创建数据目录并授权
chmod 0700 data
cp -R pg_archlog pg_archlog_bak #备份下归档目录

使用pg_rman restore还原

注意pg_rman会自动生成recovery.conf文件，不需手动创建

-bash-4.2$ pg_rman restore        

输出：
WARNING: pg_controldata file "/data/PRD/postgres/10.7/pg5432/data/global/pg_control" does not exist
INFO: the recovery target timeline ID is not given
INFO: use timeline ID of latest full backup as recovery target: 1
INFO: calculating timeline branches to be used to recovery target point
INFO: searching latest full backup which can be used as restore start point
INFO: found the full backup can be used as base in recovery: "2019-10-17 15:15:25"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2019-10-17 15:15:25" backup, archive log files and server log files by SIZE
INFO: backup "2019-10-17 15:15:25" is valid
INFO: restoring database files from the full mode backup "2019-10-17 15:15:25"
INFO: searching incremental backup to be restored
INFO: validate: "2019-10-17 15:20:50" backup, archive log files and server log files by SIZE
INFO: backup "2019-10-17 15:20:50" is valid
INFO: restoring database files from the incremental mode backup "2019-10-17 15:20:50"
INFO: searching backup which contained archived WAL files to be restored
INFO: backup "2019-10-17 15:20:50" is valid
INFO: restoring WAL files from backup "2019-10-17 15:20:50"
INFO: restoring online WAL files and server log files
INFO: generating recovery.conf
INFO: restore complete
HINT: Recovery will start automatically when the PostgreSQL server is started.

看到是还原到了2019-10-17 15:20:50，当pg服务启动时会自动进行Recovery

看一下pg_rman自动生成的recovery.conf文件

-bash-4.2$ cat recovery.done
# recovery.conf generated by pg_rman 1.3.8
restore_command = 'cp /data/PRD/postgres/10.7/pg5432/pg_archlog/%f %p'
recovery_target_timeline = '1'

可以看到数据文件已经回来了

-bash-4.2$ cd data
-bash-4.2$ ls
backup_label      pg_dynshmem    pg_multixact  pg_stat      PG_VERSION            postmaster.opts
base              pg_hba.conf    pg_notify     pg_stat_tmp  pg_wal                recovery.conf
current_logfiles  pg_ident.conf  pg_replslot   pg_subtrans  pg_xact               repmgr
global            pg_log         pg_serial     pg_tblspc    postgresql.auto.conf
pg_commit_ts      pg_logical     pg_snapshots  pg_twophase  postgresql.conf

启动pg服务

-bash-4.2$ postgres -D $PGDATA &
[1] 5702
-bash-4.2$ 2019-10-17 15:32:23 CST::@:[5702]: LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-10-17 15:32:23 CST::@:[5702]: LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-10-17 15:32:23 CST::@:[5702]: LOG:  redirecting log output to logging collector process
2019-10-17 15:32:23 CST::@:[5702]: HINT:  Future log output will appear in directory "/data/PRD/postgres/10.7/pg5432/data/pg_log".

查看日志

-bash-4.2$ tail -100f postgresql-17.log 
2019-10-17 15:32:23 CST::@:[5704]: LOG:  database system was interrupted; last known up at 2019-10-17 15:20:50 CST
2019-10-17 15:32:23 CST::@:[5704]: LOG:  starting archive recovery  <-- 开始归档恢复
2019-10-17 15:32:23 CST::@:[5704]: LOG:  restored log file "000000010000000000000007" from archive
2019-10-17 15:32:24 CST::@:[5704]: LOG:  redo starts at 0/7000028
2019-10-17 15:32:24 CST::@:[5704]: LOG:  consistent recovery state reached at 0/70000F8
2019-10-17 15:32:24 CST::@:[5702]: LOG:  database system is ready to accept read only connections
2019-10-17 15:32:24 CST::@:[5704]: LOG:  restored log file "000000010000000000000008" from archive
cp: cannot stat ‘/data/PRD/postgres/10.7/pg5432/pg_archlog/000000010000000000000009’: No such file or directory
2019-10-17 15:32:24 CST::@:[5704]: LOG:  redo done at 0/8000088
2019-10-17 15:32:24 CST::@:[5704]: LOG:  last completed transaction was at log time 2019-10-17 15:20:52.167637+08
2019-10-17 15:32:24 CST::@:[5704]: LOG:  restored log file "000000010000000000000008" from archive
cp: cannot stat ‘/data/PRD/postgres/10.7/pg5432/pg_archlog/00000002.history’: No such file or directory
2019-10-17 15:32:25 CST::@:[5704]: LOG:  selected new timeline ID: 2  <-- 产生新的时间线
2019-10-17 15:32:25 CST::@:[5704]: LOG:  archive recovery complete  <-- 归档恢复完成
cp: cannot stat ‘/data/PRD/postgres/10.7/pg5432/pg_archlog/00000001.history’: No such file or directory
2019-10-17 15:32:25 CST::@:[5708]: LOG:  checkpoint starting: end-of-recovery immediate wait
2019-10-17 15:32:25 CST::@:[5708]: LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.023 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB
2019-10-17 15:32:25 CST::@:[5702]: LOG:  database system is ready to accept connections

检查测试表，5条数据，ok

users=# select * from test1017;
  a  
-----
   1
   2
   3
 444
 555
(5 rows)

二、基于时间点的恢复——PITR

这是工作中更常用的一种恢复，一般用于误删除/更改表后恢复到误操作之前，找回数据。若要恢复到最后一次备份之后的时间点，也需要保证归档没丢。

1. 再做一次全备

./pg_rman_backup_db.sh

这里有一个巨坑

如果之前有全备和归档，又没做过恢复，这个备份可以不做；
如果之前做过恢复，那么此时DB与之前的备份已经不在同一时间线上。恢复默认只沿着基础备份建立时时间线恢复而不会切换到新的时间线，如果不做处理，恢复结果将和前面的完整恢复一模一样，恢复不出新插入的数据。

此时有两种解决方法：

一是再做一个全备（就是我们选用的方法）
二是recovery.conf文件设置recovery_target_timeline='latest'，这样便可恢复到该离当前最近的时间线

-bash-4.2$ pg_rman show
=====================================================================
 StartTime           EndTime              Mode    Size   TLI  Status 
=====================================================================
2019-10-17 16:11:36  2019-10-17 16:11:39  FULL  4616kB     5  OK   <-----  注意此时时间线是5，因为之前还测过几次别的恢复
2019-10-17 16:04:24  2019-10-17 16:04:26  ARCH   207kB     4  OK
2019-10-17 15:20:50  2019-10-17 15:20:52  INCR    66kB     1  OK
2019-10-17 15:15:25  2019-10-17 15:15:28  FULL  4570kB     1  OK
2019-10-17 15:00:01  2019-10-17 15:00:02  ARCH    31kB     1  OK
2019-10-17 14:59:44  2019-10-17 14:59:47  FULL  6210kB     1  OK

2. 再插入一条数据

users=# insert into test1017 values(1017);
INSERT 0 1
users=# select now();
              now              
-------------------------------
 2019-10-17 16:12:57.859887+08
(1 row)

users=# select * from test1017;
  a   
------
    1
    2
    3
  444
  555
 1017
(6 rows)

保险起见做个Checkpoint再切下归档

users=# checkpoint;
CHECKPOINT
users=# 
users=# select pg_switch_wal();           
 pg_switch_wal 
---------------
 0/B000308
(1 row)

3. 误操作truncate了test1017表

users=# truncate table test1017;
TRUNCATE TABLE
users=# 
users=# select now();
              now              
-------------------------------
 2019-10-17 16:17:24.705675+08
(1 row)
users=# select * from test1017;
 a 
---
(0 rows)

根据最新数据插入时间（2019-10-17 16:12:57）及truncate时间（2019-10-17 16:17:24），恢复到2019-10-17 16:15:00应该能找回被truncate前数据。注意这个时间不是很准，最好再往前推几分钟，避免恢复后还是找不到被误操作的数据。

4. 关闭pg服务

pg_ctl stop -m fast

挪走$PGDATA目录

cd /data/PRD/postgres/10.7/pg5432
mv data data.bak

5. 还原到指定时间点

-bash-4.2$ pg_rman restore --recovery-target-time "2019-10-17 16:15:00"

输出：
WARNING: pg_controldata file "/data/PRD/postgres/10.7/pg5432/data/global/pg_control" does not exist
INFO: the recovery target timeline ID is not given
INFO: use timeline ID of latest full backup as recovery target: 5
INFO: calculating timeline branches to be used to recovery target point
INFO: searching latest full backup which can be used as restore start point
INFO: found the full backup can be used as base in recovery: "2019-10-17 16:11:36"
INFO: copying online WAL files and server log files
INFO: clearing restore destination
INFO: validate: "2019-10-17 16:11:36" backup, archive log files and server log files by SIZE
INFO: backup "2019-10-17 16:11:36" is valid
INFO: restoring database files from the full mode backup "2019-10-17 16:11:36"
INFO: searching incremental backup to be restored
INFO: searching backup which contained archived WAL files to be restored
INFO: backup "2019-10-17 16:11:36" is valid
INFO: restoring WAL files from backup "2019-10-17 16:11:36"
INFO: restoring online WAL files and server log files
INFO: generating recovery.conf
INFO: restore complete
HINT: Recovery will start automatically when the PostgreSQL server is started.

6. 启动pg服务

-bash-4.2$ postgres -D $PGDATA &
[1] 11868
-bash-4.2$ 2019-10-17 16:19:53 CST::@:[11868]: LOG:  listening on IPv4 address "0.0.0.0", port 5432
2019-10-17 16:19:53 CST::@:[11868]: LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-10-17 16:19:53 CST::@:[11868]: LOG:  redirecting log output to logging collector process
2019-10-17 16:19:53 CST::@:[11868]: HINT:  Future log output will appear in directory "/data/PRD/postgres/10.7/pg5432/data/pg_log".

查看日志

-bash-4.2$ tail -100f postgresql-17.log
2019-10-17 16:19:53 CST::@:[11870]: LOG:  database system was interrupted; last known up at 2019-10-17 16:11:37 CST
2019-10-17 16:19:54 CST::@:[11870]: LOG:  restored log file "00000005.history" from archive
2019-10-17 16:19:54 CST::@:[11870]: LOG:  starting point-in-time recovery to 2019-10-17 16:15:00+08
2019-10-17 16:19:54 CST::@:[11870]: LOG:  restored log file "00000005.history" from archive
2019-10-17 16:19:54 CST::@:[11870]: LOG:  restored log file "00000005000000000000000A" from archive
2019-10-17 16:19:54 CST::@:[11870]: LOG:  redo starts at 0/A000028
2019-10-17 16:19:54 CST::@:[11870]: LOG:  consistent recovery state reached at 0/A000130
2019-10-17 16:19:54 CST::@:[11868]: LOG:  database system is ready to accept read only connections
2019-10-17 16:19:54 CST::@:[11870]: LOG:  restored log file "00000005000000000000000B" from archive
2019-10-17 16:19:54 CST::@:[11870]: LOG:  restored log file "00000005000000000000000C" from archive
2019-10-17 16:19:54 CST::@:[11870]: LOG:  recovery stopping before commit of transaction 578, time 2019-10-17 16:17:18.160569+08
2019-10-17 16:19:54 CST::@:[11870]: LOG:  recovery has paused  <--  注意
2019-10-17 16:19:54 CST::@:[11870]: HINT:  Execute pg_wal_replay_resume() to continue.  <--  注意
2019-10-17 16:20:25 CST:[local]:[unknown]@[unknown]:[11911]: LOG:  connection received: host=[local]
2019-10-17 16:20:25 CST:[local]:postgres@users:[11911]: LOG:  connection authorized: user=postgres database=users

注意日志里以下信息
2019-10-17 16:19:54 CST::@:[11870]: LOG: recovery has paused
2019-10-17 16:19:54 CST::@:[11870]: HINT: Execute pg_wal_replay_resume() to continue.

这与recovery.conf文件的recovery_target_action参数设置有关（默认为pause），后面文章会再提到这个参数。

需要先执行pg_wal_replay_resume()，否则在数据库查询语句会被卡住

users=# select * from test1017;
^CCancel request sent
ERROR:  canceling statement due to user request

日志里会有阻塞信息

2019-10-17 16:20:25 CST:[local]:[unknown]@[unknown]:[11911]: LOG:  connection received: host=[local]
2019-10-17 16:20:25 CST:[local]:postgres@users:[11911]: LOG:  connection authorized: user=postgres database=users
2019-10-17 16:21:12 CST:[local]:postgres@users:[11911]: LOG:  process 11911 still waiting for AccessShareLock on relation 16416 of database 16385 after 1000.120 ms at character 15
2019-10-17 16:21:12 CST:[local]:postgres@users:[11911]: DETAIL:  Process holding the lock: 11870. Wait queue: 11911.
2019-10-17 16:21:12 CST:[local]:postgres@users:[11911]: STATEMENT:  select * from test1017;
2019-10-17 16:21:26 CST:[local]:[unknown]@[unknown]:[11977]: LOG:  connection received: host=[local]

7. 执行pg_wal_replay_resume()，再查询测试表

users=# select pg_wal_replay_resume();
 pg_wal_replay_resume 
----------------------
(1 row)

users=# select * from test1017;       
  a   
------
    1
    2
    3
  444
  555
 1017
(6 rows)

成功啦~

Hehuyi_In

关注

1
点赞
踩
3

收藏

觉得还不错? 一键收藏
打赏
0
评论
pg 备份恢复（二）—— 利用pg_rman进行完整恢复及基于时间点的恢复

利用上一篇提到的pg_rman来做备份恢复测试，相对而言它的命令更简单，在紧急恢复时更方便。一、完整恢复测试1. 在users库创建一个测试表create table test1017(a int);insert into test1017 values(1); insert into test1017 values(2);insert into test1017 valu...
复制链接

扫一扫