南大通用GBase 8c 备份恢复技术之PITR恢复

最新推荐文章于 2024-09-08 10:40:56 发布

GBASE数据库

最新推荐文章于 2024-09-08 10:40:56 发布

阅读量1k

点赞数 8

文章标签：数据库 GBASE南大通用国产数据库 GBase

本文链接：https://blog.csdn.net/weixin_47390342/article/details/140978757

版权

原文链接：https://www.gbase.cn/community/post/4214
更多精彩内容尽在南大通用GBase技术社区，南大通用致力于成为用户最信赖的数据库产品供应商。

1 PITR 介绍

当南大通用GBase 8c因网络或主机宕机故障损坏或丢失数据时，可以通过PITR(point-in-time recovery)及时恢复。PITR是基于全量的物理备份数据文件和已归档的 wal 日志进行数据恢复，可以将数据恢复到备份归档之后的任意时间点。仅支持恢复到物理备份数据之后的某一时间点，仅主节点可以进行 PITR，备机需要全量 build 达成与主机数据同步。

PITR 恢复配置，以recovery.conf 文件中相关恢复参数为准，说明如下：

restore_command = ‘cp /mnt/server/archivedir/%f %p’ 这个 SHELL 命令是获取 WAL 文件系列中已归档的 WAL 文件。字符串中的任何一个%f 是用归档检索中的文件名替换，并且%p 是用服务器上的复制目的地的路径名替换。
archive_cleanup_command = ‘pg_archivecleanup /mnt/server/archivedir %r’ 在每次重启时会执行这个 shell 命令。 archive_cleanup_command 为清理备库不需要的归档 WAL 文件提供一个机制。任何一个%r 由包含最新可用重启点的文件名代替。这是最早的文件，因此必须保留以允许恢复能够重新启动，因此所有早于%r 的文件可以安全的移除。
recovery_target_time = ‘2020-01-01 12:00:00’ 此参数声明还原到一个指定时间戳。
recovery_target_xid = ‘3000’ 这个参数声明还原到一个事务 ID。
recovery_target_lsn = ‘0/0FFFFFF’ 这个参数声明还原到日志的指定 LSN 点。
recovery_target_inclusive = true 声明是否在指定恢复目标（true）之后停止，或在这（false）之前停止。改声明仅支持恢复目标为 recovery_target_time、recovery_target_xid 和 recovery_target_lsn 的配置。

2 实操测试环境信息

（1）环境介绍：

机器

服务类型

数据文件

172.16.5.108

数据库服务

数据目录：

/opt/database/install/data/dn/

172.16.5.109

远程备份服务器

归档路径：

/home/gbase/backup_wal/wal

备份文件路径：

/home/gbase/backup

数据库版本：GBase8cV5 S5.0.0B28

3 前期操作

（1）创建备份用户

创建user_back用户用于备份，并设置远程备份权限：

[gbase@gbase8c_5_108 ~]$ gsql -p 5432 -d postgres -r
gsql ((single_node GBase8cV5 S5.0.0B28 build 51dce1ce) compiled at 2024-06-21 17:06:17 commit 0 last mr 587 )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

postgres=# create user user_backup  sysadmin password 'gbase;123';
NOTICE:  The encrypted password contains MD5 ciphertext, which is not secure.
CREATE ROLE
postgres=#

[gbase@gbase8c_5_109 ~]$ gs_guc reload -Z datanode -N all -I all -h "host    replication     user_backup  172.16.5.109/32  sha256"
The gs_guc run with the following arguments: [gs_guc -Z datanode -N all -I all -h host    replication     user_backup  172.16.5.109/32  sha256 reload ].
Begin to perform the total nodes: 1.
Popen count is 1, Popen success count is 1, Popen failure count is 0.
Begin to perform gs_guc for datanodes.
Command count is 1, Command success count is 1, Command failure count is 0.
Total instances: 1.
ALL: Success to perform gs_guc!
[gbase@gbase8c_5_109 ~]$

（2）设置归档路径

开启归档并设置归档命令：


[gbase@gbase8c_5_108 dn]$ gs_guc reload -Z datanode -N all -I all -c "archive_mode=on"
The gs_guc run with the following arguments: [gs_guc -Z datanode -N all -I all -c archive_mode=on reload ].
NOTICE: When wal_level set to minimal, parameters archive_mode can not be used.
Begin to perform the total nodes: 1.
Popen count is 1, Popen success count is 1, Popen failure count is 0.
Begin to perform gs_guc for datanodes.
Command count is 1, Command success count is 1, Command failure count is 0.

Total instances: 1. Failed instances: 0.
ALL: Success to perform gs_guc!


[gbase@gbase8c_5_108 dn]$ gs_guc reload -Z datanode -N all -I all -c "archive_command='scp %p 172.16.5.109:/home/gbase/backup_wal/wal/%f'"
The gs_guc run with the following arguments: [gs_guc -Z datanode -N all -I all -c archive_command='scp %p 172.16.5.109:/home/gbase/backup_wal/wal/%f' reload ].
Begin to perform the total nodes: 1.
Popen count is 1, Popen success count is 1, Popen failure count is 0.
Begin to perform gs_guc for datanodes.
Command count is 1, Command success count is 1, Command failure count is 0.

Total instances: 1. Failed instances: 0.
ALL: Success to perform gs_guc!

（3）初始化备份目录

初始化备份目录，在172.16.5.109节点上执行如下操作：

[gbase@gbase8c_5_109 ~]$ gs_probackup init -B /home/gbase/backup
INFO: Backup catalog '/home/gbase/backup' successfully inited
[gbase@gbase8c_5_109 ~]$ gs_probackup add-instance -B /home/gbase/backup/ -D /opt/database/install/data/dn --instance 102_bak --remote-host=172.16.5.108
LOG: Start SSH client process, pid 32606
INFO: Instance '102_bak' successfully inited
[gbase@gbase8c_5_109 ~]$ cd backup

4 全量备份

（1）使用 gs_basebackup 进行物理全备

[gbase@gbase8c_5_109 ~]$  gs_probackup backup -B /home/gbase/backup/ --instance 102_bak -b FULL -U user_backup -d postgres -W 'gbase;123' -p 5432 --remote-host=172.16.5.108
INFO: Backup start, gs_probackup version: 2.4.2, instance: 102_bak, backup ID: SHR2BB, start_time:1722869111, backup mode: FULL, wal mode: STREAM, remote: true, compress-algorithm: none, compress-level: 1
LOG: Backup destination is initialized
LOG: This openGauss instance was initialized with data block checksums. Data block corruption will be detected
LOG: Start SSH client process, pid 19710
LOG: Database backup start
LOG: started streaming WAL at 0/3C000000 (timeline 1)
[2024-08-05 22:45:12]: check identify system success
[2024-08-05 22:45:12]: send START_REPLICATION 0/3C000000 success
[2024-08-05 22:45:12]: keepalive message is received
LOG: SSH process 19710 is terminated with status 0
INFO: PGDATA size: 651MB
INFO: Start transferring data files
LOG: Start SSH client process, pid 19738
LOG: Creating page header map "/home/gbase/backup/backups/102_bak/SHR2BB/page_header_map"
[2024-08-05 22:45:13]: keepalive message is received
[2024-08-05 22:45:18]: keepalive message is received
[2024-08-05 22:45:23]: keepalive message is received
[2024-08-05 22:45:28]: keepalive message is received
LOG: SSH process 19738 is terminated with status 0
INFO: Data files are transferred, time elapsed: 17s
INFO: wait for pg_stop_backup()
INFO: pg_stop backup() successfully executed
LOG: stop_lsn: 0/3C0001E8
LOG: Looking for LSN 0/3C0001E8 in segment: 00000001000000000000003C
LOG: Found WAL segment: /home/gbase/backup/backups/102_bak/SHR2BB/database/pg_xlog/00000001000000000000003C
LOG: Thread [0]: Opening WAL segment "/home/gbase/backup/backups/102_bak/SHR2BB/database/pg_xlog/00000001000000000000003C"
LOG: Found LSN: 0/3C0001E8
LOG: finished streaming WAL at 0/3D000000 (timeline 1)
LOG: Getting the Recovery Time from WAL
LOG: Thread [0]: Opening WAL segment "/home/gbase/backup/backups/102_bak/SHR2BB/database/pg_xlog/00000001000000000000003C"
INFO: Syncing backup files to disk
INFO: Backup files are synced, time elapsed: 0
INFO: Validating backup SHR2BB
INFO: Backup SHR2BB data files are valid
INFO: Backup SHR2BB resident size: 667MB
INFO: Backup SHR2BB completed
[gbase@gbase8c_5_109 ~]$

（2）查看备份文件


[gbase@gbase8c_5_109 ~]$ gs_probackup show -B /home/gbase/backup/ --instance 102_bak  --remote-host=172.16.5.108
===================================================================================================================================
 Instance Version  ID      Recovery Time           Mode WAL Mode  TLI  Time  Data   WAL  Zratio Start LSN   Stop LSN    Status
===================================================================================================================================
 102_bak  9.2      SHR2BB  2024-08-05 22:45:29+08  FULL STREAM    1/0   23s 651MB  16MB    1.00 0/3C000028  0/3C0001E8  OK
[gbase@gbase8c_5_109 ~]$

（3）备份后检查备份文件的有效性validate

[gbase@gbase8c_5_109 ~]$ gs_probackup validate -B /home/gbase/backup/ --instance 102_bak --remote-host=172.16.5.108
INFO: Validate backups of the instance '102_bak'
WARNING: Process 14058 which used backup SHR2BB no longer exists
INFO: Validating backup SHR2BB
INFO: Backup SHR2BB data files are valid
LOG: Thread [1]: Opening WAL segment "/home/gbase/backup/backups/102_bak/SHR2BB/database/pg_xlog/00000001000000000000003C"
INFO: Backup SHR2BB WAL segments are valid
INFO: All backups are valid
[gbase@gbase8c_5_109 ~]$

（4）登录数据库进行如下一系列操作操作：

T1时间：创建test_backup_tb表

[gbase@gbase8c_5_108 ~]$ gsql -p 5432 -d postgres -r
gsql ((single_node GBase8cV5 S5.0.0B28 build 51dce1ce) compiled at 2024-06-21 17:06:17 commit 0 last mr 587 )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

postgres=# \d+ test_addback_tb
Did not find any relation named "test_addback_tb".
postgres=#
postgres=#
postgres=# begin ;
BEGIN
postgres=#  create table test_back_tb(id int,info timestamp);  --创建test_back_tb                                                                                  CREATE TABLE
postgres=# SELECT txid_current(); -- 查看事务号
 txid_current
--------------
       161156
(1 row)

postgres=# SELECT pg_current_xlog_location(); --查看lsn
 pg_current_xlog_location
--------------------------
 0/3D002D70
(1 row)

postgres=# commit;
COMMIT
postgres=# select now();   -- 记录时间
              now
-------------------------------
 2024-08-05 22:58:33.684047+08
(1 row)

postgres=#

T2时间：向test_back_tb插入1000条数据：

postgres=# begin ;
BEGIN
postgres=# insert into test_back_tb select generate_series(1,1000),now();  --向test_back_tb插入1000条数据
INSERT 0 1000
postgres=# SELECT txid_current(); -- 查看事务号
 txid_current
--------------
       161157
(1 row)

postgres=# SELECT pg_current_xlog_location(); --查看lsn
 pg_current_xlog_location
--------------------------
 0/3D01AC18
(1 row)

postgres=# commit;
COMMIT
postgres=# select pg_switch_xlog();
 pg_switch_xlog
----------------
 0/3D01ACC0
(1 row)

postgres=# select now();   -- 记录时间
              now
-------------------------------
 2024-08-05 23:00:45.544532+08
(1 row)

postgres=#

T3时间：继续插入201条数据

postgres=# begin ;
BEGIN
postgres=#  insert into test_back_tb select generate_series(1001,1201),now();  --继续test_back_tb插入201条数据
INSERT 0 201
postgres=# SELECT txid_current(); -- 查看事务号
 txid_current
--------------
       161158
(1 row)

postgres=# SELECT pg_current_xlog_location(); --查看lsn
 pg_current_xlog_location
--------------------------
 0/3E004E00
(1 row)

postgres=# commit;
COMMIT
postgres=# select pg_switch_xlog();
 pg_switch_xlog
----------------
 0/3E004EA8
(1 row)

postgres=# select now();   -- 记录时间
              now
-------------------------------
 2024-08-05 23:02:15.192332+08
(1 row)

postgres=#

T4时间：继续向test_back_tb插入101条数据

postgres=#  begin ;
BEGIN
postgres=# insert into test_back_tb select generate_series(1201,1301),now();  --继续test_back_tb插入101条数据
INSERT 0 101
postgres=# SELECT txid_current(); -- 查看事务号
 txid_current
--------------
       161159
(1 row)

postgres=# SELECT pg_current_xlog_location(); --查看lsn
 pg_current_xlog_location
--------------------------
 0/3F002868
(1 row)

postgres=# commit;
COMMIT
postgres=# select pg_switch_xlog();
 pg_switch_xlog
----------------
 0/3F002A38
(1 row)

postgres=#  select now();   -- 记录时间
             now
------------------------------
 2024-08-05 23:04:23.28832+08
(1 row)

postgres=#

5 PITR恢复

（1）全量恢复

停止数据，并将原库内数据删除，随后使用gs_probackup restore将数据恢复到全量备份时刻。

[gbase@gbase8c_5_108 ~]$ gs_om -t stop
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.


[gbase@gbase8c_5_108 data]$ cd /opt/database/install/data
[gbase@gbase8c_5_108 data]$ ls
dn
[gbase@gbase8c_5_108 data]$ rm -rf dn/
[gbase@gbase8c_5_108 data]$ ls
[gbase@gbase8c_5_108 data]$

gs_proback restore全量恢复：

[gbase@gbase8c_5_109 backup_wal]$ gs_probackup show -B /home/gbase/backup/ --instance 102_bak --remote-host=172.16.5.108
===================================================================================================================================
 Instance  Version ID      Recovery Time           Mode WAL Mode  TLI  Time  Data   WAL  Zratio Start LSN   Stop LSN    Status
===================================================================================================================================
 102_bak   9.2     SHR2BB  2024-08-05 22:45:29+08  FULL  STREAM   1/0   23s  651MB 16MB    1.00  0/3C000028 0/3C0001E8  OK
[gbase@gbase8c_5_109 backup_wal]$ gs_probackup restore -B /home/gbase/backup/ --instance 102_bak -i SHR2BB --remote-host=172.16.5.108
LOG: Start SSH client process, pid 14060
LOG: Restore begin.
LOG: there is no file tablespace_map
LOG: check tablespace directories of backup SHR2BB
LOG: check external directories of backup SHR2BB
WARNING: Process 19704 which used backup SHR2BB no longer exists
INFO: Validating backup SHR2BB
INFO: Backup SHR2BB data files are valid
LOG: Thread [1]: Opening WAL segment "/home/gbase/backup/backups/102_bak/SHR2BB/database/pg_xlog/00000001000000000000003C"
INFO: Backup SHR2BB WAL segments are valid
INFO: Backup SHR2BB is valid.
INFO: Restoring the database from backup at 2024-08-05 22:45:11+08
LOG: there is no file tablespace_map
LOG: Restore directories and symlinks...
LOG: SSH process 14060 is terminated with status 0
INFO: Start restoring backup files. PGDATA size: 667MB
LOG: Start thread 1
LOG: Start SSH client process, pid 14197
LOG: SSH process 14197 is terminated with status 0
INFO: Backup files are restored. Transfered bytes: 667MB, time elapsed: 18s
INFO: Restore incremental ratio (less is better): 100% (667MB/667MB)
INFO: Syncing restored files to disk
LOG: Start SSH client process, pid 14856
INFO: Restored backup files are synced, time elapsed: 2s
LOG: SSH process 14856 is terminated with status 0
INFO: Restore of backup SHR2BB completed.
[gbase@gbase8c_5_109 backup_wal]$

（2）PITR数据恢复

基于时间点恢复到T1时刻全量恢复后，不启动数据库，先编辑recovery.conf文件，指定恢复到T1时刻，并将recovery.conf放到数据库目录下（/opt/database/install/data/dn/）;随后启动数据库，查看数据库内数据恢复至T1时刻。

[gbase@gbase8c_5_108 ~]$ cat /opt/database/install/data/dn/recovery.conf
restore_command = 'scp 172.16.5.109:/home/gbase/backup_wal/wal/%f %p'
#archive_cleanup_command = 'pg_archivecleanup /home/gbase/backup/wal/102_bak  %r'
#recovery_target_xid = '118329'     ## 恢复到指定时间点 test_addback_time 1000条数据
recovery_target_time='2024-08-05 22:58:33.684047'
#recovery_target_lsn='0/31004CE0'
recovery_target_inclusive = true

启动数据库：

[gbase@gbase8c_5_108 ~]$ gs_om -t start
Starting cluster.
=========================================
[SUCCESS] gbase8c_5_108
2024-08-05 23:17:29.191 66b0ed09.1 [unknown] 140681269002752 [unknown] 0 dn_6001 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2024-08-05 23:17:29.191 66b0ed09.1 [unknown] 140681269002752 [unknown] 0 dn_6001 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2024-08-05 23:17:29.194 66b0ed09.1 [unknown] 140681269002752 [unknown] 0 dn_6001 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4472 Mbytes) is larger.
=========================================
Successfully started.

查看此时数据库状态：

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

postgres=#



#查看数据目录，会发现在数据目录下生成一个文件recovery.done
[gbase@gbase8c_5_108 ~]$ cat /opt/database/install/data/dn/recovery.done
restore_command = 'scp 172.16.5.109:/home/gbase/backup_wal/wal/%f %p'
#archivease/backup/wal/102_bak  %r'
#recovery_target_xid = '118329'     ## 恢复到指定时间点 test_addback_time 1000条数据
recovery_target_time='2024-08-05 22:58:33.684047'
#recovery_target_lsn='0/31004CE0'
recovery_target_inclusive = true

[gbase@gbase8c_5_108 ~]$

基于XID恢复到T2时刻

关闭数据库，重复上述步骤，编辑recovery.conf文件，恢复类型选择基于xid。

[gbase@gbase8c_5_108 ~]$ gs_om -t stop
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.


[gbase@gbase8c_5_108 ~]$ cat /opt/database/install/data/dn/recovery.conf
restore_command = 'scp 172.16.5.109:/home/gbase/backup_wal/wal/%f %p'
#archive_cleanup_command = 'pg_archivecleanup /home/gbase/backup/wal/102_bak  %r'
recovery_target_xid = '161157'      ## 恢复到指定时间点 test_addback_time 1000条数据
#recovery_target_time='2024-08-05 22:58:33.684047'
#recovery_target_lsn='0/31004CE0'
recovery_target_inclusive = true

[gbase@gbase8c_5_108 ~]$

开启数据库，校验数据恢复至T2时间

[gbase@gbase8c_5_108 ~]$ gsql -p 5432 -d postgres -r
gsql ((single_node GBase8cV5 S5.0.0B28 build 51dce1ce) compiled at 2024-06-21 17:06:17 commit 0 last mr 587 )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

postgres=# select count(*) from test_back_tb ;
 count
-------
  1000
(1 row)

postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

postgres=#

基于LSN恢复到T3时刻

关闭数据库，重复上述步骤，编辑recovery.conf文件，恢复类型选择基于xid。

[gbase@gbase8c_5_108 ~]$ gs_om -t stop
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.
[gbase@gbase8c_5_108 ~]$ vim recovery.conf
[gbase@gbase8c_5_108 ~]$ vim recovery.conf
[gbase@gbase8c_5_108 ~]$ cp recovery.conf /opt/database/install/data/dn/
[gbase@gbase8c_5_108 ~]$ cat /opt/database/install/data/dn/recovery.
cat: /opt/database/install/data/dn/recovery.: No such file or directory
[gbase@gbase8c_5_108 ~]$ cat /opt/database/install/data/dn/recovery.conf
restore_command = 'scp 172.16.5.109:/home/gbase/backup_wal/wal/%f %p'
#archive_cleanup_command = 'pg_archivecleanup /home/gbase/backup/wal/102_bak  %r'
#recovery_target_xid = '161157'     ## 恢复到指定时间点 test_addback_time 1000条数据
#recovery_target_time='2024-08-05 22:58:33.684047'
recovery_target_lsn='0/3E004E00'
recovery_target_inclusive = true

[gbase@gbase8c_5_108 ~]$

启动数据库校验数据：

[gbase@gbase8c_5_108 ~]$ gs_om -t start
Starting cluster.
=========================================
[SUCCESS] gbase8c_5_108
2024-08-05 23:30:22.969 66b0f00e.1 [unknown] 140439867388416 [unknown] 0 dn_6001 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2024-08-05 23:30:22.969 66b0f00e.1 [unknown] 140439867388416 [unknown] 0 dn_6001 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2024-08-05 23:30:22.971 66b0f00e.1 [unknown] 140439867388416 [unknown] 0 dn_6001 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4472 Mbytes) is larger.
=========================================
Successfully started.
[gbase@gbase8c_5_108 ~]$ gsql -p 5432 -d postgres -r
gsql ((single_node GBase8cV5 S5.0.0B28 build 51dce1ce) compiled at 2024-06-21 17:06:17 commit 0 last mr 587 )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

postgres=# select count(*) from test_back_tb ;
 count
-------
  1201
(1 row)

（3）恢复完成之后，向外提供服务

发现数据恢复到了最新之后，数据库就不会处于恢复状态，可以进行正常的读写；如果数据库处于恢复状态，想让它向外提供服务的话，可以执行如下命令：

select pg_is_in_recovery();  #查看数据库是否处于恢复状态
select pg_xlog_replay_resume(); # 停止恢复状态

原文链接：https://www.gbase.cn/community/post/4214
更多精彩内容尽在南大通用GBase技术社区，南大通用致力于成为用户最信赖的数据库产品供应商。

GBASE数据库

关注

8
点赞
踩
23

收藏

觉得还不错? 一键收藏
0
评论
南大通用GBase 8c 备份恢复技术之PITR恢复

当南大通用GBase 8c因网络或主机宕机故障损坏或丢失数据时，可以通过PITR(point-in-time recovery)及时恢复。PITR是基于全量的物理备份数据文件和已归档的 wal 日志进行数据恢复，可以将数据恢复到备份归档之后的任意时间点。仅支持恢复到物理备份数据之后的某一时间点，仅主节点可以进行 PITR，备机需要全量 build 达成与主机数据同步。
复制链接

扫一扫