PostgreSQL12通过pg_resetwal进行误删恢复

面子拿钱砸

已于 2023-09-08 16:24:42 修改

阅读量1.2w

点赞数 4

分类专栏： postgresql 文章标签： postgresql 数据库 database

于 2022-02-10 11:38:22 首次发布

本文链接：https://blog.csdn.net/weixin_42583514/article/details/122857340

版权

postgresql 专栏收录该内容

7 篇文章

订阅专栏

本文详细介绍了PostgreSQL的pg_resetwal工具，用于在数据损坏时恢复。它能重置WAL日志位置，但不适用于生产环境的数据恢复。在测试环境中，通过pg_resetwal可以恢复delete误操作，但无法恢复truncate操作，因为truncate会改变表的物理文件。在恢复过程中，需要确定误操作的事务号，然后使用pg_resetwal回滚到指定事务，最后重启数据库验证恢复效果。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

文章目录

1、pg_resetwal

1、pg_resetwal

pg_resetwal 主要的工作的目的是为了在系统数据损坏的时候，通过pg_resetwal来重新标志日志的位置，让系统可以重启运行并且重置pg_control文件等

注意点：

1、这个是整个库进行了回滚，回滚到指定的事务LSN 号的位置，所以生存环境不会使用这个方法进行恢复。

2、pg_resetwal恢复之后，WAL日志整体被清理，原有的日志（单机）会被归并，原有的日志都没有了。

3、truncate无法进行恢复，因为truncate之后表对应的物理文件发生改变，回滚之后会报错ERROR: could not open file "base/13593/24761": No such file or directory ，且这样回滚之后，就只能通过备份恢复数据库了。

1.1、pg_resetwal 语法解析

[paas-sotc-telepgtest-001][telepg][/app/telepg/pg_data]$pg_resetwal --help
pg_resetwal resets the PostgreSQL write-ahead log.

Usage:
  pg_resetwal [OPTION]... DATADIR

Options:
  -c, --commit-timestamp-ids=XID,XID
                                 set oldest and newest transactions bearing
                                 commit timestamp (zero means no change)
 [-D, --pgdata=]DATADIR          data directory
  -e, --epoch=XIDEPOCH           set next transaction ID epoch
  -f, --force                    force update to be done
  -l, --next-wal-file=WALFILE    set minimum starting location for new WAL
  -m, --multixact-ids=MXID,MXID  set next and oldest multitransaction ID
  -n, --dry-run                  no update, just show what would be done
  -o, --next-oid=OID             set next OID
  -O, --multixact-offset=OFFSET  set next multitransaction offset
  -V, --version                  output version information, then exit
  -x, --next-transaction-id=XID  set next transaction ID
      --wal-segsize=SIZE         size of WAL segments, in megabytes
  -?, --help                     show this help, then exit

Report bugs to <pgsql-bugs@lists.postgresql.org>.

1.2、pg_resetwal 恢复数据

1.2.1、创建需要的表和数据

postgres=# insert into test1 values(1,'asd');
INSERT 0 1
postgres=# insert into test1 values(2,'qwe');
INSERT 0 1
postgres=# insert into test1 values(3,'zxc');
INSERT 0 1
postgres=# select * from test1;              
 id | name 
----+------
 10 | aaaa
  1 | asd
  2 | qwe
  3 | zxc
(4 rows)

1.2.2、delete误删和恢复

delete数据

postgres=# delete from test1 where id>1;
DELETE 3
postgres=# select * from test1;         
 id | name 
----+------
  1 | asd
(1 row)

1.2.2.1、恢复数据

1.2.2.1.1、先确认当前的日志的以及LSN号

postgres=# select pg_current_wal_lsn(),pg_walfile_name(pg_current_wal_lsn()),pg_walfile_name_offset(pg_current_wal_lsn());
 pg_current_wal_lsn |     pg_walfile_name      |       pg_walfile_name_offset       
--------------------+--------------------------+------------------------------------
 20/2615C090        | 000000020000002000000026 | (000000020000002000000026,1425552)
(1 row)

1.2.2.1.2、查找误操作的事务号

这种方式仅限测试环境使用，生产环境通过这个方法比较难找到对应的事务ID；况且生产环境也不会使用pg_resetwal进行恢复误删数据

首先可以根据时间来判断大致是哪个WAL日志文件，再进行逐个日志分析查找

[pgtest-001]$pg_waldump 000000020000002000000026 >wal.log
pg_waldump: fatal: error in WAL record at 20/2615C140: invalid record length at 20/2615C178: wanted 24, got 0

有报错，先切换wal日志再进行waldump

postgres=# select pg_switch_wal();
 pg_switch_wal 
---------------
 20/2615C190
(1 row)

解析wal日志

pg_waldump 000000020000002000000026 >wal.log

查找20/2615C090之前，相关的delete的情况

vi wal.log
/20\/2615C090

rmgr: Heap        len (rec/tot):     54/    54, tx:        863, lsn: 20/2615BF68, prev 20/2615BF30, desc: DELETE off 2 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/13593/24761 blk 0
rmgr: Heap        len (rec/tot):     54/    54, tx:        863, lsn: 20/2615BFA0, prev 20/2615BF68, desc: DELETE off 4 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/13593/24761 blk 0
rmgr: Heap        len (rec/tot):     54/    54, tx:        863, lsn: 20/2615BFD8, prev 20/2615BFA0, desc: DELETE off 5 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/13593/24761 blk 0
rmgr: Transaction len (rec/tot):     46/    46, tx:        863, lsn: 20/2615C028, prev 20/2615BFD8, desc: COMMIT 2022-02-10 09:38:19.038597 CST
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 20/2615C058, prev 20/2615C028, desc: RUNNING_XACTS nextXid 864 latestCompletedXid 863 oldestRunningXid 864
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn: 20/2615C090, prev 20/2615C058, desc: RUNNING_XACTS nextXid 864 latestCompletedXid 863 oldestRunningXid 864
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 20/2615C0C8, prev 20/2615C090, desc: CHECKPOINT_ONLINE redo 20/2615C090; tli 2; prev tli 2; fpw true; xid 0:864; oid 32956; multi 1; offset 0; oldest xid 479 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 864; online

找到了事务ID863为delete的事务
在这里插入图片描述

1.2.2.1.3、确认事务号之后，进行恢复

1、先关闭数据库

[pgtest-001]$pg_ctl stop -D $PG_DATA
waiting for server to shut down.... done
server stopped

目前的wal日志情况
在这里插入图片描述

2、恢复到指定事务号

[pgtest-001]$pg_resetwal -x 863 -D $PG_DATA
Write-ahead log reset

此时的wal日志情况，发现历史的wal都被清理了

3、启动数据库，并校验是否已经恢复

启动数据库

[pgtest-001]$pg_ctl start -D $PG_DATA  
waiting for server to start....2022-02-10 10:13:43.903 CST [346771] LOG:  starting PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2022-02-10 10:13:43.903 CST [346771] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-02-10 10:13:43.904 CST [346771] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-02-10 10:13:43.916 CST [346771] LOG:  redirecting log output to logging collector process
2022-02-10 10:13:43.916 CST [346771] HINT:  Future log output will appear in directory "log".
 done
server started

查看误删数据已经恢复

postgres=# select * from test1;
 id | name 
----+------
 10 | aaaa
  1 | asd
  2 | qwe
  3 | zxc
(4 rows)

1.2.3、truncate 恢复

实验结果：

truncate之后，物理文件被删除了，即使回滚了事务也无法恢复

postgres=# select * from test1;
ERROR:  could not open file "base/13593/24761": No such file or directory

1、先truncate

postgres=# truncate table test1;
TRUNCATE TABLE

#先切换一个wal日志
postgres=# select pg_switch_wal();
 pg_switch_wal 
---------------
 20/2615C190
(1 row)

2、查找truncate的事务号

pg_waldump 000000020000002000000050 >wal.log

在这里插入图片描述

3、恢复数据库

[pgtest-001]$pg_ctl stop -D $PG_DATA
waiting for server to shut down.... done
server stopped

[pgtest-001]$pg_resetwal -x 863 -D $PG_DATA
Write-ahead log reset

[pgtest-001]$pg_ctl start -D $PG_DATA
waiting for server to start....2022-02-10 10:25:13.703 CST [347853] LOG:  starting PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2022-02-10 10:25:13.703 CST [347853] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2022-02-10 10:25:13.705 CST [347853] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-02-10 10:25:13.717 CST [347853] LOG:  redirecting log output to logging collector process
2022-02-10 10:25:13.717 CST [347853] HINT:  Future log output will appear in directory "log".
 done
server started

4、核对数据是否恢复

发现truncate之后，物理文件被删除了，即使回滚了事务也无法恢复

postgres=# select * from test1;
ERROR:  could not open file "base/13593/24761": No such file or directory

这个报错了，先drop表，再就能正常使用表了

postgres=# drop table test1;
ERROR:  tuple concurrently updated
postgres=# select * from test1;
 id | name 
----+------
(0 rows)