文章目录
1、pg_resetwal
pg_resetwal 主要的工作的目的是为了在系统数据损坏的时候,通过pg_resetwal来重新标志日志的位置,让系统可以重启运行并且重置pg_control文件等
注意点:
1、这个是整个库进行了回滚,回滚到指定的事务LSN 号的位置,所以生存环境不会使用这个方法进行恢复。
2、pg_resetwal恢复之后,WAL日志整体被清理,原有的日志(单机)会被归并,原有的日志都没有了。
3、truncate无法进行恢复,因为truncate之后表对应的物理文件发生改变,回滚之后会报错ERROR: could not open file "base/13593/24761": No such file or directory
,且这样回滚之后,就只能通过备份恢复数据库了。
1.1、pg_resetwal 语法解析
[paas-sotc-telepgtest-001][telepg][/app/telepg/pg_data]$pg_resetwal --help
pg_resetwal resets the PostgreSQL write-ahead log.
Usage:
pg_resetwal [OPTION]... DATADIR
Options:
-c, --commit-timestamp-ids=XID,XID
set oldest and newest transactions bearing
commit timestamp (zero means no change)
[-D, --pgdata=]DATADIR data directory
-e, --epoch=XIDEPOCH set next transaction ID epoch
-f, --force force update to be done
-l, --next-wal-file=WALFILE set minimum starting location for new WAL
-m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID
-n, --dry-run no update, just show what would be done
-o, --next-oid=OID set next OID
-O, --multixact-offset=OFFSET set next multitransaction offset
-V, --version output version information, then exit
-x, --next-transaction-id=XID set next transaction ID
--wal-segsize=SIZE size of WAL segments, in megabytes
-?, --help show this help, then exit
Report bugs to <pgsql-bugs@lists.postgresql.org>.
1.2、pg_resetwal 恢复数据
1.2.1、创建需要的表和数据
postgres=# insert into test1 values(1,'asd');
INSERT 0 1
postgres=# insert into test1 values(2,'qwe');
INSERT 0 1
postgres=# insert into test1 values(3,'zxc');
INSERT 0 1
postgres=# select * from test1;
id | name
----+------
10 | aaaa
1 | asd
2 | qwe
3 | zxc
(4 rows)
1.2.2、delete误删和恢复
- delete数据
postgres=# delete from test1 where id>1;
DELETE 3
postgres=# select * from test1;
id | name
----+------
1 | asd
(1 row)
1.2.2.1、恢复数据
1.2.2.1.1、先确认当前的日志的以及LSN号
postgres=# select pg_current_wal_lsn(),pg_walfile_name(pg_current_wal_lsn()),pg_walfile_name_offset(pg_current_wal_lsn());
pg_current_wal_lsn | pg_walfile_name | pg_walfile_name_offset
--------------------+--------------------------+------------------------------------
20/2615C090 | 000000020000002000000026 | (000000020000002000000026,1425552)
(1 row)
1.2.2.1.2、查找误操作的事务号
这种方式仅限测试环境使用,生产环境通过这个方法比较难找到对应的事务ID;况且生产环境也不会使用pg_resetwal进行恢复误删数据
首先可以根据时间来判断大致是哪个WAL日志文件,再进行逐个日志分析查找
[pgtest-001]$pg_waldump 000000020000002000000026 >wal.log
pg_waldump: fatal: error in WAL record at 20/2615C140: invalid record length at 20/2615C178: wanted 24, got 0
有报错,先切换wal日志再进行waldump
postgres=# select pg_switch_wal();
pg_switch_wal
---------------
20/2615C190
(1 row)
解析wal日志
pg_waldump 000000020000002000000026 >wal.log
查找20/2615C090之前,相关的delete的情况
vi wal.log
/20\/2615C090
rmgr: Heap len (rec/tot): 54/ 54, tx: 863, lsn: 20/2615BF68, prev 20/2615BF30, desc: DELETE off 2 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/13593/24761 blk 0
rmgr: Heap len (rec/tot): 54/ 54, tx: 863, lsn: 20/2615BFA0, prev 20/2615BF68, desc: DELETE off 4 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/13593/24761 blk 0
rmgr: Heap len (rec/tot): 54/ 54, tx: 863, lsn: 20/2615BFD8, prev 20/2615BFA0, desc: DELETE off 5 flags 0x00 KEYS_UPDATED , blkref #0: rel 1663/13593/24761 blk 0
rmgr: Transaction len (rec/tot): 46/ 46, tx: 863, lsn: 20/2615C028, prev 20/2615BFD8, desc: COMMIT 2022-02-10 09:38:19.038597 CST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 20/2615C058, prev 20/2615C028, desc: RUNNING_XACTS nextXid 864 latestCompletedXid 863 oldestRunningXid 864
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 20/2615C090, prev 20/2615C058, desc: RUNNING_XACTS nextXid 864 latestCompletedXid 863 oldestRunningXid 864
rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 20/2615C0C8, prev 20/2615C090, desc: CHECKPOINT_ONLINE redo 20/2615C090; tli 2; prev tli 2; fpw true; xid 0:864; oid 32956; multi 1; offset 0; oldest xid 479 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 864; online
找到了事务ID863为delete的事务
1.2.2.1.3、确认事务号之后,进行恢复
1、先关闭数据库
[pgtest-001]$pg_ctl stop -D $PG_DATA
waiting for server to shut down.... done
server stopped
目前的wal日志情况
2、恢复到指定事务号
[pgtest-001]$pg_resetwal -x 863 -D $PG_DATA
Write-ahead log reset
此时的wal日志情况,发现历史的wal都被清理了
3、启动数据库,并校验是否已经恢复
启动数据库
[pgtest-001]$pg_ctl start -D $PG_DATA
waiting for server to start....2022-02-10 10:13:43.903 CST [346771] LOG: starting PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2022-02-10 10:13:43.903 CST [346771] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-02-10 10:13:43.904 CST [346771] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-02-10 10:13:43.916 CST [346771] LOG: redirecting log output to logging collector process
2022-02-10 10:13:43.916 CST [346771] HINT: Future log output will appear in directory "log".
done
server started
查看误删数据已经恢复
postgres=# select * from test1;
id | name
----+------
10 | aaaa
1 | asd
2 | qwe
3 | zxc
(4 rows)
1.2.3、truncate 恢复
实验结果:
truncate之后,物理文件被删除了,即使回滚了事务也无法恢复
postgres=# select * from test1;
ERROR: could not open file "base/13593/24761": No such file or directory
1、先truncate
postgres=# truncate table test1;
TRUNCATE TABLE
#先切换一个wal日志
postgres=# select pg_switch_wal();
pg_switch_wal
---------------
20/2615C190
(1 row)
2、查找truncate的事务号
pg_waldump 000000020000002000000050 >wal.log
3、恢复数据库
[pgtest-001]$pg_ctl stop -D $PG_DATA
waiting for server to shut down.... done
server stopped
[pgtest-001]$pg_resetwal -x 863 -D $PG_DATA
Write-ahead log reset
[pgtest-001]$pg_ctl start -D $PG_DATA
waiting for server to start....2022-02-10 10:25:13.703 CST [347853] LOG: starting PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2022-02-10 10:25:13.703 CST [347853] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-02-10 10:25:13.705 CST [347853] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-02-10 10:25:13.717 CST [347853] LOG: redirecting log output to logging collector process
2022-02-10 10:25:13.717 CST [347853] HINT: Future log output will appear in directory "log".
done
server started
4、核对数据是否恢复
发现truncate之后,物理文件被删除了,即使回滚了事务也无法恢复
postgres=# select * from test1;
ERROR: could not open file "base/13593/24761": No such file or directory
这个报错了,先drop表,再就能正常使用表了
postgres=# drop table test1;
ERROR: tuple concurrently updated
postgres=# select * from test1;
id | name
----+------
(0 rows)