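Handling deleted files in the pg_wal directory
Environment: PostgreSQL 14.13
Simulate the loss of WAL files by stopping the server and deleting the segments (run from the data directory):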
pg_ctl stop -m fast
cd pg_wal
rm 00*
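Since this is only a simulation, it can help to copy pg_wal aside before deleting anything, so the test can be undone. The following is a minimal sketch; the function name backup_wal_dir and the backup location are hypothetical and not part of PostgreSQL.

```shell
# Hypothetical helper: copy the contents of pg_wal aside before deleting from it.
# Usage: backup_wal_dir <data_directory> <backup_directory>
backup_wal_dir() {
    data_dir=$1
    backup_dir=$2
    # Refuse to run if the data directory has no pg_wal subdirectory.
    [ -d "$data_dir/pg_wal" ] || { echo "no pg_wal under $data_dir" >&2; return 1; }
    mkdir -p "$backup_dir/pg_wal" || return 1
    # Copy the directory contents; the trailing "/." also works when pg_wal is a symlink.
    # -a preserves permissions and timestamps.
    cp -a "$data_dir/pg_wal/." "$backup_dir/pg_wal/"
}
```

Run it after pg_ctl stop and before rm, for example: backup_wal_dir /data/postgres/14.13/data /tmp/wal_backup (the /tmp/wal_backup path is an assumption).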
Start the database:
pg_ctl -l logfile start
Output:
waiting for server to start.... stopped waiting
pg_ctl: could not start server
Examine the log output.
Check the log file, logfile:
15:45:40.518 CST [48241] LOG: starting PostgreSQL 14.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
15:45:40.519 CST [48241] LOG: listening on IPv6 address "::1", port 5432
15:45:40.519 CST [48241] LOG: listening on IPv4 address "127.0.0.1", port 5432
15:45:40.520 CST [48241] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
15:45:40.545 CST [48242] LOG: database system was shut down at 2024-10-10 15:44:21 CST
15:45:40.545 CST [48242] LOG: invalid primary checkpoint record
15:45:40.545 CST [48242] PANIC: could not locate a valid checkpoint record
15:45:40.854 CST [48241] LOG: startup process (PID 48242) was terminated by signal 6: Aborted
15:45:40.854 CST [48241] LOG: aborting startup due to startup process failure
15:45:40.868 CST [48241] LOG: database system is shut down
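The PANIC line is the signature of this failure. A small check for it can be sketched as below; the function name checkpoint_record_missing is hypothetical, and the pattern assumes the English message text shown in the log above.

```shell
# Hypothetical helper: succeed if the server log contains the startup PANIC
# reported when no valid checkpoint record can be found.
# Usage: checkpoint_record_missing <logfile>
checkpoint_record_missing() {
    # Allow one or more spaces after "PANIC:" because log spacing can vary.
    grep -Eq 'PANIC: +could not locate a valid checkpoint record' "$1"
}
```

For example: checkpoint_record_missing logfile && echo "checkpoint record not found, inspect pg_wal".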
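Recovery:
When files in pg_wal have been deleted and the server can no longer locate a valid checkpoint record, run pg_resetwal as the postgres user to reset the WAL and the control information, then restart the database. Because the server was shut down cleanly before the files were removed here, no committed changes existed only in the deleted WAL. After an unclean shutdown, pg_resetwal can leave the data inconsistent and should be treated as a last resort.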
pg_resetwal -f /data/postgres/14.13/data/
The database then starts successfully:
pg_ctl -l logfile start
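A more cautious variant of the recovery is to inspect the control file and preview the reset before applying it. The sketch below uses pg_controldata and the -n (dry-run) option of pg_resetwal; the function name preview_resetwal is hypothetical, and the parsing assumes English output from pg_controldata.

```shell
# Hypothetical helper: preview what pg_resetwal would do, without changing anything.
# Usage: preview_resetwal <data_directory>
preview_resetwal() {
    data_dir=$1
    # pg_controldata reads pg_control; a cleanly stopped server reports "shut down".
    state=$(pg_controldata "$data_dir" | sed -n 's/^Database cluster state: *//p')
    echo "cluster state: $state"
    if [ "$state" != "shut down" ]; then
        echo "server was not shut down cleanly; a reset may lose committed data" >&2
        return 1
    fi
    # -n prints the values pg_resetwal would write and exits without modifying files.
    pg_resetwal -n "$data_dir"
}
```

For example, run preview_resetwal /data/postgres/14.13/data/ as the postgres user and review the printed values before running pg_resetwal for real.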