Postgresql 版本 10.4
系统平台: contos6.5
1.开启postgres的WAL归档
WAL归档是记录对数据库的一些操作的序列文件,postgres将最近的WAL文件保存在 pg_wal 中,所以我们还需要设置在产生WAL文件时将其持续保存到远程存储或者在本地的另外一个地方。WAL文件每个16MB,postgres自动进行分割。
1.1.在数据目录/opt/pg10.4/data 中找到posstgresql.conf配置文件(数据目录可能不同),这里需要设置参数:
-
wal_level
设置保存操作日志的具体程度级别,要设置WAL归档至少要设置为replica -
archive_mode
是否启用WAL归档 , 设置为on -
archive_command
执行本地shell命令来归档已完成的WAL文件段,
%p表示每一个产生的WAL文件位置,%f表示该文件名, -
以下命令表示把新产生的WAL文件转储到指定目录中
后面的部分表示将正常输出,错误输出重定向到日志文件中,方便查看错误情况
'cp %p /opt/pg10.4/archivedir/%f > /opt/pg10.4/archive.log 2>&1'
注意:确保/opt/pg10.4/archivedir里面是空的
-
archive_timeout
archive_timeout强制服务器定期切换到新的WAL段文件,以至于WAL文件不会太少,这是可选设置,不强求。 -
演示
[peter@localhost bin]$ cd /opt/pg
pg10.4/ pg11.1/ pg9.6.0/
[peter@localhost bin]$ cd /opt/pg10.4/
[peter@localhost pg10.4]$ ls
bin data env.sh include lib share
[peter@localhost pg10.4]$ cd data
[peter@localhost data]$ ls
base pg_ident.conf pg_serial pg_tblspc postgresql.auto.conf
global pg_logical pg_snapshots pg_twophase postgresql.conf
pg_commit_ts pg_multixact pg_stat PG_VERSION postgresql.conf~
pg_dynshmem pg_notify pg_stat_tmp pg_wal postmaster.opts
pg_hba.conf pg_replslot pg_subtrans pg_xact
[peter@localhost data]$ vim postgresql.conf
1.2.在成功配置posstgresql之后,在/opt/pg10.4/archivedir下就能看到WAL归档文件,如果看不到,不用急,因为测试数据少,配置之后数据库操作不多,等一会就能看到。
这里新建一个表,里面插入5条数据,做数据备份/恢复验证使用。
create table test(id int,name varchar);
insert into test values(1,'name1');
insert into test values(2,'name2');
insert into test values(3,'name3');
insert into test values(4,'name4');
insert into test values(5,'name5');
2.进行文件系统级别的备份
对postgres的数据库的数据目录 (我的是/opt/pg10.4/data)进行压缩备份,作为系统的base backup
2.1.在开始备份前,首先要在日志文件中做好标签
-
使用默认账号,登入psql,输入
select pg_start_backup('fullbackup');
-
演示:
[peter@localhost Desktop]$ cd /opt/pg10.4/bin/
[peter@localhost bin]$ ./runsql.sh
psql (10.4)
Type "help" for help.
postgres=# select pg_start_backup('fullbackup');
pg_start_backup
.-----------------
0/2000028
(1 row)
postgres=#
做完备份开始标志之后我们可以对数据库做一些操作,这里我们再插入一条id为6的数据,到时候验证在备份过程中插入的数据最后是否能通过wal日志还原回来。
insert into test values(6,'name6');
2.2.开始备份文件系统,在另一终端中,压缩备份postgres的数据目录
[peter@localhost pg10.4]$ tar -cvzf /opt/pg10.4/pgdatabp.tar /opt/pg10.4/data/
[peter@localhost pg10.4]$ ls
archivedir bin data include share
archive.log env.sh lib pgdatabp.tar
2.3.结束备份,在刚刚的数据库连接中输入
select pg_stop_backup();
- 演示:
postgres=# select pg_stop_backup();
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
pg_stop_backup
.----------------
0/20003C8
(1 row)
postgres=#
- 这时候我们可以看到之前的archivedir里面就有wal的日志文件了。
[peter@localhost archivedir]$ ls
000000010000000000000001 000000010000000000000002.00000028.backup
000000010000000000000002
[peter@localhost archivedir]$ ls -l
total 32772
-rw-------. 1 peter peter 16777216 Mar 20 18:38 000000010000000000000001
-rw-------. 1 peter peter 16777216 Mar 20 18:44 000000010000000000000002
-rw-------. 1 peter peter 294 Mar 20 18:44 000000010000000000000002.00000028.backup
[peter@localhost archivedir]$
2.4.到此时已经完成数据库的基本备份(连同整个数据库的配置,数据等),可以进行一些数据库操作,记录这些操作进行的时间,以便之后进行恢复系统可以看到结果
这里我们插入两条数据,id为7和8。这里的两条记录插入的时间我没有记录,待会恢复的时候我可以直接恢复到最新。
insert into test values(7,'name7');
insert into test values(8,'name8');
3.恢复系统
使用之前备份过的data文件夹替换原来的,添加recovery.conf脚本,重启postgres服务
3.1.假设系统意外崩溃
[peter@localhost data]$ kill -9 $(head -1 /opt/pg10.4/data/postmaster.pid)
##### 3.2.先将旧的data目录重命名为data_bad,然后用之前备份的data目录替换.
- 演示:
[peter@localhost pg10.4]$ mv data data_bad
[peter@localhost pg10.4]$ ls
archivedir bin data_bad include share
archive.log env.sh lib pgdatabp.tar
[peter@localhost pg10.4]$ tar -xvzf padatabp.tar
[peter@localhost pg10.4]$ ls
archivedir bin data_bad include opt share
archive.log env.sh lib pgdatabp.tar
[peter@localhost pg10.4]$ mv opt/pg10.4/data/ ./
[peter@localhost pg10.4]$ ls
archivedir bin data_bad include opt share
archive.log data env.sh lib pgdatabp.tar
[peter@localhost pg10.4]$
替换之后,进到data目录,需要删除postmaster.pid文件。这时候可以启动数据库了,但是里面的数据是备份时候的数据,并不包括id是7和8的数据。
##### 3.3.将data_bad内尚未归档的WAL文件手动归档至新的data目录中的pg_wal目录
- 将data_bad中的pg_wal下所有内容放到data目录下的pg_wal。
[peter@localhost data]$ rm -rf postmaster.pid
[peter@localhost data]$ cd pg_wal
[peter@localhost pg_wal]$ cp -Rf /opt/pg10.4/data_bad/pg_wal/** ./
[peter@localhost pg_wal]$ ls
000000010000000000000002 000000010000000000000003
000000010000000000000002.00000028.backup archive_status
[peter@localhost pg_wal]$
##### 3.4.在新的data目录内创建一个recovery.conf配置文件
[peter@localhost data]$ vim recovery.conf
restore_command = ‘cp /opt/pg10.4/archivedir/%f %p > /opt/pg10.4/recovery.log 2>&1’
recovery_target_timeline = ‘latest’ //这里是恢复到最后的日志块。
如果需要恢复到确定的时间点也可以这样写:
restore_command = ‘cp /opt/pg10.4/archivedir/%f %p > /opt/pg10.4/recovery.log 2>&1’
recovery_target_time = ‘2019-03-20 18:44:00’
##### 3.5.最后启动数据库
[peter@localhost bin]$ ./start_db.sh
2019-03-20 19:16:43.677 PDT [13039] LOG: listening on IPv4 address “0.0.0.0”, port 5432
2019-03-20 19:16:43.677 PDT [13039] LOG: listening on IPv6 address “::”, port 5432
2019-03-20 19:16:43.678 PDT [13039] LOG: listening on Unix socket “/tmp/.s.PGSQL.5432”
2019-03-20 19:16:43.700 PDT [13040] LOG: database system was interrupted; last known up at 2019-03-20 18:38:35 PDT
2019-03-20 19:16:43.733 PDT [13040] LOG: starting archive recovery
2019-03-20 19:16:43.764 PDT [13040] LOG: restored log file “000000010000000000000002” from archive
2019-03-20 19:16:43.787 PDT [13040] LOG: redo starts at 0/2000028
2019-03-20 19:16:43.788 PDT [13040] LOG: consistent recovery state reached at 0/20003C8
2019-03-20 19:16:43.788 PDT [13039] LOG: database system is ready to accept read only connections
2019-03-20 19:16:43.790 PDT [13040] LOG: invalid record length at 0/3000360: wanted 24, got 0
2019-03-20 19:16:43.790 PDT [13040] LOG: redo done at 0/3000328
2019-03-20 19:16:43.790 PDT [13040] LOG: last completed transaction was at log time 2019-03-20 18:47:50.812009-07
2019-03-20 19:16:43.802 PDT [13040] LOG: selected new timeline ID: 2
2019-03-20 19:16:43.843 PDT [13040] LOG: archive recovery complete
2019-03-20 19:16:43.949 PDT [13039] LOG: database system is ready to accept connections
从日志中可以看出已经恢复完成,最后查看数据库中的数据。发现全量备份之后添加的7和8两条数据也在其中。