postgres+pgpool实现在线恢复,实际上说白了就是手动恢复变成自动恢复。通过调用命令和脚本,pgpool控制多个node实现备份,恢复,重启,加载。
background
postgres 8.4
pgpool-II 3.4.2
复制模式+load balance
基于PIRT 实现在线恢复
安装好c语言函数
postgres实例都以archmod=on 模式运行
两台机器要实现postgres 用户ssh 无密码互访问
配置好pcp
配置好脚本,命令(脚本和命令参见pgpool手册,不过要小心调试)
from 192.168.0.10
to 192.168.0.11
1,准备,查看状态
#su - postgres
-bash-4.1$ psql -p 9999 -h
192.168.0.10
Password:
psql (8.4.20)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role
---------+----------------+------+--------+-----------+--------
0 | 192.168.0.10 | 5433 | 2 | 0.500000 | master
1 | 192.168.0.11 | 5433 | 2 | 0.500000 | slave
(2 rows)
Password:
psql (8.4.20)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role
---------+----------------+------+--------+-----------+--------
0 | 192.168.0.10 | 5433 | 2 | 0.500000 | master
1 | 192.168.0.11 | 5433 | 2 | 0.500000 | slave
(2 rows)
2,打开日志
[root@iZ2864nri9hZ ~]# tail -f /var/log/pgpool/pgpool.log
[root@iZ2864nri9hZ ~]# cd /data/
[root@iZ2864nri9hZ data]# tail -f archive/recovery.log
[root@iZ2864nri9hZ data]# tail -f archive/recovery.log
3,停掉192.168.0.10 的postgres,并删除实例
[root@iZ28bh1trh9Z data]# ./postgrestop.sh
[root@iZ28bh1trh9Z data]# ps -ef |grep postgres
root 26503 26432 0 09:23 pts/1 00:00:00 grep postgres
[root@iZ28bh1trh9Z data]# ps -ef |grep postgres
root 26503 26432 0 09:23 pts/1 00:00:00 grep postgres
postgres=# show pool_nodes;
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role
---------+----------------+------+--------+-----------+--------
0 | 192.168.0.10 | 5433 | 2 | 0.500000 | master
1 | 192.168.0.11 | 5433 | 3 | 0.500000 | slave
(2 rows)
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role
---------+----------------+------+--------+-----------+--------
0 | 192.168.0.10 | 5433 | 2 | 0.500000 | master
1 | 192.168.0.11 | 5433 | 3 | 0.500000 | slave
(2 rows)
这里看到 pgpool连接会断掉,并且 detatch 1 node 为 status=3
同时,如果有web程序连接pgpool,连接也会断,不过web程序会自动重连,重连需要几秒钟,应该是可以配置的??
删除 pgdata_backup, pgdata.tar.gz
备份pgdata
root@iZ28bh1trh9Z data]# rm -rf pgdata_backup
[root@iZ28bh1trh9Z data]# mv pgdata pgdata_backup
[root@iZ28bh1trh9Z data]# rm pgdata.tar.gz
[root@iZ28bh1trh9Z data]# mv pgdata pgdata_backup
[root@iZ28bh1trh9Z data]# rm pgdata.tar.gz
4,开始 online recovery
退出 pgpool的登录
postgres=# \q
运行命令
[root@iZ2864nri9hZ data]# pcp_recovery_node -d 50 localhost 9898 postgres postgres 1
DEBUG: send: tos="R", len=46
DEBUG: recv: tos="r", len=21, data=AuthenticationOK
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="c", len=20, data=CommandComplete
DEBUG: send: tos="X", len=4
DEBUG: send: tos="R", len=46
DEBUG: recv: tos="r", len=21, data=AuthenticationOK
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="c", len=20, data=CommandComplete
DEBUG: send: tos="X", len=4
运行过程中要查看
/var/log/pgpool/pgpool.log
/data/archive/recovery.log
5,结束检查
[root@iZ28bh1trh9Z data]# pwd
/alidata/data
[root@iZ28bh1trh9Z data]# ls
checkpgpool.sh pgbackup.sh pgdata pgdata_backup pgdata.tar.gz pgpoolstart.sh pgstartup.log postgrestart.sh postgrestop.sh
/alidata/data
[root@iZ28bh1trh9Z data]# ls
checkpgpool.sh pgbackup.sh pgdata pgdata_backup pgdata.tar.gz pgpoolstart.sh pgstartup.log postgrestart.sh postgrestop.sh
[root@iZ28bh1trh9Z data]# ps -ef |grep postgres
root 26550 1020 0 09:38 ? 00:00:00 sshd: postgres [priv]
postgres 26552 26550 0 09:38 ? 00:00:00 sshd: postgres@notty
postgres 26556 1 0 09:38 ? 00:00:00 /usr/bin/postgres -D /data/pgdata
postgres 26561 26556 0 09:38 ? 00:00:00 postgres: logger process
postgres 26577 26556 0 09:38 ? 00:00:00 postgres: writer process
postgres 26578 26556 0 09:38 ? 00:00:00 postgres: wal writer process
postgres 26579 26556 0 09:38 ? 00:00:00 postgres: autovacuum launcher process
postgres 26580 26556 0 09:38 ? 00:00:00 postgres: archiver process last was 00000002.history
postgres 26581 26556 0 09:38 ? 00:00:00 postgres: stats collector process
postgres 26594 26556 0 09:40 ? 00:00:00 postgres: postgres postgres 192.168.0.10 (35555) idle
root 26609 26432 0 09:48 pts/1 00:00:00 grep postgres
root 26550 1020 0 09:38 ? 00:00:00 sshd: postgres [priv]
postgres 26552 26550 0 09:38 ? 00:00:00 sshd: postgres@notty
postgres 26556 1 0 09:38 ? 00:00:00 /usr/bin/postgres -D /data/pgdata
postgres 26561 26556 0 09:38 ? 00:00:00 postgres: logger process
postgres 26577 26556 0 09:38 ? 00:00:00 postgres: writer process
postgres 26578 26556 0 09:38 ? 00:00:00 postgres: wal writer process
postgres 26579 26556 0 09:38 ? 00:00:00 postgres: autovacuum launcher process
postgres 26580 26556 0 09:38 ? 00:00:00 postgres: archiver process last was 00000002.history
postgres 26581 26556 0 09:38 ? 00:00:00 postgres: stats collector process
postgres 26594 26556 0 09:40 ? 00:00:00 postgres: postgres postgres 192.168.0.10 (35555) idle
root 26609 26432 0 09:48 pts/1 00:00:00 grep postgres
-bash-4.1$ psql -p 9999 -h 192.168.0.10
Password:
psql (8.4.20)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role
---------+----------------+------+--------+-----------+--------
0 | 192.168.0.10 | 5433 | 2 | 0.500000 | master
1 | 192.168.0.11 | 5433 | 2 | 0.500000 | slave
(2 rows)
可以看到 两个node都已经启动
6,清理现场