集群准备
其中watchdog端口9000,pcp端口9898,psql服务连接端口9999
查看集群节点状态 show pool_nodes;
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | true
1 | 192.168.1.2 | 5432 | up | 0.500000 | standby | 0 | false
(2 rows)
Pgpool服务挂了
vip目前在master端,先测试非vip端pgpool服务不可用,再测试vip端pgpool服务不可用。
非vip端pgpool不可用
停止pgpool服务
systemctl stop pgpool.service
查看节点状态,服务未中断
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | false
1 | 192.168.1.2 | 5432 | up | 0.500000 | standby | 0 | true
(2 rows)
vip端pgpool不可用
停止pgpool服务
$ systemctl stop pgpool.service
此时 非vip端的pgpool 会主动接管vip继续对外提供服务
$ ifconfig eth0:0
eth0:0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.100 netmask 255.255.255.0 broadcast 192.168.1.255
ether fa:f2:7d:42:75:00 txqueuelen 1000 (Ethernet)
查看节点状态,服务被接管后,继续对外提供服务
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | true
1 | 192.168.1.2 | 5432 | up | 0.500000 | standby | 0 | false
(2 rows)
PostgreSQL挂了
standby不可用
停止postgresql服务
$ pg_ctl stop
waiting for server to shut down.... done
server stopped
此时集群节点状态会发生变化,standby节点变为down,此时集群仍然可以继续对外提供服务。
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | true
1 | 192.168.1.2 | 5432 | down | 0.500000 | standby | 0 | false
(2 rows)
重新启动 standby 节点,模拟节点修复,重新将该节点加入集群
$ pg_ctl start
... ... ...
done
server started
$ pcp_attach_node -h 192.168.1.100 -p 9898 -U postgres -n 1
Password:
pcp_attach_node -- Command Successful
standby 重新加入集群后,状态变为up,继续对外服务。
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | true
1 | 192.168.1.2 | 5432 | up | 0.500000 | standby | 0 | false
(2 rows)
primary不可用
当前 192.168.1.2 是 primary
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | standby | 5 | true
1 | 192.168.1.2 | 5432 | up | 0.500000 | primary | 0 | false
(2 rows)
停止192.168.1.2 上面的 postgresql服务
$ pg_ctl stop
waiting for server to shut down.... done
server stopped
此时查看节点状态,集群检查到 192.168.1.2 服务不可用后,自动切换 192.168.1.3 为 primary,继续对外提供服务,此时集群可读写。
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | true
1 | 192.168.1.2 | 5432 | down | 0.500000 | standby | 0 | false
(2 rows)
postgres=# create database kong;
CREATE DATABASE
postgres=# drop database kong;
DROP DATABASE
修复节点 192.168.1.2 的服务,然后重新加入集群作为 standby 继续对外提供服务。如果主备时间线不同步,需要用 pg_rewind 命令修复时间线,然后再启动服务。
$ mv recovery.done recovery.conf
$ pg_ctl start
$ pcp_attach_node -h 192.168.1.100 -p 9898 -U postgres -n 1
Password:
pcp_attach_node -- Command Successful
节点成功加入集群,其状态为 up
$ psql -p 9999 -h 192.168.1.100 -U postgres
Password for user postgres:
psql (10.6, server 10.9)
Type "help" for help.
postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node
---------+---------------+------+--------+-----------+---------+------------+-------------------
0 | 192.168.1.3 | 5432 | up | 0.500000 | primary | 5 | true
1 | 192.168.1.2 | 5432 | up | 0.500000 | standby | 0 | false
(2 rows)