pgpool, part 2: streaming replication mode + slave down/up

os: centos7.4
postgresql:9.6.8
pgpool:3.7.3

This setup uses streaming replication mode, which is the most commonly used configuration.
The streaming replication mode can be used with PostgreSQL servers operating streaming replication. In this mode, PostgreSQL is responsible for synchronizing databases. This mode is widely used and most recommended way to use Pgpool-II. Load balancing is possible in the mode. The sample configuration file is $prefix/etc/pgpool.conf.sample-stream.

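This post mainly records how to recover a pgpool-managed PostgreSQL slave instance after it goes down. If the operating system hosting the slave has gone down as well, restore the operating system first.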

IP plan

pgpool    192.168.56.100

pgsql1    192.168.56.101
pgsql2    192.168.56.102

online recovery

Streaming replication mode
See "pgpool, part 11: the pgpool.conf parameter file"

The online recovery steps are as follows (see http://www.pgpool.net/docs/37/en/html/runtime-online-recovery.html):

Online recovery is performed in two phases. The first phase is called “first stage” and the second phase is called “second stage”. You need to provide scripts for each stage. Only replication_mode requires the second stage. For other modes including streaming replication mode the second stage is not performed and you don’t need to provide a script for the stage in recovery_2nd_stage_command. i.e. you can safely leave it as an empty string.

Connections from clients are not allowed only in the second stage, while the data can be updated or retrieved during the first stage.

Pgpool-II performs the following steps in online recovery:

  1. CHECKPOINT.

  2. Execute first stage of online recovery.

  3. Wait until all client connections have disconnected (only in replication_mode).

  4. CHECKPOINT (only in replication_mode).

  5. Execute second stage of online recovery (only in replication_mode).

  6. Start up postmaster (perform pgpool_remote_start)

  7. Node attach

The online recovery parameters must be configured; otherwise pcp_recovery_node will later fail with the following error:

$ pcp_recovery_node -d -h 192.168.56.100 -p 9898 -U postgres -W -n 1
Password: 
DEBUG: recv: tos="m", len=8
DEBUG: recv: tos="r", len=21
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="E", len=95
ERROR:  node recovery failed, unable to connect to master node: 0 

DEBUG: send: tos="X", len=4
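
Before running pcp_recovery_node, it may help to confirm that the online recovery settings in pgpool.conf are actually filled in. The following is a minimal sketch, assuming the relevant parameters are recovery_user and recovery_1st_stage_command (both documented for Pgpool-II 3.7) written in the usual name = 'value' form. The function name check_recovery_conf and the file path in the usage comment are hypothetical. It only checks that each value is non-empty, not that it is correct.

```shell
# check_recovery_conf FILE
# Prints "missing: <name>" for every online recovery parameter that is
# absent, commented out, or set to an empty string in FILE.
# Returns 0 when all checked parameters are set, 1 otherwise.
check_recovery_conf() {
    conf_file=$1
    rc=0
    for name in recovery_user recovery_1st_stage_command; do
        # Match lines such as:  recovery_user = 'postgres'
        if ! grep -Eq "^[[:space:]]*${name}[[:space:]]*=[[:space:]]*'[^']+'" "$conf_file"; then
            echo "missing: ${name}"
            rc=1
        fi
    done
    return "$rc"
}

# Example (the path is an assumption; use your own pgpool.conf location):
#   check_recovery_conf /usr/pgpool/pgpool3.7.3/etc/pgpool.conf
```

A non-zero return value here suggests fixing pgpool.conf and reloading pgpool before attempting the recovery.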

Using pgpool

Create a user and a database on pgsql1 (master)

postgres=# create user peiyb with password 'peiybpeiyb';
CREATE ROLE
postgres=# create database peiybdb owner = peiyb;
CREATE DATABASE

Connect to pgpool

$ pg_md5 -h
$ pg_md5 -m -p -u peiyb
$ cat pool_passwd
peiyb:md5bd0875843854575a4b7328813ea498cb
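
For reference, each pool_passwd line has the form username:md5hash, where the hash follows PostgreSQL's md5 password format: the literal prefix md5 followed by the MD5 digest of the password concatenated with the user name. A minimal sketch that builds such a line by hand, assuming md5sum from coreutils is available. The function name pool_passwd_entry is hypothetical, and in practice pg_md5 remains the tool to use.

```shell
# pool_passwd_entry USER PASSWORD
# Prints one pool_passwd line in the form  USER:md5<32 hex digits>.
# The digest is MD5(PASSWORD followed by USER), as in PostgreSQL md5 auth.
pool_passwd_entry() {
    user=$1
    password=$2
    digest=$(printf '%s%s' "$password" "$user" | md5sum | cut -d ' ' -f 1)
    printf '%s:md5%s\n' "$user" "$digest"
}

# Example:
#   pool_passwd_entry peiyb 'your-password'
```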

$ psql -h 192.168.56.100 -p 9999 -d peiybdb -U peiyb
Password for user peiyb:
psql (9.6.9)
Type "help" for help.

peiybdb=>
peiybdb=> show pool_version;
     pool_version     
----------------------
 3.7.3 (amefuriboshi)
(1 row)

peiybdb=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up     | 0.500000  | primary | 0          | false             | 0
 1       | pgsql2   | 5432 | up     | 0.500000  | standby | 0          | true              | 0
(2 rows)

Shut down the slave on pgsql2

Shut down the slave on pgsql2, wait one minute, then start it again.

peiybdb=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up     | 0.500000  | primary | 8          | true              | 0
 1       | pgsql2   | 5432 | down   | 0.500000  | standby | 0          | false             | 0
(2 rows)

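At this point pgsql2 still appears when logged in to pgpool, but its status in show pool_nodes remains down.
After setting log_connections and log_disconnections to on for both pgsql1 and pgsql2, the logs show that pgpool took no action after the slave on pgsql2 was stopped and restarted.
pcp_recovery_node therefore has to be run to bring the standby node back to a normal state.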
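
To find which node IDs need attention, the show pool_nodes table can be filtered by its status column. This is a minimal sketch, assuming the column order shown in this post (node_id first, status fourth). The function name down_nodes is hypothetical.

```shell
# down_nodes
# Reads the table printed by "show pool_nodes" on stdin and prints the
# node_id of every row whose status column is "down", one per line.
down_nodes() {
    awk -F '|' 'NF >= 4 {
        id = $1; st = $4
        gsub(/[ \t]/, "", id)
        gsub(/[ \t]/, "", st)
        if (st == "down") print id
    }'
}

# Example (connection values taken from this post):
#   psql -h 192.168.56.100 -p 9999 -d peiybdb -U peiyb -c 'show pool_nodes' | down_nodes
```

Each printed ID can then be passed to pcp_recovery_node or pcp_attach_node with -n.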

$ which pcp_recovery_node
/usr/pgpool/pgpool3.7.3/bin/pcp_recovery_node

$ pcp_recovery_node --help
pcp_recovery_node - recover a node
Usage:
pcp_recovery_node [OPTION...] [node-id]
Options:
  -U, --username=NAME    username for PCP authentication
  -h, --host=HOSTNAME    pgpool-II host
  -p, --port=PORT        PCP port number
  -w, --no-password      never prompt for password
  -W, --password         force password prompt (should happen automatically)
  -n, --node-id=NODEID   ID of a backend node
  -d, --debug            enable debug message (optional)
  -v, --verbose          output verbose messages
  -?, --help             print this help
  

Run pcp_recovery_node to repair the slave node

$ pcp_recovery_node -d -h 192.168.56.100 -p 9898 -U pgpool -W -n 1
Password: 
DEBUG: recv: tos="m", len=8
DEBUG: recv: tos="r", len=21
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="E", len=117
ERROR:  executing remote start failed with error: "ERROR:  pgpool_remote_start failed"
DEBUG: send: tos="X", len=4

The pgsql1 log shows the following:

sh: /var/lib/pgsql/9.6/data/pgpool_remote_start: No such file or directory
< 2018-05-15 14:00:29.546 CST > ERROR:  pgpool_remote_start failed
< 2018-05-15 14:00:29.546 CST > STATEMENT:  SELECT pgpool_remote_start('pgsql2', '/var/lib/pgsql/9.6/data');

This means the pgpool_remote_start shell script is missing from /var/lib/pgsql/9.6/data/.
See the template file on the pgpool node:

# cat /tmp/pgpool-II-3.7.3/src/sample/pgpool_remote_start

The pgpool_remote_start script

Create the pgpool_remote_start script file on both pgsql1 and pgsql2.
See "pgpool, part 7: the pgpool_remote_start script"
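
As a rough illustration of what such a script does, here is a minimal sketch, not the sample file itself. It assumes the two arguments seen in the log (remote host, remote data directory), passwordless ssh for the postgres user from the primary to the standby, and a pg_ctl path typical of PGDG 9.6 packages. The names remote_start, PGCTL and SSH are hypothetical, and the shipped sample may differ in details.

```shell
# Sketch of a pgpool_remote_start script body.
# pgpool calls it as: pgpool_remote_start <remote_host> <remote_datadir>
# PGCTL is an assumed path (PGDG PostgreSQL 9.6 packages); adjust as needed.
PGCTL=${PGCTL:-/usr/pgsql-9.6/bin/pg_ctl}
SSH=${SSH:-ssh}

remote_start() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pgpool_remote_start remote_host remote_datadir" >&2
        return 1
    fi
    # Start PostgreSQL on the remote host and wait (-w) until it is up.
    # Remote stdio is redirected so ssh returns once pg_ctl has finished.
    "$SSH" -T "$1" "$PGCTL -w -D $2 start > /dev/null 2>&1 < /dev/null"
}

# In the installed script, finish with:
#   remote_start "$@"
```

The file would be saved as pgpool_remote_start in the data directory and made executable by the postgres user.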

Run pcp_recovery_node again

[postgres@pgpool etc]$ pcp_recovery_node -d -h 192.168.56.100 -p 9898 -U pgpool -W -n 1
Password: 
DEBUG: recv: tos="m", len=8
DEBUG: recv: tos="r", len=21
DEBUG: send: tos="D", len=6
DEBUG: recv: tos="c", len=20
pcp_recovery_node -- Command Successful
DEBUG: send: tos="X", len=4

$ psql -h 192.168.56.100 -p 9999 -d peiybdb -U peiyb
peiybdb=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | pgsql2   | 5432 | up     | 0.500000  | standby | 0          | false             | 0
(2 rows)

Both nodes are now in the up state.

Handling the pgsql slave manually

If the pgsql slave has already been repaired manually and pg_stat_replication shows state = streaming, pcp_recovery_node is not needed; running pcp_attach_node is enough.

$ pcp_attach_node  -d -h 192.168.56.100 -p 9898 -U pgpool -W -n 1
Password: 
pcp_attach_node -- Command Successful
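
The choice between the two PCP commands can be sketched as a small check on the primary's pg_stat_replication view. This is a minimal sketch under the assumption that the standby is identified by its client_addr. The function name choose_pcp_command and the connection values in the usage comment are hypothetical.

```shell
# choose_pcp_command
# Reads replication states (one per line, e.g. pg_stat_replication.state)
# on stdin and prints which PCP command applies to the standby:
# pcp_attach_node if a line is exactly "streaming", else pcp_recovery_node.
choose_pcp_command() {
    if grep -qx 'streaming'; then
        echo pcp_attach_node
    else
        echo pcp_recovery_node
    fi
}

# Example, run against the primary (connection values are assumptions):
#   psql -h pgsql1 -U postgres -Atc \
#     "select state from pg_stat_replication where client_addr = '192.168.56.102'" \
#     | choose_pcp_command
```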

Handling status = waiting

show pool_nodes may report the status as waiting, as follows:

postgres=# show pool_nodes;
 node_id | hostname | port | status  | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+---------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up      | 0.500000  | primary | 0          | true              | 0
 1       | pgsql2   | 5432 | waiting | 0.500000  | standby | 0          | false             | 0
(2 rows)

In this case, simply log in to pgpool again:

$ psql -h 192.168.56.100 -p 9999 -d peiybdb -U peiyb
peiybdb=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay 
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | pgsql1   | 5432 | up     | 0.500000  | primary | 0          | true              | 0
 1       | pgsql2   | 5432 | up     | 0.500000  | standby | 0          | false             | 0
(2 rows)

Reference:
http://www.pgpool.net/docs/37/en/html/runtime-online-recovery.html
