PostgreSQL cluster role switchover between primary and standby

虽然PostgreSQL没有明确指出 primary 和 standby 节点的角色可以相互切换。(即standby promote成primary ,primary demote成standby .不需要重建集群)
但是经过实际测试,是可以实现的。
以下测试数据库版本9.0.2 . 低版本可能有一定差异。(比如有一个版本改进了standby shutdown后自动获取checkpoint的功能,最初的9.0是没有的,没有这个功能重启后需要用到一些已经APPLY的WAL,带来比较麻烦的问题。有兴趣的朋友可以参考我以前写的BLOG)
首先来看几个和XLOG相关的东西,要确保安全的SWITCHOVER,必须确保一下提及的数据在primary和standby切换前是一致的。
1. primary 和 standby 关闭后使用pg_controldata输出的checkpoint location (-m immediate关闭的话这个值不可信,因此必须确保都是-m fast 或者 smart模式关闭的,fast和smart关闭都是干净的关闭模式,起来时不需要recovery)

具体的操作步骤:
1. on primary  
pg_ctl stop -m fast $PGDATA  
确认返回成功后
查看pg_controldata输出  
Latest checkpoint location:           1B/6C000020
Latest checkpoint's REDO location:    1B/6C000020
2. on standby
pg_ctl stop -m fast $PGDATA
确认返回成功后
查看pg_controldata输出  
Latest checkpoint location:           1B/6C000020
Latest checkpoint's REDO location:    1B/6C000020
3. 确保standby和primary 的pg_controldata    checkpoint的输出结果一致。就可以切换standby 和 primary了。
4. 启动standby,并且touch一个在recovery.conf中的trigger文件,使standby promote.为primary库
5. 修改原primary库的配置文件 并新建recovery.conf,需要注意的是新增一条
recovery_target_timeline = 'latest' ,否则会报类似如下错误timeline 6 of the primary does not match recovery target timeline 5。(其他配置略,清参考我以前写的BLOG。)
6. 启动原primary , 此时切换完成,primary 变standby ,standby 变primary ,而且可以再次切换。

切换完成后查看pg_controldata的输出,TimeLineID 将变大。
TimeLineID递增后,在新的PRIMARY 库会出现类似如下进程
postgres: archiver process   last was 00000005.history
而TimeLineID可能就等于5.
与此同时,新的standby TimeLineID还是老的,因为还没有更新过来。
要更新过来的话,要么等CHECKPOINT,要么关闭主节点,然后再关闭副节点。起来之后就同步了。

下面来分析一下,一个archive和stream都打开的主副数据库集群.
1. 主节点在干净的关闭数据库时干了些啥?
对于一个STREAM复制的环境,主节点上面有一个WAL_SENDER进程,干净关闭数据库时,postmaster对这个进程的处理方法和普通的SERVER PROCESS不一样,数据库关闭时首先发信号给server process关闭,然后写shutdown checkpoint ,(有兴趣的朋友可以看一看CHECKPOINT的源码,CHECKPOINT分为好几种),然后postmaster告知wal_sender shutdown_checkpoint,并且把截止这个CHECKPOINT的XLOG发送给standby上面的wal_receiver进程。
从而主节点干净的关闭数据库时,standby节点可以接收到shutdown_checkpoint,达到一致的目的。
原文引用:
Therefore postmaster treats walsenders like the pgarch process,
and instructs them to terminate at PM_SHUTDOWN_2 phase, after all regular
backends have died and bgwriter has written the shutdown checkpoint.

2. 副节点在干净的关闭数据库时干了些啥?
standby节点干净的关闭数据库,直接关闭,目前的版本不尝试且不等待正在从archive ,pg_xlog,或wal_sender恢复完成。但是已经接收到的CHECKPOINT将被写入持久化存储,因此关闭后pg_controldata与主节点一致.


干净的关闭后的primary 和standby 库的pg_controldata输入示例 :  
primary :
postgres@digoal-172-16-3-39-> pg_controldata 
pg_control version number:            903
Catalog version number:               201008051
Database system identifier:           5556352067747738614
Database cluster state:               shut down
pg_control last modified:             Wed 11 May 2011 08:52:38 PM CST
Latest checkpoint location:           1B/70000020
Prior checkpoint location:            1B/6C000078
Latest checkpoint's REDO location:    1B/70000020
Latest checkpoint's TimeLineID:       5
Latest checkpoint's NextXID:          0/4369
Latest checkpoint's NextOID:          2074455
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        655
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Wed 11 May 2011 08:52:38 PM CST
Minimum recovery ending location:     0/0
Backup start location:                0/0
Current wal_level setting:            hot_standby
Current max_connections setting:      2000
Current max_prepared_xacts setting:   50
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 1048576
WAL block size:                       8192
Bytes per WAL segment:                67108864
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value


standby : 
postgres@digoal-172-16-3-33-> pg_controldata 
pg_control version number:            903
Catalog version number:               201008051
Database system identifier:           5556352067747738614
Database cluster state:               shut down in recovery
pg_control last modified:             Wed 11 May 2011 08:52:52 PM CST
Latest checkpoint location:           1B/70000020
Prior checkpoint location:            1B/6C000020
Latest checkpoint's REDO location:    1B/70000020
Latest checkpoint's TimeLineID:       5
Latest checkpoint's NextXID:          0/4369
Latest checkpoint's NextOID:          2074455
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        655
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Wed 11 May 2011 08:52:38 PM CST
Minimum recovery ending location:     1B/6C000020
Backup start location:                0/0
Current wal_level setting:            hot_standby
Current max_connections setting:      2000
Current max_prepared_xacts setting:   50
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 1048576
WAL block size:                       8192
Bytes per WAL segment:                67108864
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value


对于一个因异常而FAILOVER的库,因为failover时不能确保primary 和 standby 的checkpoint一致,所以,在primary数据库服务器修复后,不能直接转换为standby,而需要一个checkpoint小于standby切换为primary时的standby库的checkpoint的基础备份,并且需要足够的archive log来做apply,
因此failover后主备集群的重新建立的速度或简易程度,大部分决定于平时对standby和archive log的备份。


# 补充
# 2011-06-09 今天对一个生产系统做了切换,版本是9.0.3
# 切换后,起主节点的时候一直处于等待状态
postgres: startup process   waiting for 00000001000001200000001C
# 日志里面报错如下:
cp: cannot stat `/opt/pg_arch/00000002.history': No such file or directory
# 00000002.history这个文件来自于新的primary节点pg_xlog目录,拷贝到/opt/pg_arch/00000002.history,另外
拷贝00000001000001200000001C到/opt/pg_arch/.
# 拷贝完后重启老的primary节点.
2011-06-09 14:31:57.095 CST,,,1357,,4df068dd.54d,1,,2011-06-09 14:31:57 CST,,0,LOG,00000,"database system was shut down in recovery at 2011-06-09 14:31:53 CST",,,,,,,,,""
2011-06-09 14:31:57.099 CST,,,1357,,4df068dd.54d,2,,2011-06-09 14:31:57 CST,,0,LOG,00000,"restored log file ""00000002.history"" from archive",,,,,,,,,""
2011-06-09 14:31:57.105 CST,,,1357,,4df068dd.54d,3,,2011-06-09 14:31:57 CST,,0,LOG,00000,"restored log file ""00000002.history"" from archive",,,,,,,,,""
2011-06-09 14:31:57.105 CST,,,1357,,4df068dd.54d,4,,2011-06-09 14:31:57 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""
2011-06-09 14:31:57.275 CST,,,1357,,4df068dd.54d,5,,2011-06-09 14:31:57 CST,,0,LOG,00000,"restored log file ""00000001000001200000001C"" from archive",,,,,,,,,""
2011-06-09 14:31:58.120 CST,,,1357,,4df068dd.54d,6,,2011-06-09 14:31:57 CST,1/0,0,LOG,00000,"consistent recovery state reached at 120/70000078",,,,,,,,,""
2011-06-09 14:31:58.120 CST,,,1357,,4df068dd.54d,7,,2011-06-09 14:31:57 CST,1/0,0,LOG,00000,"invalid record length at 120/70000078",,,,,,,,,""
2011-06-09 14:31:58.121 CST,,,1355,,4df068dc.54b,1,,2011-06-09 14:31:56 CST,,0,LOG,00000,"database system is ready to accept read only connections",,,,,,,,,""
2011-06-09 14:31:58.126 CST,,,1365,,4df068de.555,1,,2011-06-09 14:31:58 CST,,0,LOG,00000,"streaming replication successfully connected to primary",,,,,,,,,""
2011-06-09 14:31:59.320 CST,,,1357,,4df068dd.54d,8,,2011-06-09 14:31:57 CST,1/0,0,LOG,00000,"redo starts at 120/70000078",,,,,,,,,""

成功的切换为standby节点.
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值