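Resolving Primary/Standby Timeline Mismatch in a KingbaseES V8R6 Read/Write-Splitting Cluster
Primary/standby timeline mismatch
Method 1: Raise the primary's timeline
Description: the primary's timeline is lower than the standby's and the primary has a physical backup; the cluster is recovered without rebuilding the standby.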
Determine the timelines (run on each node)
sys_controldata /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data
Primary:
Latest checkpoint's TimeLineID: 7
Standby:
Latest checkpoint's TimeLineID: 8
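The comparison of the two timeline values can be scripted. The following is a minimal sketch and not part of the original procedure: timeline_of and compare_timelines are hypothetical helper names, and the sketch assumes sys_controldata prints the timeline on a line containing "TimeLineID" followed by a colon and the number.

```shell
#!/bin/sh
# Hypothetical helper: read sys_controldata output on stdin and print the
# latest checkpoint's timeline ID. The PrevTimeLineID line is skipped.
timeline_of() {
    awk -F: '/TimeLineID/ && !/PrevTimeLineID/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Hypothetical helper: compare the primary's and the standby's timeline IDs
# (both passed as decimal numbers) and describe the relationship.
compare_timelines() {
    primary_tli=$1
    standby_tli=$2
    if [ "$primary_tli" -lt "$standby_tli" ]; then
        echo "primary behind standby"
    elif [ "$primary_tli" -gt "$standby_tli" ]; then
        echo "primary ahead of standby"
    else
        echo "timelines match"
    fi
}
```

On each node, the sys_controldata output for the data directory can be piped into timeline_of; the two resulting numbers are then passed to compare_timelines, primary first.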
Check the cluster status
repmgr cluster show
Output:
1 | node111 | primary | * running |
2 | node112 | standby | ! running as primary |
WARNING: following issues were detected
- node "node112" (ID: 2) is registered as standby but running as primary
- node "node112" (ID: 2) is not attached to its upstream node "node111" (ID: 1)
Stop the cluster
sys_monitor.sh stop
Raise the primary's timeline
Start the primary as a standby and then promote it; the promotion moves it onto a new timeline.
touch /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data/standby.signal
sys_ctl -D /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data start
sys_ctl promote -D /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data/
Check the primary and standby timelines
sys_controldata /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data
On the standby, run rejoin to add it back to the cluster
repmgr -h 192.168.174.111 -U esrep -d esrep -p 54321 node rejoin
Check the cluster status
repmgr cluster show
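To confirm from the repmgr cluster show output that no node is still registered as a standby while running as a primary, the table rows can be filtered. This is a minimal sketch: diverged_standbys is a hypothetical helper name, and it assumes the pipe-separated layout shown in the earlier output, with the node name, role and status in the second, third and fourth columns.

```shell
#!/bin/sh
# Hypothetical helper: read "repmgr cluster show" output on stdin and print
# the name of every node whose role is standby but whose status says it is
# running as primary. WARNING lines have no "|" columns and are ignored.
diverged_standbys() {
    awk -F'|' 'NF >= 4 && $3 ~ /standby/ && $4 ~ /running as primary/ {
        gsub(/[ \t]/, "", $2)   # strip padding around the node name
        print $2
    }'
}
```

Piping the repmgr cluster show output into diverged_standbys prints nothing once the rejoin has succeeded.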
Method 2: Lower the standby's timeline
Check the timelines
sys_controldata /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data | grep TimeLineID
Latest checkpoint's TimeLineID: 14
Latest checkpoint's PrevTimeLineID: 14
sys_controldata /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data | grep TimeLineID
Latest checkpoint's TimeLineID: 13
Latest checkpoint's PrevTimeLineID: 13
Repair the standby
ls sys_wal/*.history
cat sys_wal/0000000E.history
Output:
1 0/90000A0 no recovery target specified
2 0/C0000A0 no recovery target specified
3 0/1B0000A0 no recovery target specified
4 0/1E0000A0 no recovery target specified
5 0/210000A0 no recovery target specified
6 0/220000A0 no recovery target specified
7 0/250000A0 no recovery target specified
9 0/260000A0 no recovery target specified
11 0/290000A0 no recovery target specified
12 0/2A0000A0 no recovery target specified
13 0/2A00EA30 no recovery target specified
cp /home/kingbase/kbbr_repo/archive/kingbase/12-1/* /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data/sys_wal
sys_ctl -m fast -D /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data stop
Dry run first (-n):
sys_rewind -n -D /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data/
Output:
datadir_source = /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data
sys_rewind: servers diverged at WAL location 0/2A00EA30 on timeline 13
sys_rewind: rewinding from last common checkpoint at 0/2A00E988 on timeline 13
sys_rewind: find last common checkpoint start time from 2022-01-19 15:29:33.495709 CST to 2022-01-19 15:29:33.560238 CST, in "0.064529" seconds.
sys_rewind: rewind start wal location 0/2A00E958 (file 0000000D000000000000002A), end wal location 0/2A01C500 (file 0000000D000000000000002A). time from 2022-01-19 15:29:33.495709 CST to 2022-01-19 15:29:37.178985 CST, in "3.683276" seconds.
sys_rewind: Done!
sys_rewind -D /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data/
Output:
datadir_source = /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data
sys_rewind: servers diverged at WAL location 0/2A00EA30 on timeline 13
sys_rewind: rewinding from last common checkpoint at 0/2A00E988 on timeline 13
sys_rewind: find last common checkpoint start time from 2022-01-19 15:32:25.914143 CST to 2022-01-19 15:32:25.946141 CST, in "0.031998" seconds.
sys_rewind: update the control file: minRecoveryPoint is '0/2A02AD30', minRecoveryPointTLI is '13', and database state is 'in archive recovery'
sys_rewind: rewind start wal location 0/2A00E958 (file 0000000D000000000000002A), end wal location 0/2A02AD30 (file 0000000D000000000000002A). time from 2022-01-19 15:32:25.914143 CST to 2022-01-19 15:32:29.572945 CST, in "3.658802" seconds.
sys_rewind: Done!
sys_controldata /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data | grep TimeLineID
Latest checkpoint's TimeLineID: 13
Latest checkpoint's PrevTimeLineID: 13
vi kingbase.auto.conf
primary_conninfo = 'user=esrep connect_timeout=10 host=192.168.174.111 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node112'
recovery_target_timeline = 'latest'
restore_command = 'cp /home/kingbase/kbbr_repo/archive/kingbase/12-1/%f %p'
mkdir /home/kingbase/kbbr_repo/archive/kingbase/12-1/error_tl_14
cd /home/kingbase/kbbr_repo/archive/kingbase/12-1
mv 0000000E* error_tl_14
touch /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data/standby.signal
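Timeline IDs appear in file names as eight uppercase hexadecimal digits, which is why the timeline 14 files carry the 0000000E prefix and the timeline 13 WAL segment in the sys_rewind output starts with 0000000D. A minimal sketch of that mapping, plus a reader for the last entry of a history file; history_file_for and last_switch_point are hypothetical helper names, and the sketch assumes the history-file layout shown earlier (parent timeline, switch LSN, reason).

```shell
#!/bin/sh
# Hypothetical helper: map a decimal timeline ID to its history file name.
history_file_for() {
    printf '%08X.history\n' "$1"
}

# Hypothetical helper: read a timeline history file on stdin and print the
# parent timeline and switch LSN of its last entry. Lines whose first field
# is not a number (for example comments) are ignored.
last_switch_point() {
    awk 'NF >= 2 && $1 ~ /^[0-9]+$/ { tli = $1; lsn = $2 }
         END { if (tli != "") print tli, lsn }'
}
```

For the history file listed earlier, the last entry is timeline 13 at 0/2A00EA30, the same location sys_rewind reports as the point where the servers diverged.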
Start the standby
sys_ctl start -D /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data/
On the primary, check replication:
select client_addr, sync_state from sys_stat_replication;
 client_addr     | sync_state
-----------------+------------
 192.168.174.112 | async
Check the cluster status
repmgr cluster show
1 | node111 | primary | * running | |
2 | node112 | standby | running | ! node111 |
WARNING: following issues were detected
- node "node112" (ID: 2) is not attached to its upstream node "node111" (ID: 1)
Rejoin the standby to the cluster
sys_ctl stop -D /home/kingbase/cluster/R6C5B23CLS/cls/kingbase/data
repmgr -h 192.168.174.111 -U esrep -d esrep -p 54321 node rejoin
Check the cluster status
repmgr cluster show
1 | node111 | primary | * running | |
2 | node112 | standby | running | node111 |
Note: these operations put data at risk; proceed with caution!