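Problem description
With the cluster in the following state:
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string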
1 | node1 | standby | ! running as primary | | default | 100 | 5 | user=esrep dbname=esrep port=54356 host=10.10.8.43 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | primary | * running | | default | 100 | 4 | user=esrep dbname=esrep port=54356 host=10.10.8.44 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
running the following command on node1 to rejoin it to the primary node2:
./repmgr node rejoin -Uesrep -d esrep -h 10.10.8.44 -p 54356 --force-rewind
fails with:
ERROR: this node's timeline is ahead of the rejoin target node's timeline
DETAIL: this node's timeline is 5, rejoin target node's timeline is 4
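Cause analysis
A fault on the standby node1 caused it to change state (the cluster view above shows it as "! running as primary"), and after that state change (a promotion) its timeline was incremented by 1. node1 therefore moved to timeline 5 while the primary node2 stayed on timeline 4, leaving the standby's timeline higher than the primary's. In a healthy cluster the primary should always be on the newest timeline; when a standby is ahead, the data on the two nodes may have diverged, which is the condition this check rejects.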
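The error comes from comparing the timeline of the node being rejoined with that of the rejoin target. The sketch below performs a simplified version of that comparison so it can be run before attempting a rejoin. It assumes KingbaseES provides a sys_controldata utility whose output includes a PostgreSQL-style "Latest checkpoint's TimeLineID:" line; the tool name and output format are assumptions to verify on your installation, and the helper names timeline_from_controldata and check_rejoin_timelines are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for comparing timelines before "repmgr node rejoin".

# Read controldata-style text on stdin and print the value of the
# "Latest checkpoint's TimeLineID:" line (the PostgreSQL output format is assumed).
timeline_from_controldata() {
    awk -F: '/^Latest checkpoint.s TimeLineID:/ { gsub(/[ \t]+/, "", $2); print $2; exit }'
}

# Compare the local node's timeline ($1) with the rejoin target's ($2).
# Returns 1 when the local timeline is ahead (the condition reported by the
# "timeline is ahead" error); returns 0 otherwise.
check_rejoin_timelines() {
    local_tli=$1
    target_tli=$2
    if [ "$local_tli" -gt "$target_tli" ]; then
        echo "local timeline $local_tli is ahead of target timeline $target_tli"
        return 1
    fi
    echo "local timeline $local_tli is not ahead of target timeline $target_tli"
    return 0
}
```

On node1, piping the output of sys_controldata -D followed by the data directory into timeline_from_controldata would supply the first argument. With the values from this case (5 and 4), check_rejoin_timelines reports the local node as ahead and returns 1.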
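Solution
Add the --no-check-wal option, which skips the WAL/timeline check that produced the error above, to force the rejoin. Because --force-rewind rewinds node1 to node2's history, any changes written on node1 after the timelines diverged are discarded, so make sure node2 holds the data you want to keep. The command is: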
./repmgr node rejoin -Uesrep -d esrep -h 10.10.8.44 -p 54356 --force-rewind --no-check-wal
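Since this forced rejoin can discard data on node1, it may be worth rehearsing it first. The wrapper below is a hypothetical sketch that runs the same command with repmgr's --dry-run option and only performs the real rejoin if the rehearsal succeeds. It assumes this KingbaseES build of repmgr supports --dry-run for node rejoin, as upstream repmgr does; rejoin_with_dry_run and the REPMGR variable are illustrative names.

```shell
#!/bin/sh
# Hypothetical wrapper around the rejoin command from this article.
# REPMGR defaults to ./repmgr, as in the commands above; override it if needed.
REPMGR=${REPMGR:-./repmgr}

rejoin_with_dry_run() {
    # Same options as the article's command (target: node2 at 10.10.8.44).
    set -- node rejoin -Uesrep -d esrep -h 10.10.8.44 -p 54356 --force-rewind --no-check-wal

    # Rehearse first; --dry-run support in this repmgr build is an assumption.
    echo "rehearsing: $REPMGR $* --dry-run"
    if ! "$REPMGR" "$@" --dry-run; then
        echo "dry run failed; not attempting the rejoin" >&2
        return 1
    fi

    # The rehearsal passed, so run the real rejoin.
    echo "running: $REPMGR $*"
    "$REPMGR" "$@"
}
```

Stopping on a failed rehearsal keeps the destructive rewind from running when a prerequisite is missing; review the dry-run output before relying on it.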
References
For more information about Kingbase KingbaseES, see the KingbaseES official manual.