KingbaseES V8R6 集群运维案例--备库timeline not contain minimum recovery point故障

案例现象:
KingbaseES V8R6集群备库启动后,加入集群失败,sys_log日志信息提示,如下图所示:


适用版本:kingbaseES V8R6

一、问题分析
在timeline对应的history文件中会记录每次timeline切换时所对应的lsn,如下图所示,在sys_wal目录下:

.......
-rw------- 1 kingbase kingbase 1.2K Feb 21 09:44 0000001B.history
-rw------- 1 kingbase kingbase 1.2K Feb 21 09:44 0000001C.history
-rw------- 1 kingbase kingbase 1.2K Feb 21 09:44 0000001D.history
.....

查看timelinehistory文件的信息:

如下所示,history文件最左列显示timeline_id,第二列显示timeline切换时对应的lsn。
[kingbase@node102 sys_wal]$ cat  0000001D.history
.......
24      4/4D0000A0      no recovery target specified 
25      4/51001D00      no recovery target specified
26      4/520000A0      no recovery target specified
27      4/540000A0      no recovery target specified
28      4/580000A0      no recovery target specified

二、问题解决
备库启动数据库服务时,读取控制文件,从检查点对应的lsn开始,执行recovery,一直读取到到当前数据库最大timeline所对应的wal日志,完成数据的一致性恢复,启动实例。
从备库sys_log日志信息看,timeline(29)history文件记录的lsn满足不了数据库的recovery,备库timeline(29)对应的history文件是“0000001D.history”(1D(16)=29(10)),可以将此timeline对应的history文件改名或删除,再重启备库数据库服务连接集群主库。

三、备库recovery过程

1、查看备库checkpiont信息

[kingbase@node102 bin]$ ./sys_controldata -D /data/kingbase/r6ha/data/
sys_control version number:            1201
Catalog version number:               202112261
Database system identifier:           7080367334319169673
Database cluster state:               in archive recovery
sys_control last modified:             Wed 01 Mar 2023 11:30:13 AM CST
Latest checkpoint location:           6/7F019F30
Latest checkpoint's REDO location:    6/7F019F00        #recovery起始lsn
Latest checkpoint's REDO WAL file:    00000031000000060000007F

2、查看备库启动后sys_log日志

2023-03-01 11:30:13.814 CST,,,2220,,63fec6c4.8ac,3,,2023-03-01 11:30:12 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""
2023-03-01 11:30:13.814 CST,,,2220,,63fec6c4.8ac,4,,2023-03-01 11:30:12 CST,,0,DEBUG,00000,"backup time 2023-03-01 11:30:11 CST in file ""backup_label""",,,,,,,,,""
2023-03-01 11:30:13.815 CST,,,2220,,63fec6c4.8ac,5,,2023-03-01 11:30:12 CST,,0,DEBUG,00000,"checkpoint record is at 6/7F019F30",,,,,,,,,""
2023-03-01 11:30:13.815 CST,,,2220,,63fec6c4.8ac,6,,2023-03-01 11:30:12 CST,,0,DEBUG,00000,"redo record is at 6/7F019F00; shutdown false",,,,,,,,,""
2023-03-01 11:30:13.818 CST,,,2220,,63fec6c4.8ac,19,,2023-03-01 11:30:12 CST,1/0,0,LOG,00000,"redo starts at 6/7F019F00",,,,,,,,,""
......
2023-03-01 11:31:49.379 CST,,,2220,,63fec6c4.8ac,10240281,,2023-03-01 11:30:12 CST,1/0,0,LOG,00000,"consistent recovery state reached at 6/AB19B9F0",,,,,,,,,""

如下图所示:

3、查看恢复完成对应的wal日志Tips:
recovery完成时的timeline是31(16进制)(timeline=49),对应的wai日志文件0000003100000006000000AB。

-rw------- 1 kingbase kingbase  16M Mar  1 11:30 0000003100000006000000A9
-rw------- 1 kingbase kingbase  16M Mar  1 11:30 0000003100000006000000AA
-rw------- 1 kingbase kingbase  16M Mar  1 11:36 0000003100000006000000AB
-rw------- 1 kingbase kingbase 2.1K Mar  1 11:30 00000031.history

查看timeline对应history文件:

[kingbase@node102 sys_wal]$ cat 00000031.history
1       0/690000A0      no recovery target specified
2       0/6A0000A0      no recovery target specified
3       1/C50089F0      no recovery target specified
......
47      6/790014E8      no recovery target specified
48      6/7A0000A0      no recovery target specified

查看当前timeline:

prod=# select timeline_id from sys_control_checkpoint();
 timeline_id
-------------
          49
(1 row)

---如上所示,备库在启动数据库服务后,读取控制文件获取到检查点对应lsn后,开始应用wal日志,直到应用到最新的timeline所对应的wal日志文件达到数据一致性后,停止恢复。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值