KingbaseES R6 Cluster: Analyzing and Resolving an Archive Backup Failure

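Case description:
While checking the primary's processes with ps, we found that the 'archiver' process on the primary was reporting a failure, and the sys_log log contained archive failure messages. We extracted the archive command from sys_log and ran it by hand; it reported that the data directory of the running database did not match the path given by the 'kb1-path' parameter in sys_rman.conf. Inspecting the backup configuration file sys_rman.conf showed why: this host had previously been used to test backups of a single-instance database, which rewrote sys_rman.conf and broke archiving for the cluster. After the backup was re-initialized in the cluster environment, archiving recovered automatically.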

Database version:

test=# select version();
                                                       version
----------------------------------------------------------------------------------------------------------------------
 KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

Cluster architecture:

I. Failure Symptoms
= Check the database processes on the primary: the archiver process reports a failure =

[kingbase@node101 sys_wal]$ ps -ef |grep kingbase
root      1101     1  0 09:34 ?        00:00:00 sys_securecmdd: /home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_securecmdd -f /etc/.kes/securecmdd_config [listener] 0 of 128-256 startups
kingbase  2789     1  0 09:35 ?        00:00:09 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A daemon -f /home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf
kingbase 13069     1  0 10:35 ?        00:00:01 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kingbase -D /home/kingbase/cluster/R6HA/kha/kingbase/data
kingbase 13070 13069  0 10:35 ?        00:00:00 kingbase: logger
kingbase 13072 13069  0 10:35 ?        00:00:00 kingbase: checkpointer
kingbase 13073 13069  0 10:35 ?        00:00:00 kingbase: background writer
kingbase 13074 13069  0 10:35 ?        00:00:00 kingbase: walwriter
kingbase 13075 13069  0 10:35 ?        00:00:00 kingbase: autovacuum launcher
kingbase 13076 13069  0 10:35 ?        00:00:00 kingbase: archiver   failed on 00000005.history
kingbase 13077 13069  0 10:35 ?        00:00:00 kingbase: stats collector
kingbase 13078 13069  0 10:35 ?        00:00:00 kingbase: ksh writer
kingbase 13079 13069  0 10:35 ?        00:00:00 kingbase: ksh collector
kingbase 13080 13069  0 10:35 ?        00:00:00 kingbase: kwr collector
kingbase 13081 13069  0 10:35 ?        00:00:00 kingbase: logical replication launcher
kingbase 13096 13069  0 10:35 ?        00:00:07 kingbase: system esrep 192.168.1.101(18471) idle
kingbase 13099     1  0 10:35 ?        00:00:13 /home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgrd -d -v -f /home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf
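
The check above can also be scripted. The following is a minimal Python sketch that scans `ps -ef` output for the archiver's process title and reports whether it shows a failure. The function name `archiver_status` and the regular expression are illustrative assumptions based only on the two title formats that appear in this article ("failed on ..." and "last was ..."); other versions may format the title differently.

```python
import re

# Assumption: the archiver process title looks like one of the two forms seen
# in this article, e.g. "kingbase: archiver   failed on 00000005.history"
# or "kingbase: archiver   last was <file>".
ARCHIVER_RE = re.compile(r"kingbase: archiver\s+(failed on|last was)\s+(\S+)")


def archiver_status(ps_output):
    """Return (state, wal_file) from ps output, or None if no archiver status is shown."""
    for line in ps_output.splitlines():
        match = ARCHIVER_RE.search(line)
        if match:
            state = "failed" if match.group(1) == "failed on" else "ok"
            return state, match.group(2)
    return None


sample = (
    "kingbase 13076 13069  0 10:35 ?        00:00:00 "
    "kingbase: archiver   failed on 00000005.history"
)
print(archiver_status(sample))  # ('failed', '00000005.history')
```

In practice the input would be the captured output of `ps -ef`; here a single line from the listing above is used so the sketch runs on its own.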

II. Failure Analysis

1. Check the sys_log log on the primary

2022-04-26 11:26:49.072 CST,,,13076,,62675a58.3314,202,,2022-04-26 10:35:04 CST,,0,LOG,00000,"archive command failed with exit code 32","The failed archive command was: export TZ=Asia/Shanghai;/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_rman --config /home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase archive-push sys_wal/00000005.history",,,,,,,,""
2022-04-26 11:26:50.079 CST,,,13076,,62675a58.3314,203,,2022-04-26 10:35:04 CST,,0,LOG,00000,"archive command failed with exit code 32","The failed archive command was: export TZ=Asia/Shanghai;/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_rman --config /home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase archive-push sys_wal/00000005.history",,,,,,,,"

=== The log shows that the archive command for 00000005.history fails repeatedly with exit code 32. ===
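
Pulling the failed command out of the log by hand is error-prone when the log is long. The sketch below reads CSV-format log text and collects each distinct failed archive command. It assumes the log follows the PostgreSQL-style csvlog column layout (primary message in column 13, detail in column 14, counting from 0), which is what the entries above appear to use; the function name and the column constants are illustrative, not part of any KingbaseES tool.

```python
import csv
import io

# Assumption: PostgreSQL-style csvlog layout, 0-based column indexes.
MESSAGE_COL = 13  # primary message, e.g. "archive command failed with exit code 32"
DETAIL_COL = 14   # detail, e.g. "The failed archive command was: ..."
PREFIX = "The failed archive command was: "


def failed_archive_commands(csv_log_text):
    """Return the distinct failed archive commands, in order of first appearance."""
    commands = []
    for row in csv.reader(io.StringIO(csv_log_text)):
        if len(row) <= DETAIL_COL:
            continue  # not a complete csvlog record
        message, detail = row[MESSAGE_COL], row[DETAIL_COL]
        if message.startswith("archive command failed") and detail.startswith(PREFIX):
            command = detail[len(PREFIX):]
            if command not in commands:
                commands.append(command)
    return commands


# One log record copied from the output above.
sample_log = (
    '2022-04-26 11:26:49.072 CST,,,13076,,62675a58.3314,202,,'
    '2022-04-26 10:35:04 CST,,0,LOG,00000,'
    '"archive command failed with exit code 32",'
    '"The failed archive command was: export TZ=Asia/Shanghai;'
    '/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_rman '
    '--config /home/kingbase/kbbr_repo/sys_rman.conf '
    '--stanza=kingbase archive-push sys_wal/00000005.history",,,,,,,,""\n'
)

for command in failed_archive_commands(sample_log):
    print(command)
```

Repeated failures of the same command are reported once, which keeps the result short when the archiver retries every second as it does here.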

2. Check the WAL file that failed to archive

= Check whether archiving failed because the file had been deleted. =

[kingbase@node101 sys_wal]$ ls -lh *.history
-rw------- 1 kingbase kingbase  41 Mar 29 16:44 00000002.history
-rw------- 1 kingbase kingbase  83 Apr  8 10:49 00000003.history
-rw------- 1 kingbase kingbase 126 Apr 25 10:49 00000004.history
-rw------- 1 kingbase kingbase 169 Apr 25 11:13 00000005.history
-rw------- 1 kingbase kingbase 212 Apr 25 11:23 00000006.history
-rw------- 1 kingbase kingbase 255 Apr 25 11:33 00000007.history

= 00000005.history is still present in sys_wal, so a deleted file is not the cause. =

3. Check the archive configuration

4. Run the archive command by hand

=== Run the archive command by hand and use the error message to determine why archiving fails. ===

[kingbase@node101 sys_wal]$ /home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_rman --config /home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase archive-push sys_wal/00000005.history
2022-04-26 11:28:57.575 P00   INFO: archive-push command begin 2.27: [sys_wal/00000005.history] --compress-level=3 --compress-type=gz --config=/home/kingbase/kbbr_repo/sys_rman.conf --exec-id=27788-e059621b --kb1-path=/data/kingbase/v8r6_041/data1 --log-level-console=info --log-level-file=info --log-path=/opt/Kingbase/ES/V8R6_041/Server/log --log-subprocess --process-max=4 --repo1-path=/home/kingbase/kbbr_repo --stanza=kingbase
ERROR: [032]: Kingbase working directory '/home/kingbase/cluster/R6HA/kha/kingbase/data/sys_wal' is not the same as option kb1-path '/data/kingbase/v8r6_041/data1'
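
The error message names both the directory the command ran in and the configured path, so the mismatch can be extracted mechanically. A minimal sketch, assuming the message wording is exactly as shown above (the regular expression and the function name `parse_path_mismatch` are illustrative):

```python
import re

# Assumption: the message has the form shown in this article:
# "... working directory '<dir>' is not the same as option <name> '<path>'"
MISMATCH_RE = re.compile(
    r"working directory '(?P<working_dir>[^']+)' is not the same as option "
    r"(?P<option>\S+) '(?P<configured>[^']+)'"
)


def parse_path_mismatch(error_line):
    """Return a dict with working_dir, option and configured, or None if no match."""
    match = MISMATCH_RE.search(error_line)
    return match.groupdict() if match else None


error = (
    "ERROR: [032]: Kingbase working directory "
    "'/home/kingbase/cluster/R6HA/kha/kingbase/data/sys_wal' is not the same as "
    "option kb1-path '/data/kingbase/v8r6_041/data1'"
)
print(parse_path_mismatch(error))
```

Here the configured kb1-path points at a directory tree unrelated to the cluster's data directory, which is the clue that leads to sys_rman.conf.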

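5. Check the backup configuration file sys_rman.conf (it is read during archiving)

=== Root cause: this host was once used to test single-instance backups with sys_backup.sh, which rewrote sys_rman.conf. When the cluster archives WAL, the archive command reads this file, finds the wrong 'kb1-path', and fails. ===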

III. Resolution

1. Re-initialize sys_backup.sh in the cluster environment

[kingbase@node101 bin]$ ./sys_backup.sh init
# generate single sys_rman.conf...DONE
# update single archive_command with sys_rman.archive-push...DONE
# create stanza and check...(maybe 60+ seconds)
# create stanza and check...DONE
# initial first full backup...(maybe several minutes)
# initial first full backup...DONE
# Initial sys_rman OK.
'sys_backup.sh start' should be executed when need back-rest feature.

2. Check the sys_rman.conf configuration

[kingbase@node101 bin]$ cat /home/kingbase/kbbr_repo/sys_rman.conf
# Genarate by script at 20220426113905, should not change manually
[kingbase]
kb1-path=/home/kingbase/cluster/R6HA/kha/kingbase/data
kb1-port=54321
kb1-user=system

[global]
repo1-path=/home/kingbase/kbbr_repo
repo1-retention-full=5
log-path=/home/kingbase/cluster/R6HA/kha/kingbase/log
log-level-file=info
log-level-console=info
log-subprocess=y
process-max=4
#### default gz, support: gz none
compress-type=gz
compress-level=3
band-width=0
cmd-ssh=/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_securecmd
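
Because sys_rman.conf is an INI-style file, the kb1-path check that exposed this problem can be automated. The sketch below uses Python's standard `configparser` to compare the stanza's kb1-path with the data directory the database actually runs from. The function name `kb1_path_matches` is hypothetical, the stanza name defaults to the `kingbase` stanza used in this article, and the sample text is a shortened excerpt of the file above.

```python
import configparser
import os


def kb1_path_matches(conf_text, data_dir, stanza="kingbase"):
    """Return True if kb1-path in the given stanza equals data_dir."""
    # interpolation=None so that a literal '%' in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    configured = parser.get(stanza, "kb1-path")
    # normpath ignores differences such as a trailing slash.
    return os.path.normpath(configured) == os.path.normpath(data_dir)


# Shortened excerpt of the regenerated sys_rman.conf shown above.
sample_conf = """\
[kingbase]
kb1-path=/home/kingbase/cluster/R6HA/kha/kingbase/data
kb1-port=54321
kb1-user=system

[global]
repo1-path=/home/kingbase/kbbr_repo
"""

cluster_data_dir = "/home/kingbase/cluster/R6HA/kha/kingbase/data"
print(kb1_path_matches(sample_conf, cluster_data_dir))                 # True
print(kb1_path_matches(sample_conf, "/data/kingbase/v8r6_041/data1"))  # False
```

To check a live system, the file contents could be read from /home/kingbase/kbbr_repo/sys_rman.conf and the data directory taken from the `-D` argument of the running kingbase process.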

3. Check the archiver process

[kingbase@node101 bin]$ ps -ef |grep kingbase
/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_securecmdd -f /etc/.kes/securecmdd_config [listener] 0 of 128-256 startups
kingbase  2789     1  0 09:35 ?        00:00:11 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A daemon -f /home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf
kingbase 13069     1  0 10:35 ?        00:00:01 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kingbase -D /home/kingbase/cluster/R6HA/kha/kingbase/data
kingbase 13070 13069  0 10:35 ?        00:00:00 kingbase: logger
kingbase 13072 13069  0 10:35 ?        00:00:00 kingbase: checkpointer
kingbase 13073 13069  0 10:35 ?        00:00:00 kingbase: background writer
kingbase 13074 13069  0 10:35 ?        00:00:00 kingbase: walwriter
kingbase 13075 13069  0 10:35 ?        00:00:00 kingbase: autovacuum launcher
kingbase 13076 13069  0 10:35 ?        00:00:00 kingbase: archiver   last was 00000007000000000000003A.00000028.backup
kingbase 13077 13069  0 10:35 ?        00:00:00 kingbase: stats collector
kingbase 13078 13069  0 10:35 ?        00:00:00 kingbase: ksh writer
kingbase 13079 13069  0 10:35 ?        00:00:00 kingbase: ksh collector
kingbase 13080 13069  0 10:35 ?        00:00:00 kingbase: kwr collector
kingbase 13081 13069  0 10:35 ?        00:00:00 kingbase: logical replication launcher
kingbase 13096 13069  0 10:35 ?        00:00:08 kingbase: system esrep 192.168.1.101(18471) idle
kingbase 13099     1  0 10:35 ?        00:00:15 /home/kingbase/cluster/R6HA/kha/kingbase/bin/repmgrd -d -v -f /home/kingbase/cluster/R6HA/kha/kingbase/bin/../etc/repmgr.conf

4. Check the sys_log log

5. Check the archived WAL files

= The information above shows that the archive failure has been resolved. =

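IV. Summary
Once sys_backup.sh has been set up for physical backup on a KingbaseES database, WAL archiving is carried out by the sys_rman command. Archiving depends on both the database's archive settings and the sys_rman.conf configuration file, so when a database reports archive errors, check the database's archive configuration and the contents of sys_rman.conf.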
