案例说明:
在KingbaseES R3集群中,kingbasecluster进程会通过test库访问,连接后台数据库服务测试;如果删除test数据库,导致后台数据库服务访问失败,在集群主备切换时,无法访问后台数据库服务,导致切换失败。修改集群HAmodule.conf配置文件相关参数后,可以解决集群test库被删除导致主备切换失败问题。
测试数据库版本:
prod=# select version();
version
-------------------------------------------------------------------------------------------------------------------------
Kingbase V008R003C002B0270 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)
一、查看集群访问test库配置
[kingbase@node1 etc]$ cat HAmodule.conf |grep -i test
#database instance built-in database.example:KB_DATANAME="TEST"
KB_DATANAME="TEST"
二、查看kingbase_monitor.sh访问test库信息
=可以从kingbase_monitor.sh start的启动过程,看到对test库的访问=
[kingbase@node1 bin]$ sh -x kingbase_monitor.sh restart > ~/kmon.txt
[kingbase@node1 ~]$ cat kmon.txt |grep -i test
+ param='KB_DATANAME="TEST"'
+ paramValue='"TEST"'
+ '[' -z '"TEST"' ']'
+ eval 'KB_DATANAME="TEST"'
++ KB_DATANAME=TEST
++ /home/kingbase/cluster/kha/db/bin/ksql 'host=192.168.7.248 port=54321 user=SUPERMANAGER_V8ADMIN password=xxxx dbname=TEST connect_timeout=10' -Aqtc 'select count(*)=1 from sys_stat_replication;'
++ /home/kingbase/cluster/kha/db/bin/ksql 'host=192.168.7.248 port=54321 user=SUPERMANAGER_V8ADMIN password=xxxx dbname=TEST connect_timeout=10' -Aqtc 'select sys_xlog_location_diff(sys_current_xlog_flush_location(), write_location)<=16777216 from sys_stat_replication;'
二、集群删除test库测试(主库)
# 查看database cluster
[kingbase@node3 bin]$ ./ksql -U system -W 123456 prod
ksql (V008R003C002B0270)
Type "help" for help.
prod=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+--------+----------+-------------+-------------+--------------------
prod | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
SAMPLES | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
SECURITY | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
TEMPLATE0 | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/system +
| | | | | system=CTcb/system
TEMPLATE1 | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/system +
| | | | | system=CTcb/system
TEMPLATE2 | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =Tc/system +
| | | | | system=CTcb/system
TEST | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
(7 rows)
# 主库判断
prod=# select sys_is_in_recovery();
sys_is_in_recovery
--------------------
f
(1 row)
# 查看流复制状态
prod=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin |
state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+--
25795 | 10 | system | node243 | 192.168.7.243 | | 45418 | 2021-03-01 12:49:12.263710+08 | | s
treaming | 0/E0001B0 | 0/E0001B0 | 0/E0001B0 | 0/E000178 | 0 | async
(1 row)
# 主库删除test库:
prod=# drop database test;
DROP DATABASE
prod=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+--------+----------+-------------+-------------+--------------------
prod | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
SAMPLES | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
SECURITY | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
TEMPLATE0 | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/system +
| | | | | system=CTcb/system
TEMPLATE1 | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/system +
| | | | | system=CTcb/system
TEMPLATE2 | system | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =Tc/system +
| | | | | system=CTcb/system
(6 rows)
# 备库查看:
[kingbase@node3 bin]$ ./ksql -U system -W 123456 prod
ksql (V008R003C002B0270)
Type "help" for help.
prod=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+--------+----------+-------------+-------------+--------------------