KingbaseES R6 Cluster: Test Case for a Standby NIC Going Down

Database version:

test=# select version();
                                                       version
----------------------------------------------------------------------------------------------------------------------
 KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

Host node information:

[kingbase@node101 bin]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.101   node101     # primary
192.168.1.102   node102     # standby

Cluster node information:

 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node101 | primary | * running |          | running | 11180 | no      | n/a
 2  | node102 | standby |   running | node101  | running | 9242  | no      | 0 second(s) ago

I. Check cluster status and configuration

1. Cluster node status

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                         
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node101 | primary | * running |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node102 | standby |   running | node101  | default  | 100      | 1        | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2. Cluster configuration

II. Test: bring the standby NIC down

1. Bring the standby NIC down

[root@node102 ~]# ifconfig enp0s3 down
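To make the outage repeatable, the down/up steps can be wrapped in a small helper. The following is only a sketch: the function name `nic_outage` and the `DRY_RUN` switch are hypothetical, and it uses `ip link` from iproute2 as a stand-in for the `ifconfig` command used in this test. It assumes root privileges and a local console, since an SSH session over the same NIC would be cut off. Routes attached to the interface may be dropped while it is down, and whether they come back on `up` likely depends on how the host's network is managed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: take a NIC down for a fixed number of seconds, then bring it back up.
# With DRY_RUN set to a non-empty value, the commands are printed instead of executed.
nic_outage() {
  local dev=$1 seconds=$2
  local run=${DRY_RUN:+echo}   # expands to "echo" in dry-run mode, to nothing otherwise

  $run ip link set dev "$dev" down
  $run sleep "$seconds"
  $run ip link set dev "$dev" up
}
```

Running `DRY_RUN=1 nic_outage enp0s3 60` first prints the three commands without touching the interface, which is a cheap way to confirm the device name before running the real test.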

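2. Check the messages log on the standby

3. hamgr.log on the standby

=The log shows that the repmgrd service was closed and could no longer provide normal service.=

4. Check cluster node status from the primary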

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                                 
----+---------+---------+---------------+----------+----------+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node101 | primary | * running     |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node102 | standby | ? unreachable | node101  | default  | 100      | ?        | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

WARNING: following issues were detected
  - unable to connect to node "node102" (ID: 2)
  - node "node102" (ID: 2) is registered as an active standby but is unreachable

=== As shown above, the cluster did not trigger a primary/standby switchover. ===
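When the test is repeated, the same check can be scripted instead of read by eye. Below is a minimal sketch that reads `repmgr cluster show` output on standard input and prints the name of every node whose Status column is anything other than `running`. The function name `nodes_not_running` is hypothetical, and the parsing assumes the pipe-separated column layout shown in this article (ID, Name, Role, Status, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list nodes whose Status is not "running".
# Input: the text printed by `repmgr cluster show`, on stdin.
nodes_not_running() {
  awk -F'|' '
    # Data rows are the ones whose first column is a numeric node ID.
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
      name = $2
      status = $4
      gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim the node name
      gsub(/^[ \t]+|[ \t]+$/, "", status)   # trim the status cell
      sub(/^[*?!][ \t]*/, "", status)       # drop the leading marker (*, ?, !)
      if (status != "running") print name
    }'
}
```

For example, `./repmgr cluster show | nodes_not_running` prints nothing while both nodes are healthy; fed the output above, it prints `node102`.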

III. Standby NIC restored (up)

1. Check cluster status

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                     
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node101 | primary | * running |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node102 | standby |   running | node101  | default  | 100      | 1        | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

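2. Check hamgr.log on the standby

=As the log below shows, once the standby NIC is back up, the standby receives the WAL stream, performs recovery, and resynchronizes with the primary.=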

[2022-03-29 16:11:45] [INFO] node "node102" (ID: 2) monitoring upstream node "node101" (ID: 1) in normal state
[2022-03-29 16:11:45] [ERROR] unable to determine if server is in recovery
[2022-03-29 16:11:45] [DETAIL]
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

[2022-03-29 16:11:45] [DETAIL] query text is:
SELECT pg_catalog.pg_is_in_recovery()
[2022-03-29 16:11:47] [NOTICE] upstream is available but upstream connection has gone away, resetting
[2022-03-29 16:12:24] [ERROR] is_rep_sync_streaming(): get 2 tuples
[2022-03-29 16:12:45] [ERROR] is_wal_all_recevied(): get 0 tuples
[2022-03-29 16:12:45] [ERROR] is_rep_sync_streaming(): get 0 tuples
[2022-03-29 16:12:47] [ERROR] is_wal_all_recevied(): get 0 tuples
[2022-03-29 16:12:47] [ERROR] is_rep_sync_streaming(): get 0 tuples
[2022-03-29 16:12:49] [ERROR] is_wal_all_recevied(): get 0 tuples
[2022-03-29 16:12:49] [ERROR] is_rep_sync_streaming(): get 0 tuples
[2022-03-29 16:16:47] [INFO] node "node102" (ID: 2) monitoring upstream node "node101" (ID: 1) in normal state
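To see how long the standby kept reporting errors, the ERROR lines can be summarized from the log. This is a rough sketch: the function name `error_window` is hypothetical, and it assumes every log entry starts with a `[YYYY-MM-DD HH:MM:SS]` timestamp followed by a bracketed level, as in the excerpt above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the count of [ERROR] entries and the first and last
# timestamps at which they occur. Input: a repmgrd-style log on stdin.
error_window() {
  awk '
    /^\[[0-9-]+ [0-9:]+\] \[ERROR\]/ {
      ts = substr($0, 2, 19)        # the "YYYY-MM-DD HH:MM:SS" part inside the brackets
      if (first == "") first = ts
      last = ts
      n++
    }
    END {
      if (n) printf "%d errors between %s and %s\n", n, first, last
    }'
}
```

Run as `error_window < hamgr.log`. On the excerpt above it reports 8 errors between 2022-03-29 16:11:45 and 2022-03-29 16:12:49, after which the node is back to monitoring its upstream in normal state.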

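IV. Summary

1. For a standby, a network failure caused by its NIC going down does not trigger a primary/standby switchover. Once the NIC is back up, the cluster returns to normal.
2. If the database service on the standby goes down and recovery='automatic | standby' is configured, the standby's database service is recovered automatically.
3. This case was tested with one primary and one standby. In a one-primary, multi-standby architecture, if the NIC of the standby whose sync state is 'sync' goes down, the remaining standbys compete and one of them is promoted to 'sync'.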