Emergency handling and recovery test for a cluster node outage (without rebuilding the secondary)

1. State of the two servers in the cluster

Instance   Normal role   IP            Port
node1      Primary       192.168.6.6   9088
node2      Secondary     192.168.6.7   9088

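2. Test steps

  • Take node1 down.
  • Observe the state of node2.
  • Before node2 switches over automatically, manually change node2 to a standalone server to simulate emergency use.
  • Simulate the non-urgent case: promote node2 to primary and restore node1.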
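
The steps above can be summarized as a dry-run plan. The sketch below only prints which command would run on which node and executes nothing. The function name `failover_plan` is hypothetical, and the commands are the ones used in the transcripts later in this post.

```shell
# Print the command sequence of this test as "<node>: <command>" lines.
# Nothing is executed; this is a dry-run summary only.
# failover_plan is a hypothetical helper name, not part of any GBase tool.
failover_plan() {
    old_primary=$1      # server that is taken down (node1 in this test)
    old_secondary=$2    # server that takes over (node2 in this test)
    cat <<EOF
$old_primary: onclean -ky
$old_secondary: onmode -d standard
$old_secondary: onmode -d primary $old_primary
$old_primary: oninit -PHY
$old_primary: onmode -d secondary $old_secondary
EOF
}
```

Calling `failover_plan node1 node2` prints the five commands in the order they are run in sections 3 and 4. The first line, `onclean -ky`, is only there to simulate the outage.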

3. After the primary goes down, manually switch the secondary so that it becomes usable quickly

[gbasedbt@node01 install]$ onstat -g dri
On-Line (Prim) -- Up 00:16:11 -- 1650580 Kbytes

Data Replication at 0x4cf1a028:
  Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
  primary        on           node2                         9 / 1          NA

  DRINTERVAL   0
  DRTIMEOUT    30
  DRAUTO       0
  DRLOSTFOUND  /opt/GBASE/gbase/etc/dr.lostfound
  DRIDXAUTO    0
  ENCRYPT_HDR  0
  Backlog      0
  Last Send    2024/06/17 22:01:20
  Last Receive 2024/06/17 22:01:20
  Last Ping    2024/06/17 22:01:05
  Last log page applied(log id,page): 9,2


[root@node01 GBASE]# onstat -
On-Line (Prim) -- Up 00:14:11 -- 1650580 Kbytes

[root@node01 GBASE]# su - gbasedbt
Last login: Mon Jun 17 21:45:54 CST 2024 on pts/0
[gbasedbt@node01 ~]$ onclean -ky
onclean: Cleaning up processes and resources for 'node1'...
 - Looking for the master daemon process: 13760
 - Looking for the shmem key: 52934803
 - Looking for the shmem key: 52934804
 - Looking for semaphore ID: 10
 - Looking for the shmem key: 52934801
 - Looking for the shmem key: 52934802
[gbasedbt@node01 ~]$
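-- The primary and secondary use a health check to decide whether the cluster is normal. The heartbeat check makes several connection attempts with a few seconds between each, so there is a detection window between the primary going down and the secondary switching over. During that window the secondary still reports the cluster as normal.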
[gbasedbt@node02 ~]$ onstat -g dri
Read-Only (Sec) -- Up 00:01:22 -- 1635008 Kbytes

Data Replication at 0x4c13d028:
  Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
  HDR Secondary  on           node1                         9 / 1          N

  DRINTERVAL   0
  DRTIMEOUT    30
  DRAUTO       0
  DRLOSTFOUND  /opt/GBASE/gbase/etc/dr.lostfound
  DRIDXAUTO    0
  ENCRYPT_HDR  0
  Backlog      0
  Last Send    2024/06/17 22:02:04
  Last Receive 2024/06/17 22:02:04
  Last Ping    2024/06/17 22:01:59
  Last log page applied(log id,page): 0,0
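
To get a feel for how long that detection window might be, the DR settings can be read straight from the `onstat -g dri` output. The sketch below is illustrative only: `dri_param` and `detection_window` are hypothetical helper names, and the default of 4 attempts is an assumption borrowed from the Informix rule of thumb `DRTIMEOUT = wait_time / 4`. This post does not confirm that GBase follows the same rule.

```shell
# Print the value of one DR setting (for example DRTIMEOUT) from
# `onstat -g dri` output read on stdin.
dri_param() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Rough estimate, in seconds, of how long a failed peer may go unnoticed.
# ASSUMPTION: the number of attempts defaults to 4 (Informix rule of thumb);
# pass a second argument to override it.
detection_window() {
    timeout=$1
    attempts=${2:-4}
    echo $((timeout * attempts))
}
```

On a live server this could be used as `detection_window "$(onstat -g dri | dri_param DRTIMEOUT)"`. With the `DRTIMEOUT 30` shown above and the assumed 4 attempts, the estimate is 120 seconds.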
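  • This test simulates the case where the primary is down but the secondary has not yet detected it, and the secondary is put back into service manually.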
[gbasedbt@node02 ~]$ onstat -g dri
Read-Only (Sec) -- Up 00:01:22 -- 1635008 Kbytes

Data Replication at 0x4c13d028:
  Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
  HDR Secondary  on           node1                         9 / 1          N

  DRINTERVAL   0
  DRTIMEOUT    30
  DRAUTO       0
  DRLOSTFOUND  /opt/GBASE/gbase/etc/dr.lostfound
  DRIDXAUTO    0
  ENCRYPT_HDR  0
  Backlog      0
  Last Send    2024/06/17 22:02:04
  Last Receive 2024/06/17 22:02:04
  Last Ping    2024/06/17 22:01:59
  Last log page applied(log id,page): 0,0

[gbasedbt@node02 ~]$ onstat -
Read-Only (Sec) -- Up 00:01:55 -- 1635008 Kbytes

[gbasedbt@node02 ~]$ onmode -d standard
[gbasedbt@node02 ~]$ onstat -
On-Line -- Up 00:02:21 -- 1635008 Kbytes
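
Before switching a server to standalone mode, it may be worth confirming that it really is still an HDR secondary. The sketch below parses the banner line printed by `onstat -` and guards the switch with that check. The helper names (`server_mode`, `is_hdr_secondary`, `promote_to_standard`) are hypothetical, and the banner format is assumed to match the transcripts in this post.

```shell
# Print the server mode from the banner of `onstat -` output read on stdin.
# Assumed banner format: "<mode> -- Up hh:mm:ss -- <n> Kbytes",
# optionally preceded by a "<product version> -- " prefix.
server_mode() {
    sed -n '/ -- Up /{s/ -- Up .*//;s/.* -- //;p;}' | head -n 1
}

# Succeed only when the banner on stdin shows an HDR secondary.
is_hdr_secondary() {
    [ "$(server_mode)" = "Read-Only (Sec)" ]
}

# Hypothetical wrapper: switch to standalone mode only if the local server
# is still an HDR secondary; otherwise refuse and return non-zero.
promote_to_standard() {
    if onstat - | is_hdr_secondary; then
        onmode -d standard
    else
        echo "not an HDR secondary; refusing to switch" >&2
        return 1
    fi
}
```

Only `promote_to_standard` touches the server. The two parsing helpers work on any saved banner text, so they can be tried against the transcripts above without a running instance.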

4. After the secondary has become a standalone server, promote it to primary and restore the cluster

[gbasedbt@node02 ~]$ onmode -d primary node1
[gbasedbt@node02 ~]$ onstat -
On-Line (Prim) -- Up 00:02:38 -- 1635008 Kbytes

-- On node1, run oninit -PHY to perform physical recovery
[gbasedbt@node01 node1_dbs]$ oninit -PHY
[gbasedbt@node01 node1_dbs]$ onstat -m
Fast Recovery -- Up 00:00:13 -- 1650580 Kbytes

Message Log File: /opt/GBASE/gbase/tmp/online_node1.log
06/17/24 22:49:31  SQL_FEAT_CTRL value set to 0x8008
06/17/24 22:49:31  SQL_DEF_CTRL value set to 0x4b0
06/17/24 22:49:31  GBase Database Server Version 12.10.FC4G1AEE Software Serial Number AAA#B000000
06/17/24 22:49:32  GBase Database Server Initialized -- Shared Memory Initialized.

06/17/24 22:49:32  Started 1 B-tree scanners.
06/17/24 22:49:32  B-tree scanner threshold set at 5000.
06/17/24 22:49:32  B-tree scanner range scan size set to -1.
06/17/24 22:49:32  B-tree scanner ALICE mode set to 6.
06/17/24 22:49:32  B-tree scanner index compression level set to med.
06/17/24 22:49:32  DR: Reservation of the last logical log for log backup turned on
06/17/24 22:49:32  Data replication type and state information reset. To start DR, use
          the 'onmode -d' command and wait for the pair to be operational,
          before shutting down the database server

06/17/24 22:49:32  Physical Recovery Started at Page (3:394).
06/17/24 22:49:32  Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored.
06/17/24 22:49:32  Dataskip is now OFF for all dbspaces
06/17/24 22:49:32  Restartable Restore has been ENABLED
06/17/24 22:49:32  Recovery Mode
-- Check the node: it is in the fast recovery phase
[gbasedbt@node01 node1_dbs]$ onstat -
Fast Recovery -- Up 00:00:21 -- 1650580 Kbytes

-- Add node1 back to the cluster as the secondary
[gbasedbt@node01 node1_dbs]$ onmode -d secondary node2
[gbasedbt@node01 node1_dbs]$ onstat -
Read-Only (Sec) -- Up 00:02:04 -- 2188180 Kbytes

[gbasedbt@node01 node1_dbs]$ onstat -g dri
Read-Only (Sec) -- Up 00:04:31 -- 2188180 Kbytes

Data Replication at 0x4cf1a028:
  Type           State        Paired server        Last DR CKPT (id/pg)    Supports Proxy Writes
  HDR Secondary  on           node2                         9 / 5          N

  DRINTERVAL   0
  DRTIMEOUT    30
  DRAUTO       2
  DRLOSTFOUND  /opt/GBASE/gbase/etc/dr.lostfound
  DRIDXAUTO    0
  ENCRYPT_HDR  0
  Backlog      0
  Last Send    2024/06/17 22:50:42
  Last Receive 2024/06/17 22:50:44
  Last Ping    2024/06/17 22:53:35
  Last log page applied(log id,page): 0,0
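
Once node1 has rejoined, the pair status row of `onstat -g dri` can be checked mechanically instead of by eye. The sketch below is a minimal example: `dri_pair` and `pair_ok` are hypothetical helper names, and the column layout is assumed to match the transcripts in this post.

```shell
# Print "<type>|<state>|<paired server>" from `onstat -g dri` output on stdin.
# The type is either one word ("primary") or two words ("HDR Secondary").
dri_pair() {
    awk '
        header && NF {
            if ($1 == "HDR") print $1 " " $2 "|" $3 "|" $4
            else             print $1 "|" $2 "|" $3
            exit
        }
        $1 == "Type" && $2 == "State" { header = 1 }
    '
}

# Succeed when the pair row shows the expected type and peer with state "on".
# Usage: onstat -g dri | pair_ok "HDR Secondary" node2
pair_ok() {
    [ "$(dri_pair)" = "$1|on|$2" ]
}
```

For the final state in this test, `pair_ok "HDR Secondary" node2` would be run on node1 and `pair_ok primary node1` on node2.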
