目录
一、场景假设:
1)、测试环境
主机名 | 实例名/监视器 | 初始状态 |
dm21 | GRP1_RT_01 | 备库 |
dm22 | GRP1_RT_02 | 主库 |
它们是实时主备集群关系,有主机名为dm23的确认监示器。
2)、测试场景
一开始都是运行正常的,某天备库主机因硬件损坏导致突然down机,其修复时间未知,在备库未修复好的这段时间里需要对主库进行维护(比如调整参数等等),重新启动主库,如何使之可以继续正常提供服务?哪么后继备库硬件修复好后,又如何重新加入到集群中?哪么在这其中watcher和库又发生了哪些动作?
二、开始测试:
1、备库down机后watcher和库的动作
为了测试环境真实,现在假设备库因硬件损坏导致突然down机,在主库上做如下事务(注意:据dm的原理来说只有做事务才能看到后继效果):
SQL> insert into t(i) values(2);
同时在监示器上显示变化如下:
[monitor] 2022-04-24 13:14:29: Dmwatcher process GRP1_RT_02 status switching [OPEN-->MON CONFIRM]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-24 13:14:35 MON CONFIRM OK GRP1_RT_02 SUSPEND PRIMARY VALID 30 66101 66104
[monitor] 2022-04-24 13:14:30: Dmwatcher process GRP1_RT_02 status switching [MON CONFIRM-->FAILOVER]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-24 13:14:36 FAILOVER OK GRP1_RT_02 SUSPEND PRIMARY VALID 30 66101 66104
[monitor] 2022-04-24 13:14:32: Dmwatcher process GRP1_RT_02 status switching [FAILOVER-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-04-24 13:14:38 OPEN OK GRP1_RT_02 OPEN PRIMARY VALID 30 66104 66104
主库查询:
SQL> select * from v$arch_status ;
LINEID ARCH_TYPE ARCH_DEST ARCH_STATUS ARCH_SRC
---------- --------- ---------- ----------- ----------
1 REALTIME GRP1_RT_01 INVALID GRP1_RT_02
2 LOCAL /dm/arch VALID GRP1_RT_02
可知:在备库突然down机时,在主库做事务,主库的watcher本身自己会变为FAILOVER的状态,并且将主库暂时变成suspend的状态,此时主库会将备库的规档修改为失效,之后watcher又会将主库变为OPEN的状态!
2、重启主库,使之可以继续正常提供服务
在备库未修复好的这段时间里需要对主库进行维护(比如调整参数等等),重新启动主库。
[root@dm22 ~]# /dm/dmdbms/bin/DmWatcherServiceDM stop
[root@dm22 ~]# /dm/dmdbms/bin/DmServiceDM stop
[root@dm22 ~]# reboot
此时重启主库是只能启到mount状态!
[dmdba@dm22 ~]$ disql sysdba/SYSDBA:32142
Server[LOCALHOST:32142]:mode is primary, state is mount
login used time : 1.409(ms)
disql V8
SQL>
此时即使是启动守护进程,也不能拉起主机为open(估计是跟规档模式有关!),经实验,要等备库正常启动了,主库才能被watcher拉到open!
哪么问题来了,现在备库还未修复,如何使主库可以继续正常提供服务?答案是:只能停watcher,让主库变成单机normal,即是:做 ./DmWatcherServiceDM stop;ALTER DATABASE NORMAL;ALTER DATABASE OPEN;【注意:如果不做NORMAL开库,直接做force open的话,则主库会挂起,库也不能用!】
其测试如下:
1)、如果不做NORMAL开库,直接做force open的话,则主库会挂起,库也不能用!
[dmdba@dm22 ~]$ disql sysdba/SYSDBA:32142
Server[LOCALHOST:32142]:mode is primary, state is mount
login used time : 1.409(ms)
disql V8
SQL> SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 1);
DMSQL executed successfully
used time: 5.406(ms). Execute id is 100.
SQL> alter database open;
alter database open;
[-516]:Error in line: 1
None normal mode open force needed.
used time: 0.490(ms). Execute id is 0.
SQL>
SQL>
SQL> select * from v$instance;
LINEID NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION DB_VERSION START_TIME STATUS$ MODE$ OGUID
---------- ---------- ------------- --------------- --------- -------------------------- ------------------- ------------------- ------- ------- -----------
DSC_SEQNO DSC_ROLE
----------- --------
1 GRP1_RT_02 GRP1_RT_02 1 dm22 DM Database Server x64 V8 DB Version: 0x7000c 2022-04-24 13:33:20 MOUNT PRIMARY 453331
0 NULL
used time: 2.679(ms). Execute id is 101.
SQL> alter database open force;
alter database open force;
[-720]:Error in line: 1
Dmwatcher is active, or current configuration(ALTER_MODE_STATUS) not allowed to alter database.
used time: 0.321(ms). Execute id is 0.
SQL> SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 1);
DMSQL executed successfully
used time: 4.043(ms). Execute id is 102.
[root@dm22 ~]# /dm/dmdbms/bin/DmWatcherServiceDM stop
SQL> alter database open force;
executed successfully
used time: 379.897(ms). Execute id is 0.
SQL> select * from v$instance;
LINEID NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION DB_VERSION START_TIME STATUS$ MODE$ OGUID
---------- ---------- ------------- --------------- --------- -------------------------- ------------------- ------------------- ------- ------- -----------
DSC_SEQNO DSC_ROLE
----------- --------
1 GRP1_RT_02 GRP1_RT_02 1 dm22 DM Database Server x64 V8 DB Version: 0x7000c 2022-04-24 13:33:20 SUSPEND PRIMARY 453331
0 NULL
used time: 0.948(ms). Execute id is 104.
SQL> alter database open;
alter database open;
[-516]:Error in line: 1
None normal mode open force needed.
used time: 0.339(ms). Execute id is 0.
对于这一点,还可以看到:当库为primary时,OPEN 状态SUSPEND 是不能相互转换的!
2)、做NORMAL开库,使主库可以继续正常提供服务。
[root@dm22 ~]# /dm/dmdbms/bin/DmWatcherServiceDM stop
[dmdba@dm22 ~]$ disql sysdba/SYSDBA:32142
Server[LOCALHOST:32142]:mode is primary, state is mount
login used time : 3.562(ms)
disql V8
SQL> ALTER DATABASE NORMAL;
executed successfully
used time: 22.176(ms). Execute id is 0.
SQL> ALTER DATABASE OPEN;
executed successfully
used time: 386.850(ms). Execute id is 0.
SQL> select * from v$instance;
LINEID NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION DB_VERSION START_TIME STATUS$ MODE$ OGUID
---------- ---------- ------------- --------------- --------- -------------------------- ------------------- ------------------- ------- ------ -----------
DSC_SEQNO DSC_ROLE
----------- --------
1 GRP1_RT_02 GRP1_RT_02 1 dm22 DM Database Server x64 V8 DB Version: 0x7000c 2022-04-24 15:49:41 OPEN NORMAL 453331
0 NULL
used time: 2.003(ms). Execute id is 300.
SQL> insert into t(i) values(5);
affect rows 1
used time: 1.896(ms). Execute id is 301.
SQL> commit;
executed successfully
used time: 0.878(ms). Execute id is 302.
SQL>
3、备库硬件修复好后,又如何重新加入到集群中
当备库修好了过后,启动备库有时依然能同步数据!如果不能同步数据的话,备库只能重建,按重建备库方法后,将现在NORMAL的库变成primary即可。
三、总结
1、对于此种测试场景的结论是:只能停watcher,让主库变成单机normal继续提供服务。
2、当库为primary时,OPEN 状态SUSPEND 是不能相互转换的(SUSPEND是不能转化为OPEN的)!
更多学习内容参考:达梦数据库 - 新一代大型通用关系型数据库 | 达梦云适配中心 https://eco.dameng.com