[Dameng] Restarting the primary database and observing the watcher (dmwatcher) state changes

1. Restart the primary and observe the watcher state changes


[monitor]         2024-07-27 14:58:50: dwmon_tcp_recv failed, close port, vio:0, mid:1722055233, errno:104, code:-6007
[monitor]         2024-07-27 14:58:50: dwmon tcp port vio(0) close, inst_name:SSPUDB1, ip:192.168.1.20, port:15238, n_fixed:1.
[SSPUDB2]          2024-07-27 14:58:50.970 [INFO] dmwatcher P0000018005 T0000000000000018009  没有收到远程守护进程(SSPUDB1)消息,原状态为(OPEN),距进程为ERROR状态
[SSPUDB2]          2024-07-27 14:58:51.021 [INFO] dmwatcher P0000018005 T0000000000000018009  Instance: 守护进程状态(ERROR) 实例状态(OK) 实例名(SSALID) POCNT(5) FLSN(42270) CLSN(42270) SLSN(42270) SSLSN(42270)
[monitor]         2024-07-27 14:59:00: <RECEIVE TIMEOUT SSPUDB1>
[monitor]         2024-07-27 14:59:00: 接收守护进程(SSPUDB1)消息超时
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2024-07-27 14:58:50  ERROR          OK        SSPUDB1          OPEN        PRIMARY   VALID    5        42270           42270           

[monitor]         2024-07-27 14:59:00: </RECEIVE TIMEOUT SSPUDB1>

[monitor]         2024-07-27 14:59:00: [!!! 实例SSPUDB1[PRIMARY, OPEN, ISTAT_SAME:TRUE]故障,实例SSPUDB2[STANDBY, OPEN, ISTAT_SAME:TRUE]符合自动接管条件 !!!]

[monitor]         2024-07-27 14:59:00: 检测到PRIMARY实例故障,开始对组(GRP1)执行自动接管

[monitor]         2024-07-27 14:59:00: <AUTO TAKEOVER SSPUDB2>
[monitor]         2024-07-27 14:59:00: 通知组(GRP1)当前活动的守护进程设置MID
[monitor]         2024-07-27 14:59:00: Begin to wait site(SSPUDB2) complete...
[monitor]         2024-07-27 14:59:00: [!!! dwmon_tcp_msg_send, get master tcp port failed, send cmd msg(cmd:5, name_sendto:GRP1) to dmwatcher() !!!]
[monitor]         2024-07-27 14:59:01: Wait site(SSPUDB2) finished, code=0!
[monitor]         2024-07-27 14:59:01: 通知组(GRP1)当前活动的守护进程设置MID成功
[monitor]         2024-07-27 14:59:01: 开始使用实例SSPUDB2接管
[monitor]         2024-07-27 14:59:01: 通知守护进程SSPUDB2切换TAKEOVER状态
[monitor]         2024-07-27 14:59:01: 守护进程(SSPUDB2)状态切换 [OPEN-->TAKEOVER]
[monitor]         2024-07-27 14:59:02: 切换守护进程SSPUDB2为TAKEOVER状态成功
[monitor]         2024-07-27 14:59:02: 实例SSPUDB2开始执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句
[monitor]         2024-07-27 14:59:02: 实例SSPUDB2执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句成功
[monitor]         2024-07-27 14:59:02: 实例SSPUDB2开始执行SP_APPLY_KEEP_PKG()语句
[monitor]         2024-07-27 14:59:02: 实例SSPUDB2执行SP_APPLY_KEEP_PKG()语句成功
[monitor]         2024-07-27 14:59:02: 实例SSPUDB2开始执行ALTER DATABASE MOUNT语句
[monitor]         2024-07-27 14:59:02: Begin to wait site(SSPUDB2) complete...
[monitor]         2024-07-27 14:59:02: Wait site(SSPUDB2) finished, code=0!
[monitor]         2024-07-27 14:59:02: 实例SSPUDB2执行ALTER DATABASE MOUNT语句成功
[monitor]         2024-07-27 14:59:02: 实例SSPUDB2开始执行ALTER DATABASE PRIMARY语句
[monitor]         2024-07-27 14:59:02: 实例SSPUDB2执行ALTER DATABASE PRIMARY语句成功
[monitor]         2024-07-27 14:59:02: 通知实例SSPUDB2修改所有归档状态无效
[monitor]         2024-07-27 14:59:02: 修改所有实例归档为无效状态成功
[monitor]         2024-07-27 14:59:02: 实例SSPUDB2开始执行ALTER DATABASE OPEN FORCE语句
[monitor]         2024-07-27 14:59:02: ohis_inst_info_copy_low, inst(SSPUDB2) apply info changed, old info[p_db_magic:1684231299, n_apply_ep:1], new info to set[p_db_magic:1771613664, n_apply_ep:0]!
[monitor]         2024-07-27 14:59:02: 实例SSPUDB2执行ALTER DATABASE OPEN FORCE语句成功
[monitor]         2024-07-27 14:59:02: 实例SSPUDB2开始执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句
[monitor]         2024-07-27 14:59:02: 实例SSPUDB2执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句成功
[monitor]         2024-07-27 14:59:02: 通知守护进程SSPUDB2切换OPEN状态
[monitor]         2024-07-27 14:59:02: 守护进程(SSPUDB2)状态切换 [TAKEOVER-->OPEN]
[monitor]         2024-07-27 14:59:03: 切换守护进程SSPUDB2为OPEN状态成功
[monitor]         2024-07-27 14:59:03: 通知组(GRP1)的守护进程执行清理操作
[monitor]         2024-07-27 14:59:03: Notify instance(SSPUDB2) to clear monitor info and wait complete!
[monitor]         2024-07-27 14:59:03: Begin to wait site(SSPUDB2) complete...
[monitor]         2024-07-27 14:59:03: dwmon_cmd_msg_send_low failed, get tcp_port failed(inst_name:, ip:192.168.1.20, port:15238)!
[monitor]         2024-07-27 14:59:03: [!!! dwmon_tcp_msg_send to master tcp_port failed, code:-6010, (inst_name:, ip:192.168.1.20, port:15238, vio:0) !!!]
[monitor]         2024-07-27 14:59:04: 清理守护进程(SSPUDB2)请求成功
[monitor]         2024-07-27 14:59:04: Wait site(SSPUDB2) finished, code=0!
[monitor]         2024-07-27 14:59:04: 使用实例SSPUDB2接管成功

[monitor]         2024-07-27 14:59:04: </AUTO TAKEOVER SSPUDB2>

[monitor]         2024-07-27 14:59:04: 组(GRP1)使用实例SSPUDB2自动接管成功

[monitor]         2024-07-27 14:59:16: dmmonitor create link to dmwatcher success, mid:1722055233, dmwatcher ip:192.168.1.20, dmwatcher port:15238, vio:3, inst_name:SSPUDB1
[monitor]         2024-07-27 14:59:16: <MON CHECK SSPUDB1>
[monitor]         2024-07-27 14:59:16: 守护进程(SSPUDB1)状态切换 [NONE-->STARTUP]
[monitor]         2024-07-27 14:59:16: </MON CHECK SSPUDB1>

[monitor]         2024-07-27 14:59:16: [!!! 组(GRP1)中存在多个PRIMARY&OPEN实例,不符合自动接管条件 !!!]

[monitor]         2024-07-27 14:59:16: ohis_inst_info_copy_low, inst(SSPUDB1) apply info changed, old info[p_db_magic:1684231299, n_apply_ep:0], new info to set[p_db_magic:0, n_apply_ep:0]!
[SSPUDB2]          2024-07-27 14:59:17.573 [INFO] dmwatcher P0000018005 T0000000000000035510  Instance: 守护进程状态(ERROR) 实例状态(OK) 实例名(SSPUDB1) 模式(PRIMARY) 实例状态(OPEN) 归档状态(INVALID) POCNT(5) FLSN(42270) CLSN(42270) SLSN(42270) SSLSN(42270)
[SSPUDB1]          2024-07-27 14:59:17.587 [INFO] dmwatcher P0000001318 T0000000000000001453  接收到远程守护进程广播消息,实例状态为:
[SSPUDB1]          2024-07-27 14:59:17.587 [INFO] dmwatcher P0000001318 T0000000000000001453  Instance: 守护进程状态(OPEN) 实例状态(OK) 实例名(SSPUDB2) 模式(PRIMARY) 实例状态(OPEN) 归档状态(INVALID) POCNT(6) FLSN(42639) CLSN(42639) SLSN(42639) SSLSN(42639)
[SSPUDB2]          2024-07-27 14:59:25.577 [INFO] dmwatcher P0000018005 T0000000000000035510  远程实例的模式、状态或者归档状态发生变化,原状态是:
[monitor]         2024-07-27 14:59:26: ohis_inst_info_copy_low, inst(SSPUDB1) apply info changed, old info[p_db_magic:0, n_apply_ep:0], new info to set[p_db_magic:1684231299, n_apply_ep:0]!
[SSPUDB1]          2024-07-27 14:59:26.594 [INFO] dmwatcher P0000001318 T0000000000000002081  服务器端(SSPUDB1)公钥发生变化,广播新值给监视器
[SSPUDB2]          2024-07-27 14:59:26.579 [INFO] dmwatcher P0000018005 T0000000000000035510  远程实例的模式、状态或者归档状态发生变化,原状态是:
[monitor]         2024-07-27 14:59:26: ohis_inst_info_copy_low, inst(SSPUDB1) apply info changed, old info[p_db_magic:1684231299, n_apply_ep:0], new info to set[p_db_magic:1684231299, n_apply_ep:1]!
[SSPUDB1]          2024-07-27 14:59:26.750 [INFO] dmwatcher P0000001318 T0000000000000001371  设置GRP1守护进程为STARTUP(SUB:STARTUP)状态
[SSPUDB2]          2024-07-27 14:59:26.732 [INFO] dmwatcher P0000018005 T0000000000000035510  远程实例的模式、状态或者归档状态发生变化,原状态是:
[SSPUDB2]          2024-07-27 14:59:26.833 [INFO] dmwatcher P0000018005 T0000000000000035510  远程实例的模式、状态或者归档状态发生变化,新状态是:
[monitor]         2024-07-27 14:59:26: <MON CHECK SSPUDB1>
[monitor]         2024-07-27 14:59:26: 守护进程(SSPUDB1)状态切换 [STARTUP-->UNIFY EP]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2024-07-27 14:59:26  UNIFY EP       OK        SSPUDB1          MOUNT       STANDBY   INVALID  5        42270           42270           

[monitor]         2024-07-27 14:59:26: </MON CHECK SSPUDB1>

[SSPUDB1]          2024-07-27 14:59:26.955 [INFO] dmwatcher P0000001318 T0000000000000001371  设置GRP1守护进程为UNIFY EP(SUB:STARTUP)状态
[SSPUDB2]          2024-07-27 14:59:26.934 [INFO] dmwatcher P0000018005 T0000000000000035510  远程实例的模式、状态或者归档状态发生变化,原状态是:
[SSPUDB2]          2024-07-27 14:59:27.035 [INFO] dmwatcher P0000018005 T0000000000000035510  远程实例的模式、状态或者归档状态发生变化,新状态是:
[monitor]         2024-07-27 14:59:27: <MON CHECK SSPUDB1>
[monitor]         2024-07-27 14:59:27: 守护进程(SSPUDB1)状态切换 [UNIFY EP-->STARTUP]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2024-07-27 14:59:27  STARTUP        OK        SSPUDB1          OPEN        STANDBY   INVALID  5        42270           42270           

[monitor]         2024-07-27 14:59:27: </MON CHECK SSPUDB1>

[SSPUDB1]          2024-07-27 14:59:27.164 [INFO] dmwatcher P0000001318 T0000000000000001371  设置GRP1守护进程为STARTUP(SUB:STARTUP)状态
[SSPUDB2]          2024-07-27 14:59:27.136 [INFO] dmwatcher P0000018005 T0000000000000035510  远程实例的模式、状态或者归档状态发生变化,原状态是:
[SSPUDB2]          2024-07-27 14:59:27.237 [INFO] dmwatcher P0000018005 T0000000000000035510  远程实例的模式、状态或者归档状态发生变化,新状态是:
[monitor]         2024-07-27 14:59:27: <MON CHECK SSPUDB1>
[monitor]         2024-07-27 14:59:27: 守护进程(SSPUDB1)状态切换 [STARTUP-->OPEN]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2024-07-27 14:59:27  OPEN           OK        SSPUDB1          OPEN        STANDBY   INVALID  5        42270           42270           

[monitor]         2024-07-27 14:59:27: </MON CHECK SSPUDB1>

[SSPUDB1]          2024-07-27 14:59:27.354 [INFO] dmwatcher P0000001318 T0000000000000001371  设置GRP1守护进程为OPEN(SUB:STARTUP)状态
[SSPUDB2]          2024-07-27 14:59:27.338 [INFO] dmwatcher P0000018005 T0000000000000035510  远程实例的模式、状态或者归档状态发生变化,原状态是:
[SSPUDB2]          2024-07-27 14:59:27.439 [INFO] dmwatcher P0000018005 T0000000000000035510  远程实例的模式、状态或者归档状态发生变化,新状态是:
[SSPUDB2]          2024-07-27 14:59:27.540 [INFO] dmwatcher P0000018005 T0000000000000035510  远程实例的模式、状态或者归档状态发生变化,原状态是:
[SSPUDB2]          2024-07-27 14:59:27.641 [INFO] dmwatcher P0000018005 T0000000000000035510  远程实例的模式、状态或者归档状态发生变化,新状态是:
[SSPUDB1]          2024-07-27 14:59:28.785 [INFO] dmwatcher P0000001318 T0000000000000001453  远程实例的模式、状态或者归档状态发生变化,原状态是:
[SSPUDB1]          2024-07-27 14:59:28.785 [INFO] dmwatcher P0000001318 T0000000000000001453  Instance: 守护进程状态(OPEN) 实例状态(OK) 实例名(SSPUDB2) 模式(PRIMARY) 实例状态(OPEN) 归档状态(INVALID) POCNT(6) FLSN(42642) CLSN(42643) SLSN(42643) SSLSN(42643)
[SSPUDB1]          2024-07-27 14:59:28.840 [INFO] dmwatcher P0000001318 T0000000000000001453  远程实例的模式、状态或者归档状态发生变化,新状态是:
[monitor]         2024-07-27 14:59:29: <MON CHECK SSPUDB2>
[monitor]         2024-07-27 14:59:29: 守护进程(SSPUDB2)状态切换 [OPEN-->RECOVERY]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2024-07-27 14:59:29  RECOVERY       OK        SSPUDB2          OPEN        PRIMARY   VALID    6        42643           42643           

[monitor]         2024-07-27 14:59:29: </MON CHECK SSPUDB2>

[SSPUDB2]          2024-07-27 14:59:29.929 [INFO] dmwatcher P0000018005 T0000000000000018009  检测到实例(SSPUDB1)可恢复,执行恢复流程
[SSPUDB2]          2024-07-27 14:59:29.929 [INFO] dmwatcher P0000018005 T0000000000000018009  开始向实例(SSPUDB1)发送归档日志
[monitor]         2024-07-27 14:59:30: ohis_inst_info_copy_low, inst(SSPUDB1) apply info changed, old info[p_db_magic:1684231299, n_apply_ep:1], new info to set[p_db_magic:1771613664, n_apply_ep:1]!
[SSPUDB2]          2024-07-27 14:59:30.951 [INFO] dmwatcher P0000018005 T0000000000000018009  检测到实例(SSPUDB1)发送归档成功,设置为当前恢复实例
[SSPUDB2]          2024-07-27 14:59:30.951 [INFO] dmwatcher P0000018005 T0000000000000018009  向实例(SSPUDB1)发送归档日志成功,实例(SSPUDB2)转入suspend状态
[SSPUDB2]          2024-07-27 14:59:31.185 [INFO] dmwatcher P0000018005 T0000000000000018009  发送归档完毕,设置实例(SSPUDB1)归档有效
[SSPUDB2]          2024-07-27 14:59:31.941 [INFO] dmwatcher P0000018005 T0000000000000018009  不存在可恢复备库
[monitor]         2024-07-27 14:59:32: <MON CHECK SSPUDB2>
[monitor]         2024-07-27 14:59:32: 守护进程(SSPUDB2)状态切换 [RECOVERY-->OPEN]
                  WTIME                WSTATUS        INST_OK   INAME            ISTATUS     IMODE     RSTAT    N_OPEN   FLSN            CLSN            
                  2024-07-27 14:59:32  OPEN           OK        SSPUDB2          OPEN        PRIMARY   VALID    6        42644           42644           

[monitor]         2024-07-27 14:59:32: </MON CHECK SSPUDB2>

[SSPUDB2]          2024-07-27 14:59:32.191 [INFO] dmwatcher P0000018005 T0000000000000018009  设置GRP1守护进程为OPEN(SUB:STARTUP)状态
[SSPUDB1]          2024-07-27 14:59:32.207 [INFO] dmwatcher P0000001318 T0000000000000001453  远程实例的模式、状态或者归档状态发生变化,原状态是:
[SSPUDB1]          2024-07-27 14:59:32.258 [INFO] dmwatcher P0000001318 T0000000000000001453  远程实例的模式、状态或者归档状态发生变化,新状态是:
[monitor]         2024-07-27 14:59:36: 
                  GROUP            OGUID       MON_CONFIRM MODE         MPP_FLAG  
                  GRP1             453331      TRUE        AUTO         FALSE     


                  <<DATABASE GLOBAL INFO:>>
                  DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
                  192.168.1.21        25238        2024-07-27 14:59:35  GLOBAL    VALID     OPEN           SSPUDB2          OK        1     1     OPEN        PRIMARY   DSC_OPEN       REALTIME  VALID    

                  EP INFO:
                  INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE     RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
                  192.168.1.21        25236      OK        SSPUDB2          OPEN        PRIMARY   0          0                REALTIME  VALID    8884            42644           8885            42645           NONE                  

                  <<DATABASE GLOBAL INFO:>>
                  DW_IP               MAL_DW_PORT  WTIME                WTYPE     WCTLSTAT  WSTATUS        INAME            INST_OK   N_EP  N_OK  ISTATUS     IMODE     DSC_STATUS     RTYPE     RSTAT    
                  192.168.1.20        15238        2024-07-27 14:59:35  GLOBAL    VALID     OPEN           SSPUDB1          OK        1     1     OPEN        STANDBY   DSC_OPEN       REALTIME  VALID    

                  EP INFO:
                  INST_IP             INST_PORT  INST_OK   INAME            ISTATUS     IMODE     DSC_SEQNO  DSC_CTL_NODE     RTYPE     RSTAT    FSEQ            FLSN            CSEQ            CLSN            DW_STAT_FLAG          
                  192.168.1.20        15236      OK        SSPUDB1          OPEN        STANDBY   0          0                REALTIME  VALID    8872            42644           8872            42644           NONE                  

                  DATABASE(SSPUDB1) APPLY INFO FROM (SSPUDB2), REDOS_PARALLEL_NUM (1):
                  DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[8884, 8884, 8885], (RLSN, SLSN, KLSN)[42644, 42644, 42645], N_TSK[0], TSK_MEM_USE[512]
                  REDO_LSN_ARR: (42644)
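
The watcher state changes are easy to miss in a log of this size. The following is a minimal Python sketch that pulls them out, assuming the monitor lines keep the exact `守护进程(NAME)状态切换 [OLD-->NEW]` layout shown above. The function name `watcher_transitions` and the `sample` text are illustrative and not part of any DM tool.

```python
import re

# Matches monitor lines such as:
#   2024-07-27 14:59:01: 守护进程(SSPUDB2)状态切换 [OPEN-->TAKEOVER]
# The pattern is an assumption based on the log layout in this post.
TRANSITION = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"守护进程\((\w+)\)状态切换 \[(.+?)-->(.+?)\]"
)


def watcher_transitions(log_text):
    """Return (timestamp, instance, old_state, new_state) tuples in log order."""
    return [m.groups() for m in TRANSITION.finditer(log_text)]


# Three lines copied from the monitor log above.
sample = """\
[monitor]         2024-07-27 14:59:01: 守护进程(SSPUDB2)状态切换 [OPEN-->TAKEOVER]
[monitor]         2024-07-27 14:59:02: 守护进程(SSPUDB2)状态切换 [TAKEOVER-->OPEN]
[monitor]         2024-07-27 14:59:26: 守护进程(SSPUDB1)状态切换 [STARTUP-->UNIFY EP]
"""

for ts, inst, old, new in watcher_transitions(sample):
    print(ts, inst, old, "->", new)
```

State names that contain a space, such as `UNIFY EP`, are captured whole because the pattern reads up to the closing bracket.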

2. Summary of the state transitions in the log

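The monitor (dmmonitor) loses its connection to the watcher of SSPUDB1 and, after the receive timeout, treats the primary instance as failed.
The monitor starts an automatic takeover of group GRP1 using SSPUDB2.
The monitor notifies the active watchers in group GRP1 to set the MID.
Takeover with instance SSPUDB2 begins.
The monitor notifies watcher SSPUDB2 to switch to the TAKEOVER state.
Watcher (SSPUDB2) state change [OPEN-->TAKEOVER]
SSPUDB2 executes SP_SET_GLOBAL_DW_STATUS(0, 7).
SSPUDB2 executes SP_APPLY_KEEP_PKG().
SSPUDB2 executes ALTER DATABASE MOUNT.
SSPUDB2 executes ALTER DATABASE PRIMARY.
All archives are marked invalid.
SSPUDB2 executes ALTER DATABASE OPEN FORCE.
SSPUDB2 executes SP_SET_GLOBAL_DW_STATUS(7, 0).
Watcher (SSPUDB2) state change [TAKEOVER-->OPEN]
The monitor notifies the watchers in group GRP1 to run the cleanup operation.
After SSPUDB1 starts again, it automatically becomes STANDBY; its watcher goes through NONE-->STARTUP, STARTUP-->UNIFY EP, UNIFY EP-->STARTUP and STARTUP-->OPEN.
Watcher (SSPUDB2) state change [OPEN-->RECOVERY]
SSPUDB2 detects that instance SSPUDB1 is recoverable and starts the recovery flow.
SSPUDB2 starts sending archive logs to instance SSPUDB1.
The archive send to SSPUDB1 is detected as successful, and SSPUDB1 is set as the current recovery instance.
Archive logs have been sent to SSPUDB1 successfully; instance SSPUDB2 enters the suspend state.
Archive sending is complete; the archive of instance SSPUDB1 is set to valid.
Watcher (SSPUDB2) state change [RECOVERY-->OPEN]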
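
To get a feel for how long each phase took, the timestamps can be turned into offsets from the first error. This is a small sketch only: the event labels are my own wording, and the timestamps are copied by hand from the monitor log in section 1.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (label, timestamp) pairs; timestamps come from the monitor log in section 1,
# labels are illustrative descriptions and not DM terminology.
events = [
    ("connection to SSPUDB1 watcher lost", "2024-07-27 14:58:50"),
    ("automatic takeover started", "2024-07-27 14:59:00"),
    ("takeover by SSPUDB2 succeeded", "2024-07-27 14:59:04"),
    ("SSPUDB1 watcher OPEN as standby", "2024-07-27 14:59:27"),
    ("SSPUDB2 watcher RECOVERY-->OPEN", "2024-07-27 14:59:32"),
]


def elapsed_since_first(events):
    """Return (label, seconds since the first event) for every event."""
    start = datetime.strptime(events[0][1], FMT)
    return [
        (label, int((datetime.strptime(ts, FMT) - start).total_seconds()))
        for label, ts in events
    ]


for label, seconds in elapsed_since_first(events):
    print(f"+{seconds:>3}s  {label}")
```

In this run the takeover finished 14 seconds after the connection was lost, and the old primary was fully recovered as a standby after 42 seconds.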

3. Database status check

-- Status check on SSPUDB2
SQL> conn sspudb/sspudb123456@192.168.1.21:25236

服务器[192.168.1.21:25236]:处于主库打开状态
登录使用时间 : 3.506(ms)
SQL> 
SQL> select role$,status$ from v$database; 

行号     ROLE$       STATUS$    
---------- ----------- -----------
1          1           4
已用时间: 1.021(毫秒). 执行号:800.

-- Status check on SSPUDB1
SQL> conn sspudb/sspudb123456@192.168.1.20:15236

服务器[192.168.1.20:15236]:处于备库打开状态
登录使用时间 : 4.205(ms)
SQL> select name,status$,role$ from v$database; 

行号     NAME   STATUS$     ROLE$      
---------- ------ ----------- -----------
1          sspudb 4           1

已用时间: 2.483(毫秒). 执行号:900.
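
The numeric `ROLE$` and `STATUS$` values are hard to read at a glance. The sketch below decodes them with a mapping that is an assumption on my part, based on the usual description of `V$DATABASE` (role 0/1/2 for normal/primary/standby, status 3 for MOUNT and 4 for OPEN). Please verify it against the documentation of your DM version before relying on it. The helper name `describe` is hypothetical.

```python
# Assumed meaning of V$DATABASE.ROLE$ and STATUS$ (verify for your DM version).
ROLE = {0: "NORMAL", 1: "PRIMARY", 2: "STANDBY"}
STATUS = {
    1: "STARTUP",
    2: "AFTER REDO",
    3: "MOUNT",
    4: "OPEN",
    5: "SUSPEND",
    6: "SHUTDOWN",
}


def describe(role, status):
    """Turn a (ROLE$, STATUS$) pair into a readable label."""
    return f"{ROLE.get(role, 'UNKNOWN')} / {STATUS.get(status, 'UNKNOWN')}"


# Values returned by SSPUDB2 after the takeover: ROLE$ = 1, STATUS$ = 4.
print(describe(1, 4))
```

Under this assumed mapping, a standby that is open would be expected to report `ROLE$ = 2` and `STATUS$ = 4`.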

4. Verifying writes on the primary

SQL> create table sspu_tab1 (id int,name varchar(20));
操作已执行
已用时间: 9.315(毫秒). 执行号:901.
SQL> insert into sspu_tab1 values(2,'xsq1'),(1,'xsq2');
影响行数 2

已用时间: 0.863(毫秒). 执行号:902.
SQL> commit; 
操作已执行
已用时间: 1.881(毫秒). 执行号:903.
SQL> select * from sspu_tab1;

行号     ID          NAME
---------- ----------- ----
1          2           xsq1
2          1           xsq2

已用时间: 0.857(毫秒). 执行号:904.
-- Check on the standby
SQL> conn sspudb/sspudb123456@192.168.1.20:15236

服务器[192.168.1.20:15236]:处于备库打开状态
登录使用时间 : 3.484(ms)
SQL> select * from sspu_tab1;   

行号     ID          NAME
---------- ----------- ----
1          2           xsq1
2          1           xsq2

已用时间: 4.890(毫秒). 执行号:100.
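
For a larger table, comparing the two result sets by eye does not scale. Here is a minimal sketch of an order-insensitive comparison, assuming the rows have already been fetched from both instances as tuples. The rows are hard-coded from the outputs above, and `same_rows` is an illustrative helper, not a DM feature.

```python
from collections import Counter


def same_rows(primary_rows, standby_rows):
    """True when both sides hold the same rows, ignoring order but not duplicates."""
    return Counter(primary_rows) == Counter(standby_rows)


# Rows of sspu_tab1 as returned by the primary (SSPUDB2) and the standby (SSPUDB1).
primary = [(2, "xsq1"), (1, "xsq2")]
standby = [(2, "xsq1"), (1, "xsq2")]

print(same_rows(primary, standby))
```

A `Counter` is used instead of a `set` so that a duplicated or missing duplicate row on one side is still reported as a mismatch.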
