DM (Dameng) Database Shared-Storage Cluster (DSC) Failure Testing

Cluster layout

Two-node DSC cluster

IP plan:

Hostname         Service IP        Heartbeat IP      Instance  Node role
sds-part-dmdb01  192.168.157.100   192.168.156.100   DSC1      DSC node 1
sds-part-dmdb02  192.168.157.101   192.168.156.101   DSC2      DSC node 2

Port plan:

Instance  Instance port  MAL system port  CSS port  ASM port  ASM MAL port  DCR check port
DSC1      5236           9236             9341      9351      7236          9741
DSC2      5236           9236             9341      9351      7236          9741
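Before writing the DSC configuration files, the port plan above can be kept as data and checked for accidental reuse of one port by two roles on the same host. This is a minimal Python sketch under that assumption. The role keys (`instance`, `mal`, `css`, `asm`, `asm_mal`, `dcr_check`) and the function `find_port_conflicts` are hypothetical names for illustration and are not part of DM's tooling.

```python
from collections import Counter

# Port plan from the table above. Both nodes use identical ports, which is fine
# because each instance runs on its own host. Role keys are hypothetical labels.
PORT_PLAN = {
    "sds-part-dmdb01": {"instance": 5236, "mal": 9236, "css": 9341,
                        "asm": 9351, "asm_mal": 7236, "dcr_check": 9741},
    "sds-part-dmdb02": {"instance": 5236, "mal": 9236, "css": 9341,
                        "asm": 9351, "asm_mal": 7236, "dcr_check": 9741},
}


def find_port_conflicts(plan):
    """Return (host, port) pairs where one port is assigned to several roles on the same host."""
    conflicts = []
    for host, roles in plan.items():
        counts = Counter(roles.values())
        conflicts.extend((host, port) for port, n in sorted(counts.items()) if n > 1)
    return conflicts
```

An empty result from `find_port_conflicts(PORT_PLAN)` means every role on each host has its own port.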

Detailed test log

Cluster 01

1.1 Test 01: kill the DSC instance process on the control node

Check the status of each node in the monitor
Monitor status on 192.168.157.101:

[dmdba@sds-part-dmdb02 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor]         2021-08-30 14:37:07: CSS MONITOR V8
[monitor]         2021-08-30 14:37:07: CSS MONITOR SYSTEM IS READY.

[monitor]         2021-08-30 14:37:07: Wait CSS Control Node choosed...
[monitor]         2021-08-30 14:37:08: Wait CSS Control Node choosed succeed.

show

monitor current time:2021-08-30 14:37:10, n_group:3
=================== group[name = CSS, seq = 0, type = CSS, Control Node = 0] ========================================

[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE

[CSS2] auto check = TRUE, global info:
[ASM2] auto restart = TRUE
[DSC2] auto restart = TRUE

ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts
        2021-08-30 14:37:13    CSS1          0         9341    Control Node OPEN               WORKING      OK           TRUE         698495724         698496733
        2021-08-30 14:37:13    CSS2          1         9341    Normal Node  OPEN               WORKING      OK           TRUE         698495623         698496630

=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts
        2021-08-30 14:37:13    ASM1          0         9351    Normal Node  OPEN               WORKING      OK           TRUE         698501852         698502840
        2021-08-30 14:37:13    ASM2          1         9351    Control Node OPEN               WORKING      OK           TRUE         698501748         698502733

=================== group[name = DSC, seq = 2, type = DB, Control Node = 0] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts
        2021-08-30 14:37:13    DSC1          0         5236    Control Node OPEN               WORKING      OK           TRUE         1834441012        1834441945
        2021-08-30 14:37:13    DSC2          1         5236    Normal Node  OPEN               WORKING      OK           TRUE         1834440664        1834441594
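When these tests are repeated, it helps to read the `show` output mechanically rather than by eye. The sketch below assumes the column layout printed by dmcssm in this test (`inst_name seqno port mode inst_status vtd_status is_ok active ...`, where `mode` is either `Control Node` or `Normal Node`). `parse_show` is a hypothetical helper, and the regular expressions may need adjusting for other DM versions.

```python
import re
from collections import defaultdict

# Group header, e.g. "group[name = DSC, seq = 2, type = DB, Control Node = 0]"
GROUP_RE = re.compile(r"group\[name = (\w+), seq = \d+, type = \w+, Control Node = (\d+)\]")
# One EP row; "mode" is the only column whose value contains a space.
ROW_RE = re.compile(
    r"^\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+"
    r"(?P<inst>\w+)\s+(?P<seqno>\d+)\s+(?P<port>\d+)\s+"
    r"(?P<mode>Control Node|Normal Node)\s+(?P<inst_status>\w+)\s+"
    r"(?P<vtd_status>\w+)\s+(?P<is_ok>\w+)\s+(?P<active>\w+)"
)


def parse_show(text):
    """Map each group name (CSS, ASM, DSC) to the EP rows printed under its header."""
    groups = defaultdict(list)
    current = None
    for line in text.splitlines():
        header = GROUP_RE.search(line)
        if header:
            current = header.group(1)
            continue
        row = ROW_RE.match(line)
        if row and current is not None:
            groups[current].append(row.groupdict())
    return dict(groups)
```

For the output above, `parse_show(...)["DSC"]` would list DSC1 as `Control Node` and DSC2 as `Normal Node`, both with `is_ok` equal to `OK`.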
Run the command and check

On node 192.168.157.100, run kill -9 against the process ID of the database instance (dmserver) on the control node:

192.168.157.100

[dmdba@sds-part-dmdb01 bin]$ ps aux|grep dmserver
dmdba     150323  3.1  2.8 75648456 3783828 ?    Ssl  14:21   0:49 /dm/dmdbms/bin/dmserver path=/dm/config/dsc1/dm.ini dcr_ini=/dm/config/dmdcr.ini
dmdba     171821  0.0  0.0 112728   960 pts/0    S+   14:47   0:00 grep --color=auto dmserver
[dmdba@sds-part-dmdb01 bin]$ kill -9 150323
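Finding the right `dmserver` PID before `kill -9` can be scripted from the same `ps aux` output. This is a sketch that assumes the standard 11-column `ps aux` layout shown above; `find_dmserver_pids` and `kill_hard` are illustrative names. `kill_hard` sends SIGKILL, the same signal as `kill -9`, so it should only ever be pointed at a test cluster.

```python
import os
import signal


def find_dmserver_pids(ps_aux_output, ini_path=None):
    """Return PIDs of dmserver processes in `ps aux` output, skipping the grep line itself.

    If ini_path is given, keep only the instance started with path=<ini_path>.
    """
    pids = []
    for line in ps_aux_output.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND...
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        argv = fields[10].split()
        if os.path.basename(argv[0]) != "dmserver":
            continue  # drops "grep --color=auto dmserver" and unrelated processes
        if ini_path is not None and f"path={ini_path}" not in argv:
            continue
        pids.append(int(fields[1]))
    return pids


def kill_hard(pid):
    """Equivalent of `kill -9 <pid>`: SIGKILL cannot be caught, so the instance gets no chance to clean up."""
    os.kill(pid, signal.SIGKILL)
```

Filtering on `path=/dm/config/dsc1/dm.ini` guards against killing the wrong instance if several dmserver processes share a host.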
Observe the monitor output

[CSS1]             [DB]: 设置命令[LINK_CHECK], 目标站点 DSC1[0], 命令序号[24]
[CSS1]             [DB]: 设置命令[LINK_CHECK], 目标站点 DSC2[1], 命令序号[25]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1]             [DB]: 设置命令[SYS HALT], 目标站点 DSC1[0], 命令序号[27]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS1]             [DB]: 检测到EP DSC1[0]故障在PROCESS_LINK_CHECK中
[CSS1]             [DB]: 设置EP DSC1[0]为故障EP
[CSS1]             [DB]: 设置EP DSC2[1]为控制节点
[CSS1]             [DB]: 设置命令[EP_CRASH], 目标站点 DSC2[1], 命令序号[29]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1]             [DB]: 命令[EP_CRASH]处理结束
[CSS1]             [DB]: 设置命令[CMD CLEAR], 目标站点 DSC2[1], 命令序号[32]
[CSS1]             [CSS]: 设置命令[CONFIG VIP], 目标站点 CSS2[1], 命令序号[3]
[CSS1]             [DB]: 设置命令[CONFIG VIP], 目标站点 DSC2[1], 命令序号[37]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1]             [DB]: 命令[CONFIG VIP]处理结束
[CSS1]             [CSS]: 重启本地DB实例,命令:[/dm/dmdbms/bin/dmserver  path=/dm/config/dsc1/dm.ini dcr_ini=/dm/config/dmdcr.ini]
[CSS1]             [DB]: 设置EP DSC1[0]为故障重加入EP
[CSS1]             [DB]: 设置命令[START NOTIFY], 目标站点 DSC1[0], 命令序号[42]
[CSS1]             [DB]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[43]
[CSS1]             [DB]: 暂停工作线程结束
[CSS1]             [DB]: 设置命令[DCR_LOAD], 目标站点 DSC1[0], 命令序号[44]
[CSS1]             [DB]: 设置命令[DCR_LOAD], 目标站点 DSC2[1], 命令序号[45]
[CSS1]             [DB]: 故障EP重新加入DSC结束
[CSS1]             [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC1[0], 命令序号[47]
[CSS1]             [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC2[1], 命令序号[48]
[CSS1]             [DB]: 故障EP重新加入DSC结束
[CSS1]             [DB]: 设置命令[EP RECV], 目标站点 DSC2[1], 命令序号[50]
[CSS1]             [DB]: 故障EP恢复结束
[CSS1]             [DB]: 设置命令[EP START], 目标站点 DSC1[0], 命令序号[52]
[CSS1]             [DB]: 设置命令[EP START2], 目标站点 DSC1[0], 命令序号[54]
[CSS1]             [DB]: 设置命令[EP OPEN], 目标站点 DSC1[0], 命令序号[56]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS1]             [DB]: 设置命令[RESUME EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[58]
[CSS1]             [DB]: 继续工作线程结束
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1]             [DB]: 设置命令[EP REAL OPEN], 目标站点 DSC1[0], 命令序号[60]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
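In this log the failed EP DSC1 receives LINK_CHECK and SYS HALT, then, after CSS restarts it, START NOTIFY, DCR_LOAD, ERROR EP ADD, EP START, EP START2, EP OPEN and finally EP REAL OPEN. The sketch below pulls that per-EP command sequence out of a captured monitor log so it can be compared across test runs. It assumes the Chinese log format printed by this dmcssm build (`设置命令[...], 目标站点 ...[n], 命令序号[...]`); `commands_for` is a hypothetical helper.

```python
import re

# e.g. "[CSS1]  [DB]: 设置命令[EP START], 目标站点 DSC1[0], 命令序号[52]"
CMD_RE = re.compile(
    r"\[(?P<group>DB|ASM|CSS)\]: 设置命令\[(?P<cmd>[^\]]+)\], "
    r"目标站点 (?P<ep>[A-Z]+\d+)\[\d+\], 命令序号\[(?P<seq>\d+)\]"
)


def commands_for(log_text, ep):
    """Return the commands CSS sent to `ep`, in log order, omitting the NONE resets."""
    return [
        m["cmd"]
        for m in CMD_RE.finditer(log_text)
        if m["ep"] == ep and m["cmd"] != "NONE"
    ]
```

Because sequence numbers are dropped, the output of two runs (for example test 01 and test 02) can be compared directly.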

Check the process and port
[dmdba@sds-part-dmdb01 bin]$ ps aux|grep dmserver
dmdba     173982 12.2  2.3 75632040 3141380 ?    Ssl  14:50   0:07 /dm/dmdbms/bin/dmserver path=/dm/config/dsc1/dm.ini dcr_ini=/dm/config/dmdcr.ini
dmdba     175719  0.0  0.0 112728   960 pts/0    S+   14:51   0:00 grep --color=auto dmserver
[dmdba@sds-part-dmdb01 bin]$ netstat -tunlp|grep 5236
(No info could be read for "-p": geteuid()=1001 but you should be root.)
tcp6       0      0 :::5236  	
Monitor status

Run the show command in the monitor:
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts
        2021-08-30 14:51:55    CSS1          0         9341    Control Node OPEN               WORKING      OK           TRUE         698495724         698497615
        2021-08-30 14:51:55    CSS2          1         9341    Normal Node  OPEN               WORKING      OK           TRUE         698495623         698497511

=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts
        2021-08-30 14:51:55    ASM1          0         9351    Normal Node  OPEN               WORKING      OK           TRUE         698501852         698503721
        2021-08-30 14:51:55    ASM2          1         9351    Control Node OPEN               WORKING      OK           TRUE         698501748         698503615

=================== group[name = DSC, seq = 2, type = DB, Control Node = 1] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts
        2021-08-30 14:51:55    DSC1          0         5236    Normal Node  OPEN               WORKING      OK           TRUE         1835811277        1835811357
        2021-08-30 14:51:55    DSC2          1         5236    Control Node OPEN               WORKING      OK           TRUE         1834440664        1834442475
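Comparing the group headers of the `show` output taken before the kill with this one is enough to confirm the switchover: the DSC group goes from `Control Node = 0` to `Control Node = 1`, while the ASM group stays on 1. A sketch with hypothetical helper names `control_nodes` and `control_changes`:

```python
import re

HEADER_RE = re.compile(r"group\[name = (\w+), seq = \d+, type = \w+, Control Node = (\d+)\]")


def control_nodes(show_text):
    """Map each group name in a dmcssm `show` dump to the seqno of its control node."""
    return {name: int(seq) for name, seq in HEADER_RE.findall(show_text)}


def control_changes(before, after):
    """Return {group: (old_seqno, new_seqno)} for every group whose control node moved."""
    return {
        group: (before[group], after[group])
        for group in before.keys() & after.keys()
        if before[group] != after[group]
    }
```

For test 01 the expected result is a single change, `{"DSC": (0, 1)}`; any extra entry would mean the kill also disturbed ASM or CSS.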

Test result

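When the DSC1 instance process on the control node disappears within the detection interval, the DSC cluster automatically moves the control role to the healthy node (DSC2, whose mode changes to Control Node) and marks DSC1 as a failed node. After the check period (configured as 60 s), the cluster automatically restarts the failed instance. Once restarted, DSC1 rejoins the cluster and returns to a normal state as a non-control node (mode Normal Node); its process and port are back, and logins to the service succeed.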
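Because recovery is driven by the periodic check (60 s in this setup) rather than happening instantly, a test script should poll for the restarted instance with a bounded timeout instead of checking once. A generic sketch: `predicate` stands in for whatever health check you use (for example, looking for DSC1 with `OPEN WORKING OK TRUE` in the `show` output), and `wait_until` is a hypothetical helper. The clock and sleep functions are injectable so the logic can be exercised without real waiting.

```python
import time


def wait_until(predicate, timeout_s, interval_s=5.0, clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns the elapsed seconds on success, or None on timeout.
    """
    start = clock()
    while True:
        if predicate():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= timeout_s:
            return None
        # Never sleep past the deadline.
        sleep(min(interval_s, timeout_s - elapsed))
```

For example, `wait_until(dsc1_is_ok, timeout_s=180, interval_s=10)`, where `dsc1_is_ok` is a hypothetical check and 180 s is an arbitrary value chosen to exceed the 60 s check period. Recording the returned elapsed time also documents how long each failover test actually took.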

1.2 Test 02: kill the DSC instance process on a non-control node

Check the status of each node in the monitor

ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts
        2021-08-30 14:51:55    DSC1          0         5236    Normal Node  OPEN               WORKING      OK           TRUE         1835811277        1835811357
        2021-08-30 14:51:55    DSC2          1         5236    Control Node OPEN               WORKING      OK           TRUE         1834440664        1834442475

Run the command and check
[dmdba@sds-part-dmdb01 bin]$ netstat -tunlp|grep 5236
(No info could be read for "-p": geteuid()=1001 but you should be root.)
tcp6       0      0 :::5236                 :::*                    LISTEN      -
[dmdba@sds-part-dmdb01 bin]$ kill -9 173982
Observe the monitor output
[CSS1]             [DB]: 设置命令[LINK_CHECK], 目标站点 DSC1[0], 命令序号[64]
[CSS1]             [DB]: 设置命令[LINK_CHECK], 目标站点 DSC2[1], 命令序号[65]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1]             [DB]: 设置命令[SYS HALT], 目标站点 DSC1[0], 命令序号[67]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS1]             [DB]: 检测到EP DSC1[0]故障在PROCESS_LINK_CHECK中
[CSS1]             [DB]: 设置EP DSC1[0]为故障EP
[CSS1]             [DB]: 设置命令[EP_CRASH], 目标站点 DSC2[1], 命令序号[69]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1]             [DB]: 命令[EP_CRASH]处理结束
[CSS1]             [DB]: 设置命令[CMD CLEAR], 目标站点 DSC2[1], 命令序号[72]
[CSS1]             [CSS]: 设置命令[CONFIG VIP], 目标站点 CSS2[1], 命令序号[4]
[CSS1]             [CSS]: 设置命令[NONE], 目标站点 CSS2[1], 命令序号[0]
[CSS1]             [DB]: 设置命令[CONFIG VIP], 目标站点 DSC2[1], 命令序号[77]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1]             [DB]: 命令[CONFIG VIP]处理结束
[CSS1]             [CSS]: 重启本地DB实例,命令:[/dm/dmdbms/bin/dmserver  path=/dm/config/dsc1/dm.ini dcr_ini=/dm/config/dmdcr.ini]
[CSS1]             [DB]: 设置EP DSC1[0]为故障重加入EP
[CSS1]             [DB]: 设置命令[START NOTIFY], 目标站点 DSC1[0], 命令序号[82]
[CSS1]             [DB]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[83]
[CSS1]             [DB]: 暂停工作线程结束
[CSS1]             [DB]: 设置命令[DCR_LOAD], 目标站点 DSC1[0], 命令序号[84]
[CSS1]             [DB]: 设置命令[DCR_LOAD], 目标站点 DSC2[1], 命令序号[85]
[CSS1]             [DB]: 故障EP重新加入DSC结束
[CSS1]             [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC1[0], 命令序号[87]
[CSS1]             [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC2[1], 命令序号[88]
[CSS1]             [DB]: 故障EP重新加入DSC结束
[CSS1]             [DB]: 设置命令[EP RECV], 目标站点 DSC2[1], 命令序号[90]
[CSS1]             [DB]: 故障EP恢复结束
[CSS1]             [DB]: 设置命令[EP START], 目标站点 DSC1[0], 命令序号[92]
[CSS1]             [DB]: 设置命令[EP START2], 目标站点 DSC1[0], 命令序号[94]
[CSS1]             [DB]: 设置命令[EP OPEN], 目标站点 DSC1[0], 命令序号[96]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS1]             [DB]: 设置命令[RESUME EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[98]
[CSS1]             [DB]: 继续工作线程结束
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1]             [DB]: 设置命令[EP REAL OPEN], 目标站点 DSC1[0], 命令序号[100]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]	
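The one material difference between this log and the test 01 log is the promotion line: test 01 contains `设置EP DSC2[1]为控制节点` (DSC2 becomes the control node), while this log has no such line because the failed EP was not the control node. A sketch that extracts promotions from a captured log; `promotions` is a hypothetical helper. It deliberately ignores the placeholder `设置EP [255]为控制节点` seen in the reboot logs, which precedes the real promotion.

```python
import re

# e.g. "[CSS1]  [DB]: 设置EP DSC2[1]为控制节点"; "[255]" has no EP name, so it never matches.
PROMOTE_RE = re.compile(r"\[(DB|ASM|CSS)\]: 设置EP ([A-Z]+\d+)\[\d+\]为控制节点")


def promotions(log_text):
    """List (group, ep) pairs that CSS promoted to control node, in log order."""
    return PROMOTE_RE.findall(log_text)
```

An empty list for a non-control-node kill, and exactly one DB promotion for a control-node kill, is the pattern seen in tests 01 and 02.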
Check the process and port
[dmdba@sds-part-dmdb01 bin]$ netstat -tunlp|grep 5236
(No info could be read for "-p": geteuid()=1001 but you should be root.)
tcp6       0      0 :::5236                 :::*                    LISTEN      -                   
[dmdba@sds-part-dmdb01 bin]$ ps aux|grep dmserver    
dmdba     179948  3.2  2.5 75632040 3318984 ?    Ssl  14:56   0:28 /dm/dmdbms/bin/dmserver path=/dm/config/dsc1/dm.ini dcr_ini=/dm/config/dmdcr.ini
dmdba     192488  0.0  0.0 112728   960 pts/0    S+   15:11   0:00 grep --color=auto dmserver		
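The port check can also be automated from `netstat -tunlp` output. Run as dmdba, netstat cannot fill the PID/Program column (hence the `-p` warning and the trailing `-`), but the Local Address and State columns are enough to confirm that 5236 is listening again. A sketch; `listening_ports` is a hypothetical helper and assumes the Linux net-tools column order Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State.

```python
def listening_ports(netstat_output):
    """Return the set of TCP ports in LISTEN state from `netstat -tunlp` output."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
        if len(fields) >= 6 and fields[0] in ("tcp", "tcp6") and fields[5] == "LISTEN":
            # ":::5236" and "0.0.0.0:5236" both end in ":<port>".
            ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports
```

A check such as `5236 in listening_ports(output)` can then be combined with the PID check from `ps aux` to confirm both the process and the port after recovery.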
Monitor status

ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts
        2021-08-30 15:13:04    DSC1          0         5236    Normal Node  OPEN               WORKING      OK           TRUE         1836113922        1836114887
        2021-08-30 15:13:04    DSC2          1         5236    Control Node OPEN               WORKING      OK           TRUE         1834440664        1834443743

Test result

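When the DSC1 instance process on the non-control node disappears within the detection interval, the cluster marks DSC1 as a failed node. After the check period (configured as 60 s), the cluster automatically restarts the failed instance. Once restarted, DSC1 rejoins the cluster and returns to a normal state; DSC2 remains the control node throughout, the process and port are back, and logins to the service succeed.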

1.3 Test 03: reboot the control node server

Monitor status (before reboot)

[dmdba@sds-part-dmdb04 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor]         2021-08-30 16:28:50: CSS MONITOR V8
[monitor]         2021-08-30 16:28:50: CSS MONITOR SYSTEM IS READY.

[monitor]         2021-08-30 16:28:50: Wait CSS Control Node choosed...
[monitor]         2021-08-30 16:28:51: Wait CSS Control Node choosed succeed.

show

monitor current time:2021-08-30 16:28:58, n_group:3
=================== group[name = CSS, seq = 0, type = CSS, Control Node = 0] ========================================

[CSS3] auto check = TRUE, global info:
[ASM3] auto restart = TRUE
[DSC3] auto restart = TRUE

[CSS4] auto check = TRUE, global info:
[ASM4] auto restart = TRUE
[DSC4] auto restart = TRUE

ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts
        2021-08-30 16:28:59    CSS3          0         9341    Control Node OPEN               WORKING      OK           TRUE         699508042         699512160
        2021-08-30 16:28:59    CSS4          1         9341    Normal Node  OPEN               WORKING      OK           TRUE         699513005         699517103

=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts
        2021-08-30 16:28:59    ASM3          0         9351    Normal Node  OPEN               WORKING      OK           TRUE         699515888         699519979
        2021-08-30 16:28:59    ASM4          1         9351    Control Node OPEN               WORKING      OK           TRUE         699519138         699523215

=================== group[name = DSC, seq = 2, type = DB, Control Node = 0] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts
        2021-08-30 16:28:59    DSC3          0         5236    Control Node OPEN               WORKING      OK           TRUE         1837285878        1837289913
        2021-08-30 16:28:59    DSC4          1         5236    Normal Node  OPEN               WORKING      OK           TRUE         1837995638        1837998773

Monitor status (during reboot)
[dmdba@sds-part-dmdb04 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor]         2021-08-30 16:51:20: CSS MONITOR V8
[monitor]         2021-08-30 16:51:23: CSS MONITOR SYSTEM IS READY.

[monitor]         2021-08-30 16:51:23: Wait CSS Control Node choosed...
[monitor]         2021-08-30 16:51:29: Wait CSS Control Node choosed failed, if dmcss has startuped and ini configured correctly, please wait a little more before execute command.

[CSS4]             [CSS]: 监测到控制节点关闭
[CSS4]             [CSS]: 设置EP [255]为控制节点
[monitor]         2021-08-30 16:52:27: 检测到CSS控制节点发生变化,由CSS[255]变为CSS[1]
[CSS4]             [CSS]: 设置EP CSS4[1]为控制节点
[CSS4]             [ASM]: 设置命令[SYS HALT], 目标站点 ASM3[0], 命令序号[357]
[CSS4]             [DB]: 设置命令[SYS HALT], 目标站点 DSC3[0], 命令序号[625]
[CSS4]             [ASM]: 设置EP ASM3[0]为故障EP
[CSS4]             [DB]: 检测到EP DSC3[0]故障在PROCESS_OPEN中
[CSS4]             [ASM]: 检测到EP ASM3[0]故障在PROCESS_OPEN中
[CSS4]             [DB]: 设置EP DSC3[0]为故障EP
[CSS4]             [ASM]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 ASM4[1], 命令序号[360]
[CSS4]             [DB]: 设置EP DSC4[1]为控制节点
[CSS4]             [DB]: 设置命令[EP_CRASH], 目标站点 DSC4[1], 命令序号[627]
[CSS4]             [ASM]: 暂停工作线程结束
[CSS4]             [ASM]: 设置命令[CRASH RECV], 目标站点 ASM4[1], 命令序号[362]
[CSS4]             [ASM]: 设置命令[NONE], 目标站点 ASM4[1], 命令序号[0]
[CSS4]             [ASM]: 设置命令[RESUME EP WORKER THREAD], 目标站点 ASM4[1], 命令序号[364]
[CSS4]             [ASM]: 设置命令[NONE], 目标站点 ASM4[1], 命令序号[0]
[CSS4]             [ASM]: 命令[RESUME EP WORKER THREAD]处理结束
[CSS4]             [DB]: 设置命令[NONE], 目标站点 DSC4[1], 命令序号[0]
[CSS4]             [DB]: 命令[EP_CRASH]处理结束
[CSS4]             [DB]: 设置命令[CMD CLEAR], 目标站点 DSC4[1], 命令序号[630]
[CSS4]             [CSS]: 设置命令[CONFIG VIP], 目标站点 CSS3[0], 命令序号[6]
[CSS4]             [CSS]: 设置命令[NONE], 目标站点 CSS4[1], 命令序号[0]
[CSS4]             [DB]: 设置命令[CONFIG VIP], 目标站点 DSC4[1], 命令序号[635]
[CSS4]             [DB]: 设置命令[NONE], 目标站点 DSC4[1], 命令序号[0]
[CSS4]             [DB]: 命令[CONFIG VIP]处理结束
[CSS4]             [DB]: 命令[CONFIG VIP]处理结束
show

monitor current time:2021-08-30 16:55:51, n_group:3
=================== group[name = CSS, seq = 0, type = CSS, Control Node = 1] ========================================

[CSS3] auto check = FALSE, global info:
Connect to [CSS3] failed, please check the network or the CSSM_CSS_IP config in [/dm/config/dmcssm.ini] .
[CSS4] auto check = TRUE, global info:
[ASM4] auto restart = TRUE
[DSC4] auto restart = TRUE

ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-08-30 16:55:50    CSS3          0         9341    Normal Node  OPEN               WORKING      OK           FALSE        699508042         699513441       
        2021-08-30 16:55:50    CSS4          1         9341    Control Node OPEN               WORKING      OK           TRUE         699513005         699518715       

=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================

n_ok_ep = 1
ok_ep_arr(index, seqno):
(0, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-08-30 16:55:50    ASM3          0         9351    Normal Node  OPEN               SYSHALT      ERROR        FALSE        699515888         699521261       
        2021-08-30 16:55:50    ASM4          1         9351    Control Node OPEN               WORKING      OK           TRUE         699519138         699524827       

=================== group[name = DSC, seq = 2, type = DB, Control Node = 1] ========================================

n_ok_ep = 1
ok_ep_arr(index, seqno):
(0, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-08-30 16:55:50    DSC3          0         5236    Normal Node  OPEN               WORKING      ERROR        FALSE        1837285878        1837291195      
        2021-08-30 16:55:50    DSC4          1         5236    Control Node OPEN               WORKING      OK           TRUE         1837995638        1838000386      
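While CSS3 is down, the ASM and DSC groups each report `n_ok_ep = 1` with a single `ok_ep_arr` entry `(0, 1)`: only seqno 1 (ASM4, DSC4) is still counted as healthy. A sketch that extracts those seqnos per group from a `show` dump, accepting the pairs either on the same line as `ok_ep_arr` or on the lines that follow it; `ok_ep_seqnos` is a hypothetical helper.

```python
import re

GROUP_RE = re.compile(r"group\[name = (\w+),")
PAIR_RE = re.compile(r"\((\d+), (\d+)\)")
MARKER = "ok_ep_arr(index, seqno):"


def ok_ep_seqnos(show_text):
    """Map each group to the seqnos listed in its ok_ep_arr (the EPs CSS still counts as OK)."""
    result = {}
    group, in_arr = None, False
    for raw in show_text.splitlines():
        line = raw.strip()
        header = GROUP_RE.search(line)
        if header:
            group, in_arr = header.group(1), False
            continue
        if MARKER in line:
            in_arr = True
            result[group] = []
            line = line.split(MARKER, 1)[1]  # pairs may follow on the same line
        if in_arr:
            pairs = PAIR_RE.findall(line)
            if pairs:
                result[group].extend(int(seqno) for _, seqno in pairs)
            elif line:
                in_arr = False  # first non-empty, non-pair line ends the array
    return result
```

Comparing this map with the full list of EP seqnos (0 and 1 here) shows which node has dropped out of each group.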
Verify by logging in to the cluster node that was not rebooted:
[dmdba@sds-part-dmdb04 bin]$ ./disql
disql V8
用户名:
密码:

服务器[LOCALHOST:5236]:处于普通打开状态
登录使用时间 : 17.442(ms)
SQL> 


SQL> SELECT * FROM V$INSTANCE;
行号     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME       SVR_VERSION                DB_VERSION          START_TIME          STATUS$ MODE$  OGUID      
---------- ---- ------------- --------------- --------------- -------------------------- ------------------- ------------------- ------- ------ -----------
           DSC_SEQNO   DSC_ROLE    
           ----------- ------------
1          DSC4 DSC4          2               sds-part-dmdb04 DM Database Server x64 V8  DB Version: 0x7000c 2021-08-30 15:36:35 OPEN    NORMAL 0
           1           Control node


已用时间: 14.900(毫秒). 执行号:0.	
Monitor status (after reboot)
Monitor output on sds-part-dmdb01:
[dmdba@sds-part-dmdb01 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor]         2021-09-01 10:16:15: CSS MONITOR V8
[monitor]         2021-09-01 10:16:15: CSS MONITOR SYSTEM IS READY.

[monitor]         2021-09-01 10:16:15: Wait CSS Control Node choosed...
[CSS1]             [CSS]: 监测到控制节点关闭
[CSS1]             [CSS]: 设置EP [255]为控制节点
[monitor]         2021-09-01 10:16:21: Wait CSS Control Node choosed failed, if dmcss has startuped and ini configured correctly, please wait a little more before execute command.

[monitor]         2021-09-01 10:16:24: 检测到CSS控制节点发生变化,由CSS[255]变为CSS[0]
[CSS1]             [CSS]: 设置EP CSS1[0]为控制节点
[CSS1]             [ASM]: 设置命令[SYS HALT], 目标站点 ASM2[1], 命令序号[263]
[CSS1]             [ASM]: 设置EP ASM2[1]为故障EP
[CSS1]             [ASM]: 检测到EP ASM2[1]故障在PROCESS_OPEN中
[CSS1]             [ASM]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 ASM1[0], 命令序号[266]
[CSS1]             [DB]: 检测到EP DSC2[1]故障在PROCESS_OPEN中
[CSS1]             [DB]: 设置EP DSC2[1]为故障EP
[CSS1]             [DB]: 设置EP DSC1[0]为控制节点
[CSS1]             [DB]: 设置命令[EP_CRASH], 目标站点 DSC1[0], 命令序号[341]
[CSS1]             [ASM]: 暂停工作线程结束
[CSS1]             [ASM]: 设置EP ASM1[0]为控制节点
[CSS1]             [ASM]: 设置命令[CRASH RECV], 目标站点 ASM1[0], 命令序号[268]
[CSS1]             [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS1]             [ASM]: 设置命令[RESUME EP WORKER THREAD], 目标站点 ASM1[0], 命令序号[270]
[CSS1]             [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS1]             [ASM]: 命令[RESUME EP WORKER THREAD]处理结束
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS1]             [DB]: 命令[EP_CRASH]处理结束
[CSS1]             [DB]: 设置命令[CMD CLEAR], 目标站点 DSC1[0], 命令序号[344]
[CSS1]             [CSS]: 设置命令[CONFIG VIP], 目标站点 CSS2[1], 命令序号[6]
[CSS1]             [DB]: 设置命令[CONFIG VIP], 目标站点 DSC1[0], 命令序号[349]
[CSS1]             [DB]: 设置命令[NONE], 目标站点


[CSS2] auto check = FALSE, global info:
Connect to [CSS2] failed, please check the network or the CSSM_CSS_IP config in [/dm/config/dmcssm.ini] .

ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-09-01 10:17:20    CSS1          0         9341    Control Node OPEN               WORKING      OK           TRUE         742822335         742823260       
        2021-09-01 10:17:20    CSS2          1         9341    Normal Node  OPEN               WORKING      OK           FALSE        698495623         698653451       

=================== group[name = ASM, seq = 1, type = ASM, Control Node = 0] ========================================

n_ok_ep = 1
ok_ep_arr(index, seqno):
(0, 0)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-09-01 10:17:20    ASM1          0         9351    Control Node OPEN               WORKING      OK           TRUE         742828461         742829364       
        2021-09-01 10:17:20    ASM2          1         9351    Normal Node  OPEN               SYSHALT      ERROR        FALSE        698501748         698659643       

=================== group[name = DSC, seq = 2, type = DB, Control Node = 0] ========================================

n_ok_ep = 1
ok_ep_arr(index, seqno):
(0, 0)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-09-01 10:17:20    DSC1          0         5236    Control Node OPEN               WORKING      OK           TRUE         1959281632        1959282480      
        2021-09-01 10:17:20    DSC2          1         5236    Normal Node  OPEN               WORKING      ERROR        FALSE        1834440664        1834598502      

=================================================================================================================
[dmdba@sds-part-dmdb02 bin]$ ./DmCSSServicecss2 start
Starting DmCSSServicecss2: ok


After startup:

[CSS2]             [CSS]: 设置EP CSS1[0]为控制节点
[CSS2]             [CSS]: 重启本地ASM实例,命令:[/dm/dmdbms/bin/dmasmsvr  dcr_ini=/dm/config/dmdcr.ini]
[CSS1]             [ASM]: 设置EP ASM2[1]为故障重加入EP
[CSS1]             [ASM]: 设置命令[START NOTIFY], 目标站点 ASM2[1], 命令序号[274]
[CSS1]             [ASM]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 ASM1[0], 命令序号[276]
[CSS1]             [ASM]: 暂停工作线程结束
[CSS1]             [ASM]: 设置命令[ERROR EP ADD], 目标站点 ASM1[0], 命令序号[277]
[CSS1]             [ASM]: 设置命令[ERROR EP ADD], 目标站点 ASM2[1], 命令序号[278]
[CSS1]             [ASM]: 故障EP重新加入DSC结束
[CSS1]             [ASM]: 设置命令[EP RECV], 目标站点 ASM1[0], 命令序号[280]
[CSS1]             [ASM]: 故障EP恢复结束
[CSS1]             [ASM]: 设置命令[EP START], 目标站点 ASM2[1], 命令序号[282]
[CSS1]             [ASM]: 设置命令[EP OPEN], 目标站点 ASM2[1], 命令序号[284]
[CSS1]             [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS1]             [ASM]: 设置命令[RESUME EP WORKER THREAD], 目标站点 ASM1[0], 命令序号[286]
[CSS1]             [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS1]             [ASM]: 继续工作线程结束
[CSS1]             [ASM]: 设置命令[EP REAL OPEN], 目标站点 ASM2[1], 命令序号[288]
[CSS1]             [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS1]             [ASM]: 设置命令[LINK_CHECK], 目标站点 ASM1[0], 命令序号[292]
[CSS1]             [ASM]: 设置命令[LINK_CHECK], 目标站点 ASM2[1], 命令序号[293]
[CSS1]             [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS1]             [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS2]             [CSS]: 重启本地DB实例,命令:[/dm/dmdbms/bin/dmserver  path=/dm/config/dsc2/dm.ini dcr_ini=/dm/config/dmdcr.ini]
[CSS1]             [DB]: 设置EP DSC2[1]为故障重加入EP
[CSS1]             [DB]: 设置命令[START NOTIFY], 目标站点 DSC2[1], 命令序号[354]
[CSS1]             [DB]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 DSC1[0], 命令序号[355]
[CSS1]             [DB]: 暂停工作线程结束
[CSS1]             [DB]: 设置命令[DCR_LOAD], 目标站点 DSC1[0], 命令序号[356]
[CSS1]             [DB]: 设置命令[DCR_LOAD], 目标站点 DSC2[1], 命令序号[357]
[CSS1]             [DB]: 故障EP重新加入DSC结束
[CSS1]             [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC1[0], 命令序号[359]
[CSS1]             [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC2[1], 命令序号[360]
[CSS1]             [DB]: 故障EP重新加入DSC结束
[CSS1]             [DB]: 设置命令[EP RECV], 目标站点 DSC1[0], 命令序号[362]
[CSS1]             [DB]: 故障EP恢复结束
[CSS1]             [DB]: 设置命令[EP START], 目标站点 DSC2[1], 命令序号[364]
[CSS1]             [DB]: 设置命令[EP START2], 目标站点 DSC2[1], 命令序号[366]
[CSS1]             [DB]: 设置命令[EP OPEN], 目标站点 DSC2[1], 命令序号[368]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1]             [DB]: 设置命令[RESUME EP WORKER THREAD], 目标站点 DSC1[0], 命令序号[370]
[CSS1]             [DB]: 继续工作线程结束
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS1]             [DB]: 设置命令[EP REAL OPEN], 目标站点 DSC2[1], 命令序号[372]
[CSS1]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
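In this log, after `DmCSSServicecss2` is started, CSS2 first restarts the local ASM instance, and the local DB instance is only restarted after ASM2 has reached `EP REAL OPEN`. An ordering like this can be asserted on the captured log. A sketch with a hypothetical `happens_before` helper that compares the first line containing each marker; it checks the order observed here and does not claim this is guaranteed DM behaviour in every version.

```python
def happens_before(log_text, first, second):
    """True if the first line containing `first` precedes the first line containing `second`."""
    lines = log_text.splitlines()

    def index_of(needle):
        # Index of the first line containing needle, or None if it never appears.
        return next((i for i, line in enumerate(lines) if needle in line), None)

    i, j = index_of(first), index_of(second)
    return i is not None and j is not None and i < j
```

For this rejoin, `happens_before(log, "设置命令[EP REAL OPEN], 目标站点 ASM2", "重启本地DB实例")` is expected to hold; a missing marker makes the check fail rather than pass silently.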

=================== group[name = CSS, seq = 0, type = CSS, Control Node = 0] ========================================

[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE

[CSS2] auto check = TRUE, global info:
[ASM2] auto restart = TRUE
[DSC2] auto restart = TRUE


ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-09-01 10:24:51    CSS1          0         9341    Control Node OPEN               WORKING      OK           TRUE         742822335         742823711       
        2021-09-01 10:24:51    CSS2          1         9341    Normal Node  OPEN               WORKING      OK           TRUE         743176646         743176767       

=================== group[name = ASM, seq = 1, type = ASM, Control Node = 0] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-09-01 10:24:51    ASM1          0         9351    Control Node OPEN               WORKING      OK           TRUE         742828461         742829815       
        2021-09-01 10:24:51    ASM2          1         9351    Normal Node  OPEN               WORKING      OK           TRUE         743182775         743182874       

=================== group[name = DSC, seq = 2, type = DB, Control Node = 0] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-09-01 10:24:51    DSC1          0         5236    Control Node OPEN               WORKING      OK           TRUE         1959281632        1959282931      
        2021-09-01 10:24:51    DSC2          1         5236    Normal Node  OPEN               WORKING      OK           TRUE         1960283745        1960283788  
Verification
SQL> SELECT * FROM V$DATABASE;

行号     NAME CREATE_TIME         ARCH_MODE LAST_CKPT_TIME STATUS$     ROLE$       MAX_SIZE             TOTAL_SIZE           DSC_NODES   OPEN_COUNT 
---------- ---- ------------------- --------- -------------- ----------- ----------- -------------------- -------------------- ----------- -----------
           STARTUP_COUNT        LAST_STARTUP_TIME  
           -------------------- -------------------
1          DSC  2021-08-25 15:52:39 Y         NULL           4           0           0                    163840               2           66
           5                    2021-08-30 14:22:05

1.4 Test 04: reboot the control node server

Monitor status (before reboot)
[dmdba@sds-part-dmdb04 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor]         2021-08-30 16:28:50: CSS MONITOR V8
[monitor]         2021-08-30 16:28:50: CSS MONITOR SYSTEM IS READY.

[monitor]         2021-08-30 16:28:50: Wait CSS Control Node choosed...
[monitor]         2021-08-30 16:28:51: Wait CSS Control Node choosed succeed.

show

monitor current time:2021-08-30 16:28:58, n_group:3
=================== group[name = CSS, seq = 0, type = CSS, Control Node = 0] ========================================

[CSS3] auto check = TRUE, global info:
[ASM3] auto restart = TRUE
[DSC3] auto restart = TRUE

[CSS4] auto check = TRUE, global info:
[ASM4] auto restart = TRUE
[DSC4] auto restart = TRUE


ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-08-30 16:28:59    CSS3          0         9341    Control Node OPEN               WORKING      OK           TRUE         699508042         699512160       
        2021-08-30 16:28:59    CSS4          1         9341    Normal Node  OPEN               WORKING      OK           TRUE         699513005         699517103       

=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-08-30 16:28:59    ASM3          0         9351    Normal Node  OPEN               WORKING      OK           TRUE         699515888         699519979       
        2021-08-30 16:28:59    ASM4          1         9351    Control Node OPEN               WORKING      OK           TRUE         699519138         699523215       

=================== group[name = DSC, seq = 2, type = DB, Control Node = 0] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-08-30 16:28:59    DSC3          0         5236    Control Node OPEN               WORKING      OK           TRUE         1837285878        1837289913      
        2021-08-30 16:28:59    DSC4          1         5236    Normal Node  OPEN               WORKING      OK           TRUE         1837995638        1837998773      
Monitor status (during reboot)
[dmdba@sds-part-dmdb04 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor]         2021-08-30 16:51:20: CSS MONITOR V8
[monitor]         2021-08-30 16:51:23: CSS MONITOR SYSTEM IS READY.

[monitor]         2021-08-30 16:51:23: Wait CSS Control Node choosed...
[monitor]         2021-08-30 16:51:29: Wait CSS Control Node choosed failed, if dmcss has startuped and ini configured correctly, please wait a little more before execute command.

[CSS4]             [CSS]: 监测到控制节点关闭
[CSS4]             [CSS]: 设置EP [255]为控制节点
[monitor]         2021-08-30 16:52:27: 检测到CSS控制节点发生变化,由CSS[255]变为CSS[1]
[CSS4]             [CSS]: 设置EP CSS4[1]为控制节点
[CSS4]             [ASM]: 设置命令[SYS HALT], 目标站点 ASM3[0], 命令序号[357]
[CSS4]             [DB]: 设置命令[SYS HALT], 目标站点 DSC3[0], 命令序号[625]
[CSS4]             [ASM]: 设置EP ASM3[0]为故障EP
[CSS4]             [DB]: 检测到EP DSC3[0]故障在PROCESS_OPEN中
[CSS4]             [ASM]: 检测到EP ASM3[0]故障在PROCESS_OPEN中
[CSS4]             [DB]: 设置EP DSC3[0]为故障EP
[CSS4]             [ASM]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 ASM4[1], 命令序号[360]
[CSS4]             [DB]: 设置EP DSC4[1]为控制节点
[CSS4]             [DB]: 设置命令[EP_CRASH], 目标站点 DSC4[1], 命令序号[627]
[CSS4]             [ASM]: 暂停工作线程结束
[CSS4]             [ASM]: 设置命令[CRASH RECV], 目标站点 ASM4[1], 命令序号[362]
[CSS4]             [ASM]: 设置命令[NONE], 目标站点 ASM4[1], 命令序号[0]
[CSS4]             [ASM]: 设置命令[RESUME EP WORKER THREAD], 目标站点 ASM4[1], 命令序号[364]
[CSS4]             [ASM]: 设置命令[NONE], 目标站点 ASM4[1], 命令序号[0]
[CSS4]             [ASM]: 命令[RESUME EP WORKER THREAD]处理结束
[CSS4]             [DB]: 设置命令[NONE], 目标站点 DSC4[1], 命令序号[0]
[CSS4]             [DB]: 命令[EP_CRASH]处理结束
[CSS4]             [DB]: 设置命令[CMD CLEAR], 目标站点 DSC4[1], 命令序号[630]
[CSS4]             [CSS]: 设置命令[CONFIG VIP], 目标站点 CSS3[0], 命令序号[6]
[CSS4]             [CSS]: 设置命令[NONE], 目标站点 CSS4[1], 命令序号[0]
[CSS4]             [DB]: 设置命令[CONFIG VIP], 目标站点 DSC4[1], 命令序号[635]
[CSS4]             [DB]: 设置命令[NONE], 目标站点 DSC4[1], 命令序号[0]
[CSS4]             [DB]: 命令[CONFIG VIP]处理结束
[CSS4]             [DB]: 命令[CONFIG VIP]处理结束
show

monitor current time:2021-08-30 16:55:51, n_group:3
=================== group[name = CSS, seq = 0, type = CSS, Control Node = 1] ========================================

[CSS3] auto check = FALSE, global info:
Connect to [CSS3] failed, please check the network or the CSSM_CSS_IP config in [/dm/config/dmcssm.ini] .
[CSS4] auto check = TRUE, global info:
[ASM4] auto restart = TRUE
[DSC4] auto restart = TRUE

ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-08-30 16:55:50    CSS3          0         9341    Normal Node  OPEN               WORKING      OK           FALSE        699508042         699513441       
        2021-08-30 16:55:50    CSS4          1         9341    Control Node OPEN               WORKING      OK           TRUE         699513005         699518715       

=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================

n_ok_ep = 1
ok_ep_arr(index, seqno):
(0, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-08-30 16:55:50    ASM3          0         9351    Normal Node  OPEN               SYSHALT      ERROR        FALSE        699515888         699521261       
        2021-08-30 16:55:50    ASM4          1         9351    Control Node OPEN               WORKING      OK           TRUE         699519138         699524827       

=================== group[name = DSC, seq = 2, type = DB, Control Node = 1] ========================================

n_ok_ep = 1
ok_ep_arr(index, seqno):
(0, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-08-30 16:55:50    DSC3          0         5236    Normal Node  OPEN               WORKING      ERROR        FALSE        1837285878        1837291195      
        2021-08-30 16:55:50    DSC4          1         5236    Control Node OPEN               WORKING      OK           TRUE         1837995638        1838000386  
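The per-EP columns show the failure in more detail. CSS3 is still OK but inactive. ASM3 (vtd_status SYSHALT) and DSC3 report is_ok = ERROR and active = FALSE. Below is a sketch of a parser that flags such rows. It is hypothetical and assumes the column order of the `ep:` header printed above: css_time, inst_name, seqno, port, mode, inst_status, vtd_status, is_ok, active, guid, ts. It also assumes that mode is always `Control Node` or `Normal Node`. It counts an EP as healthy only when is_ok = OK and active = TRUE.

```python
import re
from typing import List, NamedTuple

class EpRow(NamedTuple):
    inst_name: str
    mode: str
    inst_status: str
    vtd_status: str
    is_ok: str
    active: str

# One EP row: css_time, inst_name, seqno, port, mode, inst_status,
# vtd_status, is_ok, active, guid, ts (order taken from the ep: header).
ROW_RE = re.compile(
    r"^[ \t]*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[ \t]+"
    r"(\S+)[ \t]+\d+[ \t]+\d+[ \t]+"
    r"(Control Node|Normal Node)[ \t]+"
    r"(\S+)[ \t]+(\S+)[ \t]+(\S+)[ \t]+(\S+)[ \t]+"
    r"\d+[ \t]+\d+[ \t]*$",
    re.MULTILINE,
)

def parse_ep_rows(show_output: str) -> List[EpRow]:
    """Return one EpRow per EP line found in `show` output."""
    return [EpRow(*m.groups()) for m in ROW_RE.finditer(show_output)]

def unhealthy(rows: List[EpRow]) -> List[str]:
    """Names of EPs that are not both is_ok = OK and active = TRUE."""
    return [r.inst_name for r in rows if r.is_ok != "OK" or r.active != "TRUE"]

# Rows copied from the "during the restart" output above.
SAMPLE = """\
        2021-08-30 16:55:50    CSS3          0         9341    Normal Node  OPEN               WORKING      OK           FALSE        699508042         699513441
        2021-08-30 16:55:50    CSS4          1         9341    Control Node OPEN               WORKING      OK           TRUE         699513005         699518715
        2021-08-30 16:55:50    ASM3          0         9351    Normal Node  OPEN               SYSHALT      ERROR        FALSE        699515888         699521261
        2021-08-30 16:55:50    ASM4          1         9351    Control Node OPEN               WORKING      OK           TRUE         699519138         699524827
        2021-08-30 16:55:50    DSC3          0         5236    Normal Node  OPEN               WORKING      ERROR        FALSE        1837285878        1837291195
        2021-08-30 16:55:50    DSC4          1         5236    Control Node OPEN               WORKING      OK           TRUE         1837995638        1838000386
"""

if __name__ == "__main__":
    print(unhealthy(parse_ep_rows(SAMPLE)))  # ['CSS3', 'ASM3', 'DSC3']
```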
Log in to the node that was not restarted to verify the cluster
[dmdba@sds-part-dmdb04 bin]$ ./disql
disql V8
用户名:
密码:

服务器[LOCALHOST:5236]:处于普通打开状态
登录使用时间 : 17.442(ms)
SQL> 


SQL> SELECT * FROM V$INSTANCE;
行号     NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME       SVR_VERSION                DB_VERSION          START_TIME          STATUS$ MODE$  OGUID      
---------- ---- ------------- --------------- --------------- -------------------------- ------------------- ------------------- ------- ------ -----------
           DSC_SEQNO   DSC_ROLE    
           ----------- ------------
1          DSC4 DSC4          2               sds-part-dmdb04 DM Database Server x64 V8  DB Version: 0x7000c 2021-08-30 15:36:35 OPEN    NORMAL 0
           1           Control node


已用时间: 14.900(毫秒). 执行号:0.	
Monitor status (after the restart)
[dmdba@sds-part-dmdb02 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor]         2021-09-01 10:30:20: CSS MONITOR V8
[monitor]         2021-09-01 10:30:23: CSS MONITOR SYSTEM IS READY.

[monitor]         2021-09-01 10:30:23: Wait CSS Control Node choosed...
[monitor]         2021-09-01 10:30:29: Wait CSS Control Node choosed failed, if dmcss has startuped and ini configured correctly, please wait a little more before execute command.

[CSS2]             [CSS]: 监测到控制节点关闭
[CSS2]             [CSS]: 设置EP [255]为控制节点
[monitor]         2021-09-01 10:32:05: 检测到CSS控制节点发生变化,由CSS[255]变为CSS[1]
[CSS2]             [CSS]: 设置EP CSS2[1]为控制节点
[CSS2]             [ASM]: 设置命令[SYS HALT], 目标站点 ASM1[0], 命令序号[396]
[CSS2]             [ASM]: 设置EP ASM1[0]为故障EP
[CSS2]             [DB]: 设置命令[SYS HALT], 目标站点 DSC1[0], 命令序号[475]
[CSS2]             [DB]: 检测到EP DSC1[0]故障在PROCESS_OPEN中
[CSS2]             [DB]: 设置EP DSC1[0]为故障EP
[CSS2]             [ASM]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 ASM2[1], 命令序号[399]
[CSS2]             [DB]: 设置EP DSC2[1]为控制节点
[CSS2]             [DB]: 设置命令[EP_CRASH], 目标站点 DSC2[1], 命令序号[477]
[CSS2]             [ASM]: 暂停工作线程结束
[CSS2]             [ASM]: 设置EP ASM2[1]为控制节点
[CSS2]             [ASM]: 设置命令[CRASH RECV], 目标站点 ASM2[1], 命令序号[401]
[CSS2]             [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS2]             [ASM]: 设置命令[RESUME EP WORKER THREAD], 目标站点 ASM2[1], 命令序号[403]
[CSS2]             [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS2]             [ASM]: 命令[RESUME EP WORKER THREAD]处理结束
[CSS2]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS2]             [DB]: 命令[EP_CRASH]处理结束
[CSS2]             [DB]: 设置命令[CMD CLEAR], 目标站点 DSC2[1], 命令序号[480]
[CSS2]             [CSS]: 设置命令[CONFIG VIP], 目标站点 CSS1[0], 命令序号[6]
[CSS2]             [DB]: 设置命令[CONFIG VIP], 目标站点 DSC2[1], 命令序号[485]
[CSS2]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS2]             [DB]: 命令[CONFIG VIP]处理结束
[CSS1]             [CSS]: 设置EP CSS2[1]为控制节点
[CSS1]             [CSS]: 重启本地ASM实例,命令:[/dm/dmdbms/bin/dmasmsvr  dcr_ini=/dm/config/dmdcr.ini]
[CSS2]             [ASM]: 设置EP ASM1[0]为故障重加入EP
[CSS2]             [ASM]: 设置命令[START NOTIFY], 目标站点 ASM1[0], 命令序号[407]
[CSS2]             [ASM]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 ASM2[1], 命令序号[409]
[CSS2]             [ASM]: 暂停工作线程结束
[CSS2]             [ASM]: 设置命令[ERROR EP ADD], 目标站点 ASM1[0], 命令序号[410]
[CSS2]             [ASM]: 设置命令[ERROR EP ADD], 目标站点 ASM2[1], 命令序号[411]
[CSS2]             [ASM]: 故障EP重新加入DSC结束
[CSS2]             [ASM]: 设置命令[EP RECV], 目标站点 ASM2[1], 命令序号[413]
[CSS2]             [ASM]: 故障EP恢复结束
[CSS2]             [ASM]: 设置命令[EP START], 目标站点 ASM1[0], 命令序号[415]
[CSS2]             [ASM]: 设置命令[EP OPEN], 目标站点 ASM1[0], 命令序号[417]
[CSS2]             [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS2]             [ASM]: 设置命令[RESUME EP WORKER THREAD], 目标站点 ASM2[1], 命令序号[419]
[CSS2]             [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS2]             [ASM]: 继续工作线程结束
[CSS2]             [ASM]: 设置命令[EP REAL OPEN], 目标站点 ASM1[0], 命令序号[421]
[CSS2]             [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS2]             [ASM]: 设置命令[LINK_CHECK], 目标站点 ASM1[0], 命令序号[425]
[CSS2]             [ASM]: 设置命令[LINK_CHECK], 目标站点 ASM2[1], 命令序号[426]
[CSS2]             [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS2]             [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS1]             [CSS]: 重启本地DB实例,命令:[/dm/dmdbms/bin/dmserver  path=/dm/config/dsc1/dm.ini dcr_ini=/dm/config/dmdcr.ini]
[CSS2]             [DB]: 设置EP DSC1[0]为故障重加入EP
[CSS2]             [DB]: 设置命令[START NOTIFY], 目标站点 DSC1[0], 命令序号[490]
[CSS2]             [DB]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[491]
[CSS2]             [DB]: 暂停工作线程结束
[CSS2]             [DB]: 设置命令[DCR_LOAD], 目标站点 DSC1[0], 命令序号[492]
[CSS2]             [DB]: 设置命令[DCR_LOAD], 目标站点 DSC2[1], 命令序号[493]
[CSS2]             [DB]: 故障EP重新加入DSC结束
[CSS2]             [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC1[0], 命令序号[495]
[CSS2]             [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC2[1], 命令序号[496]
[CSS2]             [DB]: 故障EP重新加入DSC结束
[CSS2]             [DB]: 设置命令[EP RECV], 目标站点 DSC2[1], 命令序号[498]
[CSS2]             [DB]: 故障EP恢复结束
[CSS2]             [DB]: 设置命令[EP START], 目标站点 DSC1[0], 命令序号[500]
[CSS2]             [DB]: 设置命令[EP START2], 目标站点 DSC1[0], 命令序号[502]
[CSS2]             [DB]: 设置命令[EP OPEN], 目标站点 DSC1[0], 命令序号[504]
[CSS2]             [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS2]             [DB]: 设置命令[RESUME EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[506]
[CSS2]             [DB]: 继续工作线程结束
[CSS2]             [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS2]             [DB]: 设置命令[EP REAL OPEN], 目标站点 DSC1[0], 命令序号[508]
[CSS2]             [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]



show

monitor current time:2021-09-01 10:42:51, n_group:3
=================== group[name = CSS, seq = 0, type = CSS, Control Node = 1] ========================================

[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE

[CSS2] auto check = TRUE, global info:
[ASM2] auto restart = TRUE
[DSC2] auto restart = TRUE


ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-09-01 10:42:50    CSS1          0         9341    Normal Node  OPEN               WORKING      OK           TRUE         743476673         743476809       
        2021-09-01 10:42:50    CSS2          1         9341    Control Node OPEN               WORKING      OK           TRUE         743176646         743177845       

=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-09-01 10:42:50    ASM1          0         9351    Normal Node  OPEN               WORKING      OK           TRUE         743482830         743482944       
        2021-09-01 10:42:50    ASM2          1         9351    Control Node OPEN               WORKING      OK           TRUE         743182775         743183952       

=================== group[name = DSC, seq = 2, type = DB, Control Node = 1] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2021-09-01 10:42:50    DSC1          0         5236    Normal Node  OPEN               WORKING      OK           TRUE         1961132104        1961132163      
        2021-09-01 10:42:50    DSC2          1         5236    Control Node OPEN               WORKING      OK           TRUE         1960283745        1960284867  
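You can also check the rejoin log above mechanically. In this run, CSS2 sent commands to the restarted DB EP DSC1 in this order: START NOTIFY, DCR_LOAD, ERROR EP ADD, EP START, EP START2, EP OPEN, EP REAL OPEN. The sketch below is hypothetical and not a DM tool. It extracts `(command, target)` pairs from log lines of the form `设置命令[CMD], 目标站点 NAME[SEQ]` ("set command CMD, target site NAME"). It then tests whether the expected steps occur in that order, with other commands allowed in between. The expected list comes from this single log, not from DM documentation, so treat it as an observation rather than a specification.

```python
import re
from typing import Iterable, List

# Extracts (command, target) from lines like:
# [CSS2]             [DB]: 设置命令[EP OPEN], 目标站点 DSC1[0], 命令序号[504]
PAIR_RE = re.compile(r"设置命令\[([^\]]+)\],\s*目标站点\s+(\w+)\[")

# Order observed for DSC1 in the log above (an observation, not a spec).
EXPECTED_REJOIN = ["START NOTIFY", "DCR_LOAD", "ERROR EP ADD",
                   "EP START", "EP START2", "EP OPEN", "EP REAL OPEN"]

def commands_for(log: str, target: str) -> List[str]:
    """Commands sent to `target`, ignoring the [NONE] resets."""
    return [cmd for cmd, tgt in PAIR_RE.findall(log) if tgt == target and cmd != "NONE"]

def in_order(seen: Iterable[str], expected: List[str]) -> bool:
    """True if `expected` is a subsequence of `seen`."""
    it = iter(seen)
    # `step in it` consumes the iterator up to and including each match.
    return all(step in it for step in expected)

# Excerpt of the rejoin log above.
SAMPLE = """\
[CSS2]             [DB]: 设置命令[START NOTIFY], 目标站点 DSC1[0], 命令序号[490]
[CSS2]             [DB]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[491]
[CSS2]             [DB]: 设置命令[DCR_LOAD], 目标站点 DSC1[0], 命令序号[492]
[CSS2]             [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC1[0], 命令序号[495]
[CSS2]             [DB]: 设置命令[EP START], 目标站点 DSC1[0], 命令序号[500]
[CSS2]             [DB]: 设置命令[EP START2], 目标站点 DSC1[0], 命令序号[502]
[CSS2]             [DB]: 设置命令[EP OPEN], 目标站点 DSC1[0], 命令序号[504]
[CSS2]             [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS2]             [DB]: 设置命令[EP REAL OPEN], 目标站点 DSC1[0], 命令序号[508]
"""

if __name__ == "__main__":
    print(in_order(commands_for(SAMPLE, "DSC1"), EXPECTED_REJOIN))  # True
```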
Test result

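While the server is restarting, the DSC cluster marks the failed node as inactive (active = FALSE) in the CSS, ASM, and DSC groups. After the server has finished booting, run ./DmCSSSERVICECSS1 start in the bin directory. During its check cycle, the cluster then automatically restarts the failed node's ASM and DB instances and verifies their status. Once they have started and rejoined, the node returns to normal.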
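Recovery depends on the CSS check cycle, so a scripted version of this test should wait for the node to return rather than assume it is back immediately. Here is a minimal polling sketch. `is_recovered` is a hypothetical callable you would supply, for example one that captures monitor `show` output and checks that every EP is OK and TRUE. The 600 s timeout and 10 s interval are arbitrary placeholders, not DM defaults.

```python
import time
from typing import Callable

def wait_for_recovery(is_recovered: Callable[[], bool],
                      timeout_s: float = 600.0,
                      interval_s: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `is_recovered` until it returns True or `timeout_s` elapses.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_recovered():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)

if __name__ == "__main__":
    # Hypothetical stand-in: pretend the node is back on the third check.
    checks = iter([False, False, True])
    print(wait_for_recovery(lambda: next(checks), sleep=lambda s: None))  # True
```

Because the clock is monotonic, the timeout is not affected by wall-clock adjustments on the test host.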

For more information, visit the DM technical community: https://eco.dameng.com
