Cluster layout
Two-node DSC cluster
IP plan:
Hostname | Service IP | Heartbeat IP | Instance name | Node role |
---|---|---|---|---|
sds-part-dmdb01 | 192.168.157.100 | 192.168.156.100 | DSC1 | DSC node 1 |
sds-part-dmdb02 | 192.168.157.101 | 192.168.156.101 | DSC2 | DSC node 2 |
Port plan:
Instance name | Instance port | MAL system port | CSS port | ASM port | ASM MAL port | DCR check instance port |
---|---|---|---|---|---|---|
DSC1 | 5236 | 9236 | 9341 | 9351 | 7236 | 9741 |
DSC2 | 5236 | 9236 | 9341 | 9351 | 7236 | 9741 |
Detailed test record
Cluster 01
1.1 Test scenario 01: kill the DSC instance process on the control node
Check the status of each node in the monitor
> Monitor status on 192.168.157.101:
> [dmdba@sds-part-dmdb02 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
> [monitor] 2021-08-30 14:37:07: CSS MONITOR V8
> [monitor] 2021-08-30 14:37:07: CSS MONITOR SYSTEM IS READY.
> [monitor] 2021-08-30 14:37:07: Wait CSS Control Node choosed...
> [monitor] 2021-08-30 14:37:08: Wait CSS Control Node choosed succeed.
>
> show
>
> monitor current time:2021-08-30 14:37:10, n_group:3
> =================== group[name = CSS, seq = 0, type = CSS, Control Node = 0] ========================================
> [CSS1] auto check = TRUE, global info:
> [ASM1] auto restart = TRUE
> [DSC1] auto restart = TRUE
> [CSS2] auto check = TRUE, global info:
> [ASM2] auto restart = TRUE
> [DSC2] auto restart = TRUE
> ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
> 2021-08-30 14:37:13 CSS1 0 9341 Control Node OPEN WORKING OK TRUE 698495724 698496733
> 2021-08-30 14:37:13 CSS2 1 9341 Normal Node OPEN WORKING OK TRUE 698495623 698496630
> =================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================
> n_ok_ep = 2
> ok_ep_arr(index, seqno):
> (0, 0)
> (1, 1)
> sta = OPEN, sub_sta = STARTUP
> break ep = NULL
> recover ep = NULL
> crash process over flag is TRUE
> ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
> 2021-08-30 14:37:13 ASM1 0 9351 Normal Node OPEN WORKING OK TRUE 698501852 698502840
> 2021-08-30 14:37:13 ASM2 1 9351 Control Node OPEN WORKING OK TRUE 698501748 698502733
> =================== group[name = DSC, seq = 2, type = DB, Control Node = 0] ========================================
> n_ok_ep = 2
> ok_ep_arr(index, seqno):
> (0, 0)
> (1, 1)
> sta = OPEN, sub_sta = STARTUP
> break ep = NULL
> recover ep = NULL
> crash process over flag is TRUE
> ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
> 2021-08-30 14:37:13 DSC1 0 5236 Control Node OPEN WORKING OK TRUE 1834441012 1834441945
> 2021-08-30 14:37:13 DSC2 1 5236 Normal Node OPEN WORKING OK TRUE 1834440664 1834441594
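When comparing monitor snapshots across test steps, it can help to turn each endpoint row of the `show` output into named fields. The following is a minimal sketch, assuming the column order printed in the `ep:` header of this record; the function name `parse_ep_row` is hypothetical and not part of any DM tool.

```python
import re

# One endpoint row, in the column order of the "ep:" header line:
# css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
ROW_RE = re.compile(
    r"^(?P<css_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<inst_name>\S+)\s+(?P<seqno>\d+)\s+(?P<port>\d+)\s+"
    r"(?P<mode>Control Node|Normal Node)\s+"
    r"(?P<inst_status>\S+)\s+(?P<vtd_status>\S+)\s+"
    r"(?P<is_ok>\S+)\s+(?P<active>TRUE|FALSE)\s+"
    r"(?P<guid>\d+)\s+(?P<ts>\d+)$"
)


def parse_ep_row(line):
    """Return a dict for one endpoint row, or None if the line is not a row."""
    # Tolerate a leading blockquote marker and surrounding whitespace.
    text = line.strip().lstrip(">").strip()
    match = ROW_RE.match(text)
    if match is None:
        return None
    row = match.groupdict()
    row["seqno"] = int(row["seqno"])
    row["port"] = int(row["port"])
    row["active"] = row["active"] == "TRUE"
    return row
```

For example, feeding it the DSC1 row above yields `mode == "Control Node"` and `port == 5236`, while header and separator lines return `None`.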
Run the command and check
On node 192.168.157.100, run kill -9 against the process ID of the database instance service on the control node.
192.168.157.100
[dmdba@sds-part-dmdb01 bin]$ ps aux|grep dmserver
dmdba 150323 3.1 2.8 75648456 3783828 ? Ssl 14:21 0:49 /dm/dmdbms/bin/dmserver path=/dm/config/dsc1/dm.ini dcr_ini=/dm/config/dmdcr.ini
dmdba 171821 0.0 0.0 112728 960 pts/0 S+ 14:47 0:00 grep --color=auto dmserver
[dmdba@sds-part-dmdb01 bin]$ kill -9 150323
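The PID lookup above was done by eye. As a rough sketch of the same step, the snippet below picks the `dmserver` PID out of captured `ps aux` text and skips the `grep` process itself. It assumes the standard 11-column `ps aux` layout; `dmserver_pids` is a hypothetical helper name.

```python
import os


def dmserver_pids(ps_output):
    """Return PIDs of dmserver processes found in `ps aux` text."""
    pids = []
    for line in ps_output.splitlines():
        # ps aux prints 11 columns; the last one (COMMAND) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        executable = fields[10].split()[0]
        # Match on the executable name so "grep ... dmserver" is ignored.
        if os.path.basename(executable) == "dmserver":
            pids.append(int(fields[1]))
    return pids
```

Matching on the executable's base name rather than on the substring `dmserver` is what keeps the `grep --color=auto dmserver` line out of the result.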
Observe the monitor output
[CSS1] [DB]: 设置命令[LINK_CHECK], 目标站点 DSC1[0], 命令序号[24]
[CSS1] [DB]: 设置命令[LINK_CHECK], 目标站点 DSC2[1], 命令序号[25]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1] [DB]: 设置命令[SYS HALT], 目标站点 DSC1[0], 命令序号[27]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS1] [DB]: 检测到EP DSC1[0]故障在PROCESS_LINK_CHECK中
[CSS1] [DB]: 设置EP DSC1[0]为故障EP
[CSS1] [DB]: 设置EP DSC2[1]为控制节点
[CSS1] [DB]: 设置命令[EP_CRASH], 目标站点 DSC2[1], 命令序号[29]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1] [DB]: 命令[EP_CRASH]处理结束
[CSS1] [DB]: 设置命令[CMD CLEAR], 目标站点 DSC2[1], 命令序号[32]
[CSS1] [CSS]: 设置命令[CONFIG VIP], 目标站点 CSS2[1], 命令序号[3]
[CSS1] [DB]: 设置命令[CONFIG VIP], 目标站点 DSC2[1], 命令序号[37]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1] [DB]: 命令[CONFIG VIP]处理结束
[CSS1] [CSS]: 重启本地DB实例,命令:[/dm/dmdbms/bin/dmserver path=/dm/config/dsc1/dm.ini dcr_ini=/dm/config/dmdcr.ini]
[CSS1] [DB]: 设置EP DSC1[0]为故障重加入EP
[CSS1] [DB]: 设置命令[START NOTIFY], 目标站点 DSC1[0], 命令序号[42]
[CSS1] [DB]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[43]
[CSS1] [DB]: 暂停工作线程结束
[CSS1] [DB]: 设置命令[DCR_LOAD], 目标站点 DSC1[0], 命令序号[44]
[CSS1] [DB]: 设置命令[DCR_LOAD], 目标站点 DSC2[1], 命令序号[45]
[CSS1] [DB]: 故障EP重新加入DSC结束
[CSS1] [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC1[0], 命令序号[47]
[CSS1] [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC2[1], 命令序号[48]
[CSS1] [DB]: 故障EP重新加入DSC结束
[CSS1] [DB]: 设置命令[EP RECV], 目标站点 DSC2[1], 命令序号[50]
[CSS1] [DB]: 故障EP恢复结束
[CSS1] [DB]: 设置命令[EP START], 目标站点 DSC1[0], 命令序号[52]
[CSS1] [DB]: 设置命令[EP START2], 目标站点 DSC1[0], 命令序号[54]
[CSS1] [DB]: 设置命令[EP OPEN], 目标站点 DSC1[0], 命令序号[56]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS1] [DB]: 设置命令[RESUME EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[58]
[CSS1] [DB]: 继续工作线程结束
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1] [DB]: 设置命令[EP REAL OPEN], 目标站点 DSC1[0], 命令序号[60]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
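To follow the recovery sequence in the monitor log above without reading every line, the commands can be pulled out in order. This is a sketch only: the regular expression is written against the log wording exactly as captured in this record (Chinese-locale output), and `commands_issued` is a hypothetical helper name. Whitespace is normalised first so that the extraction also works when terminal wrapping has split an entry across lines.

```python
import re

# Matches entries such as:
# [CSS1] [DB]: 设置命令[EP_CRASH], 目标站点 DSC2[1], 命令序号[29]
CMD_RE = re.compile(
    r"\[(?P<source>[A-Z0-9]+)\] \[(?P<group>[A-Z]+)\]: "
    r"设置命令\[(?P<command>[^\]]+)\],\s*"
    r"目标站点 (?P<target>[A-Z0-9]+)\[(?P<seqno>\d+)\],\s*"
    r"命令序号\[(?P<seq>\d+)\]"
)


def commands_issued(log_text):
    """Return the non-NONE commands in log order as a list of dicts."""
    # Collapse line breaks and repeated spaces left by terminal wrapping.
    flat = " ".join(log_text.split())
    result = []
    for match in CMD_RE.finditer(flat):
        if match.group("command") == "NONE":
            continue  # NONE entries only clear the previous command
        entry = match.groupdict()
        entry["seqno"] = int(entry["seqno"])
        entry["seq"] = int(entry["seq"])
        result.append(entry)
    return result
```

Applied to the log of this test, the list starts with `LINK_CHECK`, `LINK_CHECK`, `SYS HALT`, `EP_CRASH` and ends with `EP REAL OPEN`, which is the failover-then-rejoin order described in the test result.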
Process and port check
[dmdba@sds-part-dmdb01 bin]$ ps aux|grep dmserver
dmdba 173982 12.2 2.3 75632040 3141380 ? Ssl 14:50 0:07 /dm/dmdbms/bin/dmserver path=/dm/config/dsc1/dm.ini dcr_ini=/dm/config/dmdcr.ini
dmdba 175719 0.0 0.0 112728 960 pts/0 S+ 14:51 0:00 grep --color=auto dmserver
[dmdba@sds-part-dmdb01 bin]$ netstat -tunlp|grep 5236
(No info could be read for "-p": geteuid()=1001 but you should be root.)
tcp6 0 0 :::5236
Monitor status
Run the show command in the monitor
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 14:51:55 CSS1 0 9341 Control Node OPEN WORKING OK TRUE 698495724 698497615
2021-08-30 14:51:55 CSS2 1 9341 Normal Node OPEN WORKING OK TRUE 698495623 698497511
=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 14:51:55 ASM1 0 9351 Normal Node OPEN WORKING OK TRUE 698501852 698503721
2021-08-30 14:51:55 ASM2 1 9351 Control Node OPEN WORKING OK TRUE 698501748 698503615
=================== group[name = DSC, seq = 2, type = DB, Control Node = 1] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 14:51:55 DSC1 0 5236 Normal Node OPEN WORKING OK TRUE 1835811277 1835811357
2021-08-30 14:51:55 DSC2 1 5236 Control Node OPEN WORKING OK TRUE 1834440664 1834442475
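Test result
When the DSC1 instance process on the control node is lost, the DSC cluster detects the loss within the check interval, automatically moves the control role to the healthy node (DSC2, whose mode changes to Control Node), and marks DSC1 as a failed node. After the check period (configured as 60s), the cluster tries to restart the failed node automatically. Once the restart succeeds, the node rejoins the cluster and its status returns to normal as a non-control node with mode Normal Node. The process and port are normal, and logging in to the service works.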
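The pass criterion for this scenario can be written down as a small check over the `mode` column before and after the kill. This is an illustrative sketch only; `control_node` and `failover_ok` are hypothetical names, and the input dictionaries are assumed to map instance name to the mode string shown by the monitor.

```python
def control_node(modes):
    """Return the single instance whose mode is 'Control Node'."""
    controls = [name for name, mode in modes.items() if mode == "Control Node"]
    if len(controls) != 1:
        raise ValueError("expected exactly one control node, got %r" % controls)
    return controls[0]


def failover_ok(before, after, killed):
    """True if the killed control node handed over control and rejoined as normal."""
    return (
        control_node(before) == killed
        and control_node(after) != killed
        and after.get(killed) == "Normal Node"
    )


# Values taken from the DSC group rows at 14:37:13 and 14:51:55.
before = {"DSC1": "Control Node", "DSC2": "Normal Node"}
after = {"DSC1": "Normal Node", "DSC2": "Control Node"}
print(failover_ok(before, after, "DSC1"))
```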
1.2 Test scenario 02: kill the DSC instance process on the non-control node
Check the status of each node in the monitor
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 14:51:55 DSC1 0 5236 Normal Node OPEN WORKING OK TRUE 1835811277 1835811357
2021-08-30 14:51:55 DSC2 1 5236 Control Node OPEN WORKING OK TRUE 1834440664 1834442475
Run the command and check
[dmdba@sds-part-dmdb01 bin]$ netstat -tunlp|grep 5236
(No info could be read for "-p": geteuid()=1001 but you should be root.)
tcp6 0 0 :::5236 :::* LISTEN -
[dmdba@sds-part-dmdb01 bin]$ kill -9 173982
Observe the monitor output
[CSS1] [DB]: 设置命令[LINK_CHECK], 目标站点 DSC1[0], 命令序号[64]
[CSS1] [DB]: 设置命令[LINK_CHECK], 目标站点 DSC2[1], 命令序号[65]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1] [DB]: 设置命令[SYS HALT], 目标站点 DSC1[0], 命令序号[67]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS1] [DB]: 检测到EP DSC1[0]故障在PROCESS_LINK_CHECK中
[CSS1] [DB]: 设置EP DSC1[0]为故障EP
[CSS1] [DB]: 设置命令[EP_CRASH], 目标站点 DSC2[1], 命令序号[69]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1] [DB]: 命令[EP_CRASH]处理结束
[CSS1] [DB]: 设置命令[CMD CLEAR], 目标站点 DSC2[1], 命令序号[72]
[CSS1] [CSS]: 设置命令[CONFIG VIP], 目标站点 CSS2[1], 命令序号[4]
[CSS1] [CSS]: 设置命令[NONE], 目标站点 CSS2[1], 命令序号[0]
[CSS1] [DB]: 设置命令[CONFIG VIP], 目标站点 DSC2[1], 命令序号[77]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1] [DB]: 命令[CONFIG VIP]处理结束
[CSS1] [CSS]: 重启本地DB实例,命令:[/dm/dmdbms/bin/dmserver path=/dm/config/dsc1/dm.ini dcr_ini=/dm/config/dmdcr.ini]
[CSS1] [DB]: 设置EP DSC1[0]为故障重加入EP
[CSS1] [DB]: 设置命令[START NOTIFY], 目标站点 DSC1[0], 命令序号[82]
[CSS1] [DB]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[83]
[CSS1] [DB]: 暂停工作线程结束
[CSS1] [DB]: 设置命令[DCR_LOAD], 目标站点 DSC1[0], 命令序号[84]
[CSS1] [DB]: 设置命令[DCR_LOAD], 目标站点 DSC2[1], 命令序号[85]
[CSS1] [DB]: 故障EP重新加入DSC结束
[CSS1] [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC1[0], 命令序号[87]
[CSS1] [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC2[1], 命令序号[88]
[CSS1] [DB]: 故障EP重新加入DSC结束
[CSS1] [DB]: 设置命令[EP RECV], 目标站点 DSC2[1], 命令序号[90]
[CSS1] [DB]: 故障EP恢复结束
[CSS1] [DB]: 设置命令[EP START], 目标站点 DSC1[0], 命令序号[92]
[CSS1] [DB]: 设置命令[EP START2], 目标站点 DSC1[0], 命令序号[94]
[CSS1] [DB]: 设置命令[EP OPEN], 目标站点 DSC1[0], 命令序号[96]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS1] [DB]: 设置命令[RESUME EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[98]
[CSS1] [DB]: 继续工作线程结束
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1] [DB]: 设置命令[EP REAL OPEN], 目标站点 DSC1[0], 命令序号[100]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
Process and port check
[dmdba@sds-part-dmdb01 bin]$ netstat -tunlp|grep 5236
(No info could be read for "-p": geteuid()=1001 but you should be root.)
tcp6 0 0 :::5236 :::* LISTEN -
[dmdba@sds-part-dmdb01 bin]$ ps aux|grep dmserver
dmdba 179948 3.2 2.5 75632040 3318984 ? Ssl 14:56 0:28 /dm/dmdbms/bin/dmserver path=/dm/config/dsc1/dm.ini dcr_ini=/dm/config/dmdcr.ini
dmdba 192488 0.0 0.0 112728 960 pts/0 S+ 15:11 0:00 grep --color=auto dmserver
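The port check above can also be expressed as a small parser over captured `netstat -tunlp` text. A minimal sketch, assuming the usual net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); `is_listening` is a hypothetical helper name.

```python
def is_listening(netstat_output, port):
    """Return True if some TCP socket is in LISTEN state on the given port."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip warnings, prompts, and UDP rows (which have no State column).
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "LISTEN":
            continue
        # Local address looks like ":::5236" or "0.0.0.0:5236".
        if fields[3].rsplit(":", 1)[-1] == str(port):
            return True
    return False
```

Comparing the port as an exact field, instead of `grep 5236`, avoids false matches on ports such as 15236.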
Monitor status
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 15:13:04 DSC1 0 5236 Normal Node OPEN WORKING OK TRUE 1836113922 1836114887
2021-08-30 15:13:04 DSC2 1 5236 Control Node OPEN WORKING OK TRUE 1834440664 1834443743
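Test result
When the DSC1 instance process on the non-control node is lost, the cluster detects the loss within the check interval and marks DSC1 as a failed node. The control node does not change. After the check period (configured as 60s), the cluster tries to restart the failed node automatically. Once the restart succeeds, the node rejoins the cluster and its status returns to normal. The process and port are normal, and logging in to the service works.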
1.3 Test scenario 03: reboot the control-node server
Monitor status (before reboot)
[dmdba@sds-part-dmdb04 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor] 2021-08-30 16:28:50: CSS MONITOR V8
[monitor] 2021-08-30 16:28:50: CSS MONITOR SYSTEM IS READY.
[monitor] 2021-08-30 16:28:50: Wait CSS Control Node choosed...
[monitor] 2021-08-30 16:28:51: Wait CSS Control Node choosed succeed.
show
monitor current time:2021-08-30 16:28:58, n_group:3
=================== group[name = CSS, seq = 0, type = CSS, Control Node = 0] ========================================
[CSS3] auto check = TRUE, global info:
[ASM3] auto restart = TRUE
[DSC3] auto restart = TRUE
[CSS4] auto check = TRUE, global info:
[ASM4] auto restart = TRUE
[DSC4] auto restart = TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 16:28:59 CSS3 0 9341 Control Node OPEN WORKING OK TRUE 699508042 699512160
2021-08-30 16:28:59 CSS4 1 9341 Normal Node OPEN WORKING OK TRUE 699513005 699517103
=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 16:28:59 ASM3 0 9351 Normal Node OPEN WORKING OK TRUE 699515888 699519979
2021-08-30 16:28:59 ASM4 1 9351 Control Node OPEN WORKING OK TRUE 699519138 699523215
=================== group[name = DSC, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 16:28:59 DSC3 0 5236 Control Node OPEN WORKING OK TRUE 1837285878 1837289913
2021-08-30 16:28:59 DSC4 1 5236 Normal Node OPEN WORKING OK TRUE 1837995638 1837998773
Monitor status (during reboot)
[dmdba@sds-part-dmdb04 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor] 2021-08-30 16:51:20: CSS MONITOR V8
[monitor] 2021-08-30 16:51:23: CSS MONITOR SYSTEM IS READY.
[monitor] 2021-08-30 16:51:23: Wait CSS Control Node choosed...
[monitor] 2021-08-30 16:51:29: Wait CSS Control Node choosed failed, if dmcss has startuped and ini configured correctly, please wait a little more before execute command.
[CSS4] [CSS]: 监测到控制节点关闭
[CSS4] [CSS]: 设置EP [255]为控制节点
[monitor] 2021-08-30 16:52:27: 检测到CSS控制节点发生变化,由CSS[255]变为CSS[1]
[CSS4] [CSS]: 设置EP CSS4[1]为控制节点
[CSS4] [ASM]: 设置命令[SYS HALT], 目标站点 ASM3[0], 命令序号[357]
[CSS4] [DB]: 设置命令[SYS HALT], 目标站点 DSC3[0], 命令序号[625]
[CSS4] [ASM]: 设置EP ASM3[0]为故障EP
[CSS4] [DB]: 检测到EP DSC3[0]故障在PROCESS_OPEN中
[CSS4] [ASM]: 检测到EP ASM3[0]故障在PROCESS_OPEN中
[CSS4] [DB]: 设置EP DSC3[0]为故障EP
[CSS4] [ASM]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 ASM4[1], 命令序号[360]
[CSS4] [DB]: 设置EP DSC4[1]为控制节点
[CSS4] [DB]: 设置命令[EP_CRASH], 目标站点 DSC4[1], 命令序号[627]
[CSS4] [ASM]: 暂停工作线程结束
[CSS4] [ASM]: 设置命令[CRASH RECV], 目标站点 ASM4[1], 命令序号[362]
[CSS4] [ASM]: 设置命令[NONE], 目标站点 ASM4[1], 命令序号[0]
[CSS4] [ASM]: 设置命令[RESUME EP WORKER THREAD], 目标站点 ASM4[1], 命令序号[364]
[CSS4] [ASM]: 设置命令[NONE], 目标站点 ASM4[1], 命令序号[0]
[CSS4] [ASM]: 命令[RESUME EP WORKER THREAD]处理结束
[CSS4] [DB]: 设置命令[NONE], 目标站点 DSC4[1], 命令序号[0]
[CSS4] [DB]: 命令[EP_CRASH]处理结束
[CSS4] [DB]: 设置命令[CMD CLEAR], 目标站点 DSC4[1], 命令序号[630]
[CSS4] [CSS]: 设置命令[CONFIG VIP], 目标站点 CSS3[0], 命令序号[6]
[CSS4] [CSS]: 设置命令[NONE], 目标站点 CSS4[1], 命令序号[0]
[CSS4] [DB]: 设置命令[CONFIG VIP], 目标站点 DSC4[1], 命令序号[635]
[CSS4] [DB]: 设置命令[NONE], 目标站点 DSC4[1], 命令序号[0]
[CSS4] [DB]: 命令[CONFIG VIP]处理结束
[CSS4] [DB]: 命令[CONFIG VIP]处理结束
show
monitor current time:2021-08-30 16:55:51, n_group:3
=================== group[name = CSS, seq = 0, type = CSS, Control Node = 1] ========================================
[CSS3] auto check = FALSE, global info:
Connect to [CSS3] failed, please check the network or the CSSM_CSS_IP config in [/dm/config/dmcssm.ini] .
[CSS4] auto check = TRUE, global info:
[ASM4] auto restart = TRUE
[DSC4] auto restart = TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 16:55:50 CSS3 0 9341 Normal Node OPEN WORKING OK FALSE 699508042 699513441
2021-08-30 16:55:50 CSS4 1 9341 Control Node OPEN WORKING OK TRUE 699513005 699518715
=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================
n_ok_ep = 1
ok_ep_arr(index, seqno):
(0, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 16:55:50 ASM3 0 9351 Normal Node OPEN SYSHALT ERROR FALSE 699515888 699521261
2021-08-30 16:55:50 ASM4 1 9351 Control Node OPEN WORKING OK TRUE 699519138 699524827
=================== group[name = DSC, seq = 2, type = DB, Control Node = 1] ========================================
n_ok_ep = 1
ok_ep_arr(index, seqno):
(0, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 16:55:50 DSC3 0 5236 Normal Node OPEN WORKING ERROR FALSE 1837285878 1837291195
2021-08-30 16:55:50 DSC4 1 5236 Control Node OPEN WORKING OK TRUE 1837995638 1838000386
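In the `show` output above, the rebooting node is recognisable by `active = FALSE` (and, for ASM3 and DSC3, `is_ok = ERROR`). A small sketch that lists such endpoints from captured monitor text follows; it assumes each endpoint row ends with the `is_ok active guid ts` columns as printed here, and `inactive_endpoints` is a hypothetical helper name.

```python
import re

# Timestamp, instance name, then anything, ending in: is_ok active guid ts
ROW = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+(\S+)\s+.*\s"
    r"(OK|ERROR)\s+(TRUE|FALSE)\s+\d+\s+\d+\s*$"
)


def inactive_endpoints(show_output):
    """Return (inst_name, is_ok) pairs for rows whose active flag is FALSE."""
    found = []
    for line in show_output.splitlines():
        match = ROW.match(line.strip())
        if match and match.group(3) == "FALSE":
            found.append((match.group(1), match.group(2)))
    return found
```

For the snapshot taken at 16:55:50 this returns CSS3, ASM3, and DSC3, that is, every instance on the server being rebooted.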
Verify by logging in to the cluster node that was not rebooted
[dmdba@sds-part-dmdb04 bin]$ ./disql
disql V8
用户名:
密码:
服务器[LOCALHOST:5236]:处于普通打开状态
登录使用时间 : 17.442(ms)
SQL>
SQL> SELECT * FROM V$INSTANCE;
行号 NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION DB_VERSION START_TIME STATUS$ MODE$ OGUID
---------- ---- ------------- --------------- --------------- -------------------------- ------------------- ------------------- ------- ------ -----------
DSC_SEQNO DSC_ROLE
----------- ------------
1 DSC4 DSC4 2 sds-part-dmdb04 DM Database Server x64 V8 DB Version: 0x7000c 2021-08-30 15:36:35 OPEN NORMAL 0
1 Control node
已用时间: 14.900(毫秒). 执行号:0.
Monitor status (after reboot)
Monitor output
[dmdba@sds-part-dmdb01 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor] 2021-09-01 10:16:15: CSS MONITOR V8
[monitor] 2021-09-01 10:16:15: CSS MONITOR SYSTEM IS READY.
[monitor] 2021-09-01 10:16:15: Wait CSS Control Node choosed...
[CSS1] [CSS]: 监测到控制节点关闭
[CSS1] [CSS]: 设置EP [255]为控制节点
[monitor] 2021-09-01 10:16:21: Wait CSS Control Node choosed failed, if dmcss has startuped and ini configured correctly, please wait a little more before execute command.
[monitor] 2021-09-01 10:16:24: 检测到CSS控制节点发生变化,由CSS[255]变为CSS[0]
[CSS1] [CSS]: 设置EP CSS1[0]为控制节点
[CSS1] [ASM]: 设置命令[SYS HALT], 目标站点 ASM2[1], 命令序号[263]
[CSS1] [ASM]: 设置EP ASM2[1]为故障EP
[CSS1] [ASM]: 检测到EP ASM2[1]故障在PROCESS_OPEN中
[CSS1] [ASM]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 ASM1[0], 命令序号[266]
[CSS1] [DB]: 检测到EP DSC2[1]故障在PROCESS_OPEN中
[CSS1] [DB]: 设置EP DSC2[1]为故障EP
[CSS1] [DB]: 设置EP DSC1[0]为控制节点
[CSS1] [DB]: 设置命令[EP_CRASH], 目标站点 DSC1[0], 命令序号[341]
[CSS1] [ASM]: 暂停工作线程结束
[CSS1] [ASM]: 设置EP ASM1[0]为控制节点
[CSS1] [ASM]: 设置命令[CRASH RECV], 目标站点 ASM1[0], 命令序号[268]
[CSS1] [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS1] [ASM]: 设置命令[RESUME EP WORKER THREAD], 目标站点 ASM1[0], 命令序号[270]
[CSS1] [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS1] [ASM]: 命令[RESUME EP WORKER THREAD]处理结束
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS1] [DB]: 命令[EP_CRASH]处理结束
[CSS1] [DB]: 设置命令[CMD CLEAR], 目标站点 DSC1[0], 命令序号[344]
[CSS1] [CSS]: 设置命令[CONFIG VIP], 目标站点 CSS2[1], 命令序号[6]
[CSS1] [DB]: 设置命令[CONFIG VIP], 目标站点 DSC1[0], 命令序号[349]
[CSS1] [DB]: 设置命令[NONE], 目标站点
[CSS2] auto check = FALSE, global info:
Connect to [CSS2] failed, please check the network or the CSSM_CSS_IP config in [/dm/config/dmcssm.ini] .
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-09-01 10:17:20 CSS1 0 9341 Control Node OPEN WORKING OK TRUE 742822335 742823260
2021-09-01 10:17:20 CSS2 1 9341 Normal Node OPEN WORKING OK FALSE 698495623 698653451
=================== group[name = ASM, seq = 1, type = ASM, Control Node = 0] ========================================
n_ok_ep = 1
ok_ep_arr(index, seqno):
(0, 0)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-09-01 10:17:20 ASM1 0 9351 Control Node OPEN WORKING OK TRUE 742828461 742829364
2021-09-01 10:17:20 ASM2 1 9351 Normal Node OPEN SYSHALT ERROR FALSE 698501748 698659643
=================== group[name = DSC, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 1
ok_ep_arr(index, seqno):
(0, 0)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-09-01 10:17:20 DSC1 0 5236 Control Node OPEN WORKING OK TRUE 1959281632 1959282480
2021-09-01 10:17:20 DSC2 1 5236 Normal Node OPEN WORKING ERROR FALSE 1834440664 1834598502
=================================================================================================================
[dmdba@sds-part-dmdb02 bin]$ ./DmCSSServicecss2 start
Starting DmCSSServicecss2: ok
After startup:
[CSS2] [CSS]: 设置EP CSS1[0]为控制节点
[CSS2] [CSS]: 重启本地ASM实例,命令:[/dm/dmdbms/bin/dmasmsvr dcr_ini=/dm/config/dmdcr.ini]
[CSS1] [ASM]: 设置EP ASM2[1]为故障重加入EP
[CSS1] [ASM]: 设置命令[START NOTIFY], 目标站点 ASM2[1], 命令序号[274]
[CSS1] [ASM]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 ASM1[0], 命令序号[276]
[CSS1] [ASM]: 暂停工作线程结束
[CSS1] [ASM]: 设置命令[ERROR EP ADD], 目标站点 ASM1[0], 命令序号[277]
[CSS1] [ASM]: 设置命令[ERROR EP ADD], 目标站点 ASM2[1], 命令序号[278]
[CSS1] [ASM]: 故障EP重新加入DSC结束
[CSS1] [ASM]: 设置命令[EP RECV], 目标站点 ASM1[0], 命令序号[280]
[CSS1] [ASM]: 故障EP恢复结束
[CSS1] [ASM]: 设置命令[EP START], 目标站点 ASM2[1], 命令序号[282]
[CSS1] [ASM]: 设置命令[EP OPEN], 目标站点 ASM2[1], 命令序号[284]
[CSS1] [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS1] [ASM]: 设置命令[RESUME EP WORKER THREAD], 目标站点 ASM1[0], 命令序号[286]
[CSS1] [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS1] [ASM]: 继续工作线程结束
[CSS1] [ASM]: 设置命令[EP REAL OPEN], 目标站点 ASM2[1], 命令序号[288]
[CSS1] [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS1] [ASM]: 设置命令[LINK_CHECK], 目标站点 ASM1[0], 命令序号[292]
[CSS1] [ASM]: 设置命令[LINK_CHECK], 目标站点 ASM2[1], 命令序号[293]
[CSS1] [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS1] [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS2] [CSS]: 重启本地DB实例,命令:[/dm/dmdbms/bin/dmserver path=/dm/config/dsc2/dm.ini dcr_ini=/dm/config/dmdcr.ini]
[CSS1] [DB]: 设置EP DSC2[1]为故障重加入EP
[CSS1] [DB]: 设置命令[START NOTIFY], 目标站点 DSC2[1], 命令序号[354]
[CSS1] [DB]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 DSC1[0], 命令序号[355]
[CSS1] [DB]: 暂停工作线程结束
[CSS1] [DB]: 设置命令[DCR_LOAD], 目标站点 DSC1[0], 命令序号[356]
[CSS1] [DB]: 设置命令[DCR_LOAD], 目标站点 DSC2[1], 命令序号[357]
[CSS1] [DB]: 故障EP重新加入DSC结束
[CSS1] [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC1[0], 命令序号[359]
[CSS1] [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC2[1], 命令序号[360]
[CSS1] [DB]: 故障EP重新加入DSC结束
[CSS1] [DB]: 设置命令[EP RECV], 目标站点 DSC1[0], 命令序号[362]
[CSS1] [DB]: 故障EP恢复结束
[CSS1] [DB]: 设置命令[EP START], 目标站点 DSC2[1], 命令序号[364]
[CSS1] [DB]: 设置命令[EP START2], 目标站点 DSC2[1], 命令序号[366]
[CSS1] [DB]: 设置命令[EP OPEN], 目标站点 DSC2[1], 命令序号[368]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS1] [DB]: 设置命令[RESUME EP WORKER THREAD], 目标站点 DSC1[0], 命令序号[370]
[CSS1] [DB]: 继续工作线程结束
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS1] [DB]: 设置命令[EP REAL OPEN], 目标站点 DSC2[1], 命令序号[372]
[CSS1] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
=================== group[name = CSS, seq = 0, type = CSS, Control Node = 0] ========================================
[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE
[CSS2] auto check = TRUE, global info:
[ASM2] auto restart = TRUE
[DSC2] auto restart = TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-09-01 10:24:51 CSS1 0 9341 Control Node OPEN WORKING OK TRUE 742822335 742823711
2021-09-01 10:24:51 CSS2 1 9341 Normal Node OPEN WORKING OK TRUE 743176646 743176767
=================== group[name = ASM, seq = 1, type = ASM, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-09-01 10:24:51 ASM1 0 9351 Control Node OPEN WORKING OK TRUE 742828461 742829815
2021-09-01 10:24:51 ASM2 1 9351 Normal Node OPEN WORKING OK TRUE 743182775 743182874
=================== group[name = DSC, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-09-01 10:24:51 DSC1 0 5236 Control Node OPEN WORKING OK TRUE 1959281632 1959282931
2021-09-01 10:24:51 DSC2 1 5236 Normal Node OPEN WORKING OK TRUE 1960283745 1960283788
Verification
SQL> SELECT * FROM V$DATABASE;
行号 NAME CREATE_TIME ARCH_MODE LAST_CKPT_TIME STATUS$ ROLE$ MAX_SIZE TOTAL_SIZE DSC_NODES OPEN_COUNT
---------- ---- ------------------- --------- -------------- ----------- ----------- -------------------- -------------------- ----------- -----------
STARTUP_COUNT LAST_STARTUP_TIME
-------------------- -------------------
1 DSC 2021-08-25 15:52:39 Y NULL 4 0 0 163840 2 66
5 2021-08-30 14:22:05
1.4 Test scenario 04: reboot the control-node server
Monitor status (before reboot)
[dmdba@sds-part-dmdb04 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor] 2021-08-30 16:28:50: CSS MONITOR V8
[monitor] 2021-08-30 16:28:50: CSS MONITOR SYSTEM IS READY.
[monitor] 2021-08-30 16:28:50: Wait CSS Control Node choosed...
[monitor] 2021-08-30 16:28:51: Wait CSS Control Node choosed succeed.
show
monitor current time:2021-08-30 16:28:58, n_group:3
=================== group[name = CSS, seq = 0, type = CSS, Control Node = 0] ========================================
[CSS3] auto check = TRUE, global info:
[ASM3] auto restart = TRUE
[DSC3] auto restart = TRUE
[CSS4] auto check = TRUE, global info:
[ASM4] auto restart = TRUE
[DSC4] auto restart = TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 16:28:59 CSS3 0 9341 Control Node OPEN WORKING OK TRUE 699508042 699512160
2021-08-30 16:28:59 CSS4 1 9341 Normal Node OPEN WORKING OK TRUE 699513005 699517103
=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 16:28:59 ASM3 0 9351 Normal Node OPEN WORKING OK TRUE 699515888 699519979
2021-08-30 16:28:59 ASM4 1 9351 Control Node OPEN WORKING OK TRUE 699519138 699523215
=================== group[name = DSC, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 16:28:59 DSC3 0 5236 Control Node OPEN WORKING OK TRUE 1837285878 1837289913
2021-08-30 16:28:59 DSC4 1 5236 Normal Node OPEN WORKING OK TRUE 1837995638 1837998773
Monitor status (during reboot)
[dmdba@sds-part-dmdb04 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor] 2021-08-30 16:51:20: CSS MONITOR V8
[monitor] 2021-08-30 16:51:23: CSS MONITOR SYSTEM IS READY.
[monitor] 2021-08-30 16:51:23: Wait CSS Control Node choosed...
[monitor] 2021-08-30 16:51:29: Wait CSS Control Node choosed failed, if dmcss has startuped and ini configured correctly, please wait a little more before execute command.
[CSS4] [CSS]: 监测到控制节点关闭
[CSS4] [CSS]: 设置EP [255]为控制节点
[monitor] 2021-08-30 16:52:27: 检测到CSS控制节点发生变化,由CSS[255]变为CSS[1]
[CSS4] [CSS]: 设置EP CSS4[1]为控制节点
[CSS4] [ASM]: 设置命令[SYS HALT], 目标站点 ASM3[0], 命令序号[357]
[CSS4] [DB]: 设置命令[SYS HALT], 目标站点 DSC3[0], 命令序号[625]
[CSS4] [ASM]: 设置EP ASM3[0]为故障EP
[CSS4] [DB]: 检测到EP DSC3[0]故障在PROCESS_OPEN中
[CSS4] [ASM]: 检测到EP ASM3[0]故障在PROCESS_OPEN中
[CSS4] [DB]: 设置EP DSC3[0]为故障EP
[CSS4] [ASM]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 ASM4[1], 命令序号[360]
[CSS4] [DB]: 设置EP DSC4[1]为控制节点
[CSS4] [DB]: 设置命令[EP_CRASH], 目标站点 DSC4[1], 命令序号[627]
[CSS4] [ASM]: 暂停工作线程结束
[CSS4] [ASM]: 设置命令[CRASH RECV], 目标站点 ASM4[1], 命令序号[362]
[CSS4] [ASM]: 设置命令[NONE], 目标站点 ASM4[1], 命令序号[0]
[CSS4] [ASM]: 设置命令[RESUME EP WORKER THREAD], 目标站点 ASM4[1], 命令序号[364]
[CSS4] [ASM]: 设置命令[NONE], 目标站点 ASM4[1], 命令序号[0]
[CSS4] [ASM]: 命令[RESUME EP WORKER THREAD]处理结束
[CSS4] [DB]: 设置命令[NONE], 目标站点 DSC4[1], 命令序号[0]
[CSS4] [DB]: 命令[EP_CRASH]处理结束
[CSS4] [DB]: 设置命令[CMD CLEAR], 目标站点 DSC4[1], 命令序号[630]
[CSS4] [CSS]: 设置命令[CONFIG VIP], 目标站点 CSS3[0], 命令序号[6]
[CSS4] [CSS]: 设置命令[NONE], 目标站点 CSS4[1], 命令序号[0]
[CSS4] [DB]: 设置命令[CONFIG VIP], 目标站点 DSC4[1], 命令序号[635]
[CSS4] [DB]: 设置命令[NONE], 目标站点 DSC4[1], 命令序号[0]
[CSS4] [DB]: 命令[CONFIG VIP]处理结束
[CSS4] [DB]: 命令[CONFIG VIP]处理结束
show
monitor current time:2021-08-30 16:55:51, n_group:3
=================== group[name = CSS, seq = 0, type = CSS, Control Node = 1] ========================================
[CSS3] auto check = FALSE, global info:
Connect to [CSS3] failed, please check the network or the CSSM_CSS_IP config in [/dm/config/dmcssm.ini] .
[CSS4] auto check = TRUE, global info:
[ASM4] auto restart = TRUE
[DSC4] auto restart = TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 16:55:50 CSS3 0 9341 Normal Node OPEN WORKING OK FALSE 699508042 699513441
2021-08-30 16:55:50 CSS4 1 9341 Control Node OPEN WORKING OK TRUE 699513005 699518715
=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================
n_ok_ep = 1
ok_ep_arr(index, seqno):
(0, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 16:55:50 ASM3 0 9351 Normal Node OPEN SYSHALT ERROR FALSE 699515888 699521261
2021-08-30 16:55:50 ASM4 1 9351 Control Node OPEN WORKING OK TRUE 699519138 699524827
=================== group[name = DSC, seq = 2, type = DB, Control Node = 1] ========================================
n_ok_ep = 1
ok_ep_arr(index, seqno):
(0, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-08-30 16:55:50 DSC3 0 5236 Normal Node OPEN WORKING ERROR FALSE 1837285878 1837291195
2021-08-30 16:55:50 DSC4 1 5236 Control Node OPEN WORKING OK TRUE 1837995638 1838000386
Verify by logging in to the cluster node that was not rebooted
[dmdba@sds-part-dmdb04 bin]$ ./disql
disql V8
用户名:
密码:
服务器[LOCALHOST:5236]:处于普通打开状态
登录使用时间 : 17.442(ms)
SQL>
SQL> SELECT * FROM V$INSTANCE;
行号 NAME INSTANCE_NAME INSTANCE_NUMBER HOST_NAME SVR_VERSION DB_VERSION START_TIME STATUS$ MODE$ OGUID
---------- ---- ------------- --------------- --------------- -------------------------- ------------------- ------------------- ------- ------ -----------
DSC_SEQNO DSC_ROLE
----------- ------------
1 DSC4 DSC4 2 sds-part-dmdb04 DM Database Server x64 V8 DB Version: 0x7000c 2021-08-30 15:36:35 OPEN NORMAL 0
1 Control node
已用时间: 14.900(毫秒). 执行号:0.
Monitor status (after reboot)
[dmdba@sds-part-dmdb02 bin]$ ./dmcssm ini_path=/dm/config/dmcssm.ini
[monitor] 2021-09-01 10:30:20: CSS MONITOR V8
[monitor] 2021-09-01 10:30:23: CSS MONITOR SYSTEM IS READY.
[monitor] 2021-09-01 10:30:23: Wait CSS Control Node choosed...
[monitor] 2021-09-01 10:30:29: Wait CSS Control Node choosed failed, if dmcss has startuped and ini configured correctly, please wait a little more before execute command.
[CSS2] [CSS]: 监测到控制节点关闭
[CSS2] [CSS]: 设置EP [255]为控制节点
[monitor] 2021-09-01 10:32:05: 检测到CSS控制节点发生变化,由CSS[255]变为CSS[1]
[CSS2] [CSS]: 设置EP CSS2[1]为控制节点
[CSS2] [ASM]: 设置命令[SYS HALT], 目标站点 ASM1[0], 命令序号[396]
[CSS2] [ASM]: 设置EP ASM1[0]为故障EP
[CSS2] [DB]: 设置命令[SYS HALT], 目标站点 DSC1[0], 命令序号[475]
[CSS2] [DB]: 检测到EP DSC1[0]故障在PROCESS_OPEN中
[CSS2] [DB]: 设置EP DSC1[0]为故障EP
[CSS2] [ASM]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 ASM2[1], 命令序号[399]
[CSS2] [DB]: 设置EP DSC2[1]为控制节点
[CSS2] [DB]: 设置命令[EP_CRASH], 目标站点 DSC2[1], 命令序号[477]
[CSS2] [ASM]: 暂停工作线程结束
[CSS2] [ASM]: 设置EP ASM2[1]为控制节点
[CSS2] [ASM]: 设置命令[CRASH RECV], 目标站点 ASM2[1], 命令序号[401]
[CSS2] [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS2] [ASM]: 设置命令[RESUME EP WORKER THREAD], 目标站点 ASM2[1], 命令序号[403]
[CSS2] [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS2] [ASM]: 命令[RESUME EP WORKER THREAD]处理结束
[CSS2] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS2] [DB]: 命令[EP_CRASH]处理结束
[CSS2] [DB]: 设置命令[CMD CLEAR], 目标站点 DSC2[1], 命令序号[480]
[CSS2] [CSS]: 设置命令[CONFIG VIP], 目标站点 CSS1[0], 命令序号[6]
[CSS2] [DB]: 设置命令[CONFIG VIP], 目标站点 DSC2[1], 命令序号[485]
[CSS2] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS2] [DB]: 命令[CONFIG VIP]处理结束
[CSS1] [CSS]: 设置EP CSS2[1]为控制节点
[CSS1] [CSS]: 重启本地ASM实例,命令:[/dm/dmdbms/bin/dmasmsvr dcr_ini=/dm/config/dmdcr.ini]
[CSS2] [ASM]: 设置EP ASM1[0]为故障重加入EP
[CSS2] [ASM]: 设置命令[START NOTIFY], 目标站点 ASM1[0], 命令序号[407]
[CSS2] [ASM]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 ASM2[1], 命令序号[409]
[CSS2] [ASM]: 暂停工作线程结束
[CSS2] [ASM]: 设置命令[ERROR EP ADD], 目标站点 ASM1[0], 命令序号[410]
[CSS2] [ASM]: 设置命令[ERROR EP ADD], 目标站点 ASM2[1], 命令序号[411]
[CSS2] [ASM]: 故障EP重新加入DSC结束
[CSS2] [ASM]: 设置命令[EP RECV], 目标站点 ASM2[1], 命令序号[413]
[CSS2] [ASM]: 故障EP恢复结束
[CSS2] [ASM]: 设置命令[EP START], 目标站点 ASM1[0], 命令序号[415]
[CSS2] [ASM]: 设置命令[EP OPEN], 目标站点 ASM1[0], 命令序号[417]
[CSS2] [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS2] [ASM]: 设置命令[RESUME EP WORKER THREAD], 目标站点 ASM2[1], 命令序号[419]
[CSS2] [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS2] [ASM]: 继续工作线程结束
[CSS2] [ASM]: 设置命令[EP REAL OPEN], 目标站点 ASM1[0], 命令序号[421]
[CSS2] [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS2] [ASM]: 设置命令[LINK_CHECK], 目标站点 ASM1[0], 命令序号[425]
[CSS2] [ASM]: 设置命令[LINK_CHECK], 目标站点 ASM2[1], 命令序号[426]
[CSS2] [ASM]: 设置命令[NONE], 目标站点 ASM1[0], 命令序号[0]
[CSS2] [ASM]: 设置命令[NONE], 目标站点 ASM2[1], 命令序号[0]
[CSS1] [CSS]: 重启本地DB实例,命令:[/dm/dmdbms/bin/dmserver path=/dm/config/dsc1/dm.ini dcr_ini=/dm/config/dmdcr.ini]
[CSS2] [DB]: 设置EP DSC1[0]为故障重加入EP
[CSS2] [DB]: 设置命令[START NOTIFY], 目标站点 DSC1[0], 命令序号[490]
[CSS2] [DB]: 设置命令[SUSPEND EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[491]
[CSS2] [DB]: 暂停工作线程结束
[CSS2] [DB]: 设置命令[DCR_LOAD], 目标站点 DSC1[0], 命令序号[492]
[CSS2] [DB]: 设置命令[DCR_LOAD], 目标站点 DSC2[1], 命令序号[493]
[CSS2] [DB]: 故障EP重新加入DSC结束
[CSS2] [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC1[0], 命令序号[495]
[CSS2] [DB]: 设置命令[ERROR EP ADD], 目标站点 DSC2[1], 命令序号[496]
[CSS2] [DB]: 故障EP重新加入DSC结束
[CSS2] [DB]: 设置命令[EP RECV], 目标站点 DSC2[1], 命令序号[498]
[CSS2] [DB]: 故障EP恢复结束
[CSS2] [DB]: 设置命令[EP START], 目标站点 DSC1[0], 命令序号[500]
[CSS2] [DB]: 设置命令[EP START2], 目标站点 DSC1[0], 命令序号[502]
[CSS2] [DB]: 设置命令[EP OPEN], 目标站点 DSC1[0], 命令序号[504]
[CSS2] [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
[CSS2] [DB]: 设置命令[RESUME EP WORKER THREAD], 目标站点 DSC2[1], 命令序号[506]
[CSS2] [DB]: 继续工作线程结束
[CSS2] [DB]: 设置命令[NONE], 目标站点 DSC2[1], 命令序号[0]
[CSS2] [DB]: 设置命令[EP REAL OPEN], 目标站点 DSC1[0], 命令序号[508]
[CSS2] [DB]: 设置命令[NONE], 目标站点 DSC1[0], 命令序号[0]
show
monitor current time:2021-09-01 10:42:51, n_group:3
=================== group[name = CSS, seq = 0, type = CSS, Control Node = 1] ========================================
[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE
[CSS2] auto check = TRUE, global info:
[ASM2] auto restart = TRUE
[DSC2] auto restart = TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-09-01 10:42:50 CSS1 0 9341 Normal Node OPEN WORKING OK TRUE 743476673 743476809
2021-09-01 10:42:50 CSS2 1 9341 Control Node OPEN WORKING OK TRUE 743176646 743177845
=================== group[name = ASM, seq = 1, type = ASM, Control Node = 1] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-09-01 10:42:50 ASM1 0 9351 Normal Node OPEN WORKING OK TRUE 743482830 743482944
2021-09-01 10:42:50 ASM2 1 9351 Control Node OPEN WORKING OK TRUE 743182775 743183952
=================== group[name = DSC, seq = 2, type = DB, Control Node = 1] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2021-09-01 10:42:50 DSC1 0 5236 Normal Node OPEN WORKING OK TRUE 1961132104 1961132163
2021-09-01 10:42:50 DSC2 1 5236 Control Node OPEN WORKING OK TRUE 1960283745 1960284867
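Test result
While the server is rebooting, the DSC cluster marks the failed node as FALSE (the active column in the monitor output) in the CSS, ASM, and DSC groups. After the server has booted successfully, run ./DmCSSSERVICECSS1 start in the bin directory. During its check cycle, the DSC cluster then restarts the failed node's instances automatically and verifies their status. Once the restart succeeds, the node returns to normal.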
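Waiting for the rebooted node to come back can be automated with a simple polling loop. The sketch below is an assumption-laden illustration: `fetch_active_flags` is a hypothetical callable that the reader would supply (for example, one that runs the monitor's `show` command and maps each instance name to its `active` flag), and the timeout and interval values are placeholders, not settings from this test.

```python
import time


def wait_until_all_active(fetch_active_flags, timeout_s=300, interval_s=5,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until every instance reports active, or the timeout expires.

    fetch_active_flags: callable returning {inst_name: bool}.
    Returns True on full recovery, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        flags = fetch_active_flags()
        # An empty result is treated as "not recovered yet".
        if flags and all(flags.values()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise without real waiting.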
For more information, visit the Dameng technical community: https://eco.dameng.com