Chapter 5: Restoring a DMDSC Cluster After a Failed Dynamic Node Expansion




1. Cluster status after the dynamic node expansion

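  After deploying the DSC cluster, I tested dynamically adding a node. Once the expansion was finished, a problem showed up: the new node could not start normally, and it also affected the original cluster. I believe the cause is my architecture, which is DSC plus a single-instance DW standby, and the new node was placed on the same server that runs the single-instance DW.
  Current status in the CSSM monitor: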

Show
monitor current time:2022-06-15 11:20:07, n_group:3
=================== group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 0] ========================================

[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = TRUE
[DSC0] auto restart = TRUE

[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE

[CSS2] auto check = FALSE, global info:
Connect to [CSS2] failed, please check the network or the CSSM_CSS_IP config in [dmcssm.ini] .

ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2022-06-15 11:20:06    CSS0          0         5336    Control Node OPEN               WORKING      OK           TRUE         11436             19336           
        2022-06-15 11:20:06    CSS1          1         5337    Normal Node  OPEN               WORKING      OK           TRUE         13639             21484           
        2022-06-15 11:20:06    CSS2          2         5338    Normal Node  OPEN               SYSHALT      OK           FALSE        1657983           1659342         

=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 0] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2022-06-15 11:20:06    ASM0          0         5436    Control Node OPEN               WORKING      OK           TRUE         16973             24856           
        2022-06-15 11:20:06    ASM1          1         5437    Normal Node  OPEN               WORKING      OK           TRUE         18641             26471           
        2022-06-15 11:20:06    ASM2          2         5438    Normal Node  OPEN               SYSHALT      ERROR        FALSE        1671014           1672330         

=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 0] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2022-06-15 11:20:06    DSC0          0         5236    Control Node OPEN               WORKING      OK           TRUE         1948071           1949983         
        2022-06-15 11:20:06    DSC1          1         5236    Normal Node  OPEN               WORKING      OK           TRUE         1932385           1934297         
        2022-06-15 11:20:06    DSC2          2         5237    Normal Node  STARTUP            WORKING      ERROR        FALSE        2054844           2054849         

==================================================================================================================
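To spot the broken endpoints without reading the whole table, the "ep:" rows can be parsed. A minimal Python sketch, assuming the whitespace-separated column layout shown above (with mode being either "Control Node" or "Normal Node") and that the text is copied from the dmcssm console; parse_endpoints and unhealthy are hypothetical helpers, not DM tools.

```python
import re

# Column names as they appear in the "ep:" header of the CSSM "show" output.
COLUMNS = ["css_time", "inst_name", "seqno", "port", "mode",
           "inst_status", "vtd_status", "is_ok", "active", "guid", "ts"]

# One endpoint row. "mode" is two words ("Control Node" / "Normal Node"),
# so it gets its own group; every other column is a single token.
ROW_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"        # css_time
    r"(\S+)\s+(\d+)\s+(\d+)\s+"                             # inst_name, seqno, port
    r"((?:Control|Normal) Node)\s+"                         # mode
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s*$"    # inst_status .. ts
)


def parse_endpoints(text):
    """Return one dict per endpoint row; header and other lines are ignored."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append(dict(zip(COLUMNS, m.groups())))
    return rows


def unhealthy(rows):
    """Endpoints whose is_ok is not OK, active is not TRUE, or state is not OPEN."""
    return [r for r in rows
            if r["is_ok"] != "OK" or r["active"] != "TRUE"
            or r["inst_status"] != "OPEN"]
```

Applied to the output above, this would flag CSS2, ASM2 and DSC2. The filter deliberately ignores vtd_status; a SYSHALT endpoint is caught anyway because it is no longer active.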

2. Cleaning up the failed node

2.1 Export the DSC cluster's DCR disk configuration

[dmdba@dmdsc01 ~]$ dmasmcmd
DMASMCMD V8
ASM>export dcrdisk '/dev/raw/raw1' to '/home/dmdba/dmdcr_cfg_bak.ini'
ASMCMD export DCRDISK success.
Used time: 5.945(ms).

2.2 Clean up the log files

--Delete the log files that were added for the new node
[dmdba@dmdsc01 ~]$ dmasmtool dcr_ini=/dm8/dsc/config/dmdcr.ini
DMASMTOOL V8
ASM>
ASM>ls
+
disk groups total [4]......
NO.1     name: DMLOG
NO.2     name: DMDATA
NO.3     name: VOTE
NO.4     name: DCR
Used time: 5.918(ms).
ASM>cd DMLOG
+DMLOG
Used time: 6.172(ms).
ASM>ls
        dir : log
total count 1.
Used time: 3.322(ms).
ASM>cd log
+DMLOG/log
Used time: 2.544(ms).
ASM>ls
        file : dsc0_log01.log
        file : dsc0_log02.log
        file : dsc1_log01.log
        file : dsc1_log02.log
        file : DSC2_log01.log
        file : DSC2_log02.log
total count 6.
Used time: 4.659(ms).
ASM>
ASM>rm -fr DSC2_log01.log
Used time: 5.492(ms).
ASM>rm -fr DSC2_log02.log
Used time: 7.421(ms).
ASM>ls
        file : dsc0_log01.log
        file : dsc0_log02.log
        file : dsc1_log01.log
        file : dsc1_log02.log
total count 4.
Used time: 3.657(ms).
ASM>
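When several nodes share one log directory, picking the files to delete by eye is error-prone (the new node's files are named DSC2_... while the original ones are dsc0_... and dsc1_...). A minimal Python sketch that reads an ASM ls listing pasted from dmasmtool and builds the rm commands for one instance; node_log_files and rm_commands are hypothetical helpers, and the script only prints commands for review, it does not talk to ASM.

```python
def node_log_files(ls_output, inst_name):
    """Return the files in an ASM 'ls' listing that belong to inst_name.

    Matching is case-insensitive because, in the listing above, the original
    nodes use lowercase names (dsc0_...) while the new node used DSC2_...
    """
    prefix = inst_name.lower() + "_"
    files = []
    for line in ls_output.splitlines():
        line = line.strip()
        if line.startswith("file :"):          # skip "dir :" and summary lines
            name = line.split(":", 1)[1].strip()
            if name.lower().startswith(prefix):
                files.append(name)
    return files


def rm_commands(files):
    """Same command form as typed interactively in dmasmtool above."""
    return ["rm -fr " + name for name in files]


if __name__ == "__main__":
    listing = """        file : dsc0_log01.log
        file : DSC2_log01.log
        file : DSC2_log02.log"""
    for cmd in rm_commands(node_log_files(listing, "DSC2")):
        print(cmd)
```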

2.3 Stop the database (server) service and the CSS service

--Node 1
[dmdba@dmdsc01 config]$ DmServiceDSC stop
Stopping DmServiceDSC: [ OK ]
[dmdba@dmdsc01 config]$ DmCSSServiceCSS stop
Stopping DmCSSServiceCSS: [ OK ]

--Node 2
[dmdba@dmdsc02 config]$ DmServiceDSC stop
Stopping DmServiceDSC: [ OK ]
[dmdba@dmdsc02 config]$ DmCSSServiceCSS stop
Stopping DmCSSServiceCSS: [ OK ]

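2.4 Edit dmdcr_cfg_bak.ini

Restore the previous configuration by removing everything that belongs to the newly added node.
The parameters to change are DCR_GRP_N_EP and DCR_GRP_EP_ARR in each [GRP] section; the endpoint sections for CSS2, ASM2 and DSC2 are deleted as well.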

--Before the deletion (shown from the copy dmdcr_cfg_bak1.ini):
[dmdba@dmdsc01 ~]$ cat dmdcr_cfg_bak1.ini 
# the file is auto-created by system, self edit is invalid!
#DCR HDR
DCR_N_GRP              = 3
DCR_VTD_PATH           = /dev/raw/raw2
DCR_OGUID              = 45331

[GRP]
DCR_GRP_TYPE           = CSS
DCR_GRP_NAME           = GRP_CSS
DCR_GRP_N_EP           = 3
DCR_GRP_EP_ARR         = {0,1,2}
DCR_GRP_N_ERR_EP       = 0
DCR_GRP_ERR_EP_ARR     = {}
DCR_GRP_DSKCHK_CNT     = 60

[GRP]
DCR_GRP_TYPE           = ASM
DCR_GRP_NAME           = GRP_ASM
DCR_GRP_N_EP           = 3
DCR_GRP_EP_ARR         = {0,1,2}
DCR_GRP_N_ERR_EP       = 0
DCR_GRP_ERR_EP_ARR     = {}
DCR_GRP_DSKCHK_CNT     = 60

[GRP]
DCR_GRP_TYPE           = DB
DCR_GRP_NAME           = GRP_DSC
DCR_GRP_N_EP           = 3
DCR_GRP_EP_ARR         = {0,1,2}
DCR_GRP_N_ERR_EP       = 0
DCR_GRP_ERR_EP_ARR     = {}
DCR_GRP_DSKCHK_CNT     = 60

[GRP_CSS]
DCR_EP_NAME        = CSS0
DCR_EP_HOST        = 192.168.10.100
DCR_EP_PORT        = 5336

[GRP_CSS]
DCR_EP_NAME        = CSS1
DCR_EP_HOST        = 192.168.10.101
DCR_EP_PORT        = 5337

[GRP_CSS]
DCR_EP_NAME        = CSS2
DCR_EP_HOST        = 192.168.10.102
DCR_EP_PORT        = 5338

[GRP_ASM]
DCR_EP_NAME        = ASM0
DCR_EP_SHM_KEY     = 93360
DCR_EP_SHM_SIZE    = 10
DCR_EP_HOST        = 192.168.10.100
DCR_EP_PORT        = 5436
DCR_EP_ASM_LOAD_PATH  = /dev/raw

[GRP_ASM]
DCR_EP_NAME        = ASM1
DCR_EP_SHM_KEY     = 93361
DCR_EP_SHM_SIZE    = 10
DCR_EP_HOST        = 192.168.10.101
DCR_EP_PORT        = 5437
DCR_EP_ASM_LOAD_PATH  = /dev/raw

[GRP_ASM]
DCR_EP_NAME        = ASM2
DCR_EP_SHM_KEY     = 93362
DCR_EP_SHM_SIZE    = 10
DCR_EP_HOST        = 192.168.10.102
DCR_EP_PORT        = 5438
DCR_EP_ASM_LOAD_PATH  = /dev/raw

[GRP_DSC]
DCR_EP_NAME        = DSC0
DCR_EP_SEQNO       = 0
DCR_EP_PORT        = 5236
DCR_CHECK_PORT     = 5536

[GRP_DSC]
DCR_EP_NAME        = DSC1
DCR_EP_SEQNO       = 1
DCR_EP_PORT        = 5236
DCR_CHECK_PORT     = 5537

[GRP_DSC]
DCR_EP_NAME        = DSC2
DCR_EP_SEQNO       = 2
DCR_EP_PORT        = 5236
DCR_CHECK_PORT     = 5538

--After the deletion:
[dmdba@dmdsc01 ~]$ cat dmdcr_cfg_bak.ini 
# the file is auto-created by system, self edit is invalid!
#DCR HDR
DCR_N_GRP              = 3
DCR_VTD_PATH           = /dev/raw/raw2
DCR_OGUID              = 45331

[GRP]
DCR_GRP_TYPE           = CSS
DCR_GRP_NAME           = GRP_CSS
DCR_GRP_N_EP           = 2
DCR_GRP_EP_ARR         = {0,1}
DCR_GRP_N_ERR_EP       = 0
DCR_GRP_ERR_EP_ARR     = {}
DCR_GRP_DSKCHK_CNT     = 60

[GRP]
DCR_GRP_TYPE           = ASM
DCR_GRP_NAME           = GRP_ASM
DCR_GRP_N_EP           = 2
DCR_GRP_EP_ARR         = {0,1}
DCR_GRP_N_ERR_EP       = 0
DCR_GRP_ERR_EP_ARR     = {}
DCR_GRP_DSKCHK_CNT     = 60

[GRP]
DCR_GRP_TYPE           = DB
DCR_GRP_NAME           = GRP_DSC
DCR_GRP_N_EP           = 2
DCR_GRP_EP_ARR         = {0,1}
DCR_GRP_N_ERR_EP       = 0
DCR_GRP_ERR_EP_ARR     = {}
DCR_GRP_DSKCHK_CNT     = 60

[GRP_CSS]
DCR_EP_NAME        = CSS0
DCR_EP_HOST        = 192.168.10.100
DCR_EP_PORT        = 5336

[GRP_CSS]
DCR_EP_NAME        = CSS1
DCR_EP_HOST        = 192.168.10.101
DCR_EP_PORT        = 5337

[GRP_ASM]
DCR_EP_NAME        = ASM0
DCR_EP_SHM_KEY     = 93360
DCR_EP_SHM_SIZE    = 10
DCR_EP_HOST        = 192.168.10.100
DCR_EP_PORT        = 5436
DCR_EP_ASM_LOAD_PATH  = /dev/raw

[GRP_ASM]
DCR_EP_NAME        = ASM1
DCR_EP_SHM_KEY     = 93361
DCR_EP_SHM_SIZE    = 10
DCR_EP_HOST        = 192.168.10.101
DCR_EP_PORT        = 5437
DCR_EP_ASM_LOAD_PATH  = /dev/raw

[GRP_DSC]
DCR_EP_NAME        = DSC0
DCR_EP_SEQNO       = 0
DCR_EP_PORT        = 5236
DCR_CHECK_PORT     = 5536

[GRP_DSC]
DCR_EP_NAME        = DSC1
DCR_EP_SEQNO       = 1
DCR_EP_PORT        = 5236
DCR_CHECK_PORT     = 5537

2.5 Re-initialize the DCR and VOTE disks

--Initialize the DCR disk
ASM>init dcrdisk '/dev/raw/raw1' from '/home/dmdba/dmdcr_cfg_bak.ini' identified by 'aaabbb'
[Trace]DG 126 alloc one extent for inodes, addr(disk_id, disk_auno, extent_no):(0,0,1).
[Trace]DG 126 allocate 4 extents for file 0xfe000002.
[Trace]DG 126 alloc 4 extents for 0xfe000002, addr(disk_id, disk_auno, extent_no):(0, 0, 2)->(0, 0, 5), need_init = 1.
Used time: 00:00:14.514.

--Initialize the VOTE disk
ASM>init votedisk '/dev/raw/raw2' from '/home/dmdba/dmdcr_cfg_bak.ini'
[Trace]DG 125 alloc one extent for inodes, addr(disk_id, disk_auno, extent_no):(0,0,1).
[Trace]DG 125 allocate 4 extents for file 0xfd000002.
[Trace]DG 125 alloc 4 extents for 0xfd000002, addr(disk_id, disk_auno, extent_no):(0, 0, 2)->(0, 0, 5), need_init = 1.
Used time: 00:00:14.486.
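Since init dcrdisk and init votedisk write whatever the edited file contains, a quick consistency check of dmdcr_cfg_bak.ini before running them can catch a mismatch between DCR_N_GRP, DCR_GRP_N_EP, DCR_GRP_EP_ARR and the endpoint sections. A minimal Python sketch, assuming the exported layout shown in 2.4; check_dcr is a hypothetical helper, not part of the DM tools.

```python
def check_dcr(text):
    """Return a list of inconsistencies found in an exported DCR config."""
    problems = []
    groups = []        # (group name, DCR_GRP_N_EP, DCR_GRP_EP_ARR) per [GRP]
    ep_count = {}      # "[GRP_xxx]" header -> number of endpoint sections
    n_grp = None
    section, fields = None, {}

    def flush():
        # Record the section that has just ended.
        if section == "[GRP]":
            arr = fields.get("DCR_GRP_EP_ARR", "{}").strip("{} ")
            eps = [int(x) for x in arr.split(",") if x.strip()]
            groups.append((fields.get("DCR_GRP_NAME"),
                           int(fields.get("DCR_GRP_N_EP", -1)), eps))
        elif section is not None:
            ep_count[section] = ep_count.get(section, 0) + 1

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            flush()
            section, fields = line, {}
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if section is None and key == "DCR_N_GRP":
            n_grp = int(value)
        else:
            fields[key] = value
    flush()

    if n_grp is not None and n_grp != len(groups):
        problems.append(f"DCR_N_GRP={n_grp} but {len(groups)} [GRP] sections")
    for name, n_ep, eps in groups:
        if n_ep != len(eps):
            problems.append(f"{name}: DCR_GRP_N_EP={n_ep} but "
                            f"DCR_GRP_EP_ARR has {len(eps)} entries")
        found = ep_count.get(f"[{name}]", 0)
        if found != n_ep:
            problems.append(f"{name}: DCR_GRP_N_EP={n_ep} but {found} [{name}] sections")
    return problems
```

An empty list means the counts agree; it does not prove that hosts, ports or shared memory keys are correct.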

2.6 Start the CSS service

--Node 1
[dmdba@dmdsc01 config]$ DmCSSServiceCSS start
Starting DmCSSServiceCSS: [ OK ]

--Node 2
[dmdba@dmdsc02 config]$ DmCSSServiceCSS start
Starting DmCSSServiceCSS: [ OK ]

2.7 Check the CSSM monitor

Show
monitor current time:2022-06-15 12:02:42, n_group:3
=================== group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 0] ========================================

[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = TRUE
[DSC0] auto restart = TRUE

[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE


ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2022-06-15 12:02:42    CSS0          0         5336    Control Node OPEN               WORKING      OK           TRUE         2798325           2799883         
        2022-06-15 12:02:42    CSS1          1         5337    Normal Node  OPEN               WORKING      OK           TRUE         2785459           2787009         

=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 0] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2022-06-15 12:02:42    ASM0          0         5436    Control Node OPEN               WORKING      OK           TRUE         2811702           2813217         
        2022-06-15 12:02:42    ASM1          1         5437    Normal Node  OPEN               WORKING      OK           TRUE         2798831           2800339         

=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 0] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2022-06-15 12:02:42    DSC0          0         5236    Control Node SHUTDOWN             WORKING      OK          FALSE         3109952           3110826         
        2022-06-15 12:02:42    DSC1          1         5236    Normal Node  SHUTDOWN             WORKING      OK          FALSE         3096344           3097212         

==================================================================================================================

2.8 The database fails to start

[dmdba@dmdsc01 log]$ DmServiceDSC start
Starting DmServiceDSC: connnect dmasmtool successfully.
[ FAILED ]
file dm.key not found, use default license!
version info: develop
DM Database Server 64 V8 03134283890-20220304-158322-10045 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 45332
+DMLOG/log/DSC2_log01.log not exist, can not startup

--This shows that the control file still records the deleted log files.
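The missing path can be pulled out of the startup output mechanically. A minimal Python sketch, assuming the message format shown above; missing_files is a hypothetical helper. Here only DSC2_log01.log is reported, apparently because startup stops at the first missing file, yet both DSC2 log entries still had to be removed from dm.ctl.

```python
import re

# Message format taken from the startup output above.
MISSING_RE = re.compile(r"^(\S+) not exist, can not startup$")


def missing_files(startup_log):
    """Return the paths the server reported as missing during startup."""
    paths = []
    for line in startup_log.splitlines():
        m = MISSING_RE.match(line.strip())
        if m:
            paths.append(m.group(1))
    return paths
```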

2.9 Edit the dm.ctl control file

--Convert the control file to text
[dmdba@dmdsc01 ~]$ dmctlcvt type=1 src=+DMDATA/data/dsc/dm.ctl dest=/home/dmdba/dmctrl.txt dcr_ini=/dm8/dsc/config/dmdcr.ini
DMCTLCVT V8
convert ctl to txt success!

--Find the entries for the deleted log files and remove them:
fil_path=+DMLOG/log/DSC2_log01.log
# mirror path
mirror_path=
# file id
fil_id=0
# whether the file is auto extend
autoextend=1
# file create time
fil_create_time=DATETIME '2022-6-10 10:11:38'
# file modify time
fil_modify_time=DATETIME '2022-6-10 10:11:38'
# the max size of file
fil_max_size=0
# next size of file
fil_next_size=0

# file path
fil_path=+DMLOG/log/DSC2_log02.log
# mirror path
mirror_path=
# file id
fil_id=1
# whether the file is auto extend
autoextend=1
# file create time
fil_create_time=DATETIME '2022-6-10 10:11:39'
# file modify time
fil_modify_time=DATETIME '2022-6-10 10:11:39'
# the max size of file
fil_max_size=0
# next size of file
fil_next_size=0

--Convert the text back into the control file
[dmdba@dmdsc01 ~]$ dmctlcvt type=2 src=/home/dmdba/dmctrl.txt dest=+DMDATA/data/dsc/dm.ctl dcr_ini=/dm8/dsc/config/dmdcr.ini
DMCTLCVT V8
convert txt to ctl success!
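The manual edit of dmctrl.txt can also be done with a small script that drops whole entries rather than individual lines. A minimal Python sketch, assuming, as in the excerpt above, that each file entry is a blank-line-separated block containing a fil_path= line; drop_blocks is a hypothetical helper, and it does not update any other counters dm.ctl may hold. Checking that exactly one block was removed per path guards against a typo in the path.

```python
def drop_blocks(ctl_text, **match):
    """Remove every blank-line-separated block that contains all of the
    key=value pairs in match. Returns (new_text, number_of_blocks_removed)."""
    if not match:
        raise ValueError("refusing to match every block")
    wanted = {f"{k}={v}" for k, v in match.items()}
    kept, removed = [], 0
    for block in ctl_text.split("\n\n"):
        lines = {line.strip() for line in block.splitlines()}
        if wanted <= lines:            # block has every requested key=value
            removed += 1
        else:
            kept.append(block)
    return "\n\n".join(kept), removed
```

For this cluster it would be called once with fil_path="+DMLOG/log/DSC2_log01.log" and once with fil_path="+DMLOG/log/DSC2_log02.log" on the contents of /home/dmdba/dmctrl.txt, before converting back with dmctlcvt type=2.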

2.10 CSSM starts the database automatically

--The monitor shows that the database has come up on its own
show

monitor current time:2022-06-15 12:42:16, n_group:3
=================== group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 1] ========================================

[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = TRUE
[DSC0] auto restart = TRUE

[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE


ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2022-06-15 12:42:16    CSS0          0         5336    Normal Node  OPEN               WORKING      OK           TRUE         6915              8110            
        2022-06-15 12:42:16    CSS1          1         5337    Control Node OPEN               WORKING      OK           TRUE         6005              7452            

=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 1] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2022-06-15 12:42:16    ASM0          0         5436    Normal Node  OPEN               WORKING      OK           TRUE         12343             13518           
        2022-06-15 12:42:16    ASM1          1         5437    Control Node OPEN               WORKING      OK           TRUE         11507             12936           

=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 0] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:     css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
        2022-06-15 12:42:16    DSC0          0         5236    Control Node OPEN               WORKING      OK           TRUE         18290             19447           
        2022-06-15 12:42:16    DSC1          1         5236    Normal Node  OPEN               WORKING      OK           TRUE         44410             45739           

==================================================================================================================

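3. Problem and solution

  After cleaning up the failed node, I tried the dynamic expansion again, but adding the new node's log files failed with the error below (advice from anyone who has run into this is welcome):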

SQL> alter database add node logfile '+DMLOG/log/dsc2_log01.log' size 2048, '+DMLOG/log/dsc2_log02.log' size 2048;
alter database add node logfile '+DMLOG/log/dsc2_log01.log' size 2048, '+DMLOG/log/dsc2_log02.log' size 2048;
Error near line 1 [-3444]: the node expansion operation has not completed, cannot expand nodes again.
Elapsed time: 1.832 (ms). Execution ID: 0.

--Solution
[dmdba@dmdsc01 ~]$ dmctlcvt type=1 src=+DMDATA/data/dsc/dm.ctl dest=/home/dmdba/dm_ctl11.txt dcr_ini=/dm8/dsc/config/dmdcr.ini
DMCTLCVT V8
convert ctl to txt success!

--Delete this block (the RLOG tablespace entry for DSC node number 2):
# table space name
ts_name=RLOG
# table space ID
ts_id=2
# table space status
ts_state=0
# table space cache
ts_cache=
# DSC node number
ts_nth=2
# DSC optimized node number
ts_opt_node=0
# table space create time
ts_create_time=DATETIME '2022-6-16 10:19:6'
# table space modify time
ts_modify_time=DATETIME '2022-6-16 10:19:6'
# table space encrypt flag
ts_encrypt_flag=0
# table space copy num
ts_copy_num=0
# table space region size flag
ts_size_flag=0
# table space region huge size flag
ts_huge_size_flag=0

[dmdba@dmdsc01 ~]$ dmctlcvt type=2 dest=+DMDATA/data/dsc/dm.ctl src=/home/dmdba/dm_ctl11.txt dcr_ini=/dm8/dsc/config/dmdcr.ini
DMCTLCVT V8
convert txt to ctl success!

[dmdba@dmdsc01 ~]$ disql SYSDBA/SYSDBA

Server [LOCALHOST:5236]: in normal OPEN state
Login time: 5.016 (ms)
disql V8
SQL> alter database add node logfile '+DMLOG/log/dsc2_log01.log' size 2048,'+DMLOG/log/dsc2_log02.log' size 2048;
Operation executed
Elapsed time: 504.634 (ms). Execution ID: 300.
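In a larger control file the leftover entry can be located by parsing the dmctlcvt text dump instead of scrolling through it. A minimal Python sketch, assuming the blank-line-separated key=value layout shown above and that, as the "# DSC node number" comment suggests, each node's RLOG entry carries its own ts_nth; parse_blocks and leftover_rlog are hypothetical helpers. It only reports candidates for the manual deletion; it does not rewrite dm.ctl.

```python
def parse_blocks(ctl_text):
    """Parse blank-line-separated blocks of key=value lines into dicts,
    skipping '#' comment lines."""
    blocks = []
    for chunk in ctl_text.split("\n\n"):
        fields = {}
        for line in chunk.splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                fields[key.strip()] = value.strip()
        if fields:
            blocks.append(fields)
    return blocks


def leftover_rlog(ctl_text, n_nodes):
    """RLOG entries whose ts_nth is outside the current nodes 0..n_nodes-1."""
    return [b for b in parse_blocks(ctl_text)
            if b.get("ts_name") == "RLOG"
            and int(b.get("ts_nth", -1)) >= n_nodes]
```

With two surviving nodes, leftover_rlog(text, 2) would report the ts_nth=2 entry deleted above.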

Community site: https://eco.dameng.com