DM8: Building a 2-Node DSC + DW Cluster and Testing Failures


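Why DSC + DW

High availability is commonly divided into three tiers: within a data center, across sites in the same city, and across regions. For DM8, either DSC or DW alone covers the in-data-center tier, while same-city high availability can be achieved by combining DSC with DW. This post builds that architecture and tests it.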


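Base environment plan

The environment runs on 4 virtual machines with the following hardware and operating system:

| Item | Machine A | Machine B | Machine C | Machine D |
| --- | --- | --- | --- | --- |
| CPU | i5 1.60GHz × 2 cores | i5 1.60GHz × 2 cores | i5 1.60GHz × 2 cores | i5 1.60GHz × 2 cores |
| Local disk | 20G | 20G | 20G | 20G |
| Shared disk | 20G | 20G | N/A | N/A |
| Memory | 4G | 4G | 4G | 2G |
| NIC | 1000 Mb/s × 2 | 1000 Mb/s × 2 | 1000 Mb/s × 2 | 1000 Mb/s × 2 |
| OS | CentOS 7.7.1908 (Core) | CentOS 7.7.1908 (Core) | CentOS 7.7.1908 (Core) | CentOS 7.7.1908 (Core) |
| Kernel | 3.10.0-1160.59.1.el7.x86_64 | 3.10.0-1160.59.1.el7.x86_64 | 3.10.0-1160.59.1.el7.x86_64 | 3.10.0-1160.59.1.el7.x86_64 |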

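Software environment plan

Machine D serves as the monitor node and only needs the database software installed.
Machines A/B/C each have two NICs, one HOST ONLY and one INTNET. With the physical host playing the role of the application, the HOST ONLY NIC carries business traffic and the INTNET NIC carries the internal heartbeat and interconnect traffic.

| Item | Machine A | Machine B | Machine C |
| --- | --- | --- | --- |
| Hostname | dmdsc0 | dmdsc1 | dmdw0 |
| Business IP | 192.168.56.7 | 192.168.56.8 | 192.168.56.24 |
| Heartbeat IP | 10.30.5.17 | 10.30.5.18 | 10.30.5.24 |
| Instance name | DSC0 | DSC1 | DW0 |
| Port | 5236 | 5236 | 5238 |
| Installation media directory | /opt/dsc/setup | /opt/dsc/setup | /opt/dw/setup |
| Software installation directory | /opt/dsc/dmdbms | /opt/dsc/dmdbms | /opt/dw/dmdbms |
| Configuration directory | /opt/dsc/config | /opt/dsc/config | /opt/dw/config |
| Local archive directory | /opt/dsc/arch_0 | /opt/dsc/arch_1 | /opt/dw/arch |
| Remote archive directory | /opt/dsc/arch_1 | /opt/dsc/arch_0 | N/A |
| Archive space limit | 1024 | 1024 | 1024 |
| Realtime archive target | DW0 | DW0 | DSC0/DSC1 |
| Backup directory | /opt/dsc/bak | /opt/dsc/bak | /opt/dw/bak |
| Monitor IP | 10.30.5.188 | 10.30.5.188 | 10.30.5.188 |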

dmdcr_cfg

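| Group | Parameter | Machine A | Machine B |
| --- | --- | --- | --- |
| CSS | DCR_EP_NAME | CSS0 | CSS1 |
|  | DCR_EP_HOST | 10.30.5.17 | 10.30.5.18 |
|  | DCR_EP_PORT | 5336 | 5337 |
| ASM | DCR_EP_NAME | ASM0 | ASM1 |
|  | DCR_EP_HOST | 192.168.56.7 | 192.168.56.8 |
|  | DCR_EP_PORT | 5436 | 5437 |
|  | DCR_EP_SHM_KEY | 93360 | 93361 |
|  | DCR_EP_SHM_SIZE | 10 | 10 |
|  | DCR_EP_ASM_LOAD_PATH | /dev/raw | /dev/raw |
| DB | DCR_EP_NAME | DSC0 | DSC1 |
|  | DCR_EP_PORT | 5236 | 5236 |
|  | DCR_EP_SEQNO | 0 | 1 |
|  | DCR_CHECK_PORT | 5536 | 5537 |
|  | DCR_OGUID | 45331 | 45331 |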

dmasvrmal

| Parameter | Machine A | Machine B |
| --- | --- | --- |
| MAL_INST_NAME | ASM0 | ASM1 |
| MAL_HOST | 10.30.5.17 | 10.30.5.18 |
| MAL_PORT | 5636 | 5637 |

dmmal

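| Parameter | Machine A | Machine B | Machine C |
| --- | --- | --- | --- |
| MAL_PORT | 5736 | 5737 | 5738 |
| MAL_DW_PORT | 5836 | 5837 | 5838 |
| MAL_INST_DW_PORT | 5936 | 5937 | 5938 |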

dmwatcher

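| Group | Parameter | Machine A | Machine B | Machine C |
| --- | --- | --- | --- | --- |
| GRP1 | DW_TYPE | GLOBAL | GLOBAL | GLOBAL |
| - | DW_MODE | AUTO | AUTO | AUTO |
| - | DW_ERROR_TIME | 60 | 60 | 60 |
| - | INST_RECOVER_TIME | 60 | 60 | 60 |
| - | INST_ERROR_TIME | 35 | 35 | 35 |
| - | INST_INI | /opt/dsc/config/dsc0_config/dm.ini | /opt/dsc/config/dsc1_config/dm.ini | /opt/dw/data/DAMENG/dm.ini |
| - | DCR_INI | /opt/dsc/config/dmdcr.ini | /opt/dsc/config/dmdcr.ini | N/A |
| - | INST_OGUID | 45332 | 45332 | 45332 |
| - | INST_STARTUP_CMD | /opt/dsc/dmdbms/bin/dmserver | /opt/dsc/dmdbms/bin/dmserver | /opt/dw/dmdbms/bin/dmserver |
| - | INST_AUTO_RESTART | 0 | 0 | 0 |
| - | RLOG_SEND_THRESHOLD | 0 | 0 | 0 |
| - | RLOG_APPLY_THRESHOLD | 0 | 0 | 0 |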

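General environment preparation

These are routine steps. Since this post focuses on building the cluster and testing failures, they are only listed here:

  • Hostname planning / hosts entries
  • Disable firewall / SELinux
  • NIC planning and configuration
  • Limits / kernel parameter tuning
  • Shared-disk attachment, partitioning, and I/O scheduler tuning
  • Raw device binding
  • Disable swap
  • Time synchronization
  • User and directory creation
  • Database software installation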

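Building the 2-node DSC

Prepare dmdcr_cfg.ini

In the configuration directory, write the configuration file used to initialize the DCR disk and the voting disk.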

[dmdba@dmdsc0 home]$ cd /opt/dsc/config/
[dmdba@dmdsc0 config]$ vi dmdcr_cfg.ini
DCR_N_GRP = 3
DCR_VTD_PATH = /dev/raw/raw2
DCR_OGUID = 45331

[GRP]
  DCR_GRP_TYPE = CSS
  DCR_GRP_NAME = GRP_CSS
  DCR_GRP_N_EP = 2
  DCR_GRP_DSKCHK_CNT = 60

[GRP_CSS]
  DCR_EP_NAME = CSS0
  DCR_EP_HOST = 10.30.5.17
  DCR_EP_PORT = 5336

[GRP_CSS]
  DCR_EP_NAME = CSS1
  DCR_EP_HOST = 10.30.5.18
  DCR_EP_PORT = 5337

[GRP]
  DCR_GRP_TYPE = ASM
  DCR_GRP_NAME = GRP_ASM
  DCR_GRP_N_EP = 2
  DCR_GRP_DSKCHK_CNT = 60

[GRP_ASM]
  DCR_EP_NAME = ASM0
  DCR_EP_SHM_KEY = 93360
  DCR_EP_SHM_SIZE = 10
  DCR_EP_HOST = 192.168.56.7
  DCR_EP_PORT = 5436
  DCR_EP_ASM_LOAD_PATH = /dev/raw

[GRP_ASM]
  DCR_EP_NAME = ASM1
  DCR_EP_SHM_KEY = 93361
  DCR_EP_SHM_SIZE = 10
  DCR_EP_HOST = 192.168.56.8
  DCR_EP_PORT = 5437
  DCR_EP_ASM_LOAD_PATH = /dev/raw

[GRP]
  DCR_GRP_TYPE = DB
  DCR_GRP_NAME = GRP_DSC
  DCR_GRP_N_EP = 2
  DCR_GRP_DSKCHK_CNT = 60

[GRP_DSC]
  DCR_EP_NAME = DSC0
  DCR_EP_SEQNO = 0
  DCR_EP_PORT = 5236
  DCR_CHECK_PORT = 5536

[GRP_DSC]
  DCR_EP_NAME = DSC1
  DCR_EP_SEQNO = 1
  DCR_EP_PORT = 5236
  DCR_CHECK_PORT = 5537
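
Because dmdcr_cfg.ini repeats section names ([GRP], [GRP_CSS], and so on), a mismatch between DCR_GRP_N_EP and the number of endpoint sections is easy to miss by eye. The following is a minimal Python sketch of a pre-flight check, not a DM tool: the function names `parse_dcr_cfg` and `check_dcr_cfg` and the specific rules checked are assumptions made for illustration, and `SAMPLE` is a shortened copy of the file above.

```python
from collections import defaultdict

# Shortened sample in the same layout as the file above (CSS group only).
SAMPLE = """\
DCR_N_GRP = 1
DCR_VTD_PATH = /dev/raw/raw2
DCR_OGUID = 45331

[GRP]
  DCR_GRP_TYPE = CSS
  DCR_GRP_NAME = GRP_CSS
  DCR_GRP_N_EP = 2

[GRP_CSS]
  DCR_EP_NAME = CSS0
  DCR_EP_HOST = 10.30.5.17
  DCR_EP_PORT = 5336

[GRP_CSS]
  DCR_EP_NAME = CSS1
  DCR_EP_HOST = 10.30.5.18
  DCR_EP_PORT = 5337
"""


def parse_dcr_cfg(text):
    """Split the text into global keys, [GRP] sections and endpoint sections.

    Section names repeat, so every occurrence is kept as its own dict
    instead of being merged the way a standard ini parser would.
    """
    globals_, sections = {}, []
    current = globals_
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = {}
            sections.append((line[1:-1], current))
            continue
        key, _, value = line.partition("=")
        current[key.strip()] = value.strip()
    groups = [body for name, body in sections if name == "GRP"]
    endpoints = defaultdict(list)
    for name, body in sections:
        if name != "GRP":
            endpoints[name].append(body)
    return globals_, groups, endpoints


def check_dcr_cfg(text):
    """Return a list of human-readable problems (empty if none were found)."""
    globals_, groups, endpoints = parse_dcr_cfg(text)
    problems = []
    if int(globals_.get("DCR_N_GRP", 0)) != len(groups):
        problems.append("DCR_N_GRP does not match the number of [GRP] sections")
    for grp in groups:
        name, expected = grp["DCR_GRP_NAME"], int(grp["DCR_GRP_N_EP"])
        found = endpoints[name]
        if len(found) != expected:
            problems.append(f"{name}: expected {expected} endpoints, found {len(found)}")
        ep_names = [ep["DCR_EP_NAME"] for ep in found]
        if len(set(ep_names)) != len(ep_names):
            problems.append(f"{name}: duplicate DCR_EP_NAME")
    return problems


if __name__ == "__main__":
    print(check_dcr_cfg(SAMPLE) or "no problems found")
```

Running such a check before `init dcrdisk` is cheaper than discovering the mismatch after the DCR disk has been written.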

Label the ASM disks

Use dmasmcmd to label the raw devices and initialize their disk headers. This only needs to be done on one machine.

Machine A

[dmdba@dmdsc0 config]$ /opt/dsc/dmdbms/bin/dmasmcmd
DMASMCMD V8
ASM>create dcrdisk '/dev/raw/raw1' 'dcr'
[Trace]The ASM initialize dcrdisk /dev/raw/raw1 to name DMASMdcr
Used time: 00:00:05.449.
ASM>create votedisk '/dev/raw/raw2' 'vote'
[Trace]The ASM initialize votedisk /dev/raw/raw2 to name DMASMvote
Used time: 14.441(ms).
ASM>create asmdisk '/dev/raw/raw3' 'LOG0'
[Trace]The ASM initialize asmdisk /dev/raw/raw3 to name DMASMLOG0
Used time: 15.057(ms).
ASM>create asmdisk '/dev/raw/raw4' 'DATA0'
[Trace]The ASM initialize asmdisk /dev/raw/raw4 to name DMASMDATA0
Used time: 14.499(ms).

Initialize DCR/vote

Use dmasmcmd to write the dmdcr_cfg.ini prepared earlier into the DCR disk and the voting disk. This only needs to be run on one machine.

Machine A

ASM>init dcrdisk '/dev/raw/raw1' from '/opt/dsc/config/dmdcr_cfg.ini' identified by 'abcd'
[Trace]DG 126 alloc one extent for inodes, addr(disk_id, disk_auno, extent_no):(0,0,1).
[Trace]DG 126 allocate 4 extents for file 0xfe000002.
[Trace]DG 126 alloc 4 extents for 0xfe000002, addr(disk_id, disk_auno, extent_no):(0, 0, 2)->(0, 0, 5), need_init = 1.
Used time: 234.261(ms).

ASM>init votedisk '/dev/raw/raw2' from '/opt/dsc/config/dmdcr_cfg.ini'
[Trace]DG 125 alloc one extent for inodes, addr(disk_id, disk_auno, extent_no):(0,0,1).
[Trace]DG 125 allocate 4 extents for file 0xfd000002.
[Trace]DG 125 alloc 4 extents for 0xfd000002, addr(disk_id, disk_auno, extent_no):(0, 0, 2)->(0, 0, 5), need_init = 1.
Used time: 116.459(ms).
ASM>exit

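Configure dmasvrmal.ini

Configure the MAL information used by ASM. The instance names must match those in the ASM group of dmdcr_cfg.ini, and the file is identical on machines A and B.

Machine A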

[dmdba@dmdsc0 config]$ vi /opt/dsc/config/dmasvrmal.ini 
[MAL_INST1]
  MAL_INST_NAME = ASM0
  MAL_HOST = 10.30.5.17
  MAL_PORT = 5636  

[MAL_INST2]
  MAL_INST_NAME = ASM1
  MAL_HOST = 10.30.5.18
  MAL_PORT = 5637

Machine B

Copy the file from machine A.

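Configure dmdcr.ini

dmdcr.ini specifies the raw device that holds the DCR disk, the MAL configuration file used by ASM, and this node's sequence number in the cluster. Other commands rely on it to read the DCR contents correctly, to communicate over the ASM MAL, and to pull up the ASM or DB service automatically when so configured.

Machine A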

[dmdba@dmdsc0 ~]$ vi /opt/dsc/config/dmdcr.ini
DMDCR_PATH = /dev/raw/raw1
DMDCR_MAL_PATH = /opt/dsc/config/dmasvrmal.ini
DMDCR_SEQNO = 0

#DMDCR_ASM_RESTART_INTERVAL = 30                
#DMDCR_ASM_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmasmsvr dcr_ini=/opt/dsc/config/dmdcr.ini

#DMDCR_DB_RESTART_INTERVAL = 60  
#DMDCR_DB_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmserver path=/opt/dsc/config/dsc0_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini

Machine B

[dmdba@dmdsc1 ~]$ vi /opt/dsc/config/dmdcr.ini
DMDCR_PATH = /dev/raw/raw1
DMDCR_MAL_PATH = /opt/dsc/config/dmasvrmal.ini
DMDCR_SEQNO = 1

#DMDCR_ASM_RESTART_INTERVAL = 30                
#DMDCR_ASM_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmasmsvr dcr_ini=/opt/dsc/config/dmdcr.ini

#DMDCR_DB_RESTART_INTERVAL = 60 
#DMDCR_DB_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmserver path=/opt/dsc/config/dsc1_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini

The automatic restart settings are commented out during the configuration phase, which avoids some trouble 😛

Start CSS/ASM manually

Machine A

[dmdba@dmdsc0 ~]$ /opt/dsc/dmdbms/bin/dmcss DCR_INI=/opt/dsc/config/dmdcr.ini
DMCSS V8
DMCSS IS READY
[2022-03-31 16:30:39:877] [CSS]: Set EP CSS0[0] as Control node

[dmdba@dmdsc0 config]$ /opt/dsc/dmdbms/bin/dmasmsvr DCR_INI=/opt/dsc/config/dmdcr.ini

ASM SELF EPNO:0
DMASMSVR V8
dmasmsvr task worker thread startup
the ASM server is Ready.
check css cmd: START NOTIFY, cmd_seq: 19
check css cmd: EP START, cmd_seq: 20

ASM Control Node EPNO:0
check css cmd: EP OPEN, cmd_seq: 27
check css cmd: EP REAL OPEN, cmd_seq: 30 

Machine B

[dmdba@dmdsc1 ~]$ /opt/dsc/dmdbms/bin/dmcss DCR_INI=/opt/dsc/config/dmdcr.ini
DMCSS V8
DMCSS IS READY
[2022-03-31 16:32:00:151] [CSS]: Set EP CSS0[0] as Control node

[dmdba@dmdsc1 config]$ /opt/dsc/dmdbms/bin/dmasmsvr DCR_INI=/opt/dsc/config/dmdcr.ini

ASM SELF EPNO:1
DMASMSVR V8
dmasmsvr task worker thread startup
the ASM server is Ready.
check css cmd: EP START, cmd_seq: 22

ASM Control Node EPNO:0
check css cmd: EP OPEN, cmd_seq: 28
check css cmd: EP REAL OPEN, cmd_seq: 31

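A small problem

The first attempt to start CSS failed because a dependent library could not be opened:
/opt/dsc/dmdbms/bin/dmcss: error while loading shared libraries: libdmcalc.so: cannot open shared object file: No such file or directory
The library file itself exists, and its second-level dependencies are fine too, but it is missing from the global library cache. Adding its directory to the global library configuration and rebuilding the cache fixes this: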
[dmdba@dmdsc0 ~]$ ldconfig -p|grep libdmcalc
[root@dmdsc0 config]# vi /etc/ld.so.conf.d/dm.conf
/opt/dsc/dmdbms/bin
[root@dmdsc0 config]# ldconfig
ldconfig: /opt/dsc/dmdbms/bin/libxerces-c-3.1.so is not a symbolic link
[root@dmdsc0 config]# ldconfig -p|grep libdmcalc
libdmcalc.so (libc6,x86-64) => /opt/dsc/dmdbms/bin/libdmcalc.so
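After that the dependency problem no longer appears. Running the binaries from the directory that contains the .so files also works, so either approach is fine; it depends on you.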
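
The same before/after check on the library cache can be scripted. This is a small Python sketch that filters `ldconfig -p` output for a library name; the helper names `libs_in_cache` and `cache_has` are made up for illustration, and `cache_has` assumes `ldconfig` is on the caller's PATH, as it was for the dmdba user above.

```python
import subprocess


def libs_in_cache(ldconfig_output, name):
    """Return the cache entries whose library name starts with `name`."""
    hits = []
    for line in ldconfig_output.splitlines():
        entry = line.strip()
        # Cache entries look like: "libfoo.so (libc6,x86-64) => /path/libfoo.so"
        if "=>" in entry and entry.split()[0].startswith(name):
            hits.append(entry)
    return hits


def cache_has(name):
    """Run `ldconfig -p` and report whether `name` is in the cache."""
    result = subprocess.run(
        ["ldconfig", "-p"], capture_output=True, text=True, check=True
    )
    return bool(libs_in_cache(result.stdout, name))


if __name__ == "__main__":
    print("libdmcalc cached:", cache_has("libdmcalc"))
```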

Create the ASM disk groups

Use dmasmtool to create the disk groups managed by ASM. This only needs to be done on one node.

Machine A

[dmdba@dmdsc0 ~]$ /opt/dsc/dmdbms/bin/dmasmtool DCR_INI=/opt/dsc/config/dmdcr.ini
DMASMTOOL V8
ASM>create diskgroup 'DMLOG' asmdisk '/dev/raw/raw3'
Used time: 38.129(ms).
ASM>create diskgroup 'DMDATA' asmdisk '/dev/raw/raw4'
Used time: 46.751(ms).

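Configure dminit.ini

Write the initialization file for the database instances. When the database is stored in ASM, initialization can only be done through a configuration file.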

[dmdba@dmdsc0 config]$ vi dminit.ini
  db_name = dsc
  system_path = +DMDATA/data
  main = +DMDATA/data/dsc/main.dbf
  main_size = 128
  roll = +DMDATA/data/dsc/roll.dbf
  roll_size = 128
  system = +DMDATA/data/dsc/system.dbf
  system_size = 128
  ctl_path = +DMDATA/data/dsc/dm.ctl
  ctl_size = 8
  log_size = 256
  dcr_path = /dev/raw/raw1
  dcr_seqno = 0
  auto_overwrite = 1
  PAGE_SIZE = 32
  CASE_SENSITIVE = Y
  CHARSET = 0

[DSC0]
  config_path = /opt/dsc/config/dsc0_config
  port_num = 5236
  mal_host = 10.30.5.17
  mal_port = 5736
  log_path = +DMLOG/log/dsc0_log01.log
  log_path = +DMLOG/log/dsc0_log02.log

[DSC1]
  config_path = /opt/dsc/config/dsc1_config
  port_num = 5236
  mal_host = 10.30.5.18
  mal_port = 5737
  log_path = +DMLOG/log/dsc1_log01.log
  log_path = +DMLOG/log/dsc1_log02.log

Initialize the DSC instances

Machine A

[dmdba@dmdsc0 ~]$ /opt/dsc/dmdbms/bin/dminit control=/opt/dsc/config/dminit.ini
initdb V8
db version: 0x7000c
file dm.key not found, use default license!
License will expire on 2023-03-04
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL

log file path: +DMLOG/log/dsc0_log01.log

log file path: +DMLOG/log/dsc0_log02.log

log file path: +DMLOG/log/dsc1_log01.log

log file path: +DMLOG/log/dsc1_log02.log

write to dir [+DMDATA/data/dsc].
create dm database success. 2022-04-07 17:24:55

Copy the configuration files to the other node

[dmdba@dmdsc0 ~]$ scp -rp  /opt/dsc/config/dsc1_config 192.168.56.8:/opt/dsc/config/
dm.ini               100%   56KB  27.2MB/s   00:00    
sqllog.ini            100%  481   640.0KB/s   00:00    
dmmal.ini            100%  200   295.8KB/s   00:00

Configure DSC archiving

Machine A

[dmdba@dmdsc0 dsc0_config]$ vi dmarch.ini
[ARCHIVE_LOCAL1]
 ARCH_TYPE = LOCAL
 ARCH_DEST = /opt/dsc/arch_0
 ARCH_FILE_SIZE  = 1024
 ARCH_SPACE_LIMIT = 1024
[ARCH_REMOTE1]
 ARCH_TYPE = REMOTE
 ARCH_DEST = DSC1
 ARCH_INCOMING_PATH = /opt/dsc/arch_1
 ARCH_FILE_SIZE = 1024
 ARCH_SPACE_LIMIT = 1024
 
[dmdba@dmdsc0 dsc0_config]$ vi dm.ini
ARCH_INI = 1

Machine B

[dmdba@dmdsc1 dsc1_config]$ vi dmarch.ini
[ARCHIVE_LOCAL1]
  ARCH_TYPE = LOCAL
  ARCH_DEST = /opt/dsc/arch_1
  ARCH_FILE_SIZE = 1024
  ARCH_SPACE_LIMIT = 1024
[ARCH_REMOTE1]
  ARCH_TYPE = REMOTE
  ARCH_DEST = DSC0
  ARCH_INCOMING_PATH = /opt/dsc/arch_0
  ARCH_FILE_SIZE = 1024
  ARCH_SPACE_LIMIT = 1024

[dmdba@dmdsc1 dsc1_config]$ vi dm.ini
ARCH_INI = 1
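
In the layout used here, the ARCH_INCOMING_PATH that a node sets for its peer is the same directory the peer uses as its own LOCAL ARCH_DEST (arch_0 for DSC0, arch_1 for DSC1). The Python sketch below checks that convention across both files. It reflects this post's directory plan rather than a documented DM requirement, and the helper names are hypothetical.

```python
def parse_ini(text):
    """Tiny ini reader: {section: {key: value}}; '#' lines are comments."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return sections


def archive_dirs(text):
    """Return (local ARCH_DEST, {peer instance: ARCH_INCOMING_PATH})."""
    local, remote = None, {}
    for body in parse_ini(text).values():
        if body.get("ARCH_TYPE") == "LOCAL":
            local = body["ARCH_DEST"]
        elif body.get("ARCH_TYPE") == "REMOTE":
            remote[body["ARCH_DEST"]] = body["ARCH_INCOMING_PATH"]
    return local, remote


def remote_matches_peer(node_cfgs):
    """node_cfgs maps instance name -> dmarch.ini text.

    True if every REMOTE incoming path equals the LOCAL destination
    configured on the peer instance it names.
    """
    parsed = {name: archive_dirs(text) for name, text in node_cfgs.items()}
    for _, remote in parsed.values():
        for peer, incoming in remote.items():
            if peer not in parsed or parsed[peer][0] != incoming:
                return False
    return True


DSC0_ARCH = """\
[ARCHIVE_LOCAL1]
 ARCH_TYPE = LOCAL
 ARCH_DEST = /opt/dsc/arch_0
[ARCH_REMOTE1]
 ARCH_TYPE = REMOTE
 ARCH_DEST = DSC1
 ARCH_INCOMING_PATH = /opt/dsc/arch_1
"""
DSC1_ARCH = """\
[ARCHIVE_LOCAL1]
 ARCH_TYPE = LOCAL
 ARCH_DEST = /opt/dsc/arch_1
[ARCH_REMOTE1]
 ARCH_TYPE = REMOTE
 ARCH_DEST = DSC0
 ARCH_INCOMING_PATH = /opt/dsc/arch_0
"""

if __name__ == "__main__":
    print(remote_matches_peer({"DSC0": DSC0_ARCH, "DSC1": DSC1_ARCH}))
```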

Start the DSC instances

Machine A

[dmdba@dmdsc0 ~]$ /opt/dsc/dmdbms/bin/dmserver /opt/dsc/config/dsc0_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini
file dm.key not found, use default license!
version info: develop
DM Database Server 64 V8 03134283890-20220304-158322-10045 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2023-03-04
hlck_sys_init, init g_drm_dest:[0, 1]
lbs_sys_init, the length of g_master_map is 1117, fill it use ok_ep_arr:[0, 1], n_ok_ep:2!
check CSS cmd: START NOTIFY, cmd_seq: 2
Control Node change from 255 to 254
check CSS cmd: DCR_LOAD, cmd_seq: 3
check CSS cmd: EP START, cmd_seq: 6
Control Node change from 254 to 0
file lsn: 0
check CSS cmd: EP START2, cmd_seq: 11
ndct db load finished
ckpt2_exec_immediately begin.
file_lsn < cur_lsn & no dirty page & in mount status, ignore checkpoint
checkpoint end, 0 pages flushed, used_space[512], free_space[536862208].
checkpoint: buffer pages flushing...
checkpoint end, 2 pages flushed, used_space[512], free_space[536862208].
checkpoint end, 0 pages flushed, used_space[0], free_space[536862720].
checkpoint end, 0 pages flushed, used_space[0], free_space[536862720].
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct fill fast pool finished
iid page's trxid[1002]
NEXT TRX ID = 1003
pseg_collect_mgr_items, total collect 0 active_trxs, 0 cmt_trxs, 0 pre_cmt_trxs, 0 to_release_trxs, 0 active_pages, 0 cmt_pages, 0 pre_cmt_pages, 0 to_release_pages, 0 mgr pages, 0 mgr recs!
iid page's trxid[2004]
NEXT TRX ID = 3008.
total 0 active crash trx, pseg_crash_trx_rollback sys_only(0) begin ...
pseg_crash_trx_rollback end, total 0 active crash trx, include 0 empty_trxs, 0 empty_pages which only need to delete mgr recs.
pseg_crash_trx_rollback end
pseg recv finished
nsvr_startup end.
aud sys init success.
aud rt sys init success.
trx: 3008 purged 1 pages
trx: 3046 purged 1 pages
......
trx: 4238 purged 1 pages
checkpoint for flush ts[65535] buffer...
checkpoint for flush ts[65535] buffer end
systables desc init success.
ndct_db_load_info success.
nsvr_process_before_open begin.
nsvr_process_before_open success.
SYSTEM IS READY.
check CSS cmd: EP OPEN, cmd_seq: 16
iid page's trxid[6020]
NEXT TRX ID = 7024.
[!!!DSC INFO!!!] DSC crash process over!
check CSS cmd: EP REAL OPEN, cmd_seq: 19
total 0 active crash trx, pseg_crash_trx_rollback sys_only(0) begin ...
pseg_crash_trx_rollback end, total 0 active crash trx, include 0 empty_trxs, 0 empty_pages which only need to delete mgr recs.
pseg_crash_trx_rollback end

Machine B

[dmdba@dmdsc1 config]$ /opt/dsc/dmdbms/bin/dmserver /opt/dsc/config/dsc1_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini
file dm.key not found, use default license!
version info: develop
DM Database Server 64 V8 03134283890-20220304-158322-10045 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2023-03-04
hpc_ini_info_pre_check end, code:0
hlck_sys_init, init g_drm_dest:[0, 1]
lbs_sys_init, the length of g_master_map is 1117, fill it use ok_ep_arr:[0, 1], n_ok_ep:2!
check CSS cmd: DCR_LOAD, cmd_seq: 4
check CSS cmd: EP START, cmd_seq: 8
Control Node change from 255 to 0
mal_tsk_process_g_crash_lsn_bro, ep_seqno(0), crash_lsn(0)
mal_tsk_process_g_crash_lsn_bro, ep_seqno(1), crash_lsn(0)
check CSS cmd: EP START2, cmd_seq: 13
Control node start status: OPEN
EP[1] adjust cur_lsn from [34128] to [34293]
file lsn: 0
ndct db load finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct fill fast pool finished
ckpt2_exec_immediately begin.
file_lsn < cur_lsn & no dirty page & in mount status, ignore checkpoint
checkpoint end, 0 pages flushed, used_space[512], free_space[536862208].
checkpoint end, 0 pages flushed, used_space[0], free_space[536862720].
checkpoint end, 0 pages flushed, used_space[0], free_space[536862720].
pseg recv finished
nsvr_startup end.
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info success.
nsvr_process_before_open begin.
nsvr_process_before_open success.
SYSTEM IS READY.
check CSS cmd: EP OPEN, cmd_seq: 17
iid page's trxid[6017]
NEXT TRX ID = 7021.
check CSS cmd: EP REAL OPEN, cmd_seq: 20

Verify DSC archiving

Machine A

[dmdba@dmdsc0 dsc]$ ls arch_0
ARCHIVE_LOCAL1_0x5E2ABBF0_EP0_2022-04-10_15-08-39.log
[dmdba@dmdsc0 dsc]$ ls arch_1
ARCH_REMOTE1_0x5E2ABBF0_EP1_2022-04-10_15-08-40.log

Machine B

[dmdba@dmdsc1 dsc]$ ls arch_0
ARCH_REMOTE1_0x5E2ABBF0_EP0_2022-04-10_15-08-39.log
[dmdba@dmdsc1 dsc]$ ls arch_1
ARCHIVE_LOCAL1_0x5E2ABBF0_EP1_2022-04-10_15-08-40.log
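
The file names in these listings carry the archive section name, a hex field, the endpoint number, and a timestamp. The Python sketch below pulls those apart, for example to confirm that arch_1 on machine A only holds EP1 files. The pattern is inferred purely from the names shown above, and the hex field is treated as an opaque tag because its meaning is not stated in this post.

```python
import re
from datetime import datetime

# Pattern inferred from names such as
# ARCHIVE_LOCAL1_0x5E2ABBF0_EP0_2022-04-10_15-08-39.log
ARCH_NAME = re.compile(
    r"^(?P<arch>.+)_(?P<tag>0x[0-9A-Fa-f]+)_EP(?P<ep>\d+)_"
    r"(?P<ts>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})\.log$"
)


def parse_arch_name(filename):
    """Return the parts of an archive file name, or None if it does not match."""
    m = ARCH_NAME.match(filename)
    if m is None:
        return None
    return {
        "arch": m["arch"],          # section name from dmarch.ini
        "tag": m["tag"],            # hex field, kept as an opaque string
        "ep": int(m["ep"]),         # DSC endpoint number
        "created": datetime.strptime(m["ts"], "%Y-%m-%d_%H-%M-%S"),
    }


if __name__ == "__main__":
    print(parse_arch_name("ARCH_REMOTE1_0x5E2ABBF0_EP1_2022-04-10_15-08-40.log"))
```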

Back up the DSC database

Machine A

SQL> backup database full backupset '/opt/dsc/bak/for_dw_bak';
操作已执行
已用时间: 00:00:08.061. 执行号:5301701.

Copy to machine C

[dmdba@dmdsc0 bin]$ scp -rp /opt/dsc/bak/for_dw_bak 192.168.56.24:/opt/dw/bak/
dmdba@192.168.56.24's password: 
for_dw_bak.bak                                                                                                                                    100%  777MB  81.1MB/s   00:09    
for_dw_bak_1.bak                                                                                                                                  100%   45KB   8.1MB/s   00:00    
for_dw_bak.meta                                                                                                                                   100%   97KB  25.8MB/s   00:00    

Configure CSSM

Once the build is complete, dmwatcher takes over the work of CSSM; here CSSM is only used to confirm the DSC status.

Machine C

[dmdba@dmdw0 config]$ vi dmcssm.ini
CSSM_OGUID = 45331
CSSM_CSS_IP = 10.30.5.17:5336
CSSM_CSS_IP = 10.30.5.18:5337
CSSM_LOG_PATH = ../log
CSSM_LOG_FILE_SIZE = 256
CSSM_LOG_SPACE_LIMIT = 2048

Verify cluster status

Use CSSM to confirm that the cluster status is normal.

CSS

show GRP_CSS

monitor current time:2022-04-08 08:57:17
=================== group[name = grp_css, seq = 0, type = CSS, Control Node = 0] ========================================

[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = TRUE
[DSC0] auto restart = TRUE

[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE


ep:	css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
	2022-04-08 08:57:16    CSS0          0         5336    Control Node OPEN               WORKING      OK           TRUE         112679            113577          
	2022-04-08 08:57:16    CSS1          1         5337    Normal Node  OPEN               WORKING      OK           TRUE         148468            149250          

==================================================================================================================

ASM

show GRP_ASM

monitor current time:2022-04-08 08:57:20
=================== group[name = grp_asm, seq = 1, type = ASM, Control Node = 0] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:	css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
	2022-04-08 08:57:20    ASM0          0         5436    Control Node OPEN               WORKING      OK           TRUE         125753            126613          
	2022-04-08 08:57:20    ASM1          1         5437    Normal Node  OPEN               WORKING      OK           TRUE         161236            161980          

==================================================================================================================

DB

show GRP_DSC

monitor current time:2022-04-20 11:02:08
=================== group[name = grp_dsc, seq = 2, type = DB, Control Node = 0] ========================================

n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)

sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL

crash process over flag is TRUE
ep:	css_time               inst_name     seqno     port    mode         inst_status        vtd_status   is_ok        active       guid              ts              
	2022-04-20 11:02:07    DSC0          0         5236    Control Node OPEN               WORKING      OK           TRUE         277743            284931          
	2022-04-20 11:02:07    DSC1          1         5236    Normal Node  OPEN               WORKING      OK           TRUE         276920            284104          

==================================================================================================================
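
If the `show` output is captured to a file, the endpoint rows can be checked mechanically instead of by eye. The Python sketch below assumes the 13-field row layout seen in the three listings above (date, time, name, seqno, port, two-word mode, then the status columns); the function names are hypothetical, and a different dmcssm version may print other columns.

```python
def parse_ep_rows(show_output):
    """Extract endpoint rows from dmcssm `show` output.

    A data row is recognised by its leading date and by having exactly
    13 whitespace-separated fields (the mode column is two words).
    """
    rows = []
    for line in show_output.splitlines():
        parts = line.replace("ep:", "", 1).split()
        if len(parts) == 13 and parts[0][:4].isdigit() and parts[0].count("-") == 2:
            rows.append({
                "inst_name": parts[2],
                "seqno": int(parts[3]),
                "port": int(parts[4]),
                "mode": f"{parts[5]} {parts[6]}",
                "inst_status": parts[7],
                "vtd_status": parts[8],
                "is_ok": parts[9],
                "active": parts[10],
            })
    return rows


def all_healthy(rows):
    """True if there is at least one row and every row looks healthy."""
    return bool(rows) and all(
        r["inst_status"] == "OPEN"
        and r["vtd_status"] == "WORKING"
        and r["is_ok"] == "OK"
        and r["active"] == "TRUE"
        for r in rows
    )


SAMPLE_SHOW = (
    "monitor current time:2022-04-20 11:02:08\n"
    "ep:\tcss_time               inst_name     seqno     port    mode         "
    "inst_status        vtd_status   is_ok        active       guid              ts\n"
    "\t2022-04-20 11:02:07    DSC0          0         5236    Control Node "
    "OPEN               WORKING      OK           TRUE         277743            284931\n"
    "\t2022-04-20 11:02:07    DSC1          1         5236    Normal Node  "
    "OPEN               WORKING      OK           TRUE         276920            284104\n"
)

if __name__ == "__main__":
    print(all_healthy(parse_ep_rows(SAMPLE_SHOW)))
```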

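Register and start the CSS service

If the manual steps above completed without errors, exit DB, ASM, and CSS in that order, enable automatic restart of ASM, and register CSS as a service.
Automatic restart of the DB is not used here. The DB will later be managed by a separate DB service that starts in mount mode, which fits the standard DW start/stop procedure.

Machine A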

[dmdba@dmdsc0 config]$ vi dmdcr.ini
DMDCR_PATH = /dev/raw/raw1
DMDCR_MAL_PATH = /opt/dsc/config/dmasvrmal.ini
DMDCR_SEQNO = 0

DMDCR_ASM_RESTART_INTERVAL = 30                
DMDCR_ASM_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmasmsvr dcr_ini=/opt/dsc/config/dmdcr.ini

#DMDCR_DB_RESTART_INTERVAL = 60  
#DMDCR_DB_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmserver path=/opt/dsc/config/dsc0_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini

[root@dmdsc0 ~]# sh /opt/dsc/dmdbms/script/root/dm_service_installer.sh -t dmcss -dcr_ini /opt/dsc/config/dmdcr.ini -p CSS
Finished to create the service (DmCSSServiceCSS)
[root@dmdsc0 ~]# systemctl start DmCSSServiceCSS.service

Machine B

[dmdba@dmdsc1 config]$ vi dmdcr.ini
DMDCR_PATH = /dev/raw/raw1
DMDCR_MAL_PATH = /opt/dsc/config/dmasvrmal.ini
DMDCR_SEQNO = 1

DMDCR_ASM_RESTART_INTERVAL = 30                
DMDCR_ASM_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmasmsvr dcr_ini=/opt/dsc/config/dmdcr.ini

#DMDCR_DB_RESTART_INTERVAL = 60  
#DMDCR_DB_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmserver path=/opt/dsc/config/dsc1_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini

[root@dmdsc1 ~]# sh /opt/dsc/dmdbms/script/root/dm_service_installer.sh -t dmcss -dcr_ini /opt/dsc/config/dmdcr.ini -p CSS
Finished to create the service (DmCSSServiceCSS)
[root@dmdsc1 ~]# systemctl start DmCSSServiceCSS

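Building DW

Building DW requires the DSC primary and the standby to be started to MOUNT state at the same time, so adding the standby online offers no benefit here; the setup is done directly through configuration files.

Initialize the single-instance database

A DSC backup set cannot be restored directly via the TYPE 2 method, probably because there are multiple different dm.ini files. So first initialize a standalone database with dminit, using the key parameters.

Machine C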

[dmdba@dmdw0 bin]$ ./dminit PATH=/opt/dw/data  EXTENT_SIZE=32 PAGE_SIZE=32 LOG_SIZE=256 CASE_SENSITIVE=Y CHARSET=0
initdb V8
db version: 0x7000c
file dm.key not found, use default license!
License will expire on 2023-03-04
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL

 log file path: /opt/dw/data/DAMENG/DAMENG01.log

 log file path: /opt/dw/data/DAMENG/DAMENG02.log
write to dir [/opt/dw/data/DAMENG].
create dm database success. 2022-04-18 18:09:23

Restore the DSC database to the single instance

Machine C

[dmdba@dmdw0 dmdbms]$ cd bin
[dmdba@dmdw0 bin]$ ./dmrman
dmrman V8
RMAN> restore database '/opt/dw/data/DAMENG/dm.ini' from backupset '/opt/dw/bak/for_dw_bak';
restore database '/opt/dw/data/DAMENG/dm.ini' from backupset '/opt/dw/bak/for_dw_bak';
file dm.key not found, use default license!
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
[Percent:100.00%][Speed:0.00M/s][Cost:00:00:10][Remaining:00:00:00]                                 
restore successfully.
time used: 00:00:10.167

RMAN> recover database '/opt/dw/data/DAMENG/dm.ini' from backupset '/opt/dw/bak/for_dw_bak';
recover database '/opt/dw/data/DAMENG/dm.ini' from backupset '/opt/dw/bak/for_dw_bak';
Database mode = 0, oguid = 0
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
EP[0]'s cur_lsn[13086226], file_lsn[13086226]
[Percent:100.00%][Speed:0.00PKG/s][Cost:00:00:00][Remaining:00:00:00]                               
recover successfully!
time used: 00:00:04.499

RMAN> recover database '/opt/dw/data/DAMENG/dm.ini' update db_magic;
recover database '/opt/dw/data/DAMENG/dm.ini' update db_magic;
Database mode = 0, oguid = 0
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
EP[0]'s cur_lsn[13086261], file_lsn[13086261]
recover successfully!
time used: 00:00:01.006

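Modify the Data Watch parameters

Archiving on machines A/B was already enabled earlier; if it was not, it must be configured at this point.

Machine A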

[dmdba@dmdsc0 bin]$ vi /opt/dsc/config/dsc0_config/dm.ini
DW_INACTIVE_INTERVAL = 60 
ALTER_MODE_STATUS = 0   
ENABLE_OFFLINE_TS = 2  
RLOG_SEND_APPLY_MON = 64 

Machine B

[dmdba@dmdsc1 ~]$ vi /opt/dsc/config/dsc1_config/dm.ini
DW_INACTIVE_INTERVAL = 60 
ALTER_MODE_STATUS = 0   
ENABLE_OFFLINE_TS = 2  
RLOG_SEND_APPLY_MON = 64 

Machine C

[dmdba@dmdw0 dmdbms]$ vi /opt/dw/data/DAMENG/dm.ini
INSTANCE_NAME = DW0
PORT_NUM = 5238
DW_INACTIVE_INTERVAL = 60 
ALTER_MODE_STATUS = 0 
ENABLE_OFFLINE_TS = 2    
MAL_INI = 1 
ARCH_INI = 1 
RLOG_SEND_APPLY_MON = 64 

Modify the MAL configuration

Machine A

[dmdba@dmdsc0 bin]$ vi /opt/dsc/config/dsc0_config/dmmal.ini
[mal_inst0]
   mal_inst_name  = DSC0
   mal_host       = 10.30.5.17
   mal_port       = 5736
   mal_inst_host = 192.168.56.7
   mal_inst_port = 5236
   mal_dw_port = 5836
   mal_inst_dw_port = 5936

[mal_inst1]
   mal_inst_name  = DSC1
   mal_host       = 10.30.5.18
   mal_port       = 5737
   mal_inst_host = 192.168.56.8
   mal_inst_port = 5236
   mal_dw_port = 5837
   mal_inst_dw_port = 5937

[mal_inst2]
   mal_inst_name  = DW0
   mal_host       = 10.30.5.24
   mal_port       = 5738
   mal_inst_host = 192.168.56.24
   mal_inst_port = 5238
   mal_dw_port = 5838
   mal_inst_dw_port = 5938

Machine B

Copy from A.

[dmdba@dmdsc0 bin]$ scp -rp /opt/dsc/config/dsc0_config/dmmal.ini 192.168.56.8:/opt/dsc/config/dsc1_config/
dmdba@192.168.56.8's password: 
dmmal.ini                       100%  296   529.8KB/s   00:00    

Machine C

Copy from A.

[dmdba@dmdsc0 bin]$ scp -rp /opt/dsc/config/dsc0_config/dmmal.ini 192.168.56.24:/opt/dw/data/DAMENG/
dmdba@192.168.56.24's password: 
dmmal.ini                        100%  296   399.4KB/s   00:00    
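
Since all three nodes share one dmmal.ini with four port keys per instance, a copy-paste slip can silently reuse a port. The Python sketch below flags any port that appears twice under the same mal_host. It is a simplification: it treats every port of an instance as bound on mal_host, which is an assumption made to keep the check short, and the function names are hypothetical.

```python
PORT_KEYS = ("mal_port", "mal_inst_port", "mal_dw_port", "mal_inst_dw_port")


def parse_ini(text):
    """Tiny ini reader: {section: {lowercased key: value}}."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip().lower()] = value.strip()
    return sections


def port_conflicts(text):
    """Return pairs of 'section.key' names that share (mal_host, port)."""
    used, conflicts = {}, []
    for section, body in parse_ini(text).items():
        host = body.get("mal_host")
        for key in PORT_KEYS:
            if key in body:
                slot = (host, int(body[key]))
                owner = f"{section}.{key}"
                if slot in used:
                    conflicts.append((used[slot], owner))
                else:
                    used[slot] = owner
    return conflicts


SAMPLE_MAL = """\
[mal_inst0]
   mal_inst_name  = DSC0
   mal_host       = 10.30.5.17
   mal_port       = 5736
   mal_inst_host = 192.168.56.7
   mal_inst_port = 5236
   mal_dw_port = 5836
   mal_inst_dw_port = 5936

[mal_inst2]
   mal_inst_name  = DW0
   mal_host       = 10.30.5.24
   mal_port       = 5738
   mal_inst_host = 192.168.56.24
   mal_inst_port = 5238
   mal_dw_port = 5838
   mal_inst_dw_port = 5938
"""

if __name__ == "__main__":
    print(port_conflicts(SAMPLE_MAL) or "no port conflicts")
```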

Modify the archive configuration

Machine A

[dmdba@dmdsc0 bin]$ vi /opt/dsc/config/dsc0_config/dmarch.ini
#DaMeng Database Archive Configuration file
#this is comments

        ARCH_WAIT_APPLY      = 0  
 
[ARCHIVE_LOCAL1]
        ARCH_TYPE            = LOCAL
        ARCH_DEST            = /opt/dsc/arch_0
        ARCH_FILE_SIZE       = 256
        ARCH_SPACE_LIMIT     = 1024  
        ARCH_FLUSH_BUF_SIZE  = 0
        ARCH_HANG_FLAG       = 1

[ARCH_REMOTE1]
        ARCH_TYPE            = REMOTE
        ARCH_DEST            = DSC1
        ARCH_INCOMING_PATH   = /opt/dsc/arch_1
        ARCH_FILE_SIZE       = 256
        ARCH_SPACE_LIMIT     = 1024  
        ARCH_FLUSH_BUF_SIZE  = 0

[ARCHIVE_REALTIME1]
        ARCH_TYPE = REALTIME
        ARCH_DEST = DW0

Machine B

[dmdba@dmdsc1 ~]$ vi /opt/dsc/config/dsc1_config/dmarch.ini
#DaMeng Database Archive Configuration file
#this is comments

        ARCH_WAIT_APPLY      = 0 

[ARCHIVE_LOCAL1]
        ARCH_TYPE            = LOCAL
        ARCH_DEST            = /opt/dsc/arch_1
        ARCH_FILE_SIZE       = 256  
        ARCH_SPACE_LIMIT     = 1024 
        ARCH_FLUSH_BUF_SIZE  = 0
        ARCH_HANG_FLAG       = 1

[ARCH_REMOTE1]
        ARCH_TYPE            = REMOTE
        ARCH_DEST            = DSC0
        ARCH_INCOMING_PATH   = /opt/dsc/arch_0
        ARCH_FILE_SIZE       = 256
        ARCH_SPACE_LIMIT     = 1024 
        ARCH_FLUSH_BUF_SIZE  = 0

[ARCHIVE_REALTIME1]
        ARCH_TYPE = REALTIME
        ARCH_DEST = DW0

Machine C

[dmdba@dmdw0 dmdbms]$ vi /opt/dw/data/DAMENG/dmarch.ini
#DaMeng Database Archive Configuration file
#this is comments

        ARCH_WAIT_APPLY      = 0 

[ARCHIVE_LOCAL1]
        ARCH_TYPE            = LOCAL
        ARCH_DEST            = /opt/dw/arch
        ARCH_FILE_SIZE       = 256
        ARCH_SPACE_LIMIT     = 1024 
        ARCH_FLUSH_BUF_SIZE  = 0

[ARCHIVE_REALTIME1]
        ARCH_TYPE = REALTIME
        ARCH_DEST = DSC0/DSC1

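ARCH_WAIT_APPLY and ARCH_SPACE_LIMIT can be tuned for the actual primary/standby scenario. High-performance mode improves overall throughput, but if the standby is too slow its LSN gradually falls further behind the primary. ARCH_SPACE_LIMIT affects when archive cleanup is triggered: if the archive that a restored standby needs to request has already been cleaned up, error 718 is returned.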
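
To get a feel for how little history ARCH_SPACE_LIMIT retains, the arithmetic can be written down. This Python sketch assumes both ARCH_FILE_SIZE and ARCH_SPACE_LIMIT are in MB and that a limit of 0 means no limit; both assumptions should be verified against the DM8 manual for the installed version, and the function name is made up for illustration.

```python
def max_full_archive_files(space_limit_mb, file_size_mb):
    """Upper bound on full-size archive files kept under the space limit.

    Returns None when the limit is 0, which is assumed to mean unlimited.
    """
    if file_size_mb <= 0:
        raise ValueError("file_size_mb must be positive")
    if space_limit_mb == 0:
        return None
    return space_limit_mb // file_size_mb


if __name__ == "__main__":
    # Values used in the dmarch.ini files above.
    print(max_full_archive_files(1024, 256))
```

With the values above, only a handful of full archive files fit under the limit, so a standby that stays offline for a long time is likely to ask for logs that are already gone.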


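Configure dmwatcher

Whether switchover is automatic or manual can be changed as needed. Automatic instance startup is left disabled here. If dmwatcher forks the process that starts the instance, the forked process is terminated together with dmwatcher when dmwatcher exits. That violates the rule that the STANDBY DB should be the last to stop and may produce an INVALID LSN, and it also conflicts with the required startup order of starting the DB first and dmwatcher afterwards.
Running the command in the background would work around this, but an unmanaged background process is not easy to administer gracefully.

Machine A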

[dmdba@dmdsc0 bin]$ vi /opt/dsc/config/dmwatcher.ini
[GRP1]
 DW_TYPE = GLOBAL 
 DW_MODE = AUTO 
 DW_ERROR_TIME = 60 
 INST_RECOVER_TIME = 60 
 INST_ERROR_TIME = 35 
 INST_INI = /opt/dsc/config/dsc0_config/dm.ini             
 DCR_INI = /opt/dsc/config/dmdcr.ini                       
 INST_OGUID = 45332
 INST_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmserver   
 INST_AUTO_RESTART = 0   
 RLOG_SEND_THRESHOLD = 0  
 RLOG_APPLY_THRESHOLD = 0

Machine B

[dmdba@dmdsc1 bin]$ vi /opt/dsc/config/dmwatcher.ini
[GRP1]
 DW_TYPE = GLOBAL 
 DW_MODE = AUTO  # automatic switchover; MANUAL is the alternative
 DW_ERROR_TIME = 60 
 INST_RECOVER_TIME = 60 
 INST_ERROR_TIME = 35 
 INST_INI = /opt/dsc/config/dsc1_config/dm.ini             
 DCR_INI = /opt/dsc/config/dmdcr.ini                       
 INST_OGUID = 45332
 INST_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmserver
 INST_AUTO_RESTART = 0  
 RLOG_SEND_THRESHOLD = 0  
 RLOG_APPLY_THRESHOLD = 0

Machine C

[dmdba@dmdw0 dmdbms]$ vi /opt/dw/data/DAMENG/dmwatcher.ini
[GRP1]
 DW_TYPE = GLOBAL
 DW_MODE = AUTO   # automatic switchover; MANUAL is the alternative
 DW_ERROR_TIME = 60
 INST_RECOVER_TIME = 60
 INST_ERROR_TIME = 35
 INST_INI = /opt/dw/data/DAMENG/dm.ini
 INST_OGUID = 45332
 INST_STARTUP_CMD = /opt/dw/dmdbms/bin/dmserver
 INST_AUTO_RESTART = 0   
 RLOG_SEND_THRESHOLD = 0
 RLOG_APPLY_THRESHOLD = 0

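Configure dmmonitor

With 3 or 5 machines available, monitors can be placed on cluster members or on external machines to form a Raft group. Here a single machine, D, is configured; for the multi-monitor case, see another post of mine.

Machine D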

[dmdba@tpcc config]$ vi dmmonitor.ini
MON_LOG_PATH = ../dmdbms/log
MON_LOG_INTERVAL = 60
MON_LOG_FILE_SIZE = 64
MON_LOG_SPACE_LIMIT = 0
MON_DW_CONFIRM = 1    # goes with automatic switchover

[GRP1]
MON_INST_OGUID = 45332
MON_DW_IP = 10.30.5.17:5836/10.30.5.18:5837
MON_DW_IP = 10.30.5.24:5838
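
The MON_DW_IP lines above are built from each instance's mal_host and mal_dw_port in dmmal.ini, with the two DSC nodes sharing one line separated by `/`. The Python sketch below derives the expected lines from those values so they can be compared with dmmonitor.ini. The function name and input shapes are hypothetical, and the rule is read off the files in this post rather than taken from the manual.

```python
def expected_mon_dw_ip(mal_entries, clusters):
    """Build MON_DW_IP values.

    mal_entries: {instance name: (mal_host, mal_dw_port)} taken from dmmal.ini
    clusters:    list of instance-name lists, one list per MON_DW_IP line;
                 nodes of one DSC cluster go in the same list
    """
    lines = []
    for members in clusters:
        lines.append("/".join(
            f"{mal_entries[name][0]}:{mal_entries[name][1]}" for name in members
        ))
    return lines


MAL = {
    "DSC0": ("10.30.5.17", 5836),
    "DSC1": ("10.30.5.18", 5837),
    "DW0": ("10.30.5.24", 5838),
}

if __name__ == "__main__":
    for value in expected_mon_dw_ip(MAL, [["DSC0", "DSC1"], ["DW0"]]):
        print("MON_DW_IP =", value)
```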

Start the instances manually

Shut down all nodes, start the DSC cluster nodes to mount in turn, then start the DW node to mount.
Machine A

[dmdba@dmdsc0 bin]$ /opt/dsc/dmdbms/bin/dmserver path=/opt/dsc/config/dsc0_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini mount
file dm.key not found, use default license!
version info: develop
DM Database Server 64 V8 03134283890-20220304-158322-10045 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2023-03-04
hlck_sys_init, init g_drm_dest:[0, 1]
lbs_sys_init, the length of g_master_map is 1117, fill it use ok_ep_arr:[0, 1], n_ok_ep:2!
check CSS cmd: START NOTIFY, cmd_seq: 33
Control Node change from 255 to 254
check CSS cmd: DCR_LOAD, cmd_seq: 34
check CSS cmd: EP START, cmd_seq: 37
Control Node change from 254 to 0
EP[0] adjust cur_lsn from [13103624] to [13103632]
file lsn: 13103624
check CSS cmd: EP START2, cmd_seq: 42
ndct db load finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct fill fast pool finished
nsvr_startup end.
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info success.
SYSTEM IS READY.
check CSS cmd: EP OPEN, cmd_seq: 47
iid page's trxid[6016]
NEXT TRX ID = 576502.
[!!!DSC INFO!!!] DSC crash process over!
check CSS cmd: EP REAL OPEN, cmd_seq: 50

Machine B

[dmdba@dmdsc1 bin]$ /opt/dsc/dmdbms/bin/dmserver path=/opt/dsc/config/dsc1_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini mount
file dm.key not found, use default license!
version info: develop
DM Database Server 64 V8 03134283890-20220304-158322-10045 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2023-03-04
hpc_ini_info_pre_check end, code:0
hlck_sys_init, init g_drm_dest:[0, 1]
lbs_sys_init, the length of g_master_map is 1117, fill it use ok_ep_arr:[0, 1], n_ok_ep:2!
check CSS cmd: DCR_LOAD, cmd_seq: 35
check CSS cmd: EP START, cmd_seq: 39
Control Node change from 255 to 0
mal_tsk_process_g_crash_lsn_bro, ep_seqno(0), crash_lsn(0)
mal_tsk_process_g_crash_lsn_bro, ep_seqno(1), crash_lsn(0)
check CSS cmd: EP START2, cmd_seq: 44
Control node start status: MOUNT
file lsn: 13103632
begin redo pwr log collect, last ckpt lsn: 13103624 ...
redo pwr log collect finished
ndct db load finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct fill fast pool finished
nsvr_startup end.
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info success.
SYSTEM IS READY.
check CSS cmd: EP OPEN, cmd_seq: 48
iid page's trxid[6017]
NEXT TRX ID = 576501.
check CSS cmd: EP REAL OPEN, cmd_seq: 51

Machine C

[dmdba@dmdw0 bin]$ /opt/dw/dmdbms/bin/dmserver path=/opt/dw/data/DAMENG/dm.ini mount
file dm.key not found, use default license!
version info: develop
DM Database Server 64 V8 03134283890-20220304-158322-10045 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2023-03-04
file lsn: 13086261
ndct db load finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct fill fast pool finished
nsvr_startup end.
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info success.
SYSTEM IS READY.

Configure the OGUID and database mode

On either machine A or machine B

[dmdba@dmdsc0 bin]$ ./disql SYSDBA/SYSDBA@192.168.56.7:5236

Server[192.168.56.7:5236]:mode is normal, state is mount
login used time : 8.027(ms)
disql V8
SQL> SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 1);
DMSQL executed successfully
used time: 50.718(ms). Execute id is 0.
SQL> ALTER DATABASE PRIMARY;
executed successfully
used time: 84.143(ms). Execute id is 0.
SQL> SP_SET_OGUID(45332);
DMSQL executed successfully
used time: 42.604(ms). Execute id is 1.
SQL> SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 0);
DMSQL executed successfully
used time: 9.889(ms). Execute id is 2.

Machine C

[dmdba@dmdw0 bin]$ ./disql SYSDBA/SYSDBA@192.168.56.24:5238

Server[192.168.56.24:5238]:mode is normal, state is mount
login used time : 13.113(ms)
disql V8
SQL> SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 1);
DMSQL executed successfully
used time: 125.324(ms). Execute id is 0.
SQL> ALTER DATABASE STANDBY;
executed successfully
used time: 32.043(ms). Execute id is 0.
SQL> SP_SET_OGUID(45332);
DMSQL executed successfully
used time: 13.134(ms). Execute id is 1.
SQL> SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 0);
DMSQL executed successfully
used time: 4.570(ms). Execute id is 2.
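
The same four statements are run on every instance, differing only in the role. As a minimal sketch (the function name `dw_mode_sql` is hypothetical, and how the output is fed to disql is left open), the statements can be generated so that the ALTER_MODE_STATUS switch is always closed again:

```shell
# Print the mode/OGUID statements for one instance.
# Usage: dw_mode_sql <primary|standby> <oguid>
# Hypothetical helper; it only prints SQL and never connects to a database.
dw_mode_sql() {
  local role oguid
  role=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  oguid=$2
  case "$role" in
    PRIMARY|STANDBY) ;;
    *) echo "role must be primary or standby" >&2; return 1 ;;
  esac
  case "$oguid" in
    ''|*[!0-9]*) echo "oguid must be numeric" >&2; return 1 ;;
  esac
  # ALTER_MODE_STATUS is opened only for the duration of the change.
  cat <<EOF
SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 1);
ALTER DATABASE $role;
SP_SET_OGUID($oguid);
SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 0);
EOF
}
```

For example, `dw_mode_sql standby 45332` prints the statements used on machine C, and `dw_mode_sql primary 45332` those used on the DSC side.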

Manually start dmwatcher

Start dmwatcher on the standby machine first, then on the primary machines.

Machine C

[dmdba@dmdw0 bin]$ /opt/dw/dmdbms/bin/dmwatcher path=/opt/dw/data/DAMENG/dmwatcher.ini
DMWATCHER[4.0] V8
DMWATCHER[4.0] IS READY

Machine A

[dmdba@dmdsc0 bin]$ /opt/dsc/dmdbms/bin/dmwatcher path=/opt/dsc/config/dmwatcher.ini
DMWATCHER[4.0] V8
DMWATCHER[4.0] IS READY

Machine B

[dmdba@dmdsc1 bin]$ /opt/dsc/dmdbms/bin/dmwatcher path=/opt/dsc/config/dmwatcher.ini
DMWATCHER[4.0] V8
DMWATCHER[4.0] IS READY

Verify the cluster status

Manually start dmmonitor
Machine D

[dmdba@tpcc bin]$ /opt/dsc/dmdbms/bin/dmmonitor path=/opt/dsc/config/dmmonitor.ini
[monitor]         2022-04-19 08:54:51: DMMONITOR[4.0] V8
[monitor]         2022-04-19 08:54:51: DMMONITOR[4.0] IS READY. 

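The cluster status is now normal.
Once this is confirmed, exit in order: the monitor, the standby dmwatcher, the primary dmwatchers, the DSC databases, and the DW database.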

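Register the database services

Here each node is registered as its own service that starts the instance in mount state, and on the DSC nodes a dependency on the CSS service is added.

Machine A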

[root@dmdsc0 ~]# /opt/dsc/dmdbms/script/root/dm_service_installer.sh -t dmserver -p DSC0 -dm_ini /opt/dsc/config/dsc0_config/dm.ini -dcr_ini /opt/dsc/config/dmdcr.ini -m mount -y DmCSSServiceCSS
Created symlink from /etc/systemd/system/multi-user.target.wants/DmServiceDSC0.service to /usr/lib/systemd/system/DmServiceDSC0.service.
Service (DmServiceDSC0) created

Machine B

[root@dmdsc1 ~]# /opt/dsc/dmdbms/script/root/dm_service_installer.sh -t dmserver -p DSC1 -dm_ini /opt/dsc/config/dsc1_config/dm.ini -dcr_ini /opt/dsc/config/dmdcr.ini -m mount -y DmCSSServiceCSS
Created symlink from /etc/systemd/system/multi-user.target.wants/DmServiceDSC1.service to /usr/lib/systemd/system/DmServiceDSC1.service.
Service (DmServiceDSC1) created

Machine C

[root@dmdw0 ~]# /opt/dw/dmdbms/script/root/dm_service_installer.sh -t dmserver -p DW0 -dm_ini /opt/dw/data/DAMENG/dm.ini -m mount
Created symlink from /etc/systemd/system/multi-user.target.wants/DmServiceDW0.service to /usr/lib/systemd/system/DmServiceDW0.service.
Service (DmServiceDW0) created

Register the dmwatcher services

Machine A

[root@dmdsc0 bin]# /opt/dsc/dmdbms/script/root/dm_service_installer.sh -t dmwatcher -watcher_ini /opt/dsc/config/dmwatcher.ini -p WATCHER
Created symlink from /etc/systemd/system/multi-user.target.wants/DmWatcherServiceWATCHER.service to /usr/lib/systemd/system/DmWatcherServiceWATCHER.service.
Service (DmWatcherServiceWATCHER) created

Machine B

[root@dmdsc1 ~]# /opt/dsc/dmdbms/script/root/dm_service_installer.sh -t dmwatcher -watcher_ini /opt/dsc/config/dmwatcher.ini -p WATCHER
Created symlink from /etc/systemd/system/multi-user.target.wants/DmWatcherServiceWATCHER.service to /usr/lib/systemd/system/DmWatcherServiceWATCHER.service.
Service (DmWatcherServiceWATCHER) created

Machine C

[root@dmdw0 ~]# /opt/dw/dmdbms/script/root/dm_service_installer.sh -t dmwatcher -watcher_ini /opt/dw/data/DAMENG/dmwatcher.ini -p WATCHER
Created symlink from /etc/systemd/system/multi-user.target.wants/DmWatcherServiceWATCHER.service to /usr/lib/systemd/system/DmWatcherServiceWATCHER.service.
Service (DmWatcherServiceWATCHER) created
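
After registration it is worth confirming that every machine has the units it is supposed to have. The sketch below is an illustration only: the function names are hypothetical, and the host-to-unit mapping simply restates the service names used in this walkthrough.

```shell
# Map each host of this walkthrough to the systemd units registered on it.
dm_units_for() {
  case "$1" in
    dmdsc0) echo "DmCSSServiceCSS DmServiceDSC0 DmWatcherServiceWATCHER" ;;
    dmdsc1) echo "DmCSSServiceCSS DmServiceDSC1 DmWatcherServiceWATCHER" ;;
    dmdw0)  echo "DmServiceDW0 DmWatcherServiceWATCHER" ;;
    *) echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# Report the enablement state of each expected unit on the given host
# (default: the local short hostname). Returns non-zero if any unit is
# disabled or not found.
check_dm_units() {
  local host units unit state rc
  host=${1:-$(hostname -s)}
  units=$(dm_units_for "$host") || return 1
  rc=0
  for unit in $units; do
    state=$(systemctl is-enabled "$unit" 2>/dev/null) || rc=1
    echo "$unit: ${state:-not found}"
  done
  return $rc
}
```

Run `check_dm_units` as root on each of the three machines; a unit reported as "not found" means the corresponding dm_service_installer.sh step was skipped there.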

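Start and stop procedure

Follow the order below. Otherwise the standby node may end up with an INVALID LSN, which can lead to split-brain. The logical start and stop order for this setup is as follows.

Startup

Start the CSS service on the PRIMARY nodes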

[root@dmdsc0 ~]# systemctl start DmCSSServiceCSS
[root@dmdsc1 ~]# systemctl start DmCSSServiceCSS

Start the PRIMARY instances (this service depends on the CSS service and may block while waiting, but that does not affect the overall flow)

[root@dmdsc0 ~]# systemctl start DmServiceDSC0
[root@dmdsc1 ~]# systemctl start DmServiceDSC1

Start the STANDBY instance

[root@dmdw0 ~]# systemctl start DmServiceDW0

Start the STANDBY dmwatcher

[root@dmdw0 ~]# systemctl start DmWatcherServiceWATCHER.service

Start the PRIMARY dmwatchers

[root@dmdsc0 ~]# systemctl start DmWatcherServiceWATCHER.service
[root@dmdsc1 ~]# systemctl start DmWatcherServiceWATCHER.service

Start the monitor

[dmdba@tpcc bin]$ /opt/dsc/dmdbms/bin/dmmonitor path=/opt/dsc/config/dmmonitor.ini

Status check

Use dmmonitor to confirm that the cluster state is normal.
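
Because the order matters, it can help to keep the start sequence as data rather than retyping it. The following is a minimal sketch with hypothetical function names; it only prints the commands in the order given above and does not wait for any service to become ready, so each step still has to be confirmed by the operator.

```shell
# Ordered start plan, one "host unit" pair per line, in the order used above.
start_plan() {
  cat <<'EOF'
dmdsc0 DmCSSServiceCSS
dmdsc1 DmCSSServiceCSS
dmdsc0 DmServiceDSC0
dmdsc1 DmServiceDSC1
dmdw0 DmServiceDW0
dmdw0 DmWatcherServiceWATCHER
dmdsc0 DmWatcherServiceWATCHER
dmdsc1 DmWatcherServiceWATCHER
EOF
}

# Render the plan as root shell commands, one per line, for review.
start_commands() {
  start_plan | while read -r host unit; do
    printf '[root@%s ~]# systemctl start %s\n' "$host" "$unit"
  done
}
```

The monitor on machine D is started last and by hand, as shown above, so it is deliberately not part of the plan.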

Shutdown

Exit the monitor

Stop dmwatcher on the STANDBY node

[root@dmdw0 ~]# systemctl stop DmWatcherServiceWATCHER.service

Stop dmwatcher on the PRIMARY nodes

[root@dmdsc0 ~]# systemctl stop DmWatcherServiceWATCHER.service
[root@dmdsc1 ~]# systemctl stop DmWatcherServiceWATCHER.service

Stop the database on the PRIMARY nodes

[root@dmdsc0 ~]# systemctl stop DmServiceDSC0
[root@dmdsc1 ~]# systemctl stop DmServiceDSC1

Stop the CSS service on the PRIMARY nodes

[root@dmdsc0 ~]# systemctl stop DmCSSServiceCSS
[root@dmdsc1 ~]# systemctl stop DmCSSServiceCSS

Stop the database on the STANDBY node

[root@dmdw0 ~]# systemctl stop DmServiceDW0
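
After the last stop command, a quick process check on each machine confirms that nothing was left behind. This is a sketch under assumptions: the function names are hypothetical, and the daemon names `dmcss`, `dmasmsvr`, `dmserver` and `dmwatcher` are assumed to be the binary names in the DM bin directory.

```shell
# Filter a process listing (one "pid command args..." entry per line on
# stdin) down to DM daemons that should be gone after a full shutdown.
# The daemon names are assumptions about the DM binary names.
leftover_dm_procs() {
  awk '{
    n = split($2, p, "/")                      # basename of the executable
    if (p[n] ~ /^(dmcss|dmasmsvr|dmserver|dmwatcher)$/) print
  }'
}

# Check the local host; exit status 0 means no DM daemon is still running.
check_clean_shutdown() {
  local left
  left=$(ps -eo pid=,args= | leftover_dm_procs)
  [ -z "$left" ] && return 0
  printf 'still running:\n%s\n' "$left" >&2
  return 1
}
```

`check_clean_shutdown` can be run on machines A, B and C before the next start, so that a half-stopped instance is noticed early.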

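Repairing the DW

If the logs on the DW STANDBY become INVALID and cannot be repaired automatically, the standby can be repaired manually as follows:

  • Stop dmwatcher on the STANDBY node
  • Stop dmwatcher on the PRIMARY nodes
  • Back up the PRIMARY data
  • Shut down the STANDBY instance and restore the data
  • Start the STANDBY instance to mount state
  • Reconfigure the OGUID and mode on the STANDBY
  • Start dmwatcher on the STANDBY node
  • Start dmwatcher on the PRIMARY nodes
  • Confirm that the state has been repaired

The PRIMARY does not need to be shut down during this procedure, so applications can keep using it.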
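
The repair steps above can be sketched as an ordered command list. Everything specific here is an assumption rather than part of the original procedure: the function name `repair_plan`, the backup set name passed in, copying the backup set with scp, and the `BACKUP DATABASE BACKUPSET` and dmrman `CTLSTMT` statements, which follow the usual DM data watch setup steps and should be checked against the manual for the installed version. The function only prints text and runs nothing.

```shell
# Print the manual standby repair as an ordered list of "host  command" lines.
# $1: backup set directory name (hypothetical, for example repair_full).
repair_plan() {
  local name src dst ini rman
  name=${1:?backup set name required}
  src=/opt/dsc/bak/$name            # backup directory on machine A
  dst=/opt/dw/bak/$name             # backup directory on machine C
  ini=/opt/dw/data/DAMENG/dm.ini
  rman=/opt/dw/dmdbms/bin/dmrman    # assumed location of dmrman
  cat <<EOF
dmdw0   systemctl stop DmWatcherServiceWATCHER
dmdsc0  systemctl stop DmWatcherServiceWATCHER
dmdsc1  systemctl stop DmWatcherServiceWATCHER
dmdsc0  SQL> BACKUP DATABASE BACKUPSET '$src';
dmdw0   systemctl stop DmServiceDW0
dmdw0   scp -r dmdba@192.168.56.7:$src $dst
dmdw0   $rman CTLSTMT="RESTORE DATABASE '$ini' FROM BACKUPSET '$dst'"
dmdw0   $rman CTLSTMT="RECOVER DATABASE '$ini' FROM BACKUPSET '$dst'"
dmdw0   $rman CTLSTMT="RECOVER DATABASE '$ini' UPDATE DB_MAGIC"
dmdw0   systemctl start DmServiceDW0
dmdw0   SQL> ALTER DATABASE STANDBY; SP_SET_OGUID(45332);  (with ALTER_MODE_STATUS set to 1)
dmdw0   systemctl start DmWatcherServiceWATCHER
dmdsc0  systemctl start DmWatcherServiceWATCHER
dmdsc1  systemctl start DmWatcherServiceWATCHER
EOF
}
```

`repair_plan repair_full` prints the list for a backup set named repair_full; the final state is then confirmed in dmmonitor as in the last bullet.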

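Failover tests

Single DSC node failure

Simulate a DSC0 failure, starting from a healthy cluster.
The failure is detected and the PRIMARY node automatically switches to DSC1.
After DSC0 rejoins, the PRIMARY automatically switches back to DSC0.
A failure of DSC1 while it is not the PRIMARY has no impact; the failure is only detected, so it is not shown here.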

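All DSC nodes fail

The node failures are detected. At this point DW0 is still open in STANDBY mode and has to take over through the takeover command.
DW0 can be switched to PRIMARY manually from the monitor: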

login
Username: SYSDBA
Password:
[monitor]         2022-04-19 16:07:49: Logged in to the monitor successfully!

takeover
[monitor]         2022-04-19 16:07:58: Starting takeover with instance DW0
[monitor]         2022-04-19 16:07:58: Notifying dmwatcher DW0 to switch to TAKEOVER state
[monitor]         2022-04-19 16:07:58: dmwatcher (DW0) state change [OPEN-->TAKEOVER]
[monitor]         2022-04-19 16:07:58: Switched dmwatcher DW0 to TAKEOVER state successfully
[monitor]         2022-04-19 16:07:58: Instance DW0 starts executing SP_SET_GLOBAL_DW_STATUS(0, 7)
[monitor]         2022-04-19 16:07:58: Instance DW0 executed SP_SET_GLOBAL_DW_STATUS(0, 7) successfully
[monitor]         2022-04-19 16:07:58: Instance DW0 starts executing SP_APPLY_KEEP_PKG()
[monitor]         2022-04-19 16:07:59: Instance DW0 executed SP_APPLY_KEEP_PKG() successfully
[monitor]         2022-04-19 16:07:59: Instance DW0 starts executing ALTER DATABASE MOUNT
[monitor]         2022-04-19 16:07:59: Instance DW0 executed ALTER DATABASE MOUNT successfully
[monitor]         2022-04-19 16:07:59: Instance DW0 starts executing ALTER DATABASE PRIMARY
[monitor]         2022-04-19 16:07:59: Instance DW0 executed ALTER DATABASE PRIMARY successfully
[monitor]         2022-04-19 16:07:59: Notifying instance DW0 to mark all archives invalid
[monitor]         2022-04-19 16:07:59: Marked the archives of all instances invalid successfully
[monitor]         2022-04-19 16:07:59: Instance DW0 starts executing ALTER DATABASE OPEN FORCE
[monitor]         2022-04-19 16:07:59: Instance DW0 executed ALTER DATABASE OPEN FORCE successfully
[monitor]         2022-04-19 16:07:59: Instance DW0 starts executing SP_SET_GLOBAL_DW_STATUS(7, 0)
[monitor]         2022-04-19 16:07:59: Instance DW0 executed SP_SET_GLOBAL_DW_STATUS(7, 0) successfully
[monitor]         2022-04-19 16:07:59: Notifying dmwatcher DW0 to switch to OPEN state
[monitor]         2022-04-19 16:07:59: dmwatcher (DW0) state change [TAKEOVER-->OPEN]
[monitor]         2022-04-19 16:08:00: Switched dmwatcher DW0 to OPEN state successfully
[monitor]         2022-04-19 16:08:00: Notifying the dmwatchers of group (GRP1) to run cleanup
[monitor]         2022-04-19 16:08:00: Cleanup request to dmwatcher (DW0) succeeded
[monitor]         2022-04-19 16:08:00: Takeover with instance DW0 succeeded

After the takeover, DW0 is the PRIMARY and can serve clients normally.

[dmdba@dmdw0 bin]$ ./disql SYSDBA/SYSDBA@192.168.56.24:5238

Server[192.168.56.24:5238]:mode is primary, state is open
login used time : 1.576(ms)
disql V8
SQL>

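DSC rejoins

After the DSC nodes have been repaired by restoring from a backup of the current PRIMARY, they start normally and rejoin automatically.
At this point DW0 remains PRIMARY and DSC joins as STANDBY.

Manual switchover

To make DSC the PRIMARY again, switch over manually from the monitor: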

login
Username: SYSDBA
Password:
[monitor]         2022-04-19 16:15:19: Logged in to the monitor successfully!

switchover DSC0
[monitor]         2022-04-19 16:15:25: Starting switchover to instance DSC0
[monitor]         2022-04-19 16:15:25: Notifying dmwatcher DW0 to switch to SWITCHOVER state
[monitor]         2022-04-19 16:15:25: dmwatcher (DW0) state change [OPEN-->SWITCHOVER]
[monitor]         2022-04-19 16:15:26: Switched dmwatcher DW0 to SWITCHOVER state successfully
[monitor]         2022-04-19 16:15:26: Notifying dmwatcher DSC0 to switch to SWITCHOVER state
[monitor]         2022-04-19 16:15:26: dmwatcher (DSC0) state change [OPEN-->SWITCHOVER]
[monitor]         2022-04-19 16:15:27: Switched dmwatcher DSC0 to SWITCHOVER state successfully
[monitor]         2022-04-19 16:15:27: Instance DW0 starts executing SP_SET_GLOBAL_DW_STATUS(0, 6)
[monitor]         2022-04-19 16:15:27: Instance DW0 executed SP_SET_GLOBAL_DW_STATUS(0, 6) successfully
[monitor]         2022-04-19 16:15:27: Instance DSC0 starts executing SP_SET_GLOBAL_DW_STATUS(0, 6)
[monitor]         2022-04-19 16:15:27: Instance DSC0 executed SP_SET_GLOBAL_DW_STATUS(0, 6) successfully
[monitor]         2022-04-19 16:15:27: Instance DW0 starts executing ALTER DATABASE MOUNT
[monitor]         2022-04-19 16:15:27: Instance DW0 executed ALTER DATABASE MOUNT successfully
[monitor]         2022-04-19 16:15:27: Instance DSC0 starts executing SP_APPLY_KEEP_PKG()
[monitor]         2022-04-19 16:15:27: Instance DSC0 executed SP_APPLY_KEEP_PKG() successfully
[monitor]         2022-04-19 16:15:27: Instance DSC0 starts executing ALTER DATABASE MOUNT
[monitor]         2022-04-19 16:15:27: Instance DSC0 executed ALTER DATABASE MOUNT successfully
[monitor]         2022-04-19 16:15:27: Instance DW0 starts executing ALTER DATABASE STANDBY
[monitor]         2022-04-19 16:15:27: Instance DW0 executed ALTER DATABASE STANDBY successfully
[monitor]         2022-04-19 16:15:27: Instance DSC0 starts executing ALTER DATABASE PRIMARY
[monitor]         2022-04-19 16:15:28: Instance DSC0 executed ALTER DATABASE PRIMARY successfully
[monitor]         2022-04-19 16:15:28: Notifying instance DSC0 to mark all archives invalid
[monitor]         2022-04-19 16:15:28: Marked the archives of all instances invalid successfully
[monitor]         2022-04-19 16:15:28: Instance DW0 starts executing ALTER DATABASE OPEN FORCE
[monitor]         2022-04-19 16:15:28: Instance DW0 executed ALTER DATABASE OPEN FORCE successfully
[monitor]         2022-04-19 16:15:28: Instance DSC0 starts executing ALTER DATABASE OPEN FORCE
[monitor]         2022-04-19 16:15:29: Instance DSC0 executed ALTER DATABASE OPEN FORCE successfully
[monitor]         2022-04-19 16:15:29: Instance DW0 starts executing SP_SET_GLOBAL_DW_STATUS(6, 0)
[monitor]         2022-04-19 16:15:29: Instance DW0 executed SP_SET_GLOBAL_DW_STATUS(6, 0) successfully
[monitor]         2022-04-19 16:15:29: Instance DSC0 starts executing SP_SET_GLOBAL_DW_STATUS(6, 0)
[monitor]         2022-04-19 16:15:29: Instance DSC0 executed SP_SET_GLOBAL_DW_STATUS(6, 0) successfully
[monitor]         2022-04-19 16:15:29: Notifying dmwatcher DW0 to switch to OPEN state
[monitor]         2022-04-19 16:15:29: dmwatcher (DW0) state change [SWITCHOVER-->OPEN]
[monitor]         2022-04-19 16:15:30: Switched dmwatcher DW0 to OPEN state successfully
[monitor]         2022-04-19 16:15:30: Notifying dmwatcher DSC0 to switch to OPEN state
[monitor]         2022-04-19 16:15:31: dmwatcher (DSC0) state change [SWITCHOVER-->OPEN]
[monitor]         2022-04-19 16:15:31: Switched dmwatcher DSC0 to OPEN state successfully
[monitor]         2022-04-19 16:15:31: Notifying the dmwatchers of group (GRP1) to run cleanup
[monitor]         2022-04-19 16:15:31: Cleanup request to dmwatcher (DSC0) succeeded
[monitor]         2022-04-19 16:15:31: Cleanup request to dmwatcher (DW0) succeeded
[monitor]         2022-04-19 16:15:31: Switchover to instance DSC0 succeeded
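
When comparing test runs, it is handy to know how long a takeover or switchover took. As a small sketch (the function name `monitor_elapsed` is hypothetical), the duration can be read off the monitor timestamps; it assumes the input is only the `[monitor]` lines of one operation and that they fall on the same calendar day.

```shell
# Read dmmonitor output on stdin and print the number of seconds between
# the first and the last "[monitor]" line. Exits non-zero if no such line
# is found. Does not handle a run that crosses midnight.
monitor_elapsed() {
  awk '
    $1 == "[monitor]" {
      split($3, t, ":")                  # $3 looks like "16:15:25:"
      now = t[1] * 3600 + t[2] * 60 + t[3]
      if (!seen) { first = now; seen = 1 }
      last = now
    }
    END { if (seen) print last - first; else exit 1 }
  '
}
```

Fed the switchover lines above, from 16:15:25 to 16:15:31, it prints 6. The login line should be left out, otherwise the time spent typing the command is counted as well.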

After the switchover, DSC0 is PRIMARY again and DW0 is back in STANDBY mode.

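Summary

This article walked through building DM8 DSC+DW, the operating procedures, cluster repair, and failover, all based on a single test scenario. A real production environment will certainly have more details to watch, and I hope to share them in a later post.

Dameng Cloud Adaptation Technology Community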
https://eco.dameng.com/
