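Why use DSC+DW
High-availability clusters are commonly divided into three tiers: in-datacenter, same-city, and cross-region. For DM8, either DSC or DW alone covers the in-datacenter tier, while same-city high availability can be achieved by combining DSC with DW. This post walks through building and testing that architecture.
Base environment planning
The environment runs on 4 virtual machines with the following hardware and operating system: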
Category | Machine A | Machine B | Machine C | Machine D |
---|---|---|---|---|
CPU | i5 1.60GHz * 2 core | i5 1.60GHz * 2 core | i5 1.60GHz * 2 core | i5 1.60GHz * 2 core |
Local disk | 20G | 20G | 20G | 20G |
Shared disk | 20G | 20G | N/A | N/A |
Memory | 4G | 4G | 4G | 2G |
NIC | 1000 Mbps * 2 | 1000 Mbps * 2 | 1000 Mbps * 2 | 1000 Mbps * 2 |
OS | CentOS 7.7.1908 (Core) | CentOS 7.7.1908 (Core) | CentOS 7.7.1908 (Core) | CentOS 7.7.1908 (Core) |
KERNEL | 3.10.0-1160.59.1.el7.x86_64 | 3.10.0-1160.59.1.el7.x86_64 | 3.10.0-1160.59.1.el7.x86_64 | 3.10.0-1160.59.1.el7.x86_64 |
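Software environment planning
Machine D serves as the monitor node and only needs the software installed.
Machines A/B/C each have two NICs, one HOST ONLY and one INTNET. The physical host plays the role of the application: the HOST ONLY NIC is the business network, and the INTNET NIC carries the internal heartbeat and INTERCONNECT traffic.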
Category | Machine A | Machine B | Machine C |
---|---|---|---|
Hostname | dmdsc0 | dmdsc1 | dmdw0 |
Business IP | 192.168.56.7 | 192.168.56.8 | 192.168.56.24 |
Heartbeat IP | 10.30.5.17 | 10.30.5.18 | 10.30.5.24 |
Instance name | DSC0 | DSC1 | DW0 |
Port | 5236 | 5236 | 5238 |
Installation media directory | /opt/dsc/setup | /opt/dsc/setup | /opt/dw/setup |
Software installation directory | /opt/dsc/dmdbms | /opt/dsc/dmdbms | /opt/dw/dmdbms |
Configuration file directory | /opt/dsc/config | /opt/dsc/config | /opt/dw/config |
Local archive directory | /opt/dsc/arch_0 | /opt/dsc/arch_1 | /opt/dw/arch |
Remote archive directory | /opt/dsc/arch_1 | /opt/dsc/arch_0 | N/A |
Archive space limit | 1024 | 1024 | 1024 |
Realtime archive target | DW0 | DW0 | DSC0/DSC1 |
Backup directory | /opt/dsc/bak | /opt/dsc/bak | /opt/dw/bak |
Monitor IP | 10.30.5.188 | 10.30.5.188 | 10.30.5.188 |
dmdcr_cfg
Category | Parameter | Machine A | Machine B |
---|---|---|---|
CSS | DCR_EP_NAME | CSS0 | CSS1 |
- | DCR_EP_HOST | 10.30.5.17 | 10.30.5.18 |
- | DCR_EP_PORT | 5336 | 5337 |
ASM | DCR_EP_NAME | ASM0 | ASM1 |
- | DCR_EP_HOST | 192.168.56.7 | 192.168.56.8 |
- | DCR_EP_PORT | 5436 | 5437 |
- | DCR_EP_SHM_KEY | 93360 | 93361 |
- | DCR_EP_SHM_SIZE | 10 | 10 |
- | DCR_EP_ASM_LOAD_PATH | /dev/raw | /dev/raw |
DB | DCR_EP_NAME | DSC0 | DSC1 |
- | DCR_EP_PORT | 5236 | 5236 |
- | DCR_EP_SEQNO | 0 | 1 |
- | DCR_CHECK_PORT | 5536 | 5537 |
Global | DCR_OGUID | 45331 | 45331 |
dmasvrmal
Parameter | Machine A | Machine B |
---|---|---|
MAL_INST_NAME | ASM0 | ASM1 |
MAL_HOST | 10.30.5.17 | 10.30.5.18 |
MAL_PORT | 5636 | 5637 |
dmmal
Parameter | Machine A | Machine B | Machine C |
---|---|---|---|
MAL_PORT | 5736 | 5737 | 5738 |
MAL_DW_PORT | 5836 | 5837 | 5838 |
MAL_INST_DW_PORT | 5936 | 5937 | 5938 |
dmwatcher
Group | Parameter | Machine A | Machine B | Machine C |
---|---|---|---|---|
GRP1 | DW_TYPE | GLOBAL | GLOBAL | GLOBAL |
- | DW_MODE | AUTO | AUTO | AUTO |
- | DW_ERROR_TIME | 60 | 60 | 60 |
- | INST_RECOVER_TIME | 60 | 60 | 60 |
- | INST_ERROR_TIME | 35 | 35 | 35 |
- | INST_INI | /opt/dsc/config/dsc0_config/dm.ini | /opt/dsc/config/dsc1_config/dm.ini | /opt/dw/data/DAMENG/dm.ini |
- | DCR_INI | /opt/dsc/config/dmdcr.ini | /opt/dsc/config/dmdcr.ini | N/A |
- | INST_OGUID | 45332 | 45332 | 45332 |
- | INST_STARTUP_CMD | /opt/dsc/dmdbms/bin/dmserver | /opt/dsc/dmdbms/bin/dmserver | /opt/dw/dmdbms/bin/dmserver |
- | INST_AUTO_RESTART | 0 | 0 | 0 |
- | RLOG_SEND_THRESHOLD | 0 | 0 | 0 |
- | RLOG_APPLY_THRESHOLD | 0 | 0 | 0 |
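With this many listeners per machine, a quick duplicate check on the port plan can catch a copy-paste slip before anything is started. The sketch below is only an illustration: the port numbers are copied from the planning tables, while the role labels and the `find_port_conflicts` helper are hypothetical names introduced here.

```python
from collections import Counter

# Port plan copied from the planning tables; the role labels are descriptive only.
PORT_PLAN = {
    "A": {"DB": 5236, "CSS": 5336, "ASM": 5436, "DCR_CHECK": 5536,
          "ASM_MAL": 5636, "MAL": 5736, "MAL_DW": 5836, "MAL_INST_DW": 5936},
    "B": {"DB": 5236, "CSS": 5337, "ASM": 5437, "DCR_CHECK": 5537,
          "ASM_MAL": 5637, "MAL": 5737, "MAL_DW": 5837, "MAL_INST_DW": 5937},
    "C": {"DB": 5238, "MAL": 5738, "MAL_DW": 5838, "MAL_INST_DW": 5938},
}


def find_port_conflicts(plan):
    """Return {machine: [ports assigned to more than one role]}."""
    conflicts = {}
    for machine, roles in plan.items():
        counts = Counter(roles.values())
        duplicated = sorted(port for port, n in counts.items() if n > 1)
        if duplicated:
            conflicts[machine] = duplicated
    return conflicts


# An empty dict means no port is reused on any single machine.
print(find_port_conflicts(PORT_PLAN))
```

The check is per machine on purpose: the same port on two different machines (for example 5236 on A and B) is not a conflict.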
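General environment preparation
These are routine steps. Since this post focuses on building the cluster and testing failures, they are only listed here:
- Hostname planning and hosts entries
- Disabling the firewall and SELinux
- NIC planning and configuration
- Limit/kernel parameter tuning
- Mounting and partitioning the shared disks, and adjusting the I/O scheduler
- Raw device binding
- Disabling swap
- Time synchronization
- Creating users and directories
- Installing the database software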
Building the 2-node DSC
Prepare dmdcr_cfg.ini
In the configuration directory, write the configuration file used to initialize the DCR and vote disks.
[dmdba@dmdsc0 home]$ cd /opt/dsc/config/
[dmdba@dmdsc0 config]$ vi dmdcr_cfg.ini
DCR_N_GRP = 3
DCR_VTD_PATH = /dev/raw/raw2
DCR_OGUID = 45331
[GRP]
DCR_GRP_TYPE = CSS
DCR_GRP_NAME = GRP_CSS
DCR_GRP_N_EP = 2
DCR_GRP_DSKCHK_CNT = 60
[GRP_CSS]
DCR_EP_NAME = CSS0
DCR_EP_HOST = 10.30.5.17
DCR_EP_PORT = 5336
[GRP_CSS]
DCR_EP_NAME = CSS1
DCR_EP_HOST = 10.30.5.18
DCR_EP_PORT = 5337
[GRP]
DCR_GRP_TYPE = ASM
DCR_GRP_NAME = GRP_ASM
DCR_GRP_N_EP = 2
DCR_GRP_DSKCHK_CNT = 60
[GRP_ASM]
DCR_EP_NAME = ASM0
DCR_EP_SHM_KEY = 93360
DCR_EP_SHM_SIZE = 10
DCR_EP_HOST = 192.168.56.7
DCR_EP_PORT = 5436
DCR_EP_ASM_LOAD_PATH = /dev/raw
[GRP_ASM]
DCR_EP_NAME = ASM1
DCR_EP_SHM_KEY = 93361
DCR_EP_SHM_SIZE = 10
DCR_EP_HOST = 192.168.56.8
DCR_EP_PORT = 5437
DCR_EP_ASM_LOAD_PATH = /dev/raw
[GRP]
DCR_GRP_TYPE = DB
DCR_GRP_NAME = GRP_DSC
DCR_GRP_N_EP = 2
DCR_GRP_DSKCHK_CNT = 60
[GRP_DSC]
DCR_EP_NAME = DSC0
DCR_EP_SEQNO = 0
DCR_EP_PORT = 5236
DCR_CHECK_PORT = 5536
[GRP_DSC]
DCR_EP_NAME = DSC1
DCR_EP_SEQNO = 1
DCR_EP_PORT = 5236
DCR_CHECK_PORT = 5537
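The file repeats section names such as `[GRP]` and `[GRP_CSS]`, so Python's standard `configparser` would reject it. The following is a minimal sketch of a hand-rolled parser that checks two counts the listing implies: `DCR_N_GRP` against the number of `[GRP]` sections, and each `DCR_GRP_N_EP` against the number of endpoint sections carrying that group's name. Both rules are my reading of the file layout, not a statement of what DM itself validates, and the helper names are hypothetical. `SAMPLE` is a trimmed copy with only the CSS group, hence `DCR_N_GRP = 1`.

```python
SAMPLE = """\
DCR_N_GRP = 1
DCR_VTD_PATH = /dev/raw/raw2
DCR_OGUID = 45331
[GRP]
DCR_GRP_TYPE = CSS
DCR_GRP_NAME = GRP_CSS
DCR_GRP_N_EP = 2
DCR_GRP_DSKCHK_CNT = 60
[GRP_CSS]
DCR_EP_NAME = CSS0
DCR_EP_HOST = 10.30.5.17
DCR_EP_PORT = 5336
[GRP_CSS]
DCR_EP_NAME = CSS1
DCR_EP_HOST = 10.30.5.18
DCR_EP_PORT = 5337
"""


def parse_dcr_cfg(text):
    """Split the file into global parameters, [GRP] blocks and endpoint blocks."""
    global_params, groups, endpoints = {}, [], {}
    current = global_params
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1].strip()
            current = {}
            if name == "GRP":
                groups.append(current)
            else:
                endpoints.setdefault(name, []).append(current)
            continue
        key, _, value = line.partition("=")
        current[key.strip()] = value.strip()
    return global_params, groups, endpoints


def check_dcr_cfg(text):
    """Return a list of human-readable count mismatches (empty if none)."""
    global_params, groups, endpoints = parse_dcr_cfg(text)
    problems = []
    declared_groups = int(global_params.get("DCR_N_GRP", "-1"))
    if declared_groups != len(groups):
        problems.append(
            f"DCR_N_GRP = {declared_groups}, but {len(groups)} [GRP] sections found")
    for group in groups:
        name = group.get("DCR_GRP_NAME", "")
        declared = int(group.get("DCR_GRP_N_EP", "-1"))
        found = len(endpoints.get(name, []))
        if declared != found:
            problems.append(
                f"{name}: DCR_GRP_N_EP = {declared}, but {found} [{name}] sections found")
    return problems


print(check_dcr_cfg(SAMPLE))
```

Running it against the full file before `init dcrdisk` is cheap, since a mistake at this stage is only discovered after the disks have been initialized.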
Label the ASM disks
Use dmasmcmd to label the raw devices and initialize their disk headers. This only needs to be done on one machine.
Machine A
[dmdba@dmdsc0 config]$ /opt/dsc/dmdbms/bin/dmasmcmd
DMASMCMD V8
ASM>create dcrdisk '/dev/raw/raw1' 'dcr'
[Trace]The ASM initialize dcrdisk /dev/raw/raw1 to name DMASMdcr
Used time: 00:00:05.449.
ASM>create votedisk '/dev/raw/raw2' 'vote'
[Trace]The ASM initialize votedisk /dev/raw/raw2 to name DMASMvote
Used time: 14.441(ms).
ASM>create asmdisk '/dev/raw/raw3' 'LOG0'
[Trace]The ASM initialize asmdisk /dev/raw/raw3 to name DMASMLOG0
Used time: 15.057(ms).
ASM>create asmdisk '/dev/raw/raw4' 'DATA0'
[Trace]The ASM initialize asmdisk /dev/raw/raw4 to name DMASMDATA0
Used time: 14.499(ms).
Initialize dcr/vote
Use dmasmcmd to write the dmdcr_cfg.ini prepared earlier into the dcrdisk and votedisk. This only needs to be run on one machine.
Machine A
ASM>init dcrdisk '/dev/raw/raw1' from '/opt/dsc/config/dmdcr_cfg.ini' identified by 'abcd'
[Trace]DG 126 alloc one extent for inodes, addr(disk_id, disk_auno, extent_no):(0,0,1).
[Trace]DG 126 allocate 4 extents for file 0xfe000002.
[Trace]DG 126 alloc 4 extents for 0xfe000002, addr(disk_id, disk_auno, extent_no):(0, 0, 2)->(0, 0, 5), need_init = 1.
Used time: 234.261(ms).
ASM>init votedisk '/dev/raw/raw2' from '/opt/dsc/config/dmdcr_cfg.ini'
[Trace]DG 125 alloc one extent for inodes, addr(disk_id, disk_auno, extent_no):(0,0,1).
[Trace]DG 125 allocate 4 extents for file 0xfd000002.
[Trace]DG 125 alloc 4 extents for 0xfd000002, addr(disk_id, disk_auno, extent_no):(0, 0, 2)->(0, 0, 5), need_init = 1.
Used time: 116.459(ms).
ASM>exit
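Configure dmasvrmal.ini
Configure the MAL information used by ASM. The instance names must match the ASM group in dmdcr_cfg, and the file is identical on machines A and B.
Machine A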
[dmdba@dmdsc0 config]$ vi /opt/dsc/config/dmasvrmal.ini
[MAL_INST1]
MAL_INST_NAME = ASM0
MAL_HOST = 10.30.5.17
MAL_PORT = 5636
[MAL_INST2]
MAL_INST_NAME = ASM1
MAL_HOST = 10.30.5.18
MAL_PORT = 5637
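Machine B
Copy the file from machine A.
Configure dmdcr.ini
dmdcr.ini specifies the raw device that holds the DCR disk, the MAL configuration file used by ASM, and this node's sequence number in the cluster. Other commands rely on it to read the DCR contents correctly, to communicate over the ASM MAL, and to automatically start the ASM or DB service as configured.
Machine A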
[dmdba@dmdsc0 ~]$ vi /opt/dsc/config/dmdcr.ini
DMDCR_PATH = /dev/raw/raw1
DMDCR_MAL_PATH = /opt/dsc/config/dmasvrmal.ini
DMDCR_SEQNO = 0
#DMDCR_ASM_RESTART_INTERVAL = 30
#DMDCR_ASM_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmasmsvr dcr_ini=/opt/dsc/config/dmdcr.ini
#DMDCR_DB_RESTART_INTERVAL = 60
#DMDCR_DB_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmserver path=/opt/dsc/config/dsc0_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini
Machine B
[dmdba@dmdsc1 ~]$ vi /opt/dsc/config/dmdcr.ini
DMDCR_PATH = /dev/raw/raw1
DMDCR_MAL_PATH = /opt/dsc/config/dmasvrmal.ini
DMDCR_SEQNO = 1
#DMDCR_ASM_RESTART_INTERVAL = 30
#DMDCR_ASM_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmasmsvr dcr_ini=/opt/dsc/config/dmdcr.ini
#DMDCR_DB_RESTART_INTERVAL = 60
#DMDCR_DB_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmserver path=/opt/dsc/config/dsc1_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini
Commenting out the auto-restart entries during the configuration phase avoids some trouble 😛
Start CSS/ASM manually
Machine A
[dmdba@dmdsc0 ~]$ /opt/dsc/dmdbms/bin/dmcss DCR_INI=/opt/dsc/config/dmdcr.ini
DMCSS V8
DMCSS IS READY
[2022-03-31 16:30:39:877] [CSS]: Set EP CSS0[0] as Control node
[dmdba@dmdsc0 config]$ /opt/dsc/dmdbms/bin/dmasmsvr DCR_INI=/opt/dsc/config/dmdcr.ini
ASM SELF EPNO:0
DMASMSVR V8
dmasmsvr task worker thread startup
the ASM server is Ready.
check css cmd: START NOTIFY, cmd_seq: 19
check css cmd: EP START, cmd_seq: 20
ASM Control Node EPNO:0
check css cmd: EP OPEN, cmd_seq: 27
check css cmd: EP REAL OPEN, cmd_seq: 30
Machine B
[dmdba@dmdsc1 ~]$ /opt/dsc/dmdbms/bin/dmcss DCR_INI=/opt/dsc/config/dmdcr.ini
DMCSS V8
DMCSS IS READY
[2022-03-31 16:32:00:151] [CSS]: Set EP CSS0[0] as Control node
[dmdba@dmdsc1 config]$ /opt/dsc/dmdbms/bin/dmasmsvr DCR_INI=/opt/dsc/config/dmdcr.ini
ASM SELF EPNO:1
DMASMSVR V8
dmasmsvr task worker thread startup
the ASM server is Ready.
check css cmd: EP START, cmd_seq: 22
ASM Control Node EPNO:0
check css cmd: EP OPEN, cmd_seq: 28
check css cmd: EP REAL OPEN, cmd_seq: 31
A small problem
The first attempt to start CSS failed because a dependent library could not be opened:
/opt/dsc/dmdbms/bin/dmcss: error while loading shared libraries: libdmcalc.so: cannot open shared object file: No such file or directory
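The library itself is present, and its second-level dependencies are fine as well, but it is missing from the global library cache. Adding its directory to the global library configuration and rebuilding the cache fixes this: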
[dmdba@dmdsc0 ~]$ ldconfig -p|grep libdmcalc
[root@dmdsc0 config]# vi /etc/ld.so.conf.d/dm.conf
/opt/dsc/dmdbms/bin
[root@dmdsc0 config]# ldconfig
ldconfig: /opt/dsc/dmdbms/bin/libxerces-c-3.1.so is not a symbolic link
[root@dmdsc0 config]# ldconfig -p|grep libdmcalc
libdmcalc.so (libc6,x86-64) => /opt/dsc/dmdbms/bin/libdmcalc.so
After that the dependency problem is gone. Running the binary from the directory that contains the .so files also works, so it depends on you.
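The `ldconfig -p | grep` check above can be scripted when several libraries need to be verified at once. This is a minimal sketch that assumes the usual `name (flags) => path` layout of `ldconfig -p` entries, as seen in the output above; `missing_from_cache` is a hypothetical helper name.

```python
def missing_from_cache(ldconfig_output, names):
    """Return the library names that do not appear in `ldconfig -p` output."""
    cached = set()
    for line in ldconfig_output.splitlines():
        # Entries look like: "libfoo.so (libc6,x86-64) => /path/libfoo.so".
        # The summary line at the top has no "=>" and is skipped.
        if "=>" in line:
            cached.add(line.split()[0])
    return [name for name in names if name not in cached]


# Sample entry taken from the session above.
sample = "\tlibdmcalc.so (libc6,x86-64) => /opt/dsc/dmdbms/bin/libdmcalc.so\n"
print(missing_from_cache(sample, ["libdmcalc.so"]))
```

On a real host the first argument would be the text captured from `ldconfig -p`, for example via `subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout`.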
Create the ASM disk groups
Use dmasmtool to create the disk groups managed by ASM. This only needs to be done on one node.
Machine A
[dmdba@dmdsc0 ~]$ /opt/dsc/dmdbms/bin/dmasmtool DCR_INI=/opt/dsc/config/dmdcr.ini
DMASMTOOL V8
ASM>create diskgroup 'DMLOG' asmdisk '/dev/raw/raw3'
Used time: 38.129(ms).
ASM>create diskgroup 'DMDATA' asmdisk '/dev/raw/raw4'
Used time: 46.751(ms).
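Configure dminit.ini
Prepare the initialization file for the database instances. Initializing an instance whose files are written into ASM can only be done through such a configuration file.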
[dmdba@dmdsc0 config]$ vi dminit.ini
db_name = dsc
system_path = +DMDATA/data
main = +DMDATA/data/dsc/main.dbf
main_size = 128
roll = +DMDATA/data/dsc/roll.dbf
roll_size = 128
system = +DMDATA/data/dsc/system.dbf
system_size = 128
ctl_path = +DMDATA/data/dsc/dm.ctl
ctl_size = 8
log_size = 256
dcr_path = /dev/raw/raw1
dcr_seqno = 0
auto_overwrite = 1
PAGE_SIZE = 32
CASE_SENSITIVE = Y
CHARSET = 0
[DSC0]
config_path = /opt/dsc/config/dsc0_config
port_num = 5236
mal_host = 10.30.5.17
mal_port = 5736
log_path = +DMLOG/log/dsc0_log01.log
log_path = +DMLOG/log/dsc0_log02.log
[DSC1]
config_path = /opt/dsc/config/dsc1_config
port_num = 5236
mal_host = 10.30.5.18
mal_port = 5737
log_path = +DMLOG/log/dsc1_log01.log
log_path = +DMLOG/log/dsc1_log02.log
Initialize the DSC instances
Machine A
[dmdba@dmdsc0 ~]$ /opt/dsc/dmdbms/bin/dminit control=/opt/dsc/config/dminit.ini
initdb V8
db version: 0x7000c
file dm.key not found, use default license!
License will expire on 2023-03-04
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
log file path: +DMLOG/log/dsc0_log01.log
log file path: +DMLOG/log/dsc0_log02.log
log file path: +DMLOG/log/dsc1_log01.log
log file path: +DMLOG/log/dsc1_log02.log
write to dir [+DMDATA/data/dsc].
create dm database success. 2022-04-07 17:24:55
Copy the configuration files to the other node
[dmdba@dmdsc0 ~]$ scp -rp /opt/dsc/config/dsc1_config 192.168.56.8:/opt/dsc/config/
dm.ini 100% 56KB 27.2MB/s 00:00
sqllog.ini 100% 481 640.0KB/s 00:00
dmmal.ini 100% 200 295.8KB/s 00:00
Configure DSC archiving
Machine A
[dmdba@dmdsc0 dsc0_config]$ vi dmarch.ini
[ARCHIVE_LOCAL1]
ARCH_TYPE = LOCAL
ARCH_DEST = /opt/dsc/arch_0
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 1024
[ARCH_REMOTE1]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC1
ARCH_INCOMING_PATH = /opt/dsc/arch_1
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 1024
[dmdba@dmdsc0 dsc0_config]$ vi dm.ini
ARCH_INI = 1
Machine B
[dmdba@dmdsc1 dsc1_config]$ vi dmarch.ini
[ARCHIVE_LOCAL1]
ARCH_TYPE = LOCAL
ARCH_DEST = /opt/dsc/arch_1
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 1024
[ARCH_REMOTE1]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC0
ARCH_INCOMING_PATH = /opt/dsc/arch_0
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 1024
[dmdba@dmdsc1 dsc1_config]$ vi dm.ini
ARCH_INI = 1
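The two dmarch.ini files mirror each other, which makes them easy to get wrong by copy and paste. The sketch below checks two things that follow from the layout used in this post: each node's REMOTE archive points at the other instance, and the incoming path for the peer's logs is not the node's own local archive directory. These are conventions of this setup rather than rules quoted from DM documentation, and `load_sections` and `check_arch_pair` are hypothetical helper names.

```python
import configparser

DSC0_ARCH = """\
[ARCHIVE_LOCAL1]
ARCH_TYPE = LOCAL
ARCH_DEST = /opt/dsc/arch_0
[ARCH_REMOTE1]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC1
ARCH_INCOMING_PATH = /opt/dsc/arch_1
"""

DSC1_ARCH = """\
[ARCHIVE_LOCAL1]
ARCH_TYPE = LOCAL
ARCH_DEST = /opt/dsc/arch_1
[ARCH_REMOTE1]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC0
ARCH_INCOMING_PATH = /opt/dsc/arch_0
"""


def load_sections(text):
    """Parse dmarch.ini text into a list of {key: value} dicts, one per section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the upper-case keys as written
    # A dummy header lets configparser accept keys placed before the first section.
    parser.read_string("[__global__]\n" + text)
    return [dict(parser[name]) for name in parser.sections() if name != "__global__"]


def check_arch_pair(configs):
    """configs maps instance name -> dmarch.ini text for the two DSC nodes."""
    problems = []
    for instance, text in configs.items():
        peers = set(configs) - {instance}
        sections = load_sections(text)
        local_dirs = {s.get("ARCH_DEST") for s in sections
                      if s.get("ARCH_TYPE") == "LOCAL"}
        for section in sections:
            if section.get("ARCH_TYPE") != "REMOTE":
                continue
            if section.get("ARCH_DEST") not in peers:
                problems.append(f"{instance}: REMOTE target is not the peer instance")
            if section.get("ARCH_INCOMING_PATH") in local_dirs:
                problems.append(f"{instance}: incoming path equals local archive directory")
    return problems


print(check_arch_pair({"DSC0": DSC0_ARCH, "DSC1": DSC1_ARCH}))
```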
Start the DSC instances
Machine A
[dmdba@dmdsc0 ~]$ /opt/dsc/dmdbms/bin/dmserver /opt/dsc/config/dsc0_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini
file dm.key not found, use default license!
version info: develop
DM Database Server 64 V8 03134283890-20220304-158322-10045 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2023-03-04
hlck_sys_init, init g_drm_dest:[0, 1]
lbs_sys_init, the length of g_master_map is 1117, fill it use ok_ep_arr:[0, 1], n_ok_ep:2!
check CSS cmd: START NOTIFY, cmd_seq: 2
Control Node change from 255 to 254
check CSS cmd: DCR_LOAD, cmd_seq: 3
check CSS cmd: EP START, cmd_seq: 6
Control Node change from 254 to 0
file lsn: 0
check CSS cmd: EP START2, cmd_seq: 11
ndct db load finished
ckpt2_exec_immediately begin.
file_lsn < cur_lsn & no dirty page & in mount status, ignore checkpoint
checkpoint end, 0 pages flushed, used_space[512], free_space[536862208].
checkpoint: buffer pages flushing...
checkpoint end, 2 pages flushed, used_space[512], free_space[536862208].
checkpoint end, 0 pages flushed, used_space[0], free_space[536862720].
checkpoint end, 0 pages flushed, used_space[0], free_space[536862720].
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct fill fast pool finished
iid page's trxid[1002]
NEXT TRX ID = 1003
pseg_collect_mgr_items, total collect 0 active_trxs, 0 cmt_trxs, 0 pre_cmt_trxs, 0 to_release_trxs, 0 active_pages, 0 cmt_pages, 0 pre_cmt_pages, 0 to_release_pages, 0 mgr pages, 0 mgr recs!
iid page's trxid[2004]
NEXT TRX ID = 3008.
total 0 active crash trx, pseg_crash_trx_rollback sys_only(0) begin ...
pseg_crash_trx_rollback end, total 0 active crash trx, include 0 empty_trxs, 0 empty_pages which only need to delete mgr recs.
pseg_crash_trx_rollback end
pseg recv finished
nsvr_startup end.
aud sys init success.
aud rt sys init success.
trx: 3008 purged 1 pages
trx: 3046 purged 1 pages
......
trx: 4238 purged 1 pages
checkpoint for flush ts[65535] buffer...
checkpoint for flush ts[65535] buffer end
systables desc init success.
ndct_db_load_info success.
nsvr_process_before_open begin.
nsvr_process_before_open success.
SYSTEM IS READY.
check CSS cmd: EP OPEN, cmd_seq: 16
iid page's trxid[6020]
NEXT TRX ID = 7024.
[!!!DSC INFO!!!] DSC crash process over!
check CSS cmd: EP REAL OPEN, cmd_seq: 19
total 0 active crash trx, pseg_crash_trx_rollback sys_only(0) begin ...
pseg_crash_trx_rollback end, total 0 active crash trx, include 0 empty_trxs, 0 empty_pages which only need to delete mgr recs.
pseg_crash_trx_rollback end
Machine B
[dmdba@dmdsc1 config]$ /opt/dsc/dmdbms/bin/dmserver /opt/dsc/config/dsc1_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini
file dm.key not found, use default license!
version info: develop
DM Database Server 64 V8 03134283890-20220304-158322-10045 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2023-03-04
hpc_ini_info_pre_check end, code:0
hlck_sys_init, init g_drm_dest:[0, 1]
lbs_sys_init, the length of g_master_map is 1117, fill it use ok_ep_arr:[0, 1], n_ok_ep:2!
check CSS cmd: DCR_LOAD, cmd_seq: 4
check CSS cmd: EP START, cmd_seq: 8
Control Node change from 255 to 0
mal_tsk_process_g_crash_lsn_bro, ep_seqno(0), crash_lsn(0)
mal_tsk_process_g_crash_lsn_bro, ep_seqno(1), crash_lsn(0)
check CSS cmd: EP START2, cmd_seq: 13
Control node start status: OPEN
EP[1] adjust cur_lsn from [34128] to [34293]
file lsn: 0
ndct db load finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct fill fast pool finished
ckpt2_exec_immediately begin.
file_lsn < cur_lsn & no dirty page & in mount status, ignore checkpoint
checkpoint end, 0 pages flushed, used_space[512], free_space[536862208].
checkpoint end, 0 pages flushed, used_space[0], free_space[536862720].
checkpoint end, 0 pages flushed, used_space[0], free_space[536862720].
pseg recv finished
nsvr_startup end.
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info success.
nsvr_process_before_open begin.
nsvr_process_before_open success.
SYSTEM IS READY.
check CSS cmd: EP OPEN, cmd_seq: 17
iid page's trxid[6017]
NEXT TRX ID = 7021.
check CSS cmd: EP REAL OPEN, cmd_seq: 20
Verify DSC archiving
Machine A
[dmdba@dmdsc0 dsc]$ ls arch_0
ARCHIVE_LOCAL1_0x5E2ABBF0_EP0_2022-04-10_15-08-39.log
[dmdba@dmdsc0 dsc]$ ls arch_1
ARCH_REMOTE1_0x5E2ABBF0_EP1_2022-04-10_15-08-40.log
Machine B
[dmdba@dmdsc1 dsc]$ ls arch_0
ARCH_REMOTE1_0x5E2ABBF0_EP0_2022-04-10_15-08-39.log
[dmdba@dmdsc1 dsc]$ ls arch_1
ARCHIVE_LOCAL1_0x5E2ABBF0_EP1_2022-04-10_15-08-40.log
Back up the DSC database
Machine A
SQL> backup database full backupset '/opt/dsc/bak/for_dw_bak';
操作已执行
已用时间: 00:00:08.061. 执行号:5301701.
Copy to machine C
[dmdba@dmdsc0 bin]$ scp -rp /opt/dsc/bak/for_dw_bak 192.168.56.24:/opt/dw/bak/
dmdba@192.168.56.24's password:
for_dw_bak.bak 100% 777MB 81.1MB/s 00:09
for_dw_bak_1.bak 100% 45KB 8.1MB/s 00:00
for_dw_bak.meta 100% 97KB 25.8MB/s 00:00
Configure CSSM
Once the build is complete, dmwatcher takes over the work of CSSM; here CSSM is only used to confirm the DSC status.
Machine C
[dmdba@dmdw0 config]$ vi dmcssm.ini
CSSM_OGUID = 45331
CSSM_CSS_IP = 10.30.5.17:5336
CSSM_CSS_IP = 10.30.5.18:5337
CSSM_LOG_PATH = ../log
CSSM_LOG_FILE_SIZE = 256
CSSM_LOG_SPACE_LIMIT = 2048
Check the cluster status
Use CSSM to confirm that the cluster status is normal.
CSS
show GRP_CSS
monitor current time:2022-04-08 08:57:17
=================== group[name = grp_css, seq = 0, type = CSS, Control Node = 0] ========================================
[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = TRUE
[DSC0] auto restart = TRUE
[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-04-08 08:57:16 CSS0 0 5336 Control Node OPEN WORKING OK TRUE 112679 113577
2022-04-08 08:57:16 CSS1 1 5337 Normal Node OPEN WORKING OK TRUE 148468 149250
==================================================================================================================
ASM
show GRP_ASM
monitor current time:2022-04-08 08:57:20
=================== group[name = grp_asm, seq = 1, type = ASM, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-04-08 08:57:20 ASM0 0 5436 Control Node OPEN WORKING OK TRUE 125753 126613
2022-04-08 08:57:20 ASM1 1 5437 Normal Node OPEN WORKING OK TRUE 161236 161980
==================================================================================================================
DB
show GRP_DSC
monitor current time:2022-04-20 11:02:08
=================== group[name = grp_dsc, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-04-20 11:02:07 DSC0 0 5236 Control Node OPEN WORKING OK TRUE 277743 284931
2022-04-20 11:02:07 DSC1 1 5236 Normal Node OPEN WORKING OK TRUE 276920 284104
==================================================================================================================
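Reading three `show` outputs by eye works for a one-off build, but the same check is easy to script. The sketch below parses the `ep:` rows and reports every endpoint that is not `OPEN` / `WORKING` / `OK` / `TRUE`. The column layout is inferred from the output shown above and may differ in other DM versions; `EP_ROW` and `unhealthy_endpoints` are hypothetical names.

```python
import re

# One endpoint row: timestamp, name, seqno, port, mode, then four status columns,
# followed by the guid and ts counters.
EP_ROW = re.compile(
    r"^\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+"
    r"(?P<name>\S+)\s+(?P<seqno>\d+)\s+(?P<port>\d+)\s+"
    r"(?P<mode>Control Node|Normal Node)\s+"
    r"(?P<inst_status>\S+)\s+(?P<vtd_status>\S+)\s+(?P<is_ok>\S+)\s+(?P<active>\S+)\s+"
    r"\d+\s+\d+\s*$"
)

HEALTHY = ("OPEN", "WORKING", "OK", "TRUE")


def unhealthy_endpoints(show_output):
    """Return the names of endpoints whose status columns are not all healthy."""
    bad = []
    for line in show_output.splitlines():
        match = EP_ROW.match(line)
        if match is None:
            continue  # header, separator, or group summary line
        status = (match["inst_status"], match["vtd_status"],
                  match["is_ok"], match["active"])
        if status != HEALTHY:
            bad.append(match["name"])
    return bad


sample = """\
ep: css_time inst_name seqno port mode inst_status vtd_status is_ok active guid ts
2022-04-20 11:02:07 DSC0 0 5236 Control Node OPEN WORKING OK TRUE 277743 284931
2022-04-20 11:02:07 DSC1 1 5236 Normal Node OPEN WORKING OK TRUE 276920 284104
"""
print(unhealthy_endpoints(sample))
```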
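Register and start the CSS service
If the manual steps above went through without errors, exit DB, ASM and CSS in that order, enable ASM auto-restart, and register CSS as a service.
DB auto-restart is not used here. The database is later handed over to a separate DB service that starts it in mount mode, so that startup and shutdown fit the DW operating procedure.
Machine A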
[dmdba@dmdsc0 config]$ vi dmdcr.ini
DMDCR_PATH = /dev/raw/raw1
DMDCR_MAL_PATH = /opt/dsc/config/dmasvrmal.ini
DMDCR_SEQNO = 0
DMDCR_ASM_RESTART_INTERVAL = 30
DMDCR_ASM_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmasmsvr dcr_ini=/opt/dsc/config/dmdcr.ini
#DMDCR_DB_RESTART_INTERVAL = 60
#DMDCR_DB_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmserver path=/opt/dsc/config/dsc0_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini
[root@dmdsc0 ~]# sh /opt/dsc/dmdbms/script/root/dm_service_installer.sh -t dmcss -dcr_ini /opt/dsc/config/dmdcr.ini -p CSS
Finished to create the service (DmCSSServiceCSS)
[root@dmdsc0 ~]# systemctl start DmCSSServiceCSS.service
Machine B
[dmdba@dmdsc1 config]$ vi dmdcr.ini
DMDCR_PATH = /dev/raw/raw1
DMDCR_MAL_PATH = /opt/dsc/config/dmasvrmal.ini
DMDCR_SEQNO = 1
DMDCR_ASM_RESTART_INTERVAL = 30
DMDCR_ASM_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmasmsvr dcr_ini=/opt/dsc/config/dmdcr.ini
#DMDCR_DB_RESTART_INTERVAL = 60
#DMDCR_DB_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmserver path=/opt/dsc/config/dsc1_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini
[root@dmdsc1 ~]# sh /opt/dsc/dmdbms/script/root/dm_service_installer.sh -t dmcss -dcr_ini /opt/dsc/config/dmdcr.ini -p CSS
Finished to create the service (DmCSSServiceCSS)
[root@dmdsc1 ~]# systemctl start DmCSSServiceCSS
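Building DW
Building DW requires the DSC primary and the standby to both be started in MOUNT state, so there is little point in attempting an online expansion; we work directly from the configuration files.
Initialize a standalone database
A DSC backup set cannot be restored directly in TYPE 2 mode, probably because the cluster has several different dm.ini files. So we first initialize a standalone database with dminit, using matching key parameters.
Machine C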
[dmdba@dmdw0 bin]$ ./dminit PATH=/opt/dw/data EXTENT_SIZE=32 PAGE_SIZE=32 LOG_SIZE=256 CASE_SENSITIVE=Y CHARSET=0
initdb V8
db version: 0x7000c
file dm.key not found, use default license!
License will expire on 2023-03-04
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
log file path: /opt/dw/data/DAMENG/DAMENG01.log
log file path: /opt/dw/data/DAMENG/DAMENG02.log
write to dir [/opt/dw/data/DAMENG].
create dm database success. 2022-04-18 18:09:23
Restore the DSC database to the standalone instance
Machine C
[dmdba@dmdw0 dmdbms]$ cd bin
[dmdba@dmdw0 bin]$ ./dmrman
dmrman V8
RMAN> restore database '/opt/dw/data/DAMENG/dm.ini' from backupset '/opt/dw/bak/for_dw_bak';
restore database '/opt/dw/data/DAMENG/dm.ini' from backupset '/opt/dw/bak/for_dw_bak';
file dm.key not found, use default license!
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
[Percent:100.00%][Speed:0.00M/s][Cost:00:00:10][Remaining:00:00:00]
restore successfully.
time used: 00:00:10.167
RMAN> recover database '/opt/dw/data/DAMENG/dm.ini' from backupset '/opt/dw/bak/for_dw_bak';
recover database '/opt/dw/data/DAMENG/dm.ini' from backupset '/opt/dw/bak/for_dw_bak';
Database mode = 0, oguid = 0
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
EP[0]'s cur_lsn[13086226], file_lsn[13086226]
[Percent:100.00%][Speed:0.00PKG/s][Cost:00:00:00][Remaining:00:00:00]
recover successfully!
time used: 00:00:04.499
RMAN> recover database '/opt/dw/data/DAMENG/dm.ini' update db_magic;
recover database '/opt/dw/data/DAMENG/dm.ini' update db_magic;
Database mode = 0, oguid = 0
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
EP[0]'s cur_lsn[13086261], file_lsn[13086261]
recover successfully!
time used: 00:00:01.006
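Modify the watcher-related parameters
Archiving was already enabled on machines A/B earlier; if it had not been, it would need to be configured here.
Machine A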
[dmdba@dmdsc0 bin]$ vi /opt/dsc/config/dsc0_config/dm.ini
DW_INACTIVE_INTERVAL = 60
ALTER_MODE_STATUS = 0
ENABLE_OFFLINE_TS = 2
RLOG_SEND_APPLY_MON = 64
Machine B
[dmdba@dmdsc1 ~]$ vi /opt/dsc/config/dsc1_config/dm.ini
DW_INACTIVE_INTERVAL = 60
ALTER_MODE_STATUS = 0
ENABLE_OFFLINE_TS = 2
RLOG_SEND_APPLY_MON = 64
Machine C
[dmdba@dmdw0 dmdbms]$ vi /opt/dw/data/DAMENG/dm.ini
INSTANCE_NAME = DW0
PORT_NUM = 5238
DW_INACTIVE_INTERVAL = 60
ALTER_MODE_STATUS = 0
ENABLE_OFFLINE_TS = 2
MAL_INI = 1
ARCH_INI = 1
RLOG_SEND_APPLY_MON = 64
Modify the MAL configuration
Machine A
[dmdba@dmdsc0 bin]$ vi /opt/dsc/config/dsc0_config/dmmal.ini
[mal_inst0]
mal_inst_name = DSC0
mal_host = 10.30.5.17
mal_port = 5736
mal_inst_host = 192.168.56.7
mal_inst_port = 5236
mal_dw_port = 5836
mal_inst_dw_port = 5936
[mal_inst1]
mal_inst_name = DSC1
mal_host = 10.30.5.18
mal_port = 5737
mal_inst_host = 192.168.56.8
mal_inst_port = 5236
mal_dw_port = 5837
mal_inst_dw_port = 5937
[mal_inst2]
mal_inst_name = DW0
mal_host = 10.30.5.24
mal_port = 5738
mal_inst_host = 192.168.56.24
mal_inst_port = 5238
mal_dw_port = 5838
mal_inst_dw_port = 5938
Machine B
Copy the file from machine A.
[dmdba@dmdsc0 bin]$ scp -rp /opt/dsc/config/dsc0_config/dmmal.ini 192.168.56.8:/opt/dsc/config/dsc1_config/
dmdba@192.168.56.8's password:
dmmal.ini 100% 296 529.8KB/s 00:00
Machine C
Copy the file from machine A.
[dmdba@dmdsc0 bin]$ scp -rp /opt/dsc/config/dsc0_config/dmmal.ini 192.168.56.24:/opt/dw/data/DAMENG/
dmdba@192.168.56.24's password:
dmmal.ini 100% 296 399.4KB/s 00:00
Modify the archive configuration
Machine A
[dmdba@dmdsc0 bin]$ vi /opt/dsc/config/dsc0_config/dmarch.ini
#DaMeng Database Archive Configuration file
#this is comments
ARCH_WAIT_APPLY = 0
[ARCHIVE_LOCAL1]
ARCH_TYPE = LOCAL
ARCH_DEST = /opt/dsc/arch_0
ARCH_FILE_SIZE = 256
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 0
ARCH_HANG_FLAG = 1
[ARCH_REMOTE1]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC1
ARCH_INCOMING_PATH = /opt/dsc/arch_1
ARCH_FILE_SIZE = 256
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 0
[ARCHIVE_REALTIME1]
ARCH_TYPE = REALTIME
ARCH_DEST = DW0
Machine B
[dmdba@dmdsc1 ~]$ vi /opt/dsc/config/dsc1_config/dmarch.ini
#DaMeng Database Archive Configuration file
#this is comments
ARCH_WAIT_APPLY = 0
[ARCHIVE_LOCAL1]
ARCH_TYPE = LOCAL
ARCH_DEST = /opt/dsc/arch_1
ARCH_FILE_SIZE = 256
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 0
ARCH_HANG_FLAG = 1
[ARCH_REMOTE1]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC0
ARCH_INCOMING_PATH = /opt/dsc/arch_0
ARCH_FILE_SIZE = 256
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 0
[ARCHIVE_REALTIME1]
ARCH_TYPE = REALTIME
ARCH_DEST = DW0
Machine C
[dmdba@dmdw0 dmdbms]$ vi /opt/dw/data/DAMENG/dmarch.ini
#DaMeng Database Archive Configuration file
#this is comments
ARCH_WAIT_APPLY = 0
[ARCHIVE_LOCAL1]
ARCH_TYPE = LOCAL
ARCH_DEST = /opt/dw/arch
ARCH_FILE_SIZE = 256
ARCH_SPACE_LIMIT = 1024
ARCH_FLUSH_BUF_SIZE = 0
[ARCHIVE_REALTIME1]
ARCH_TYPE = REALTIME
ARCH_DEST = DSC0/DSC1
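ARCH_WAIT_APPLY and ARCH_SPACE_LIMIT can be tuned to the actual primary/standby scenario. High-performance mode (ARCH_WAIT_APPLY = 0) improves overall throughput, but if the standby is much slower, its LSN gradually falls behind the primary. ARCH_SPACE_LIMIT affects when archive cleanup is triggered; if the archives a standby needs to request after a restore have already been cleaned up, error 718 is returned.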
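The REMOTE and REALTIME archive entries refer to their targets by instance name (DSC0, DSC1, DW0), and those names are the ones listed as `mal_inst_name` in dmmal.ini. The sketch below cross-checks the two files, on the assumption that every archive target has to be resolvable through dmmal.ini; that assumption and the helper names are mine, not taken from DM documentation. The `DMMAL` sample is trimmed to the name and host keys.

```python
import configparser

DMMAL = """\
[mal_inst0]
mal_inst_name = DSC0
mal_host = 10.30.5.17
[mal_inst1]
mal_inst_name = DSC1
mal_host = 10.30.5.18
[mal_inst2]
mal_inst_name = DW0
mal_host = 10.30.5.24
"""

DW0_ARCH = """\
ARCH_WAIT_APPLY = 0
[ARCHIVE_LOCAL1]
ARCH_TYPE = LOCAL
ARCH_DEST = /opt/dw/arch
[ARCHIVE_REALTIME1]
ARCH_TYPE = REALTIME
ARCH_DEST = DSC0/DSC1
"""


def read_sections(text):
    """Parse ini text into a list of {key: value} dicts, one per named section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written
    # Dummy header so that keys before the first section (ARCH_WAIT_APPLY) parse.
    parser.read_string("[__global__]\n" + text)
    return [dict(parser[name]) for name in parser.sections() if name != "__global__"]


def unresolved_arch_targets(dmarch_text, dmmal_text):
    """Return archive target names that have no matching mal_inst_name."""
    known = {s["mal_inst_name"] for s in read_sections(dmmal_text)
             if "mal_inst_name" in s}
    missing = []
    for section in read_sections(dmarch_text):
        if section.get("ARCH_TYPE") in ("REMOTE", "REALTIME"):
            # A DSC target is written as slash-separated node names.
            for name in section.get("ARCH_DEST", "").split("/"):
                if name.strip() and name.strip() not in known:
                    missing.append(name.strip())
    return missing


print(unresolved_arch_targets(DW0_ARCH, DMMAL))
```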
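Configure dmwatcher
Whether switchover is automatic or manual can be changed as needed. Instance auto-start stays disabled here (INST_AUTO_RESTART = 0). If dmwatcher forked the process that starts the instance, that child process would be terminated together with dmwatcher when dmwatcher is stopped. This violates the rule that the STANDBY database should be the last to shut down and may produce an INVALID LSN, and it also conflicts with the required startup order of starting the database first and dmwatcher afterwards.
Running the command in the background would work around this logic problem, but an unmanaged stray process in the background is not easy to manage cleanly.
Machine A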
[dmdba@dmdsc0 bin]$ vi /opt/dsc/config/dmwatcher.ini
[GRP1]
DW_TYPE = GLOBAL
DW_MODE = AUTO
DW_ERROR_TIME = 60
INST_RECOVER_TIME = 60
INST_ERROR_TIME = 35
INST_INI = /opt/dsc/config/dsc0_config/dm.ini
DCR_INI = /opt/dsc/config/dmdcr.ini
INST_OGUID = 45332
INST_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmserver
INST_AUTO_RESTART = 0
RLOG_SEND_THRESHOLD = 0
RLOG_APPLY_THRESHOLD = 0
Machine B
[dmdba@dmdsc1 bin]$ vi /opt/dsc/config/dmwatcher.ini
[GRP1]
DW_TYPE = GLOBAL
DW_MODE = AUTO # automatic switchover; use MANUAL for manual switchover
DW_ERROR_TIME = 60
INST_RECOVER_TIME = 60
INST_ERROR_TIME = 35
INST_INI = /opt/dsc/config/dsc1_config/dm.ini
DCR_INI = /opt/dsc/config/dmdcr.ini
INST_OGUID = 45332
INST_STARTUP_CMD = /opt/dsc/dmdbms/bin/dmserver
INST_AUTO_RESTART = 0
RLOG_SEND_THRESHOLD = 0
RLOG_APPLY_THRESHOLD = 0
Machine C
[dmdba@dmdw0 dmdbms]$ vi /opt/dw/data/DAMENG/dmwatcher.ini
[GRP1]
DW_TYPE = GLOBAL
DW_MODE = AUTO # automatic switchover; use MANUAL for manual switchover
DW_ERROR_TIME = 60
INST_RECOVER_TIME = 60
INST_ERROR_TIME = 35
INST_INI = /opt/dw/data/DAMENG/dm.ini
INST_OGUID = 45332
INST_STARTUP_CMD = /opt/dw/dmdbms/bin/dmserver
INST_AUTO_RESTART = 0
RLOG_SEND_THRESHOLD = 0
RLOG_APPLY_THRESHOLD = 0
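Configure dmmonitor
With 3 or 5 monitors, placed on cluster members or on external machines, the monitors can form a Raft group. Here a single monitor is configured on machine D; for the multi-monitor case, see my other blog post.
Machine D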
[dmdba@tpcc config]$ vi dmmonitor.ini
MON_LOG_PATH = ../dmdbms/log
MON_LOG_INTERVAL = 60
MON_LOG_FILE_SIZE = 64
MON_LOG_SPACE_LIMIT = 0
MON_DW_CONFIRM = 1 # confirming monitor, used together with automatic switchover
[GRP1]
MON_INST_OGUID = 45332
MON_DW_IP = 10.30.5.17:5836/10.30.5.18:5837
MON_DW_IP = 10.30.5.24:5838
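In this file each `MON_DW_IP` line describes one database: the two DSC nodes share a line separated by `/`, and the standby has its own line. The ports are the `mal_dw_port` values from dmmal.ini. A minimal sketch of a parser for these lines follows, so the addresses can be compared against the plan; `parse_mon_dw_ip` is a hypothetical helper, and the repeated key is the reason `configparser` is not used.

```python
MONITOR_INI = """\
[GRP1]
MON_INST_OGUID = 45332
MON_DW_IP = 10.30.5.17:5836/10.30.5.18:5837
MON_DW_IP = 10.30.5.24:5838
"""


def parse_mon_dw_ip(text):
    """Return one list of (host, port) tuples per MON_DW_IP line."""
    databases = []
    for raw in text.splitlines():
        # Drop trailing annotations written with '#' or '//'.
        line = raw.split("#", 1)[0].split("//", 1)[0].strip()
        key, sep, value = line.partition("=")
        if not sep or key.strip() != "MON_DW_IP":
            continue
        nodes = []
        for item in value.strip().split("/"):
            host, _, port = item.strip().rpartition(":")
            nodes.append((host, int(port)))
        databases.append(nodes)
    return databases


for nodes in parse_mon_dw_ip(MONITOR_INI):
    print(nodes)
```

A line with more than one tuple corresponds to the DSC cluster, a line with a single tuple to a standalone instance such as DW0.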
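Start the instances manually
Shut down all nodes, start the DSC cluster nodes one after another in mount state, then start the DW node in mount state.
Machine A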
[dmdba@dmdsc0 bin]$ /opt/dsc/dmdbms/bin/dmserver path=/opt/dsc/config/dsc0_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini mount
file dm.key not found, use default license!
version info: develop
DM Database Server 64 V8 03134283890-20220304-158322-10045 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2023-03-04
hlck_sys_init, init g_drm_dest:[0, 1]
lbs_sys_init, the length of g_master_map is 1117, fill it use ok_ep_arr:[0, 1], n_ok_ep:2!
check CSS cmd: START NOTIFY, cmd_seq: 33
Control Node change from 255 to 254
check CSS cmd: DCR_LOAD, cmd_seq: 34
check CSS cmd: EP START, cmd_seq: 37
Control Node change from 254 to 0
EP[0] adjust cur_lsn from [13103624] to [13103632]
file lsn: 13103624
check CSS cmd: EP START2, cmd_seq: 42
ndct db load finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct fill fast pool finished
nsvr_startup end.
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info success.
SYSTEM IS READY.
check CSS cmd: EP OPEN, cmd_seq: 47
iid page's trxid[6016]
NEXT TRX ID = 576502.
[!!!DSC INFO!!!] DSC crash process over!
check CSS cmd: EP REAL OPEN, cmd_seq: 50
Machine B
[dmdba@dmdsc1 bin]$ /opt/dsc/dmdbms/bin/dmserver path=/opt/dsc/config/dsc1_config/dm.ini dcr_ini=/opt/dsc/config/dmdcr.ini mount
file dm.key not found, use default license!
version info: develop
DM Database Server 64 V8 03134283890-20220304-158322-10045 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2023-03-04
hpc_ini_info_pre_check end, code:0
hlck_sys_init, init g_drm_dest:[0, 1]
lbs_sys_init, the length of g_master_map is 1117, fill it use ok_ep_arr:[0, 1], n_ok_ep:2!
check CSS cmd: DCR_LOAD, cmd_seq: 35
check CSS cmd: EP START, cmd_seq: 39
Control Node change from 255 to 0
mal_tsk_process_g_crash_lsn_bro, ep_seqno(0), crash_lsn(0)
mal_tsk_process_g_crash_lsn_bro, ep_seqno(1), crash_lsn(0)
check CSS cmd: EP START2, cmd_seq: 44
Control node start status: MOUNT
file lsn: 13103632
begin redo pwr log collect, last ckpt lsn: 13103624 ...
redo pwr log collect finished
ndct db load finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct fill fast pool finished
nsvr_startup end.
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info success.
SYSTEM IS READY.
check CSS cmd: EP OPEN, cmd_seq: 48
iid page's trxid[6017]
NEXT TRX ID = 576501.
check CSS cmd: EP REAL OPEN, cmd_seq: 51
Machine C
[dmdba@dmdw0 bin]$ /opt/dw/dmdbms/bin/dmserver path=/opt/dw/data/DAMENG/dm.ini mount
file dm.key not found, use default license!
version info: develop
DM Database Server 64 V8 03134283890-20220304-158322-10045 startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 0, oguid = 0
License will expire on 2023-03-04
file lsn: 13086261
ndct db load finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct second level fill fast pool finished
ndct third level fill fast pool finished
ndct fill fast pool finished
nsvr_startup end.
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info success.
SYSTEM IS READY.
Configure OGUID and database mode
On either machine A or machine B
[dmdba@dmdsc0 bin]$ ./disql SYSDBA/SYSDBA@192.168.56.7:5236
服务器[192.168.56.7:5236]:处于普通配置状态
登录使用时间 : 8.027(ms)
disql V8
SQL> SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 1);
DMSQL 过程已成功完成
已用时间: 50.718(毫秒). 执行号:0.
SQL> ALTER DATABASE PRIMARY;
操作已执行
已用时间: 84.143(毫秒). 执行号:0.
SQL> SP_SET_OGUID(45332);
DMSQL 过程已成功完成
已用时间: 42.604(毫秒). 执行号:1.
SQL> SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 0);
DMSQL 过程已成功完成
已用时间: 9.889(毫秒). 执行号:2.
Machine C
[dmdba@dmdw0 bin]$ ./disql SYSDBA/SYSDBA@192.168.56.24:5238
服务器[192.168.56.24:5238]:处于普通配置状态
登录使用时间 : 13.113(ms)
disql V8
SQL> SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 1);
DMSQL 过程已成功完成
已用时间: 125.324(毫秒). 执行号:0.
SQL> ALTER DATABASE STANDBY;
操作已执行
已用时间: 32.043(毫秒). 执行号:0.
SQL> SP_SET_OGUID(45332);
DMSQL 过程已成功完成
已用时间: 13.134(毫秒). 执行号:1.
SQL> SP_SET_PARA_VALUE(1, 'ALTER_MODE_STATUS', 0);
DMSQL 过程已成功完成
已用时间: 4.570(毫秒). 执行号:2.
Start dmwatcher manually
Start dmwatcher on the standby machine first, then on the primary machines.
Machine C
[dmdba@dmdw0 bin]$ /opt/dw/dmdbms/bin/dmwatcher path=/opt/dw/data/DAMENG/dmwatcher.ini
DMWATCHER[4.0] V8
DMWATCHER[4.0] IS READY
Machine A
[dmdba@dmdsc0 bin]$ /opt/dsc/dmdbms/bin/dmwatcher path=/opt/dsc/config/dmwatcher.ini
DMWATCHER[4.0] V8
DMWATCHER[4.0] IS READY
Machine B
[dmdba@dmdsc1 bin]$ /opt/dsc/dmdbms/bin/dmwatcher path=/opt/dsc/config/dmwatcher.ini
DMWATCHER[4.0] V8
DMWATCHER[4.0] IS READY
Check the cluster status
Start dmmonitor manually
Machine D
[dmdba@tpcc bin]$ /opt/dsc/dmdbms/bin/dmmonitor path=/opt/dsc/config/dmmonitor.ini
[monitor] 2022-04-19 08:54:51: DMMONITOR[4.0] V8
[monitor] 2022-04-19 08:54:51: DMMONITOR[4.0] IS READY.
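The cluster status is now normal.
Once this is confirmed, exit in the following order: monitor, standby dmwatcher, primary dmwatcher, DSC database, DW database.
Register the database services
Here the instance on every node is registered as a separate service that starts in mount state, and on the DSC nodes the service additionally depends on the CSS service.
Machine A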
[root@dmdsc0 ~]# /opt/dsc/dmdbms/script/root/dm_service_installer.sh -t dmserver -p DSC0 -dm_ini /opt/dsc/config/dsc0_config/dm.ini -dcr_ini /opt/dsc/config/dmdcr.ini -m mount -y DmCSSServiceCSS
Created symlink from /etc/systemd/system/multi-user.target.wants/DmServiceDSC0.service to /usr/lib/systemd/system/DmServiceDSC0.service.
创建服务(DmServiceDSC0)完成
Machine B
[root@dmdsc1 ~]# /opt/dsc/dmdbms/script/root/dm_service_installer.sh -t dmserver -p DSC1 -dm_ini /opt/dsc/config/dsc1_config/dm.ini -dcr_ini /opt/dsc/config/dmdcr.ini -m mount -y DmCSSServiceCSS
Created symlink from /etc/systemd/system/multi-user.target.wants/DmServiceDSC1.service to /usr/lib/systemd/system/DmServiceDSC1.service.
创建服务(DmServiceDSC1)完成
Machine C
[root@dmdw0 ~]# /opt/dw/dmdbms/script/root/dm_service_installer.sh -t dmserver -p DW0 -dm_ini /opt/dw/data/DAMENG/dm.ini -m mount
Created symlink from /etc/systemd/system/multi-user.target.wants/DmServiceDW0.service to /usr/lib/systemd/system/DmServiceDW0.service.
创建服务(DmServiceDW0)完成
Register the dmwatcher services
Machine A
[root@dmdsc0 bin]# /opt/dsc/dmdbms/script/root/dm_service_installer.sh -t dmwatcher -watcher_ini /opt/dsc/config/dmwatcher.ini -p WATCHER
Created symlink from /etc/systemd/system/multi-user.target.wants/DmWatcherServiceWATCHER.service to /usr/lib/systemd/system/DmWatcherServiceWATCHER.service.
创建服务(DmWatcherServiceWATCHER)完成
Machine B
[root@dmdsc1 ~]# /opt/dsc/dmdbms/script/root/dm_service_installer.sh -t dmwatcher -watcher_ini /opt/dsc/config/dmwatcher.ini -p WATCHER
Created symlink from /etc/systemd/system/multi-user.target.wants/DmWatcherServiceWATCHER.service to /usr/lib/systemd/system/DmWatcherServiceWATCHER.service.
创建服务(DmWatcherServiceWATCHER)完成
Machine C
[root@dmdw0 ~]# /opt/dw/dmdbms/script/root/dm_service_installer.sh -t dmwatcher -watcher_ini /opt/dw/data/DAMENG/dmwatcher.ini -p WATCHER
Created symlink from /etc/systemd/system/multi-user.target.wants/DmWatcherServiceWATCHER.service to /usr/lib/systemd/system/DmWatcherServiceWATCHER.service.
创建服务(DmWatcherServiceWATCHER)完成
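The installer output above suggests that the systemd unit name is built from the service type and the name passed to -p. The sketch below encodes that observation in a hypothetical helper, unit_name. The prefixes are inferred from this article's output only, not from the DM8 documentation, so treat them as an assumption.

```shell
#!/bin/sh
# unit_name is a hypothetical helper. The prefixes are inferred from the
# installer output shown above, not taken from the DM8 documentation.
unit_name() (
    type=$1   # value passed to -t
    name=$2   # value passed to -p
    case "$type" in
        dmserver)  prefix="DmService" ;;
        dmwatcher) prefix="DmWatcherService" ;;
        *) echo "unknown service type: $type" >&2; exit 1 ;;
    esac
    printf '%s%s.service\n' "$prefix" "$name"
)

# Print the units expected across the three machines after registration.
for spec in "dmserver DSC0" "dmserver DSC1" "dmserver DW0" "dmwatcher WATCHER"; do
    set -- $spec
    unit_name "$1" "$2"
done
```

On a real host, each printed name could be passed to `systemctl is-enabled` to confirm that the registration took effect.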
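Startup and shutdown procedure
Follow the order below. Otherwise the standby node may report an invalid LSN, which can lead to split-brain. The logical start and stop order for this layout is as follows.
Startup
Start the CSS service on the PRIMARY nodes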
[root@dmdsc0 ~]# systemctl start DmCSSServiceCSS
[root@dmdsc1 ~]# systemctl start DmCSSServiceCSS
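Start the PRIMARY instances (these services depend on the CSS service, so the command may block while waiting, but this does not affect the overall sequence)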
[root@dmdsc0 ~]# systemctl start DmServiceDSC0
[root@dmdsc1 ~]# systemctl start DmServiceDSC1
Start the STANDBY instance
[root@dmdw0 ~]# systemctl start DmServiceDW0
Start dmwatcher on the STANDBY
[root@dmdw0 ~]# systemctl start DmWatcherServiceWATCHER.service
Start dmwatcher on the PRIMARY nodes
[root@dmdsc0 ~]# systemctl start DmWatcherServiceWATCHER.service
[root@dmdsc1 ~]# systemctl start DmWatcherServiceWATCHER.service
Start the monitor
[dmdba@tpcc bin]$ /opt/dsc/dmdbms/bin/dmmonitor path=/opt/dsc/config/dmmonitor.ini
Confirm the status
Use dmmonitor to confirm that the status is normal.
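The startup order above can be kept as data so it is easy to review before anything is run. This is a dry-run sketch: start_plan is a hypothetical name, the host and unit names are the ones used in this article, and the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch: prints the startup steps in the order listed above.
# Host and unit names come from this article; start_plan is a hypothetical name.
start_plan() {
    cat <<'EOF'
dmdsc0 DmCSSServiceCSS
dmdsc1 DmCSSServiceCSS
dmdsc0 DmServiceDSC0
dmdsc1 DmServiceDSC1
dmdw0 DmServiceDW0
dmdw0 DmWatcherServiceWATCHER
dmdsc0 DmWatcherServiceWATCHER
dmdsc1 DmWatcherServiceWATCHER
EOF
}

# Number each step and show the command to run as root on that host.
start_plan | while read -r host unit; do
    n=$((${n:-0} + 1))
    printf '%d. [%s] systemctl start %s\n' "$n" "$host" "$unit"
done
```

The monitor is not in the list because it is started by hand on machine D as the last step.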
Shutdown
Exit the monitor
Stop dmwatcher on the STANDBY node
[root@dmdw0 ~]# systemctl stop DmWatcherServiceWATCHER.service
Stop dmwatcher on the PRIMARY nodes
[root@dmdsc0 ~]# systemctl stop DmWatcherServiceWATCHER.service
[root@dmdsc1 ~]# systemctl stop DmWatcherServiceWATCHER.service
Stop the databases on the PRIMARY nodes
[root@dmdsc0 ~]# systemctl stop DmServiceDSC0
[root@dmdsc1 ~]# systemctl stop DmServiceDSC1
Stop the CSS service on the PRIMARY nodes
[root@dmdsc0 ~]# systemctl stop DmCSSServiceCSS
[root@dmdsc1 ~]# systemctl stop DmCSSServiceCSS
Stop the database on the STANDBY node
[root@dmdw0 ~]# systemctl stop DmServiceDW0
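Two ordering rules can be read off the shutdown sequence above: every dmwatcher is stopped before any database or CSS service, and CSS is stopped only after the DSC instances. The sketch below checks a planned stop list against those two rules. check_stop_order is a hypothetical helper and the rules are my reading of this article's sequence, not an official DM8 check.

```shell
#!/bin/sh
# check_stop_order is a hypothetical helper. It reads "host unit" lines on
# stdin (in planned stop order) and fails if the order breaks either rule.
check_stop_order() {
    awk '
        BEGIN { bad = 0 }
        # Rule 1: a watcher must not be stopped after a database or CSS unit.
        $2 ~ /^DmWatcherService/ && db_seen { bad = 1 }
        # Rule 2: CSS must not be stopped before a DSC instance.
        $2 ~ /^DmServiceDSC/ && css_seen { bad = 1 }
        $2 ~ /^DmService/    { db_seen = 1 }
        $2 ~ /^DmCSSService/ { css_seen = 1; db_seen = 1 }
        END { exit bad }
    '
}

# The sequence used in this article passes the check.
check_stop_order <<'EOF' && echo "order OK"
dmdw0 DmWatcherServiceWATCHER
dmdsc0 DmWatcherServiceWATCHER
dmdsc1 DmWatcherServiceWATCHER
dmdsc0 DmServiceDSC0
dmdsc1 DmServiceDSC1
dmdsc0 DmCSSServiceCSS
dmdsc1 DmCSSServiceCSS
dmdw0 DmServiceDW0
EOF
```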
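Repairing the DW
If the logs on the DW STANDBY become INVALID and cannot be repaired automatically, repair them manually as follows:
- Stop dmwatcher on the STANDBY node
- Stop dmwatcher on the PRIMARY nodes
- Back up the data on the PRIMARY node
- Shut down the STANDBY node and restore the data
- Start the STANDBY node in mount state
- Reconfigure the OGUID and mode on the STANDBY node
- Start dmwatcher on the STANDBY node
- Start dmwatcher on the PRIMARY nodes
- Confirm that the status has been repaired
The PRIMARY does not need to be shut down, so applications can keep using it.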
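The repair flow above can be laid out as a dry-run plan. In this sketch, repair_plan is a hypothetical name and the systemctl commands reuse the unit names registered earlier in this article. Steps marked MANUAL have no command in this article (backup, restore, the disql session, the monitor check), so they are deliberately left open rather than guessed.

```shell
#!/bin/sh
# Dry-run outline of the repair flow above. It only prints the plan.
# Steps marked MANUAL have no command in this article and are left open.
repair_plan() {
    cat <<'EOF'
dmdw0|systemctl stop DmWatcherServiceWATCHER
dmdsc0|systemctl stop DmWatcherServiceWATCHER
dmdsc1|systemctl stop DmWatcherServiceWATCHER
primary|MANUAL: back up the PRIMARY database
dmdw0|systemctl stop DmServiceDW0
dmdw0|MANUAL: restore the backup onto the STANDBY
dmdw0|systemctl start DmServiceDW0
dmdw0|MANUAL: in disql, set STANDBY mode and the OGUID again
dmdw0|systemctl start DmWatcherServiceWATCHER
dmdsc0|systemctl start DmWatcherServiceWATCHER
dmdsc1|systemctl start DmWatcherServiceWATCHER
monitor|MANUAL: confirm the group status in dmmonitor
EOF
}

# Number the steps: "<n>. [where] what".
repair_plan | awk -F'|' '{ printf "%2d. [%s] %s\n", NR, $1, $2 }'
```

The plan never stops a DmServiceDSC unit, which matches the point that the PRIMARY stays online during the repair.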
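Failover tests
Single DSC node failure
Simulate a failure of DSC0. State before the failure:
The failure is detected and the PRIMARY node switches automatically to DSC1.
After DSC0 rejoins, the PRIMARY switches back to DSC0 automatically.
A failure of DSC1 while it is not the PRIMARY has no impact; the failure is only detected, so it is not shown here.
All DSC nodes fail
The node failure is detected. At this point DW is still open in STANDBY mode and has to take over through takeover.
DW0 can be switched to PRIMARY manually: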
login
用户名:SYSDBA
密码:
[monitor] 2022-04-19 16:07:49: 登录监视器成功!
takeover
[monitor] 2022-04-19 16:07:58: 开始使用实例DW0接管
[monitor] 2022-04-19 16:07:58: 通知守护进程DW0切换TAKEOVER状态
[monitor] 2022-04-19 16:07:58: 守护进程(DW0)状态切换 [OPEN-->TAKEOVER]
[monitor] 2022-04-19 16:07:58: 切换守护进程DW0为TAKEOVER状态成功
[monitor] 2022-04-19 16:07:58: 实例DW0开始执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句
[monitor] 2022-04-19 16:07:58: 实例DW0执行SP_SET_GLOBAL_DW_STATUS(0, 7)语句成功
[monitor] 2022-04-19 16:07:58: 实例DW0开始执行SP_APPLY_KEEP_PKG()语句
[monitor] 2022-04-19 16:07:59: 实例DW0执行SP_APPLY_KEEP_PKG()语句成功
[monitor] 2022-04-19 16:07:59: 实例DW0开始执行ALTER DATABASE MOUNT语句
[monitor] 2022-04-19 16:07:59: 实例DW0执行ALTER DATABASE MOUNT语句成功
[monitor] 2022-04-19 16:07:59: 实例DW0开始执行ALTER DATABASE PRIMARY语句
[monitor] 2022-04-19 16:07:59: 实例DW0执行ALTER DATABASE PRIMARY语句成功
[monitor] 2022-04-19 16:07:59: 通知实例DW0修改所有归档状态无效
[monitor] 2022-04-19 16:07:59: 修改所有实例归档为无效状态成功
[monitor] 2022-04-19 16:07:59: 实例DW0开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2022-04-19 16:07:59: 实例DW0执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2022-04-19 16:07:59: 实例DW0开始执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句
[monitor] 2022-04-19 16:07:59: 实例DW0执行SP_SET_GLOBAL_DW_STATUS(7, 0)语句成功
[monitor] 2022-04-19 16:07:59: 通知守护进程DW0切换OPEN状态
[monitor] 2022-04-19 16:07:59: 守护进程(DW0)状态切换 [TAKEOVER-->OPEN]
[monitor] 2022-04-19 16:08:00: 切换守护进程DW0为OPEN状态成功
[monitor] 2022-04-19 16:08:00: 通知组(GRP1)的守护进程执行清理操作
[monitor] 2022-04-19 16:08:00: 清理守护进程(DW0)请求成功
[monitor] 2022-04-19 16:08:00: 使用实例DW0接管成功
State after the takeover:
The database can now serve requests normally.
[dmdba@dmdw0 bin]$ ./disql SYSDBA/SYSDBA@192.168.56.24:5238
服务器[192.168.56.24:5238]:处于主库打开状态
登录使用时间 : 1.576(ms)
disql V8
SQL>
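When the monitor output is saved to a file, the final success line of the takeover can be checked mechanically. The sketch below assumes the message text is exactly what appears in the log above; other DM8 versions or locales may word it differently. takeover_ok is a hypothetical helper, and how the output gets captured is not covered here.

```shell
#!/bin/sh
# takeover_ok is a hypothetical helper: it reads saved dmmonitor output on
# stdin and succeeds only if the given instance reports a completed takeover.
# The matched text is copied from the log in this article.
takeover_ok() {
    grep -qF "使用实例$1接管成功"
}

# Two lines taken from the takeover log above.
cat <<'EOF' | takeover_ok DW0 && echo "DW0 took over"
[monitor] 2022-04-19 16:07:58: 开始使用实例DW0接管
[monitor] 2022-04-19 16:08:00: 使用实例DW0接管成功
EOF
```

The start line alone does not match, so an interrupted takeover is not reported as a success.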
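DSC rejoins
After the DSC nodes are repaired by restoring a backup of the PRIMARY, they start normally and rejoin automatically.
DW remains the PRIMARY and DSC joins as the STANDBY.
Manual switchover
To make DSC the PRIMARY again, switch over manually: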
login
用户名:SYSDBA
密码:
[monitor] 2022-04-19 16:15:19: 登录监视器成功!
switchover DSC0
[monitor] 2022-04-19 16:15:25: 开始切换实例DSC0
[monitor] 2022-04-19 16:15:25: 通知守护进程DW0切换SWITCHOVER状态
[monitor] 2022-04-19 16:15:25: 守护进程(DW0)状态切换 [OPEN-->SWITCHOVER]
[monitor] 2022-04-19 16:15:26: 切换守护进程DW0为SWITCHOVER状态成功
[monitor] 2022-04-19 16:15:26: 通知守护进程DSC0切换SWITCHOVER状态
[monitor] 2022-04-19 16:15:26: 守护进程(DSC0)状态切换 [OPEN-->SWITCHOVER]
[monitor] 2022-04-19 16:15:27: 切换守护进程DSC0为SWITCHOVER状态成功
[monitor] 2022-04-19 16:15:27: 实例DW0开始执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句
[monitor] 2022-04-19 16:15:27: 实例DW0执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句成功
[monitor] 2022-04-19 16:15:27: 实例DSC0开始执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句
[monitor] 2022-04-19 16:15:27: 实例DSC0执行SP_SET_GLOBAL_DW_STATUS(0, 6)语句成功
[monitor] 2022-04-19 16:15:27: 实例DW0开始执行ALTER DATABASE MOUNT语句
[monitor] 2022-04-19 16:15:27: 实例DW0执行ALTER DATABASE MOUNT语句成功
[monitor] 2022-04-19 16:15:27: 实例DSC0开始执行SP_APPLY_KEEP_PKG()语句
[monitor] 2022-04-19 16:15:27: 实例DSC0执行SP_APPLY_KEEP_PKG()语句成功
[monitor] 2022-04-19 16:15:27: 实例DSC0开始执行ALTER DATABASE MOUNT语句
[monitor] 2022-04-19 16:15:27: 实例DSC0执行ALTER DATABASE MOUNT语句成功
[monitor] 2022-04-19 16:15:27: 实例DW0开始执行ALTER DATABASE STANDBY语句
[monitor] 2022-04-19 16:15:27: 实例DW0执行ALTER DATABASE STANDBY语句成功
[monitor] 2022-04-19 16:15:27: 实例DSC0开始执行ALTER DATABASE PRIMARY语句
[monitor] 2022-04-19 16:15:28: 实例DSC0执行ALTER DATABASE PRIMARY语句成功
[monitor] 2022-04-19 16:15:28: 通知实例DSC0修改所有归档状态无效
[monitor] 2022-04-19 16:15:28: 修改所有实例归档为无效状态成功
[monitor] 2022-04-19 16:15:28: 实例DW0开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2022-04-19 16:15:28: 实例DW0执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2022-04-19 16:15:28: 实例DSC0开始执行ALTER DATABASE OPEN FORCE语句
[monitor] 2022-04-19 16:15:29: 实例DSC0执行ALTER DATABASE OPEN FORCE语句成功
[monitor] 2022-04-19 16:15:29: 实例DW0开始执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句
[monitor] 2022-04-19 16:15:29: 实例DW0执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句成功
[monitor] 2022-04-19 16:15:29: 实例DSC0开始执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句
[monitor] 2022-04-19 16:15:29: 实例DSC0执行SP_SET_GLOBAL_DW_STATUS(6, 0)语句成功
[monitor] 2022-04-19 16:15:29: 通知守护进程DW0切换OPEN状态
[monitor] 2022-04-19 16:15:29: 守护进程(DW0)状态切换 [SWITCHOVER-->OPEN]
[monitor] 2022-04-19 16:15:30: 切换守护进程DW0为OPEN状态成功
[monitor] 2022-04-19 16:15:30: 通知守护进程DSC0切换OPEN状态
[monitor] 2022-04-19 16:15:31: 守护进程(DSC0)状态切换 [SWITCHOVER-->OPEN]
[monitor] 2022-04-19 16:15:31: 切换守护进程DSC0为OPEN状态成功
[monitor] 2022-04-19 16:15:31: 通知组(GRP1)的守护进程执行清理操作
[monitor] 2022-04-19 16:15:31: 清理守护进程(DSC0)请求成功
[monitor] 2022-04-19 16:15:31: 清理守护进程(DW0)请求成功
[monitor] 2022-04-19 16:15:31: 实例DSC0切换成功
State after the switchover:
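The timestamps in the switchover log make it easy to measure how long the role change took. The sketch below assumes the "[monitor] YYYY-MM-DD HH:MM:SS: ..." layout shown in the log and that all lines fall on the same day. elapsed_seconds is a hypothetical helper.

```shell
#!/bin/sh
# elapsed_seconds is a hypothetical helper: it reads dmmonitor lines on stdin
# and prints the seconds between the first and last timestamp. It assumes the
# "[monitor] YYYY-MM-DD HH:MM:SS: ..." layout and a single calendar day.
elapsed_seconds() {
    awk '
        $1 == "[monitor]" {
            split($3, t, ":")                      # $3 looks like "HH:MM:SS:"
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (!seen) { first = now; seen = 1 }
            last = now
        }
        END { if (seen) print last - first }
    '
}

# First and last line of the switchover log above.
elapsed_seconds <<'EOF'
[monitor] 2022-04-19 16:15:25: 开始切换实例DSC0
[monitor] 2022-04-19 16:15:31: 实例DSC0切换成功
EOF
```

For the run recorded in this article the result is 6, that is, about six seconds from the start of the switchover to the success message.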
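Summary
This article walked through building DM8 DSC+DW, the operating procedures, cluster repair, and failover, all in a single test scenario. A real production environment will certainly have more details to watch, which I hope to share in a later article.
Dameng Cloud Adaptation Technical Community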
https://eco.dameng.com/