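Current setup
There are 4 servers in total. Each server runs 2 database instances for the MPP cluster (one primary, one standby), deployed as follows: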
1. Server 1
Primary: 10.0.0.30:7236 EP01
Standby: 10.0.0.30:7237 EP04_1
2. Server 2
Primary: 10.0.0.31:7236 EP02
Standby: 10.0.0.31:7237 EP01_1
3. Server 3
Primary: 10.0.0.32:7236 EP03
Standby: 10.0.0.32:7237 EP02_1
4. Server 4
Primary: 10.0.0.33:7236 EP04
Standby: 10.0.0.33:7237 EP03_1
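The standby of each EP sits on the next server (EP01_1 on server 2, EP02_1 on server 3, EP03_1 on server 4, EP04_1 on server 1), so the two copies of an EP never share a server. The bash sketch below (bash 4+ for associative arrays) only encodes the list above so the placement can be looked up; `PRIMARY`, `STANDBY` and `ep_instances` are illustrative names, not part of any DM tooling.

```shell
#!/usr/bin/env bash
# Encodes the deployment list above (hypothetical helper, not a DM tool).
# Primary of each EP: port 7236; standby (suffix _1): port 7237 on the next server.
declare -A PRIMARY=(
  [EP01]=10.0.0.30:7236
  [EP02]=10.0.0.31:7236
  [EP03]=10.0.0.32:7236
  [EP04]=10.0.0.33:7236
)
declare -A STANDBY=(
  [EP01]=10.0.0.31:7237   # EP01_1
  [EP02]=10.0.0.32:7237   # EP02_1
  [EP03]=10.0.0.33:7237   # EP03_1
  [EP04]=10.0.0.30:7237   # EP04_1
)

# Print "<primary> <standby>" for one EP.
ep_instances() {
  local ep=$1
  echo "${PRIMARY[$ep]} ${STANDBY[$ep]}"
}

for ep in EP01 EP02 EP03 EP04; do
  echo "$ep: $(ep_instances "$ep")"
done
```

For example, `ep_instances EP02` prints `10.0.0.31:7236 10.0.0.32:7237`: stopping EP02 on server 2 and EP02_1 on server 3 removes both copies of EP02.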
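Simulated failure scenario 1: description
Stop the EP02 service on server 2 and the EP02_1 service on server 3, then connect to the cluster through EP03 on server 3, logging in once with the global and once with the local mpp_type parameter.
Simulated failure scenario 1: procedure
1. Stop the EP02 node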
[dmdba@dmdbpri ~]$ chkconfig --list |grep DmService
DmService_EP01_1 0:off 1:off 2:on 3:on 4:on 5:on 6:off
DmService_EP02 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[dmdba@dmdbpri ~]$ service DmService_EP02 stop
Stopping DmService_EP02: [ OK ]
2. Stop the EP02_1 node
[dmdba@dmdbs1 ~]$ chkconfig --list |grep DmService
DmService_EP02_1 0:off 1:off 2:on 3:on 4:on 5:on 6:off
DmService_EP03 0:off 1:off 2:on 3:on 4:on 5:on 6:off
[dmdba@dmdbs1 ~]$ service DmService_EP02_1 stop
Stopping DmService_EP02_1: [ OK ]
[dmdba@dmdbs1 ~]$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:76:18:88 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.32/24 brd 10.0.0.255 scope global eth1
inet6 fe80::20c:29ff:fe76:1888/64 scope link
valid_lft forever preferred_lft forever
[dmdba@dmdbs1 ~]$ ss -lntp
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 :::7236 :::* users:(("dmserver",2636,5))
LISTEN 0 128 :::7336 :::* users:(("dmserver",2636,9))
LISTEN 0 128 :::7436 :::* users:(("dmserver",2636,25))
LISTEN 0 128 :::4236 :::* users:(("dmap",2372,5))
LISTEN 0 128 :::7536 :::* users:(("dmwatcher",2787,4))
LISTEN 0 128 :::7537 :::* users:(("dmwatcher",2787,6))
LISTEN 0 128 :::22 :::*
LISTEN 0 128 *:22 *:*
LISTEN 0 100 ::1:25 :::*
LISTEN 0 100 127.0.0.1:25 *:*
LISTEN 0 50 :::6363 :::*
LISTEN 0 50 :::6364 :::*
3. Global connection (mpp_type=global): the cluster is abnormal and the connection fails
[dmdba@dmdbs1 ~]$ disql SYSDBA/SYSDBA@10.0.0.32:7236#"{mpp_type=global}"
[-6024]:Remote node global login failed.
disql V8
username:
password:
[-70028]:Create SOCKET connection failure.
username:
password:
[-70028]:Create SOCKET connection failure.
Can not connect to SERVER after three tries, exit DISQL
4. Local connection (mpp_type=local): succeeds
[dmdba@dmdbs1 ~]$ disql SYSDBA/SYSDBA@10.0.0.32:7236#"{mpp_type=local}"
Server[10.0.0.32:7236]:mode is primary, state is open
login used time : 1.365(ms)
disql V8
SQL> exit
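In step 2, the `ss -lntp` output on server 3 no longer lists port 7237 (EP02_1), while 7236 (EP03) is still listening, which confirms that EP02_1 is down. The same check can be scripted. This is a minimal sketch that assumes the `ss -lnt` column layout shown in the transcript (local address in column 4), which may differ between iproute2 versions; `port_listening` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given TCP port appears as a LISTEN socket
# in `ss -lnt`-style output read from stdin.
port_listening() {
  local port=$1
  # Column 4 is "Local Address:Port"; require ":<port>" at the end of the field
  # so that, e.g., 7236 does not match 17236.
  awk -v p="$port" '$1 == "LISTEN" && $4 ~ (":" p "$") { found = 1 } END { exit !found }'
}

# Usage on server 3: EP02_1 listens on 7237, so after the stop this should report "not listening".
if ss -lnt 2>/dev/null | port_listening 7237; then
  echo "port 7237 is still listening"
else
  echo "port 7237 is not listening"
fi
```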
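Simulated failure scenario 1: conclusion
In a DM V8 MPP (massively parallel processing) cluster, if both the primary and the standby instance of any one EP node are down, the cluster can no longer be used as a whole: a global login fails with error -6024 (Remote node global login failed), while a local login to a surviving EP still succeeds.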
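The rule observed in this scenario can be written as a small predicate. This sketch models only the behaviour seen in the test, not how DM decides internally; the `EP:primary_state:standby_state` argument format and the `mpp_global_available` name are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the observed rule (hypothetical helper, not a DM API): a global
# (mpp_type=global) login needs every EP to have at least one live instance.
# Each argument has the form "EP:primary_state:standby_state", states up/down.
mpp_global_available() {
  local entry ep pri std
  for entry in "$@"; do
    IFS=: read -r ep pri std <<< "$entry"
    if [[ $pri != up && $std != up ]]; then
      echo "global login expected to fail: no live instance of $ep"
      return 1
    fi
  done
  echo "global login expected to succeed"
}

# Scenario 1: EP02 and EP02_1 are both stopped (non-zero status expected).
mpp_global_available EP01:up:up EP02:down:down EP03:up:up EP04:up:up || true
# Scenario 2: EP02_1 has been restarted.
mpp_global_available EP01:up:up EP02:down:up EP03:up:up EP04:up:up
```

Local logins are deliberately not modelled: as the transcript shows, they only depend on the instance being connected to.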
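Simulated failure scenario 2: description
Continuing from the failure above, start the EP02_1 service on server 3, then connect to the cluster through EP03 on server 3, logging in once with the global and once with the local parameter.
Simulated failure scenario 2: procedure
1. Bring EP02_1 back up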
[dmdba@dmdbs1 ~]$ service DmService_EP02_1 start
Starting DmService_EP02_1: [ OK ]
2. Global connection (mpp_type=global): succeeds
[dmdba@dmdbs1 ~]$ disql SYSDBA/SYSDBA@10.0.0.32:7236#"{mpp_type=global}"
Server[10.0.0.32:7236]:mode is primary, state is open
login used time : 2.155(ms)
disql V8
SQL>
3. Local connection (mpp_type=local): succeeds
[dmdba@dmdbs1 ~]$ disql SYSDBA/SYSDBA@10.0.0.32:7236#"{mpp_type=local}"
Server[10.0.0.32:7236]:mode is primary, state is open
login used time : 1.325(ms)
disql V8
SQL>
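Simulated failure scenario 2: conclusion
In a DM V8 MPP cluster, as long as every EP node still has at least one running instance (here EP02 is still down but EP02_1 is up), the cluster remains usable as a whole and is not affected by the failure of a single instance.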
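Combining this conclusion with the ring placement from the deployment list gives a quick way to reason about whole-server outages. The sketch assumes, beyond what was tested here, that a server outage stops both instances on that server and that the surviving copy of each EP keeps serving (the scenarios above only stopped individual services); `survives` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: apply "every EP needs one live instance" to whole-server
# outages. EP0k's primary runs on server k, its standby EP0k_1 on server k+1,
# and server 4 wraps around to server 1.
N=4

# Succeed if every EP keeps a live instance when the listed servers are down.
survives() {
  local -A down=()
  local s k next
  for s in "$@"; do down[$s]=1; done
  for ((k = 1; k <= N; k++)); do
    next=$(( k % N + 1 ))          # server hosting EP0k_1
    if [[ -n ${down[$k]:-} && -n ${down[$next]:-} ]]; then
      return 1                     # both copies of EP0k are gone
    fi
  done
  return 0
}

for a in 1 2 3 4; do
  for ((b = a + 1; b <= N; b++)); do
    if survives "$a" "$b"; then r=ok; else r=unavailable; fi
    echo "servers $a+$b down: global access $r"
  done
done
```

Under these assumptions any single server can fail; losing two adjacent servers (for example servers 2 and 3, which host both copies of EP02, the situation reproduced in scenario 1) makes global access unavailable, while losing servers 1 and 3 or 2 and 4 does not.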