GBASE (南大通用) Database GBase 8a Study Notes 09: Cluster Node Replacement

I. Test Environment

Item                          Value
CPU                           Intel® Core™ i5-1035G1 CPU @ 1.00GHz
Operating system              CentOS Linux release 7.9.2009 (Core)
Memory                        4G
Logical cores                 3
Node 1 (existing) IP          192.168.142.10
Node 2 (to be replaced) IP    192.168.142.11
Database version              8.6.2.43-R33.132743

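II. Simulated Scenario

The disk on node 192.168.142.11 has failed, so the node must be replaced in the cluster. First complete steps 1 through 4 of the procedure below, that is, up to marking the node unavailable and removing its events. Once the user has replaced the disk, run the replacement script.

In production, it is recommended to stop business operations against the database while a node is being replaced.

III. Procedure

1. Check the cluster status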

[root@localhost gcinstall]# gcadmin
CLUSTER STATE:  ACTIVE
CLUSTER MODE:   NORMAL

=====================================================================
|               GBASE COORDINATOR CLUSTER INFORMATION               |
=====================================================================
|   NodeName   |       IpAddress       |gcware |gcluster |DataState |
---------------------------------------------------------------------
| coordinator1 |    192.168.142.10     | OPEN  |  OPEN   |    0     |
---------------------------------------------------------------------
=================================================================
|                GBASE DATA CLUSTER INFORMATION                 |
=================================================================
|NodeName |       IpAddress       |gnode |syncserver |DataState |
-----------------------------------------------------------------
|  node1  |    192.168.142.10     | OPEN |   OPEN    |    0     |
-----------------------------------------------------------------
|  node2  |    192.168.142.11     | OPEN |   OPEN    |    0     |
-----------------------------------------------------------------
[root@localhost gcinstall]# gcadmin showdistribution

              Distribution ID: 5 | State: new | Total segment num: 2

     Primary Segment Node IP                           Segment ID         Duplicate Segment node IP
========================================================================================================================
|    192.168.142.11                              |       1          |    192.168.142.10                                |
------------------------------------------------------------------------------------------------------------------------
|    192.168.142.10                              |       2          |    192.168.142.11                                |
========================================================================================================================
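
The distribution listing is what makes the replacement possible: each segment has a primary copy on one node and a duplicate on the other. The following is a minimal Python sketch, assuming the `gcadmin showdistribution` layout stays as printed above and that the addresses are IPv4. It parses the rows and reports which node still holds a copy of each segment if one node is lost. The function names are illustrative and not part of any GBase tooling, and the separator lines in the sample are shortened.

```python
import re

# One data row: | primary IP | segment id | duplicate IP |
ROW = re.compile(r"^\|\s*([\d.]+)\s*\|\s*(\d+)\s*\|\s*([\d.]+)\s*\|$")

# Sample copied from the output above (separator lines shortened).
SAMPLE = """\
     Primary Segment Node IP        Segment ID        Duplicate Segment node IP
========================================
|    192.168.142.11                 |       1          |    192.168.142.10     |
----------------------------------------
|    192.168.142.10                 |       2          |    192.168.142.11     |
========================================
"""


def parse_distribution(text):
    """Return a list of (segment_id, primary_ip, duplicate_ip) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            primary, segment, duplicate = match.groups()
            rows.append((int(segment), primary, duplicate))
    return rows


def surviving_copies(rows, failed_ip):
    """Map each segment touching failed_ip to the IP that still holds a copy."""
    survivors = {}
    for segment, primary, duplicate in rows:
        if failed_ip in (primary, duplicate):
            others = [ip for ip in (primary, duplicate) if ip != failed_ip]
            # An empty list would mean both copies sat on the failed node.
            survivors[segment] = others[0] if others else None
    return survivors


if __name__ == "__main__":
    rows = parse_distribution(SAMPLE)
    print(surviving_copies(rows, "192.168.142.11"))
```

For this two-node cluster, both segments keep a copy on 192.168.142.10, which is the data the replacement later rebuilds from.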

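2. Mark the node as unavailable

Syntax:

gcadmin setnodestate IP [failure| unavailable |normal]

Name          Description
normal        Normal.
failure       Failed; subsequent events are recorded. If the server can be recovered, set it back to normal and the cluster will synchronize automatically. If the node's data is confirmed lost, the node can then be set to unavailable.
unavailable   Unavailable; the node has been definitively judged unrecoverable and its data is certainly lost. The cluster stops recording events for it, and it must later be repaired through the node replacement feature. If you are certain that no DDL, DML, or other event-producing operations have occurred, it can also be forced back to normal.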
[gbase@localhost gcinstall]$ gcadmin setnodestate 192.168.142.11 unavailable
load gbase client dll start ......
load gbase client dll end ......

after set node state into unavailable,can not set the state into normal,
must run gcadmin replacenodes to replace this node ,after that command node state can return into normal.
you realy want to set node state into unavailable(yes or no)?
yes
get node data state by ddl fevent log start ......
get node data state by ddl fevent log end ......
get node data state by dml fevent log start ......
get node data state by dml fevent log end ......
get node data state by dml storage fevent log start ......
get node data state by dml storage fevent log end ......

check data server node data state by fevent log start ......
check data server node data state by fevent log end ......
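
Because the tool asks for confirmation and warns that an unavailable node normally returns to normal only through a replacement, it can help to validate the arguments before typing the command. The sketch below is a small hypothetical helper (not part of GBase) that builds the argument list for `gcadmin setnodestate` and rejects a malformed IP or a misspelled state such as `unavaliable`. It only constructs the command and does not run it.

```python
import ipaddress

# States accepted by `gcadmin setnodestate`, per the syntax line in this step.
VALID_STATES = ("normal", "failure", "unavailable")


def setnodestate_argv(ip, state):
    """Build the argument list for `gcadmin setnodestate IP STATE`."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    if state not in VALID_STATES:
        raise ValueError(
            "unknown node state %r, expected one of: %s"
            % (state, ", ".join(VALID_STATES))
        )
    return ["gcadmin", "setnodestate", ip, state]


if __name__ == "__main__":
    print(" ".join(setnodestate_argv("192.168.142.11", "unavailable")))
```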

3. Check the cluster status

Node 192.168.142.11 is now in the UNAVAILABLE state.

[gbase@localhost gcinstall]$ gcadmin
CLUSTER STATE:  ACTIVE
CLUSTER MODE:   NORMAL

=====================================================================
|               GBASE COORDINATOR CLUSTER INFORMATION               |
=====================================================================
|   NodeName   |       IpAddress       |gcware |gcluster |DataState |
---------------------------------------------------------------------
| coordinator1 |    192.168.142.10     | OPEN  |  OPEN   |    0     |
---------------------------------------------------------------------
========================================================================
|                    GBASE DATA CLUSTER INFORMATION                    |
========================================================================
|NodeName |       IpAddress       |    gnode    |syncserver |DataState |
------------------------------------------------------------------------
|  node1  |    192.168.142.10     |    OPEN     |   OPEN    |    0     |
------------------------------------------------------------------------
|  node2  |    192.168.142.11     | UNAVAILABLE |           |          |
------------------------------------------------------------------------
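
This check can also be scripted. The following is a rough Python sketch, assuming the `gcadmin` table layout shown above (five columns per node row, IPv4 addresses), that extracts the gnode state of each data node and lists the nodes that are not OPEN. The function names are made up for illustration, and the separator lines in the sample are shortened.

```python
import re

IPV4 = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")

# Sample copied from the output above (separator lines shortened).
SAMPLE = """\
=====
|               GBASE COORDINATOR CLUSTER INFORMATION               |
=====
|   NodeName   |       IpAddress       |gcware |gcluster |DataState |
-----
| coordinator1 |    192.168.142.10     | OPEN  |  OPEN   |    0     |
-----
=====
|                    GBASE DATA CLUSTER INFORMATION                    |
=====
|NodeName |       IpAddress       |    gnode    |syncserver |DataState |
-----
|  node1  |    192.168.142.10     |    OPEN     |   OPEN    |    0     |
-----
|  node2  |    192.168.142.11     | UNAVAILABLE |           |          |
-----
"""


def parse_data_nodes(text):
    """Return {ip: gnode_state} for the rows of the DATA CLUSTER table."""
    in_data_table = False
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if len(cells) == 1:
            # Single-cell rows are the table titles.
            in_data_table = "DATA CLUSTER" in cells[0]
        elif in_data_table and len(cells) == 5 and IPV4.match(cells[1]):
            states[cells[1]] = cells[2]
    return states


def nodes_not_open(states):
    """Return the sorted IPs whose gnode state is anything other than OPEN."""
    return sorted(ip for ip, state in states.items() if state != "OPEN")


if __name__ == "__main__":
    print(nodes_not_open(parse_data_nodes(SAMPLE)))
```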

4. Clear the failed node's events

[gbase@localhost gcinstall]$ gcadmin rmdmlstorageevent 2 192.168.142.11

[gbase@localhost gcinstall]$ gcadmin rmddlevent 2 192.168.142.11

[gbase@localhost gcinstall]$ gcadmin rmdmlevent 2 192.168.142.11

Syntax:

gcadmin rmdmlevent type [eventid | tablename | nodeip | tablename nodeip]

gcadmin rmddlevent type [eventid | tablename | nodeip | tablename nodeip]

gcadmin rmdmlstorageevent type [eventid | tablename | nodeip | tablename nodeip]
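type   Arguments          Description
0      eventid            Remove the event with the given event ID
1      tablename          Remove all events for the given table
2      nodeip             Remove all events on the given IP
3      tablename nodeip   Remove all events for the given table on the given IP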
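
The type code decides how many arguments follow it, which is easy to get wrong by hand. As an illustration only, the Python sketch below encodes the four type codes and builds the argument lists for the three cleanup commands used in this step. The helper names are hypothetical and nothing is executed; the lists could be passed to `subprocess.run` on a host where `gcadmin` is installed.

```python
EVENT_COMMANDS = ("rmdmlstorageevent", "rmddlevent", "rmdmlevent")

# type code -> names of the arguments that follow it, per the syntax in this step
TYPE_ARGUMENTS = {
    0: ("eventid",),
    1: ("tablename",),
    2: ("nodeip",),
    3: ("tablename", "nodeip"),
}


def rm_event_argv(command, type_code, *values):
    """Build `gcadmin <command> <type> <values...>` after basic validation."""
    if command not in EVENT_COMMANDS:
        raise ValueError("unknown command: %r" % (command,))
    expected = TYPE_ARGUMENTS.get(type_code)
    if expected is None:
        raise ValueError("unknown type code: %r" % (type_code,))
    if len(values) != len(expected):
        raise ValueError(
            "type %d expects %s" % (type_code, " ".join(expected))
        )
    return ["gcadmin", command, str(type_code)] + [str(v) for v in values]


def cleanup_node_argvs(node_ip):
    """The three type-2 (nodeip) calls from this step, in the same order."""
    return [rm_event_argv(command, 2, node_ip) for command in EVENT_COMMANDS]


if __name__ == "__main__":
    for argv in cleanup_node_argvs("192.168.142.11"):
        print(" ".join(argv))
```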

5. Simulate the on-site disk replacement

Stop the service on 192.168.142.11 and check whether any processes remain.

[root@localhost opt]# service gcware stop

[root@localhost opt]# ps -ef|grep gbase
root       8271   3575  0 12:32 pts/0    00:00:00 grep --color=auto gbase

This step is not needed in production; here it stands in for the disk replacement.
On 192.168.142.11, delete the related files.

[root@localhost opt]# rm -rf gcluster/

[root@localhost opt]# rm -rf gnode/

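6. Run the replacement

Note: the replacement checks the cluster's DDL lock. If a DDL statement is running, the replacement waits for it to finish. If you are sure the statement can be discarded, you can kill it to speed up the node replacement.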

[gbase@localhost gcinstall]$ ./replace.py --host=192.168.142.11 --rootPwd=qwer1234 --dbaUser=gbase --dbaUserPwd=gbase --sync_coordi_metadata_timeout=1500 --overwrite
check os password ...
check os password successful
check database password ...
check database password successful
192.168.142.11
Are you sure to replace install these nodes ([Y,y]/[N,n])? y
Starting all gcluster nodes...
get table id and set dmlstorageevent on node [::ffff:192.168.142.10], please wait a moment
load gbase client dll start ......
load gbase client dll end ......

check node data map and cluster state start ......
check node data map and cluster state end ......

get distribution information start ......
get distribution information end ......

check ip start ......
check ip end ......

switch cluster mode into READONLY start ......
wait all ddl statement stop ......

all ddl statement stoped
switch cluster mode into READONLY end ......

delete all fevent log on replace nodes start ......
delete ddl event log on node 192.168.142.11 start
delete ddl event log on node 192.168.142.11 end
delete dml event log on node 192.168.142.11 start
delete dml event log on node 192.168.142.11 end
delete dml storage event log on node 192.168.142.11 start
delete dml storage event log on node 192.168.142.11 end
delete all fevent log on replace nodes end ......

sync metedata start ......
sync coordinator metedata start ......
sync coordinator metedata end,spend time 0 ms ......
sync data server metedata start ......
copy script to data node begin
copy script to data node end
build data packet begin
build data packet end
copy data packet to target node begin
copy data packet to target node end
extract data packet begin
extract data packet end
sync dataserver metedata end,spend time 20169 ms ......
sync metedata end ......

set sync data flag start ......
create database start ......
create database end ......

create database and set table dml storage event spend time 6191 ms ......
set sync data flag end ......

restore cluster mode start ......
restore cluster mode end ......

restore node state start ......
restore node state end ......

all nodes replace success end
replace nodes spend time: 89500 ms
Replace gcluster nodes successfully.
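
The output reports how long several phases took, which helps when estimating the window for a larger cluster. Below is a small Python sketch, assuming the "spend time N ms" wording stays as printed above, that pulls those figures out of the output. The sample lines are copied verbatim from the run, spelling included.

```python
import re

# Captures the text before "spend time" and the millisecond figure after it.
TIMING = re.compile(r"^(.*?)[,\s]*spend time:?\s*(\d+)\s*ms")

SAMPLE = """\
sync coordinator metedata end,spend time 0 ms ......
sync dataserver metedata end,spend time 20169 ms ......
create database and set table dml storage event spend time 6191 ms ......
restore node state end ......
replace nodes spend time: 89500 ms
"""


def parse_timings(text):
    """Return a list of (phase_label, milliseconds) in order of appearance."""
    timings = []
    for line in text.splitlines():
        match = TIMING.match(line.strip())
        if match:
            timings.append((match.group(1), int(match.group(2))))
    return timings


if __name__ == "__main__":
    for label, ms in parse_timings(SAMPLE):
        print("%8d ms  %s" % (ms, label))
```

In this run the data server metadata sync (20169 ms) is the largest reported phase within the 89500 ms total.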

Viewing the replacement log

[root@localhost gcinstall]# tail -f replace.log 
restore cluster mode end ......

restore node state start ......
restore node state end ......

all nodes replace success end
replace nodes spend time: 89500 ms
2022-08-14 12:35:37,566-root-INFO gcadmin replacenodes ended.
2022-08-14 12:35:37,567-root-DEBUG kill gclusterd or gbased to started for udf(or other) in replace.
2022-08-14 12:35:39,546-root-INFO success to kill gclusterd or gbased on 192.168.142.11

7. Check the cluster status

[root@localhost gcinstall]# gcadmin
CLUSTER STATE:  ACTIVE
CLUSTER MODE:   NORMAL

=====================================================================
|               GBASE COORDINATOR CLUSTER INFORMATION               |
=====================================================================
|   NodeName   |       IpAddress       |gcware |gcluster |DataState |
---------------------------------------------------------------------
| coordinator1 |    192.168.142.10     | OPEN  |  OPEN   |    0     |
---------------------------------------------------------------------
=================================================================
|                GBASE DATA CLUSTER INFORMATION                 |
=================================================================
|NodeName |       IpAddress       |gnode |syncserver |DataState |
-----------------------------------------------------------------
|  node1  |    192.168.142.10     | OPEN |   OPEN    |    0     |
-----------------------------------------------------------------
|  node2  |    192.168.142.11     | OPEN |   OPEN    |    0     |
-----------------------------------------------------------------

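My test cluster holds very little data, so the cluster state returned to normal quickly.
In practice the sequence may be:
(1) The cluster stays in READONLY mode for a while, and the failed node shows the REPLACE state.
(2) The cluster recovers, and the failed node shows the OFFLINE state.
(3) The node comes online and its synchronization state is set.

Note:
After the replacement, remember to check that the DML and DDL event counts are steadily decreasing.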

[root@localhost gcinstall]# gcadmin showddlevent
Event count:0
[root@localhost gcinstall]# gcadmin showdmlevent
Event count:0
[root@xdw0 ~]# gcadmin showdmlstorageevent
Event count:0
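
Watching the counts fall can be automated. The sketch below is one possible approach in Python, assuming each `gcadmin show...event` command prints an `Event count:N` line as above. The `run` callable is injected so the loop can be tried without a cluster; on a real host it might wrap `subprocess.run(["gcadmin", name], capture_output=True, text=True).stdout`. The polling interval and limit are arbitrary illustrative values.

```python
import re
import time

SHOW_COMMANDS = ("showddlevent", "showdmlevent", "showdmlstorageevent")


def parse_event_count(output):
    """Extract N from an 'Event count:N' line."""
    match = re.search(r"Event count:\s*(\d+)", output)
    if match is None:
        raise ValueError("no 'Event count' line found")
    return int(match.group(1))


def wait_until_drained(run, commands=SHOW_COMMANDS, interval=30.0,
                       max_polls=120, sleep=time.sleep):
    """Poll until every event count is 0; return the list of observed counts."""
    history = []
    for _ in range(max_polls):
        counts = {name: parse_event_count(run(name)) for name in commands}
        history.append(counts)
        if all(value == 0 for value in counts.values()):
            return history
        sleep(interval)
    raise TimeoutError("events not drained after %d polls" % max_polls)


if __name__ == "__main__":
    # Fake runner standing in for gcadmin: counts drop to zero on the 2nd poll.
    fake_counts = {
        "showddlevent": iter([3, 0]),
        "showdmlevent": iter([5, 0]),
        "showdmlstorageevent": iter([0, 0]),
    }
    fake_run = lambda name: "Event count:%d\n" % next(fake_counts[name])
    print(wait_until_drained(fake_run, interval=0, sleep=lambda s: None))
```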