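GBASE (南大通用) is a Chinese database vendor with its own independently developed intellectual property. GBase 8a cluster is GBASE's distributed analytical database built on a columnar storage architecture. It is used mainly by government bodies, Party committees, security-sensitive departments, national defense, statistics, auditing, and banking and securities regulators. It is also used in industries with massive volumes of business data, such as telecommunications, finance, and electric power.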
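A GBase 8a database cluster achieves high availability through a primary/duplicate replica mechanism: each data segment has a primary copy on one node and a duplicate copy on another node. When a node is damaged beyond repair, it has to be replaced with a new server. The Replace tool (replace.py) is GBASE's dedicated utility for replacing nodes in a GBase 8a MPP Cluster.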
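The rest of this article walks through the whole process on a three-node cluster. It first simulates the failure of node 192.168.2.183 and then recovers it. All commands are run as the gbase user on node1, from the gcinstall_95 installation directory.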
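The primary/duplicate layout is what makes the loss of a single node survivable. As long as every segment still has at least one copy on a healthy node, no segment's data is lost while the failed node is replaced. The short sketch below checks that property. The three-segment ring layout (181→182, 182→183, 183→181) is an assumption for illustration, since only part of the real layout appears in the distribution file exported in step 3. `Segment` and `lost_segments` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One data segment: a primary node plus zero or more duplicate nodes."""
    primary: str
    duplicates: list[str] = field(default_factory=list)

    def nodes(self) -> list[str]:
        return [self.primary, *self.duplicates]


def lost_segments(segments: list[Segment], failed: set[str]) -> list[int]:
    """Indexes of segments with no copy left once all `failed` nodes are down."""
    return [
        i for i, seg in enumerate(segments)
        if all(node in failed for node in seg.nodes())
    ]


# Hypothetical three-node ring: one primary and one duplicate per segment.
layout = [
    Segment("192.168.2.181", ["192.168.2.182"]),
    Segment("192.168.2.182", ["192.168.2.183"]),
    Segment("192.168.2.183", ["192.168.2.181"]),
]

print(lost_segments(layout, {"192.168.2.183"}))                   # → []
print(lost_segments(layout, {"192.168.2.183", "192.168.2.181"}))  # → [2]
```

With one copy per segment on two different nodes, any single failure leaves every segment readable. Losing two nodes can lose data.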
1. Simulate a node failure: set the node to unavailable
[gbase@node1 gcinstall_95]$ gcadmin setnodestate 192.168.2.183 unavailable
after set node state into unavailable,can not set the state into normal,
must run gcadmin replacenodes to replace this node ,after that command node state can return into normal.
you realy want to set node state into unavailable(yes or no)?
yes
get node data state by ddl fevent log start …
get node data state by ddl fevent log end …
get node data state by dml fevent log start …
get node data state by dml fevent log end …
get node data state by dml storage fevent log start …
get node data state by dml storage fevent log end …
check data server node data state by fevent log start …
check data server node data state by fevent log end …
set node [192.168.2.183] state to unavailable successful
[gbase@node1 gcinstall_95]$
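`setnodestate` asks for an interactive yes/no confirmation. If you want to script this step, a minimal Python sketch could feed the answer on standard input and then look for the success line shown above. It assumes that gcadmin reads the confirmation from stdin, so verify this on your version before relying on it. The helper names are hypothetical, and the default is a dry run that only prints the command.

```python
import subprocess


def setnodestate_cmd(ip: str) -> list[str]:
    """argv for step 1: gcadmin setnodestate <ip> unavailable."""
    return ["gcadmin", "setnodestate", ip, "unavailable"]


def state_changed(output: str, ip: str) -> bool:
    """True if the output contains the success line seen in the transcript."""
    return f"set node [{ip}] state to unavailable successful" in output


def set_unavailable(ip: str, dry_run: bool = True) -> bool:
    cmd = setnodestate_cmd(ip)
    if dry_run:
        # Only print the command; nothing on the cluster is changed.
        print("would run:", " ".join(cmd))
        return False
    # Assumption: gcadmin reads the yes/no confirmation from stdin.
    result = subprocess.run(cmd, input="yes\n", capture_output=True, text=True)
    return result.returncode == 0 and state_changed(result.stdout + result.stderr, ip)


set_unavailable("192.168.2.183")  # dry run
```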
2. Remove the failed node's fevent logs
[gbase@node1 gcinstall_95]$ gcadmin rmfeventlog 192.168.2.183
after rmfeventlog 192.168.2.183, fevent log will be removed, must run gcadmin replacenodes to replace this node.
you realy want to remove node 192.168.2.183 fevent log(yes or no)?
yes
delete ddl event log on node 192.168.2.183 start
delete ddl event log on node 192.168.2.183 end
delete dml event log on node 192.168.2.183 start
delete dml event log on node 192.168.2.183 end
delete dml storage event log on node 192.168.2.183 start
delete dml storage event log on node 192.168.2.183 end
[gbase@node1 gcinstall_95]$
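`rmfeventlog` reports a start/end pair for each of the three event-log types (ddl, dml, and dml storage). A small checker can confirm that every pair completed for the node. It relies only on the text format shown above, which replace.py also prints in step 7. `unfinished_logs` is a hypothetical helper.

```python
import re

# The three event-log types reported in the transcript.
LOG_TYPES = ("ddl", "dml", "dml storage")
END_LINE = re.compile(r"delete (ddl|dml|dml storage) event log on node (\S+) end")


def unfinished_logs(output: str, ip: str) -> list[str]:
    """Return the log types that have no 'delete ... end' line for `ip`."""
    finished = set()
    for line in output.splitlines():
        m = END_LINE.search(line)
        if m and m.group(2) == ip:
            finished.add(m.group(1))
    return [t for t in LOG_TYPES if t not in finished]


sample = """\
delete ddl event log on node 192.168.2.183 start
delete ddl event log on node 192.168.2.183 end
delete dml event log on node 192.168.2.183 start
delete dml event log on node 192.168.2.183 end
delete dml storage event log on node 192.168.2.183 start
delete dml storage event log on node 192.168.2.183 end
"""
print(unfinished_logs(sample, "192.168.2.183"))                         # → []
print(unfinished_logs(sample.splitlines()[0] + "\n", "192.168.2.183"))  # → ['ddl', 'dml', 'dml storage']
```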
3. Get the current cluster distribution
[gbase@node1 gcinstall_95]$ gcadmin showdistribution
[gbase@node1 gcinstall_95]$ gcadmin getdistribution 1 distribution_info_1.xml
gcadmin getdistribution 1 distribution_info_1.xml …
get segments information
write segments information to file [distribution_info_1.xml]
gcadmin getdistribution information successful
[gbase@node1 gcinstall_95]$ cat distribution_info_1.xml
<segment>
<primarynode ip="192.168.2.182"/>
<duplicatenodes>
<duplicatenode ip="192.168.2.183"/>
</duplicatenodes>
</segment>
<segment>
<primarynode ip="192.168.2.183"/>
<duplicatenodes>
<duplicatenode ip="192.168.2.181"/>
</duplicatenodes>
</segment>
</segments>
</distribution>
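The exported file lists each segment's primary node and duplicate nodes. A minimal ElementTree sketch can load the file and show which segments involve the failed node, which are exactly the segments step 4 has to edit. The opening `<distribution>`/`<segments>` lines are not shown in the output above, so the sample assumes them. The code finds `<segment>` elements at any depth and does not depend on the exact root element. `load_segments` and `segments_on` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Sample built from the two segments shown above; the wrapper elements are assumed.
SAMPLE = """<distribution><segments>
<segment><primarynode ip="192.168.2.182"/>
<duplicatenodes><duplicatenode ip="192.168.2.183"/></duplicatenodes></segment>
<segment><primarynode ip="192.168.2.183"/>
<duplicatenodes><duplicatenode ip="192.168.2.181"/></duplicatenodes></segment>
</segments></distribution>"""


def load_segments(root: ET.Element) -> list[tuple[str, list[str]]]:
    """Return (primary_ip, [duplicate_ips]) for every <segment> under root."""
    segments = []
    for seg in root.iter("segment"):
        primary = seg.find("primarynode").get("ip")
        dups = [d.get("ip") for d in seg.iter("duplicatenode")]
        segments.append((primary, dups))
    return segments


def segments_on(segments: list[tuple[str, list[str]]], ip: str) -> list[int]:
    """Indexes of segments where `ip` is the primary or a duplicate."""
    return [i for i, (p, dups) in enumerate(segments) if ip == p or ip in dups]


# For the real file: root = ET.parse("distribution_info_1.xml").getroot()
root = ET.fromstring(SAMPLE)
segs = load_segments(root)
print(segs)
print(segments_on(segs, "192.168.2.183"))  # → [0, 1]
```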
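4. Edit the file to remove the node being replaced. Where that node held a segment's duplicate, delete the entry. Where it held the primary, promote the surviving duplicate to primary. The edited copy is distribution_info_2.xml.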
[gbase@node1 gcinstall_95]$ cat distribution_info_2.xml
<segment>
<primarynode ip="192.168.2.182"/>
</segment>
<segment>
<primarynode ip="192.168.2.181"/>
</segment>
</segments>
</distribution>
[gbase@node1 gcinstall_95]$
[gbase@node1 gcinstall_95]$ cp gcChangeInfo.xml new_gcChangeInfo.xml
[gbase@node1 gcinstall_95]$
[gbase@node1 gcinstall_95]$ cat gcChangeInfo.xml
<?xml version="1.0" encoding="utf-8"?>
[gbase@node1 gcinstall_95]$
[gbase@node1 gcinstall_95]$ cat new_gcChangeInfo.xml
<?xml version="1.0" encoding="utf-8"?>
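The edit in step 4 is done by hand in the transcript. It can also be sketched with ElementTree:

1. Delete the replaced node from every `<duplicatenodes>` list.
2. Where the replaced node was the primary, promote the first remaining duplicate.
3. Drop any `<duplicatenodes>` element left empty, so the result has the same shape as distribution_info_2.xml above.

The function name and the "promote the first duplicate" rule are assumptions of this sketch. A segment with no surviving copy raises an error rather than being silently dropped.

```python
import xml.etree.ElementTree as ET


def remove_node(root: ET.Element, ip: str) -> None:
    """Edit a distribution tree in place so that `ip` no longer appears."""
    for seg in list(root.iter("segment")):  # materialize before mutating children
        dups_elem = seg.find("duplicatenodes")
        dups = [] if dups_elem is None else dups_elem.findall("duplicatenode")
        # Remove the replaced node from the duplicate list.
        for d in dups:
            if d.get("ip") == ip:
                dups_elem.remove(d)
        remaining = [d for d in dups if d.get("ip") != ip]
        primary = seg.find("primarynode")
        if primary.get("ip") == ip:
            if not remaining:
                raise ValueError(f"segment on {ip} has no surviving copy")
            # Promote the first surviving duplicate to primary.
            primary.set("ip", remaining[0].get("ip"))
            dups_elem.remove(remaining[0])
            remaining = remaining[1:]
        if dups_elem is not None and not remaining:
            seg.remove(dups_elem)


SAMPLE = """<distribution><segments>
<segment><primarynode ip="192.168.2.182"/>
<duplicatenodes><duplicatenode ip="192.168.2.183"/></duplicatenodes></segment>
<segment><primarynode ip="192.168.2.183"/>
<duplicatenodes><duplicatenode ip="192.168.2.181"/></duplicatenodes></segment>
</segments></distribution>"""

root = ET.fromstring(SAMPLE)
remove_node(root, "192.168.2.183")
for seg in root.iter("segment"):
    print(seg.find("primarynode").get("ip"), seg.find("duplicatenodes") is None)
# → 192.168.2.182 True
# → 192.168.2.181 True
# To save: ET.ElementTree(root).write("distribution_info_2.xml", encoding="utf-8", xml_declaration=True)
```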
5. Generate the new distribution and initialize the node data map
[gbase@node1 gcinstall_95]$ gcadmin distribution new_gcChangeInfo.xml
gcadmin generate distribution …
gcadmin generate distribution successful
[gbase@node1 gcinstall_95]$ gcadmin showdistribution
[gbase@node1 gcinstall_95]$ gccli -uroot
GBase client 9.5.2.23.119588. Copyright © 2004-2021, GBase. All Rights Reserved.
gbase> initnodedatamap;
Query OK, 0 rows affected, 5 warnings (Elapsed: 00:00:01.60)
gbase>
gbase> select index_name,isReplicate,data_distribution_id,vc_id from gbase.table_distribution;
+-------------------------------+-------------+----------------------+---------+
| index_name                    | isReplicate | data_distribution_id | vc_id   |
+-------------------------------+-------------+----------------------+---------+
| gclusterdb.rebalancing_status | NO          |                    2 | vc00001 |
| gclusterdb.dual               | YES         |                    2 | vc00001 |
| testdb.t1                     | NO          |                    1 | vc00001 |
+-------------------------------+-------------+----------------------+---------+
3 rows in set (Elapsed: 00:00:00.00)
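gccli prints query results as an ASCII table. If you capture that text, for example from a scripted gccli run, a small parser turns it into dictionaries so the distribution IDs can be checked programmatically. This sketch handles only the `+---+` / `| a | b |` layout shown above, keeps every cell as a string, and does not cope with values that contain `|`. `parse_table` is a hypothetical helper.

```python
def parse_table(text: str) -> list[dict[str, str]]:
    """Parse a MySQL-style ASCII result table into a list of row dicts."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")  # skips +---+ borders and "N rows in set"
    ]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, r)) for r in body]


SAMPLE = """\
+-------------------------------+-------------+----------------------+---------+
| index_name                    | isReplicate | data_distribution_id | vc_id   |
+-------------------------------+-------------+----------------------+---------+
| gclusterdb.rebalancing_status | NO          |                    2 | vc00001 |
| gclusterdb.dual               | YES         |                    2 | vc00001 |
| testdb.t1                     | NO          |                    1 | vc00001 |
+-------------------------------+-------------+----------------------+---------+
3 rows in set (Elapsed: 00:00:00.00)
"""

rows = parse_table(SAMPLE)
old = [r["index_name"] for r in rows if r["data_distribution_id"] != "2"]
print(old)  # → ['testdb.t1']
```

Here testdb.t1 is the only table still on distribution 1, which is why the next step runs `rebalance instance`.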
6. Redistribute (rebalance) the data
gbase> rebalance instance;
Query OK, 1 row affected (Elapsed: 00:00:01.18)
gbase>
gbase> select index_name,isReplicate,data_distribution_id,vc_id from gbase.table_distribution;
+-------------------------------+-------------+----------------------+---------+
| index_name                    | isReplicate | data_distribution_id | vc_id   |
+-------------------------------+-------------+----------------------+---------+
| gclusterdb.rebalancing_status | NO          |                    2 | vc00001 |
| gclusterdb.dual               | YES         |                    2 | vc00001 |
| testdb.t1                     | NO          |                    2 | vc00001 |
+-------------------------------+-------------+----------------------+---------+
3 rows in set (Elapsed: 00:00:00.00)
gbase> exit
Bye
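Comparing the two table_distribution snapshots shows what `rebalance instance` changed. testdb.t1 moved from distribution 1 to 2, and the other tables were already on 2. A sketch of that comparison takes `{index_name: data_distribution_id}` maps copied from the two queries. `moved_tables` and `not_on` are hypothetical helpers.

```python
def moved_tables(before: dict[str, int], after: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Map each table whose distribution ID changed to (old_id, new_id)."""
    return {
        name: (before[name], after[name])
        for name in sorted(before.keys() & after.keys())
        if before[name] != after[name]
    }


def not_on(snapshot: dict[str, int], target_id: int) -> list[str]:
    """Tables not yet assigned to `target_id`."""
    return sorted(name for name, d in snapshot.items() if d != target_id)


# Values copied from the two table_distribution queries above.
before = {"gclusterdb.rebalancing_status": 2, "gclusterdb.dual": 2, "testdb.t1": 1}
after = {"gclusterdb.rebalancing_status": 2, "gclusterdb.dual": 2, "testdb.t1": 2}

print(moved_tables(before, after))  # → {'testdb.t1': (1, 2)}
print(not_on(after, 2))             # → []
```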
7. Run the node replacement command
[gbase@node1 gcinstall_95]$ /opt/gcinstall_95/replace.py --host=192.168.2.183 --type=data --dbaUser=gbase --dbaUserPwd=gbase --overwrite --sync_coordi_metadata_timeout=120 --parallel_pack=1
192.168.2.183
Are you sure to replace install these nodes ([Y,y]/[N,n])? y
Starting all gcluster nodes…
check ip start …
check ip end …
switch cluster mode into READONLY start …
wait all ddl statement stop …
all ddl statement stoped
switch cluster mode into READONLY end …
delete all fevent log on replace nodes start …
delete ddl event log on node 192.168.2.183 start
delete ddl event log on node 192.168.2.183 end
delete dml event log on node 192.168.2.183 start
delete dml event log on node 192.168.2.183 end
delete dml storage event log on node 192.168.2.183 start
delete dml storage event log on node 192.168.2.183 end
delete all fevent log on replace nodes end …
sync dataserver metedata begin …
copy script to data node begin
copy script to data node end
build data packet begin
build data packet end
copy data packet to target node begin
copy data packet to target node end
extract data packet begin
extract data packet end
sync dataserver metedata end, spend time 86607 ms …
create distribution begin …
restore node state start …
restore node state end …
create distribution end
replace nodes spend time: 119474 ms
synchronize data node metadata success
please rebalance instance then remove old distribution after rebalance complete success
Replace gcluster nodes successfully.
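The replace.py call above takes several options. Building the argument list in one place makes it easier to review before running. The sketch below reproduces exactly the options used in the transcript and pulls the success line and millisecond timings out of the output. It does not explain the options beyond what the transcript shows, so check replace.py's own help on your version. As in the transcript, the password is passed on the command line. The helper names are hypothetical.

```python
import re


def replace_argv(host: str, dba_user: str, dba_pwd: str,
                 node_type: str = "data", timeout: int = 120,
                 parallel_pack: int = 1,
                 script: str = "/opt/gcinstall_95/replace.py") -> list[str]:
    """Argument list matching the replace.py invocation in the transcript."""
    return [
        script,
        f"--host={host}",
        f"--type={node_type}",
        f"--dbaUser={dba_user}",
        f"--dbaUserPwd={dba_pwd}",
        "--overwrite",
        f"--sync_coordi_metadata_timeout={timeout}",
        f"--parallel_pack={parallel_pack}",
    ]


def replaced_ok(output: str) -> bool:
    """True if replace.py printed its final success line."""
    return "Replace gcluster nodes successfully." in output


def replace_timings(output: str) -> dict[str, int]:
    """Extract the millisecond timings replace.py reports (spelling as printed)."""
    timings = {}
    m = re.search(r"sync dataserver metedata end, spend time (\d+) ms", output)
    if m:
        timings["sync_metadata_ms"] = int(m.group(1))
    m = re.search(r"replace nodes spend time: (\d+) ms", output)
    if m:
        timings["total_ms"] = int(m.group(1))
    return timings


print(" ".join(replace_argv("192.168.2.183", "gbase", "gbase")))

sample_output = """\
sync dataserver metedata end, spend time 86607 ms ...
replace nodes spend time: 119474 ms
Replace gcluster nodes successfully.
"""
print(replace_timings(sample_output))  # → {'sync_metadata_ms': 86607, 'total_ms': 119474}
print(replaced_ok(sample_output))      # → True
```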
[gbase@node1 gcinstall_95]$ gccli -uroot
GBase client 9.5.2.23.119588. Copyright © 2004-2021, GBase. All Rights Reserved.
gbase> select index_name,isReplicate,data_distribution_id,vc_id from gbase.table_distribution;
+-------------------------------+-------------+----------------------+---------+
| index_name                    | isReplicate | data_distribution_id | vc_id   |
+-------------------------------+-------------+----------------------+---------+
| gclusterdb.rebalancing_status | NO          |                    3 | vc00001 |
| testdb.t1                     | NO          |                    2 | vc00001 |
| gclusterdb.dual               | YES         |                    3 | vc00001 |
+-------------------------------+-------------+----------------------+---------+
3 rows in set (Elapsed: 00:00:00.00)
gbase> exit
Bye
[gbase@node1 gcinstall_95]$ gcadmin showdistribution
[gbase@node1 gcinstall_95]$ gccli -uroot
GBase client 9.5.2.23.119588. Copyright © 2004-2021, GBase. All Rights Reserved.
gbase> rebalance instance;
Query OK, 1 row affected (Elapsed: 00:00:01.13)
gbase> select index_name,isReplicate,data_distribution_id,vc_id from gbase.table_distribution;
+-------------------------------+-------------+----------------------+---------+
| index_name                    | isReplicate | data_distribution_id | vc_id   |
+-------------------------------+-------------+----------------------+---------+
| gclusterdb.rebalancing_status | NO          |                    3 | vc00001 |
| testdb.t1                     | NO          |                    2 | vc00001 |
| gclusterdb.dual               | YES         |                    3 | vc00001 |
+-------------------------------+-------------+----------------------+---------+
3 rows in set (Elapsed: 00:00:00.00)
gbase>
gbase> select index_name,status,percentage,distribution_id from gclusterdb.rebalancing_status;
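replace.py advises rebalancing the instance and removing the old distribution only after the rebalance has completed. In the snapshot taken right after `rebalance instance` returned, testdb.t1 was still on distribution 2, which suggests the rebalance runs in the background. The sketch below polls until the rebalance finishes. `fetch_status` is a hypothetical callable you supply: it should run the rebalancing_status query above and return `(index_name, status, percentage)` tuples. Treating the status string `COMPLETED` as "done" is an assumption, so check the values your version reports.

```python
import time
from typing import Callable, Iterable

Row = tuple[str, str, int]  # (index_name, status, percentage)


def wait_for_rebalance(fetch_status: Callable[[], Iterable[Row]],
                       done_status: str = "COMPLETED",  # assumed status value
                       interval: float = 10.0,
                       timeout: float = 3600.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until every row reports `done_status`; return False on timeout.

    Note: an empty result is treated as done.
    """
    deadline = time.monotonic() + timeout
    while True:
        rows = list(fetch_status())
        pending = [name for name, status, _pct in rows if status != done_status]
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        print(f"still rebalancing: {', '.join(pending)}")
        sleep(interval)


# Fake query results standing in for two successive polls.
responses = iter([
    [("testdb.t1", "RUNNING", 40)],
    [("testdb.t1", "COMPLETED", 100)],
])
print(wait_for_rebalance(lambda: next(responses), sleep=lambda s: None))  # → True
```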
8. Drop the old nodedatamap and distribution
gbase> refreshnodedatamap drop 2;
Query OK, 0 rows affected, 5 warnings (Elapsed: 00:00:01.21)
gbase> exit
Bye
[gbase@node1 gcinstall_95]$ gcadmin rmdistribution 2
cluster distribution ID [2]
it will be removed now
please ensure this is ok, input [Y,y] or [N,n]: y
gcadmin remove distribution [2] success
[gbase@node1 gcinstall_95]$
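The cleanup above runs in a fixed order: the old nodedatamap is dropped inside gccli first, and then the distribution is removed with gcadmin. A sketch can build that two-step plan in the order used in the transcript. It refuses to do so while any table in table_distribution still references the old ID. `cleanup_plan` is a hypothetical helper that only returns the commands and does not execute them. The "all tables on distribution 3" state is an assumed end state after rebalancing completes.

```python
def cleanup_plan(old_id: int, table_dist: dict[str, int]) -> list[tuple[str, str]]:
    """Return (where, command) pairs for step 8, refusing while `old_id` is in use."""
    in_use = sorted(name for name, d in table_dist.items() if d == old_id)
    if in_use:
        raise RuntimeError(f"distribution {old_id} still used by: {', '.join(in_use)}")
    return [
        # 1) inside gccli: drop the old node data map
        ("gccli", f"refreshnodedatamap drop {old_id};"),
        # 2) from the shell: remove the old distribution
        ("shell", f"gcadmin rmdistribution {old_id}"),
    ]


# Assumed state once rebalancing has finished: every table on distribution 3.
final_state = {"gclusterdb.rebalancing_status": 3, "gclusterdb.dual": 3, "testdb.t1": 3}
for where, command in cleanup_plan(2, final_state):
    print(f"[{where}] {command}")
```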