Starting with TiDB 4.0, the traditional ansible-playbook approach is no longer used to manage cluster nodes; it has been replaced by the more abstract TiUP component, and the official documentation also recommends using TiUP to deploy, install, and manage TiDB clusters.
First, check the current cluster status with the command tiup cluster display hshclu:
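For reference, TiUP itself is typically installed with the official one-line script, and the cluster component is pulled on first use (a minimal sketch of the setup; the mirror URL is PingCAP's official one, and your environment may differ):
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
tiup cluster list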
tiup cluster display hshclu
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.3.2/tiup-cluster display hshclu
Cluster type: tidb
Cluster name: hshclu
Cluster version: v4.0.8
SSH type: builtin
Dashboard URL:
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
172.1.2.91:9093 alertmanager 172.1.2.91 9093/9094 linux/x86_64 Up /tidb/tidb-data/alertmanager-9093 /tidb/tidb-deploy/alertmanager-9093
172.1.2.91:3000 grafana 172.1.2.91 3000 linux/x86_64 Up - /tidb/tidb-deploy/grafana-3000
172.1.2.101:2379 pd 172.1.2.101 2379/2380 linux/x86_64 Up /tidb/tidb-data/pd-2379 /tidb/tidb-deploy/pd-2379
172.1.2.102:2379 pd 172.1.2.102 2379/2380 linux/x86_64 Up /tidb/tidb-data/pd-2379 /tidb/tidb-deploy/pd-2379
172.1.2.92:2379 pd 172.1.2.92 2379/2380 linux/x86_64 Up|L /tidb/tidb-data/pd-2379 /tidb/tidb-deploy/pd-2379
172.1.2.93:2379 pd 172.1.2.93 2379/2380 linux/x86_64 Up|UI /tidb/tidb-data/pd-2379 /tidb/tidb-deploy/pd-2379
172.1.2.94:2379 pd 172.1.2.94 2379/2380 linux/x86_64 Up /tidb/tidb-data/pd-2379 /tidb/tidb-deploy/pd-2379
172.1.2.91:9090 prometheus 172.1.2.91 9090 linux/x86_64 Up /tidb/tidb-data/prometheus-9090 /tidb/tidb-deploy/prometheus-9090
172.1.2.103:4000 tidb 172.1.2.103 4000/10080 linux/x86_64 Up - /tidb/tidb-deploy/tidb-4000
172.1.2.95:4000 tidb 172.1.2.95 4000/10080 linux/x86_64 Up - /tidb/tidb-deploy/tidb-4000
172.1.2.96:4000 tidb 172.1.2.96 4000/10080 linux/x86_64 Up - /tidb/tidb-deploy/tidb-4000
172.1.2.97:4000 tidb 172.1.2.97 4000/10080 linux/x86_64 Up - /tidb/tidb-deploy/tidb-4000
172.1.2.91:9000 tiflash 172.1.2.91 9000/8123/3930/20170/20292/8234 linux/x86_64 Up /tidb/tidb-data/tiflash-9000 /tidb/tidb-deploy/tiflash-9000
172.1.2.100:20160 tikv 172.1.2.100 20160/20180 linux/x86_64 Up /tidb/tidb-data/tikv-20160 /tidb/tidb-deploy/tikv-20160
172.1.2.104:20160 tikv 172.1.2.104 20160/20180 linux/x86_64 Up /tidb/tidb-data/tikv-20160 /tidb/tidb-deploy/tikv-20160
172.1.2.98:20160 tikv 172.1.2.98 20160/20180 linux/x86_64 Up /tidb/tidb-data/tikv-20160 /tidb/tidb-deploy/tikv-20160
172.1.2.99:20160 tikv 172.1.2.99 20160/20180 linux/x86_64 Up /tidb/tidb-data/tikv-20160 /tidb/tidb-deploy/tikv-20160
Total nodes: 17
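Before scaling in a TiKV node, it can also be useful to check the store states in PD first (a sketch; the pd-ctl invocation syntax varies across tiup versions, and the PD address 172.1.2.92:2379 is taken from the display output above):
tiup ctl pd -u http://172.1.2.92:2379 store
Each store should show state_name "Up" before you start; the store being scaled in will later move through "Offline" to "Tombstone" as PD migrates its regions away.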
To remove a TiKV node server, use a command like tiup cluster scale-in hshclu --node 172.1.2.104:20160:
tiup cluster scale-in hshclu --node 172.1.2.104:20160 -y
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.3.2/tiup-cluster scale-in hshclu --node 172.1.2.104:20160
This operation will delete the 172.1.2.104:20160 nodes in `hshclu` and all their data.
Scale-in nodes...
+ [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/hshclu/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/hshclu/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.100
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.92
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.93
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.94
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.101
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.102
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.98
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.99
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.97
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.104
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.95
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.96
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.103
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [ Serial ] - ClusterOperate: operation=ScaleInOperation, options={Roles:[] Nodes:[172.1.2.104:20160] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:300 IgnoreConfigCheck:false NativeSSH:false SSHType: CleanupData:false CleanupLog:false RetainDataRoles:[] RetainDataNodes:[]}
The component `tikv` will become tombstone, maybe exists in several minutes or hours, after that you can use the prune command to clean it
+ [ Serial ] - UpdateMeta: cluster=hshclu, deleted=`''`
+ [ Serial ] - UpdateTopology: cluster=hshclu
+ Refresh instance configs
- Regenerate config pd -> 172.1.2.92:2379 ... Done
- Regenerate config pd -> 172.1.2.93:2379 ... Done
- Regenerate config pd -> 172.1.2.94:2379 ... Done
- Regenerate config pd -> 172.1.2.101:2379 ... Done
- Regenerate config pd -> 172.1.2.102:2379 ... Done
- Regenerate config tikv -> 172.1.2.98:20160 ... Done
- Regenerate config tikv -> 172.1.2.99:20160 ... Done
- Regenerate config tikv -> 172.1.2.100:20160 ... Done
- Regenerate config tidb -> 172.1.2.95:4000 ... Done
- Regenerate config tidb -> 172.1.2.96:4000 ... Done
- Regenerate config tidb -> 172.1.2.97:4000 ... Done
- Regenerate config tidb -> 172.1.2.103:4000 ... Done
- Regenerate config tiflash -> 172.1.2.91:9000 ... Done
- Regenerate config prometheus -> 172.1.2.91:9090 ... Done
- Regenerate config grafana -> 172.1.2.91:3000 ... Done
- Regenerate config alertmanager -> 172.1.2.91:9093 ... Done
+ [ Serial ] - SystemCtl: host=172.1.2.91 action=reload prometheus-9090.service
Scaled cluster `hshclu` in successfully
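Because a TiKV store holds data, the scale-in does not finish immediately; as the output above notes, the store becomes Tombstone only after PD has migrated its regions away, which can take minutes to hours depending on data volume. Once the store is Tombstone, the leftover files can be cleaned up with the prune command the tool suggests:
tiup cluster prune hshclu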
To remove the PD node servers, use commands like tiup cluster scale-in hshclu --node 172.1.2.101:2379:
tiup cluster scale-in hshclu --node 172.1.2.101:2379
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.3.2/tiup-cluster scale-in hshclu --node 172.1.2.101:2379
This operation will delete the 172.1.2.101:2379 nodes in `hshclu` and all their data.
Scale-in nodes...
+ [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/hshclu/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/hshclu/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.92
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.93
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.94
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.101
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.102
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.98
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.99
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.100
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.104
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.95
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.96
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.97
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.103
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [ Serial ] - ClusterOperate: operation=ScaleInOperation, options={Roles:[] Nodes:[172.1.2.101:2379] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:300 IgnoreConfigCheck:false NativeSSH:false SSHType: CleanupData:false CleanupLog:false RetainDataRoles:[] RetainDataNodes:[]}
Stopping component pd
Stopping instance 172.1.2.101
Stop pd 172.1.2.101:2379 success
Destroying component pd
Destroying instance 172.1.2.101
Destroy 172.1.2.101 success
- Destroy pd paths: [/tidb/tidb-data/pd-2379 /tidb/tidb-deploy/pd-2379/log /tidb/tidb-deploy/pd-2379 /etc/systemd/system/pd-2379.service]
Stopping component node_exporter
Stopping component blackbox_exporter
Destroying monitored 172.1.2.101
Destroying instance 172.1.2.101
Destroy monitored on 172.1.2.101 success
Delete public key 172.1.2.101
Delete public key 172.1.2.101 success
+ [ Serial ] - UpdateMeta: cluster=hshclu, deleted=`'172.1.2.101:2379'`
+ [ Serial ] - UpdateTopology: cluster=hshclu
+ Refresh instance configs
- Regenerate config pd -> 172.1.2.92:2379 ... Done
- Regenerate config pd -> 172.1.2.93:2379 ... Done
- Regenerate config pd -> 172.1.2.94:2379 ... Done
- Regenerate config pd -> 172.1.2.102:2379 ... Done
- Regenerate config tikv -> 172.1.2.98:20160 ... Done
- Regenerate config tikv -> 172.1.2.99:20160 ... Done
- Regenerate config tikv -> 172.1.2.100:20160 ... Done
- Regenerate config tikv -> 172.1.2.104:20160 ... Done
- Regenerate config tidb -> 172.1.2.95:4000 ... Done
- Regenerate config tidb -> 172.1.2.96:4000 ... Done
- Regenerate config tidb -> 172.1.2.97:4000 ... Done
- Regenerate config tidb -> 172.1.2.103:4000 ... Done
- Regenerate config tiflash -> 172.1.2.91:9000 ... Done
- Regenerate config prometheus -> 172.1.2.91:9090 ... Done
- Regenerate config grafana -> 172.1.2.91:3000 ... Done
- Regenerate config alertmanager -> 172.1.2.91:9093 ... Done
+ [ Serial ] - SystemCtl: host=172.1.2.91 action=reload prometheus-9090.service
Scaled cluster `hshclu` in successfully
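After removing a PD node, you can confirm the remaining members (again a sketch using pd-ctl; adjust the invocation to your tiup version):
tiup ctl pd -u http://172.1.2.92:2379 member
The member list should no longer contain 172.1.2.101, and one of the remaining PD instances should hold the leader role.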
tiup cluster scale-in hshclu --node 172.1.2.102:2379 -y
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.3.2/tiup-cluster scale-in hshclu --node 172.1.2.102:2379
This operation will delete the 172.1.2.102:2379 nodes in `hshclu` and all their data.
Scale-in nodes...
+ [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/hshclu/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/hshclu/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.92
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.93
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.94
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.102
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.98
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.99
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.100
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.104
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.95
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.96
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.97
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.103
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [ Serial ] - ClusterOperate: operation=ScaleInOperation, options={Roles:[] Nodes:[172.1.2.102:2379] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:300 IgnoreConfigCheck:false NativeSSH:false SSHType: CleanupData:false CleanupLog:false RetainDataRoles:[] RetainDataNodes:[]}
Stopping component pd
Stopping instance 172.1.2.102
Stop pd 172.1.2.102:2379 success
Destroying component pd
Destroying instance 172.1.2.102
Destroy 172.1.2.102 success
- Destroy pd paths: [/tidb/tidb-data/pd-2379 /tidb/tidb-deploy/pd-2379/log /tidb/tidb-deploy/pd-2379 /etc/systemd/system/pd-2379.service]
Stopping component node_exporter
Stopping component blackbox_exporter
Destroying monitored 172.1.2.102
Destroying instance 172.1.2.102
Destroy monitored on 172.1.2.102 success
Delete public key 172.1.2.102
Delete public key 172.1.2.102 success
+ [ Serial ] - UpdateMeta: cluster=hshclu, deleted=`'172.1.2.102:2379'`
+ [ Serial ] - UpdateTopology: cluster=hshclu
+ Refresh instance configs
- Regenerate config pd -> 172.1.2.92:2379 ... Done
- Regenerate config pd -> 172.1.2.93:2379 ... Done
- Regenerate config pd -> 172.1.2.94:2379 ... Done
- Regenerate config tikv -> 172.1.2.98:20160 ... Done
- Regenerate config tikv -> 172.1.2.99:20160 ... Done
- Regenerate config tikv -> 172.1.2.100:20160 ... Done
- Regenerate config tikv -> 172.1.2.104:20160 ... Done
- Regenerate config tidb -> 172.1.2.95:4000 ... Done
- Regenerate config tidb -> 172.1.2.96:4000 ... Done
- Regenerate config tidb -> 172.1.2.97:4000 ... Done
- Regenerate config tidb -> 172.1.2.103:4000 ... Done
- Regenerate config tiflash -> 172.1.2.91:9000 ... Done
- Regenerate config prometheus -> 172.1.2.91:9090 ... Done
- Regenerate config grafana -> 172.1.2.91:3000 ... Done
- Regenerate config alertmanager -> 172.1.2.91:9093 ... Done
+ [ Serial ] - SystemCtl: host=172.1.2.91 action=reload prometheus-9090.service
Scaled cluster `hshclu` in successfully
To remove a TiDB node server, you can also refer to this article: https://blog.csdn.net/csdnhsh/article/details/115031982
tiup cluster scale-in hshclu --node 172.1.2.103:4000 -y
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.3.2/tiup-cluster scale-in hshclu --node 172.1.2.103:4000
This operation will delete the 172.1.2.103:4000 nodes in `hshclu` and all their data.
Scale-in nodes...
+ [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/hshclu/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/hshclu/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.92
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.93
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.94
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.98
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.99
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.100
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.104
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.95
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.96
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.91
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.97
+ [Parallel] - UserSSH: user=tidb, host=172.1.2.103
+ [ Serial ] - ClusterOperate: operation=ScaleInOperation, options={Roles:[] Nodes:[172.1.2.103:4000] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:300 IgnoreConfigCheck:false NativeSSH:false SSHType: CleanupData:false CleanupLog:false RetainDataRoles:[] RetainDataNodes:[]}
Stopping component tidb
Stopping instance 172.1.2.103
Stop tidb 172.1.2.103:4000 success
Destroying component tidb
Destroying instance 172.1.2.103
Destroy 172.1.2.103 success
- Destroy tidb paths: [/tidb/tidb-deploy/tidb-4000 /etc/systemd/system/tidb-4000.service /tidb/tidb-deploy/tidb-4000/log]
Stopping component node_exporter
Stopping component blackbox_exporter
Destroying monitored 172.1.2.103
Destroying instance 172.1.2.103
Destroy monitored on 172.1.2.103 success
Delete public key 172.1.2.103
Delete public key 172.1.2.103 success
+ [ Serial ] - UpdateMeta: cluster=hshclu, deleted=`'172.1.2.103:4000'`
+ [ Serial ] - UpdateTopology: cluster=hshclu
+ Refresh instance configs
- Regenerate config pd -> 172.1.2.92:2379 ... Done
- Regenerate config pd -> 172.1.2.93:2379 ... Done
- Regenerate config pd -> 172.1.2.94:2379 ... Done
- Regenerate config tikv -> 172.1.2.98:20160 ... Done
- Regenerate config tikv -> 172.1.2.99:20160 ... Done
- Regenerate config tikv -> 172.1.2.100:20160 ... Done
- Regenerate config tikv -> 172.1.2.104:20160 ... Done
- Regenerate config tidb -> 172.1.2.95:4000 ... Done
- Regenerate config tidb -> 172.1.2.96:4000 ... Done
- Regenerate config tidb -> 172.1.2.97:4000 ... Done
- Regenerate config tiflash -> 172.1.2.91:9000 ... Done
- Regenerate config prometheus -> 172.1.2.91:9090 ... Done
- Regenerate config grafana -> 172.1.2.91:3000 ... Done
- Regenerate config alertmanager -> 172.1.2.91:9093 ... Done
+ [ Serial ] - SystemCtl: host=172.1.2.91 action=reload prometheus-9090.service
Scaled cluster `hshclu` in successfully
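TiDB servers are stateless, so unlike TiKV the removal takes effect immediately and no Tombstone phase is involved. You can verify that the remaining TiDB instances still accept connections with any MySQL client (a sketch; the user and password depend on your setup):
mysql -h 172.1.2.95 -P 4000 -u root -p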
Finally, check the cluster status again with the same command, tiup cluster display hshclu:
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.3.2/tiup-cluster display hshclu
Cluster type: tidb
Cluster name: hshclu
Cluster version: v4.0.8
SSH type: builtin
Dashboard URL:
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
172.1.2.91:9093 alertmanager 172.1.2.91 9093/9094 linux/x86_64 Up /tidb/tidb-data/alertmanager-9093 /tidb/tidb-deploy/alertmanager-9093
172.1.2.91:3000 grafana 172.1.2.91 3000 linux/x86_64 Up - /tidb/tidb-deploy/grafana-3000
172.1.2.92:2379 pd 172.1.2.92 2379/2380 linux/x86_64 Up|L /tidb/tidb-data/pd-2379 /tidb/tidb-deploy/pd-2379
172.1.2.93:2379 pd 172.1.2.93 2379/2380 linux/x86_64 Up|UI /tidb/tidb-data/pd-2379 /tidb/tidb-deploy/pd-2379
172.1.2.94:2379 pd 172.1.2.94 2379/2380 linux/x86_64 Up /tidb/tidb-data/pd-2379 /tidb/tidb-deploy/pd-2379
172.1.2.91:9090 prometheus 172.1.2.91 9090 linux/x86_64 Up /tidb/tidb-data/prometheus-9090 /tidb/tidb-deploy/prometheus-9090
172.1.2.95:4000 tidb 172.1.2.95 4000/10080 linux/x86_64 Up - /tidb/tidb-deploy/tidb-4000
172.1.2.96:4000 tidb 172.1.2.96 4000/10080 linux/x86_64 Up - /tidb/tidb-deploy/tidb-4000
172.1.2.97:4000 tidb 172.1.2.97 4000/10080 linux/x86_64 Up - /tidb/tidb-deploy/tidb-4000
172.1.2.91:9000 tiflash 172.1.2.91 9000/8123/3930/20170/20292/8234 linux/x86_64 Up /tidb/tidb-data/tiflash-9000 /tidb/tidb-deploy/tiflash-9000
172.1.2.100:20160 tikv 172.1.2.100 20160/20180 linux/x86_64 Up /tidb/tidb-data/tikv-20160 /tidb/tidb-deploy/tikv-20160
172.1.2.104:20160 tikv 172.1.2.104 20160/20180 linux/x86_64 Tombstone /tidb/tidb-data/tikv-20160 /tidb/tidb-deploy/tikv-20160
172.1.2.98:20160 tikv 172.1.2.98 20160/20180 linux/x86_64 Up /tidb/tidb-data/tikv-20160 /tidb/tidb-deploy/tikv-20160
172.1.2.99:20160 tikv 172.1.2.99 20160/20180 linux/x86_64 Up /tidb/tidb-data/tikv-20160 /tidb/tidb-deploy/tikv-20160
Total nodes: 14
There are some nodes can be pruned:
Nodes: [172.1.2.104:20160]
You can destroy them with the command: `tiup cluster prune hshclu`
You may be wondering why there is still a Tombstone row. I reran tiup cluster display hshclu several times and the Tombstone record stubbornly persisted. This is actually expected: unlike PD and TiDB, a TiKV store holds data, so its scale-in is asynchronous. PD first migrates all regions off the store, marks it Tombstone once the data is safely relocated, and only then can the node be cleaned up with the prune command suggested in the output above.
172.1.2.104:20160 tikv 172.1.2.104 20160/20180 linux/x86_64 Tombstone /tidb/tidb-data/tikv-20160 /tidb/tidb-deploy/tikv-20160
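To make the Tombstone row disappear, run the prune command from the output; it destroys the tombstoned instance and removes it from the topology:
tiup cluster prune hshclu
After that, tiup cluster display hshclu should report 13 nodes with no Tombstone entries.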