How to rejoin a node to the cluster after accidentally deleting it

What happened: I only meant to remove a label from node01, but I ran the wrong command and node01 vanished from the cluster (the correct label-removal command is shown after the output below).

[root@master test-yaml]# kubectl delete node node01 disk=ssd 
node "node01" deleted
Error from server (NotFound): nodes "disk=ssd" not found
[root@master test-yaml]# kubectl get node
NAME     STATUS   ROLES           AGE   VERSION
master   Ready    control-plane   13d   v1.30.0
node02   Ready    <none>          13d   v1.30.0
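
For reference: kubectl delete node treats every argument as a node name, so node01 was deleted and disk=ssd was reported as a nonexistent node. Removing a label is done with kubectl label and a trailing dash on the key. A minimal sketch, assuming disk was the label key involved:

# Remove the "disk" label from node01 (the trailing "-" deletes the key)
kubectl label node node01 disk-

# To list nodes by label instead, use a selector:
kubectl get node -l disk=ssd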

Recovery procedure

1. Clean up the old configuration data on node01

# 1. Stop the relevant services first
[root@node01 ~]# systemctl stop kubelet 
[root@node01 ~]# systemctl stop docker
[root@node01 ~]# systemctl stop cri-docker

# 2. Remove the old configuration files
[root@node01 ~]# rm -rf /var/lib/cni/
[root@node01 ~]# rm -rf /var/lib/kubelet/
[root@node01 ~]# rm -rf /etc/cni/
[root@node01 ~]# rm -rf /etc/kubernetes/

# 3. Restart the services
[root@node01 ~]# systemctl start kubelet 
[root@node01 ~]# systemctl start docker
[root@node01 ~]# systemctl start cri-docker
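
An alternative to deleting these directories by hand is kubeadm reset, which performs an equivalent teardown; a sketch, assuming the same cri-dockerd socket used in the join command below:

# Tear down the kubeadm-generated state on this node
kubeadm reset -f --cri-socket=unix:///var/run/cri-dockerd.sock
# kubeadm reset does not remove CNI configuration; clean it up separately if needed
rm -rf /etc/cni/net.d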

2. Generate a new join token on the master node

[root@master ~]# kubeadm token create --print-join-command
kubeadm join 192.168.0.11:6443 --token 4dx9gu.sb95v5mqq3an77ns --discovery-token-ca-cert-hash sha256:1ff346f4ddd8de598cc6998148d2856b5c5aff4c5ba401796eb772b2c936057 

[root@master test-yaml]# kubeadm token list
TOKEN                     TTL         EXPIRES                USAGES                   DESCRIPTION                                                EXTRA GROUPS
4dx9gu.sb95v5mqq3an77ns   23h         2024-07-26T06:51:02Z   authentication,signing   <none>                                                     system:bootstrappers:kubeadm:default-node-token
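
If only the CA certificate hash is needed (for example, to reuse an existing token), it can be recomputed on the master with the openssl pipeline from the kubeadm documentation:

# Recompute the --discovery-token-ca-cert-hash value from the cluster CA
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex \
  | sed 's/^.* //'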

3. Run the join command on node01

Because this cluster runs Kubernetes 1.30 with Docker through cri-dockerd (dockershim was removed from Kubernetes in 1.24), the join command needs the extra --cri-socket=unix:///var/run/cri-dockerd.sock flag so kubeadm knows which container runtime socket to use.

[root@node01 ~]# kubeadm join 192.168.0.11:6443 --token 30vo5d.q0jmlzkvzorx8drq --discovery-token-ca-cert-hash sha256:1ff346f4ddd8de598cc6998148d2856b5c5aff4c5ba401796eb772b2c9360571 --cri-socket=unix:///var/run/cri-dockerd.sock
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.078678ms
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

[root@node01 ~]# 
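On node01 itself, a quick sanity check confirms that the join recreated the kubelet configuration and that the kubelet is active (these are the standard kubeadm file locations):

# kubelet should be running again and the kubeadm-managed files back in place
systemctl is-active kubelet
ls /etc/kubernetes/kubelet.conf /etc/kubernetes/pki/ca.crt /var/lib/kubelet/config.yaml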

4. Verify the cluster status

[root@master ~]# kubectl get node -A
NAME     STATUS     ROLES           AGE   VERSION
master   Ready      control-plane   13d   v1.30.0
node01   NotReady   <none>          25s   v1.30.0
node02   Ready      <none>          13d   v1.30.0
[root@master ~]# kubectl get node -o wide 
NAME     STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                 CONTAINER-RUNTIME
master   Ready    control-plane   13d   v1.30.0   192.168.0.11   <none>        CentOS Linux 7 (Core)   3.10.0-1160.119.1.el7.x86_64   docker://26.1.4
node01   Ready    <none>          31s   v1.30.0   192.168.0.12   <none>        CentOS Linux 7 (Core)   3.10.0-1160.119.1.el7.x86_64   docker://26.1.4
node02   Ready    <none>          13d   v1.30.0   192.168.0.13   <none>        CentOS Linux 7 (Core)   3.10.0-1160.119.1.el7.x86_64   docker://26.1.4
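
A node usually shows NotReady for a short while after joining, until the CNI and kube-proxy pods are running on it. If it stays NotReady, these checks usually point to the cause (pod names depend on the CNI plugin in use):

# Inspect the node's conditions and recent events
kubectl describe node node01

# Check the kube-system pods scheduled on node01 (CNI, kube-proxy)
kubectl get pods -n kube-system -o wide --field-selector spec.nodeName=node01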
