Problem:
While upgrading or performing maintenance on a master node, we removed k8s-master02 from the cluster. After finishing the work, rejoining the node to the cluster failed with the following error:
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://192.168.XXX.XXX:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
Solution:
The join fails because the etcd cluster still lists the removed node as a member, so kubeadm's check-etcd phase times out dialing the dead endpoint; the stale member has to be removed first.
On one of the existing master nodes, check whether an etcd pod for k8s-master02 is still present:
$ kubectl get pods -n kube-system | grep etcd
etcd-k8s-master01 1/1 Running 0
etcd-k8s-master03 1/1 Running 0
As shown, there is no etcd pod for k8s-master02.
Exec into one of the remaining etcd pods:
$ kubectl exec -it etcd-k8s-master01 -n kube-system -- sh
Inside the container, run the following commands:
# Use the v3 etcdctl API
$ export ETCDCTL_API=3
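Since the same TLS flags are repeated in every etcdctl call below, a small wrapper function can keep the commands readable. This is just a convenience sketch; it assumes the standard kubeadm stacked-etcd certificate paths inside the etcd pod, the same ones used throughout this walkthrough:

```shell
# Wrapper around etcdctl with the TLS flags used in this walkthrough.
# Assumes the standard kubeadm certificate paths inside the etcd pod.
etcdctl_k8s() {
  ETCDCTL_API=3 etcdctl \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    "$@"
}
# Usage (inside the etcd container): etcdctl_k8s member list
```

With this defined, the long commands below shorten to `etcdctl_k8s member list` and `etcdctl_k8s member remove <ID>`; the full forms are kept below so they can be pasted as-is.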
# List the etcd cluster members
$ etcdctl --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" member list
5c464663bfe0fb08, started, k8s-master01, https://192.168.XX.X1:2380, https://192.168.XX.X1:2379, false
8164c6e8e41efd3d, started, k8s-master02, https://192.168.XX.X2:2380, https://192.168.XX.X2:2379, false
bd53c1cba61d0cb6, started, k8s-master03, https://192.168.XX.X3:2380, https://192.168.XX.X3:2379, false
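Rather than copying the member ID by eye, it can be extracted from the member list by node name. A minimal sketch, here run against a captured copy of the output above (inside the etcd pod you would pipe the etcdctl output directly instead of using a variable):

```shell
# Assumption: MEMBER_LIST holds the output of `etcdctl ... member list`.
MEMBER_LIST='5c464663bfe0fb08, started, k8s-master01, https://192.168.XX.X1:2380, https://192.168.XX.X1:2379, false
8164c6e8e41efd3d, started, k8s-master02, https://192.168.XX.X2:2380, https://192.168.XX.X2:2379, false
bd53c1cba61d0cb6, started, k8s-master03, https://192.168.XX.X3:2380, https://192.168.XX.X3:2379, false'

# Field 1 is the member ID, field 3 is the node name.
MEMBER_ID=$(printf '%s\n' "$MEMBER_LIST" | awk -F', ' '$3 == "k8s-master02" {print $1}')
echo "$MEMBER_ID"   # -> 8164c6e8e41efd3d
```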
# Remove the stale etcd member k8s-master02 (the ID from the first column)
$ etcdctl --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" member remove 8164c6e8e41efd3d
# Output like the following confirms the removal succeeded:
Member 8164c6e8e41efd3d removed from cluster ee7981bace12ae411
# List the members again to confirm k8s-master02 is gone
$ etcdctl --cacert="/etc/kubernetes/pki/etcd/ca.crt" --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key" member list
5c464663bfe0fb08, started, k8s-master01, https://192.168.XX.X1:2380, https://192.168.XX.X1:2379, false
bd53c1cba61d0cb6, started, k8s-master03, https://192.168.XX.X3:2380, https://192.168.XX.X3:2379, false
# Exit the container
$ exit
Back on the k8s-master02 node, reset the old kubeadm state:
$ kubeadm reset -f
Then run the kubeadm join command (with the token and control-plane flags) again:
kubeadm join k8sapi-proxy:6443 --token rr2un6.y9aejpnotfnram5e --discovery-token-ca-cert-hash sha256:a42ec9fe9539b68dd8776bba62575c5aafae7e6c19d353c \
--control-plane --certificate-key 81c969ce86347970c94799880b900c36aa95068fff668aa18441ac2
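As an aside, the sha256 value passed to --discovery-token-ca-cert-hash is just the SHA-256 digest of the cluster CA's public key, so it can be recomputed from /etc/kubernetes/pki/ca.crt on any master node if the original join command has been lost. The sketch below demonstrates the derivation on a throwaway self-signed certificate so it is runnable anywhere; on a real cluster you would point `-in` at the actual ca.crt:

```shell
# Demonstration only: compute a --discovery-token-ca-cert-hash value from a
# throwaway self-signed CA. On a real cluster, use /etc/kubernetes/pki/ca.crt.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" -days 1 \
  -keyout "$tmp/ca.key" -out "$tmp/ca.crt" 2>/dev/null
ca_hash=$(openssl x509 -pubkey -in "$tmp/ca.crt" \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex \
  | sed 's/^.* //')
echo "sha256:$ca_hash"
rm -rf "$tmp"
```

If the token itself has expired (the default TTL is 24 hours), a fresh one can be generated on an existing master with `kubeadm token create --print-join-command`, and a fresh --certificate-key (which expires after 2 hours) with `kubeadm init phase upload-certs --upload-certs`.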
If the join succeeds, you will see a message like the following:
This node has joined the cluster and a new control plane instance was created:
* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.
To start administering your cluster from this node, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Run 'kubectl get nodes' to see this node join the cluster.
As prompted, run these three commands on k8s-master02:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Then, on any master node, list all the nodes:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 57m v1.18.0
k8s-master1 Ready master 21m v1.18.0
k8s-master2 Ready master 21m v1.18.0
k8s-node1 Ready <none> 56m v1.18.0
k8s-node2 Ready <none> 56m v1.18.0
k8s-node3 Ready <none> 56m v1.18.0
k8s-node4 Ready <none> 56m v1.18.0
k8s-node5 Ready <none> 56m v1.18.0
The master node has now been successfully re-added to the cluster.