kubernets将一个master控制节点踢出后重新加入集群发现报错
failed to dial endpoint https://10.0.1.81:2379 with maintenance client: context deadline exceeded
etcd cluster is not healthy
报错结果显示etcd集群运行不正常,其实在kubeadm join加入集群时打开 -v=5 选项,提高日志等级就可以发现,提示刚刚踢出的master节点etcd无法连接。原因是将master节点踢出后,etcd集群并未将其踢出etcd集群,需要我们手动进行移除。
下载etcdctl命令行工具
export RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest|grep tag_name | cut -d '"' -f 4)
wget https://github.com/etcd-io/etcd/releases/download/${RELEASE}/etcd-${RELEASE}-linux-amd64.tar.gz
tar xvf etcd-${RELEASE}-linux-amd64.tar.gz
cd etcd-${RELEASE}-linux-amd64
sudo mv etcd etcdctl etcdutl /usr/local/bin
随便登陆一台现有的master机器并查看现有的etcd集群列表,果然其中还有那台已经退出集群的服务器10.0.1.81:2379
etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
12637f5ec2bd02b8, started, blog-k8s-n0, https://10.0.1.81:2380, https://10.0.1.81:2379, false
17d58f8d29164d23, started, k8s-master1, https://10.0.1.32:2380, https://10.0.1.32:2379, false
19441808830db070, started, k8s-master2, https://10.0.1.33:2380, https://10.0.1.33:2379, false
现在将这台服务器的etcd从集群中移除
etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 12637f5ec2bd02b8
移除之后,新节点就可以正常加入集群了