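Root cause: during server maintenance, one machine was shut down for a long time. After it was powered back on, neither etcd nor the api-server on it would start.
Checking the etcd cluster status showed that one member could not be reached.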
[root@master prometheus]# kubectl exec -it etcd-master2 -n kube-system bin/sh
# /usr/local/bin/etcdctl --ca-file=/etc/kubernetes/pki/etcd/ca.crt --cert-file=/etc/kubernetes/pki/etcd/server.crt --key-file=/etc/kubernetes/pki/etcd/server.key --endpoints=https://172.31.17.51:2379 cluster-health
cluster may be unhealthy: failed to list members
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 172.31.17.51:2379: connect: connection refused
error #0: dial tcp 172.31.17.51:2379: connect: connection refused
# /usr/local/bin/etcdctl --ca-file=/etc/kubernetes/pki/etcd/ca.crt --cert-file=/etc/kubernetes/pki/etcd/server.crt --key-file=/etc/kubernetes/pki/etcd/server.key --endpoints=https://172.31.17.56:2379 cluster-health
member a7660c1c5ea85750 is healthy: got healthy result from https://172.31.17.56:2379
member bfee443ebe27f676 is healthy: got healthy result from https://172.31.17.57:2379
failed to check the health of member ca24dc1ff29d5b69 on https://172.31.17.51:2379: Get https://172.31.17.51:2379/health: dial tcp 172.31.17.51:2379: connect: connection refused
member ca24dc1ff29d5b69 is unreachable: [https://172.31.17.51:2379] are all unreachable
cluster is degraded
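When the cluster has more than a few members, it can help to pull the failing member IDs out of the cluster-health output rather than reading it by eye. The helper below is only a sketch: the function name unreachable_members is made up for this note, and it assumes the etcdctl v2 output format shown above ("member <id> is unreachable: ...").

```shell
# Hypothetical helper: print the IDs of members that `etcdctl cluster-health`
# (v2 API output format, as in the transcript above) reports as unreachable.
# Reads the cluster-health output on stdin, prints one member ID per line.
unreachable_members() {
  awk '$1 == "member" && $3 == "is" && $4 == "unreachable:" { print $2 }'
}

# Example (run inside the etcd container, flags as in the transcript above):
#   /usr/local/bin/etcdctl <tls flags> --endpoints=https://172.31.17.56:2379 \
#       cluster-health | unreachable_members
```

For the output above, this would print only the ID of the member on 172.31.17.51.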
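Remove the master node, then rejoin it
1. On the faulty master, run kubeadm reset first. It is also a good idea to flush the iptables rules with iptables -F.
2. On another master node, generate a token.
# Generate the token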
[root@master2 ~]# kubeadm token create --print-join-command
kubeadm join 172.31.17.49:9443 --token kjjguy.pmqxvb1nmgf1nq4q --discovery-token-ca-cert-hash sha256:dcadd5b87024c304e5e396ba06d60a4dbf36509a627a6a949c126172e9c61cfb
# Generate the certificate key
[root@master2 ~]# kubeadm init phase upload-certs --upload-certs
W0805 14:41:18.070434 16460 version.go:101] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://storage.googleapis.com/kubernetes-release/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
W0805 14:41:18.070565 16460 version.go:102] falling back to the local client version: v1.16.2
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
99f1e662cb82630fa41b3cab0d6f40f930af22ef2549b5539fa22dff8db1a2db
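With both outputs in hand, combine them: append --control-plane --certificate-key <key> to the join command printed earlier, then run the result on the node that was just removed so that it rejoins the cluster.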
kubeadm join 172.31.17.49:9443 --token kjjguy.pmqxvb1nmgf1nq4q --discovery-token-ca-cert-hash sha256:dcadd5b87024c304e5e396ba06d60a4dbf36509a627a6a949c126172e9c61cfb --control-plane --certificate-key 99f1e662cb82630fa41b3cab0d6f40f930af22ef2549b5539fa22dff8db1a2db
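The manual concatenation above can be sketched as a small shell function, which avoids copy-paste slips such as a missing dash in --control-plane. The name build_control_plane_join is made up for this note. The commented usage lines also assume that the certificate key is the last line kubeadm prints on stdout, as in the transcript above. Check that against your kubeadm version before relying on it.

```shell
# Hypothetical helper: build the control-plane join command from
#   $1 = the output of `kubeadm token create --print-join-command`
#   $2 = the certificate key printed by `kubeadm init phase upload-certs --upload-certs`
build_control_plane_join() {
  # Strip trailing whitespace that the printed join command may carry.
  join_cmd=$(printf '%s' "$1" | sed 's/[[:space:]]*$//')
  cert_key=$2
  printf '%s --control-plane --certificate-key %s\n' "$join_cmd" "$cert_key"
}

# Usage on a healthy master (assumption: the key is the last stdout line):
#   join_cmd=$(kubeadm token create --print-join-command)
#   cert_key=$(kubeadm init phase upload-certs --upload-certs 2>/dev/null | tail -n 1)
#   build_control_plane_join "$join_cmd" "$cert_key"
```

The printed line is then run on the node that was reset. Both the token and the uploaded certificates expire, so the command should be generated shortly before it is used.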