This note describes how to repair a faulty node in an etcd cluster.
- Check the health of the cluster members:
etcdctl --endpoints=https://172.19.121.60:2379 \
--ca-file=/opt/kubernetes/ssl/ca.pem \
--cert-file=/opt/kubernetes/ssl/etcd.pem \
--key-file=/opt/kubernetes/ssl/etcd-key.pem cluster-health
- The output looks like this:
member 179d106ba852032b is healthy: got healthy result from https://172.19.121.62:2379
member 1f4b42d355aaf7b9 is healthy: got healthy result from https://172.19.121.60:2379
member dcab8e2ed4e917fd is unreachable: no available published client urls
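The ID of the faulty member can be pulled out of this output rather than copied by hand. The sketch below assumes the output format shown above; `unhealthy_members` is a hypothetical helper name, not part of etcd.

```shell
# Hypothetical helper: print the IDs of members that cluster-health
# does not report as healthy. Reads cluster-health output on stdin.
# Assumes lines of the form "member <id> is <state>: ...".
unhealthy_members() {
  awk '$1 == "member" && $3 == "is" && $4 != "healthy:" { print $2 }'
}

# Example with the output shown above:
unhealthy_members <<'EOF'
member 179d106ba852032b is healthy: got healthy result from https://172.19.121.62:2379
member 1f4b42d355aaf7b9 is healthy: got healthy result from https://172.19.121.60:2379
member dcab8e2ed4e917fd is unreachable: no available published client urls
EOF
# prints: dcab8e2ed4e917fd
```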
- Remove the faulty member:
etcdctl --endpoints=https://172.19.121.60:2379 \
--ca-file=/opt/kubernetes/ssl/ca.pem \
--cert-file=/opt/kubernetes/ssl/etcd.pem \
--key-file=/opt/kubernetes/ssl/etcd-key.pem \
member remove dcab8e2ed4e917fd
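Every etcdctl call in this note repeats the same endpoint and TLS flags. One way to avoid retyping them is a small wrapper function; `etcdctl_tls` is a hypothetical name chosen for this sketch, and the paths are the ones used in this note.

```shell
# Hypothetical wrapper: run etcdctl with the endpoint and TLS flags
# used throughout this note, passing any further arguments through.
etcdctl_tls() {
  etcdctl --endpoints=https://172.19.121.60:2379 \
    --ca-file=/opt/kubernetes/ssl/ca.pem \
    --cert-file=/opt/kubernetes/ssl/etcd.pem \
    --key-file=/opt/kubernetes/ssl/etcd-key.pem "$@"
}

# Usage (not executed here):
#   etcdctl_tls cluster-health
#   etcdctl_tls member remove dcab8e2ed4e917fd
```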
- Check and edit the configuration on the faulty node:
# vim /etc/etcd/etcd.conf
ETCD_INITIAL_CLUSTER_STATE="new"
change it to
ETCD_INITIAL_CLUSTER_STATE="existing"
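The same change can be made without opening an editor. This is a minimal sketch, assuming the setting sits on its own line in the file; `set_cluster_state_existing` is a hypothetical helper name.

```shell
# Hypothetical helper: set ETCD_INITIAL_CLUSTER_STATE to "existing"
# in the given etcd config file. A backup is kept as <file>.bak.
set_cluster_state_existing() {
  conf="$1"
  sed -i.bak \
    's/^ETCD_INITIAL_CLUSTER_STATE=.*/ETCD_INITIAL_CLUSTER_STATE="existing"/' \
    "$conf"
}
```

Calling it as `set_cluster_state_existing /etc/etcd/etcd.conf` would rewrite the line in place and leave the previous version in `/etc/etcd/etcd.conf.bak`.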
- Delete the etcd data on the faulty node:
rm -fr /var/lib/etcd/*
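- Add the node back to the cluster:
- Use port 2380 in the member URL, not 2379. Port 2379 is the client (data) port; 2380 is the peer port that members use to communicate with each other. Registering the member with port 2379 is a mistake.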
etcdctl --endpoints=https://172.19.121.60:2379 \
--ca-file=/opt/kubernetes/ssl/ca.pem \
--cert-file=/opt/kubernetes/ssl/etcd.pem \
--key-file=/opt/kubernetes/ssl/etcd-key.pem \
member add etcd-node2 https://172.19.121.61:2380
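Because mixing up the two ports is an easy mistake, a small guard can check the URL before it is passed to `member add`. This sketch assumes the cluster uses the default peer port 2380, as in this note; `is_peer_url` is a hypothetical helper name.

```shell
# Hypothetical guard: succeed only if the URL ends in the peer port 2380.
# Assumes the default peer port, as used in this note.
is_peer_url() {
  case "$1" in
    *:2380) return 0 ;;
    *)      return 1 ;;
  esac
}

is_peer_url https://172.19.121.61:2380 && echo "ok: peer port"
is_peer_url https://172.19.121.61:2379 || echo "refused: not the peer port"
```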
- Start etcd on the node:
systemctl start etcd
- Check the cluster health again:
etcdctl --endpoints=https://172.19.121.60:2379 \
--ca-file=/opt/kubernetes/ssl/ca.pem \
--cert-file=/opt/kubernetes/ssl/etcd.pem \
--key-file=/opt/kubernetes/ssl/etcd-key.pem cluster-health
- Output like the following means the cluster is healthy:
member 179d106ba852032b is healthy: got healthy result from https://172.19.121.62:2379
member 1f4b42d355aaf7b9 is healthy: got healthy result from https://172.19.121.60:2379
member 793e602a959dabe5 is healthy: got healthy result from https://172.19.121.61:2379
cluster is healthy
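This final check can also be scripted. The sketch below assumes the output format shown above and treats the cluster as repaired only when the expected number of members report healthy and the last line reads "cluster is healthy"; `cluster_ok` is a hypothetical helper name.

```shell
# Hypothetical check: read cluster-health output on stdin and succeed
# only if exactly <expected> members are healthy and the last
# non-empty line is "cluster is healthy".
cluster_ok() {
  expected="$1"
  awk -v n="$expected" '
    $1 == "member" && $4 == "healthy:" { healthy++ }
    NF { last = $0 }
    END { exit !((healthy + 0) == (n + 0) && last == "cluster is healthy") }
  '
}
```

Piping the cluster-health output into `cluster_ok 3` would succeed for the three-member output shown above and fail if any member were missing or unreachable.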