1. etcd cluster fails to start
[root@K8S1 work]# systemctl start etcd
Job for etcd.service failed because the control process exited with error code.
See "systemctl status etcd.service" and "journalctl -xe" for details.
Error:
/data/k8s/bin/etcd --config-file=/data/k8s/etcd/etcd.config.yml
{"level":"warn","ts":1721638579.0741978,"caller":"fileutil/fileutil.go:57",
"msg":"check file permission","error":"directory \"/data/k8s/etcd/data\" exist,
but the permission is \"drwxr-xr-x\". The recommended permission is \"
-rwx------\" to prevent possible unprivileged access to the data"}
panic: freepages: failed to get all reachable pages
(page 3486174660924946226: out of bounds: 2630)
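The permission message above is logged at level "warn" only; the panic on the following line is what stops the process, and it typically points to a damaged db file rather than to the directory mode. To clear the warning anyway, a minimal sketch (the helper name tighten_data_dir is hypothetical; the default path is the one from the log):

```shell
# Hypothetical helper: set an etcd data directory to mode 700 (rwx------).
# The default path is taken from the log above; pass another path as $1.
tighten_data_dir() {
    dir="${1:-/data/k8s/etcd/data}"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    chmod 700 "$dir" && ls -ld "$dir"   # show the resulting mode
}
```

Run it as `tighten_data_dir` on each node, as the user that owns the etcd data directory.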
-- Error:
Error: snapshot missing hash but --skip-hash-check=false
On member01, etcd fails to find the database snapshot file (snap: snapshot file doesn't exist).
member02 and member03 both report the same error: freepages: failed to get all reachable pages.
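The "snapshot missing hash" error is what a restore reports when the file was copied out of a data directory instead of being produced by `etcdctl snapshot save`; such a file carries no integrity hash and restores only with `--skip-hash-check=true`. Before restoring, it can help to inspect the backup file first. A minimal sketch, assuming the etcdutl path used in these notes (the helper name check_snapshot and the ETCDUTL variable are hypothetical):

```shell
# ETCDUTL defaults to the binary path used in these notes; override if needed.
ETCDUTL="${ETCDUTL:-/data/k8s/bin/etcdutl}"

# Hypothetical helper: refuse missing or empty files, then print snapshot details.
check_snapshot() {
    snap="$1"
    if [ ! -s "$snap" ]; then
        echo "missing or empty snapshot: $snap" >&2
        return 1
    fi
    # Prints hash, revision, total keys and total size of the snapshot file.
    "$ETCDUTL" snapshot status "$snap" -w table
}
```

For example: `check_snapshot /data/etcd_backup/etcd12-backup-20240720.db`.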
2. Check the etcd snapshot
[root@K8S1 snap]# cd /data/k8s/etcd/data
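-- Back up the member directory that fails to start.
mv member ../
On all three nodes, move the db file out to another location.
Stop the etcd service on all three nodes.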
systemctl stop etcd
-- Keep the data directory empty
[root@K8S1 data]# ll
total 0
[root@K8S1 data]# pwd
/data/k8s/etcd/data
[root@K8S2 data]# ll
total 0
[root@K8S2 data]# pwd
/data/k8s/etcd/data
[root@K8S3 data]# pwd
/data/k8s/etcd/data
[root@K8S3 data]# ll
total 0
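The move-aside and the empty-directory check above can be combined into one step per node. A minimal sketch, assuming etcd is already stopped on that node (the helper name set_aside_member and the timestamped backup name are hypothetical):

```shell
# Hypothetical helper: move a broken member directory aside, then confirm
# that the data directory is empty.
#   $1 = data directory (default: the path used in these notes)
#   $2 = destination for the old member directory (default: timestamped sibling)
set_aside_member() {
    data_dir="${1:-/data/k8s/etcd/data}"
    backup_dir="${2:-$(dirname "$data_dir")/member.broken.$(date +%Y%m%d%H%M%S)}"
    if [ -d "$data_dir/member" ]; then
        mv "$data_dir/member" "$backup_dir" || return 1
    fi
    # Succeeds only when nothing (dotfiles included) is left in the data directory.
    [ -z "$(ls -A "$data_dir")" ]
}
```

A non-zero exit status means something is still left in the data directory and should be looked at before restoring.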
3. Restore from backup
-- Restore node 1.
/data/k8s/bin/etcdutl snapshot restore /data/etcd_backup/etcd12-backup-20240720.db \
--data-dir=/data/k8s/etcd/data \
--name K8S1 \
--initial-cluster "K8S1=https://192.168.1.12:2380,K8S2=https://192.168.1.13:2380,K8S3=https://192.168.1.14:2380" \
--initial-cluster-token etcd-k8s-cluster \
--initial-advertise-peer-urls https://192.168.1.12:2380
-- Restore node 2.
/data/k8s/bin/etcdutl snapshot restore /data/etcd_backup/etcd13-backup-20240720.db \
--data-dir=/data/k8s/etcd/data \
--name K8S2 \
--initial-cluster "K8S1=https://192.168.1.12:2380,K8S2=https://192.168.1.13:2380,K8S3=https://192.168.1.14:2380" \
--initial-cluster-token etcd-k8s-cluster \
--initial-advertise-peer-urls https://192.168.1.13:2380
-- Restore node 3.
/data/k8s/bin/etcdutl snapshot restore /data/etcd_backup/etcd14-backup-20240720.db \
--data-dir=/data/k8s/etcd/data \
--name K8S3 \
--initial-cluster "K8S1=https://192.168.1.12:2380,K8S2=https://192.168.1.13:2380,K8S3=https://192.168.1.14:2380" \
--initial-cluster-token etcd-k8s-cluster \
--initial-advertise-peer-urls https://192.168.1.14:2380
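The three restore commands differ only in the snapshot file, --name, and the advertised peer URL. A minimal sketch that prints the command for a given node so the shared flags stay identical everywhere (the helper name restore_cmd is hypothetical; paths, token, and addresses are the ones used above):

```shell
# Shared across all members; must be identical on every node.
INITIAL_CLUSTER="K8S1=https://192.168.1.12:2380,K8S2=https://192.168.1.13:2380,K8S3=https://192.168.1.14:2380"

# Hypothetical helper: print (not run) the restore command for one member.
#   $1 = member name, $2 = member IP, $3 = snapshot file
restore_cmd() {
    name="$1"; ip="$2"; snapshot="$3"
    printf '%s snapshot restore %s --data-dir=%s --name %s --initial-cluster "%s" --initial-cluster-token %s --initial-advertise-peer-urls https://%s:2380\n' \
        /data/k8s/bin/etcdutl "$snapshot" /data/k8s/etcd/data "$name" \
        "$INITIAL_CLUSTER" etcd-k8s-cluster "$ip"
}

# Example: print the command for node 2 with its backup file from these notes.
restore_cmd K8S2 192.168.1.13 /data/etcd_backup/etcd13-backup-20240720.db
```

The etcd disaster recovery guide advises restoring all members from the same snapshot file. The notes above use one backup file per node, so it may be safer to copy a single file to all three nodes and pass that same path each time.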
4. Start etcd and check its status
[root@K8S1 work]# ETCDCTL_API=3 /data/k8s/bin/etcdctl \
> -w table --cacert=/etc/kubernetes/cert/ca.pem \
> --cert=/etc/kubernetes/cert/etcd/etcd.pem \
> --key=/etc/kubernetes/cert/etcd/etcd-key.pem \
> --endpoints=${ETCD_ENDPOINTS} member list
+------------------+---------+------+---------------------------+---------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+------+---------------------------+---------------------------+------------+
| ac7e57d44f030e8 | started | K8S1 | https://192.168.1.12:2380 | https://192.168.1.12:2379 | false |
| 40ba37809e1a423f | started | K8S2 | https://192.168.1.13:2380 | https://192.168.1.13:2379 | false |
| ef1e393cbeded112 | started | K8S3 | https://192.168.1.14:2380 | https://192.168.1.14:2379 | false |
+------------------+---------+------+---------------------------+---------------------------+------------+
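The member list only shows membership. To confirm that each endpoint is actually serving, the same TLS flags can be reused for a health and status check after `systemctl start etcd` has been run on all three nodes. A minimal sketch (the helper name etcd_check and the ETCDCTL variable are hypothetical; the ETCD_ENDPOINTS default is an assumption built from the client addresses in the table, since the session does not show how that variable was set):

```shell
export ETCDCTL_API=3
ETCDCTL="${ETCDCTL:-/data/k8s/bin/etcdctl}"
# Assumed value, derived from the CLIENT ADDRS column above.
ETCD_ENDPOINTS="${ETCD_ENDPOINTS:-https://192.168.1.12:2379,https://192.168.1.13:2379,https://192.168.1.14:2379}"

# Hypothetical helper: run any etcdctl subcommand with the cluster's TLS flags.
etcd_check() {
    "$ETCDCTL" -w table \
        --cacert=/etc/kubernetes/cert/ca.pem \
        --cert=/etc/kubernetes/cert/etcd/etcd.pem \
        --key=/etc/kubernetes/cert/etcd/etcd-key.pem \
        --endpoints="$ETCD_ENDPOINTS" "$@"
}
```

Usage: `etcd_check endpoint health` reports whether each endpoint answers, and `etcd_check endpoint status` shows the leader, raft term, and db size per member.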