etcd Disaster Recovery
Environment:
Component | Version |
---|---|
kubelet | v1.25.2 |
kubectl | v1.25.2 |
kubeadm | v1.25.2 |
containerd | 1.6.12 |
etcd | 3.5.5-0 |
Node info:
root@k8s-worker-2:/etc/kubernetes# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-worker-2 Ready control-plane 22h v1.25.2 100.86.40.215 <none> Ubuntu 18.04.6 LTS 4.15.0-175-generic containerd://1.6.12
1. Create a Deployment resource
root@k8s-worker-2:~# kubectl create deployment nginx --image=nginx --replicas=3
deployment.apps/nginx created
2. Take an etcd snapshot
root@k8s-worker-2:/etc/kubernetes# kubectl exec -it -n kube-system etcd-k8s-worker-2 -- etcdctl --endpoints 100.86.40.215:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt snapshot save /etc/kubernetes/pki/etcd/etcd-snapshot-2022-12-13.db
{"level":"info","ts":"2022-12-13T08:19:08.196Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/etc/kubernetes/pki/etcd/etcd-snapshot-2022-12-13.db.part"}
{"level":"info","ts":"2022-12-13T08:19:08.209Z","logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2022-12-13T08:19:08.210Z","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"100.86.40.215:2379"}
{"level":"info","ts":"2022-12-13T08:19:08.645Z","logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2022-12-13T08:19:08.882Z","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"100.86.40.215:2379","size":"3.1 MB","took":"now"}
{"level":"info","ts":"2022-12-13T08:19:08.883Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/etc/kubernetes/pki/etcd/etcd-snapshot-2022-12-13.db"}
Snapshot saved at /etc/kubernetes/pki/etcd/etcd-snapshot-2022-12-13.db
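In practice this snapshot step would run on a schedule, and old snapshots need pruning. A minimal sketch (the function name is hypothetical; it assumes the etcd-snapshot-<date>.db naming used above) that keeps only the newest N snapshots in a directory:

```shell
# prune_snapshots DIR KEEP -- hypothetical helper: delete all but the
# KEEP newest etcd-snapshot-*.db files in DIR, newest by mtime.
prune_snapshots() {
  local dir=$1 keep=$2
  # ls -1t sorts newest first; tail skips the first $keep entries,
  # leaving only the stale snapshots for xargs to remove.
  ls -1t "${dir}"/etcd-snapshot-*.db 2>/dev/null \
    | tail -n +"$((keep + 1))" \
    | xargs -r rm -f
}
```

A cron job could run the snapshot save command from step 2 and then call, say, `prune_snapshots /etc/kubernetes/pki/etcd 7`.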
3. Inspect the snapshot
root@k8s-worker-2:/etc/kubernetes# kubectl exec -it -n kube-system etcd-k8s-worker-2 -- etcdctl --endpoints 100.86.40.215:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt --write-out=table snapshot status /etc/kubernetes/pki/etcd/etcd-snapshot-2022-12-13.db
Deprecated: Use `etcdutl snapshot status` instead.
+----------+----------+------------+------------+
| HASH | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 2d9223fc | 96937 | 865 | 3.1 MB |
+----------+----------+------------+------------+
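The table is meant for humans; if a backup-verification script needs the numbers, one option (a sketch, assuming the --write-out=table layout shown above) is to pull a column out with awk:

```shell
# snapshot_total_keys -- read `etcdctl --write-out=table snapshot status`
# output on stdin and print the TOTAL KEYS value (4th pipe-separated field),
# skipping the border lines and the header row.
snapshot_total_keys() {
  awk -F'|' '/^\|/ && $2 !~ /HASH/ { gsub(/ /, "", $4); print $4 }'
}
```

etcdctl also accepts --write-out=json, which is sturdier for scripting than parsing the table.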
4. Delete the Deployment to simulate a failure
root@k8s-worker-2:~# kubectl delete deployment nginx
deployment.apps "nginx" deleted
5. Copy etcdctl to the host

- kubectl cannot cp the etcdctl binary out of the pod directly (kubectl cp depends on tar being present in the image)
- crictl has no cp subcommand
- So instead we copy etcdctl into a directory inside the container that is mapped to a persistent path on the host
root@k8s-worker-2:/etc/kubernetes# kubectl exec -it -n kube-system etcd-k8s-worker-2 -- cp /usr/local/bin/etcdctl /etc/kubernetes/pki/etcd/etcdctl
6. Back up the /var/lib/etcd directory
root@k8s-worker-2:/etc/kubernetes# cp -r /var/lib/etcd{,-bak}
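The {,-bak} suffix is Bash brace expansion, not cp syntax: the shell rewrites the single argument into source and destination before cp runs.

```shell
# Brace expansion (a Bash feature): "X{,-bak}" expands to the two
# words "X X-bak", which is what cp -r above actually receives.
bash -c 'echo /var/lib/etcd{,-bak}'
```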
7. Restore etcd
7.1. Stop the etcd container

- etcd runs as a static pod, so it is enough to move /etc/kubernetes/manifests/etcd.yaml out of the manifests directory; the kubelet will then stop the pod
root@k8s-worker-2:/etc/kubernetes# mv /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/
7.2. List the running containers

- The etcd container is gone
root@k8s-worker-2:/etc/kubernetes# crictl ps -a
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD
ca08d1e0ad430 dbfceb93c69b6 4 minutes ago Exited kube-controller-manager 6 7cf5296b00224 kube-controller-manager-k8s-worker-2
ead3815c53774 ca0ea1ee3cfd3 4 minutes ago Exited kube-scheduler 6 729cade34ff32 kube-scheduler-k8s-worker-2
23d89007b20d7 97801f8394908 5 minutes ago Running kube-apiserver 3 029e8c27b42a6 kube-apiserver-k8s-worker-2
f127263ddcd4b 97801f8394908 15 minutes ago Exited kube-apiserver 2 029e8c27b42a6 kube-apiserver-k8s-worker-2
b7d2a00d9af95 a4ca41631cc7a 21 hours ago Running coredns 0 ea982d524385b coredns-57977755fc-klbhm
d2f7468f46360 a4ca41631cc7a 21 hours ago Running coredns 0 d6d1afbe109f5 coredns-57977755fc-c4w9h
215f24dd4054b 1c7d8c51823b5 21 hours ago Running kube-proxy 0 390a643fed7c0 kube-proxy-fc8dx
bc90141515308 b5c6c9203f83e 22 hours ago Running kube-flannel 0 666ea113a9d54 kube-flannel-ds-9lzvv
950b59a74b1b8 b5c6c9203f83e 22 hours ago Exited install-cni 0 666ea113a9d54 kube-flannel-ds-9lzvv
9c31b2bcc182c fcecffc7ad4af 22 hours ago Exited install-cni-plugin 0 666ea113a9d54 kube-flannel-ds-9lzvv
7.3. Delete the /var/lib/etcd directory
root@k8s-worker-2:/etc/kubernetes# rm -rf /var/lib/etcd
7.4. Restore the snapshot

- All of the flag values below can be found in the etcd static pod manifest for the current environment (now at /etc/kubernetes/etcd.yaml, after the move in step 7.1)
root@k8s-worker-2:/etc/kubernetes# /etc/kubernetes/pki/etcd/etcdctl --endpoints 100.86.40.215:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt snapshot restore /etc/kubernetes/pki/etcd/etcd-snapshot-2022-12-13.db --name=k8s-worker-2 --initial-cluster=k8s-worker-2=https://100.86.40.215:2380 --initial-advertise-peer-urls=https://100.86.40.215:2380 --data-dir=/var/lib/etcd
Deprecated: Use `etcdutl snapshot restore` instead.
2022-12-13T15:43:18+08:00 info snapshot/v3_snapshot.go:248 restoring snapshot {"path": "/etc/kubernetes/pki/etcd/etcd-snapshot-2022-12-13.db", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/tmp/etcd-release-3.5.5/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:254\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.5/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.5/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:129\ngithub.com/spf13/cobra.(*Command).execute\n\t/usr/local/google/home/siarkowicz/.gvm/pkgsets/go1.16.15/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/usr/local/google/home/siarkowicz/.gvm/pkgsets/go1.16.15/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/usr/local/google/home/siarkowicz/.gvm/pkgsets/go1.16.15/global/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/tmp/etcd-release-3.5.5/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/tmp/etcd-release-3.5.5/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/tmp/etcd-release-3.5.5/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/usr/local/google/home/siarkowicz/.gvm/gos/go1.16.15/src/runtime/proc.go:225"}
2022-12-13T15:43:18+08:00 info membership/store.go:141 Trimming membership information from the backend...
2022-12-13T15:43:18+08:00 info membership/cluster.go:421 added member {"cluster-id": "c74a0c530bf911ab", "local-member-id": "0", "added-peer-id": "a62f41442cd82a29", "added-peer-peer-urls": ["https://100.86.40.215:2380"]}
2022-12-13T15:43:18+08:00 info snapshot/v3_snapshot.go:269 restored snapshot {"path": "/etc/kubernetes/pki/etcd/etcd-snapshot-2022-12-13.db", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap"}
7.5. Restart etcd and verify recovery

- Move the etcd manifest back into the manifests directory so the kubelet recreates the static pod:

root@k8s-worker-2:/etc/kubernetes# mv /etc/kubernetes/etcd.yaml /etc/kubernetes/manifests/

- After a short wait, the Deployment deleted earlier is back:
root@k8s-worker-2:/etc/kubernetes# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default nginx-8bb445cbd-6k4dl 1/1 Running 0 41m
default nginx-8bb445cbd-nzrgx 1/1 Running 0 41m
default nginx-8bb445cbd-vskcl 1/1 Running 0 42m
kube-flannel kube-flannel-ds-9lzvv 1/1 Running 0 22h
kube-system coredns-57977755fc-c4w9h 1/1 Running 0 21h
kube-system coredns-57977755fc-klbhm 1/1 Running 0 21h
kube-system etcd-k8s-worker-2 1/1 Running 0 21h
kube-system kube-apiserver-k8s-worker-2 0/1 Running 4 (15s ago) 21h
kube-system kube-controller-manager-k8s-worker-2 1/1 Running 7 (3m59s ago) 21h
kube-system kube-proxy-fc8dx 1/1 Running 0 21h
kube-system kube-scheduler-k8s-worker-2 1/1 Running 7 (4m ago) 21h