Introduction
Velero is a tool for backing up and restoring Kubernetes cluster resources and persistent volumes. It can run against a cloud provider or on-premises. Velero supports the following use cases:
Back up a cluster and provide disaster recovery.
Migrate cluster resources to another cluster.
Replicate a production cluster into development and test clusters.
Velero consists of:
a server that runs in the cluster
a command-line client that runs locally
You can back up or restore all objects in the cluster, or filter objects by type, namespace, and/or label.
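The filtering described above maps directly to command-line flags. A sketch (the namespace `web` and label `app=nginx` are example values, not from this environment):

```shell
# Back up only Deployments and Services in the "web" namespace
# that carry the label app=nginx
velero backup create web-backup \
  --include-namespaces web \
  --include-resources deployments,services \
  --selector app=nginx
```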
1 Installing Velero
1.1 Install the object storage
[root@slave002 ~]# mkdir /opt/minio
[root@slave002 ~]# docker run -d \
-p 9000:9000 \
-p 9001:9001 \
--restart=always \
--name minio \
-v /opt/minio:/data \
-e "MINIO_ROOT_USER=root" \
-e "MINIO_ROOT_PASSWORD=tenxcloud" \
quay.io/minio/minio:RELEASE.2021-10-23T03-28-24Z server /data --console-address ":9001"
[root@slave002 ~]# docker ps -a | grep mini
69c874a2bf1a 172.22.44.20/system_containers/minio:RELEASE.2021-10-23T03-28-24Z "/usr/bin/docker-ent…" 14 seconds ago Up 6 seconds 0.0.0.0:9000-9001->9000-9001/tcp, :::9000-9001->9000-9001/tcp minio
Log in to the MinIO console and create the bucket that Velero will use:
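If you prefer the command line over the web console, the bucket can also be created with MinIO's `mc` client (a sketch; the alias name `minio` is arbitrary, and the endpoint and credentials are the ones used in this setup):

```shell
# Register the MinIO endpoint and credentials under the alias "minio"
mc alias set minio http://172.22.44.23:9000 root tenxcloud
# Create the bucket that velero install will reference via --bucket
mc mb minio/velero
```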
1.2 Install the Velero client
The client is usually installed on the same node where you will deploy the Velero server:
[root@master velero]# cd /root/velero
[root@master velero]# wget https://github.com/vmware-tanzu/velero/releases/download/v1.8.1/velero-v1.8.1-linux-amd64.tar.gz
[root@master velero]# tar -xvf velero-v1.8.1-linux-amd64.tar.gz
[root@master velero]# cp velero-v1.8.1-linux-amd64/velero /usr/local/bin/velero
[root@master velero]# chmod +x /usr/local/bin/velero
[root@master velero]# velero version
Client:
Version: v1.8.1
Git commit: 18ee078dffd9345df610e0ca9f61b31124e93f50
Server:
Version: v1.8.1
1.3 Install the Velero server
Before installing, you can pull the required images in advance:
[root@master velero]# docker pull velero/velero-plugin-for-aws:v1.2.1
[root@master velero]# docker pull velero/velero:v1.8.1
# Step 1: prepare the MinIO credentials file
[root@master01 velero]# cat credentials-velero
[default]
aws_access_key_id = root # the MinIO username
aws_secret_access_key = tenxcloud # the MinIO password
# Step 2: install the Velero server
[root@master velero]# velero install --use-restic \
> --provider aws \
> --plugins velero/velero-plugin-for-aws:v1.2.1 \
> --bucket velero \
> --secret-file ./credentials-velero \
> --use-volume-snapshots=false \
> --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://172.22.44.23:9000
[root@master velero]# kubectl get po -n velero -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
restic-25kgk 1/1 Running 0 6m1s 172.31.122.212 slave001 <none> <none>
restic-27ntn 1/1 Running 0 6m1s 172.31.193.119 slave005 <none> <none>
restic-l94rk 1/1 Running 0 6m1s 172.31.111.21 slave002 <none> <none>
restic-zckjk 1/1 Running 0 6m1s 172.31.234.238 slave003 <none> <none>
velero-78fbc48cf6-glnx5 1/1 Running 0 6m2s 172.31.111.19 slave002 <none> <none>
Note: I ran the velero install command on the master node here. You can also run the installation from another node, but you need to prepare a kubeconfig file first, because the velero command reads the cluster context from the kubectl configuration by default (kubectl uses the kubeconfig at ~/.kube/config to access the cluster); alternatively, you can point it at a kubeconfig in another path with --kubeconfig.
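For example, running the client against a non-default kubeconfig might look like this (the path is an example):

```shell
# Point velero at an explicit kubeconfig instead of ~/.kube/config
velero --kubeconfig /root/cluster2/kubeconfig backup get
```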
2 Backup demo
Recommendation:
Before running the Velero backup/restore demo, take a full etcd snapshot with etcdctl first, so you have a fallback if the cluster runs into problems later (just in case):
[root@master ~]# cd /root/etcd-bak/
[root@master etcd-bak]# ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key snapshot save snapshotdb.db
[root@master etcd-bak]# ls -ltrh
total 76M
-rw------- 1 root root 76M Nov 28 22:43 snapshotdb.db
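It is worth verifying the snapshot before relying on it; etcdctl can report its hash, revision count, and size:

```shell
# Sanity-check the snapshot file taken above
ETCDCTL_API=3 etcdctl snapshot status snapshotdb.db --write-out=table
```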
2.1 Back up a single namespace
2.1.1 Run the backup
[root@master velero]# velero backup create back-test-velerobak --include-namespaces=back-test
Backup request "back-test-velerobak" submitted successfully.
Run `velero backup describe back-test-velerobak` or `velero backup logs back-test-velerobak` for more details.
2.1.2 Inspect the backup
[root@master velero]# velero get backup
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
back-test-velerobak Completed 0 0 2023-11-28 23:42:50 +0800 CST 29d default <none>
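The EXPIRES column above reflects the default 30-day retention (720h). Both the TTL and a recurring schedule can be set explicitly; a sketch (the backup/schedule names and the cron expression are example values):

```shell
# Keep the backup for 7 days instead of the default 720h
velero backup create back-test-weekly --include-namespaces=back-test --ttl 168h
# Take a backup of the same namespace every day at 02:00
velero schedule create back-test-daily --schedule="0 2 * * *" --include-namespaces=back-test
```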
Check the MinIO console again; the backup data is now visible:
2.1.3 Simulate a failure
[root@master velero]# kubectl delete deploy --all -nback-test
deployment.apps "nginx-kjcpx" deleted
[root@master velero]#
[root@master velero]# kubectl get deploy,po -nback-test
No resources found in back-test namespace.
[root@master velero]#
2.1.4 Restore the data
[root@master velero]# velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
back-test-velerobak Completed 0 0 2023-11-28 23:42:50 +0800 CST 29d default <none>
# Restore from the back-test-velerobak backup
[root@master velero]# velero restore create --from-backup back-test-velerobak --wait
Restore request "back-test-velerobak-20231128235020" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore will continue in the background.
...............
Restore completed with status: Completed. You may check for more information using the commands `velero restore describe back-test-velerobak-20231128235020` and `velero restore logs back-test-velerobak-20231128235020`.
# As you can see, the data has been restored successfully
[root@master velero]# kubectl get deploy,po -nback-test
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx-kjcpx 1/1 1 1 78s
NAME READY STATUS RESTARTS AGE
pod/nginx-kjcpx-fc8f458-m4mjs 1/1 Running 0 78s
2.2 Back up the whole cluster
2.2.1 Run the backup
# Backing up the whole cluster takes longer; once velero backup get shows Completed, the backup is done
[root@master velero]# velero backup create k8s-all
Backup request "k8s-all" submitted successfully.
Run `velero backup describe k8s-all` or `velero backup logs k8s-all` for more details.
[root@master velero]#
2.2.2 Inspect the backup
[root@master velero]# velero backup get
NAME STATUS ERRORS WARNINGS CREATED EXPIRES STORAGE LOCATION SELECTOR
back-test-velerobak Completed 0 0 2023-11-28 23:42:50 +0800 CST 29d default <none>
k8s-all Completed 0 0 2023-11-29 00:03:43 +0800 CST 29d default <none>
Check the MinIO console:
2.2.3 Simulate a failure
# Now delete the back-test namespace and its workloads
[root@master velero]# kubectl delete deploy,svc --all -nback-test
deployment.apps "nginx-kjcpx" deleted
service "nginx" deleted
[root@master velero]#
[root@master velero]# kubectl get all -nback-test
No resources found in back-test namespace.
[root@master velero]#
[root@master velero]#
[root@master velero]# kubectl delete ns back-test
namespace "back-test" deleted
2.2.4 Restore the data
# Start the restore; it can take a while. When velero restore get shows Completed, the restore is done
[root@master velero]# velero restore create --from-backup k8s-all
Restore request "k8s-all-20231129001500" submitted successfully.
Run `velero restore describe k8s-all-20231129001500` or `velero restore logs k8s-all-20231129001500` for more details.
[root@master velero]#
[root@master velero]# velero restore get
NAME BACKUP STATUS STARTED COMPLETED ERRORS WARNINGS CREATED SELECTOR
back-test-velerobak-20231128235020 back-test-velerobak Completed 2023-11-28 23:50:20 +0800 CST 2023-11-28 23:50:35 +0800 CST 0 3 2023-11-28 23:50:20 +0800 CST <none>
k8s-all-20231129001500 k8s-all Completed 2023-11-29 00:15:00 +0800 CST <nil> 0 0 2023-11-29 00:15:00 +0800 CST <none>
[root@master velero]#
# As you can see, the data has been restored successfully
[root@master velero]# kubectl get all -nback-test
NAME READY STATUS RESTARTS AGE
pod/nginx-kjcpx-fc8f458-m4mjs 1/1 Running 0 6m6s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/nginx ClusterIP 10.99.247.107 <none> 80/TCP 3m19s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx-kjcpx 1/1 1 1 4m25s
NAME DESIRED CURRENT READY AGE
replicaset.apps/nginx-kjcpx-fc8f458 1 1 1 5m31s
[root@master velero]#
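A restore does not have to target the original namespace. For example, to clone the backed-up namespace into a fresh one for testing (back-test-copy is an example name):

```shell
# Restore back-test's resources into back-test-copy instead
velero restore create --from-backup back-test-velerobak \
  --namespace-mappings back-test:back-test-copy
```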
3 The relationship between Velero and restic - - - to be organized
Reference:
https://zhuanlan.zhihu.com/p/590435402
Velero supports backing up and restoring Kubernetes volumes using a free, open-source backup tool called restic. This support is considered Beta quality. Check the list of limitations to decide whether it fits your use case.
The restic integration provides an out-of-the-box solution for backing up and restoring almost any type of Kubernetes volume. It complements Velero's existing functionality rather than replacing it. However, if your storage platform has no volume-snapshot plugin, or you use EFS, AzureFile, NFS, emptyDir, local, or any other volume type with no native snapshot concept, restic may be the right fit.
restic is not tied to a specific storage platform, which also paves the way for future cross-volume-type data migration.
Note:
hostPath volumes are not supported, but the local volume type is.
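In Velero v1.8, restic backs up pod volumes on an opt-in basis by default: you annotate each pod with the names of the volumes to include. A sketch (the namespace, pod, and volume names are example values):

```shell
# Tell restic to include the "data" volume of this pod in backups
kubectl -n back-test annotate pod/mysql-0 backup.velero.io/backup-volumes=data
```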
Installing restic
4 Issue log
4.1 Full restore reports PartiallyFailed
Actually, the full restore above did report errors (I verified that the deleted data was still restored):
Details:
[root@master velero]# velero restore get
NAME BACKUP STATUS STARTED COMPLETED ERRORS WARNINGS CREATED SELECTOR
back-test-velerobak-20231128235020 back-test-velerobak Completed 2023-11-28 23:50:20 +0800 CST 2023-11-28 23:50:35 +0800 CST 0 3 2023-11-28 23:50:20 +0800 CST <none>
k8s-all-20231129001500 k8s-all PartiallyFailed 2023-11-29 00:15:00 +0800 CST 2023-11-29 00:19:23 +0800 CST 10 463 2023-11-29 00:15:00 +0800 CST <none>
[root@master velero]#
# View the details
[root@master velero]# velero restore describe k8s-all-20231129001500
Name: k8s-all-20231129001500
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: PartiallyFailed (run 'velero restore logs k8s-all-20231129001500' for more information)
Total items to be restored: 2420
Items restored: 2420
Started: 2023-11-29 00:15:00 +0800 CST
Completed: 2023-11-29 00:19:23 +0800 CST
Warnings:
Velero: <none>
Cluster: could not restore, CustomResourceDefinition "approvals.tmf.tenxcloud.com" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, CustomResourceDefinition "arthas.tmf.tenxcloud.com" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, CustomResourceDefinition "installplans.operators.coreos.com" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, CustomResourceDefinition "instances.daas.tenxcloud.com" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, CustomResourceDefinition "issuers.cert-manager.io" already exists. Warning: the in-cluster version is different than the backed-up version.
could not restore, CustomResourceDefinition "istioinstalls.mesh.t7d.io" already exists. Warning: the in-cluster version is different than the backed-up version.
······ (output truncated) ······
Errors:
Velero: <none>
Cluster: <none>
Namespaces:
demo: error restoring pods/demo/demo-2048-izuym-696bd5fdc4-fdpcx: Internal error occurred: v1.Pod.Spec: v1.PodSpec.InitContainers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|ormat":""},"memory":|..., bigger context ...|additionalProperties":{},"amount":"2","format":""},"memory":{"additionalProperties":{},"amount":"2",|...
error restoring pods/demo/dev-1-cxwew-6b44d9bb8-zjqvb: Internal error occurred: v1.Pod.Spec: v1.PodSpec.InitContainers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|ormat":""},"memory":|..., bigger context ...|additionalProperties":{},"amount":"2","format":""},"memory":{"additionalProperties":{},"amount":"2",|...
error restoring pods/demo/dev-2-eqcwm-8476ddb95d-h2xlz: Internal error occurred: v1.Pod.Spec: v1.PodSpec.InitContainers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|ormat":""},"memory":|..., bigger context ...|additionalProperties":{},"amount":"2","format":""},"memory":{"additionalProperties":{},"amount":"2",|...
error restoring pods/demo/java-2048-okgrb-86bbfc679d-8cpgd: Internal error occurred: v1.Pod.Spec: v1.PodSpec.InitContainers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|ormat":""},"memory":|..., bigger context ...|additionalProperties":{},"amount":"2","format":""},"memory":{"additionalProperties":{},"amount":"2",|...
error restoring pods/demo/nginx-b-cabml-757cfcc554-79z5k: Internal error occurred: v1.Pod.Spec: v1.PodSpec.InitContainers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|ormat":""},"memory":|..., bigger context ...|additionalProperties":{},"amount":"2","format":""},"memory":{"additionalProperties":{},"amount":"2",|...
error restoring pods/demo/nginx-c-ozeyu-77f77fdc6d-8c86n: Internal error occurred: v1.Pod.Spec: v1.PodSpec.InitContainers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|ormat":""},"memory":|..., bigger context ...|additionalProperties":{},"amount":"2","format":""},"memory":{"additionalProperties":{},"amount":"2",|...
error restoring pods/demo/nginx-mkzwo-79798b56c-b4pk2: Internal error occurred: v1.Pod.Spec: v1.PodSpec.InitContainers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|ormat":""},"memory":|..., bigger context ...|additionalProperties":{},"amount":"2","format":""},"memory":{"additionalProperties":{},"amount":"2",|...
error restoring pods/demo/nignx-a-warxw-79ccb87d9c-fwx8m: Internal error occurred: v1.Pod.Spec: v1.PodSpec.InitContainers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|ormat":""},"memory":|..., bigger context ...|additionalProperties":{},"amount":"2","format":""},"memory":{"additionalProperties":{},"amount":"2",|...
error restoring pods/demo/tmf-a-sycdz-59f9678c8b-dqhvp: Internal error occurred: v1.Pod.Spec: v1.PodSpec.InitContainers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|ormat":""},"memory":|..., bigger context ...|additionalProperties":{},"amount":"2","format":""},"memory":{"additionalProperties":{},"amount":"2",|...
error restoring pods/demo/tmf-b-fswcr-5cd7b8c54f-t55qz: Internal error occurred: v1.Pod.Spec: v1.PodSpec.InitContainers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|ormat":""},"memory":|..., bigger context ...|additionalProperties":{},"amount":"2","format":""},"memory":{"additionalProperties":{},"amount":"2",|...
Backup: k8s-all
Namespaces:
Included: all namespaces found in the backup
Excluded: <none>
Resources:
Included: *
Excluded: nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
Cluster-scoped: auto
Namespace mappings: <none>
Label selector: <none>
Restore PVs: auto
Preserve Service NodePorts: auto
[root@master velero]#
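Given that all the restore errors above are confined to the demo namespace, one possible workaround (assuming the malformed resource limits exist only there) is to retry the restore without it:

```shell
# Restore everything from k8s-all except the namespace with malformed limits
velero restore create --from-backup k8s-all --exclude-namespaces demo
```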
So when backing up with Velero, environment and service configuration have to be taken into account; a backup or restore is not guaranteed to succeed.
What exactly caused the problem this time?
I had enabled resource quotas in the project (application packages, YAML manifests, Helm charts, etc. were configured through a vendor's container-cloud console), and the error is most likely caused by those quota parameters not conforming to the expected format.
Key part of the error:
error restoring pods/demo/tmf-b-fswcr-5cd7b8c54f-t55qz: Internal error occurred: v1.Pod.Spec: v1.PodSpec.InitContainers: []v1.Container: v1.Container.Resources: v1.ResourceRequirements.Limits: unmarshalerDecoder: quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$', error found in #10 byte of ...|ormat":""},"memory":|..., bigger context ...|additionalProperties":{},"amount":"2","format":""},"memory":{"additionalProperties":{},"amount":"2",|...
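The regular expression in the error is Kubernetes' resource-quantity format. A quick shell check (a sketch using grep) can tell you whether a limit value from your manifests would pass it before you restore:

```shell
# Return success if the argument is a syntactically valid Kubernetes quantity,
# per the regex cited in the Velero error above
is_valid_quantity() {
  echo "$1" | grep -Eq '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'
}

is_valid_quantity "512Mi" && echo "512Mi: valid"
is_valid_quantity "2"     && echo "2: valid"
is_valid_quantity "2 Gi"  || echo "'2 Gi': invalid"
```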