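1. Error message
While deploying Velero on an ARM cluster for disaster recovery and migration, the restic pod failed to start with the following error: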
time="2022-11-14T13:19:17Z" level=info msg="Setting log-level to INFO"
time="2022-11-14T13:19:17Z" level=info msg="Starting Velero restic server v1.9.2 (82a100981cc66d119cf9b1d121f45c5c9dcf99e1-dirty)" logSource="pkg/cmd/cli/restic/server.go:87"
2022-11-14T13:19:17.870Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": ":8080"}
An error occurred: unexpected directory structure for host-pods volume, ensure that the host-pods volume corresponds to the pods subdirectory of the kubelet root directory
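The last log line suggests that restic compares the directories it sees in the host-pods volume with the pods scheduled on the node. The following is a minimal sketch of that idea only, not Velero's actual code: the function name missing_pod_dirs is hypothetical, and it assumes each pod's directory is named after the pod UID (static/mirror pods may use a different directory name, so they can show up as false positives).

```shell
#!/bin/sh
# Sketch: report which pod UIDs have no matching directory under a pods dir.
# Usage: missing_pod_dirs PODS_DIR UID...
# Returns 0 if every UID has a directory, 1 otherwise.
missing_pod_dirs() {
    pods_dir=$1
    shift
    missing=0
    for uid in "$@"; do
        if [ ! -d "$pods_dir/$uid" ]; then
            echo "missing: $uid"
            missing=1
        fi
    done
    return "$missing"
}
```

On a node, the UIDs could be fed in with something like missing_pod_dirs /var/lib/kubelet/pods $(kubectl get pods -A --field-selector spec.nodeName=NODE -o jsonpath='{.items[*].metadata.uid}'), where NODE is a placeholder for the node name. If nearly every UID is reported missing, the directory being checked is probably not the real kubelet pods directory.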
2. Troubleshooting
1. Check the pod logs
- The error is the one shown above.
2. First suspicion: some resource had not been created properly
kubectl get cm,secret -nvelero
NAME                               TYPE                                  DATA   AGE
secret/cloud-credentials           Opaque                                1      57m
secret/default-token-wbts9         kubernetes.io/service-account-token   3      57m
secret/velero-restic-credentials   Opaque                                1      53m
secret/velero-token-z72s6          kubernetes.io/service-account-token   3      57m
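- No ConfigMap had been generated in the namespace (I noticed this by comparing against my working local environment, although it turned out not to be the root cause).
Create the ConfigMap: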
kubectl -nvelero create configmap kube-root-ca.crt --from-file=/etc/kubernetes/pki/ca.crt
The resulting ConfigMap looks like this:
apiVersion: v1
items:
- apiVersion: v1
  data:
    ca.crt: |
      -----BEGIN CERTIFICATE-----
      # CA certificate content from /etc/kubernetes/pki/ca.crt; view it with cat
      -----END CERTIFICATE-----
  kind: ConfigMap
  metadata:
    annotations:
      kubernetes.io/description: Contains a CA bundle that can be used to verify the
        kube-apiserver when using internal endpoints such as the internal service
        IP or kubernetes.default.svc. No other usage is guaranteed across distributions
        of Kubernetes clusters.
    creationTimestamp: "2022-11-14T13:05:08Z"
    name: kube-root-ca.crt
    namespace: velero
    resourceVersion: "xxx"
    uid: xxxxxxxxxxxxxxxxx
kind: List
metadata:
  resourceVersion: "xxxxx"
  selfLink: "/api/v1/namespaces/velero/configmaps/kube-root-ca.crt"
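- This did not fix the problem (although the step is still required).
3. Second suspicion, based on the error message: the host-pods volume is not mounted from the correct pods directory
1. Default directory locations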
- docker: /var/lib/docker/containers (under the Docker Root Dir, which is shown in the output of docker info)
- kubelet: /var/lib/kubelet/pods (under --root-dir, which can be checked with systemctl cat kubelet)
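- In this environment the kubelet default path had been changed (production environments that need a lot of disk space probably change it as a rule).
The drop-in file /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf reads as follows: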
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS --node-ip=xxx.xxx.xxx.xxx --root-dir=/dcos/data/docker/kubelet --feature-gates=SupportPodPidsLimit=false,SupportNodePidsLimit=false
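Picking the --root-dir flag out of a long ExecStart line by eye is error-prone, so a small helper can do it. This is a sketch: the function name kubelet_root_dir is hypothetical, it only inspects the text it is given on stdin, and it falls back to the kubelet default /var/lib/kubelet when the flag is absent. It relies on grep -o, which GNU and BSD grep support.

```shell
#!/bin/sh
# Print the value of the last --root-dir flag found in the text on stdin,
# or the kubelet default /var/lib/kubelet if the flag does not appear.
kubelet_root_dir() {
    dir=$(grep -o -e '--root-dir[= ][^ "]*' | tail -n 1 | sed 's/^--root-dir[= ]//')
    echo "${dir:-/var/lib/kubelet}"
}
```

For example, systemctl cat kubelet | kubelet_root_dir prints the root directory for the unit shown above; appending /pods gives the path that the host-pods volume should point to. Flags supplied through an EnvironmentFile do not appear in the systemctl cat output, so piping the running process's arguments (for instance from ps -o args= -C kubelet) is a reasonable cross-check.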
- So the fix is to update the hostPath mount in Velero's daemonset.apps/restic:
volumes:
- hostPath:
    path: /dcos/data/docker/kubelet/pods
    type: ""
- Default path: /var/lib/kubelet/pods
- After the change: /dcos/data/docker/kubelet/pods
- After the change, the restic pods started normally.
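The hostPath change can also be applied as a patch instead of editing the DaemonSet by hand. A minimal sketch, assuming the volume is named host-pods (the name used in the error message and, as far as I know, by the default Velero install); the helper name host_pods_patch is hypothetical. It only builds the patch document; applying it is left to kubectl.

```shell
#!/bin/sh
# Print a strategic merge patch that points the volume named host-pods
# at the pods subdirectory of the given kubelet root directory.
# Usage: host_pods_patch KUBELET_ROOT_DIR
host_pods_patch() {
    printf '{"spec":{"template":{"spec":{"volumes":[{"name":"host-pods","hostPath":{"path":"%s/pods"}}]}}}}\n' "$1"
}
```

It could then be applied with kubectl -n velero patch daemonset restic --patch "$(host_pods_patch /dcos/data/docker/kubelet)" and the rollout followed with kubectl -n velero rollout status daemonset/restic. Because pod volumes are merged by name in a strategic merge patch, the other volumes in the DaemonSet should be left untouched, but it is worth reviewing the result with kubectl get before relying on it in production.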
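3. Conclusion
- The error log alone is enough to locate the problem quickly, but any change in a production environment must still be made with care.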