Environment:
Kubernetes version 1.23.1
Ceph version 14.2.22
Issue 1:
Problem:
Ceph status reports health HEALTH_WARN
Solution:
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     4831a37b-30b6-4a5d-8732-dfc0a738d31e
    health: HEALTH_WARN
            mon is allowing insecure global_id reclaim
Disable the insecure mode:
[root@ceph1 ceph]# ceph config set mon auth_allow_insecure_global_id_reclaim false
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     4831a37b-30b6-4a5d-8732-dfc0a738d31e
    health: HEALTH_OK
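To confirm the option took effect, it can also be read back from the monitor configuration (a quick check, not part of the original notes):
[root@ceph1 ceph]# ceph config get mon auth_allow_insecure_global_id_reclaim
false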
Issue 2:
Problem:
unexpected error getting claim reference: selfLink was empty, can't make reference
Explanation:
Kubernetes disables selfLink by default since version 1.20 (and removed it entirely in 1.24); on 1.23 it can still be re-enabled with a feature gate.
Solution:
Manually add the RemoveSelfLink=false feature gate in /etc/kubernetes/manifests/kube-apiserver.yaml:
...
spec:
  containers:
  - command:
    - kube-apiserver
    - --feature-gates=RemoveSelfLink=false   # added manually
    - --advertise-address=10.24.230.11
...
After saving and exiting, the static pod manifest is re-applied automatically; just wait for the kube-apiserver pod to finish restarting.
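To watch the apiserver come back up, assuming a kubeadm-managed control plane where the static pod carries the component=kube-apiserver label:
[root@master ~]# kubectl -n kube-system get pods -l component=kube-apiserver -w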
Issue 3:
Problem:
Some RBD image features are not supported by older kernels and need to be disabled.
Solution:
Disable them with the following commands:
[root@ceph1 ceph]# rbd info rbd/image1
rbd image 'image1':
        size 1024 MB in 256 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.374d6b8b4567
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
[root@ceph1 ceph]# rbd feature disable rbd/image1 exclusive-lock object-map fast-diff deep-flatten
[root@ceph1 ceph]# rbd info rbd/image1
rbd image 'image1':
        size 1 GiB in 256 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 374081928b1
        block_name_prefix: rbd_data.374081928b1
        format: 2
        features: layering
        op_features:
        flags:
        create_timestamp: Tue Sep 27 10:11:53 2022
        access_timestamp: Tue Sep 27 10:11:53 2022
        modify_timestamp: Tue Sep 27 10:11:53 2022
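Instead of disabling features after the fact, new images can also be created with only the layering feature from the start. A sketch (image name and size are placeholders):
[root@ceph1 ceph]# rbd create rbd/image2 --size 1024 --image-feature layering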
Issue 4:
Problem:
Failed to provision volume with StorageClass "ceph-rbd": failed to get admin secret from ["default"/"ceph-secret"]: failed to get secret from ["default"/"ceph-secret"]: Cannot get secret of type kubernetes.io/rbd
Explanation:
The k8s controller failed to fetch the Ceph admin secret.
Solution:
The ceph-secret we created lives in the default namespace, while the controller runs in kube-system and therefore has no permission to read it. Create ceph-secret in kube-system instead, delete the PVC and StorageClass resources, update the StorageClass configuration, and then re-create the StorageClass and PVC.
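A sketch of creating the admin secret in kube-system; it assumes the ceph CLI and admin keyring are reachable from the node where kubectl runs (otherwise, obtain the key on a Ceph node first and paste it in):
[root@master ~]# kubectl create secret generic ceph-secret \
    --type="kubernetes.io/rbd" \
    --from-literal=key="$(ceph auth get-key client.admin)" \
    --namespace=kube-system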
Issue 5:
Problem:
Failed to provision volume with StorageClass "ceph-rbd": failed to create rbd image: executable file not found in $PATH, command output:
Explanation:
When k8s dynamically provisions Ceph storage through a StorageClass, the controller-manager needs the rbd binary to talk to the Ceph cluster, but the default k8s.gcr.io/kube-controller-manager image does not include Ceph's rbd client. The official recommendation is to use an external provisioner instead.
Solution:
[root@master ~]# git clone https://github.com/kubernetes-incubator/external-storage.git
[root@master ~]# cd external-storage/ceph/rbd/deploy
[root@master ~]# sed -r -i "s/namespace: [^ ]+/namespace: kube-system/g" ./rbac/clusterrolebinding.yaml ./rbac/rolebinding.yaml
[root@master ~]# kubectl -n kube-system apply -f ./rbac
[root@master ~]# kubectl describe deployments.apps -n kube-system rbd-provisioner
Name:                   rbd-provisioner
Namespace:              kube-system
CreationTimestamp:      Wed, 03 Jun 2020 18:59:14 +0800
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=rbd-provisioner
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           Recreate
MinReadySeconds:        0
Pod Template:
  Labels:           app=rbd-provisioner
  Service Account:  rbd-provisioner
  Containers:
   rbd-provisioner:
    Image:      quay.io/external_storage/rbd-provisioner:latest
    Port:       <none>
    Host Port:  <none>
    Environment:
      PROVISIONER_NAME:  ceph.com/rbd
    Mounts:              <none>
  Volumes:               <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   rbd-provisioner-c968dcb4b (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  6m5s  deployment-controller  Scaled up replica set rbd-provisioner-c968dcb4b to 1
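The provisioner pod itself can also be checked directly, reusing the deployment's selector from the output above:
[root@master ~]# kubectl -n kube-system get pods -l app=rbd-provisioner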
Point the StorageClass's provisioner at the newly added one:
[root@master ~]# vim sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: ceph.com/rbd
parameters:
  monitors: 172.18.2.172:6789,172.18.2.178:6789,172.18.2.189:6789
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-kube-secret
  userSecretNamespace: default
  fsType: ext4
  imageFormat: "2"
  imageFeatures: "layering"
[root@master ~]# kubectl delete pvc ceph-sc-claim
[root@master ~]# kubectl delete sc ceph-rbd
[root@master ~]# kubectl apply -f sc.yaml
[root@master ~]# kubectl apply -f pvc.yaml
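pvc.yaml is not shown in these notes; a minimal sketch that matches the claim name used above (the requested size is an assumption):
[root@master ~]# vim pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-sc-claim        # matches the claim deleted above
spec:
  storageClassName: ceph-rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi           # assumed size
Once provisioning succeeds, kubectl get pvc ceph-sc-claim should report the claim as Bound.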
Issue 6:
Problem:
The PVC stays in Pending state, and the logs no longer show any errors.
Explanation:
The ceph-common version inside the rbd-provisioner image does not match the Ceph version running on the cluster hosts.
Solution:
Upgrade the Ceph packages inside the image.
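A quick way to confirm such a mismatch before rebuilding the image is to compare the client version inside the provisioner pod with the cluster version (the exec target assumes the rbd-provisioner deployment created above):
[root@master ~]# kubectl -n kube-system exec deploy/rbd-provisioner -- ceph --version
[root@ceph1 ~]# ceph --version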