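A telecom operator shut down the physical machines behind its Kubernetes project for maintenance. After the reboot, some Kubernetes pods could no longer mount their PVCs, and the mount requests timed out. The cluster uses Ceph RBD block storage as its backend, and checking the Ceph cluster showed an abnormal status.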
[root@ceph-node01 ~]# ceph -s
  cluster:
    id:     3c09c565-7421-411d-b9e0-5a370967556f
    health: HEALTH_WARN
            Reduced data availability: 12 pgs inactive, 12 pgs incomplete
            12 slow ops, oldest one blocked for 5553 sec, daemons [osd.0,osd.3] have slow ops.

  services:
    mon: 3 daemons, quorum ceph-node01,ceph-node02,ceph-node03
    mgr: mon_mgr(active)
    osd: 9 osds: 9 up, 9 in

  data:
    pools:   3 pools, 512 pgs
    objects: 64.26 k objects, 561 GiB
    usage:   3.17 TiB used, 5.61 TiB / 8.78 TiB avail
    pgs:     9.375% pgs not active
             116 active+clean
             12  incomplete

  io:
    client:   5.57MiB/s rd, 144KiB/s wr, 346op/s rd, 15op/s wr
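The health check reports PG_AVAILABILITY (reduced data availability), which means the cluster cannot service potential read or write requests for some of the data it holds. In this case, 12 PGs are in a state that does not allow them to serve I/O requests.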
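As a rough illustration of how these figures relate, the sketch below reads saved `ceph -s` text and pulls out the inactive PG count and the inactive percentage. The function names `inactive_pgs` and `inactive_percent` are hypothetical, the sample text is copied from the output above, and the parser assumes each PG state appears on its own line as a count followed by a state name. The state breakdown lists 116 + 12 = 128 PGs, and 12 / 128 gives the 9.375% shown.

```shell
#!/bin/sh
# Sketch: summarize inactive PGs from saved `ceph -s` text read on stdin.
# Function names are hypothetical; the sample text is copied from the status output.

# Print the number of inactive PGs reported in the health section.
inactive_pgs() {
  grep -oE '[0-9]+ pgs inactive' | head -n 1 | awk '{ print $1 }'
}

# Print the percentage of PGs whose state does not start with "active".
# Assumption: state lines have exactly two fields, "<count> <state>".
inactive_percent() {
  awk '
    NF == 2 && $1 ~ /^[0-9]+$/ {
      total += $1
      if ($2 !~ /^active/) inactive += $1
    }
    END {
      if (total > 0) printf "%.3f\n", 100 * inactive / total
    }'
}

status='health: HEALTH_WARN
Reduced data availability: 12 pgs inactive, 12 pgs incomplete
pgs: 9.375% pgs not active
116 active+clean
12 incomplete'

printf '%s\n' "$status" | inactive_pgs      # prints 12
printf '%s\n' "$status" | inactive_percent  # prints 9.375
```

On a live cluster the same functions could be fed with `ceph -s | inactive_pgs`. To see which PGs are affected, `ceph pg dump_stuck inactive` lists the PGs that are stuck in a non-active state.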
PG check
The cluster status above also shows PGs in the incomplete state. Run the health command to get more detailed information about the error.