The Ceph storage cluster was still perfectly healthy the evening before, but after a night during which nobody knows what happened, the machines were started up and the cluster's health check status was already HEALTH_ERR:
[root@node1 ~]# ceph -s
    cluster 056c396d-639c-4312-9ea0-794c92e57329
     health HEALTH_ERR
            38 pgs are stuck inactive for more than 300 seconds
            64 pgs degraded
            38 pgs stuck inactive
            26 pgs stuck unclean
            64 pgs undersized
     monmap e1: 3 mons at {node1=192.168.4.1:6789/0,node2=192.168.4.2:6789/0,node3=192.168.4.3:6789/0}
            election epoch 16, quorum 0,1,2 node1,node2,node3
     osdmap e53: 6 osds: 2 up, 2 in; 64 remapped pgs
            flags sortbitwise
      pgmap v122: 64 pgs, 1 pools, 0 bytes data, 0 objects
            69636 kB used, 20389 MB / 20457 MB avail
                  38 undersized+degraded+peered
                  26 active+undersized+degraded
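The osdmap line is the key symptom: only 2 of the 6 OSDs are up and in, which is why so many PGs are undersized and degraded. Before looking at the disks themselves, a minimal follow-up sketch (assuming the same node1..node3 hosts and Jewel-style systemd units named ceph-osd@<id>; the exact unit names are an assumption for illustration) is to find out which OSDs are down and whether their daemons are running at all:

# list the problem PGs and the OSDs they are blocked on
ceph health detail

# the "down" rows show which OSD ids live on which hosts
ceph osd tree

# check whether the OSD daemons are actually running on each node
for host in node{1..3}; do
    echo "== $host =="
    ssh $host "systemctl list-units 'ceph-osd@*' --all --no-pager"
done

If the daemons are loaded but failed to start, the usual next suspect after a reboot is the OSD data devices themselves, which is what the ownership check below goes after.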
So I checked the owner and group of the disks that the cluster was using on each of the machines:
[root@node1 ~]# for host in node{1..3}; do ssh $hos