1. View the last few cephadm log entries
ceph log last cephadm
2. When a daemon is in an error or stopped state, restart it with the following command
ceph orch daemon restart rgw.rgw.ceph3.sfepof
3. The following error is reported:
overall HEALTH_WARN 1 pools have many more objects per pg than average
ceph health detail  # view detailed cluster warnings
# From the warning details, identify the pool whose PG count needs adjusting, then run the commands below.
# The target num can be derived from: Total PGs = (OSDs * 100) / pool size (replica count), rounded to the nearest power of 2.
ceph osd pool set <pool-name> pg_num 64
ceph osd pool set <pool-name> pgp_num 64
ceph osd pool set <pool-name> pg_num_min 64
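The formula above can be sketched in shell; the OSD count and replica size here are hypothetical example values, not taken from any real cluster:

```shell
# Hypothetical example: 30 OSDs, replica count (pool size) of 3
osds=30
size=3
target=$(( osds * 100 / size ))           # (30 * 100) / 3 = 1000

# Round up to the next power of 2, then step back down
# if the lower power of 2 is actually closer to the target
p=1
while (( p < target )); do p=$(( p * 2 )); done
if (( p - target > target - p / 2 )); then p=$(( p / 2 )); fi

echo "pg_num = $p"                        # pg_num = 1024
```

The resulting value is what you would pass as the num in the `ceph osd pool set <pool-name> pg_num` command above.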
4. Cluster warning
clock skew detected on mon.ceph-node2
1. Check NTP time synchronization on each node
2. Restart the mon
systemctl restart ceph-e1ba1fb4-0b00-11ec-b24d-781dbacebe19@mon.ceph-node2.service
5. An OSD shows as down/out
ceph orch daemon restart osd.<id>
6. Cluster warning 1
HEALTH_WARN 1 clients failing to respond to cache pressure
Fix: evict the problem client
$ ceph tell mds.0 session evict id=558067
7. Cluster warning 2
[WRN] RECENT_CRASH: 2 daemons have recently crashed
mon.ceph-node1 crashed on host ceph-node1 at 2022-05-12T06:46:57.743001Z
mon.ceph-node1 crashed on host ceph-node1 at 2022-05-15T06:46:41.164404Z
Fix: archive the crash reports
$ ceph crash archive-all
8. Cluster warning 3
HEALTH_ERR 3 scrub errors; Possible data damage: 3 pgs inconsistent
[ERR] OSD_SCRUB_ERRORS: 3 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 3 pgs inconsistent
pg 21.2f is active+clean+inconsistent, acting [7,15,11]
pg 22.6 is active+clean+scrubbing+deep+inconsistent+repair, acting [15,26,2]
pg 22.c is active+clean+scrubbing+deep+inconsistent+repair, acting [27,6,11]
Fix: repair the inconsistent PG data (run for each PG listed in the warning)
$ ceph pg repair 21.2f