A Rook cluster running in Kubernetes had been fine for a long time, then suddenly reported the following warning:
[rook@rook-ceph-tools-5db564c5c5-r899b /]$ ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
First attempt: fix it by increasing the pool's PG count.
# disable the PG autoscaler for this pool
ceph osd pool set myfs-data0 pg_autoscale_mode off
# raise pg_num and pgp_num
ceph osd pool set myfs-data0 pg_num 128
ceph osd pool set myfs-data0 pgp_num 128
# check ceph df afterwards -- not satisfying, the warning persists
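The output above does not show how 128 was chosen. A common rule of thumb (not from this post) is total PGs across all pools ≈ OSDs × 100 / replica size, rounded to a power of two and then split among the pools. The sketch below assumes a hypothetical 9 OSDs and replica size 3, since the real OSD count is not shown:

```shell
# Hypothetical sizing sketch: 9 OSDs and replica size 3 are assumed
# purely for illustration; substitute your cluster's real values.
osds=9
replicas=3
target=$(( osds * 100 / replicas ))   # 300
pg=1
while [ "$pg" -lt "$target" ]; do pg=$(( pg * 2 )); done
echo "$pg"   # next power of two >= 300, i.e. 512
```

That total would then be divided among the pools, which is roughly how a per-pool value like 128 falls out.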
[rook@rook-ceph-tools-5db564c5c5-r899b /]$ ceph df
--- RAW STORAGE ---
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1.8 TiB 1.4 TiB 440 GiB 443 GiB 23.78
ssd 7.3 TiB 5.5 TiB 1.7 TiB 1.8 TiB 24.28
TOTAL 9.1 TiB 6.9 TiB 2.2 TiB 2.2 TiB 24.18
--- POOLS ---
POOL ID STORED OBJECTS USED %USED MAX AVAIL
device_health_metrics 1 1.3 MiB 13 4.0 MiB 0 1.8 TiB
replicapool 2 626 GiB 162.40k 1.8 TiB 24.91 1.8 TiB
myfs-metadata 4 1.1 GiB 1.28k 3.4 GiB 0.06 1.8 TiB
myfs-data0 5 115 GiB 2.34M 345 GiB 5.74 1.8 TiB
[rook@rook-ceph-tools-5db564c5c5-r899b /]$ ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average
[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than average
pool myfs-data0 objects per pg (18247) is more than 17.63 times cluster average (1035)
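Plugging the numbers from the output above back into the warning shows where 17.63 comes from; the second line also confirms the reported objects-per-PG figure matches the pool's 2.34M objects over its 128 PGs:

```shell
# Re-check the warning's arithmetic with the numbers shown above.
awk 'BEGIN {
  skew = 18247 / 1035            # pool objects/pg vs cluster average
  objs_per_pg = 2340000 / 128    # 2.34M objects over 128 PGs, ~18247
  printf "skew=%.2f objects/pg=%.0f\n", skew, objs_per_pg
}'
# -> skew=17.63 objects/pg=18281
```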
The root cause: the CephFS data pool holds far more objects per PG than the cluster-wide average, exceeding the mon_pg_warn_max_object_skew threshold (default 10). The average is dragged down by nearly empty pools such as device_health_metrics and myfs-metadata, which is why raising pg_num alone could not clear the warning.
A matching fix was later found in the project's GitHub issues: change the mgr setting below. The cluster then returned to a healthy state.
[rook@rook-ceph-tools-5db564c5c5-r899b /]$ ceph config get mgr mon_pg_warn_max_object_skew
10.000000
[rook@rook-ceph-tools-5db564c5c5-r899b /]$ ceph config set mgr mon_pg_warn_max_object_skew 0
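Setting the skew to 0 disables this health check entirely; raising it instead (e.g. to 20) would keep the check while tolerating this pool. `ceph config set` stores the value in the monitors' config database, so it survives daemon restarts; in Rook the same option can alternatively be declared via the `rook-config-override` ConfigMap, which Rook merges into the generated ceph.conf. A sketch, assuming the default `rook-ceph` namespace:

```yaml
# Hypothetical declarative equivalent of the ceph config set above.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: rook-ceph   # adjust if your operator runs elsewhere
data:
  config: |
    [mgr]
    mon_pg_warn_max_object_skew = 0
```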