Problem description:
[root@ceph-mon01 ~]# ceph -s
  cluster:
    id:     92d4f66b-94a6-4c40-8941-734f3c44eb4f
    health: HEALTH_ERR
            1 filesystem is offline
            1 filesystem is online with fewer MDS than max_mds
            1 pools have many more objects per pg than average
            Reduced data availability: 256 pgs inactive

  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon03,ceph-mon02 (age 5d)
    mgr: ceph-mon03(active, since 5d), standbys: ceph-mon02, ceph-mon01
    mds: cephfs:0
    osd: 9 osds: 9 up (since 43h), 9 in (since 43h); 224 remapped pgs
    rgw: 1 daemon active (ceph-mon01)

  task status:

  data:
    pools:   9 pools, 480 pgs
    objects: 34.60k objects, 8.5 GiB
    usage:   128 GiB used, 142 GiB / 270 GiB avail
    pgs:     172995/103797 objects misplaced (166.667%)
             256 unknown
             224 active+clean+remapped
Resolution
ceph health detail
...
PG_AVAILABILITY Reduced data availability: 1024 pgs inactive
    pg 4.3c8 is stuck inactive for 246794.767182, current state unknown, last acting []
    pg 4.3ca is stuck inactive for 246794.767182, current state unknown, last acting []
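The empty "last acting []" is the key clue: CRUSH returns no OSDs at all for these PGs. As a quick spot check (a sketch, using pg 4.3c8 from the listing above), the mapping of a single PG can be queried directly:
# Map a single stuck PG; an empty up [] / acting [] set confirms CRUSH finds no OSDs for it
ceph pg map 4.3c8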
1. Check the OSD tree (in this cluster there are two CRUSH roots that can serve as replica entry points: datacenter0 and default)
[root@ceph-mon01 ~]# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                        STATUS REWEIGHT PRI-AFF
 -9       0.26367 datacenter datacenter0
-10       0.26367     room room0
-11       0.08789         rack rack0
 -3       0.08789             host ceph-osd01
  0   hdd 0.02930                 osd.0                up  1.00000 1.00000
  1   hdd 0.02930                 osd.1                up  1.00000 1.00000
  7   hdd 0.02930                 osd.7                up  1.00000 1.00000
-12       0.08789         rack rack1
 -1             0 root default
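Note that root default carries weight 0, i.e. no rack/host/OSD sits under it, while everything lives under datacenter0. If in doubt, the children of each root can be listed directly (a quick check, not required for the fix):
# List the direct children of each CRUSH root; default should return nothing
ceph osd crush ls datacenter0
ceph osd crush ls default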
2. Inspect the CRUSH map (there is only one CRUSH rule, id 0, and its replica entry point is the default root)
[root@ceph-mon01 ~]# ceph osd getcrushmap -o test.bin
33
[root@ceph-mon01 ~]# crushtool -d test.bin -o test.txt
[root@ceph-mon01 ~]# cat test.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
…..
# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
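The same conclusion can be reached without decompiling the map: dumping the rule over the CLI shows the "take" step pointing at default:
# List and dump the active CRUSH rules; the take step of replicated_rule references "default"
ceph osd crush rule ls
ceph osd crush rule dump replicated_rule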
3. List the existing pools
[root@ceph-mon01 ~]# ceph osd pool ls
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
default.rgw.buckets.index
default.rgw.buckets.non-ec
default.rgw.buckets.data
cephfs_data
cephfs_metadata
4. Check which crush_rule the existing pools use (in this example they all use crush_rule 0)
[root@ceph-mon01 ~]# ceph osd dump |grep crush_rule
pool 1 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 47 flags hashpspool stripe_width 0 application rgw
pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 49 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 51 flags hashpspool stripe_width 0 application rgw
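If the ceph osd dump output is too noisy, a short loop (just a convenience sketch) prints the rule of every pool one by one:
# Print the crush_rule of each pool
for p in $(ceph osd pool ls); do
    echo "== $p"
    ceph osd pool get "$p" crush_rule
done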
5. Modify the CRUSH map
This approach is recommended only for operators who are already familiar with CRUSH configuration; use it with caution on production clusters.
5.1 Export the CRUSH map
Export Ceph's binary CRUSH map and convert it to text format.
# Export the binary crush map to the file test.bin
ceph osd getcrushmap -o test.bin
# Use crushtool to decompile the binary data in test.bin into text form and save it to test.txt
crushtool -d test.bin -o test.txt
5.2 Edit test.txt
# Keep the existing replicated_rule and add a new rule whose entry point is datacenter0 instead of default:
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule datacenter_rule {                         # rule name; can be referenced when creating a pool
    id 1                                       # rule id set to 1
    type replicated                            # replicated rule (the other type is erasure)
    min_size 1                                 # a pool using this rule may not have fewer than 1 replica
    max_size 10                                # a pool using this rule may not have more than 10 replicas
    step take datacenter0                      # entry point where PGs look for replica placement
    step chooseleaf firstn 0 type host         # depth-first choose leaves; failure domain is host
    step emit                                  # end of the rule
}
# end crush map
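Editing the decompiled map is not the only way to obtain this rule. On Luminous and later the same replicated rule can be created directly from the CLI (shown here only as an alternative; it makes the compile/import steps below unnecessary):
# Alternative: ceph osd crush rule create-replicated <name> <root> <failure-domain> [<device-class>]
ceph osd crush rule create-replicated datacenter_rule datacenter0 host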
5.3 Import the rewritten CRUSH map back into the cluster
# Compile test.txt back into binary form
crushtool -c test.txt -o new.bin
# Import new.bin into the cluster
ceph osd setcrushmap -i new.bin
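Before importing (or right after), crushtool can simulate placements against the new map; if rule 1 produces non-empty OSD sets for 3 replicas, the previously unknown PGs will be able to map once their pools switch to it (a sanity check, not part of the original procedure):
# Simulate placements using rule id 1 with a replica count of 3
crushtool -i new.bin --test --rule 1 --num-rep 3 --show-statistics
# Optionally print each simulated mapping
crushtool -i new.bin --test --rule 1 --num-rep 3 --show-mappings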
5.4 Change the crush_rule of the existing pools
After re-importing the map, the crush_rule of the affected pools must be updated as well; otherwise their PGs stay in the unknown state and never reach active+clean. (A loop covering all pools is sketched after the two commands below.)
ceph osd pool set cephfs_data crush_rule datacenter_rule
ceph osd pool set cephfs_metadata crush_rule datacenter_rule
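Here only the two CephFS pools are switched. If every pool in the cluster is meant to follow the new rule, a loop like the following sketch does it in one pass (only run this if datacenter0 really is the intended entry point for all pools):
# Point every pool at the new rule
for p in $(ceph osd pool ls); do
    ceph osd pool set "$p" crush_rule datacenter_rule
done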
5.5 Check the cluster status
ceph osd dump | grep crush_rule    # the crush_rule id used by the cephfs pools is now 1
pool 8 'cephfs_data' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 180 flags hashpspool stripe_width 0 application cephfs
pool 9 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 183 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
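To watch the unknown PGs disappear while recovery runs, a per-state summary is handy (a convenience sketch; the exact pgs_brief column layout may differ slightly between releases, and plain ceph -s works just as well):
# Overall PG summary
ceph pg stat
# Count PGs per state; no line should read "unknown" any more
ceph pg dump pgs_brief 2>/dev/null | awk 'NR>1 {print $2}' | sort | uniq -c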
5.6 Check the crush_rule name used by each pool
[root@ceph-mon01 ceph-ceph-mon01]# ceph osd pool ls
.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log
default.rgw.buckets.index
default.rgw.buckets.non-ec
default.rgw.buckets.data
cephfs_data
cephfs_metadata
[root@ceph-mon01 ceph-ceph-mon01]# ceph osd pool get default.rgw.log crush_rule
crush_rule: replicated_rule
[root@ceph-mon01 ceph-ceph-mon01]# ceph osd pool get cephfs_data crush_rule
crush_rule: datacenter_rule
The "pgs: 53.333% pgs unknown" problem is now gone:
[root@ceph-mon01 ~]# ceph -s
  cluster:
    id:     92d4f66b-94a6-4c40-8941-734f3c44eb4f
    health: HEALTH_ERR
            1 filesystem is offline
            1 filesystem is online with fewer MDS than max_mds
            1 pools have many more objects per pg than average

  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon03,ceph-mon02 (age 5d)
    mgr: ceph-mon03(active, since 5d), standbys: ceph-mon02, ceph-mon01
    mds: cephfs:0
    osd: 9 osds: 9 up (since 45h), 9 in (since 45h); 224 remapped pgs
    rgw: 1 daemon active (ceph-mon01)

  task status:

  data:
    pools:   9 pools, 480 pgs
    objects: 34.60k objects, 8.5 GiB
    usage:   128 GiB used, 142 GiB / 270 GiB avail
    pgs:     172995/103797 objects misplaced (166.667%)
             256 active+clean
             224 active+clean+remapped

  io:
    recovery: 2.0 MiB/s, 7 objects/s
[root@ceph-mon01 ceph-ceph-mon01]#