Ceph CRUSH rules in practice and handling the "PGs unknown" problem

Problem description:

[root@ceph-mon01 ~]# ceph -s

  cluster:

    id:     92d4f66b-94a6-4c40-8941-734f3c44eb4f

    health: HEALTH_ERR

            1 filesystem is offline

            1 filesystem is online with fewer MDS than max_mds

            1 pools have many more objects per pg than average

            Reduced data availability: 256 pgs inactive

  services:

    mon: 3 daemons, quorum ceph-mon01,ceph-mon03,ceph-mon02 (age 5d)

    mgr: ceph-mon03(active, since 5d), standbys: ceph-mon02, ceph-mon01

    mds: cephfs:0

    osd: 9 osds: 9 up (since 43h), 9 in (since 43h); 224 remapped pgs

    rgw: 1 daemon active (ceph-mon01)

  task status:

  data:

    pools:   9 pools, 480 pgs

    objects: 34.60k objects, 8.5 GiB

    usage:   128 GiB used, 142 GiB / 270 GiB avail

    pgs:     172995/103797 objects misplaced (166.667%)

             256 unknown

             224 active+clean+remapped

Resolution process

ceph health detail

...

PG_AVAILABILITY Reduced data availability: 1024 pgs inactive

    pg 4.3c8 is stuck inactive for 246794.767182, current state unknown, last acting []

    pg 4.3ca is stuck inactive for 246794.767182, current state unknown, last acting []
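Before touching the CRUSH map, it can help to confirm which PGs are stuck and to look at one of them directly. A minimal sketch (the pg id 4.3c8 is taken from the health detail above; while a PG is in the unknown state with an empty acting set, the query may return little or nothing useful):

# list all PGs that are stuck in an inactive state
ceph pg dump_stuck inactive

# query one of the stuck PGs listed above
ceph pg 4.3c8 query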

1. Check the OSD tree (note that there are two possible replica entry points here: datacenter0 and the root default)

[root@ceph-mon01 ~]# ceph osd tree

ID  CLASS WEIGHT  TYPE NAME                   STATUS REWEIGHT PRI-AFF

 -9       0.26367 datacenter datacenter0                             

-10       0.26367     room room0                                     

-11       0.08789         rack rack0                                 

 -3       0.08789             host ceph-osd01                        

  0   hdd 0.02930                 osd.0           up  1.00000 1.00000

  1   hdd 0.02930                 osd.1           up  1.00000 1.00000

  7   hdd 0.02930                 osd.7           up  1.00000 1.00000

-12       0.08789         rack rack1                                 

 -1             0 root default                                   
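To confirm that there really are two separate entry points, datacenter0 carrying all of the weight and an empty root default, the CRUSH hierarchy can also be printed on its own; a quick sketch:

# print the CRUSH hierarchy; all hosts sit under datacenter0, root default has weight 0
ceph osd crush tree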

2. View the crushmap (there is only one crush rule, id 0, and its replica entry point is default)

[root@ceph-mon01 ~]# ceph osd getcrushmap -o test.bin

33

[root@ceph-mon01 ~]# crushtool -d test.bin -o test.txt

[root@ceph-mon01 ~]# cat test.txt

# begin crush map

tunable choose_local_tries 0

tunable choose_local_fallback_tries 0

…..

# rules

rule replicated_rule {

        id 0

        type replicated

        min_size 1

        max_size 10

        step take default

        step chooseleaf firstn 0 type host

        step emit

}
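The same rule can also be inspected without decompiling the map, using the crush rule subcommands; a short sketch:

# list all crush rules by name
ceph osd crush rule ls

# dump a single rule as JSON, including its take/chooseleaf/emit steps
ceph osd crush rule dump replicated_rule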

3. List the existing pools

[root@ceph-mon01 ~]# ceph osd pool ls

.rgw.root

default.rgw.control

default.rgw.meta

default.rgw.log

default.rgw.buckets.index

default.rgw.buckets.non-ec

default.rgw.buckets.data

cephfs_data

cephfs_metadata

4. Check which crush_rule the existing pools use (in this example they all use crush_rule 0)

[root@ceph-mon01 ~]# ceph osd dump |grep crush_rule

pool 1 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 47 flags hashpspool stripe_width 0 application rgw

pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 49 flags hashpspool stripe_width 0 application rgw

pool 3 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode warn last_change 51 flags hashpspool stripe_width 0 application rgw
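With many pools, a small shell loop over the pool list gives the same information per pool without grepping the full osd dump; a sketch:

for p in $(ceph osd pool ls); do
    echo -n "$p  "
    ceph osd pool get "$p" crush_rule
done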

5. Modify the crushmap

This approach is recommended only for operators who are already familiar with CRUSH configuration; use it with caution on production clusters.
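If you would rather not hand-edit the decompiled map at all, an equivalent rule can usually be created in a single command with ceph osd crush rule create-replicated (an alternative sketch, not the method used below; datacenter_rule and the host failure domain match the rule defined in step 5.2):

# create a replicated rule rooted at datacenter0 with host as the failure domain
ceph osd crush rule create-replicated datacenter_rule datacenter0 host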

5.1 Export the crush map

Export Ceph's binary crush map and convert it to text format.

# export the binary crush map to the file test.bin

ceph osd getcrushmap -o test.bin

# use crushtool to decompile the binary data in test.bin into text, saved as test.txt

crushtool -d test.bin -o test.txt

5.2 Edit test.txt

# 1. Add a new rule whose entry point is datacenter0 instead of default (the original replicated_rule is left unchanged; datacenter_rule is appended below it)

rule replicated_rule {

    id 0

    type replicated

    min_size 1

    max_size 10

    step take default

    step chooseleaf firstn 0 type host

    step emit

}

rule datacenter_rule {                  # rule name; can be referenced when creating a pool

    id 1                                # rule id set to 1

    type replicated                     # pool type is replicated (erasure is the other mode)

    min_size 1                          # minimum number of replicas a pool using this rule may specify

    max_size 10                         # maximum number of replicas a pool using this rule may specify

    step take datacenter0               # entry point from which PGs look for replicas

    step chooseleaf firstn 0 type host  # choose leaves depth-first; failure domain is host

    step emit                           # end of rule

}

# end crush map

5.3 Import the rewritten crush map back into the cluster

# compile test.txt back into binary form as new.bin

crushtool -c test.txt -o new.bin

# import new.bin into the cluster

ceph osd setcrushmap -i new.bin
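The compiled map can also be dry-run tested with crushtool (ideally before running setcrushmap) to make sure the new rule actually maps PGs to OSDs; a hedged sketch assuming rule id 1 and 3 replicas:

# simulate placements for rule id 1 with 3 replicas; any bad mappings point to a broken rule
crushtool -i new.bin --test --show-statistics --rule 1 --num-rep 3
crushtool -i new.bin --test --show-bad-mappings --rule 1 --num-rep 3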

5.4 Change the crush_rule of the existing pools

After importing the new crush map, the crush_rule of the previously existing pools must be updated as well; otherwise their PGs will stay in the unknown state and the cluster will never reach active+clean.

ceph osd pool set cephfs_data crush_rule datacenter_rule

ceph osd pool set cephfs_metadata crush_rule datacenter_rule
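After switching the rule, the PGs should peer and leave the unknown state within a short time; progress can be watched with, for example:

# summary of PG states; the unknown count should drop to 0
ceph pg stat

# or watch the full cluster status refresh every couple of seconds
watch -n 2 ceph -s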

5.5 Check the cluster status

ceph osd dump | grep crush_rule    # the crush_rule id used by the cephfs pools is now 1

pool 8 'cephfs_data' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 180 flags hashpspool stripe_width 0 application cephfs

pool 9 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 183 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
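As an extra sanity check (not part of the original log), a single PG of cephfs_data (pool id 8 above) can be mapped to verify that it now gets a non-empty up/acting OSD set; pg 8.0 is used purely as an example:

# show the osdmap epoch and the up/acting OSD sets for PG 8.0
ceph pg map 8.0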

5.6 Check the crush_rule name used by each pool

[root@ceph-mon01 ceph-ceph-mon01]# ceph osd pool ls

.rgw.root

default.rgw.control

default.rgw.meta

default.rgw.log

default.rgw.buckets.index

default.rgw.buckets.non-ec

default.rgw.buckets.data

cephfs_data

cephfs_metadata

[root@ceph-mon01 ceph-ceph-mon01]# ceph osd pool get default.rgw.log crush_rule

crush_rule: replicated_rule

[root@ceph-mon01 ceph-ceph-mon01]# ceph osd pool get cephfs_data crush_rule

crush_rule: datacenter_rule

The earlier "pgs: 53.333% pgs unknown" symptom is gone (256 of the 480 PGs, i.e. 53.333%, had been unknown); the cluster status now shows:

[root@ceph-mon01 ~]# ceph -s

  cluster:

    id:     92d4f66b-94a6-4c40-8941-734f3c44eb4f

    health: HEALTH_ERR

            1 filesystem is offline

            1 filesystem is online with fewer MDS than max_mds

            1 pools have many more objects per pg than average

  services:

    mon: 3 daemons, quorum ceph-mon01,ceph-mon03,ceph-mon02 (age 5d)

    mgr: ceph-mon03(active, since 5d), standbys: ceph-mon02, ceph-mon01

    mds: cephfs:0

    osd: 9 osds: 9 up (since 45h), 9 in (since 45h); 224 remapped pgs

    rgw: 1 daemon active (ceph-mon01)

  task status:

  data:

    pools:   9 pools, 480 pgs

    objects: 34.60k objects, 8.5 GiB

    usage:   128 GiB used, 142 GiB / 270 GiB avail

    pgs:     172995/103797 objects misplaced (166.667%)

             256 active+clean

             224 active+clean+remapped

    recovery: 2.0 MiB/s, 7 objects/s

[root@ceph-mon01 ceph-ceph-mon01]#
