Resolving the Ceph warning: pool <pool name> has too few pgs

Original post, 2017-03-07 19:58:34

pool xxx has too few pgs

  • Problem:
root@node-3:~# ceph -s
    cluster 04bb3d14-acee-431e-88cd-7256bccd10a1
     health HEALTH_WARN
            pool images has too few pgs
     monmap e1: 1 mons at {node-3=10.10.10.2:6789/0}
            election epoch 1, quorum 0 node-3
     osdmap e63: 4 osds: 4 up, 4 in
      pgmap v299609: 1636 pgs, 13 pools, 82237 MB data, 11833 objects
            90099 MB used, 14804 GB / 14892 GB avail
                1636 active+clean
  client io 0 B/s rd, 3478 B/s wr, 2 op/s
root@node-3:~# ceph osd pool get images pg_num   
pg_num: 100
root@node-3:~# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED 
    14892G     14804G       90099M          0.59 
POOLS:
    NAME                   ID     USED       %USED     MAX AVAIL     OBJECTS
    rbd                    0           0         0        14791G           0
    .rgw                   1         394         0        14791G           2
    images                 2      64278M      0.42        14791G        8049
    volumes                3       3016M      0.02        14791G         814
    backups                4           0         0        14791G           0
    compute                5      14942M      0.10        14791G        2915
    .rgw.root              6         848         0        14791G           3
    .rgw.control           7           0         0        14791G           8
    .rgw.gc                8           0         0        14791G          32
    .users.uid             9         778         0        14791G           4
    .users                 10         14         0        14791G           1
    .rgw.buckets.index     11          0         0        14791G           1
    .rgw.buckets           12       300k         0        14791G           4
root@node-3:~# ceph osd pool ls detail
pool 0 'rbd' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 '.rgw' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 2 flags hashpspool stripe_width 0
pool 2 'images' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 100 pgp_num 100 last_change 62 flags hashpspool stripe_width 0
        removed_snaps [1~3]
pool 3 'volumes' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 50 flags hashpspool stripe_width 0
        removed_snaps [1~3]
pool 4 'backups' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 24 flags hashpspool stripe_width 0
pool 5 'compute' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 42 flags hashpspool stripe_width 0
        removed_snaps [1~3]
pool 6 '.rgw.root' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 28 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 7 '.rgw.control' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 30 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 8 '.rgw.gc' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 32 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 9 '.users.uid' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 34 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 10 '.users' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 39 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 11 '.rgw.buckets.index' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 51 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 12 '.rgw.buckets' replicated size 1 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 53 owner 18446744073709551615 flags hashpspool stripe_width 0
  • Analysis and resolution:

    1. Source code analysis:
      Searching the source for "has too few pgs" locates the get_health function in mon/PGMonitor.cc; the relevant block is quoted below. The string does not appear in master; it turned up after switching to v0.94.6. In master the message has become "pool images has many more objects per pg than average (too few pgs?)".
    2. Code flow (a worked example with this cluster's numbers follows the list):
      1. Take the total object count across all pools from pg_map and the total number of PGs (pg_map.pg_stat.size()), and compute average_objects_per_pg, the cluster-wide ratio of objects to PGs, i.e. how many objects each PG holds on average across the whole cluster;
      2. Take each pool's object count and its pg_num, and compute objects_per_pg, the ratio of objects to PGs within that pool, i.e. how many objects each PG of that pool holds.
      3. Compute ratio = objects_per_pg / average_objects_per_pg, i.e. how far this pool deviates from the cluster average.
      4. Read the mon_pg_warn_max_object_skew option from the configuration (default 10).
      5. If ratio exceeds mon_pg_warn_max_object_skew, the HEALTH_WARN shown in the problem above is raised.
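      Plugging in the numbers from the ceph -s and ceph df output above (integer division, as in the code):

          average_objects_per_pg = 11833 objects / 1636 PGs = 7
          objects_per_pg(images) = 8049 objects / 100 PGs   = 80
          ratio = 80 / 7 ≈ 11.4 > mon_pg_warn_max_object_skew (10)

      so the images pool is flagged even though it only holds 64278 MB of data.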
  • Source code (mon/PGMonitor.cc, get_health, v0.94.6):

if (!pg_map.pg_stat.empty()) {
  for (ceph::unordered_map<int,pool_stat_t>::const_iterator p = pg_map.pg_pool_sum.begin();
       p != pg_map.pg_pool_sum.end();
       ++p) {
    const pg_pool_t *pi = mon->osdmon()->osdmap.get_pg_pool(p->first);
    if (!pi)
      continue;   // in case osdmap changes haven't propagated to PGMap yet
    if (pi->get_pg_num() > pi->get_pgp_num()) {
      ostringstream ss;
      ss << "pool " << mon->osdmon()->osdmap.get_pool_name(p->first) << " pg_num "
         << pi->get_pg_num() << " > pgp_num " << pi->get_pgp_num();
      summary.push_back(make_pair(HEALTH_WARN, ss.str()));
      if (detail)
        detail->push_back(make_pair(HEALTH_WARN, ss.str()));
    }
    int average_objects_per_pg = pg_map.pg_sum.stats.sum.num_objects / pg_map.pg_stat.size();  // cluster-wide average: total objects / total number of PGs
    if (average_objects_per_pg > 0 &&
        pg_map.pg_sum.stats.sum.num_objects >= g_conf->mon_pg_warn_min_objects &&
        p->second.stats.sum.num_objects >= g_conf->mon_pg_warn_min_pool_objects) {
      int objects_per_pg = p->second.stats.sum.num_objects / pi->get_pg_num();  // objects per PG in this pool
      float ratio = (float)objects_per_pg / (float)average_objects_per_pg;      // skew relative to the cluster average
      if (g_conf->mon_pg_warn_max_object_skew > 0 &&
          ratio > g_conf->mon_pg_warn_max_object_skew) {
        ostringstream ss;
        ss << "pool " << mon->osdmon()->osdmap.get_pool_name(p->first) << " has too few pgs";
        summary.push_back(make_pair(HEALTH_WARN, ss.str()));
        if (detail) {
          ostringstream ss;
          ss << "pool " << mon->osdmon()->osdmap.get_pool_name(p->first) << " objects per pg ("
             << objects_per_pg << ") is more than " << ratio << " times cluster average ("
             << average_objects_per_pg << ")";
          detail->push_back(make_pair(HEALTH_WARN, ss.str()));
        }
      }
    }
  }
}

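The threshold actually in effect on a running monitor can be checked through the mon admin socket; on the node above (assuming the standard admin socket path) the query would look like:

root@node-3:~# ceph daemon mon.node-3 config get mon_pg_warn_max_object_skew

which should report the default of 10 unless it has been overridden.
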
Resolution (example commands for each option are sketched below):
1. Increase the pg_num (and pgp_num) of the affected pool.
2. Delete unneeded pools; empty pools with many PGs pull the cluster-wide average down and inflate the ratio for busy pools, so do not give test pools too many PGs when creating them in a Ceph cluster.
3. Raise the mon_pg_warn_max_object_skew value to increase the warning threshold.
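
A rough sketch of all three options (the pool name images and the values 256 and 20 are only illustrative; pick numbers that suit your cluster, and note that increasing pg_num triggers data movement and cannot be reverted):

1) Give the busy pool more PGs, raising pgp_num as well so the new PGs are actually used for placement:
root@node-3:~# ceph osd pool set images pg_num 256
root@node-3:~# ceph osd pool set images pgp_num 256

2) Delete an unused pool (irreversible; <unused-pool> is a placeholder):
root@node-3:~# ceph osd pool delete <unused-pool> <unused-pool> --yes-i-really-really-mean-it

3) Raise the skew threshold on the running monitors, and persist it by adding mon_pg_warn_max_object_skew = 20 to the [mon] section of ceph.conf:
root@node-3:~# ceph tell mon.* injectargs '--mon-pg-warn-max-object-skew 20'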

Copyright notice: this is an original post by the author; reproduction without the author's permission is not allowed. https://blog.csdn.net/ygtlovezf/article/details/60778091
