Repeatedly adding and removing an OSD leaves it unable to come up

### Environment

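  1. Pre-production system. The CRUSH map was set up by hand, and the location of osd.139 had already been specified in it.
  2. The cluster has the noout flag set (`ceph osd set noout`).
  3. Ceph version: 0.94.5
  4. The OSDs are configured with `osd crush update on start = false`, so that starting an OSD does not change the CRUSH map.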
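To confirm that a cluster actually matches the environment listed above, a quick check like the following may help. This is a minimal sketch: `check_env` is a hypothetical helper name, and it only uses standard `ceph` CLI calls (`ceph --version`, `ceph osd dump`, and `ceph daemon ... config get`, which talks to the OSD's admin socket and therefore has to run on the host where that OSD lives).

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify the environment assumptions for one OSD.
# The setting itself would normally live in ceph.conf, e.g.:
#   [osd]
#   osd crush update on start = false
check_env() {
  local id="$1"

  # Expect a 0.94.x (Hammer) version string for this case.
  ceph --version

  # The cluster-wide flags (noout, ...) appear on the "flags" line of the OSD map dump.
  if ceph osd dump | grep '^flags' | grep -qw noout; then
    echo "noout: set"
  else
    echo "noout: not set"
  fi

  # Ask the running daemon for the value it is actually using.
  ceph daemon "osd.${id}" config get osd_crush_update_on_start
}
```

Usage on the OSD host: `check_env 139`.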

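### Symptom

While simulating a single-node failure, we manually removed and re-added the same OSD several times (deleting only its data and keyring, and leaving the CRUSH map untouched). In the end, the newly added OSD process started and its startup log showed no errors, yet the OSD never entered the up state: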

```
2016-04-01 11:19:16.868837 7fee3654b900  0 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 104255
.....

2016-04-01 11:19:19.295992 7fee3654b900  0 osd.139 12789 crush map has features 2200130813952, adjusting msgr requires for clients
2016-04-01 11:19:19.296008 7fee3654b900  0 osd.139 12789 crush map has features 2200130813952 was 8705, adjusting msgr requires for mons
2016-04-01 11:19:19.296016 7fee3654b900  0 osd.139 12789 crush map has features 2200130813952, adjusting msgr requires for osds
2016-04-01 11:19:19.296052 7fee3654b900  0 osd.139 12789 load_pgs
2016-04-01 11:19:19.296094 7fee3654b900  0 osd.139 12789 load_pgs opened 0 pgs
2016-04-01 11:19:19.296878 7fee3654b900 -1 osd.139 12789 log_to_monitors {default=true}
2016-04-01 11:19:19.305091 7fee246f1700  0 osd.139 12789 ignoring osdmap until we have initialized
2016-04-01 11:19:19.305239 7fee246f1700  0 osd.139 12789 ignoring osdmap until we have initialized
2016-04-01 11:19:19.305425 7fee3654b900  0 osd.139 12789 done with init, starting boot process
```

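With `debug osd = 20` enabled, the OSD was seen repeating the following operations indefinitely. Note that it stays at osdmap epoch 12813 and has no heartbeat peers (`peers []/[]`):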

```
2016-04-01 11:46:23.300790 7f9219d15700 20 osd.139 12813 update_osd_stat osd_stat(538 MB used, 3723 GB avail, 3724 GB total, peers []/[] op hist [])
2016-04-01 11:46:23.300821 7f9219d15700  5 osd.139 12813 heartbeat: osd_stat(538 MB used, 3723 GB avail, 3724 GB total, peers []/[] op hist [])
2016-04-01 11:46:25.200613 7f9231e86700  5 osd.139 12813 tick
2016-04-01 11:46:25.200644 7f9231e86700 10 osd.139 12813 do_waiters -- start
2016-04-01 11:46:25.200648 7f9231e86700 10 osd.139 12813 do_waiters -- finish
2016-04-01 11:46:25.600974 7f9219d15700 20 osd.139 12813 update_osd_stat osd_stat(538 MB used, 3723 GB avail, 3724 GB total, peers []/[] op hist [])
2016-04-01 11:46:25.601002 7f9219d15700  5 osd.139 12813 heartbeat: osd_stat(538 MB used, 3723 GB avail, 3724 GB total, peers []/[] op hist [])
2016-04-01 11:46:26.200759 7f9231e86700  5 osd.139 12813 tick
2016-04-01 11:46:26.200784 7f9231e86700 10 osd.139 12813 do_waiters -- start
2016-04-01 11:46:26.200788 7f9231e86700 10 osd.139 12813 do_waiters -- finish
2016-04-01 11:46:27.200867 7f9231e86700  5 osd.139 12813 tick
2016-04-01 11:46:27.200892 7f9231e86700 10 osd.139 12813 do_waiters -- start
2016-04-01 11:46:27.200895 7f9231e86700 10 osd.139 12813 do_waiters -- finish
2016-04-01 11:46:28.201002 7f9231e86700  5 osd.139 12813 tick
2016-04-01 11:46:28.201022 7f9231e86700 10 osd.139 12813 do_waiters -- start
2016-04-01 11:46:28.201030 7f9231e86700 10 osd.139 12813 do_waiters -- finish
2016-04-01 11:46:29.101147 7f9219d15700 20 osd.139 12813 update_osd_stat osd_stat(538 MB used, 3723 GB avail, 3724 GB total, peers []/[] op hist [])
2016-04-01 11:46:29.101180 7f9219d15700  5 osd.139 12813 heartbeat: osd_stat(538 MB used, 3723 GB avail, 3724 GB total, peers []/[] op hist [])
2016-04-01 11:46:29.201115 7f9231e86700  5 osd.139 12813 tick
2016-04-01 11:46:29.201128 7f9231e86700 10 osd.139 12813 do_waiters -- start
2016-04-01 11:46:29.201132 7f9231e86700 10 osd.139 12813 do_waiters -- finish
2016-04-01 11:46:30.201237 7f9231e86700  5 osd.139 12813 tick
2016-04-01 11:46:30.201267 7f9231e86700 10 osd.139 12813 do_waiters -- start
2016-04-01 11:46:30.201271 7f9231e86700 10 osd.139 12813 do_waiters -- finish
```
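The pattern in the debug log above (every line on the same osdmap epoch, heartbeat lines with an empty peer list) can be spotted mechanically. The following is a rough heuristic sketch, not a Ceph tool: `detect_stuck_osd` is a hypothetical name, and it assumes the Hammer log layout shown above (date, time, thread, level, `osd.N`, epoch, message). A healthy but idle OSD can also sit on one epoch for a while, so "stuck" is only a hint; the empty peer list is the stronger signal.

```bash
#!/usr/bin/env bash
# Hypothetical heuristic: scan an OSD debug log for the "never boots" pattern.
# Prints "stuck" if all lines for the OSD share one osdmap epoch and every
# heartbeat line reports "peers []/[]"; "no-data" if the OSD never appears;
# otherwise "ok". Pass "-" as the file to read from stdin.
detect_stuck_osd() {
  local osd="$1" log="$2"
  awk -v osd="$osd" '
    $5 == osd {
      epochs[$6] = 1                            # remember each osdmap epoch seen
      if ($7 == "heartbeat:") {
        hb++                                    # count heartbeat lines
        if (index($0, "peers []/[]") > 0) empty++
      }
    }
    END {
      n = 0
      for (e in epochs) n++
      if (n == 0)
        print "no-data"
      else if (n == 1 && hb > 0 && hb == empty)
        print "stuck"
      else
        print "ok"
    }' "$log"
}
```

Because the log accumulates across restarts (the startup lines above are on epoch 12789, the debug lines on 12813), it makes sense to feed it only a recent window, for example `tail -n 200 /var/log/ceph/ceph-osd.139.log | detect_stuck_osd osd.139 -` (the path assumes the default log location).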

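### Solution

1. Remove the OSD's entry from the CRUSH map:

   ```bash
   ceph osd crush remove osd.139   # note: this may trigger data migration
   ```

2. Start the OSD service, then add the OSD back into the CRUSH map:

   ```bash
   ceph osd crush add 139 1.0 host=xxx
   ```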
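The two-step workaround above can be wrapped in a small script, together with a wait loop that polls `ceph osd dump` until the OSD is reported up. This is a sketch under stated assumptions: the function names are hypothetical, the default `start_osd` assumes a sysvinit-based Hammer install (`service ceph start osd.<id>`) and should be overridden on upstart or systemd hosts, and `WAIT_INTERVAL` is a made-up tuning variable.

```bash
#!/usr/bin/env bash
# Hypothetical hook: start one OSD daemon. Default assumes sysvinit on Hammer;
# redefine this function for other init systems.
start_osd() {
  service ceph start "osd.$1"
}

# Remove the OSD from CRUSH, start it, then add it back at the given location.
readd_osd_to_crush() {
  local id="$1" weight="$2" host="$3"
  ceph osd crush remove "osd.${id}" || return 1          # may trigger data migration
  start_osd "$id" || return 1
  ceph osd crush add "osd.${id}" "$weight" "host=${host}" || return 1
}

# Poll the OSD map until "osd.<id> up" appears, giving up after $2 tries.
wait_for_osd_up() {
  local id="$1" tries="${2:-30}" i
  for ((i = 0; i < tries; i++)); do
    if ceph osd dump | grep -qE "^osd\.${id} up "; then
      echo "osd.${id} is up"
      return 0
    fi
    sleep "${WAIT_INTERVAL:-2}"
  done
  echo "osd.${id} still not up" >&2
  return 1
}
```

Usage, with `xxx` standing for the real host bucket: `readd_osd_to_crush 139 1.0 xxx && wait_for_osd_up 139`. Keeping the daemon start in its own function makes it easy to swap the init command without touching the CRUSH steps.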

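### Summary

Repeatedly adding and removing the same OSD probably triggers a bug that keeps the OSD's osdmap from being updated, so the OSD never comes up. Removing the OSD from the CRUSH map and adding it back by hand forces a refresh.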

Reposted from: https://my.oschina.net/diluga/blog/651360
