Ceph: how to recover when monitor IPs change

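  1. Symptoms after correcting the IP addresses on all nodes
  • Running sudo ceph status on any node shows that ceph is still trying to connect to the "old" addresses;
  • systemctl status ceph-mon@xxx.service reports unable to bind to ... the "old" address.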

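Editing /etc/hosts and /etc/ceph/ceph.conf alone does not help! The ceph monitors keep their address information in the monmap, and the monmap cannot be changed casually: the monitors are effectively the brain of the cluster and far too important for that. In future it is best to give the monitors private-network IP addresses.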
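
One way to see this for yourself is to check which addresses a monitor's own monmap records. The sketch below is illustrative only. It assumes the map has already been extracted from a stopped monitor with ceph-mon --extract-monmap and printed with monmaptool --print (both commands appear later in this post), and that IPv4 addresses are in use. The helper name monmap_has_ip is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ceph.
# Reads `monmaptool --print` output on stdin and succeeds (exit 0)
# if the given IPv4 address appears in it as a whole address.
monmap_has_ip() {
    awk -v ip="$1" '
        {
            # Split on every character that cannot be part of an IPv4
            # address, so 10.0.0.1 does not match inside 110.0.0.1.
            n = split($0, parts, /[^0-9.]+/)
            for (i = 1; i <= n; i++)
                if (parts[i] == ip) found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

Piping the output of monmaptool --print /tmp/monmap into monmap_has_ip with a new address as the argument should return non-zero for as long as the map holds only the old addresses, whatever /etc/hosts and ceph.conf say.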

  2. How to fix it?

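I'll be lazy and just paste the IRC log from when I asked the experts. The main idea is to stop all the monitors, remove them from the cluster until only one monitor is left, and then add the others back one at a time.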

4:20:07 PM - zren: Hello, may I ask a quick question here: 
what should I do to recover my cluster after a long period of downtime?
2/3 nodes's IP has changed during this time. "ceph -s" still try to connect
the old IP even after I've set the new ip in /etc/hosts. 
And the "mon_host=" list in /etc/ceph/ceph.conf still shows the old IP addresses, 
should I correct the list manually?

4:28:15 PM - oms101: zren -> never done this but do change the /etc/ceph/ceph.conf
4:28:38 PM - oms101: This is definitely used by the client tools

4:52:04 PM - joao: <oms101> zren -> never done this but do change the /etc/ceph/ceph.conf
4:52:07 PM - joao: this may not be sufficient
4:52:15 PM - joao: how did the ip change?
4:52:51 PM - joao: did you properly moved the monitors to the new ips first?
4:53:03 PM - joao: i'm guessing no
4:53:22 PM - joao: so you'll likely have a monmap with the old ips in it
4:53:55 PM - joao: likelihood is that the monitors won't even be able to form quorum because
they have wrong ips for the monitors
4:53:56 PM - oms101: yes good point joao

4:54:28 PM - joao: in which case, your best chance will be extracting the current 
monmap from all monitors and injecting a new map
4:54:38 PM - oms101: http://docs.ceph.com/docs/master/man/8/monmaptool/
4:54:49 PM - oms101: is useful documentation on the monmaptool
4:54:49 PM - joao: this will mean shutting down your monitors, but given you likely don't even have quorum who cares anyway
4:56:01 PM - joao: if we're pointing to upstream, i'd instead suggest http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster
4:56:22 PM - joao: absolutely no clue if this has been mapped to our internal docs, although i hope so
4:56:48 PM - joao: omg
4:57:03 PM - zren: joao: first of all, thanks!  it changed because the network facility in server room was reconstructed by the IT guy, hah. 

5:22:07 PM - zren: joao: come back again;-) unfortunately, I got this error when trying to get the copy of monmap file according the link you point to:
5:22:07 PM - zren: ceph1:~ # ceph-mon -i `hostname` --extract-monmap /tmp/monmap
5:22:07 PM - zren: IO error: lock /var/lib/ceph/mon/ceph-ceph1/store.db/LOCK: Resource temporarily unavailable
5:22:07 PM - zren: 2016-06-29 17:14:04.155152 7fb9cf3607c0 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-ceph1': (22) Invalid argument
5:23:15 PM - joao: zren, the monitor is running
5:23:24 PM - joao: as i said, you need to shut them down
5:24:37 PM - zren: joao: Yes, according to the link, I stopped 2/3 nodes, so only one surviving monitor is left;-)
5:25:02 PM - joao: zren,you need to do that on *all* the monitors
5:25:11 PM - joao: you need the same map epoch on all the monitors
5:25:17 PM - joao: otherwise that will lead to inconsistencies

5:25:52 PM - joao: the idea is roughly to do
5:26:15 PM - zren: joao: OK, thanks! will try.. please treat me as a very newbie hah;-)
5:27:00 PM - joao: you only need to extract the monmap from the monitor with the latest monmap
5:27:11 PM - joao: but need to inject it into every monitor
5:27:23 PM - zren: joao: got it;-)
5:27:43 PM - joao: if by some chance you ended up running the cluster with quorum with less than 3 monitors, then you need to check which one has the latest monmap
5:28:02 PM - joao: in that case, extract the monmap on all the monitors and use the monmaptool to check the latest epoch
5:28:16 PM - joao: monmaptool --print /path/to/monmap
5:28:22 PM - joao: that will give you the map epoch
5:28:39 PM - joao: i can't emphasize this enough: use the latest epoch
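
The epoch comparison joao describes can be scripted. What follows is a minimal sketch, not a tested recovery tool. It assumes every monitor's map has already been extracted while the monitors were stopped, and that monmaptool --print emits a line of the form "epoch N", as in the examples in the upstream documentation. The function names monmap_epoch and latest_monmap are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Ceph. The command name can be
# overridden through MONMAPTOOL, e.g. for testing.
MONMAPTOOL=${MONMAPTOOL:-monmaptool}

# Read `monmaptool --print` output on stdin and print the epoch.
# Assumes the output contains a line such as "epoch 3".
monmap_epoch() {
    awk '$1 == "epoch" { print $2; exit }'
}

# latest_monmap FILE...
# Print the path of the monmap file with the highest epoch.
# Runs in a subshell so its variables do not leak into the caller.
latest_monmap() (
    best_epoch=-1
    best_file=
    for f in "$@"; do
        e=$("$MONMAPTOOL" --print "$f" | monmap_epoch)
        # Skip files whose epoch could not be read as a number.
        case $e in
            '' | *[!0-9]*) continue ;;
        esac
        if [ "$e" -gt "$best_epoch" ]; then
            best_epoch=$e
            best_file=$f
        fi
    done
    [ -n "$best_file" ] && printf '%s\n' "$best_file"
)
```

With the maps copied to one host under placeholder names such as /tmp/monmap.ceph1, /tmp/monmap.ceph2 and /tmp/monmap.ceph3, calling latest_monmap on the three files prints the one to inject into every monitor.
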
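
Once all the monitors have been stopped and the cluster is down to a single working monitor, the second monitor can be restored roughly as follows: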
ceph2:~ # ceph mon remove ceph2
ceph2:~ # rm -rf /var/lib/ceph/mon/ceph-ceph2/*
ceph2:~ # mkdir tmp
ceph2:~ # ceph mon getmap -o tmp/monmap
ceph2:~ # ceph auth get mon. -o tmp/keyring 
ceph2:~ # ceph-mon -i ceph2 --mkfs --monmap tmp/monmap --keyring tmp/keyring 
ceph2:~ # ceph-mon -i ceph2 --public-addr "new-ip":6789 
ceph2:~ # systemctl start ceph-mon@ceph2.service 
ceph2:~ # systemctl status ceph-mon@ceph2.service 

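The following commands, taken from [3], are also worth trying. Note that, as written, the snippet creates the new map in a file named monmap but then prints and injects monmap.bin, so make sure the file you inject is the one that holds the new addresses, and stop each monitor before injecting into it: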

    #Add the new monitor locations  
    # monmaptool --create --add mon0 192.168.32.2:6789 --add osd1 192.168.32.3:6789 \  
      --add osd2 192.168.32.4:6789 --fsid 61a520db-317b-41f1-9752-30cedc5ffb9a \  
      --clobber monmap  
       
    #Retrieve the monitor map  
    # ceph mon getmap -o monmap.bin  
       
    #Check new contents  
    # monmaptool --print monmap.bin  
       
    #Inject the monmap  
    # ceph-mon -i mon0 --inject-monmap monmap.bin  
    # ceph-mon -i osd1 --inject-monmap monmap.bin  
    # ceph-mon -i osd2 --inject-monmap monmap.bin 
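
An alternative to building a map from scratch with --create is to edit a map extracted from an existing monitor, which keeps its fsid: remove each stale entry with monmaptool --rm, re-add it with monmaptool --add and the new address, then inject the result into every stopped monitor. The sketch below wraps those documented options. The function names and the override variables MONMAPTOOL and CEPH_MON are assumptions made for this example, and it has not been run against a real cluster.

```shell
#!/bin/sh
# Sketch only. Command names can be overridden (e.g. with `echo`)
# to do a dry run without touching any monitor store.
MONMAPTOOL=${MONMAPTOOL:-monmaptool}
CEPH_MON=${CEPH_MON:-ceph-mon}

# readdress_mon MAPFILE NAME NEW_ADDR
# Replace one monitor's address in an already-extracted monmap file:
# drop the old entry, then add the same name with the new ip:port.
readdress_mon() {
    "$MONMAPTOOL" --rm "$2" "$1" &&
        "$MONMAPTOOL" --add "$2" "$3" "$1"
}

# inject_monmap MAPFILE MON_ID
# Write the edited map into a monitor's store.
# The monitor daemon must be stopped first.
inject_monmap() {
    "$CEPH_MON" -i "$2" --inject-monmap "$1"
}
```

For example, readdress_mon /tmp/monmap ceph2 192.168.32.3:6789 followed by inject_monmap /tmp/monmap ceph2 on the ceph2 host; the path and address here are placeholders. Setting MONMAPTOOL=echo and CEPH_MON=echo first prints the arguments instead of running the tools.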

References:

[1]  http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

[2] http://docs.ceph.com/docs/master/man/8/monmaptool/

[3] http://os.51cto.com/art/201412/462140.htm

Reposted from: https://my.oschina.net/u/2475751/blog/702790
