OpenStack Ceph Troubleshooting
Ceph's automatic rebalancing generally improves data consistency and the evenness of storage distribution, but in some situations it causes problems of its own: for example, when many servers in a Ceph cluster shut down at the same time, or disks fail, the rebalancing mechanism can put real strain on the cluster.
Symptoms:
All cloud instances in the OpenStack environment are extremely sluggish or even unusable; all Ceph OSDs show up, or some OSD goes down on its own, and the cluster keeps rebalancing.
1. Check the Ceph cluster's health status
ceph health detail
This typically lists how many OSDs and PGs the cluster has and what each of them is doing.
ceph -s
If the output shows the cluster is rebalancing but there is no recovery I/O for a long time (or none at all), something has probably gone wrong during the rebalance. Go back to health detail and see which OSDs the stuck PGs are on.
pg 3.1a5 is stuck unclean for 27164.858187, current state active+remapped+wait_backfill, last acting [4,14]
pg 3.22 is stuck unclean for 14383.232098, current state active+remapped+wait_backfill, last acting [3,23]
pg 5.25 is stuck unclean for 11660.300263, current state active+remapped+wait_backfill, last acting [20,23]
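When health detail reports many stuck PGs, it helps to group them by the OSDs in their acting sets, since a single bad OSD often shows up in most of them. A minimal sketch in Python, assuming only the plain-text line format shown above (the regex is illustrative, not an official parser):

```python
import re
from collections import Counter

# Sample lines in the format shown above (taken from `ceph health detail`).
health_detail = """\
pg 3.1a5 is stuck unclean for 27164.858187, current state active+remapped+wait_backfill, last acting [4,14]
pg 3.22 is stuck unclean for 14383.232098, current state active+remapped+wait_backfill, last acting [3,23]
pg 5.25 is stuck unclean for 11660.300263, current state active+remapped+wait_backfill, last acting [20,23]
"""

LINE = re.compile(
    r"pg (?P<pg>\S+) is stuck \w+ for \S+, "
    r"current state (?P<state>\S+), last acting \[(?P<acting>[\d,]+)\]"
)

def osds_with_stuck_pgs(text):
    """Count how many stuck PGs each OSD appears in (via the acting set)."""
    counts = Counter()
    for m in LINE.finditer(text):
        for osd in m.group("acting").split(","):
            counts[int(osd)] += 1
    return counts

print(osds_with_stuck_pgs(health_detail))
```

An OSD that appears in far more stuck PGs than its peers is a good first candidate for log inspection.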
For example, to investigate osd.4, log in to the node that hosts it; locate the node with
ceph osd tree
then follow osd.4's log:
tail -f /var/log/ceph/ceph-osd.4.log
The log usually yields plenty of clues about the problem.
Sometimes a node is reported with no reply, meaning it did not respond or send heartbeats during peering. Log in to that node and inspect its OSD's log in the same way to look for clues.
2. Analyze PG states from the health detail output
Identify the problem PGs, find the OSDs and nodes they sit on, log in to the corresponding node, and query the PG:
ceph pg x.xx query
This usually reports the PG's fault and its cause, which you can then address case by case.
If the query fails or hangs on that node, you can remove the OSD so that its PGs are redistributed to other nodes, then run the query again.
Removing an OSD:
ceph osd reweight osd.1 0.0
Depending on how the rebalance progresses, continue with:
ceph osd crush reweight osd.1 0
stop ceph-osd id=1
ceph osd crush rm osd.1
ceph osd crush rm node-2
ceph osd rm 1
ceph auth del osd.1
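The removal sequence above can be wrapped in a small script so the OSD id and host bucket are typed only once. A dry-run sketch: it only prints the commands instead of executing them (drop the echo to run them for real), and note that stop ceph-osd id=... is Upstart syntax, so on systemd hosts you would use systemctl stop ceph-osd@<id> instead:

```shell
#!/bin/sh
# Dry run: print the OSD removal sequence for a given OSD id and host bucket.
remove_osd() {
    osd_id=$1   # numeric OSD id, e.g. 1
    host=$2     # CRUSH host bucket, e.g. node-2
    echo ceph osd reweight "osd.$osd_id" 0.0
    echo ceph osd crush reweight "osd.$osd_id" 0
    echo stop ceph-osd "id=$osd_id"
    echo ceph osd crush rm "osd.$osd_id"
    # Only succeeds once the host bucket is empty (as in the single-OSD case here).
    echo ceph osd crush rm "$host"
    echo ceph osd rm "$osd_id"
    echo ceph auth del "osd.$osd_id"
}

remove_osd 1 node-2
```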
You can also try repairing PGs:
ceph pg repair x.xx
In general, once every PG is back in an active+ state, just wait for the rebalance to finish. This approach also applies to situations such as pg down+peering.
3. pg unfound
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#failures-osd-unfound
You can identify which OSDs have been probed or might contain data:
ceph pg 2.4 query
"recovery_state": [
    { "name": "Started/Primary/Active",
      "enter_time": "2012-03-06 15:15:46.713212",
      "might_have_unfound": [
        { "osd": 1,
          "status": "osd is down"}]},
In this case, for example, the cluster knows that osd.1 might have data, but it is down. The full range of possible states include:
already probed
querying
OSD is down
not queried (yet)
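The recovery_state excerpt above can also be inspected programmatically. A sketch, assuming ceph pg x.xx query returns JSON shaped like that excerpt (only the fields shown are relied on; the sample below is the excerpt itself):

```python
import json

# Sample shaped like the `ceph pg 2.4 query` excerpt quoted above.
query_output = json.loads("""
{
  "recovery_state": [
    { "name": "Started/Primary/Active",
      "enter_time": "2012-03-06 15:15:46.713212",
      "might_have_unfound": [
        { "osd": 1,
          "status": "osd is down" } ] } ]
}
""")

def unfound_candidates(query):
    """Return (osd, status) pairs from every might_have_unfound entry."""
    out = []
    for state in query.get("recovery_state", []):
        for entry in state.get("might_have_unfound", []):
            out.append((entry["osd"], entry["status"]))
    return out

print(unfound_candidates(query_output))
```

Any OSD listed with status "osd is down" should be brought back up (or confirmed dead) before giving up on the objects.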
Sometimes it simply takes some time for the cluster to query possible locations.
It is possible that there are other locations where the object can exist that are not listed. For example, if a ceph-osd is stopped and taken out of the cluster, the cluster fully recovers, and due to some future set of failures ends up with an unfound object, it won’t consider the long-departed ceph-osd as a potential location to consider. (This scenario, however, is unlikely.)
If all possible locations have been queried and objects are still lost, you may have to give up on the lost objects. This, again, is possible given unusual combinations of failures that allow the cluster to learn about writes that were performed before the writes themselves are recovered. To mark the “unfound” objects as “lost”:
ceph pg 2.5 mark_unfound_lost revert|delete
The final argument specifies how the cluster should deal with lost objects.
- The “delete” option will forget about them entirely.
- The “revert” option (not available for erasure coded pools) will either roll back to a previous version of the object or (if it was a new object) forget about it entirely. Use this with caution, as it may confuse applications that expected the object to exist.