MDS metadata cache not being released

Lucien168

Article link: https://blog.csdn.net/weixin_44389885/article/details/86621703

1. Problem

The Ceph cluster reports the following warning:

$ ceph -s
health HEALTH_WARN
mds0: Client xxx-online00.gz01 failing to respond to cache pressure

2. Analysis

2.1 Official explanation

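Type	Description
Message:	"Client name failing to respond to cache pressure"
Code:	MDS_HEALTH_CLIENT_RECALL, MDS_HEALTH_CLIENT_RECALL_MANY
Description:	Each client keeps its own metadata cache. Items in a client's cache (such as inodes) are also held in the MDS cache, so when the MDS needs to shrink its cache (to stay within mds_cache_size), it sends messages to clients asking them to shrink their caches too. If a client is unresponsive or buggy, it can prevent the MDS from staying within mds_cache_size, and the MDS may eventually run out of memory and crash. This message appears if a client takes longer than mds_recall_state_timeout (default 60s) to respond.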
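
The timeout rule in the description can be illustrated with a toy model. This is only a sketch of the rule as the table states it, not the MDS implementation: `RecallState`, `clients_to_warn`, the client names, and the timestamps are all hypothetical. Only the 60s default for `mds_recall_state_timeout` comes from the documentation quoted above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default of mds_recall_state_timeout as quoted in the table above (seconds).
MDS_RECALL_STATE_TIMEOUT = 60.0


@dataclass
class RecallState:
    """Hypothetical per-client record: when the MDS asked the client to trim
    its cache, and when (if ever) the client responded."""
    client: str
    recall_sent_at: float
    released_at: Optional[float] = None


def clients_to_warn(states: List[RecallState], now: float,
                    timeout: float = MDS_RECALL_STATE_TIMEOUT) -> List[str]:
    """Return the clients that have not responded to a recall within `timeout`."""
    late = []
    for s in states:
        responded = s.released_at is not None and s.released_at >= s.recall_sent_at
        if not responded and now - s.recall_sent_at > timeout:
            late.append(s.client)
    return late


states = [
    RecallState("client.a", recall_sent_at=0.0, released_at=5.0),  # responded in time
    RecallState("client.b", recall_sent_at=0.0),                   # never responded
    RecallState("client.c", recall_sent_at=50.0),                  # still within the timeout
]
print(clients_to_warn(states, now=90.0))  # ['client.b']
```

In this model only `client.b` would trigger the "failing to respond to cache pressure" warning at `now=90.0`, because it is the only client whose recall has gone unanswered for more than 60 seconds.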

2.2 Inspect the client sessions

$ ceph daemon mds.ceph-epnfs-mds01.gz01 session ls
[
    {
        "id": 4746087,
        "num_leases": 9,
        "num_caps": 57368,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 1,
        "reconnecting": false,
        "inst": "client.4746087 10.1.7.1:0\/1700679012",
        "client_metadata": {
            "entity_id": "admin",
            "hostname": "test-hostname00",
            "kernel_version": "3.10.0-514.16.1.el7.x86_64"
        }
    }
]
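
When an MDS serves many clients, the `session ls` output is easier to read once it is sorted by `num_caps`, since the client holding the most capabilities is a natural first suspect. The following is a minimal sketch that assumes the JSON layout shown above. `rank_sessions_by_caps` is a hypothetical helper name, and the `"?"` fallbacks cover sessions that lack a given metadata field.

```python
import json

# Sample copied from the `session ls` output shown above.
SESSION_LS_OUTPUT = r"""
[
    {
        "id": 4746087,
        "num_leases": 9,
        "num_caps": 57368,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 1,
        "reconnecting": false,
        "inst": "client.4746087 10.1.7.1:0\/1700679012",
        "client_metadata": {
            "entity_id": "admin",
            "hostname": "test-hostname00",
            "kernel_version": "3.10.0-514.16.1.el7.x86_64"
        }
    }
]
"""


def rank_sessions_by_caps(raw_json):
    """Parse `session ls` JSON and return (num_caps, id, hostname, kernel)
    tuples, sorted so the client holding the most caps comes first."""
    rows = []
    for session in json.loads(raw_json):
        meta = session.get("client_metadata", {})
        rows.append((
            session["num_caps"],
            session["id"],
            meta.get("hostname", "?"),
            meta.get("kernel_version", "?"),
        ))
    return sorted(rows, key=lambda row: row[0], reverse=True)


for num_caps, client_id, hostname, kernel in rank_sessions_by_caps(SESSION_LS_OUTPUT):
    print(f"{num_caps:>8} caps  client.{client_id}  {hostname}  {kernel}")
```

On a live cluster the same function could be fed the captured output of `ceph daemon mds.<name> session ls`, for example via `subprocess.run(..., capture_output=True, text=True).stdout`, instead of the embedded sample.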