Ceph reports clients failing to respond to cache pressure

Environment

  • Red Hat OpenShift Container Storage 4.x up to 4.8
  • Red Hat OpenShift Data Foundation 4.9 or higher
  • Red Hat Ceph Storage 4.x or higher

Issue

ceph -s reports {x} clients failing to respond to cache pressure:
    # ceph -s
      cluster:
        id:     11111111-2222-3333-4444-555555666666
        health: HEALTH_WARN
                1 clients failing to respond to cache pressure
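
To narrow down which client session is implicated, ceph health detail typically names the client in the MDS_CLIENT_RECALL warning, and ceph fs status shows the active MDS daemon (a minimal sketch; run from the rook-ceph toolbox pod or the MDS pod):

    # ceph health detail
    # ceph fs status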

Resolution

  • On OCP/ODF platforms, this warning can be reported when a pod tries to start and takes too long (minutes or hours), typically because it is mounting a CephFS PV that contains millions of files, leaving the pod stuck in CreateContainerError. When the pod finally starts, the Ceph warning clears. The workaround and root cause for this issue are described in:

    Pods using persistent volumes with high file counts fail to start or take an excessive amount of time in OpenShift

    and/or

    Workaround to skip SELinux relabeling in OpenShift Data Foundation/OpenShift Container Storage

  • In scenarios other than the previous point, you can tune the following settings up or down to help clients release caps faster (see the sketch after this list):

    ceph config set mds mds_recall_max_caps xxxx (this should be increased)
    ceph config set mds mds_recall_max_decay_rate x.xx (this should be decreased)
    ceph config set mds mds_session_cache_liveness_decay_rate xxx (increase or decrease this based on the issue)
    
    
  • If further assistance is required, contact the Red Hat Ceph Storage team for further investigation and recommendations.
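
A minimal sketch of the tuning workflow above, run from the toolbox or MDS pod (the values below are illustrative placeholders, not recommendations; check the current values first and change them conservatively):

    # ceph config get mds mds_recall_max_caps
    # ceph config set mds mds_recall_max_caps 10000
    # ceph config get mds mds_recall_max_decay_rate
    # ceph config set mds mds_recall_max_decay_rate 1.5
    # ceph config show mds.${mds_id} mds_recall_max_decay_rate

ceph config show confirms the value the running MDS daemon actually picked up.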

Root Cause

  • The CephFS client holds a limited number of caps, so there is not much left to release. The MDS is recalling those caps, but the client session has become quiet, so the client is not releasing them. The MDS is trying to reduce its outstanding caps to reduce future work.

    The client is likely still using those caps (open files/IO), so it will not give them up.

  • How does mds_min_caps_per_client relate to inode usage here?

    When a CephFS client wants to operate on an inode, it queries the MDS in various ways, and the MDS grants the client a set of capabilities (caps) that permit the client to operate on the inode in various ways. If a client exceeds the mds_min_caps_per_client limit and does not release caps when the MDS revokes them, the MDS reports this warning. A quick check of this limit is sketched after this list.
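
As a quick check (a sketch; in recent Ceph releases mds_min_caps_per_client defaults to 100), query the configured value and compare it against a session's num_caps in the Diagnostic Steps output below:

    # ceph config get mds mds_min_caps_per_client
    # ceph daemon mds.${mds_id} config get mds_min_caps_per_client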

Diagnostic Steps

Run the session ls command against the active MDS and note the recall_caps section; its value should be above the value currently defined for mds_recall_max_caps:

# oc rsh <active-mds-pod>
# ceph daemon mds.${mds_id} session ls
...
       {
        "id": 765432,
        "entity": {
            "name": {
                "type": "client",
                "num": 765432
            },
            "addr": {
                "type": "v1",
                "addr": "10.0.0.1:0",
                "nonce": 3231639080
            }
        },
        "state": "open",
        "num_leases": 0,
        "num_caps": 1151,
        "request_load_avg": 0,
        "uptime": 138612.869259214,
        "requests_in_flight": 0,
        "completed_requests": 1,
        "reconnecting": false,
        "recall_caps": {
            "value": 91353.23456789,             <------------ recall caps by mds
            "halflife": 60
        },
...
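
When many client sessions are connected, the offending session(s) can be found faster by filtering the JSON output; a minimal sketch, assuming jq is available in the pod:

    # ceph daemon mds.${mds_id} session ls | \
        jq -r 'sort_by(.recall_caps.value) | reverse | .[] | "\(.id)  num_caps=\(.num_caps)  recall_caps=\(.recall_caps.value)"'

Sessions at the top of the list with a high recall_caps value are the ones the MDS is waiting on.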