Troubleshooting a "Too many open files" Error Triggered by the Jedis Client

Symptoms

  • Starting around 11 o'clock in the evening on 2022-08-08, several business systems began throwing errors one after another. A screenshot of the error from one of them is shown below:

[Screenshot: error log from one of the business systems]

  • UidService is the service responsible for distributed ID generation. Its uid-service log shows a flood of "too many open files" errors starting at 2022-08-08 23:05:32 (the Chinese message "打开的文件过多" in the log below is the Chinese-locale JDK rendering of "Too many open files"). Anyone familiar with Linux will recognize this error: it usually means a single process has opened too many file descriptors (fds). (Note: in Linux everything is a file; a process consumes an fd whenever it reads or writes a file or opens a network connection. For safety, the OS limits the number of fds per process; the CentOS default is 1024.)
2022-08-08 23:05:32.614  WARN 30239 --- [nio-8777-exec-8] o.s.r.s.RemoteInvocationTraceInterceptor : Processing of HessianServiceExporter remote call resulted in fatal exception: com.ccjt.eip.web.service.IUidService.nextProjectNo

redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
	at redis.clients.util.Pool.getResource(Pool.java:53) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.jedis.JedisPool.getResource(JedisPool.java:226) ~[jedis-2.9.0.jar!/:na]
	at cn.hutool.db.nosql.redis.RedisDS.getJedis(RedisDS.java:136) ~[hutool-all-4.6.8.jar!/:na]
	at com.ccjt.eip.web.service.impl.UidServiceImpl.connectRedis(UidServiceImpl.java:45) ~[classes!/:0.0.1-SNAPSHOT]
	at com.ccjt.eip.web.service.impl.UidServiceImpl.nextProjectNo(UidServiceImpl.java:146) ~[classes!/:0.0.1-SNAPSHOT]
	at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_171]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_171]
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344) ~[spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198) [spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) [spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.remoting.support.RemoteInvocationTraceInterceptor.invoke(RemoteInvocationTraceInterceptor.java:78) ~[spring-context-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) [spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212) [spring-aop-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at com.sun.proxy.$Proxy69.nextProjectNo(Unknown Source) [na:na]
	at sun.reflect.GeneratedMethodAccessor57.invoke(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_171]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_171]
	at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:302) [hessian-4.0.60.jar!/:3.1.0]
	at com.caucho.hessian.server.HessianSkeleton.invoke(HessianSkeleton.java:217) [hessian-4.0.60.jar!/:3.1.0]
	at org.springframework.remoting.caucho.HessianExporter.doInvoke(HessianExporter.java:228) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.remoting.caucho.HessianExporter.invoke(HessianExporter.java:144) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.remoting.caucho.HessianServiceExporter.handleRequest(HessianServiceExporter.java:64) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.web.servlet.mvc.HttpRequestHandlerAdapter.handle(HttpRequestHandlerAdapter.java:53) [spring-webmvc-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1040) [spring-webmvc-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:943) [spring-webmvc-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006) [spring-webmvc-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:909) [spring-webmvc-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:652) [tomcat-embed-core-9.0.37.jar!/:4.0.FR]
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883) [spring-webmvc-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:733) [tomcat-embed-core-9.0.37.jar!/:4.0.FR]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53) [tomcat-embed-websocket-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119) [spring-web-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:541) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:373) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1589) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_171]
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-9.0.37.jar!/:9.0.37]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: 打开的文件过多
	at redis.clients.jedis.Connection.connect(Connection.java:207) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:93) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.jedis.BinaryJedis.connect(BinaryJedis.java:1767) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:106) ~[jedis-2.9.0.jar!/:na]
	at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:899) ~[commons-pool2-2.8.0.jar!/:2.8.0]
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:429) ~[commons-pool2-2.8.0.jar!/:2.8.0]
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:354) ~[commons-pool2-2.8.0.jar!/:2.8.0]
	at redis.clients.util.Pool.getResource(Pool.java:49) ~[jedis-2.9.0.jar!/:na]
	... 63 common frames omitted
Caused by: java.net.SocketException: 打开的文件过多
	at java.net.Socket.createImpl(Socket.java:460) ~[na:1.8.0_171]
	at java.net.Socket.getImpl(Socket.java:520) ~[na:1.8.0_171]
	at java.net.Socket.setReuseAddress(Socket.java:1449) ~[na:1.8.0_171]
	at redis.clients.jedis.Connection.connect(Connection.java:174) ~[jedis-2.9.0.jar!/:na]
	... 70 common frames omitted
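  • To see the mechanism in isolation, here is a minimal, hypothetical Java sketch (not part of the service) that opens client sockets in a loop until the per-process fd limit is reached; with the default limit of 1024 the SocketException fires after roughly a thousand connections. It assumes some local TCP listener to connect to, e.g. Redis on 127.0.0.1:6379.
import java.io.IOException;
import java.net.Socket;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.List;

public class FdExhaustionDemo {
    public static void main(String[] args) throws IOException {
        List<Socket> sockets = new ArrayList<>();
        try {
            while (true) {
                // every connected socket holds one file descriptor until closed
                sockets.add(new Socket("127.0.0.1", 6379));
            }
        } catch (SocketException e) {
            // "Too many open files" surfaces here once the fd limit is hit
            System.out.println("sockets opened before failure: " + sockets.size());
            throw e;
        } finally {
            for (Socket s : sockets) {
                s.close();
            }
        }
    }
}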

Emergency Mitigation

  • This is a limit the operating system places on each process a user starts (the setting is per user and can be viewed with ulimit -a); the default is 1024.
Checking the configuration
  1. Compare the limit shown by ulimit -a with the process's current fd count from lsof -p $(jps -l | grep 'uid' | awk '{print $1}') | wc -l

[Screenshot: limit configuration on the affected machine]

  2. cat /proc/$(jps -l | grep uid | awk '{print $1}')/limits, where uid is the keyword used to locate the service
[root@ejy-kf2-jyxt2 ~]# cat /proc/$(jps -l | grep uid | awk '{print $1}')/limits
Limit                     Soft Limit           Hard Limit           Units     
...
Max open files            1024                 1024                 files     
...
Remediation
  1. Restart the application to restore service availability first (appropriate when fds have simply accumulated over a long uptime until they hit the limit, which applies here).
  2. If the application genuinely needs a large number of fds (Elasticsearch is the classic example: it operates on many disk files concurrently to achieve high IOPS), raise the limit in /etc/security/limits.conf. The change applies to newly started sessions, so log in again and restart the process (note that sysctl -p reloads /etc/sysctl.conf, not this file):
[root@demo ~]# vi /etc/security/limits.conf
# Append at the end of the file; '*' means all users, nofile is the max number of fds
...
* soft nofile 204800
* hard nofile 204800
...
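  • As a side note, a JVM service can also report its own fd usage without lsof, via the HotSpot-specific UnixOperatingSystemMXBean. A minimal sketch (assumes a HotSpot JVM on Linux; not part of the original service):
import java.lang.management.ManagementFactory;

import com.sun.management.UnixOperatingSystemMXBean;

public class FdMonitor {
    public static void main(String[] args) {
        // on HotSpot/Linux the OS MXBean exposes the process's fd counters
        UnixOperatingSystemMXBean os =
                (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        System.out.println("open fds: " + os.getOpenFileDescriptorCount()
                + " / max: " + os.getMaxFileDescriptorCount());
    }
}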

Root-Cause Analysis

  • After the emergency mitigation the service was usable again, but the root cause remained unknown. Discussion with colleagues revealed that the service had been running normally for about a year and a half without a restart. Given how fds are created, and given that this service does nothing but generate global IDs, it should involve no heavy file or network I/O and therefore should never come near the default limit of 1024 fds.
Locating the Code
  • The frame at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:106) ~[jedis-2.9.0.jar!/:na] in the stack trace leads to the method that obtains Redis connections.
public class UidServiceImpl implements IUidService {

    // key code ...start...
    private static Jedis jedis = null;
    private static String confname = null;

    static {
        confname = ResourceBundle.getBundle("conf").getString("confname");
        Setting setting = new Setting("config/redis-" + Lists.newArrayList(Splitter.on("_").split(confname)).get(1) + ".setting");
        jedis = RedisDS.create(setting, null).getJedis();
        System.out.println("connected to redis");
    }

    private void connectRedis() {
        confname = ResourceBundle.getBundle("conf").getString("confname");
        Setting setting = new Setting("config/redis-" + Lists.newArrayList(Splitter.on("_").split(confname)).get(1) + ".setting");
        jedis = RedisDS.create(setting, null).getJedis();
        System.out.println("connected to redis");
    }

    @Override
    public synchronized Integer nextNoticeId() {
        String noticeIdKey = confname + "_" + IdKeyEnum.NOTICE.getkey();
        String idStr = null;

        try {
            idStr = jedis.get(noticeIdKey);
        } catch (Exception e) {
            connectRedis();
            idStr = jedis.get(noticeIdKey);
        }

        // no cached value: needs re-initialization
        if (idStr == null) {
            try {
                // fetch the current max number, cache it, increment once, then return
                idStr = Convert.toStr(initServiceImpl.initNoticeIdStartValue());
                jedis.set(noticeIdKey, idStr);
                jedis.incr(noticeIdKey);
                System.out.println(LocalDateTime.now().toString() + "noticeid\t" + jedis.get(noticeIdKey));
                return Convert.toInt(idStr) + 1;
            } catch (Exception e) {
                e.printStackTrace();
                return null;
            }
        } else {
            jedis.incr(noticeIdKey);
            System.out.println(LocalDateTime.now().toString() + "noticeid\t" + jedis.get(noticeIdKey));
            return Convert.toInt(idStr) + 1;
        }
    }

    // key code ...end...
}
  • The part that can cause trouble is this:
try {
    idStr = jedis.get(noticeIdKey);
} catch (Exception e) {
    // what connectRedis() does: on any exception, it re-creates the connection
    // pool and borrows a connection from it, without ever closing the old pool
    connectRedis();
    idStr = jedis.get(noticeIdKey);
}
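  • The leak is easy to demonstrate in isolation. The hypothetical sketch below mimics the faulty pattern with a bare JedisPool (jedis 2.9.x API; assumes Redis on 127.0.0.1:6379): every call builds a fresh pool and borrows one connection, but never closes the previous pool, so each call leaks a socket fd, which lsof then shows accumulating.
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class PoolLeakDemo {
    private static Jedis jedis;

    // mimics the faulty connectRedis(): a brand-new pool per call, old one abandoned
    private static void connectRedis() {
        JedisPool pool = new JedisPool("127.0.0.1", 6379);
        jedis = pool.getResource(); // borrows a connection -> one socket, one fd
        // neither pool.close() nor jedis.close() is ever called,
        // so the socket's fd is never released
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 100; i++) {
            connectRedis(); // 100 calls -> roughly 100 leaked socket fds
        }
        Thread.sleep(60_000); // keep the process alive so lsof can inspect it
    }
}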
Reproducing the Problem
  • Deploy a test project in the test environment and restart the Redis service to make jedis.get throw, forcing execution into connectRedis:
  1. Restart the Redis service
[root@VM-4-3-centos ~]# docker start myredis 
myredis
[root@VM-4-3-centos ~]# docker restart myredis 
myredis
[root@VM-4-3-centos ~]# docker restart myredis 
myredis
[root@VM-4-3-centos ~]# docker restart myredis 
myredis
[root@VM-4-3-centos ~]# docker restart myredis 
myredis
  2. Call the service's /uid/notice/nextid endpoint (which triggers nextNoticeId) and watch the fd count (25932 is the PID of the service)
[root@ejy-kf2-jyxt2 ~]# jps -l | grep uid
25932 uid-distribute-0.0.1-SNAPSHOT.jar
# count the fds
[root@ejy-kf2-jyxt2 ~]# lsof -p 25932 | wc -l 
81
[root@ejy-kf2-jyxt2 ~]# lsof -p 25932 | wc -l 
83
[root@ejy-kf2-jyxt2 ~]# lsof -p 25932 | wc -l 
84
[root@ejy-kf2-jyxt2 ~]# lsof -p 25932 | wc -l 
85
[root@ejy-kf2-jyxt2 ~]# lsof -p 25932 | wc -l 
86
  3. Inspect the individual fds with lsof -p $(jps -l | grep uid | awk '{print $1}'); the fds created by the never-released pools linger as sock entries

[Screenshot: lsof output showing the leaked sock fds]

Code Fix
  • Close the old Jedis connection and the old RedisDS before obtaining a new connection
public class UidServiceImpl implements IUidService {

    private static RedisDS redisDS = null;
    private static Jedis jedis = null;
    private static String confname = null;

    static {
        confname = ResourceBundle.getBundle("conf").getString("confname");
        Setting setting = new Setting("config/redis-" + Lists.newArrayList(Splitter.on("_").split(confname)).get(1) + ".setting");
        // keep the RedisDS reference as well, so the first reconnect can close it
        redisDS = RedisDS.create(setting, null);
        jedis = redisDS.getJedis();
        System.out.println("connected to redis");
    }

    private void connectRedis() {
        // release the old resources first; testing showed the fd is only freed via jedis.close()
        if (null != jedis) {
            System.out.println("closing jedis connection");
            jedis.close();
        }
        if (null != redisDS) {
            try {
                System.out.println("closing connection pool");
                redisDS.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
        confname = ResourceBundle.getBundle("conf").getString("confname");
        Setting setting = new Setting("config/redis-" + Lists.newArrayList(Splitter.on("_").split(confname)).get(1) + ".setting");
        redisDS = RedisDS.create(setting, null);
        jedis = redisDS.getJedis();
        System.out.println("connected to redis");
    }
}
  • Counting file descriptors again:
[root@ejy-kf2-jyxt2 ~]# lsof -p $(jps -l | grep uid | awk '{print $1}') | wc -l 
77
[root@ejy-kf2-jyxt2 ~]# lsof -p $(jps -l | grep uid | awk '{print $1}') | wc -l 
77
[root@ejy-kf2-jyxt2 ~]# lsof -p $(jps -l | grep uid | awk '{print $1}') | wc -l 
77
  • Inspecting the individual fds shows the connection stays at fd 35u, with no new entries appearing

[Screenshot: lsof output after the code change]

Summary

  1. This is a real-world case of a resource leak (connections and their file descriptors) caused by connections that were never closed.
  2. The code could still be improved: it initializes a Redis connection pool but never actually uses it as a pool; see the sketch after this list.
  3. It is worth asking why, after the connection broke, the pool did not detect this and close it automatically; likely because connection validation (e.g. testOnBorrow/testWhileIdle in commons-pool2) was not enabled, as it is off by default.
  4. I did not fully reproduce the Too many open files error here; interested readers can lower ulimit and try to reproduce the full production scenario.
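  • On point 2, a hypothetical sketch of what pool-backed usage could look like (same RedisDS configuration as the original; IdKeyEnum is the project's own enum, and the null-initialization branch of nextNoticeId is omitted for brevity): create the RedisDS once, then borrow a connection per call and return it with try-with-resources, so the pool itself discards broken connections instead of the code reconnecting by hand.
import java.util.ResourceBundle;

import cn.hutool.core.convert.Convert;
import cn.hutool.db.nosql.redis.RedisDS;
import cn.hutool.setting.Setting;
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;
import redis.clients.jedis.Jedis;

public class UidServiceImplPooled {
    private static final String confname =
            ResourceBundle.getBundle("conf").getString("confname");
    // one long-lived pool for the whole service
    private static final RedisDS redisDS = RedisDS.create(
            new Setting("config/redis-" + Lists.newArrayList(Splitter.on("_").split(confname)).get(1) + ".setting"),
            null);

    public synchronized Integer nextNoticeId() {
        String noticeIdKey = confname + "_" + IdKeyEnum.NOTICE.getkey();
        // borrow a pooled connection and hand it back via try-with-resources;
        // jedis.close() returns it to the pool, and broken connections are replaced
        try (Jedis jedis = redisDS.getJedis()) {
            return Convert.toInt(jedis.incr(noticeIdKey));
        }
    }
}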