Troubleshooting "ERR max number of clients reached" on Redis Cluster

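This morning I found that our microservices could no longer connect to the Redis cluster. The redis-cli session and the service log showed the following: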

 

[root@win-jrh378d7scu 7005]# bin/redis-cli -c -h 15.31.213.183 -p 7005
15.31.213.183:7005> cluster info
ERR max number of clients reached
15.31.213.183:7005>

 

 

 


2019-03-26 22:00:30.011 http-nio-9090-exec-4 ERROR org.apache.juli.logging.DirectJDKLog.log(DirectJDKLog.java:182) - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool] with root cause
java.util.NoSuchElementException: Unable to validate object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:494) ~[commons-pool2-2.4.3.jar!/:2.4.3]
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:361) ~[commons-pool2-2.4.3.jar!/:2.4.3]
at redis.clients.util.Pool.getResource(Pool.java:49) ~[jedis-2.9.0.jar!/:?]
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:226) ~[jedis-2.9.0.jar!/:?]
at redis.clients.jedis.JedisSlotBasedConnectionHandler.getConnectionFromSlot(JedisSlotBasedConnectionHandler.java:66) ~[jedis-2.9.0.jar!/:?]
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:116) ~[jedis-2.9.0.jar!/:?]
at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:31) ~[jedis-2.9.0.jar!/:?]
at redis.clients.jedis.JedisCluster.get(JedisCluster.java:124) ~[jedis-2.9.0.jar!/:?]
at com.hp.nova.utils.RedisClusterUtil.mget(RedisClusterUtil.java:152) ~[classes!/:0.0.1-SNAPSHOT]
at com.hp.nova.service.impl.SectionServiceImpl.getsectionlistBymutipulCode(SectionServiceImpl.java:286) ~[classes!/:0.0.1-SNAPSHOT]
at com.hp.nova.service.impl.SectionServiceImpl$$FastClassBySpringCGLIB$$73b5d4bc.invoke(<generated>) ~[classes!/:0.0.1-SNAPSHOT]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.3.20.RELEASE.jar!/:4.3.20.RELEASE]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:667) ~[spring-aop-4.3.20.RELEASE.jar!/:4.3.20.RELEASE]
at com.hp.nova.service.impl.SectionServiceImpl$$EnhancerBySpringCGLIB$$c52c4ab.getsectionlistBymutipulCode(<generated>) ~[classes!/:0.0.1-SNAPSHOT]
at com.hp.nova.service.impl.PlanServiceImpl.getChildListDetailByPlan(PlanServiceImpl.java:1051) ~[classes!/:0.0.1-SNAPSHOT]
at com.hp.nova.controller.PlanController.getMultiPlan(PlanController.java:107) ~[classes!/:0.0.1-SNAPSHOT]
at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]


First I checked maxclients in redis.conf: it was left at the default, which is 10000.

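Running lsof -p 17242 | wc -l against the Redis process showed a count above 10300. lsof lists every open file of the process, so this slightly overstates the number of client connections, but it is clearly over the limit, hence the error.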

I restarted the Redis processes on all 6 nodes one by one. Afterwards, lsof -p pid | wc -l on the Redis processes showed the count had dropped to 60.
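Because lsof counts all open files rather than only client sockets, a more direct figure is the connected_clients field that Redis reports in the Clients section of INFO (for example via redis-cli info clients). This only works while the node still accepts a connection. The sketch below parses such text; the class name, method names and the sample string are illustrative assumptions, not output captured from the cluster in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InfoClientsParser {

    /** Parses "key:value" lines of INFO output, skipping blank lines and "# Section" headers. */
    static Map<String, String> parse(String info) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String raw : info.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                fields.put(line.substring(0, colon), line.substring(colon + 1));
            }
        }
        return fields;
    }

    /** Returns the connected_clients value, or -1 if the field is missing. */
    static int connectedClients(String info) {
        String value = parse(info).get("connected_clients");
        return value == null ? -1 : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        // Illustrative sample text only (assumption), shaped like the Clients section of INFO.
        String sample = "# Clients\r\nconnected_clients:60\r\nblocked_clients:0\r\n";
        System.out.println("connected_clients = " + connectedClients(sample));
    }
}
```

Polling this value at intervals makes a steady climb easier to spot than repeated lsof runs.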

 

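At first I did not restart the Java microservice processes, so the Java side kept reporting errors like the one below. Restarting the Java microservices cleared them.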

2019-03-26 17:13:42.126 http-nio-9090-exec-1 ERROR org.apache.juli.logging.DirectJDKLog.log(DirectJDKLog.java:182) - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool] with root cause
java.util.NoSuchElementException: Pool exhausted
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:452) ~[commons-pool2-2.4.3.jar!/:2.4.3]
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:361) ~[commons-pool2-2.4.3.jar!/:2.4.3]

 

 

 

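After half a day of use, however, lsof -p pid | wc -l showed that several Redis processes were back above 5000. Our yml configuration is shown below. At this point I could not tell which of two causes was at work: either the maxTotal: 5000 (maximum connections) setting in every microservice meant Redis genuinely did not have enough connections, or a bug in the Java or .NET Core programs was failing to release Redis cluster connections. As a stopgap I set maxclients on every Redis node to 50000 and planned to observe for a few days.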

 

redis:
  nodes: 15.31.213.3:7001,15.31.213.3:7002,15.31.213.239:7003,15.31.213.239:7004,15.31.213.183:7005,15.31.213.183:7006
  commandTimeout: 10000 # timeout for Redis operations
  maxTotal: 5000 # maximum number of connections
  maxIdle: 30 # maximum number of idle connections
  minIdle: 5 # minimum number of idle connections
  maxWait: 3000 # maximum wait to obtain a connection, in ms; default -1
  pwd:
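To weigh the first hypothesis (the pools are simply allowed to grow too large), a rough ceiling can be worked out. The sketch assumes that the cluster client keeps one pool per Redis node and that maxTotal applies to each of those pools, so a single node could in the worst case see instances × maxTotal connections. The instance count of 3 is a made-up figure for illustration; the post does not say how many microservice instances were running.

```java
class ConnectionCeiling {

    /** Worst case: every service instance fills its pool for one Redis node. */
    static long worstCasePerNode(int serviceInstances, int maxTotalPerPool) {
        return (long) serviceInstances * maxTotalPerPool;
    }

    /** True if the worst case is above the node's maxclients limit. */
    static boolean canExceed(int serviceInstances, int maxTotalPerPool, int maxclients) {
        return worstCasePerNode(serviceInstances, maxTotalPerPool) > maxclients;
    }

    public static void main(String[] args) {
        int maxTotal = 5000;    // from the yml configuration
        int maxclients = 10000; // redis.conf default mentioned in the post
        int instances = 3;      // ASSUMPTION: hypothetical number of service instances

        System.out.println("worst case per node: " + worstCasePerNode(instances, maxTotal));
        System.out.println("can exceed maxclients: " + canExceed(instances, maxTotal, maxclients));
    }
}
```

Under these assumptions a handful of instances is enough to pass the default limit, which is one reason to keep maxTotal far below maxclients. A ceiling this high only matters if connections are actually held, though, so it does not rule out a leak.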

 

 

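In the evening I ran lsof -p pid | wc -l on each node of the Redis cluster again and saw the count grow by about 5 every few seconds. We had recently added a timer that calls our own mget method every 30 seconds, so I went through that method carefully.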

It uses a Pipeline but never closes the Jedis resource it borrows from the pool:

 

The old, problematic code:

// Execute
        for (Entry<JedisPool, List<String>> entry : jedisPoolMap.entrySet()) {
            try {
                currentJedisPool = entry.getKey();
                keyList = entry.getValue();
                // Get a pipeline (the Jedis instance borrowed here is never returned to the pool)
                currentPipeline = currentJedisPool.getResource().pipelined();
                for (String key : keyList) {
                    currentPipeline.get(key);
                }
                // Read the results from the pipeline
                res = currentPipeline.syncAndReturnAll();
                currentPipeline.close();
                for (int i = 0; i < keyList.size(); i++) {
                    resMap.put(keyList.get(i), res.get(i) == null ? null : res.get(i).toString());
                }
            } catch (Exception e) {
                logger.error("", e);
                return new HashMap<>();
            }

        }

 

 

The corrected code:

// Execute
        for (Entry<JedisPool, List<String>> entry : jedisPoolMap.entrySet()) {
            Jedis jedis = null;
            Pipeline currentPipeline = null;
            try {
                currentJedisPool = entry.getKey();
                keyList = entry.getValue();
                // Borrow a connection and keep the reference so it can be returned later
                jedis = currentJedisPool.getResource();
                currentPipeline = jedis.pipelined();
                for (String key : keyList) {
                    currentPipeline.get(key);
                }
                // Read the results from the pipeline
                res = currentPipeline.syncAndReturnAll();

                for (int i = 0; i < keyList.size(); i++) {
                    resMap.put(keyList.get(i), res.get(i) == null ? null : res.get(i).toString());
                }
            } catch (Exception e) {
                logger.error("", e);
                return new HashMap<>();
            } finally {
                if (currentPipeline != null) {
                    try {
                        currentPipeline.close();
                    } catch (IOException e) {
                        logger.error("", e);
                    }
                }
                // Return the connection to the pool, even when an exception was thrown
                if (jedis != null) {
                    jedis.close();
                }
            }

        }

 

So every resource obtained from a JedisPool must be closed, otherwise the connection count keeps growing:

jedis.close();
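The effect of a missing close() can be reproduced without Redis. The sketch below uses a toy pool with a hard cap. ToyPool and ToyConnection are hypothetical stand-ins invented for this illustration, not Jedis classes. It compares a borrow that is never returned with a borrow wrapped in try-with-resources. If the Jedis version in use lets Jedis and Pipeline be used in try-with-resources, the finally block above could probably be written in the same style, but that is an assumption to verify against the library.

```java
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

class PoolLeakDemo {

    /** Hypothetical pool with a hard cap, playing the role of maxTotal. */
    static class ToyPool {
        private final int maxTotal;
        private final AtomicInteger borrowed = new AtomicInteger();

        ToyPool(int maxTotal) {
            this.maxTotal = maxTotal;
        }

        ToyConnection getResource() {
            if (borrowed.incrementAndGet() > maxTotal) {
                borrowed.decrementAndGet();
                throw new NoSuchElementException("Pool exhausted");
            }
            return new ToyConnection(this);
        }

        int borrowedCount() {
            return borrowed.get();
        }
    }

    /** Hypothetical connection; close() hands it back to the pool exactly once. */
    static class ToyConnection implements AutoCloseable {
        private final ToyPool pool;
        private boolean closed;

        ToyConnection(ToyPool pool) {
            this.pool = pool;
        }

        String get(String key) {
            return "value-of-" + key;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                pool.borrowed.decrementAndGet();
            }
        }
    }

    /** Borrows without closing, like the old mget code: each call keeps one connection. */
    static int leakyCalls(ToyPool pool, int calls) {
        for (int i = 0; i < calls; i++) {
            pool.getResource().get("k" + i);
        }
        return pool.borrowedCount();
    }

    /** Borrows inside try-with-resources: the connection is returned on every path. */
    static int safeCalls(ToyPool pool, int calls) {
        for (int i = 0; i < calls; i++) {
            try (ToyConnection c = pool.getResource()) {
                c.get("k" + i);
            }
        }
        return pool.borrowedCount();
    }

    public static void main(String[] args) {
        System.out.println("still borrowed after safe calls: " + safeCalls(new ToyPool(5), 100));
        System.out.println("still borrowed after leaky calls: " + leakyCalls(new ToyPool(5), 5));
    }
}
```

In the leaky variant the borrowed count only goes up, which matches the steadily climbing connection count seen on the Redis nodes. Once the cap is reached, the next borrow fails in the same way as the "Pool exhausted" error in the log.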

 

After redeploying, the Redis connection count stopped growing and dropped to around 100. Problem solved.

 

 

 

 

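To recap: maxclients in redis.conf was at its default of 10000, and lsof -p pid | wc -l showed a count above 10500, which caused the error.

Solutions:

1. Raise the Redis connection limit: change maxclients in redis.conf, for example to 50000.

2. Redis connections are normally released after use. If lsof shows that the connection count never goes down, review the code and check that every code path that uses Redis calls close() or an equivalent function to release the resource.

Together these two measures resolve the problem in most cases.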

Reposted from: https://www.cnblogs.com/xiaohanlin/p/10610161.html
