记一次HikariPool-1 - Connection is not available, request timed out after 60000ms异常排查过程

最新推荐文章于 2025-02-20 11:13:09 发布

盡盡

最新推荐文章于 2025-02-20 11:13:09 发布

阅读量3.9w

点赞数 30

分类专栏：报错中心文章标签：杂七杂八

本文链接：https://blog.csdn.net/qq_22172133/article/details/103454863

版权

报错中心专栏收录该内容

2 篇文章

订阅专栏

前言

前几天把spring boot从1.5升级到2.X之后，项目出了一堆问题，然后调了几天bug。最后好不容易能正常打包运行了，周五晚上在家里又收到信息，服务器挂了！！！好吧，看下日志：

org.springframework.dao.DataAccessResourceFailureException: Unable to acquire JDBC Connection; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
	at org.springframework.orm.jpa.vendor.HibernateJpaDialect.convertHibernateAccessException(HibernateJpaDialect.java:236) ~[spring-orm-5.0.12.RELEASE.jar!/:5.0.12.RELEASE]
	at org.springframework.orm.jpa.vendor.HibernateJpaDialect.translateExceptionIfPossible(HibernateJpaDialect.java:223) ~[spring-orm-5.0.12.RELEASE.jar!/:5.0.12.RELEASE]
	......
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_232]
Caused by: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
	at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48) ~[hibernate-core-5.2.17.Final.jar!/:5.2.17.Final]
	......
	at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:139) ~[spring-tx-5.0.12.RELEASE.jar!/:5.0.12.RELEASE]
	... 133 common frames omitted
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 60000ms.
	......
	at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:669) ~[HikariCP-2.7.9.jar!/:na]
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:183) ~[HikariCP-2.7.9.jar!/:na]

一堆这个东西，什么意思不用说了吧，数据库连接超时。但是不知道为什么会出现这个问题，然后网上找找看，一堆让你改参数的，虽然感觉没用，但是死马当做活马医，反正没思路

spring:
  datasource:
    hikari:
      connection-timeout: 60000
      minimum-idle: 5
      maximum-pool-size: 10
      idle-timeout: 300000
      max-lifetime: 1200000
      auto-commit: true
      connection-test-query: SELECT 1
      validation-timeout: 3000
      read-only: false
      login-timeout: 5

放上去看看有没有效果吧，当然了，当时看不出来什么效果，第二天再看吧，结果不出意外，还是报这个错，然后网上再查好像还是只能找到让你改上面这些参数的，调大点啊，小点啊什么的，好吧，不搜了，自己看。

分析

既然是获取不到连接，无非就是数据库自身的问题，网络问题，连接池关闭，或者连接池满了（连接没释放）。这里先排除数据库和网络等问题，其实我一直猜是哪里使用的连接没有释放导致的问题，但是没法定位。先来看下数据库连接状态

没看出来什么问题，接着来吧。

上面已经说了，最近刚把spring boot升级到2.X，问题也是升级后才出现的，那么版本升级后变过哪些东西呢？数据库连接池，也就是hikari，spring boot 2默认采用了hikari连接池，这个和之前不一样，那会不会是连接池的问题呢？hikari好像是号称最快的连接池，本来是不想换的，这里没办法，换吧，这里又换成了之前的druid

<dependency>
     <groupId>com.alibaba</groupId>
     <artifactId>druid-spring-boot-starter</artifactId>
     <version>1.1.10</version>
</dependency>

配置改一改

spring:
  datasource:
    type: com.alibaba.druid.pool.DruidDataSource
    druid:
      initial-size: 1
      min-idle: 5
      max-active: 10
      max-wait: 60000
      test-while-idle: true
      time-between-eviction-runs-millis: 60000
      min-evictable-idle-time-millis: 30000
      validation-query: SELECT 'x'
      test-on-borrow: false
      test-on-return: false
      pool-prepared-statements: true
      max-pool-prepared-statement-per-connection-size: 20

然后呢？放上去继续等吧，等错，该干嘛干嘛去。果然还是报错了，意料之中，错误和上面差不多，连接超时。

现在只能从自身找问题了，那到底是谁占着连接不放呢？druid配置中有如下几个参数

# 连接在多少秒后应该考虑弃用
spring.datasource.druid.remove-abandoned-timeout-millis: 180
# 被弃用的连接在到达弃用超时后是否应该被移除
spring.datasource.druid.remove-abandoned: true
# 程序在回收连接的同时会打印日志
spring.datasource.druid.log-abandoned: true

注意：removeAbandoned是连接池的高级功能，理论上这中配置不应该出现在实际的生产环境，因为有时应用程序执行长事务，可能这种情况下，会被连接池误回收。我这里主要是为了定位连接泄漏的具体代码位置，生产环境中连接的关闭应该靠程序自己保证。

加上上面参数后，再来看看错误日志：

2019-12-07 18:59:19.269 ERROR 4189 --- [troy-1832973792] com.alibaba.druid.pool.DruidDataSource   : abandon connection, owner thread: http-nio-8999-exec-8, connected at : 1575716348048, open stackTrace
	at java.lang.Thread.getStackTrace(Thread.java:1559)
	at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1313)

	at com.sun.proxy.$Proxy168.findAll(Unknown Source)
	at com.mq.portal.service.impl.ArticleServiceImpl.findBriefShow(ArticleServiceImpl.java:198)
	at com.mq.portal.service.impl.ArticleServiceImpl$$FastClassBySpringCGLIB$$5901d7ef.invoke(<generated>)

	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)
ownerThread current state is WAITING, current stackTrace

	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
	at com.mq.portal.stats.StatsQueue.put(StatsQueue.java:22)
	at com.mq.portal.interceptor.StatsInterceptor.postHandle(StatsInterceptor.java:47)

	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)

2019-12-07 18:59:19.271 ERROR 4189 --- [troy-1832973792] com.alibaba.druid.pool.DruidDataSource   : abandon connection, owner thread: http-nio-8999-exec-11, connected at : 1575716357218, open stackTrace
	at java.lang.Thread.getStackTrace(Thread.java:1559)

	at com.sun.proxy.$Proxy161.findByDescription(Unknown Source)
	at com.mq.portal.service.impl.DictionaryServiceImpl.portalList(DictionaryServiceImpl.java:76)
	at com.mq.portal.service.impl.DictionaryServiceImpl$$FastClassBySpringCGLIB$$fb3b5daf.invoke(<generated>)
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)

	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)
ownerThread current state is RUNNABLE, current stackTrace
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)

	at com.mq.portal.stats.StatsQueue.put(StatsQueue.java:22)
	at com.mq.portal.interceptor.StatsInterceptor.postHandle(StatsInterceptor.java:47)

看ownerThread current state is WAITING, current stackTrace后面的，出现了我的东西（StatsQueue.put），这是一个内存队列，大致意思是上面的操作放弃了连接，原因是线程都在队列的put操作上等待（估计是整个服务器线程都用来在这卡着了，数据库连接池线程应该也没法用了）。

终于找到病根了，既然是队列put一直在等待，那么肯定是take有问题了，看消费线程

public class StatsProcessorThread implements Runnable {
    @Override
    public void run() {
        StatsQueue queue = StatsQueue.getInstance();

        RedisUtil redisUtil = (RedisUtil) SpringContextUtil
                .getApplicationContext().getBean("redisUtil");

        while (true) {
            try {

                StatsModel model = queue.take();
                //......

                }

            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

既然加了异常处理，那么没道理因为消费异常就导致take不出来啊？？？没走到take这一步来？前面就注入了一下redisUtil，有问题那也就是这一块有问题了，拿spring上下文出错了（为啥日志里面没有呢，不清楚）。最后在项目的初始化部分发现了问题

public class InitListener implements ServletContextListener {


    @Override
    public void contextInitialized(ServletContextEvent sce) {

        //初始化工作线程池和内存队列
        SolrUpdProcessorThreadPool.init();
        new Thread(new StatsProcessorThread()).start();
        new Thread(new TokenThread()).start();

        //获取spring上下文
        ServletContext sc = sce.getServletContext();
        ApplicationContext context = WebApplicationContextUtils.getWebApplicationContext(sc);
        SpringContextUtil.setApplicationContext(context);
    }
}

启动上面那个线程的时候spring 上下文还没获取呢，那么问题来了，谁动我代码了？这个可不是一个新功能啊！然后我到git上翻了一下之前的代码，神奇的发现之前好像也是这样写的。。。或许是指令重排序问题了，至于为什么spring boot升级后这块的指令排序就不一样了，我就不清楚了，总不至于再把项目改到之前版本试试吧。

解决方案

问题分析出来了，然后怎么解决呢？简单啊，上面代码那块的顺序调整一下就ok了啊

public class InitListener implements ServletContextListener {


    @Override
    public void contextInitialized(ServletContextEvent sce) {

        //操作1：获取spring上下文
        ServletContext sc = sce.getServletContext();
        ApplicationContext context = WebApplicationContextUtils.getWebApplicationContext(sc);
        SpringContextUtil.setApplicationContext(context);

        //操作2：初始化工作线程池和内存队列
        SolrUpdProcessorThreadPool.init();
        new Thread(new StatsProcessorThread()).start();
        new Thread(new TokenThread()).start();

    }
}

这次放上去后，果然好了，至少目前为止还没报错。但是如果真是指令重排序问题，以后可能还会出现这个问题，因为这里其实是没法确定操作1和操作2的执行顺序（根据happens-before规则）。怎么处理这块呢？可以考虑采用volatile，其实也就是考虑volatile的可见性：对一个volatile变量的读，总是能看到（任意线程）对这个volatile变量最后的写入。

说白了，这里涉及到的也就两个线程，InitListener和StatsProcessorThread，它们共用的变量是SpringContextUtil里面的applicationContext。其中InitListener写，StatsProcessorThread读，这样懂了吧，我们只要applicationContext用volatile修饰一下就可以了。

public class SpringContextUtil {
    private static volatile ApplicationContext applicationContext;

    public static ApplicationContext getApplicationContext() {
        return applicationContext;
    }

    public static void setApplicationContext(ApplicationContext applicationContext) {
        SpringContextUtil.applicationContext = applicationContext;
    }
}