国庆前公司进行了一次迭代上线,前几天正常运行,没发现问题,后面偶发性的出现了数据库超时的异常信息。异常信息记录如下:
ERROR com.alibaba.druid.pool.DruidDataSource - create connection SQLException, url: jdbc:mysql://xxx.xxx.xxx.xxx:3306/openplatform?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true&serverTimezone=PRC, errorCode 0, state 08001
java.sql.SQLNonTransientConnectionException: Could not create connection to database server. Attempted reconnect 3 times. Giving up.
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:110)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63)
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:73)
at com.mysql.cj.jdbc.ConnectionImpl.connectWithRetries(ConnectionImpl.java:905)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:830)
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:455)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:240)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:199)
at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1596)
at com.alibaba.druid.pool.DruidAbstractDataSource.createPhysicalConnection(DruidAbstractDataSource.java:1662)
at com.alibaba.druid.pool.DruidDataSource$CreateConnectionThread.run(DruidDataSource.java:2601)
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167)
at com.mysql.cj.protocol.a.NativeSocketConnection.connect(NativeSocketConnection.java:91)
at com.mysql.cj.NativeSession.connect(NativeSession.java:152)
at com.mysql.cj.jdbc.ConnectionImpl.connectWithRetries(ConnectionImpl.java:849)
... 7 common frames omitted
Caused by: java.net.ConnectException: Connection timed out (Connection timed out)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at com.mysql.cj.protocol.StandardSocketFactory.connect(StandardSocketFactory.java:155)
at com.mysql.cj.protocol.a.NativeSocketConnection.connect(NativeSocketConnection.java:65)
... 9 common frames omitted
刚开始偶发性比较少,不容易复现,因为数据库配置没变过,以为是数据库性能问题,和运维沟通过让他们继续观察后就没留意,4号左右开始偶发的次数比较频繁。
首先排查是不是sql的问题,sql如下:
SELECT
gf_cust_id AS cust_id,
cu_type,
product_code,
is_active,
is_stock,
data_content
FROM t_gf_cust_re_info
WHERE gf_cust_id=#{_parameter}
ORDER BY import_date DESC
LIMIT 1
很简单的单表查询,排除慢sql。
数据库性能方面:通过运维的排查、客户反映问题时数据库的负载和并发都在正常水平,所以排除了数据库性能方面的原因。
配置方面:判断druid连接不上数据库,有可能是连接池里面的连接失效了,在结合druid的配置参数,发现druid默认是不会将连接池中失效的连接进行重连或者从连接池中删除。如果应用程序正好从连接池里面拿到了一个失效的连接进行数据库访问,则会发生数据库连接不上的错误。
druid默认配置如下:
问题原因找到后,修改配置重新打包,上线,超时问题没有复现。
目前数据库配置如下:
spring:
datasource:
type: com.alibaba.druid.pool.DruidDataSource
url: jdbc:mysql://xxx.xxx.xxx.xxx:3306/openplatform?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true&serverTimezone=PRC
username: xxx
password: xxx
driver-class-name: com.mysql.cj.jdbc.Driver
initialSize: 10
minIdle: 10
maxActive: 40
maxWait: 60000
test-on-borrow: true
test-while-idle: true
问题反思:
- 为什么druid默认配置会有这种设计思路,是出于性能还是什么问题考虑?
- druid的连接池有这种问题,那么如果不用druid连接池,springboot集成mybatis默认用哪个连接池?配置会不会有这个问题?
这两个问题留给我也留给大家。