背景:
在执行双机房部署的时候,因为应用长时间未访问数据库,导致后面访问的数据库的线程都被挂起。
现象分析:
- "Thread-74" daemon prio=10 tid=0x00007f1840044000 nid=0x387b runnable [0x00007f18bdb27000]
- java.lang.Thread.State: RUNNABLE
- at java.net.SocketInputStream.socketRead0(Native Method)
- at java.net.SocketInputStream.read(SocketInputStream.java:129)
- at oracle.net.ns.Packet.receive(Unknown Source)
- at oracle.net.ns.DataPacket.receive(Unknown Source)
- at oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
- at oracle.net.ns.NetInputStream.read(Unknown Source)
- at oracle.net.ns.NetInputStream.read(Unknown Source)
- at oracle.net.ns.NetInputStream.read(Unknown Source)
- at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1109)
- at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1080)
- at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:485)
- at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:210)
- at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:804)
- at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1051)
- at oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:845)
- at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1156)
- at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1315)
- - locked <0x00000000be528248> (a oracle.jdbc.driver.T4CStatement)
- - locked <0x00000000c4d1b000> (a oracle.jdbc.driver.T4CConnection)
- at oracle.jdbc.driver.PhysicalConnection.doPingDatabase(PhysicalConnection.java:4614)
- at oracle.jdbc.driver.PhysicalConnection$1.run(PhysicalConnection.java:4590)
- at java.lang.Thread.run(Thread.java:662)
- Locked ownable synchronizers:
- DubboServerHandler-172.21.55.25:20883-thread-13" daemon prio=10 tid=0x00007f1828012800 nid=0x3867 in Object.wait() [0x00007f18be305000]
- java.lang.Thread.State: TIMED_WAITING (on object monitor)
- at java.lang.Object.wait(Native Method)
- - waiting on <0x00000000be51e0b0> (a java.lang.Thread)
- at java.lang.Thread.join(Thread.java:1194)
- - locked <0x00000000be51e0b0> (a java.lang.Thread)
- at oracle.jdbc.driver.PhysicalConnection.pingDatabase(PhysicalConnection.java:4596)
- at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
- at java.lang.reflect.Method.invoke(Method.java:597)
- at com.alibaba.druid.pool.vendor.OracleValidConnectionChecker.isValidConnection(OracleValidConnectionChecker.java:94)
- at com.alibaba.druid.pool.DruidAbstractDataSource.testConnectionInternal(DruidAbstractDataSource.java:1252)
- at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:954)
- at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:921)
- at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:911)
- at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:98)
- at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:113)
- at org.springframework.jdbc.datasource.TransactionAwareDataSourceProxy$TransactionAwareInvocationHandler.invoke(TransactionAwareDataSourceProxy.java:210)
- at $Proxy25.toString(Unknown Source)
- at java.lang.String.valueOf(String.java:2826)
- at java.lang.StringBuffer.append(StringBuffer.java:219)
- - locked <0x00000000be4cd210> (a java.lang.StringBuffer)
- at org.springframework.orm.ibatis.SqlMapClientTemplate.execute(SqlMapClientTemplate.java:194)
- at org.springframework.orm.ibatis.SqlMapClientTemplate.executeWithListResult(SqlMapClientTemplate.java:249)
- at org.springframework.orm.ibatis.SqlMapClientTemplate.queryForList(SqlMapClientTemplate.java:296)
- at com.lianpay.bank.mng.dao.impl.BaseIbatisDAOImpl.queryAll(BaseIbatisDAOImpl.java:123)
- at com.lianpay.bank.mng.service.impl.QueryBankListServiceImpl.queryPayTypeList(QueryBankListServiceImpl.java:319)
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
我们可以发现线程thread-13一直被阻塞在pingDatabase方法调用中,而线程Thread-74一直卡在socketRead0,初步分析是没有设置socket timeout时间。
解决:
参考文档:
http://agapple.iteye.com/blog/772507
http://www.importnew.com/2466.html
http://agapple.iteye.com/blog/1024508
查看上面两个参考文档之后,发现需要配置validationQueryTimeout,oracle.net.CONNECT_TIMEOUT=3000&oracle.net.READ_TIMEOUT=60000这几个参数。
源码分析:
问题1:我们设置了maxWait参数,为什么获取链接的时候未超时,报异常
问题2:我们设置了validationQuery,为什么没有定期发送ping包过去保活
一: maxWait参数主要作用在getConnectionInternal,而我们现在是卡在testConnectionInternal中,表示是已经获取到connection,但是卡在验证connection的有效性中。
- public DruidPooledConnection getConnectionDirect(long maxWaitMillis) throws SQLException {
- int notFullTimeoutRetryCnt = 0;
- for (;;) {
- // handle notFullTimeoutRetry
- DruidPooledConnection poolableConnection;
- try {
- poolableConnection = getConnectionInternal(maxWaitMillis);
- } catch (GetConnectionTimeoutException ex) {
- if (notFullTimeoutRetryCnt <= this.notFullTimeoutRetryCount && !isFull()) {
- notFullTimeoutRetryCnt++;
- if (LOG.isWarnEnabled()) {
- LOG.warn("not full timeout retry : " + notFullTimeoutRetryCnt);
- }
- continue;
- }
- throw ex;
- }
- if (isTestOnBorrow()) {
- boolean validate = testConnectionInternal(poolableConnection.getConnection());
- if (!validate) {
- if (LOG.isDebugEnabled()) {
- LOG.debug("skip not validate connection.");
- }
- Connection realConnection = poolableConnection.getConnection();
- discardConnection(realConnection);
- continue;
- }
2. validationQuery不是用来保证socket长链接不断开的。validationQuery的调用有两处,一是创建链接的时候,二是获取到链接的时候,会验证有效性。
- protected boolean testConnectionInternal(Connection conn) {
- String sqlFile = JdbcSqlStat.getContextSqlFile();
- String sqlName = JdbcSqlStat.getContextSqlName();
- if (sqlFile != null) {
- JdbcSqlStat.setContextSqlFile(null);
- }
- if (sqlName != null) {
- JdbcSqlStat.setContextSqlName(null);
- }
- try {
- if (validConnectionChecker != null) {
- return validConnectionChecker.isValidConnection(conn, validationQuery, validationQueryTimeout);
- }
- if (conn.isClosed()) {
- return false;
- }
- if (null == validationQuery) {
- return true;
- }
- Statement stmt = null;
- ResultSet rset = null;
- try {
- stmt = conn.createStatement();
- if (getValidationQueryTimeout() > 0) {
- stmt.setQueryTimeout(validationQueryTimeout);
- }
- rset = stmt.executeQuery(validationQuery);
- if (!rset.next()) {
- return false;
- }
- } finally {
- JdbcUtils.close(rset);
- JdbcUtils.close(stmt);
- }
- return true;
- } catch (Exception ex) {
- // skip
- return false;
- } finally {
- if (sqlFile != null) {
- JdbcSqlStat.setContextSqlFile(sqlFile);
- }
- if (sqlName != null) {
- JdbcSqlStat.setContextSqlName(sqlName);
- }
- }
- }
如果socket链接断开不可用,也不会导致应用异常,druid框架会不断的重试,抛弃不可用链接,直到拿到可用的链接
- public DruidPooledConnection getConnectionDirect(long maxWaitMillis) throws SQLException {
- int notFullTimeoutRetryCnt = 0;
- for (;;) {
- // handle notFullTimeoutRetry
- DruidPooledConnection poolableConnection;
- try {
- poolableConnection = getConnectionInternal(maxWaitMillis);
- } catch (GetConnectionTimeoutException ex) {
- if (notFullTimeoutRetryCnt <= this.notFullTimeoutRetryCount && !isFull()) {
- notFullTimeoutRetryCnt++;
- if (LOG.isWarnEnabled()) {
- LOG.warn("not full timeout retry : " + notFullTimeoutRetryCnt);
- }
- continue;
- }
- throw ex;
- }
- if (isTestOnBorrow()) {
- boolean validate = testConnectionInternal(poolableConnection.getConnection());
- if (!validate) {
- if (LOG.isDebugEnabled()) {
- LOG.debug("skip not validate connection.");
- }
- Connection realConnection = poolableConnection.getConnection();
- discardConnection(realConnection);
- continue;
- }
- } else {
- Connection realConnection = poolableConnection.getConnection();
- if (realConnection.isClosed()) {
- discardConnection(null); // 传入null,避免重复关闭
- continue;
- }
- if (isTestWhileIdle()) {
- final long currentTimeMillis = System.currentTimeMillis();
- final long lastActiveTimeMillis = poolableConnection.getConnectionHolder().getLastActiveTimeMillis();
- final long idleMillis = currentTimeMillis - lastActiveTimeMillis;
- long timeBetweenEvictionRunsMillis = this.getTimeBetweenEvictionRunsMillis();
- if (timeBetweenEvictionRunsMillis <= 0) {
- timeBetweenEvictionRunsMillis = DEFAULT_TIME_BETWEEN_EVICTION_RUNS_MILLIS;
- }
- if (idleMillis >= timeBetweenEvictionRunsMillis) {
- boolean validate = testConnectionInternal(poolableConnection.getConnection());
- if (!validate) {
- if (LOG.isDebugEnabled()) {
- LOG.debug("skip not validate connection.");
- }
- discardConnection(realConnection);
- continue;
- }
- }
- }
- }
- if (isRemoveAbandoned()) {
- StackTraceElement[] stackTrace = Thread.currentThread().getStackTrace();
- poolableConnection.setConnectStackTrace(stackTrace);
- poolableConnection.setConnectedTimeNano();
- poolableConnection.setTraceEnable(true);
- synchronized (activeConnections) {
- activeConnections.put(poolableConnection, PRESENT);
- }
- }
- if (!this.isDefaultAutoCommit()) {
- poolableConnection.setAutoCommit(false);
- }
- return poolableConnection;
- }
- }
ps:相关参数值可以根据实际场景进行设置
配置定义:
#initialSize
initialSize=10
#minPoolSize
minIdle=10
#maxPoolSize
maxActive=100
poolPreparedStatements=true
maxPoolPreparedStatementPerConnectionSize=100
#是否自动提交事务,如果应用中通过Spring或其他框架显示控制事务提交则可以设置为false,否则设置为true
defaultAutoCommit=false
#配置获取连接等待超时的时间
maxWait=10000
#配置间隔多久才进行一次检测,检测需要关闭的空闲连接,单位毫秒
timeBetweenEvictionRunsMillis=60000
#配置一个连接在池中最小生存的时间,单位毫秒
minEvictableIdleTimeMillis=300000
#连接检测属性配置
testWhileIdle=true
validationQuery=SELECT 'x' FROM DUAL
validationQueryTimeout=5000
connectionProperties=oracle.net.CONNECT_TIMEOUT=3000;oracle.net.READ_TIMEOUT=5000
removeAbandoned=true
removeAbandonedTimeoutMillis=300000
testOnBorrow=false
Spring配置文件:
<bean id="dataSource" class="com.alibaba.druid.pool.DruidDataSource" init-method="init" destroy-method="close">
<property name="driverClassName" value="${jdbc.driverClassName}" />
<property name="url" value="${jdbc.url}" />
<property name="username" value="${jdbc.username}" />
<property name="password" value="${jdbc.password}" />
<property name="initialSize" value="${initialSize}" />
<property name="minIdle" value="${minIdle}" />
<property name="maxActive" value="${maxActive}" />
<property name="defaultAutoCommit" value="${defaultAutoCommit}" />
<property name="validationQuery" value="${validationQuery}" />
<property name="testWhileIdle" value="${testWhileIdle}" />
<property name="testOnBorrow" value="${testOnBorrow}" />
<property name="poolPreparedStatements" value="${poolPreparedStatements}" />
<property name="maxPoolPreparedStatementPerConnectionSize" value="${maxPoolPreparedStatementPerConnectionSize}" />
<property name="maxWait" value="${maxWait}" />
<property name="timeBetweenEvictionRunsMillis" value="${timeBetweenEvictionRunsMillis}" />
<property name="minEvictableIdleTimeMillis" value="${minEvictableIdleTimeMillis}" />
<property name="validationQueryTimeout" value="${validationQueryTimeout}" />
<property name="connectionProperties" value="${connectionProperties}" />
<property name="removeAbandoned" value="${removeAbandoned}" />
<property name="removeAbandonedTimeoutMillis" value="${removeAbandonedTimeoutMillis}" />
</bean>