java 第一笔 数据库操作报错 分析与解决:获取不到连接 Caused by: java.sql.SQLException: An attempt by a client to checkout a Connection has timed out.
一、问题起源
应用信息:jdk1.6;数据库:DB2;连接池:C3P0
现象:java应用迁移到云环境,应用启动切量后,第一笔数据库操作时报错:获取不到连接 Caused by: java.sql.SQLException: An attempt by a client to checkout a Connection has timed out,后续的数据库操作正常。
报错堆栈如下:
com.shines.common.persist.ibatis.DataAccessException: Error code '0'
at com.shines.common.persist.ibatis.exception.SQLErrorCodeSQLExceptionTranslator.translate(SQLErrorCodeSQLExceptionTranslator.java:99)
Caused by: java.sql.SQLException: An attempt by a client to checkout a Connection has timed out.
at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:118) ~[mchange-commons-java-0.2.11.jar:0.2.11]
at com.mchange.v2.sql.SqlUtils.toSQLException(SqlUtils.java:77) ~[mchange-commons-java-0.2.11.jar:0.2.11]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:690) ~[c3p0-0.9.5.2.jar:0.9.5.2]
at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:140) ~[c3p0-0.9.5.2.jar:0.9.5.2]
at com.ibatis.sqlmap.engine.transaction.jdbc.JdbcTransaction.init(JdbcTransaction.java:53) ~[ibatis-2.3.0.677-6.jar:?]
at com.ibatis.sqlmap.engine.transaction.jdbc.JdbcTransaction.getConnection(JdbcTransaction.java:102) ~[ibatis-2.3.0.677-6.jar:?]
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryForObject(GeneralStatement.java:104) ~[ibatis-2.3.0.677-6.jar:?]
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(SqlMapExecutorDelegate.java:566) ~[ibatis-2.3.0.677-6.jar:?]
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(SqlMapExecutorDelegate.java:541) ~[ibatis-2.3.0.677-6.jar:?]
at com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryForObject(SqlMapSessionImpl.java:106) ~[ibatis-2.3.0.677-6.jar:?]
at com.ibatis.sqlmap.engine.impl.SqlMapClientImpl.queryForObject(SqlMapClientImpl.java:83) ~[ibatis-2.3.0.677-6.jar:?]
... 14 more
-- timeout at awaitAvailable()
at com.mchange.v2.resourcepool.BasicResourcePool.awaitAvailable(BasicResourcePool.java:1467) ~[c3p0-0.9.5.2.jar:0.9.5.2]
at com.mchange.v2.resourcepool.BasicResourcePool.prelimCheckoutResource(BasicResourcePool.java:644) ~[c3p0-0.9.5.2.jar:0.9.5.2]
at com.mchange.v2.resourcepool.BasicResourcePool.checkoutResource(BasicResourcePool.java:554) ~[c3p0-0.9.5.2.jar:0.9.5.2]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutAndMarkConnectionInUse(C3P0PooledConnectionPool.java:758) ~[c3p0-0.9.5.2.jar:0.9.5.2]
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool.checkoutPooledConnection(C3P0PooledConnectionPool.java:685) ~[c3p0-0.9.5.2.jar:0.9.5.2]
at com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource.getConnection(AbstractPoolBackedDataSource.java:140) ~[c3p0-0.9.5.2.jar:0.9.5.2]
at com.ibatis.sqlmap.engine.transaction.jdbc.JdbcTransaction.init(JdbcTransaction.java:53) ~[ibatis-2.3.0.677-6.jar:?]
at com.ibatis.sqlmap.engine.transaction.jdbc.JdbcTransaction.getConnection(JdbcTransaction.java:102) ~[ibatis-2.3.0.677-6.jar:?]
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryForObject(GeneralStatement.java:104) ~[ibatis-2.3.0.677-6.jar:?]
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(SqlMapExecutorDelegate.java:566) ~[ibatis-2.3.0.677-6.jar:?]
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryForObject(SqlMapExecutorDelegate.java:541) ~[ibatis-2.3.0.677-6.jar:?]
at com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryForObject(SqlMapSessionImpl.java:106) ~[ibatis-2.3.0.677-6.jar:?]
at com.ibatis.sqlmap.engine.impl.SqlMapClientImpl.queryForObject(SqlMapClientImpl.java:83) ~[ibatis-2.3.0.677-6.jar:?]
... 14 more
t.
二、问题分析与解决
分析
1、首先,由于是第一笔报错,排除了是C3P0配置中连接数不够的问题(连接数配置为10)
2、因此,怀疑是C3P0连接池的参数配置超时时间问题,此次从两个方面进行排查
(1)修改C3P0配置:timeout(原来配置为10s),改为30s,修改后,数据库操作正常。
(2)不修改C3P0配置:timeout,还是为10s,打印线程栈信息(jstack),发现有3个线程均在执行getHostByAddr方法。
具体信息如下:
"C3P0PooledConnectionPoolManager[identityToken->1bqvtpv9zqazxey1j4yfky|59b3ef14]-HelperThread-#2" daemon prio=10 tid=0x00007fa73800f1a0 nid=0x7d39 runnable [0x00007fa6ab1ef000]
java.lang.Thread.State: RUNNABLE
at java.net.Inet4AddressImpl.getHostByAddr(Native Method)
at java.net.InetAddress$1.getHostByAddr(InetAddress.java:880)
at java.net.InetAddress.getHostFromNameService(InetAddress.java:559)
at java.net.InetAddress.getHostName(InetAddress.java:502)
at java.net.InetAddress.getHostName(InetAddress.java:474)
at com.ibm.db2.jcc.t4.b.a(b.java:4997)
at com.ibm.db2.jcc.t4.b.a(b.java:1738)
at com.ibm.db2.jcc.am.jb.a(jb.java:544)
at com.ibm.db2.jcc.am.jb.<init>(jb.java:500)
at com.ibm.db2.jcc.t4.b.<init>(b.java:308)
at com.ibm.db2.jcc.DB2SimpleDataSource.getConnection(DB2SimpleDataSource.java:214)
at com.ibm.db2.jcc.DB2Driver.connect(DB2Driver.java:224)
at com.mchange.v2.c3p0.DriverManagerDataSource.getConnection(DriverManagerDataSource.java:175)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:220)
at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:206)
at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:203)
at com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1138)
at com.mchange.v2.resourcepool.BasicResourcePool.doAcquireAndDecrementPendingAcquiresWithinLockOnSuccess(BasicResourcePool.java:1125)
at com.mchange.v2.resourcepool.BasicResourcePool.access$700(BasicResourcePool.java:44)
at com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask.run(BasicResourcePool.java:1870)
at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:696)
3、到此,分析出是由于db2 jcc jar(版本:JDBC 3.0 driver (db2jcc.jar)下的 3.57.110)包会在获取数据库连接时调用jre的rt.jar中的getHostName方法,db2 jcc部分代码如下
try
{
if ((this.nd != null) && (this.nd.length() > 0) && (!this.nd.equalsIgnoreCase("localhost"))) {
localObject1 = InetAddress.getByName(this.nd);
localObject2 = InetAddress.getAllByName(((InetAddress)localObject1).getHostName());
if (localObject2.length >= 2) {
this.pd = new DB2ClientRerouteServerList();
this.nd = localObject2[0].getHostAddress();
this.pd.setPrimaryServerName(this.nd);
this.pd.setPrimaryPortNumber(this.od);
i = localObject2.length - 1;
String[] arrayOfString = new String[i];
int[] arrayOfInt = new int[i];
for (int j = 0; j < i; j++) {
arrayOfString[j] = localObject2[(j + 1)].getHostAddress();
arrayOfInt[j] = this.od;
}
this.hd = arrayOfString;
this.id = arrayOfInt;
this.pd.setAlternateServerName(arrayOfString);
this.pd.setAlternatePortNumber(arrayOfInt);
}
}
}
catch (UnknownHostException localUnknownHostException) {
a(bd.b(this, paramaf, oc.DNS_FAILURES, localUnknownHostException.getMessage(), "11903", localUnknownHostException));
}
}
paramDB2BaseDataSource.setClientRerouteServerList(this.pd);
}
}
经过修改jre的rt.jar包,发现 localObject1 = InetAddress.getByName(this.nd);处传入的参数this.nd为数据库的地址。
此处耗时较长原因为获取java的获取host方法需要使用dns到外网获取,而我们的应用所在机器是连不到外网的。
解决
最终解决方法为将应用所在机器的dns关闭,具体方法为:
删除linux服务器/etc/resolv.conf文件。
三、总结
此问题的解决方法
1、若应用所在服务器未开通互联网访问,可以关闭dns配置,将/etc/resolv.conf改名为/etc/resolv.conf.bak。
2、若应用所在服务器已开通互联网访问,可以对数据库作hostbind操作,在/etc/hosts文件中增加
数据库IP 数据库主机名称
3、若1,2不生效,修改数据库连接池的连接超时时间。(对于C3P0连接池来说,是参数:timeout)
4、未经验证,但是如果1,2仍然不可行的话,可以考虑:升级db2 jcc的jar包,不知道高级版本是否解决了获取host的问题;或者考虑升级jdk,确认高版本jdk是否解决了获取host慢的问题
备注:
- 之前遇到过zookeeper(zookeeper客户端使用版本:3.3.3)获取连接失败的问题,设置的获取连接超时时间为5s,报错获取连接失败。经过分析是zookeeper的创建连接方法也是调用了net的gethost()方法,而获取hostname较慢,将超时时间改为10s可以解决问题,但是修改并不彻底。
- 经过查看zookeeper,发现zookeeper在版本3.4.6之后已经修复了获取host这块的问题,因此我们将zookeeper客户端升级为3.4.11,彻底解决了此问题。