问题描述:
应用一直正常跑着,早上登陆,突然提示‘Could not roll back JDBC transaction;nested exception is java.sql.SQLRecoverableException:无法从套接字读取更多的数据’;
环境:
server2008 64位 R2+oracle11gR2+tomcat6.0.32
分析:
第一反应是数据库挂掉了,查看oracle的alert日志,有如下字样:
Errors in file d:\app\administrator\diag\rdbms\orclsc\orclsc\trace\orclsc_ora_2276.trc (incident=23417):
ORA-03137: TTC 鍗忚鍐呴儴閿欒: [12333] [5] [54] [53] [] [] [] []
追踪看orclsc_ora_2276.trc的内容大体如下:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Windows NT Version V6.1
CPU : 24 - type 8664, 12 Physical Cores
Process Affinity : 0x0x0000000000000000
Memory (Avail/Total): Ph:22799M/32691M, Ph+PgF:55449M/65381M
Instance name: orclsc
Redo thread mounted by this instance: 1
Oracle process number: 77
Windows thread id: 2276, image: ORACLE.EXE (SHAD)
*** 2014-10-09 01:12:36.227
*** SESSION ID:(59.535) 2014-10-09 01:12:36.227
*** CLIENT ID:() 2014-10-09 01:12:36.227
*** SERVICE NAME:(SYS$USERS) 2014-10-09 01:12:36.227
*** MODULE NAME:(JDBC Thin Client) 2014-10-09 01:12:36.227
*** ACTION NAME:() 2014-10-09 01:12:36.227
--- PROTOCOL VIOLATION DETECTED ---
----- Dump Cursor sql_id=7209sntbk4ydy xsc=0x000000003EE86610 cur=0x000000003EE61BF8 -----
LibraryHandle: Address=a9c4b3d0 Hash=572279be LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=select sum(1) packetNum,sum(grossWeigh) grossWeigh,sum(netWeigh) netWeigh,
sum(grossWeigh-netWeigh) tareWeigh,avg(moistureRate) avgmoistureRate
from cotton_lint_process process,cotton_lint_inspect insp
where process.barcode=insp.barcode and process.enterpriseCode= :1
and to_date(testTime,'yyyy-mm-dd') >= to_date(:2 ,'yyyy-mm-dd')
and to_date(testTime,'yyyy-mm-dd') <= to_date(:3 ,'yyyy-mm-dd')
*** 2014-10-09 01:12:36.907
Incident 23417 created, dump file: d:\app\administrator\diag\rdbms\orclsc\orclsc\incident\incdir_23417\orclsc_ora_2276_i23417.trc
ORA-03137: TTC 协议内部错误: [12333] [5] [54] [53] [] [] [] []
基本确定是执行一条sql的时候,发生了内部协议错误,网上百度一下解决方案。
http://www.oracleonlinux.cn/2013/09/ora-17410_and_ora-03137/
懒得敲了,直接引用网址了
主要的内容截取如下:
① 部分版本的JDBC驱动会引起该错误:
Bug 9445675 NO MORE DATA TO READ FROM SOCKET WHEN USING END-TO-END METRICS
This bug does affect the JDBC driver. This bug may be the cause when all of the following conditions are met:
- You are using the 10.1.x.x or the 11.2.0.1 JDBC driver; the bug does not affect 10.2.x.x, or 11.1.x.x versions of the driver, nor versions 11.2.0.2 or above
- You are using end-to-end metrics in your Java code
- The server side ORA-3137 [12333] error is accompanied by the client side Java exception "No more data to read from socket"
This bug is fixed in the 11.2.0.2 version of the JDBC driver and above. It is discussed in the following notes:
Note 9445675.8 Bug 9445675 - "No more data" / ORA-3137 using end to end metrics with JDBC Thin
Note 1081275.1 "java.sql.SQLRecoverableException: No more data to read from socket" is Thrown When End-to-end Metrics is Used
② 数据库自身的BUG也会导致该错误:
Unpublished Bug:9703463 - ORA-3137 [12333] or ORA-600 [kpobav-1] When Using Bind Peeking
This bug affects versions 11.1.0.6, 11.1.0.7, and 11.2.0.1 of the RDBMS. It is fixed in version 11.2.0.2 of the database.
It can also occur intermittently; similarly to unpublished Bug:8625762, this is a bind peeking bug.
==================================================
Unpublished Bug 9373370 DATA BASE RETURNS WRONG CURSORID WHEN THERE IS AN ORA-01013
This bug affects the 10.2, 11.1, 11.2.0.1, and 11.2.0.2 databases. It is discussed in the following note:
Note 9373370.8 Bug 9373370 - The wrong cursor may be executed by JDBC thin following a query timeout / ORA-3137 [12333]
While the bug primarily manifests in ORA-1006 or ORA-1008 errors, the problem may also result in ORA-600 [12333] or ORA-3137 [12333] errors appearing on the server side.
判断本案例的错误应该是 Bug:9703463导致的,从trace文件中看到该SQL语句的确用到了绑定变量,且数据库版本是11.2.0.1.0。
解决思路:
① 可以通过更换不同版本的JDBC驱动来避免该错误,也说明为什么同事在另外一套环境下,通过Weblogic的JDBC来访问数据库时,则不会遇到该错误;
② 给数据库打patch,初步认为可以通过打Patch:9703463 来解决;
③ 通过修改数据库参数来规避该错误:
修改之后,同事用之前的Tomcat那个版本的驱动来重新访问数据库时,则不再报错;
④ 直接升级数据库版本至11.2.0.3.0,通过导出导入的方式将刚项目组下的schema数据复制一份到一套11.2.0.3.0的库上,重新使用Tomcat那个版本的驱动来重新访问数据库时,亦不再报错。
以上是引用别人分析的原因,我的跟他的类似,应该也是Bug:9703463引起的,选择解决的方法:
打补丁,没有试过,暂时排除;
修改数据库参数,不知道修改成什么样子的,排除;
升级数据库,风险太大,排除;
网上还有另外一个方法:_optim_peek_user_binds=false
查了一下这个隐藏参数的作用,类似于预编译的功能,就是这个语句已经执行过一次了,下次在遇到,可以直接不用分析了,直接拿来用,节省资源的,参考:
http://www.2cto.com/database/201203/121737.html
考虑到如果把这个变量设置为false会影响性能,也排除。
剩下一个升级jdbc驱动(目前是11.2.0.1.0,升级到最新的11.2.0.4.0),如果这个不行,考虑打补丁的方案(倾向于直接打补丁,但没有经验所以先试验升级驱动)。
暂时先记录到这里