系统这几天突然出现大量com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction异常,lock导致大量异常事务不能提交,通过查询mysql INNODB_TRX表查询系统运行一段时间就会堆积大量事务,还有大量的lock wait状态的事务,根据此表的时间去系统日志表里查询那个接口,发现接口方法执行后没有退出,一开始怀疑是重复提交导致的同时操作同一条记录导致的行锁阻塞,检查发现方法逻辑简单,也没有慢sql,mysql也有lock_wait time设置会自动回滚,然后代码里加日志,发布测试,里面有一个信鸽的推送方法,执行到这快后发现不向下执行,怀疑在这块阻塞了,打印线程堆栈信息jstack -l xxxx发现确实有信鸽的线程在runnable,隔几分钟后再次dump线程堆栈发现这个一直阻塞在网络io这块
http-nio-8080-exec-10" #99 daemon prio=5 os_prio=0 tid=0x00007f84b8013800 nid=0x7ecf runnable [0x00007f84620f4000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked <0x00000000d81b8460> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
- locked <0x00000000d81b84b8> (a sun.net.www.protocol.http.HttpURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
- locked <0x00000000d81b84b8> (a sun.net.www.protocol.http.HttpURLConnection)
at com.tencent.xinge.XingeApp.callRestful(XingeApp.java:641)
at com.tencent.xinge.XingeApp.pushSingleDevice(XingeApp.java:216)
然后注释掉信鸽推送的代码,发布测试,经过一段时间测试,mysql INNODB_TRX不在堆积事务,完美找到病因,剩下的就好解决了
总结:jstack 工具排除一些隐藏的问题确实挺好用,要多用jdk自带工具排查问题,自己加log太累,代码在调用方法的时候也没有管返回结果,日志就很难发现,以后代码调用任何方法都要处理返回结果,不能偷懒