在某个客户环境上出现性能问题。客户反馈在修改某个对象的属性时,一直在等待,然后就超时了。分析日志没发现什么明显的错误,怀疑是线程间资源占用、死锁之类引起的。拿了一些thread dump,发现了一些可疑的信息。
1. 线程 "t1[4]-28" 准备去修改该对象的属性,这时候在等待某个资源, "- waiting to lock <0x00000005406a1d68> (a xx.xx.xx.KKKK)"
"t1[4]-28" #3025 prio=5 os_prio=0 tid=0x00007fd1f4779000 nid=0x1b64 waiting for monitor entry [0x00007fd1b14e3000]
java.lang.Thread.State: BLOCKED (on object monitor)
at xx.xx.xx.XXXX(XXXX.java:694)
- waiting to lock <0x00000005406a1d68> (a xx.xx.xx.KKKK)
at xx.xx.xx.XXXX(XXXX.java:32)
at xx.xx.xx.XXXX(XXXX.java:64)
at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at xx.xx.xx.XXXX(XXXX.java:149)
at com.sun.proxy.$Proxy27.xxxx(Unknown Source)
at ....
at ....
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x0000000549968800> (a java.util.concurrent.ThreadPoolExecutor$Worker)
2. 等待的资源是0x00000005406a1d68,根据0x00000005406a1d68搜索,发现是 "t2[4]-20" 在占用着该资源。
"t2[4]-20" #3015 prio=5 os_prio=0 tid=0x00007fd1f5470800 nid=0x1b5a in Object.wait() [0x00007fd1baa6d000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1104)
- locked <0x00000005527ba228> (a org.apache.commons.pool.impl.GenericObjectPool$Latch)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at ....
at ....
- locked <0x0000000549a0e8c8> (a xx.xx.xx.XXXX)
at ....
at ....
- locked <0x00000005406a1d68> (a xx.xx.xx.KKKK)
at ....
at ....
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x0000000549a0eab8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
3. 检查几份 thread dump,发现线程 "t2[4]-20" 一直在占用着资源,不释放。
4. 到这里,就可以深入研究 "t2[4]-20" 在干什么,为什么会占用资源不放。是不是有一些第三方的bug,如 https://issues.apache.org/jira/browse/POOL-230 ,等等。。。
在分析一些性能方面的问题,thread dump 是系统运行时的快照,里面包含的信息比较有用,可以通过java命令或其它工具来生成thread dump. 一般建议在重现问题的过程,生成几份thread dump,可以分析是否有线程长时间在跑,这样就好确定问题了。