报警短信突然一条接一条的出现了,某台服务器宕机了。赶紧ssh到问题服务器,jstack把栈信息导出分析。此时服务器cpu正常,但是resin无响应。初步判断是线程死锁或者线程堵死的问题。
首先我们来分析jstack栈信息。首先应该查找Waiting状态的线程,发现有1000多个线程,我晕。。
分析其中的一个,部分栈信息如下,不过足以说明问题了:
"http--8005-1040$786011311" daemon prio=10 tid=0x00002aab44a25800 nid=0x7d71 in Object.wait() [0x000000004b097000..0x000000004b09ac90]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaac21d8bc0> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
- locked <0x00002aaac21d8bc0> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at com.sohu.twap.service.util.HttpUtil.get(HttpUtil.java:442)
at com.sohu.twap.service.util.HttpUtil.getOriginalContent(HttpUtil.java:462)
at com.sohu.twap.action.AbstractServiceAction.getJsSourceCode(AbstractServiceAction.java:879)
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaac21d8bc0> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:518)
- locked <0x00002aaac21d8bc0> (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at com.sohu.twap.service.util.HttpUtil.get(HttpUtil.java:442)
at com.sohu.twap.service.util.HttpUtil.getOriginalContent(HttpUtil.java:462)
at com.sohu.twap.action.AbstractServiceAction.getJsSourceCode(AbstractServiceAction.java:879)
程序在MultiThreadedHttpConnectionManager.doGetConnection方法中的object.wait() 方法中等待了。。。查看代码和google,发现应该是connection连接未释放的问题的。程序的其他地方已经升级到4.0。所以建议大家还是使用最新的4.1版本。参考其官方文档:
Release the Connection
This is a crucial step to keep things flowing.