1. Incident description
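Yesterday, shortly after 10 a.m., our base application (JiChuYinYong) had a production incident. The symptom was that the system hung: it appeared to be alive but stopped responding.
2. Hardware LB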
Check how the hardware load balancer is routing traffic:
ARRAY-3(config)#sh stati sl r tcp JiChuYinYong_a_8001
Real service JiChuYinYong_a_8001 192.168.1.137 8001 UP ACTIVE
Main health check: 192.168.1.137 8001 tcp UP
Max Conn Count: 10000
Current Connection Count: 1685
Outstanding Request Count: 1685
Total Hits: 13532432
Total Bytes In: 1822052478897
Total Bytes Out: 387396621959
Total Packets In: 2993154102
Total Packets Out: 1973275757
Average Response time: 1.165 ms
ARRAY-3(config)#sh stati sl r tcp JiChuYinYong_b_8001
Real service JiChuYinYong_b_8001 192.168.1.138 8001 UP ACTIVE
Main health check: 192.168.1.138 8001 tcp UP
Max Conn Count: 10000
Current Connection Count: 19
Outstanding Request Count: 19
Total Hits: 3408819
Total Bytes In: 658529070959
Total Bytes Out: 165733759282
Total Packets In: 1177272070
Total Packets Out: 796176948
Average Response time: 1.175 ms
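Current connections on the two nodes stand at 1685 : 19, so the hardware load balancer was not distributing traffic evenly. The node in trouble was 137 (192.168.1.137).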
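To put a number on the skew, here is a minimal sketch that parses the `Current Connection Count` and `Total Hits` lines from the two `sh stati sl r tcp` outputs above and computes each real service's share. The class and method names (`LbSkew`, `parse`, `share`) are hypothetical; only the field labels come from the Array output, and the sketch assumes each service block starts with a `Real service <name>` line, as it does here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: quantifies load skew from Array "sh stati sl r tcp" output.
public class LbSkew {
    static final int CONN = 0; // index of Current Connection Count
    static final int HITS = 1; // index of Total Hits

    private static final Pattern SERVICE = Pattern.compile("^Real service (\\S+)");
    private static final Pattern FIELD =
            Pattern.compile("^(Current Connection Count|Total Hits):\\s*(\\d+)");

    /** Maps each real service name to {current connections, total hits}. */
    static Map<String, long[]> parse(String output) {
        Map<String, long[]> stats = new LinkedHashMap<String, long[]>();
        String service = null;
        for (String raw : output.split("\\r?\\n")) {
            String line = raw.trim();
            Matcher s = SERVICE.matcher(line);
            if (s.find()) { // a new service block starts here
                service = s.group(1);
                stats.put(service, new long[2]);
                continue;
            }
            Matcher f = FIELD.matcher(line);
            if (service != null && f.find()) {
                int idx = f.group(1).startsWith("Current") ? CONN : HITS;
                stats.get(service)[idx] = Long.parseLong(f.group(2));
            }
        }
        return stats;
    }

    /** Fraction of the metric at idx that belongs to service, across all services. */
    static double share(Map<String, long[]> stats, String service, int idx) {
        long total = 0;
        for (long[] v : stats.values()) {
            total += v[idx];
        }
        return total == 0 ? 0.0 : (double) stats.get(service)[idx] / total;
    }

    public static void main(String[] args) {
        // Only the relevant lines from the two outputs above.
        String output =
                "Real service JiChuYinYong_a_8001 192.168.1.137 8001 UP ACTIVE\n"
              + "Current Connection Count: 1685\n"
              + "Total Hits: 13532432\n"
              + "Real service JiChuYinYong_b_8001 192.168.1.138 8001 UP ACTIVE\n"
              + "Current Connection Count: 19\n"
              + "Total Hits: 3408819\n";
        Map<String, long[]> stats = parse(output);
        for (String name : stats.keySet()) {
            System.out.println(String.format(Locale.ROOT,
                    "%s: %.1f%% of current connections, %.1f%% of total hits",
                    name, 100 * share(stats, name, CONN), 100 * share(stats, name, HITS)));
        }
    }
}
```

With the numbers above, node 137 holds about 98.9% of current connections (roughly 89:1) but about 79.9% of cumulative hits (roughly 4:1).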
3. WebLogic logs
3.1 Problem makers
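The WebLogic logs are long and expose many problems. But not every problem the logs expose is a real problem:
- Some problems are harmless at low concurrency but break as soon as concurrency rises. These are the problem makers.
- Some problems are harmless at low concurrency and would also be harmless at high concurrency, but they get "exposed" because other problems have already exhausted the system's resources. These are really the victims.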
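The maker/victim distinction can be illustrated with a plain `java.util.concurrent` thread pool. This is a simplified sketch, not WebLogic's self-tuning pool (which queues work and adjusts its thread count rather than rejecting requests outright): tasks that block forever, standing in for a call that never returns, are harmless while the pool still has free threads, but once they occupy every thread an otherwise harmless request is turned away. `PoolExhaustion` and `victimAccepted` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: shows a "maker" starving a "victim" of threads.
public class PoolExhaustion {

    /**
     * Submits blockingRequests tasks that wait forever (the makers), then one
     * trivial task (the victim). Returns true if the victim was accepted.
     */
    static boolean victimAccepted(int poolSize, int blockingRequests) {
        if (blockingRequests > poolSize) {
            throw new IllegalArgumentException("blockingRequests must be <= poolSize");
        }
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),      // no queue: a task needs a free thread
                new ThreadPoolExecutor.AbortPolicy()); // reject when no thread is free
        final CountDownLatch neverReleased = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockingRequests; i++) {
                pool.execute(new Runnable() {
                    public void run() {
                        try {
                            neverReleased.await(); // blocks like a call that never returns
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt(); // released by shutdownNow
                        }
                    }
                });
            }
            try {
                pool.execute(new Runnable() {
                    public void run() { } // the victim: a request that does nothing wrong
                });
                return true;
            } catch (RejectedExecutionException e) {
                return false;
            }
        } finally {
            pool.shutdownNow(); // interrupt the blocked workers so they can exit
        }
    }

    public static void main(String[] args) {
        // Low concurrency: 2 stuck tasks on a 4-thread pool leave room for the victim.
        System.out.println("2 stuck of 4, victim accepted: " + victimAccepted(4, 2));
        // High concurrency: 4 stuck tasks take every thread, so the victim is rejected.
        System.out.println("4 stuck of 4, victim accepted: " + victimAccepted(4, 4));
    }
}
```

`main` prints `true` for the first case and `false` for the second. The rejected task is identical in both runs; only the resources left over by the blocking tasks differ.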
In the WebLogic log of the failing server, the earliest problem recorded during the incident window was the following:
####<Mar 25, 2017 9:29:09 AM CST> <Error> <WebLogicServer> <PSFPWEB01> <psfp_in> <[ACTIVE] ExecuteThread: '52' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1490405349068> <BEA-000337> <[STUCK] ExecuteThread: '18' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "686" seconds working on the request "weblogic.servlet.internal.ServletRequestImpl@2308bcd6[
POST /psfp-inagency/resource/com.defonds.core.psfp.inagency.biz.issued.resource.InBatchIssuedResource/issued HTTP/1.1
Content-Type: application/json
Cache-Control: no-cache
Pragma: no-cache
User-Agent: Java/1.6.0_45
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Content-Length: 472434
]", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
Thread-411 "[STUCK] ExecuteThread: '18' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, in native, suspended, priority=1, DAEMON> {
jrockit.net.SocketNativeIO.readBytesPinned(SocketNativeIO.java:???)
jrockit.net.SocketNativeIO.socketRead(SocketNativeIO.java:24)
java.net.SocketInputStream.socketRead0(SocketInputStream.java:???)
java.net.SocketInputStream.read(SocketInputStream.java:107)
java.io.BufferedInputStream.fill(BufferedInputStream.java:189)
java.io.BufferedInputStream.read(BufferedInputStream.java:236)
^-- Holding lock: java.io.BufferedInputStream@47fee815[thin lock]
weblogic.net.http.MessageHeader.isHTTP(MessageHeader.java:214)
weblogic.net.http.MessageHeader.parseHeader(MessageHeader.java:141)
weblogic.net.http.HttpClient.parseHTTP(HttpClient.java:460)
weblogic.net.http.HttpURLConnection.getInputStream(HttpURLConnection.java:328)
weblogic.net.http.SOAPHttpURLConnection.getInputStream(SOAPHttpURLConnection.java:37)
^-- Holding lock: weblogic.net.http.SOAPHttpURLConnection@5971479e[thin lock]
weblogic.net.http.HttpURLConnection.getResponseCode(HttpURLConnection.java:939)
com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:158)
com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
com.sun.jersey.api.client.Client.handle(Client.java:645)
com.sun.jersey.api.client.WebResource.handle(WebResource.java:679)
com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:558)
com.defonds.rest.core.client.proxy.ResourceJsonInvocationHandler.invoke(ResourceJsonInvocationHandler.java:33)
$Proxy437.issued(Unknown Source)
com.defonds.core.psfp.inagency.biz.issued.resource.impl.BatchIssuedResourceImpl.issued(BatchIssuedResourceImpl.java:63)
sun.reflect.GeneratedMethodAccessor4368.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:575)
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:183)
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:185)
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:93)
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:80)
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:93)
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:72)
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1495)
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1441)
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1386)
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1377)
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:378)
com.sun.jersey.spi.cont