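On June 9, the site reported that the system had stalled for a while at around 10 a.m. I first analyzed the AWR report for that period and found nothing wrong. I then analyzed the WebLogic logs from the same period, which contained many errors: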
java.io.IOException: The connection manager to ConnectionManager for: 'weblogic.rjvm.RJVMImpl@174c25f1 - id: '5508187429651368410S:10.151.81.53:[7001,7001,-1,-1,-1,-1,-1]:hzsc_domain:AdminServer' connect time: 'Sun Jun 09 09:54:45 GMT+08:00 2013'' has already been shut down.
java.io.IOException: The connection manager to ConnectionManager for: 'weblogic.rjvm.RJVMImpl@174c25f1 - id: '5508187429651368410S:10.151.81.53:[7001,7001,-1,-1,-1,-1,-1]:hzsc_domain:AdminServer' connect time: 'Sun Jun 09 09:54:45 GMT+08:00 2013'' has already been shut down
at weblogic.rjvm.ConnectionManager.getOutputStream(ConnectionManager.java:1706)
at weblogic.rjvm.ConnectionManager.createHeartbeatMsg(ConnectionManager.java:1649)
at weblogic.rjvm.ConnectionManager.sendHeartbeatMsg(ConnectionManager.java:611)
at weblogic.rjvm.RJVMImpl$HeartbeatChecker.timerExpired(RJVMImpl.java:1540)
at weblogic.timers.internal.TimerImpl.run(TimerImpl.java:265)
at weblogic.work.ServerWorkManagerImpl$WorkAdapterImpl.run(ServerWorkManagerImpl.java:518)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)
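The cause lies in weblogic.rjvm.RJVMImpl: the AdminServer considered the server on one of the machines to have been shut down (in fact it had not been shut down), so the servers in the cluster could not complete their heartbeats. As a result, under heavy concurrency the system could not properly forward requests to the other servers.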
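To get a sense of how many of these errors a log contains and which peer they refer to, a small script can help. The following is only a minimal sketch: the regular expression is derived solely from the message text quoted above, the function name `count_shutdown_errors` is hypothetical, and the log path is supplied on the command line rather than assumed. Other WebLogic versions may word the message differently.

```python
import re
import sys
from collections import Counter

# Pattern built only from the message quoted in this post (an assumption:
# other WebLogic versions may format this message differently).
SHUTDOWN_RE = re.compile(
    r"java\.io\.IOException: The connection manager to ConnectionManager for: "
    r"'weblogic\.rjvm\.RJVMImpl@\w+ - id: '(?P<peer>[^']+)' "
    r"connect time: '(?P<connected>[^']+)'' has already been shut down"
)


def count_shutdown_errors(lines):
    """Count matching lines per (peer id, connect time) pair."""
    counts = Counter()
    for line in lines:
        match = SHUTDOWN_RE.search(line)
        if match:
            counts[(match.group("peer"), match.group("connected"))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_shutdown.py <path-to-server-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (peer, connected), n in count_shutdown_errors(log).most_common():
            print(f"{n:6d}  {peer}  (connected {connected})")
```

In the excerpt above the message appears on two consecutive lines, so the raw count may overstate the number of distinct events.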