The events described in this article took place in 2008.
I'm a fairly lazy person and only dig into things once a problem forces me to. By chance I noticed that ever since we switched our production Tomcat's connector to NIO, a server would occasionally get stuck at 100% CPU. It was a strange symptom, it destabilized the whole system, and it needed fixing fast. I searched the web but found no ready-made answer.
So I sat down to analyze it myself. The first thing I needed to know was what Tomcat's threads were doing, which led me to the jstack command.
Command format: jstack ${pid}
where the pid can be obtained with the jps command.
jstack gives us a stack trace for every working thread in the application, showing what each one is doing. If, say, many threads are all executing in the same block of code, that code is a likely suspect.
The output looks like this:
"MaintThread" daemon prio=10 tid=0x91cd3400 nid=0x4f40 waiting on condition [0x8faec000..0x8faecfb0]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at com.danga.MemCached.SockIOPool$MaintThread.run(SockIOPool.java:1492)

"Thread-3" prio=10 tid=0x8f9ff800 nid=0x4f3f waiting on condition [0x8fb2d000..0x8fb2df30]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at com.tudou.usersrv.client.HotSpareProxyMemcachedClient$1.run(HotSpareProxyMemcachedClient.java:71)

"MaintThread" daemon prio=10 tid=0x91dcfc00 nid=0x4f3e waiting on condition [0x8fb6e000..0x8fb6eeb0]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at com.danga.MemCached.SockIOPool$MaintThread.run(SockIOPool.java:1492)

"MaintThread" daemon prio=10 tid=0x91cd2800 nid=0x4f3d waiting on condition [0x8fbaf000..0x8fbafe30]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at com.danga.MemCached.SockIOPool$MaintThread.run(SockIOPool.java:1492)

"Low Memory Detector" daemon prio=10 tid=0x09a16c00 nid=0x4f37 runnable [0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x09a15000 nid=0x4f36 waiting on condition [0x00000000..0x9250f688]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x09a13800 nid=0x4f35 waiting on condition [0x00000000..0x92590608]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x09a12800 nid=0x4f34 runnable [0x00000000..0x925d1c10]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x099ff800 nid=0x4f33 in Object.wait() [0x92812000..0x92812eb0]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0x978b2300> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x099ff000 nid=0x4f32 in Object.wait() [0x92853000..0x92853e30]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x998d4160> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x09918800 nid=0x4f2c runnable [0xf7fb9000..0xf7fba208]
   java.lang.Thread.State: RUNNABLE
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
        - locked <0x978b24f8> (a java.net.SocksSocketImpl)
        at java.net.ServerSocket.implAccept(ServerSocket.java:453)
        at java.net.ServerSocket.accept(ServerSocket.java:421)
        at org.apache.catalina.core.StandardServer.await(StandardServer.java:389)
        at org.apache.catalina.startup.Catalina.await(Catalina.java:642)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:602)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)

"VM Thread" prio=10 tid=0x099fc400 nid=0x4f31 runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x0991f400 nid=0x4f2d runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x09920400 nid=0x4f2e runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x09921400 nid=0x4f2f runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x09922400 nid=0x4f30 runnable

"VM Periodic Task Thread" prio=10 tid=0x09a18400 nid=0x4f38 waiting on condition
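With the dump saved to a file (the name threads.txt below is just an assumption), one quick way to spot "many threads in the same code block" is to count identical stack frames. A minimal sketch, seeded with a few frames copied from the dump above:

```shell
# Seed threads.txt with sample frames from the dump in this article;
# in practice it would be the full output of `jstack ${pid} > threads.txt`.
cat > threads.txt <<'EOF'
        at java.lang.Thread.sleep(Native Method)
        at com.danga.MemCached.SockIOPool$MaintThread.run(SockIOPool.java:1492)
        at java.lang.Thread.sleep(Native Method)
        at com.danga.MemCached.SockIOPool$MaintThread.run(SockIOPool.java:1492)
EOF
# Count how often each frame appears; frames shared by many threads
# rise to the top and point at the suspect code.
grep 'at ' threads.txt | sed 's/^[[:space:]]*//' | sort | uniq -c | sort -rn
```

Here both frames show a count of 2; on a busy server, dozens of request threads stuck in the same frame is the pattern to look for.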
Next, use Linux's built-in ps command:
ps -eL -o pid,%cpu,lwp | grep -i 16907
This lists the CPU usage of each lightweight process (thread). Match the LWP against the thread IDs (nid) in the jstack output, and you know exactly where the CPU-hungry thread is executing.
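One wrinkle: ps prints the LWP in decimal, while jstack prints the same thread ID as hex in the nid= field, so a conversion is needed before you can match them up. A small sketch, using LWP 20287, which converts to nid=0x4f3f, i.e. Thread-3 in the dump above (the file name threads.txt is an assumption):

```shell
# ps -eL shows thread ids (LWP) in decimal; jstack shows them in hex
# as "nid=0x...". Convert the hot LWP before searching the dump.
nid=$(printf '0x%x' 20287)
echo "$nid"    # prints 0x4f3f, matching Thread-3 in the dump above
# Then locate that thread's stack in the saved dump, e.g.:
#   grep -A 5 "nid=$nid" threads.txt
```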
This approach confirmed that Tomcat's NIO connector, which is built on JDK 6's NIO package, was hitting one of that package's deadlock bugs. At the time, the Apache folks also recommended against enabling NIO under high concurrency.
In production, jstack has since helped us track down other problems as well, such as a memory leak that drove the garbage-collection threads to high CPU, and service threads that stopped responding because they were blocked on a remote interface.
I hope this article helps you solve a few problems of your own. Thanks for reading!