怎么分析线程栈

jstack Dump 日志文件中的线程状态
dump 文件里,值得关注的线程状态有:
  1. 死锁, Deadlock(重点关注) 
  2. 执行中,Runnable   
  3. 等待资源, Waiting on condition(重点关注) 
  4. 等待获取监视器, Waiting on monitor entry(重点关注)
  5. 暂停,Suspended
  6. 对象等待中,Object.wait() 或 TIMED_WAITING
  7. 阻塞, Blocked(重点关注)  
  8. 停止,Parked

下面我们先从第一个例子开始分析,然后再列出不同线程状态的含义以及注意事项,最后再补充两个实例。

综合示范一: Waiting to lock 和 Blocked
实例如下:
"RMI TCP Connection(267865)-172.16.5.25" daemon prio=10 tid=0x00007fd508371000 nid=0x55ae  waiting for monitor entry [0x00007fd4f8684000]
   java.lang.Thread.State:  BLOCKED (on object monitor)
at org.apache.log4j.Category.callAppenders(Category.java:201)
-  waiting to lock <0x00000000acf4d0c0> (a org.apache.log4j.Logger)
at org.apache.log4j.Category.forcedLog(Category.java:388)
at org.apache.log4j.Category.log(Category.java:853)
at org.apache.commons.logging.impl.Log4JLogger.warn(Log4JLogger.java:234)
at com.tuan.core.common.lang.cache.remote.SpyMemcachedClient.get(SpyMemcachedClient.java:110)
……
1)线程状态是  Blocked,阻塞状态。说明线程等待资源超时!
2)“ waiting to lock <0x00000000acf4d0c0>”指,线程在等待给这个 0x00000000acf4d0c0 地址上锁(英文可描述为:trying to obtain  0x00000000acf4d0c0 lock)。
3)在 dump 日志里查找字符串 0x00000000acf4d0c0,发现有大量线程都在等待给这个地址上锁。如果能在日志里找到谁获得了这个锁(如locked < 0x00000000acf4d0c0 >),就可以顺藤摸瓜了。
4)“ waiting for monitor entry”说明此线程通过 synchronized(obj) {……} 申请进入了临界区,从而进入了下图1中的“Entry Set”队列,但该 obj 对应的 monitor 被其他线程拥有,所以本线程在 Entry Set 队列中等待。
5)第一行里,"RMI TCP Connection(267865)-172.16.5.25"是 Thread Name 。tid指Java Thread id。nid指native线程的id。prio是线程优先级。[0x00007fd4f8684000]是线程栈起始地址。
 
Dump文件中的线程状态含义及注意事项

含义如下所示:

  • Deadlock:死锁线程,一般指多个线程调用间,进入相互资源占用,导致一直等待无法释放的情况。
  • Runnable:一般指该线程正在执行状态中,该线程占用了资源,正在处理某个请求,有可能正在传递SQL到数据库执行,有可能在对某个文件操作,有可能进行数据类型等转换。
  • Waiting on condition:等待资源,或等待某个条件的发生。具体原因需结合 stacktrace来分析。
    • 如果堆栈信息明确是应用代码,则证明该线程正在等待资源。一般是大量读取某资源,且该资源采用了资源锁的情况下,线程进入等待状态,等待资源的读取。
    • 又或者,正在等待其他线程的执行等。
    • 如果发现有大量的线程都在处在 Wait on condition,从线程 stack看,正等待网络读写,这可能是一个网络瓶颈的征兆。因为网络阻塞导致线程无法执行。
      • 一种情况是网络非常忙,几乎消耗了所有的带宽,仍然有大量数据等待网络读写;
      • 另一种情况也可能是网络空闲,但由于路由等问题,导致包无法正常的到达。
    • 另外一种出现 Wait on condition的常见情况是该线程在 sleep,等待 sleep的时间到了时候,将被唤醒。
  • Blocked:线程阻塞,是指当前线程执行过程中,所需要的资源长时间等待却一直未能获取到,被容器的线程管理器标识为阻塞状态,可以理解为等待资源超时的线程。
  • Waiting for monitor entry 和 in Object.wait():Monitor是 Java中用以实现线程之间的互斥与协作的主要手段,它可以看成是对象或者 Class的锁。每一个对象都有,也仅有一个 monitor。从下图1中可以看出,每个 Monitor在某个时刻,只能被一个线程拥有,该线程就是 “Active Thread”,而其它线程都是 “Waiting Thread”,分别在两个队列 “ Entry Set”和 “Wait Set”里面等候。在 “Entry Set”中等待的线程状态是 “Waiting for monitor entry”,而在 “Wait Set”中等待的线程状态是 “in Object.wait()”。


 
 

图1 A Java Monitor

 

综合示范二: W aiting on condition  和 TIMED_WAITING
实例如下:
"RMI TCP Connection(idle)" daemon prio=10 tid=0x00007fd50834e800 nid=0x56b2  waiting on condition [0x00007fd4f1a59000]
   java.lang.Thread.State:  TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
-  parking to wait for  <0x00000000acd84de8> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:323)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:874)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:945)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:662)
1)“ TIMED_WAITING (parking)”中的 timed_waiting 指等待状态,但这里指定了时间,到达指定的时间后自动退出等待状态;parking指线程处于挂起中。

2)“ waiting on condition”需要与堆栈中的“ parking to wait for  <0x00000000acd84de8> (a java.util.concurrent.SynchronousQueue$TransferStack)”结合来看。首先,本线程肯定是在等待某个条件的发生,来把自己唤醒。其次,SynchronousQueue 并不是一个队列,只是线程之间移交信息的机制,当我们把一个元素放入到 SynchronousQueue 中时必须有另一个线程正在等待接受移交的任务,因此这就是本线程在等待的条件。
3)别的就看不出来了。

综合示范三:in Obejct.wait()  和 TIMED_WAITING

实例如下:
"RMI RenewClean-[172.16.5.19:28475]" daemon prio=10 tid=0x0000000041428800 nid=0xb09  in Object.wait() [0x00007f34f4bd0000]
   java.lang.Thread.State:  TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
-  waiting on <0x00000000aa672478> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
-  locked <0x00000000aa672478> (a java.lang.ref.ReferenceQueue$Lock)
at sun.rmi.transport.DGCClient$EndpointEntry$RenewCleanThread.run(DGCClient.java:516)
at java.lang.Thread.run(Thread.java:662)
1)“ TIMED_WAITING (on object monitor)”,对于本例而言,是因为本线程调用了 java.lang.Object.wait(long timeout) 而进入等待状态。 

2)“Wait Set”中等待的线程状态就是“  in Object.wait() ”。当线程获得了 Monitor,进入了临界区之后,如果发现线程继续运行的条件没有满足,它则调用对象(一般就是被 synchronized 的对象)的 wait() 方法,放弃了 Monitor,进入 “Wait Set”队列。只有当别的线程在该对象上调用了 notify() 或者 notifyAll() ,“ Wait Set”队列中线程才得到机会去竞争,但是只有一个线程获得对象的 Monitor,恢复到运行态。  

3)RMI RenewClean 是 DGCClient 的一部分。 DGC 指的是 Distributed GC,即分布式垃圾回收。 

4)请注意,是先  locked <0x00000000aa672478>,后  waiting on <0x00000000aa672478>,之所以先锁再等同一个对象,请看下面它的代码实现:
static private class  Lock { };
private Lock lock = new Lock();
public Reference<? extends T>  remove(long timeout)
{
    synchronized (lock) {
        Reference<? extends T> r =  reallyPoll();
        if (r != null) return r;
        for (;;) {
             lockwait(timeout);
            r =  reallyPoll();
            ……
       }
}
即,线程的执行中,先用 synchronized 获得了这个对象的 Monitor(对应于   locked <0x00000000aa672478> );当执行到 lock.wait(timeout);,线程就放弃了 Monitor 的所有权,进入“Wait Set”队列(对应于   waiting on <0x00000000aa672478> )。
5)从堆栈信息看,是正在清理 remote references to remote objects ,引用的租约到了,分布式垃圾回收在逐一清理呢。



最后我们来 看下面的一段线程栈作为实例,我会根据这个实例对线程栈做出完整分析:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.10-b01-428 mixed mode):

"DestroyJavaVM" prio=5 tid=7f9712001000 nid=0x110247000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"Thread-21" prio=5 tid=7f9712944000 nid=0x118d76000 waiting for monitor entry [118d75000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366f58> (a threads.deadlock.Account)
 - locked <7f3366ee0> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-20" prio=5 tid=7f971216c000 nid=0x118c73000 waiting for monitor entry [118c72000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366e98> (a threads.deadlock.Account)
 - locked <7f3366f58> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-19" prio=5 tid=7f9712943800 nid=0x118b70000 waiting for monitor entry [118b6f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:81)
 - waiting to lock <7f3366f40> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-18" prio=5 tid=7f9712942800 nid=0x118a6d000 waiting for monitor entry [118a6c000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:81)
 - waiting to lock <7f3366f40> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-17" prio=5 tid=7f9712942000 nid=0x11896a000 waiting for monitor entry [118969000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:81)
 - waiting to lock <7f3366ec8> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-16" prio=5 tid=7f9712941000 nid=0x118867000 waiting for monitor entry [118866000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:81)
 - waiting to lock <7f3366ec8> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-15" prio=5 tid=7f9712940800 nid=0x118764000 waiting for monitor entry [118763000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:81)
 - waiting to lock <7f3366ef8> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-14" prio=5 tid=7f971293f800 nid=0x118661000 waiting for monitor entry [118660000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:81)
 - waiting to lock <7f3366f28> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-13" prio=5 tid=7f97129ae000 nid=0x11855e000 waiting for monitor entry [11855d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:81)
 - waiting to lock <7f3366eb0> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-12" prio=5 tid=7f97129ad000 nid=0x11845b000 waiting for monitor entry [11845a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:81)
 - waiting to lock <7f3366f40> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-11" prio=5 tid=7f97129ac800 nid=0x118358000 waiting for monitor entry [118357000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366f58> (a threads.deadlock.Account)
 - locked <7f3366eb0> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-10" prio=5 tid=7f97129ab800 nid=0x118255000 waiting for monitor entry [118254000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:81)
 - waiting to lock <7f3366eb0> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-9" prio=5 tid=7f97129ab000 nid=0x118152000 waiting for monitor entry [118151000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366e98> (a threads.deadlock.Account)
 - locked <7f3366ec8> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-8" prio=5 tid=7f97129aa000 nid=0x11804f000 waiting for monitor entry [11804e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366eb0> (a threads.deadlock.Account)
 - locked <7f3366f28> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-7" prio=5 tid=7f97129a9800 nid=0x117f4c000 waiting for monitor entry [117f4b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366eb0> (a threads.deadlock.Account)
 - locked <7f3366e80> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-6" prio=5 tid=7f97129a8800 nid=0x117e49000 waiting for monitor entry [117e48000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:81)
 - waiting to lock <7f3366e80> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-5" prio=5 tid=7f97128a1800 nid=0x117d46000 waiting for monitor entry [117d45000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:81)
 - waiting to lock <7f3366f28> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-4" prio=5 tid=7f97121af800 nid=0x117c43000 waiting for monitor entry [117c42000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366e80> (a threads.deadlock.Account)
 - locked <7f3366e98> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-3" prio=5 tid=7f97121ae800 nid=0x117b40000 waiting for monitor entry [117b3f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366e80> (a threads.deadlock.Account)
 - locked <7f3366ef8> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"Thread-2" prio=5 tid=7f971224a000 nid=0x117a3d000 waiting for monitor entry [117a3c000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366eb0> (a threads.deadlock.Account)
 - locked <7f3366f40> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

"RMI TCP Accept-0" daemon prio=5 tid=7f97128fd800 nid=0x117837000 runnable [117836000]
   java.lang.Thread.State: RUNNABLE
 at java.net.PlainSocketImpl.socketAccept(Native Method)
 at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
 - locked <7f32ee740> (a java.net.SocksSocketImpl)
 at java.net.ServerSocket.implAccept(ServerSocket.java:462)
 at java.net.ServerSocket.accept(ServerSocket.java:430)
 at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
 at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
 at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
 at java.lang.Thread.run(Thread.java:680)

"Poller SunPKCS11-Darwin" daemon prio=1 tid=7f97128fd000 nid=0x117734000 waiting on condition [117733000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at sun.security.pkcs11.SunPKCS11$TokenPoller.run(SunPKCS11.java:692)
 at java.lang.Thread.run(Thread.java:680)

"Low Memory Detector" daemon prio=5 tid=7f971209e000 nid=0x1173ec000 runnable [00000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=9 tid=7f971209d000 nid=0x1172e9000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=9 tid=7f971209c800 nid=0x1171e6000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=9 tid=7f971209b800 nid=0x1170e3000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (Concurrent GC)" daemon prio=5 tid=7f971209a800 nid=0x116fe0000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=7f971209a000 nid=0x116d1c000 in Object.wait() [116d1b000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <7f3001300> (a java.lang.ref.ReferenceQueue$Lock)
 at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
 - locked <7f3001300> (a java.lang.ref.ReferenceQueue$Lock)
 at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
 at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=7f9712099000 nid=0x116c19000 in Object.wait() [116c18000]
   java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 - waiting on <7f30011d8> (a java.lang.ref.Reference$Lock)
 at java.lang.Object.wait(Object.java:485)
 at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
 - locked <7f30011d8> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=9 tid=7f9712096800 nid=0x116b16000 runnable 

"Gang worker#0 (Parallel GC Threads)" prio=9 tid=7f9712002800 nid=0x1135c7000 runnable 

"Gang worker#1 (Parallel GC Threads)" prio=9 tid=7f9712003000 nid=0x1136ca000 runnable 

"Concurrent Mark-Sweep GC Thread" prio=9 tid=7f971204d800 nid=0x116790000 runnable 
"VM Periodic Task Thread" prio=10 tid=7f97122d4000 nid=0x11793a000 waiting on condition 

"Exception Catcher Thread" prio=10 tid=7f9712001800 nid=0x1103ef000 runnable 
JNI global references: 1037


Found one Java-level deadlock:
=============================
"Thread-21":
  waiting to lock monitor 7f97118bd560 (object 7f3366f58, a threads.deadlock.Account),
  which is held by "Thread-20"
"Thread-20":
  waiting to lock monitor 7f97118bc108 (object 7f3366e98, a threads.deadlock.Account),
  which is held by "Thread-4"
"Thread-4":
  waiting to lock monitor 7f9711834360 (object 7f3366e80, a threads.deadlock.Account),
  which is held by "Thread-7"
"Thread-7":
  waiting to lock monitor 7f97118b9708 (object 7f3366eb0, a threads.deadlock.Account),
  which is held by "Thread-11"
"Thread-11":
  waiting to lock monitor 7f97118bd560 (object 7f3366f58, a threads.deadlock.Account),
  which is held by "Thread-20"

Java stack information for the threads listed above:
===================================================
"Thread-21":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366f58> (a threads.deadlock.Account)
 - locked <7f3366ee0> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)
"Thread-20":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366e98> (a threads.deadlock.Account)
 - locked <7f3366f58> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)
"Thread-4":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366e80> (a threads.deadlock.Account)
 - locked <7f3366e98> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)
"Thread-7":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366eb0> (a threads.deadlock.Account)
 - locked <7f3366e80> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)
"Thread-11":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366f58> (a threads.deadlock.Account)
 - locked <7f3366eb0> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

Found 1 deadlock.

Heap
 par new generation   total 19136K, used 11590K [7f3000000, 7f44c0000, 7f44c0000)
  eden space 17024K,  68% used [7f3000000, 7f3b51ac0, 7f40a0000)
  from space 2112K,   0% used [7f40a0000, 7f40a0000, 7f42b0000)
  to   space 2112K,   0% used [7f42b0000, 7f42b0000, 7f44c0000)
 concurrent mark-sweep generation total 63872K, used 0K [7f44c0000, 7f8320000, 7fae00000)
 concurrent-mark-sweep perm gen total 21248K, used 8268K [7fae00000, 7fc2c0000, 800000000)

通过快速浏览, 你能看出一个完整的线程栈可以划分成如下四部分:
  1. 程序中包含的完整线程队列
  2. 死锁的线程队列清单
  3. 一部分死锁线程的日志
  4. 程序的堆分析总结

完整的线程队列


The thread list in point one above is a list of all the application’s threads and their current status. From this you can see that an application consists of a whole bunch of threads, which you can roughly divide in to two. Firstly there are the background threads. These are the ones that every application has, which get on with all the dirty jobs that we, as application programmers, don’t usually need to worry about. These have names such as:"DestroyJavaVM", Low Memory Detector, Finalizer, Exception Catcher Thread andConcurrent Mark-Sweep GC Thread. Secondly, there are the threads that you or I may create as part of our code. These usually have names that consist of the wordThread followed by a number. For example: Thread-3, Thread-6 andThread-20.

"Thread-20" prio=5 tid=7f971216c000 nid=0x118c73000 waiting for monitor entry [118c72000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:82)
 - waiting to lock <7f3366e98> (a threads.deadlock.Account)
 - locked <7f3366f58> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:58)

Looking at the information given on Thread-20 in more detail you can see that this can be broken down into several parts. These are:

Thread-20 The thread’s name as described above.
prio=5 The thread's priority. A number from 1 to 10, where 1 is the lowest and 10 is the highest priority.
tid=7f971216c000 The thread id. A unique number that’s returned by a Thread.getId() call.
nid=0x118c73000 The native thread id. This maps to a platform dependent thread id.
waiting for monitor entry [118c72000]
java.lang.Thread.State: BLOCKED (on object monitor)
This is the status of the thread; in this case it’s BLOCKED. Also included is a stack trace outlining where the thread is blocked.

Note that a thread can also be marked as a daemon. For example:
"RMI TCP Accept-0" daemon prio=5 tid=7f97128fd800 nid=0x117837000 runnable [117836000]
   java.lang.Thread.State: RUNNABLE
Daemon threads are background task threads such as the RMI TCP Accept-0 thread listed above. A daemon thread is a thread that does not prevent the JVM from exiting. The JVM will exit, or close down, when only daemon threads remain.

However, the thread list doesn't really help in tracing the cause of a deadlock, so moving swiftly along...


死锁的线程队列


This section of the thread dump contains a list of all threads that are involved in the deadlock.

Found one Java-level deadlock:
=============================
"Thread-21":
  waiting to lock monitor 7f97118bd560 (object 7f3366f58, a threads.deadlock.Account),
  which is held by "Thread-20"
"Thread-20":
  waiting to lock monitor 7f97118bc108 (object 7f3366e98, a threads.deadlock.Account),
  which is held by "Thread-4"
"Thread-4":
  waiting to lock monitor 7f9711834360 (object 7f3366e80, a threads.deadlock.Account),
  which is held by "Thread-7"
"Thread-7":
  waiting to lock monitor 7f97118b9708 (object 7f3366eb0, a threads.deadlock.Account),
  which is held by "Thread-11"
"Thread-11":
  waiting to lock monitor 7f97118bd560 (object 7f3366f58, a threads.deadlock.Account),
  which is held by "Thread-20"

From the segment above, you can see that there are five threads all blocking on instances thethreads.deadlock.Account class

Leaving aside the monitor ids and Account instances, you can see that "Thread-21" is waiting for "Thread-20", which is waiting for "Thread-4", which in turn is waiting for "Thread-7". "Thread-7" is waiting for "Thread-11", which is waiting for "Thread-20": a deadlock loop as shown in the diagram below:


死锁线程的日志


The final piece of the puzzle is the list of deadlocked thread stack traces as shown below:

Java stack information for the threads listed above:
===================================================
"Thread-21":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366f58> (a threads.deadlock.Account)
 - locked <7f3366ee0> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)
"Thread-20":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366e98> (a threads.deadlock.Account)
 - locked <7f3366f58> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)
"Thread-4":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366e80> (a threads.deadlock.Account)
 - locked <7f3366e98> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)
"Thread-7":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366eb0> (a threads.deadlock.Account)
 - locked <7f3366e80> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)
"Thread-11":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f3366f58> (a threads.deadlock.Account)
 - locked <7f3366eb0> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

From the previous section, we know that Thread-20 is waiting, via a circuitous route, forThread-11 and Thread-11 is waiting for Thread-20. This is our deadlock.

The next step is to tie this deadlock up to lines of code using the thread stack trace above and I’ve simplified this in the diagram below.



In the above diagram I've removed the 7f3366 prefix from the object ids for clarity; hence, object7f3366f58 is now f58. From this diagram, you can see that objectf58 is locked by Thread-20 on line 59 and is waiting for a lock on objecte98 on line 86. Following the arrows down, you can see that Thread-7 is waiting for a lock oneb0 on line 86, which in turn is locked by Thread-11 on line 59.Thread-11 is waiting for a lock on f58 on line 86, which, looping back up, is locked on line 58 byThread-20

So, where are these lines of code? The following shows line 59:



...and this is line 86:






Everyone gets surprises sometimes and the stack trace above surprised me. I was expecting the locks to be on lines 85 and 86; however, they were on 59 and 86. Since line 59 doesn’t contain asynchronized keyword, I’m guessing that the compiler has done some optimisation on thetransfer(...) method’s first synchronized keyword.

The conclusion that can be drawn from this is that the code, which randomly picks twoAccount objects from a list, is locking them in the wrong order on lines 59 and 86. So what’s the fix? More on that next time; however, there’s one final point to note, which is that the make up of a deadlock may not be the same every time you generate a thread dump on a program. After running the DeadlockDemo program again and usingkill -3 PID to get hold of another thread dump, I obtained these results:

Found one Java-level deadlock:
=============================
"Thread-20":
  waiting to lock monitor 7fdc7c802508 (object 7f311a530, a threads.deadlock.Account),
  which is held by "Thread-3"
"Thread-3":
  waiting to lock monitor 7fdc7a83d008 (object 7f311a518, a threads.deadlock.Account),
  which is held by "Thread-11"
"Thread-11":
  waiting to lock monitor 7fdc7c802508 (object 7f311a530, a threads.deadlock.Account),
  which is held by "Thread-3"

Java stack information for the threads listed above:
===================================================
"Thread-20":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:86)
 - waiting to lock <7f311a530> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)
"Thread-3":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:87)
 - waiting to lock <7f311a518> (a threads.deadlock.Account)
 - locked <7f311a530> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)
"Thread-11":
 at threads.deadlock.DeadlockDemo$BadTransferOperation.transfer(DeadlockDemo.java:87)
 - waiting to lock <7f311a530> (a threads.deadlock.Account)
 - locked <7f311a518> (a threads.deadlock.Account)
 at threads.deadlock.DeadlockDemo$BadTransferOperation.run(DeadlockDemo.java:59)

Found 1 deadlock.

In this thread dump a smaller number of threads are involved in the deadlock, but if you analyze it you can draw the same conclusions as my first example.

Next time: fixing the code...


All source code for this an other blogs in the series are available on Github at git://github.com/roghughe/captaindebug.git

没有更多推荐了,返回首页