Thread dump是对于程序的诊断是非常有用的,它可以帮助我们分析诸如程序没有响应,响应慢,CPU抖动,死锁,内存问题以及其它的系统问题。今天主要想分享如果打印Thread Dump
假定我们现在怀疑某个java进程出现了问题,比如没有响应,首先可以尝试拿到这个java应用对应的process Id (ps or jps)。然后可以看一下这个应用启动了那一些线程,这个步骤可以top命令:
# top -H -p 3627
top - 07:15:00 up 5:46, 2 users, load average: 0.22, 0.33, 0.47
Tasks: 12 total, 0 running, 12 sleeping, 0 stopped, 0 zombie
Cpu(s): 23.5%us, 9.8%sy, 2.2%ni, 62.3%id, 1.2%wa, 0.0%hi, 0.9%si, 0.0%st
Mem: 8061348k total, 7981468k used, 79880k free, 73332k buffers
Swap: 8208380k total, 0k used, 8208380k free, 1370492k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3627 cms 20 0 3042m 23m 10m S 0.0 0.3 0:00.00 java
3628 cms 20 0 3042m 23m 10m S 0.0 0.3 0:00.17 java
3629 cms 20 0 3042m 23m 10m S 0.0 0.3 0:00.00 java
3630 cms 20 0 3042m 23m 10m S 0.0 0.3 0:00.00 java
3631 cms 20 0 3042m 23m 10m S 0.0 0.3 0:00.00 java
3632 cms 20 0 3042m 23m 10m S 0.0 0.3 0:00.00 java
3633 cms 20 0 3042m 23m 10m S 0.0 0.3 0:00.00 java
3634 cms 20 0 3042m 23m 10m S 0.0 0.3 0:00.00 java
3635 cms 20 0 3042m 23m 10m S 0.0 0.3 0:00.01 java
3636 cms 20 0 3042m 23m 10m S 0.0 0.3 0:00.01 java
3637 cms 20 0 3042m 23m 10m S 0.0 0.3 0:00.00 java
3638 cms 20 0 3042m 23m 10m S 0.0 0.3 0:00.01 java
CPU的summary各个field的意义:
us = user mode (application运行时间)
sy = system mode (kernel运行时间)
ni = low priority user mode (nice)
id = idle task
wa = I/O waiting
hi = servicing IRQs
si = servicing soft IRQs
st = steal (time given to other DomU instances)
其中第一列PID就是Thread 的系统Id, 对应于后面thread dump中的nid(native thread id)。这里需要特别注意:这个id是十进制的,而thread dump中的nid是十六进制的,所以需要做一个转换。
从top命令output的summary来看,它有12个线程,而且12个线程处于sleeping状态。
接下来就是如果得到这个进程3627的thread dump。这里主要介绍jstack 以及kill -3,当然还有其它的方式,如jvisualVM,这里不详细介绍了。
jstack
jstack是一个非常有用的工具,用于抓取thread dump, 这个工具通常在JDK_HOME/bin目录下。用法也相对简单:
jstack -l <pid>
如果没有响应,可以尝试使用-F选项, 3627进程thread dump:
2018-02-27 07:27:18
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.141-b31 mixed mode):
"Attach Listener" daemon prio=10 tid=0x00007faba0001000 nid=0x3b52 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"Service Thread" daemon prio=10 tid=0x00007fabc411a800 nid=0xe35 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"C2 CompilerThread1" daemon prio=10 tid=0x00007fabc4118000 nid=0xe34 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"C2 CompilerThread0" daemon prio=10 tid=0x00007fabc4115000 nid=0xe33 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"Signal Dispatcher" daemon prio=10 tid=0x00007fabc4113000 nid=0xe32 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"Finalizer" daemon prio=10 tid=0x00007fabc40f1800 nid=0xe31 in Object.wait() [0x00007fabc0ad0000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000007d6f04860> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x00000007d6f04860> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
Locked ownable synchronizers:
- None
"Reference Handler" daemon prio=10 tid=0x00007fabc40ef800 nid=0xe30 in Object.wait() [0x00007fabafffe000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000007d6f04478> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x00000007d6f04478> (a java.lang.ref.Reference$Lock)
Locked ownable synchronizers:
- None
"main" prio=10 tid=0x00007fabc4008800 nid=0xe2c runnable [0x00007fabc9ed6000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:272)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked <0x00000007d6f19af0> (a java.io.BufferedInputStream)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
- locked <0x00000007d721e808> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
- locked <0x00000007d721e808> (a java.io.InputStreamReader)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at com.tools.Utilities.readInput(LogUtilities.java:175)
at com.tools.Utilities.main(LogUtilities.java:184)
Locked ownable synchronizers:
- None
"VM Thread" prio=10 tid=0x00007fabc40eb000 nid=0xe2f runnable
"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fabc401e000 nid=0xe2d runnable
"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fabc4020000 nid=0xe2e runnable
"VM Periodic Task Thread" prio=10 tid=0x00007fabc4125800 nid=0xe36 waiting on condition
JNI global references: 129
可以看到上面有12个线程,跟top命令看到是一样的。每个线程那是以下面的格式开始:
"main" prio=10 tid=0x00007fabc4008800 nid=0xe2c runnable [0x00007fabc9ed6000]
java.lang.Thread.State: RUNNABLE
其中:
- main: thread name,如果没有指定线程名字,由系统自动生成,比如”Thread-1”
- tid: jvm的线程id,由jvm自已维护其唯一性
- nid: thread的系统id,这个通常跟操作系统相关,所以称为本地线程Id, 在Linux下可以在/proc/processId/task/查看这个所以的线程Id
- runnable: 线程的状态
kill -3
如果生产环境没有jdk,只有jre,那就没有办法使用jstack。作为备用,我们可以使用kill -3或者kill -QUIT来抓取thread dump
kill -3 <pid>
这里需要注意的是,thread dump可能不会在你当前的console上打印
- 如果你的应用是布置在tomcat,那么你可以从catalina.out查看
- 如果你是用nohup启动的后台进程,那么应该在nohup.out查看
kill -3的thread dump跟jstack很类似,有一点不一样,kill -3会在最后打印堆相关的信息:
Heap
PSYoungGen total 36864K, used 3193K [0x00000007d6f00000, 0x00000007d9800000, 0x0000000800000000)
eden space 31744K, 10% used [0x00000007d6f00000,0x00000007d721e670,0x00000007d8e00000)
from space 5120K, 0% used [0x00000007d9300000,0x00000007d9300000,0x00000007d9800000)
to space 5120K, 0% used [0x00000007d8e00000,0x00000007d8e00000,0x00000007d9300000)
ParOldGen total 83968K, used 0K [0x0000000784e00000, 0x000000078a000000, 0x00000007d6f00000)
object space 83968K, 0% used [0x0000000784e00000,0x0000000784e00000,0x000000078a000000)
PSPermGen total 21504K, used 4553K [0x000000077fc00000, 0x0000000781100000, 0x0000000784e00000)
object space 21504K, 21% used [0x000000077fc00000,0x0000000780072558,0x0000000781100000)