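Problem: In production, one machine showed very high CPU usage. To narrow down what was wrong in the Java application, the busy operating-system thread has to be matched to a thread inside the Java process.
1. Run top -Hp pid to see the CPU usage of each thread in the application process (the production data was not kept, so a test run is used as the example here).
Alternatively, pstree -p pid lists the threads of the process, though without CPU figures.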
[bppf_b@CSHJ_QZJK2 ~]$ top -Hp 11167
top - 19:50:26 up 102 days, 5:51, 3 users, load average: 0.23, 0.15, 0.05
Tasks: 41 total, 0 running, 41 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.5%us, 1.0%sy, 0.0%ni, 98.4%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 32866796k total, 32379584k used, 487212k free, 1211524k buffers
Swap: 16776184k total, 9671176k used, 7105008k free, 5906632k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11167 bppf_b 20 0 9610m 355m 5496 S 0.0 1.1 0:00.00 java
11169 bppf_b 20 0 9610m 355m 5496 S 0.0 1.1 0:00.79 java
11170 bppf_b 20 0 9610m 355m 5496 S 0.0 1.1 0:04.98 java
11171 bppf_b 20 0 9610m 355m 5496 S 0.0 1.1 0:05.08 java
11172 bppf_b 20 0 9610m 355m 5496 S 0.0 1.1 0:05.04 java
2. Note the id of the thread with the highest load and convert it to hexadecimal. Thread id 26092 is used as the example.
[bppf_b@CSHJ_QZJK2 ~]$ echo 'obase=16;26092' | bc
65EC
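The same conversion can also be done with printf, which prints lowercase hex digits, the form jstack uses in its nid field. This is only a small sketch; the function name to_nid is made up for illustration.

```shell
# Hypothetical helper: turn a decimal thread id from top into jstack's nid form.
to_nid() {
  printf '0x%x\n' "$1"
}

to_nid 26092   # prints 0x65ec
```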
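3. Dump the thread stacks with jstack and search for 65ec (lowercase, shown as nid=0x65ec) to find the exact thread. Then analyze the cause against the code.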
"pool-1-thread-1" daemon prio=10 tid=0x00007fdd94037800 nid=0x65ec waiting on condition [0x00007fddf04be000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000c8025c40> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:662)
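Steps 2 and 3 can be combined into one filter. The sketch below assumes the jstack dump separates thread entries with blank lines, as jstack normally prints them; the function name stack_of_thread is hypothetical, and the pid and thread id in the usage comment are the example values from this note.

```shell
# Hypothetical helper: read a jstack dump on stdin and print only the
# thread entry whose nid matches the given decimal thread id.
stack_of_thread() {
  # The trailing space keeps nid=0x65ec from also matching nid=0x65ec1.
  nid=$(printf 'nid=0x%x ' "$1")
  # RS= puts awk in paragraph mode, so each blank-line-separated
  # thread entry is handled as one record.
  awk -v RS= -v nid="$nid" 'index($0, nid) { print }'
}

# Assumed usage, with 11167 as the Java pid and 26092 as the busy thread:
# jstack 11167 | stack_of_thread 26092
```

Filtering by whole entry rather than with a fixed number of grep context lines avoids cutting off deep stacks or pulling in part of the next thread.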
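Summary: how to read a thread entry in the jstack output, starting with its first line:
"pool-1-thread-1" is the thread name.
tid is the thread id reported by the JVM (in HotSpot, the address of the JVM's internal thread structure). It is not the OS thread id.
nid is the id of the native (OS) thread, in hexadecimal. This is the value that matches the PID column of top -H.
prio is the thread priority.
"parking to wait for <0x00000000c8025c40>" means the thread is parked, waiting on the object at that address (here an AbstractQueuedSynchronizer$ConditionObject).
"waiting on condition [0x00007fddf04be000]": the value in brackets is an address in the thread's stack region.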
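To check in the other direction, the nid values in a dump can be converted back to decimal and compared with the thread ids shown by top -H. A minimal sketch, assuming header lines in the format shown above; list_threads is a hypothetical name.

```shell
# Hypothetical helper: for each thread header line on stdin, print
# "<decimal thread id> <thread name>".
list_threads() {
  # Pull the hex nid and the quoted thread name out of each header line.
  sed -n 's/^"\(.*\)".* nid=\(0x[0-9a-f]*\).*/\2 \1/p' |
  while read -r nid name; do
    # printf %d accepts a 0x-prefixed argument and prints it in decimal.
    printf '%d %s\n' "$nid" "$name"
  done
}

# Assumed usage: jstack 11167 | list_threads
```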