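Reference for the approach to reading stack logs: https://blog.csdn.net/yeyuningzi/article/details/118877693
Running jstack PID > XXX.txt writes the stack log to a file. Pinpointing the actual problem from that log is the hard part (at least I had no idea how at first), so after fighting my way through it I decided to write things down.
This is where the states of Java processes and threads come in. The Java thread states are: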
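1. NEW (initial): a thread object has been created, but start() has not been called yet.
2. RUNNABLE (running): Java lumps the ready and running states together under "runnable".
After the thread object is created, another thread (for example the main thread) calls its start() method. The thread then sits in the pool of runnable threads, waiting for the scheduler to pick it and give it the CPU; at this point it is ready. Once a ready thread gets a CPU time slice, it becomes running.
3. BLOCKED: the thread is blocked waiting to acquire a monitor lock (a synchronized block or method).
4. WAITING: the thread waits for another thread to perform a specific action (a notification or an interrupt).
5. TIMED_WAITING: unlike WAITING, the thread returns on its own after the specified time.
6. TERMINATED: the thread has finished executing.
Reference: https://blog.csdn.net/qq_22771739/article/details/82529874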
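To watch these states outside of a dump, here is a minimal sketch that drives one thread into each state and reads it back with Thread.getState(). The class name ThreadStateDemo, the thread names and the helper methods are made up for illustration; Thread.State and getState() are standard JDK API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ThreadStateDemo {

    /** Polls until the thread is observed in the wanted state; gives up after about 5 seconds. */
    static Thread.State awaitState(Thread t, Thread.State wanted) throws InterruptedException {
        long start = System.nanoTime();
        while (t.getState() != wanted) {
            if (System.nanoTime() - start > 5_000_000_000L) {
                throw new IllegalStateException(t.getName() + " never reached " + wanted);
            }
            Thread.sleep(5);
        }
        return wanted;
    }

    /** Creates a daemon thread so that the demo threads never keep the JVM alive. */
    static Thread daemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true);
        return t;
    }

    /** Returns one observed state per demo thread, keyed by thread name. */
    static Map<String, Thread.State> observe() throws InterruptedException {
        Map<String, Thread.State> seen = new LinkedHashMap<>();
        final Object gate = new Object();
        final Object signal = new Object();

        Thread fresh = daemon("fresh", () -> { });
        seen.put("fresh", fresh.getState());                   // never started

        seen.put("caller", Thread.currentThread().getState()); // the thread running this code

        Thread finished = daemon("finished", () -> { });
        finished.start();
        finished.join();
        seen.put("finished", finished.getState());             // run() has returned

        Thread sleeper = daemon("sleeper", () -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { /* exit */ }
        });
        Thread waiter = daemon("waiter", () -> {
            synchronized (signal) {
                try { while (true) signal.wait(); } catch (InterruptedException e) { /* exit */ }
            }
        });
        Thread blocked = daemon("blocked", () -> {
            synchronized (gate) { /* entered only after the caller releases gate */ }
        });

        sleeper.start();
        waiter.start();
        synchronized (gate) {                                  // hold the monitor "blocked" needs
            blocked.start();
            seen.put("sleeper", awaitState(sleeper, Thread.State.TIMED_WAITING));
            seen.put("waiter", awaitState(waiter, Thread.State.WAITING));
            seen.put("blocked", awaitState(blocked, Thread.State.BLOCKED));
        }
        sleeper.interrupt();
        waiter.interrupt();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        observe().forEach((name, state) -> System.out.println(name + " -> " + state));
    }
}
```

Taking a jstack dump of a program like this while the threads are parked should show the same state names on the java.lang.Thread.State lines.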
With these states in mind, pay particular attention to entries in the thread stack log that are marked BLOCKED, WAITING, or TIMED_WAITING:
1. java.lang.Thread.State: TIMED_WAITING
2. java.lang.Thread.State: WAITING
3. java.lang.Thread.State: BLOCKED
Usually, a few rounds of top and top -H -p PID are enough to pin down the TID of the suspect thread. Convert that TID to hexadecimal and look for the thread whose nid field matches it in the stack log:
"http-nio2-0.0.0.0-9160-Acceptor-0" #109 daemon prio=5 os_prio=0 tid=0x0000ffff807fa000 nid=0x1f7e waiting on condition [0x0000ffff3113e000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000feacb3d8> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
    at sun.nio.ch.PendingFuture.get(PendingFuture.java:180)
    at com.tongweb.web.util.net.Nio2Endpoint$Acceptor.run(Nio2Endpoint.java:450)
    at java.lang.Thread.run(Thread.java:748)

"ContainerBackgroundProcessor[StandardEngine[TONGWEB]]" #108 daemon prio=5 os_prio=0 tid=0x0000ffff806cf000 nid=0x1f7d waiting on condition [0x0000ffff3133e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at com.tongweb.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1349)
    at java.lang.Thread.run(Thread.java:748)

"RMI RenewClean-[192.168.0.87:36385]" #104 daemon prio=5 os_prio=0 tid=0x0000fffebc032000 nid=0x1f79 in Object.wait() [0x0000ffff4c3fe000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
    - locked <0x00000000c12f1958> (a java.lang.ref.ReferenceQueue$Lock)
    at sun.rmi.transport.DGCClient$EndpointEntry$RenewCleanThread.run(DGCClient.java:563)
    at java.lang.Thread.run(Thread.java:748)

"RMI Scheduler(0)" #103 daemon prio=5 os_prio=0 tid=0x0000ffff81163000 nid=0x1f78 waiting on condition [0x0000ffff4ddfd000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000c03e9b58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

"RMI Reaper" #101 prio=5 os_prio=0 tid=0x0000ffff809c2800 nid=0x1f73 in Object.wait() [0x0000ffff4d7fe000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000c03e8de8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
    - locked <0x00000000c03e8de8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
    at sun.rmi.transport.ObjectTable$Reaper.run(ObjectTable.java:351)
    at java.lang.Thread.run(Thread.java:748)
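The nid value in each thread header above is the operating-system thread ID in hexadecimal, while top shows it in decimal. A minimal sketch of the conversion in both directions follows; the class name NidConverter and its two methods are made up for illustration, and only Long.toHexString and Long.parseLong come from the JDK.

```java
final class NidConverter {

    /** Formats a decimal OS thread ID (as shown by top) the way the dump's nid field shows it. */
    static String toNid(long tid) {
        return "0x" + Long.toHexString(tid);
    }

    /** Parses a nid value such as "0x1f7e" back into the decimal thread ID. */
    static long fromNid(String nid) {
        String hex = nid.startsWith("0x") ? nid.substring(2) : nid;
        return Long.parseLong(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(toNid(8062));       // 0x1f7e, the Acceptor thread above
        System.out.println(fromNid("0x1f79")); // 8057, the RMI RenewClean thread above
    }
}
```

So a thread that top lists as 8062 is the one whose header carries nid=0x1f7e.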
You will notice that quite a few threads in the dump are in java.lang.Thread.State: TIMED_WAITING or java.lang.Thread.State: WAITING. So what now?!
Look closely at the log and the states turn out not to be all the same:
TIMED_WAITING (parking)
WAITING (on object monitor)
TIMED_WAITING (sleeping)
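With a large dump it is easier to count these variants than to spot them by eye. The sketch below tallies every java.lang.Thread.State line, qualifier included. The class name DumpStateTally is hypothetical, and passing the dump file as the first argument is just one way to feed it; without an argument it falls back to a few state lines copied from the dump above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DumpStateTally {

    private static final String MARKER = "java.lang.Thread.State:";

    /** Counts how often each state (with its qualifier) appears in the dump lines. */
    static Map<String, Integer> tally(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim(); // state lines are indented in real jstack output
            if (trimmed.startsWith(MARKER)) {
                String state = trimmed.substring(MARKER.length()).trim();
                counts.merge(state, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                        "java.lang.Thread.State: WAITING (parking)",
                        "java.lang.Thread.State: TIMED_WAITING (sleeping)",
                        "java.lang.Thread.State: TIMED_WAITING (on object monitor)",
                        "java.lang.Thread.State: TIMED_WAITING (parking)",
                        "java.lang.Thread.State: WAITING (on object monitor)");
        tally(lines).forEach((state, n) -> System.out.println(n + "  " + state));
    }
}
```

Any BLOCKED entries in the tally are usually the first ones worth reading in full.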
Did you spot it? There is a WAITING (on object monitor)! What on earth is that? The stack underneath it also clearly differs from the others:
"RMI RenewClean-[192.168.0.87:36385]" #104 daemon prio=5 os_prio=0 tid=0x0000fffebc032000 nid=0x1f79 in Object.wait() [0x0000ffff4c3fe000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
    - locked <0x00000000c12f1958> (a java.lang.ref.ReferenceQueue$Lock)
    at sun.rmi.transport.DGCClient$EndpointEntry$RenewCleanThread.run(DGCClient.java:563)
    at java.lang.Thread.run(Thread.java:748)

"RMI Reaper" #101 prio=5 os_prio=0 tid=0x0000ffff809c2800 nid=0x1f73 in Object.wait() [0x0000ffff4d7fe000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000c03e8de8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
    - locked <0x00000000c03e8de8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
    at sun.rmi.transport.ObjectTable$Reaper.run(ObjectTable.java:351)
    at java.lang.Thread.run(Thread.java:748)
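I had no idea what this meant, so off to Baidu and Google (this was my first time doing this, so that was all I could do...).
It turned out that these were the entries I was looking for (high CPU usage in a process can be caused by one problem or by several...).
At this point the problem seemed to come down to: locked <0x00000000c12f1958> (a java.lang.ref.ReferenceQueue$Lock)
locked <0x00000000c03e8de8> (a java.lang.ref.ReferenceQueue$Lock)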
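After searching around some more, my conclusion was: this is a deadlock!!! Strictly speaking, though, these lines alone do not prove one. Object.wait() releases the monitor while the thread waits, so a "locked" line paired with "waiting on" for the same address is what an idle thread parked in wait() normally looks like. In a real deadlock the threads are BLOCKED, each "waiting to lock" a monitor that another one holds, and jstack reports "Found one Java-level deadlock" at the end of the dump.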
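Whether a "locked" line means the monitor is still held can be checked with a small experiment. In the sketch below (the class name WaitReleasesMonitor and the thread names are made up for illustration), one thread parks in Object.wait() inside a synchronized block, and a second thread then tries to enter the same monitor. If wait() kept the monitor, the second thread would never get in.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class WaitReleasesMonitor {

    /** Returns true if a second thread entered the monitor while the first sat in wait(). */
    static boolean proberEntersWhileWaiterWaits() throws InterruptedException {
        final Object lock = new Object();

        Thread waiter = new Thread(() -> {
            synchronized (lock) {          // jstack typically reports this as "- locked <...>"
                try {
                    while (true) {
                        lock.wait();       // and this as "- waiting on <...>", same address
                    }
                } catch (InterruptedException e) {
                    // interrupted by the caller: leave the monitor and exit
                }
            }
        }, "demo-waiter");
        waiter.setDaemon(true);
        waiter.start();

        // Wait until the first thread is really parked in wait().
        long start = System.nanoTime();
        while (waiter.getState() != Thread.State.WAITING) {
            if (System.nanoTime() - start > 5_000_000_000L) {
                throw new IllegalStateException("waiter never reached WAITING");
            }
            Thread.sleep(5);
        }

        AtomicBoolean entered = new AtomicBoolean(false);
        Thread prober = new Thread(() -> {
            synchronized (lock) {          // succeeds only if the waiter gave the monitor up
                entered.set(true);
            }
        }, "demo-prober");
        prober.setDaemon(true);
        prober.start();
        prober.join(2_000);                // do not hang forever if it cannot get in

        waiter.interrupt();
        return entered.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second thread entered: " + proberEntersWhileWaiterWaits());
    }
}
```

The second thread gets in, which is why a waiting thread with a "locked" line is not by itself evidence of a stuck lock.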
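What to do next, I don't know yet... I'll write it down when I run into it. Roughly, the next step is probably to tell the developers: look, your code has a deadlock! Please fix it quickly!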
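Before handing such a finding to the developers, a suspected deadlock can be confirmed from inside the JVM with the standard ThreadMXBean.findDeadlockedThreads() call, which returns null when nothing is deadlocked. The sketch below is only an illustration under assumptions: the class DeadlockCheck, its helper methods and the two demo threads are made up, and the pair is deadlocked on purpose so that the check has something to find.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockCheck {

    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    /** Returns the IDs of deadlocked threads, or an empty array if there are none. */
    static long[] findDeadlocked() {
        long[] ids = MX.findDeadlockedThreads(); // null means no deadlock was detected
        return ids == null ? new long[0] : ids;
    }

    /** Starts two daemon threads that take the same two monitors in opposite order. */
    static void startDeadlockedPair() {
        final Object a = new Object();
        final Object b = new Object();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockInOrder(a, b, bothHoldFirst), "demo-deadlock-1");
        Thread t2 = new Thread(() -> lockInOrder(b, a, bothHoldFirst), "demo-deadlock-2");
        t1.setDaemon(true); // daemon threads do not stop the JVM from exiting
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // continue only once both threads hold their first monitor
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // never reached: the other thread holds this monitor and wants ours
            }
        }
    }

    /** Polls findDeadlocked() until it reports something or the timeout expires. */
    static long[] awaitDeadlock(long timeoutMillis) throws InterruptedException {
        long start = System.nanoTime();
        long[] ids = findDeadlocked();
        while (ids.length == 0 && (System.nanoTime() - start) / 1_000_000 < timeoutMillis) {
            Thread.sleep(20);
            ids = findDeadlocked();
        }
        return ids;
    }

    public static void main(String[] args) throws InterruptedException {
        startDeadlockedPair();
        for (ThreadInfo info : MX.getThreadInfo(awaitDeadlock(5_000))) {
            if (info != null) {
                System.out.println(info.getThreadName() + " (" + info.getThreadState()
                        + ") waits for a lock held by " + info.getLockOwnerName());
            }
        }
    }
}
```

In a jstack dump of the same program, both demo threads would show up as BLOCKED, which is the pattern to look for in a real log.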