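Please note: I'm tagging this with JClouds because, if you read the entire question and the comments that follow, I believe this is either a bug in JClouds or a misuse of that library.
I have an executable JAR that runs, works for a while, finishes its work without throwing any errors/exceptions, and then hangs forever when it should exit. I profiled it with VisualVM (paying attention to the running threads), and I also added a log statement that prints every thread still alive at the point where the app hangs (at the end of the main() method). Here is the last part of my main method: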
Set<Thread> threadSet = Thread.getAllStackTraces().keySet();
for (Thread t : threadSet) {
    String daemon = (t.isDaemon() ? "Yes" : "No");
    // Plain concatenation: Java string literals do not expand ${...} placeholders.
    System.out.println("The " + t.getName() + " thread is currently running; is it a daemon? " + daemon + ".");
}
When my JAR executes this code, I see the following output:
The com.google.inject.internal.util.Finalizer thread is currently running; is it a daemon? Yes.
The Signal Dispatcher thread is currently running; is it a daemon? Yes.
The RMI Scheduler(0) thread is currently running; is it a daemon? Yes.
The Attach Listener thread is currently running; is it a daemon? Yes.
The user thread 3 thread is currently running; is it a daemon? No.
The Finalizer thread is currently running; is it a daemon? Yes.
The RMI TCP Accept-0 thread is currently running; is it a daemon? Yes.
The main thread is currently running; is it a daemon? No.
The RMI TCP Connection(1)-10.10.99.8 thread is currently running; is it a daemon? Yes.
The Reference Handler thread is currently running; is it a daemon? Yes.
The JMX server connection timeout 24 thread is currently running; is it a daemon? Yes.
I don't think I have to worry about daemon threads (correct me if I'm wrong), so filtering this down to the non-daemon threads:
The user thread 3 thread is currently running; is it a daemon? No.
The main thread is currently running; is it a daemon? No.
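The same filtering can be done in code instead of by eye. This is a minimal sketch, not part of my application: the class name NonDaemonThreads and the method liveNonDaemonThreadNames are made up for illustration, and it relies only on Thread.getAllStackTraces() and Thread.isDaemon() from the JDK.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
class NonDaemonThreads {

    /** Returns the sorted names of all live threads that are not daemon threads. */
    static List<String> liveNonDaemonThreadNames() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> !t.isDaemon())   // daemon threads do not keep the JVM alive
                .map(Thread::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (String name : liveNonDaemonThreadNames()) {
            System.out.println("Non-daemon thread still alive: " + name);
        }
    }
}
```

Called at the end of main(), anything it lists other than the main thread itself is a candidate for whatever is keeping the JVM from exiting.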
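Obviously, the main thread is still running because something is blocking it from exiting. Hmm, user thread 3 looks interesting. What does VisualVM tell us?
This is the thread view at the point where the app hangs (that is, what is happening while the console output above is printed). Hmm, user thread 3 is looking even more suspicious!
So, right before killing the app, I took a thread dump. Here is the stack trace for user thread 3: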
"user thread 3" prio=6 tid=0x000000000dfd4000 nid=0x2360 waiting on condition [0x00000000114ff000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000782cba410> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
   Locked ownable synchronizers:
    - None
I've never had to analyze one of these, so it's gibberish to me (but perhaps not to a trained eye!).
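For readers in the same position, the frames in that trace (ThreadPoolExecutor.getTask calling LinkedBlockingQueue.take) are what an idle worker of a thread pool looks like while it waits for its next task. The sketch below reproduces that state in isolation under stated assumptions: the class IdleWorkerDemo, the method workerStateAfterTask and the thread name "demo worker" are invented for this example and are not taken from my app or from any library involved. It uses Executors.newFixedThreadPool, which is backed by a LinkedBlockingQueue and creates non-daemon threads by default.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, for illustration only.
class IdleWorkerDemo {

    /** Runs one trivial task on a one-thread pool and reports the worker's state afterwards. */
    static Thread.State workerStateAfterTask(boolean callShutdown) {
        AtomicReference<Thread> worker = new AtomicReference<>();
        ExecutorService pool = Executors.newFixedThreadPool(1, r -> {
            Thread t = new Thread(r, "demo worker");   // non-daemon, like "user thread 3"
            worker.set(t);
            return t;
        });
        try {
            pool.submit(() -> { }).get();              // wait until the only task is done
            if (callShutdown) {
                pool.shutdown();                       // lets the idle worker exit
                worker.get().join(5_000);
            } else {
                // Wait (up to about 5 s) for the worker to park in LinkedBlockingQueue.take().
                for (int i = 0; i < 500 && worker.get().getState() != Thread.State.WAITING; i++) {
                    Thread.sleep(10);
                }
            }
            return worker.get().getState();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();                        // clean up so the demo itself can exit
        }
    }

    public static void main(String[] args) {
        System.out.println("without shutdown(): " + workerStateAfterTask(false));
        System.out.println("with shutdown():    " + workerStateAfterTask(true));
    }
}
```

Without shutdown() the worker stays parked in WAITING, as in the dump above; after shutdown() it terminates. If the final shutdownNow() were left out in the first case, that parked non-daemon worker alone would be enough to keep the JVM alive.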
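After killing the app, VisualVM's timeline stops ticking/incrementing every second, and I can scroll back horizontally in the timeline to the point where user thread 3 was created and began its life as a nagging thread:
But I can't figure out how to tell where in the code user thread 3 is being created. So I ask:
> How can I tell what is creating user thread 3, and where it is being created (especially since I suspect a third-party OSS library is creating the thread)?
> How can I triage, diagnose and fix this thread hang?
Update:
Here is the code of mine that fires at around the same time user thread 3 appears to be created: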
ExecutorService myExecutor = Executors.newCachedThreadPool();
for (Node node : nodes) {
    BootstrapAndKickTask bootAndKickTask = new BootstrapAndKickTask(node, ctx);
    myExecutor.execute(bootAndKickTask);
}
myExecutor.shutdown();
if (!myExecutor.awaitTermination(15, TimeUnit.MINUTES)) {
    TimeoutException toExc = new TimeoutException("Hung after the 15 minute timeout was reached.");
    log.error(toExc);
    throw toExc;
}
And here is my GitHub Gist containing the full thread dump.