Hung JVM consuming 100% CPU

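We have a Java server running on Sun JRE 6u20 on 32-bit Linux (CentOS). We use the Server HotSpot VM with the CMS collector and the following options (I have only listed the relevant ones):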

-Xmx896m -Xss128k -XX:NewSize=384M -XX:MaxPermSize=96m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC

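Occasionally, after running for a while, the JVM appears to hang: even when we send no requests to the application, the CPU keeps spinning at 100% (we have 8 logical CPUs, so it looks like only one CPU is doing the spinning).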

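In this state the JVM does not respond to SIGQUIT (kill -3), and we cannot attach to it with jstack. We can attach with "jstack -F", but the output is dubious (we see many NullPointerExceptions from jstack, apparently because it cannot walk some of the stacks), so the "jstack -F" output seems useless.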

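We took a stack dump with gdb and were able to match the ID of the thread that is spinning the CPU (found using "top" with the per-thread view, the "H" option) against a thread stack in the gdb output. This is what it looks like: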

Thread 443 (Thread 0x7e5b90 (LWP 26310)):

#0 0x0115ebd3 in CompactibleFreeListSpace::block_size(HeapWord const*) const () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#1 0x01160ff9 in CompactibleFreeListSpace::prepare_for_compaction(CompactPoint*) () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#2 0x0123456c in Generation::prepare_for_compaction(CompactPoint*) () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#3 0x01229b2c in GenCollectedHeap::prepare_for_compaction() () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#4 0x0122a7fc in GenMarkSweep::invoke_at_safepoint(int,ReferenceProcessor*,bool) () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#5 0x01186024 in CMSCollector::do_compaction_work(bool) () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#6 0x011859ee in CMSCollector::acquire_control_and_collect(bool,bool) () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#7 0x01185705 in ConcurrentMarkSweepGeneration::collect(bool,bool,unsigned int,bool) () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#8 0x01227f53 in GenCollectedHeap::do_collection(bool,int) () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#9 0x0115c7b5 in GenCollectorPolicy::satisfy_failed_allocation(unsigned int,bool) () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#10 0x0122859c in GenCollectedHeap::satisfy_failed_allocation(unsigned int,bool) () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#11 0x0158a8ce in VM_GenCollectForAllocation::doit() () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#12 0x015987e6 in VM_Operation::evaluate() () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#13 0x01597c93 in VMThread::evaluate_operation(VM_Operation*) () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#14 0x01597f0f in VMThread::loop() () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#15 0x015979f0 in VMThread::run() () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#16 0x0145c24e in java_start(Thread*) () from /usr/java/jdk1.6.0_20/jre/lib/i386/server/libjvm.so

#17 0x00ccd46b in start_thread () from /lib/libpthread.so.0

#18 0x00bc2dbe in clone () from /lib/libc.so.6

It looks like a JVM thread is doing some CMS-related work.
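For anyone repeating the thread-matching step, here is a minimal sketch of how the LWP id in a gdb thread header can be pulled out and compared with the per-thread PID shown by "top -H". It also prints the id in hex, which is the form jstack uses for its nid= field when jstack does work. The class and method names (GdbThreadHeader, lwpOf, toNid) are hypothetical helpers made up for this illustration, and the regular expression assumes the header format shown in the dump above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any JDK or gdb tooling.
final class GdbThreadHeader {
    // Matches gdb thread headers such as "Thread 443 (Thread 0x7e5b90 (LWP 26310)):".
    private static final Pattern HEADER =
            Pattern.compile("^Thread (\\d+) \\(Thread 0x[0-9a-fA-F]+ \\(LWP (\\d+)\\)\\):?$");

    /** Returns the LWP id from a gdb thread header, or -1 if the line is not a header. */
    static int lwpOf(String line) {
        Matcher m = HEADER.matcher(line.trim());
        return m.matches() ? Integer.parseInt(m.group(2)) : -1;
    }

    /** Formats an LWP id in hex, the form used by the nid= field of a jstack dump. */
    static String toNid(int lwp) {
        return "0x" + Integer.toHexString(lwp);
    }

    public static void main(String[] args) {
        String header = "Thread 443 (Thread 0x7e5b90 (LWP 26310)):";
        int lwp = lwpOf(header);
        // The decimal LWP is what "top -H" shows in its PID column.
        System.out.println("LWP " + lwp + " -> nid=" + toNid(lwp));
        // prints: LWP 26310 -> nid=0x66c6
    }
}
```

The decimal value is the one to compare with "top -H"; the hex value only matters when searching a regular jstack thread dump.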

We checked memory usage on the box: there seemed to be enough memory available, and the system was not swapping.

Has anyone run into a situation like this? Does it look like a JVM bug?

UPDATE

I have obtained more information about this problem (it happened on a server that had been running for more than 7 days).

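When the JVM entered the "hung" state, it stayed that way for 2 hours until the server was restarted manually. We obtained a core dump of the process and the GC log. We also tried to get a heap dump, but "jmap" failed. We then tried "jmap -F", but only a 4 MB file had been written before the program aborted with an exception (something about a memory location not being accessible).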

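So far, I think the most interesting information comes from the GC log. It seems that GC logging stopped as well (possibly at the moment the VM thread entered the long loop):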

657501.199: [Full GC (System) 657501.199: [CMS: 400352K->313412K(524288K),2.4024120 secs] 660634K->313412K(878208K),[CMS Perm : 29455K->29320K(68568K)],2.4026470 secs] [Times: user=2.39 sys=0.01,real=2.40 secs]

657513.941: [GC 657513.941: [ParNew: 314624K->13999K(353920K),0.0228180 secs] 628036K->327412K(878208K),0.0230510 secs] [Times: user=0.08 sys=0.00,real=0.02 secs]

657523.772: [GC 657523.772: [ParNew: 328623K->17110K(353920K),0.0244910 secs] 642036K->330523K(878208K),0.0247140 secs] [Times: user=0.08 sys=0.00,real=0.02 secs]

657535.473: [GC 657535.473: [ParNew: 331734K->20282K(353920K),0.0259480 secs] 645147K->333695K(878208K),0.0261670 secs] [Times: user=0.11 sys=0.00,real=0.02 secs]

....

....

688346.765: [GC [1 CMS-initial-mark: 485248K(524288K)] 515694K(878208K),0.0343730 secs] [Times: user=0.03 sys=0.00,real=0.04 secs]

688346.800: [CMS-concurrent-mark-start]

688347.964: [CMS-concurrent-mark: 1.083/1.164 secs] [Times: user=2.52 sys=0.09,real=1.16 secs]

688347.964: [CMS-concurrent-preclean-start]

688347.969: [CMS-concurrent-preclean: 0.004/0.005 secs] [Times: user=0.00 sys=0.01,real=0.01 secs]

688347.969: [CMS-concurrent-abortable-preclean-start]

CMS: abort preclean due to time 688352.986: [CMS-concurrent-abortable-preclean: 2.351/5.017 secs] [Times: user=3.83 sys=0.38,real=5.01 secs]

688352.987: [GC[YG occupancy: 297806 K (353920 K)]688352.987: [Rescan (parallel),0.1815250 secs]688353.169: [weak refs processing,0.0312660 secs] [1 CMS-remark: 485248K(524288K)] 783055K(878208K),0.2131580 secs] [Times: user=1.13 sys=0.00,real=0.22 secs]

688353.201: [CMS-concurrent-sweep-start]

688353.903: [CMS-concurrent-sweep: 0.660/0.702 secs] [Times: user=0.91 sys=0.07,real=0.70 secs]

688353.903: [CMS-concurrent-reset-start]

688353.912: [CMS-concurrent-reset: 0.008/0.008 secs] [Times: user=0.01 sys=0.00,real=0.01 secs]

688354.243: [GC 688354.243: [ParNew: 344928K->30151K(353920K),0.0305020 secs] 681955K->368044K(878208K),0.0308880 secs] [Times: user=0.15 sys=0.00,real=0.03 secs]

....

....

688943.029: [GC 688943.029: [ParNew: 336531K->17143K(353920K),0.0237360 secs] 813250K->494327K(878208K),0.0241260 secs] [Times: user=0.10 sys=0.00,real=0.03 secs]

688950.620: [GC 688950.620: [ParNew: 331767K->22442K(353920K),0.0344110 secs] 808951K->499996K(878208K),0.0347690 secs] [Times: user=0.11 sys=0.00,real=0.04 secs]

688956.596: [GC 688956.596: [ParNew: 337064K->37809K(353920K),0.0488170 secs] 814618K->515896K(878208K),0.0491550 secs] [Times: user=0.18 sys=0.04,real=0.05 secs]

688961.470: [GC 688961.471: [ParNew (promotion failed): 352433K->332183K(353920K),0.1862520 secs]688961.657: [CMS

I suspect the problem is related to the last line in the log (I added some "...." to skip lines that were not interesting).

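The fact that the server stayed hung for 2 hours (probably trying to GC and compact the old generation) seems very strange to me. Also, the GC log stops abruptly at that message and nothing more is printed, probably because the VM thread gets into some kind of infinite loop (or into something that takes 2 hours).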
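To see how full the old generation was just before the promotion failure, the ParNew lines can be read as follows: each one reports young-generation and whole-heap occupancy, so old-generation occupancy is the difference between the two. Below is a rough sketch of that arithmetic. The class GcLogOldGen and its method are hypothetical names introduced for this illustration, and the regular expression assumes the exact log format shown above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; assumes the log format shown in this post.
final class GcLogOldGen {
    // Captures: young before, young after, young capacity, heap before, heap after, heap capacity (all in K).
    private static final Pattern PAR_NEW = Pattern.compile(
            "\\[ParNew: (\\d+)K->(\\d+)K\\((\\d+)K\\),\\s*[\\d.]+ secs\\] (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Returns {oldUsedAfterK, oldCapacityK}, or null if the line is not a plain ParNew record. */
    static long[] oldGenAfter(String line) {
        Matcher m = PAR_NEW.matcher(line);
        if (!m.find()) {
            return null;
        }
        long youngAfter = Long.parseLong(m.group(2));
        long youngCapacity = Long.parseLong(m.group(3));
        long heapAfter = Long.parseLong(m.group(5));
        long heapCapacity = Long.parseLong(m.group(6));
        // Old generation = whole heap minus young generation.
        return new long[] { heapAfter - youngAfter, heapCapacity - youngCapacity };
    }

    public static void main(String[] args) {
        // Last successful young collection before the "promotion failed" line.
        String line = "688956.596: [GC 688956.596: [ParNew: 337064K->37809K(353920K),0.0488170 secs] "
                + "814618K->515896K(878208K),0.0491550 secs]";
        long[] old = oldGenAfter(line);
        System.out.printf(Locale.ROOT, "old gen after GC: %dK of %dK (%.1f%%)%n",
                old[0], old[1], 100.0 * old[0] / old[1]);
    }
}
```

For the last ParNew line before the failure this gives 478087K used out of 524288K, so the old generation was roughly 91% full. My reading, which is an inference from the log rather than something it states, is that the promotion failure made CMS fall back to a stop-the-world compacting collection, which would match the do_compaction_work frame in the gdb stack.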
