最近线上的一个tomcat总是无缘无故崩溃,tomcat日志里又没有报任何错误,于是调出JVM的崩溃日志查看,一般崩溃日志在启动目录下,比如tomcat的bin目录下,但是如果你用自己写的脚本启动的tomcat,则这个日志可能就在你放脚本的目录下。
#
# A fatal error has been detected by the Java Runtime Environment:#
# SIGSEGV (0xb) at pc=0xfe572980, pid=2383, tid=108 // SIGSEGV (0xb)这种问题一般是调用JNI或者使用了JNI的第三方库造成的,我们可以查'UNIX信号表",可以查到这个是因为非法访问了内存或者是访问了无效指针造成的。
#
# JRE version: Java(TM) SE Runtime Environment (7.0_80-b15) (build 1.7.0_80-b15)
# Java VM: Java HotSpot(TM) Server VM (24.80-b11 mixed mode solaris-sparc )
# Problematic frame:
# V [libjvm.so+0x572980] void Par_MarkFromRootsClosure::scan_oops_in_oop(HeapWord*)+0x8 // 这里V表示一种frame type,这里看到是垃圾回收时出现错误,这里表示是执行libjvm造成的
#
# Core dump written. Default location: /glodon/deploy/core or core.2383 //表示崩溃的core文件的位置,实际上没有找到,由于是用这个位置的脚本启动的,所以core文件生在这里
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
--------------- T H R E A D ---------------
Current thread (0x00501000): GCTaskThread [stack: 0x2e600000,0x2e680000] [id=108] // 这里说明是垃圾回收线程执行时崩溃
siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000c
Registers:
G1=0x0000000f G2=0x00000001 G3=0xffffffff G4=0x00000000
G5=0x00000010 G6=0x00000000 G7=0xfa0b5200 Y=0x00000000
O0=0x00000022 O1=0x00000379 O2=0x00000000 O3=0x00000022
O4=0x00000021 O5=0x00000000 O6=0x2e67f880 O7=0xfe572cac
L0=0x00003ffe L1=0x00003fff L2=0x004e5df0 L3=0x00000001
L4=0x004e5dac L5=0x0000037a L6=0x00000000 L7=0x00504358
I0=0x2e67fa84 I1=0x53a5a710 I2=0x00000005 I3=0x006b2d38
I4=0x006b2d38 I5=0x00000000 I6=0x2e67f928 I7=0xfe572968
PC=0xfe572980 nPC=0xfe572984
Top of Stack: (sp=0x2e67f880)
0x2e67f880: 00003ffe 00003fff 004e5df0 00000001
0x2e67f890: 004e5dac 0000037a 00000000 00504358
0x2e67f8a0: 2e67fa84 53a5a710 00000005 006b2d38
0x2e67f8b0: 006b2d38 00000000 2e67f928 fe572968
0x2e67f8c0: fee63d24 0013c800 ff2352ec 00000000
此处省略N行......................................
I3=0x006b2d38 is an unknown value
I4=0x006b2d38 is an unknown value
I5=0x00000000 is an unknown value
I6=0x2e67f928 is an unknown value
I7=0xfe572968: JVM_GetClassDeclaredMethods+0x2998ec in /usr/jdk/instances/jdk1.7.0/jre/lib/sparc/server/libjvm.so at 0xfe000000
Stack: [0x2e600000,0x2e680000], sp=0x2e67f880, free space=510k //当前栈还剩余510K,够用
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x572980] void Par_MarkFromRootsClosure::scan_oops_in_oop(HeapWord*)+0x8
V [libjvm.so+0x572970] bool Par_MarkFromRootsClosure::do_bit(unsigned)+0xb0
V [libjvm.so+0x3ac368] bool BitMap::iterate(BitMapClosure*,unsigned,unsigned)+0x84
V [libjvm.so+0x5663f8] void CMSConcMarkingTask::do_scan_and_mark(int,CompactibleFreeListSpace*)+0x2b0
V [libjvm.so+0x565e5c] void CMSConcMarkingTask::work(unsigned)+0xc4
V [libjvm.so+0xbb7d38] void YieldingFlexibleGangWorker::loop()+0xd0
V [libjvm.so+0xa04a00] java_start+0x338
--------------- P R O C E S S ---------------
Java Threads: ( => current thread )
0x02852c00 JavaThread "qtp28852019-947" [_thread_blocked, id=1053, stack(0x24a80000,0x24b00000)]
0x052a5000 JavaThread "qtp28852019-940" [_thread_blocked, id=1046, stack(0x17c80000,0x17d00000)]
0x04e23800 JavaThread "qtp28852019-931" [_thread_blocked, id=1037, stack(0x17e80000,0x17f00000)]
0x02035c00 JavaThread "http-bio-8080-exec-111" daemon [_thread_in_vm, id=1036, stack(0x18180000,0x18200000)]
0x04e16c00 JavaThread "http-bio-8080-exec-110" daemon [_thread_blocked, id=1035, stack(0x18580000,0x18600000)]
此处省略N行......................................
0x009cbc00 JavaThread "Finalizer" daemon [_thread_blocked, id=113, stack(0x2df80000,0x2e000000)]
0x009ca400 JavaThread "Reference Handler" daemon [_thread_blocked, id=112, stack(0x2e080000,0x2e100000)]
0x00028c00 JavaThread "main" [_thread_in_native, id=2, stack(0xfde80000,0xfdf00000)]
Other Threads:
0x009c7400 VMThread [stack: 0x2e180000,0x2e200000] [id=111]
0x009f3800 WatcherThread [stack: 0x2d980000,0x2da00000] [id=119]
=>0x00501000 (exited) GCTaskThread [stack: 0x2e600000,0x2e680000] [id=108]
VM state:not at safepoint (normal execution) // 不在安全点,说明这个时候JVM执行的代码不在for循环结束或者方法结束的地方,也恰好说明调用的是JNI的东西,因为JVM调用JNI的时候不会等待其执行到安全点,关于安全点的问题,这里不讨论,大家可以查其他资料。
VM Mutex/Monitor currently owned by a thread: None
Heap //崩溃瞬间还是回收回来了,因此基本排除是因为JVM内存崩溃引起的问题。
par new generation total 235968K, used 21333K [0x36400000, 0x46400000, 0x46400000)
eden space 209792K, 8% used [0x36400000, 0x376654c0, 0x430e0000)
from space 26176K, 9% used [0x430e0000, 0x433501a0, 0x44a70000)
to space 26176K, 0% used [0x44a70000, 0x44a70000, 0x46400000)
concurrent mark-sweep generation total 474160K, used 436253K [0x46400000, 0x6330c000, 0xb6400000) // cms垃圾回收的老生代情况
concurrent-mark-sweep perm gen total 262144K, used 141342K [0xb6400000, 0xc6400000, 0xf6400000) // cms永久代情况
Card table byte_map: [0x35c00000,0x36202000] byte_map_base: 0x35a4e000
Polling page: 0xff3f6000
Code Cache [0xfbc00000, 0xfd4f8000, 0xfdc00000)
total_blobs=7341 nmethods=7026 adapters=247 free_code_cache=7377Kb largest_free_block=7400896
Compilation events (10 events):
Event: 14607.071 Thread 0x009ee400 8453 org.apache.coyote.http11.InternalInputBuffer::fill (178 bytes)
Event: 14607.077 Thread 0x009ee400 nmethod 8453 0xfc1ef488 code [0xfc1ef5a0, 0xfc1ef7c4]
Event: 14612.564 Thread 0x009ecc00 8454 % gboat2.base.core.web.MetadataSupportStrategy::getOperaList @ 146 (230 bytes)
Event: 14612.599 Thread 0x009ecc00 nmethod 8454% 0xfd4e05c8 code [0xfd4e07c0, 0xfd4e11d8]
Event: 14612.745 Thread 0x009ee400 8455 gboat2.base.core.web.MetadataSupportStrategy::getOperaList (230 bytes)
Event: 14612.797 Thread 0x009ee400 nmethod 8455 0xfd4ee208 code [0xfd4ee620, 0xfd4ef700]
Event: 14626.459 Thread 0x009ecc00 8456 com.sun.tools.javac.tree.TreeInfo::hasConstructors (34 bytes)
Event: 14626.462 Thread 0x009ecc00 nmethod 8456 0xfd0fec08 code [0xfd0fed00, 0xfd0fee10]
Event: 14643.412 Thread 0x009ee400 8457 com.itextpdf.text.pdf.PdfReader::readArray (71 bytes)
Event: 14643.424 Thread 0x009ee400 nmethod 8457 0xfd4dde88 code [0xfd4de000, 0xfd4de504]
GC Heap History (10 events): // 最后10次垃圾回收事件
Event: 14626.265 GC heap before // 这个是事件发生的时间戳,虚拟机(tomcat)启动以来的秒数
{Heap before GC invocations=803 (full 3): //这里的full 3表示从tomcat开始到现在一共进行了3次full类型的垃圾回收;“Heap before GC invocations=803”这个表示在当前垃圾回收之前,进行了多少次垃圾回收工作,这里表示进行了803次的垃圾回收
par new generation total 235968K, used 215681K [0x36400000, 0x46400000, 0x46400000) //新生代使用约256M(不到256,因为有一个survivor不能用)
eden space 209792K, 100% used [0x36400000, 0x430e0000, 0x430e0000)//约209M
from space 26176K, 22% used [0x44a70000, 0x450305d8, 0x46400000) //约26M,这个地址现在是from,下个阶段将变成to
to space 26176K, 0% used [0x430e0000, 0x430e0000, 0x44a70000) //约26M
concurrent mark-sweep generation total 474160K, used 434490K [0x46400000, 0x6330c000, 0xb6400000) //老生代并发清理,总共约463M,用去424M,剩余约39M
concurrent-mark-sweep perm gen total 262144K, used 141295K [0xb6400000, 0xc6400000, 0xf6400000) //永久代,总共约256M,用掉约141M,剩余约115M
Event: 14626.282 GC heap after // 和before比较起来,垃圾回收消耗时间27毫秒
Heap after GC invocations=804 (full 3): // 回收
par new generation total 235968K, used 3183K [0x36400000, 0x46400000, 0x46400000)
eden space 209792K, 0% used [0x36400000, 0x36400000, 0x430e0000)
from space 26176K, 12% used [0x430e0000, 0x433fbd30, 0x44a70000)
to space 26176K, 0% used [0x44a70000, 0x44a70000, 0x46400000)
concurrent mark-sweep generation total 474160K, used 435308K [0x46400000, 0x6330c000, 0xb6400000) // 剩余38M
concurrent-mark-sweep perm gen total 262144K, used 141295K [0xb6400000, 0xc6400000, 0xf6400000) // 不变
}
此处省略N行......................................
Event: 14692.050 GC heap before //相隔25秒
{Heap before GC invocations=807 (full 3): // 满了
par new generation total 235968K, used 214960K [0x36400000, 0x46400000, 0x46400000)
eden space 209792K, 99% used [0x36400000, 0x430dfbe8, 0x430e0000)
from space 26176K, 19% used [0x44a70000, 0x44f7c738, 0x46400000)
to space 26176K, 0% used [0x430e0000, 0x430e0000, 0x44a70000)
concurrent mark-sweep generation total 474160K, used 435958K [0x46400000, 0x6330c000, 0xb6400000) // 不变
concurrent-mark-sweep perm gen total 262144K, used 141342K [0xb6400000, 0xc6400000, 0xf6400000) // 增加27K
Event: 14692.067 GC heap after
Heap after GC invocations=808 (full 3): // 回收
par new generation total 235968K, used 2496K [0x36400000, 0x46400000, 0x46400000)
eden space 209792K, 0% used [0x36400000, 0x36400000, 0x430e0000)
from space 26176K, 9% used [0x430e0000, 0x433501a0, 0x44a70000)
to space 26176K, 0% used [0x44a70000, 0x44a70000, 0x46400000)
concurrent mark-sweep generation total 474160K, used 436253K [0x46400000, 0x6330c000, 0xb6400000) // 增加295K,剩余37M
concurrent-mark-sweep perm gen total 262144K, used 141342K [0xb6400000, 0xc6400000, 0xf6400000) // 不变120802K
}
Deoptimization events (10 events): // 非优化事件
Event: 14023.027 Thread 0x0382c400 Uncommon trap: reason=null_check action=make_not_entrant pc=0xfd3f6188
此处省略N行...................................... method=org.apache.struts2.jasper.compiler.TagLibraryInfoImpl.parseTLD(Lorg/apache/struts2/jasper/JspCompilationContext;Ljava/lang/String;Ljava/io/InputStream;Ljava/net/URL;)V @ 573
Internal exceptions (10 events): // 内部异常
Event: 14691.724 Thread 0x02035c00 Threw 0x41055a00 at /HUDSON/workspace/7u-2-build-solaris-sparc/jdk7u80/2329/hotspot/src/share/vm/prims/jvm.cpp:1319
Event: 14691.724 Thread 0x02035c00 Threw 0x410577a0 at /HUDSON/workspace/7u-2-build-solaris-sparc/jdk7u80/2329/hotspot/src/share/vm/prims/jvm.cpp:1319
此处省略N行......................................
Events (10 events):
Event: 14692.045 Executing VM operation: GenCollectForAllocation
Event: 14692.068 Executing VM operation: GenCollectForAllocation done
Event: 14692.068 Executing VM operation: CMS_Initial_Mark // 初始标记
Event: 14692.092 Executing VM operation: CMS_Initial_Mark done //
Event: 14692.093 Executing VM operation: RevokeBias
Event: 14692.096 Executing VM operation: RevokeBias done
Event: 14692.097 Executing VM operation: RevokeBias
Event: 14692.098 Executing VM operation: RevokeBias done
Event: 14692.099 Executing VM operation: RevokeBias
Event: 14692.101 Executing VM operation: RevokeBias done
Dynamic libraries:
0x00010000 /usr/jdk/instances/jdk1.7.0/jre/bin/java
0xff370000 /usr/lib/libthread.so.1
此处省略N行......................................
0x36220000 /usr/jdk/instances/jdk1.7.0/jre/lib/sparc/libjpeg.so
VM Arguments:
jvm_args: -Djava.util.logging.config.file=/GBP/apache-tomcat-7.0.57/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms512m -Xmx2048m -Xmn256m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:PermSize=256m -XX:MaxPermSize=1024m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UnlockDiagnosticVMOptions -XX:+UnsyncloadClass -XX:ErrorFile=/var/log/java/java_error%p.log -Djava.endorsed.dirs=/GBP/apache-tomcat-7.0.57/endorsed -Dcatalina.base=/GBP/apache-tomcat-7.0.57 -Dcatalina.home=/GBP/apache-tomcat-7.0.57 -Djava.io.tmpdir=/GBP/apache-tomcat-7.0.57/temp
java_command: org.apache.catalina.startup.Bootstrap start
Launcher Type: SUN_STANDARD
Environment Variables:
JAVA_HOME=/usr/java
JRE_HOME=/usr/java/jre
CLASSPATH=:/usr/java/lib:/usr/java/bin:/usr/java/jre/lib
PATH=/usr/sbin:/usr/bin:/usr/java/bin:/usr/java/jre/bin:/GBP/apache-tomcat-7.0.57/bin:/usr/sfw/bin:/opt/csw/bin:/usr/ccs/bin:/usr/openwin/bin:/usr/dt/bin:/usr/platform/sun4v/sbin:/opt/sun/bin:/opt/SUNWexplo/bin:/opt/SUNWsneep/bin:/opt/CTEact/bin
LD_LIBRARY_PATH=/usr/lib:/usr/local/lib:/usr/openwin/lib
SHELL=/usr/bin/bash
Signal Handlers:
SIGSEGV: [libjvm.so+0xba1e58], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGBUS: [libjvm.so+0xba1e58], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGFPE: [libjvm.so+0x2257f4], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGPIPE: [libjvm.so+0x2257f4], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGXFSZ: [libjvm.so+0x2257f4], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGILL: [libjvm.so+0x2257f4], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
SIGUSR1: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGUSR2: SIG_DFL, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGQUIT: [libjvm.so+0xa08414], sa_mask[0]=0xffbffeff, sa_flags=0x00000004
SIGHUP: SIG_IGN, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGINT: SIG_IGN, sa_mask[0]=0x00000000, sa_flags=0x00000000
SIGTERM: [libjvm.so+0xa08414], sa_mask[0]=0xffbffeff, sa_flags=0x00000004
SIG39: [libjvm.so+0xa0d058], sa_mask[0]=0x00000000, sa_flags=0x00000008
SIG40: [libjvm.so+0x2257f4], sa_mask[0]=0xffbffeff, sa_flags=0x0000000c
--------------- S Y S T E M ---------------
OS: Oracle Solaris 10 8/11 s10s_u10wos_17b SPARC
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
Assembled 23 August 2011
uname:SunOS 5.10 Generic_147440-01 sun4v
(T2 libthread)
rlimit: STACK 8192k, CORE infinity, NOFILE 65536, AS infinity
load average:1.18 1.11 1.05
CPU:total 256 v9, popc, vis1, vis2, vis3, blk_init, cbcond, sun4v, niagara_plus
Memory: 8k page, physical 267911168k(133769880k free)
vm_info: Java HotSpot(TM) Server VM (24.80-b11) for solaris-sparc JRE (1.7.0_80-b15), built on Apr 10 2015 18:48:54 by "java_re" with Sun Studio 12u1
time: Thu Mar 3 16:18:34 2016
elapsed time: 14704 seconds
上oracle官方查看并没有人提出这个bug,因此判断有可能是因为java调用JNI库造成的内存错误,盘了下项目中使用的技术,并且查看了tomcat打出的业务日志,发现在调用jmagick库的时候程序一直报错,这是个上传图片打水印的组件,调用了动态库造成的问题。刚好这个水印功能不需要,则关闭了这个功能,不再代用打水印的API,观察了一周,系统再也没有崩溃过。
当然了问题的排除过程,并没有这么顺利,由于之前对JVM的崩溃日志接触不多,查了不少资料。这里强调一句,网上的JVM崩溃场景不同,每个人遇到的问题也不一样,注重思路就行。比如我这个文章能提醒大家调用JNI可能会出现JVM崩溃的问题,大家按照这个思路排查了也许能解决自己的问题,这种问题一般排除的时候需要好多天,但是真正动手结局也许一分钟都用不到。