本文对Javacore文件常用的模块进行解读
javacore基本内容
Javacore,也可以称为“threaddump”或是“javadump”,它是 Java 提供的一种诊断特性,能够提供一份可读的当前运行的 JVM 中线程使用情况的快照。即在某个特定时刻,JVM 中有哪些线程在运行,每个线程执行到哪一个类,哪一个方法。应用程序如果出现不可恢复的错误或是内存泄露,就会自动触发 Javacore 的生成。而为了性能问题诊断的需要,我们也会主动触发生成 Javacore。在 AIX、Linux、Solaris 环境中,我们通常使用 kill -3 <PID> 产生该进程的 Javacore。IBM Java6 中产生 Javacore 的详细方法可以参考文章 [1]。对于 IBM JVM,AIX 平台上的 Javacore 会被写到 javacore.<date>.<time>.<PID>.<sequence>.txt 中。对于 Oracle JVM,Javacore 被附加到 native_stdout.txt。Javacore 的内容有两列,第一列是“类型”,第二列表示“数据”,如清单 1 所示:
1TISIGINFO Dump Event "user" (00004000) received 1TIDATETIME Date: 2013/12/22 at 23:05:18 1TIFILENAME Javacore filename: /usr/WebSphere/AppServer/profiles/demo_solr/javacore.20131222.230518.7995516.0004.txt
通常情况下,Javacore 中除了线程信息外,还能提供关于操作系统,应用程序环境,线程,程序调用栈,锁,监视器和内存使用等相关信息。为了便于分析,Javacores 的每一段的开头,都会用“----------”和上一信息块区分开来。每一信息块的标题会以“=========”来标识,很容易找到,如清单 2 所示:
NULL ------------------------------------------------------------------------ 0SECTION GPINFO subcomponent dump routine NULL ================================ 2XHOSLEVEL OS Level : AIX 7.1 2XHCPUS Processors - 3XHCPUARCH Architecture : ppc64 3XHNUMCPUS How Many : 8 3XHNUMASUP NUMA is either not supported or has been disabled by user NULL 1XHERROR2 Register dump section only produced for SIGSEGV, SIGILL or SIGFPE. NULL
Javacore 文件中,每行都包含一个标签,这个标签最多由 15 个字符组成。第一位数字表示信息的详细级别,级别越高代表信息越详细。接着的两个字符是段标题的缩写,例如:“CI”表示 Command line interpreter,“CL”表示 Class loader,“LK”表示 Locking,“ST”表示 Storage,“TI”表示 Title,“XE”表示 Execution engine 等。余下的字符表示信息的概述。如下清单 3 所示:
3XMTHREADINFO "Thread-18" J9VMThread:0x00000000308DA900, j9thread_t:0x0000010016C4F2E0, java/lang/Thread:0x000000004136E3E8, state:P, prio=5
虽然不同版本的 JVM 所产生的 Javacore 的格式会稍有不同,但基本都包含下面几个内容:
TITLE 信息块:描述 Javacore 产生的原因,时间以及文件的路径。常见的 Javacore 产生的原因可以参考文章 [2]。最常见的有下面三种:
user:SIGQUIT 信号
gpf:程序一般保护性错误导致系统崩溃
systhrow:JVM 内部抛出的异常
GPINFO 信息块:GPF(一般保护性错误)信息
ENVINFO 信息块:系统运行时的环境和 JVM 参数
MEMINFO 信息块:内存使用情况和垃圾回收情况
LOCKS 信息块:用户监视器(monitor)和系统监视器(monitor)情况
THREADS 信息块:所有 java 线程的状态信息和执行堆栈
CLASSES 信息块:类加载信息
Javacore 中的线程可分为以下几种状态:
死锁(Deadlock)【重点关注】:一般指多个线程调用间,进入相互资源占用,导致一直等待无法释放的情况。
执行中(Runnable)【重点关注】:一般指该线程正在执行状态中,该线程占用了资源,正在处理某个请求,有可能在对某个文件操作,有可能进行数据类型等转换等。
等待资源(Waiting on condition)【重点关注】:等待资源,如果堆栈信息明确是应用代码,则证明该线程正在等待资源,一般是大量读取某资源、且该资源采用了资源锁的情况下,线程进入等待状态。又或者,正在等待其他线程的执行等。
等待监控器检查资源(Waiting on monitor)
暂停(Suspended)
对象等待中(Object.wait())
阻塞(Blocked)【重点关注】:指当前线程执行过程中,所需要的资源长时间等待却一直未能获取到,被容器的线程管理器标识为阻塞状态,可以理解为等待资源超时的线程。这种情况在应用的日志中,一般可以看到 CPU 饥渴,或者某线程已执行了较长时间的信息。
停止(Parked)
通过对 Javacore 数据的分析经验,结合对具体应用代码逻辑的理解,有经验的工程师可以直接通过文本编辑器查看原始 Javacore 文件来分析当前应用程序的运行状态。一般初学者则需要通过一些工具进行更直观的分析。
NATIVEMEMINFO解析
The NATIVEMEMINFO section of a Javadump provides information about the native memory allocated by the Java? runtime environment.
Native memory is memory requested from the operating system using library functions such as malloc() and mmap().
粗略的查看了openj9的源码,这个部分的内存是jvm运行时内存分配时累加的,比如线程Threads,每创建一个线程就加一个固定大小的值,如stacksize。与jvm占用的实际内存RSS是有差别的
0SECTION NATIVEMEMINFO subcomponent dump routine NULL ================================= 0MEMUSER 1MEMUSER JRE: 1,035,485,832 bytes / 13912 allocations //小于使用pmap -x pid命令看到的占用内存,但大于RSS。由于是占用内存,甚至会大于容器的限制内存 1MEMUSER | 2MEMUSER +--VM: 747,440,432 bytes / 10038 allocations 2MEMUSER | | 3MEMUSER | +--Classes: 51,006,872 bytes / 4156 allocations //稍大于MEMINFO区域Class Memory的Total memory 2MEMUSER | | 3MEMUSER | +--Memory Manager (GC): 468,154,296 bytes / 1381 allocations 3MEMUSER | | | 4MEMUSER | | +--Java Heap: 456,192,000 bytes / 1 allocation //约等于-Xmx 3MEMUSER | | | 4MEMUSER | | +--Other: 11,962,296 bytes / 1380 allocations 2MEMUSER | | 3MEMUSER | +--Threads: 36,987,912 bytes / 579 allocations //线程占用内存,每创建一个线程+stacksize/padsize 3MEMUSER | | | 4MEMUSER | | +--Java Stack: 2,340,752 bytes / 104 allocations //java栈 3MEMUSER | | | 4MEMUSER | | +--Native Stack: 33,816,576 bytes / 105 allocations //本地栈,可在Threads部分查看每一个线程的native stack大小(大部分260k左右) 3MEMUSER | | | 4MEMUSER | | +--Other: 830,584 bytes / 370 allocations 2MEMUSER | | 3MEMUSER | +--Trace: 971,456 bytes / 532 allocations 2MEMUSER | | 3MEMUSER | +--JVMTI: 17,776 bytes / 13 allocations 2MEMUSER | | 3MEMUSER | +--JNI: 969,456 bytes / 2379 allocations 2MEMUSER | | 3MEMUSER | +--Port Library: 187,755,888 bytes / 122 allocations 3MEMUSER | | | 4MEMUSER | | +--Unused <32bit allocation regions: 187,736,832 bytes / 1 allocation 3MEMUSER | | | 4MEMUSER | | +--Other: 19,056 bytes / 121 allocations 2MEMUSER | | 3MEMUSER | +--Other: 1,576,776 bytes / 876 allocations 1MEMUSER | 2MEMUSER +--JIT: 284,273,712 bytes / 946 allocations // 可通过参数禁用,禁用后这部分内存释放 2MEMUSER | | 3MEMUSER | +--JIT Code Cache: 268,435,456 bytes / 1 allocation // 与MEMINFO区域JIT Code Cache的Totalmemory相等 2MEMUSER | | 3MEMUSER | +--JIT Data Cache: 6,291,648 bytes / 3 allocations // 稍大于MEMINFO区域的 Total Memory 2MEMUSER | | 3MEMUSER | +--Other: 9,546,608 bytes / 942 allocations 1MEMUSER | 2MEMUSER +--Class Libraries: 3,242,920 bytes / 336 allocations 2MEMUSER | | 3MEMUSER | +--VM Class Libraries: 3,242,920 bytes / 336 allocations 3MEMUSER | | | 4MEMUSER | | +--sun.misc.Unsafe: 2,222,320 bytes / 88 allocations 4MEMUSER | | | | 5MEMUSER | | | +--Direct Byte Buffers: 59,840 bytes / 47 allocations 4MEMUSER | | | | 5MEMUSER | | | +--Other: 2,162,480 bytes / 41 allocations 3MEMUSER | | | 4MEMUSER | | +--Other: 1,020,600 bytes / 248 allocations 1MEMUSER | 2MEMUSER +--Unknown: 528,768 bytes / 2592 allocations NULL NULL ------------------------------------------------------------------------
文档http://www-01.ibm.com/support/docview.wss?mhq=NATIVEMEMINFO%20subcomponent%20dump%20routine&mhsrc=ibmsearch_a&uid=isg3T1025759解释了top , pmap -x pid命令打印的占用内存为何会大于NATIVEMEMINFO的JRE,感兴趣的可以自行阅读。总结就是pmap -x命令查看的占用内存还包括了被操作系统引入的,我们可以通过设置export MALLOC_ARENA_MAX=1来排除干扰。
注意这里说的是占用内存,使用top,pmap -x pid命令打印的实际进程内存RSS实际上与在Paas平台的监控页面看到的使用内存是一致的
MEMINFO
0SECTION MEMINFO subcomponent dump routine NULL ================================= NULL 1STHEAPTYPE Object Memory //堆内存,可与jca.jar工具打开的界面对应 NULL id start end size space/region 1STHEAPSPACE 0x00007FDD9007C0C0 -- -- -- Generational 1STHEAPREGION 0x00007FDD9007C5B0 0x00000000E4D00000 0x00000000E9E00000 0x0000000005100000 Generational/Tenured Region 1STHEAPREGION 0x00007FDD9007C290 0x00000000FCC40000 0x00000000FFAD0000 0x0000000002E90000 Generational/Nursery Region 1STHEAPREGION 0x00007FDD9007C1A0 0x00000000FFAD0000 0x0000000100000000 0x0000000000530000 Generational/Nursery Region NULL 1STHEAPTOTAL Total memory: 139198464 (0x00000000084C0000) 1STHEAPINUSE Total memory in use: 46194984 (0x0000000002C0E128) // 实际使用堆内存 1STHEAPFREE Total memory free: 93003480 (0x00000000058B1ED8) NULL 1STSEGTYPE Internal Memory NULL segment start alloc end type size 1STSEGMENT 0x00007FDD90057AE8 0x00007FDD7943A030 0x00007FDD794F9550 0x00007FDD7953A030 0x00800040 0x0000000000100000 1STSEGMENT 0x00007FDD90057A28 0x00007FDD7963B030 0x00007FDD7973B010 0x00007FDD7973B030 0x00800040 0x0000000000100000 1STSEGMENT 0x00007FDD90057968 0x00007FDD7973C030 0x00007FDD7983C030 0x00007FDD7983C030 0x00800040 0x0000000000100000 1STSEGMENT 0x00007FDD90057728 0x00007FDD79B3C030 0x00007FDD79C3C030 0x00007FDD79C3C030 0x00800040 0x0000000000100000 1STSEGMENT 0x00007FDD900577E8 0x00007FDD791CF030 0x00007FDD792CF030 0x00007FDD792CF030 0x00800040 0x0000000000100000 1STSEGMENT 0x00007FDD900575A8 0x00007FDD8D787030 0x00007FDD8D887030 0x00007FDD8D887030 0x00800040 0x0000000000100000 1STSEGMENT 0x00007FDD90057668 0x00007FDD7A61D030 0x00007FDD7A71D030 0x00007FDD7A71D030 0x00800040 0x0000000000100000 1STSEGMENT 0x00007FDD900574E8 0x00007FDD8D989030 0x00007FDD8DA89030 0x00007FDD8DA89030 0x00800040 0x0000000000100000 NULL 1STSEGTOTAL Total memory: 8388608 (0x0000000000800000) 1STSEGINUSE Total memory in use: 8123648 (0x00000000007BF500) 1STSEGFREE Total memory free: 264960 (0x0000000000040B00) NULL 1STSEGTYPE Class Memory //类文件信息内存,存于非堆 NULL segment start alloc end type size 1STSEGMENT 0x00007FDCFC034DA8 0x0000000002AEABD0 0x0000000002AEB030 0x0000000002AEB030 0x00010040 0x0000000000000460 1STSEGMENT 0x00007FDCFC0342E8 0x00007FDCFC034900 0x00007FDCFC034C40 0x00007FDCFC034D10 0x00020040 0x0000000000000410 1STSEGMENT 0x00007FDD90058318 0x0000000001ABF028 0x0000000001AC7028 0x0000000001AC7028 0x00010040 0x0000000000008000 1STSEGMENT 0x00007FDD90058258 0x00007FDD901C7650 0x00007FDD901E6F18 0x00007FDD901E7650 0x00020040 0x0000000000020000 1STSEGMENT 0x00007FDD90058198 0x0000000001AB6FD8 0x0000000001ABEFD8 0x0000000001ABEFD8 0x00010040 0x0000000000008000 1STSEGMENT 0x00007FDD900580D8 0x00007FDD95715290 0x00007FDD957158A0 0x00007FDD957158A0 0x04020104 0x0000000000000640 1STSEGMENT 0x00007FDD90058018 0x00007FDD95714C70 0x00007FDD95715250 0x00007FDD95715250 0x04200104 0x0000000000000610 1STSEGMENT 0x00007FDD90057F58 0x0000000001AAEF88 0x0000000001AB6F88 0x0000000001AB6F88 0x00010040 0x0000000000008000 1STSEGMENT 0x00007FDD90057E98 0x00007FDD94008030 0x00007FDD94027D18 0x00007FDD94028030 0x00020040 0x0000000000020000 NULL 1STSEGTOTAL Total memory: 47917352 (0x0000000002DB2928) 1STSEGINUSE Total memory in use: 47530528 (0x0000000002D54220) //javaaf的应用大概为30M,dexcloud大概为50m 1STSEGFREE Total memory free: 386824 (0x000000000005E708) NULL 1STSEGTYPE JIT Code Cache NULL segment start alloc end type size 1STSEGMENT 0x00007FDD9009BC08 0x00007FDD7B400000 0x00007FDD7BAAD160 0x00007FDD8B400000 0x00000068 0x0000000010000000 NULL 1STSEGTOTAL Total memory: 268435456 (0x0000000010000000) //JIT 1STSEGINUSE Total memory in use: 7000416 (0x00000000006AD160) 1STSEGFREE Total memory free: 261435040 (0x000000000F952EA0) 1STSEGLIMIT Allocation limit: 268435456 (0x0000000010000000) NULL 1STSEGTYPE JIT Data Cache NULL segment start alloc end type size 1STSEGMENT 0x00007FDD9009C1F8 0x00007FDC83D3C030 0x00007FDC83F3C030 0x00007FDC83F3C030 0x00000048 0x0000000000200000 1STSEGMENT 0x00007FDD9009C138 0x00007FDD79D41030 0x00007FDD79F41030 0x00007FDD79F41030 0x00000048 0x0000000000200000 1STSEGMENT 0x00007FDD9009C078 0x00007FDD8BA79030 0x00007FDD8BC79030 0x00007FDD8BC79030 0x00000048 0x0000000000200000 NULL 1STSEGTOTAL Total memory: 6291456 (0x0000000000600000) 1STSEGINUSE Total memory in use: 6291456 (0x0000000000600000) 1STSEGFREE Total memory free: 0 (0x0000000000000000) 1STSEGLIMIT Allocation limit: 402653184 (0x0000000018000000) NULL
重要参数及FAQ
-
openj9的本机堆栈大小缺省值为400K,可以使用-Xss<size>来调整大小
-
java线程分配的native stack可以在THREADS部分的每一个线程查看,如下0x41000=266240 ,大概260KB
3XMTHREADINFO "pool-65-thread-1" J9VMThread:0x0000000003A58600, omrthread_t:0x00007F1610132850, java/lang/Thread:0x00000000E774EA80, state:P, prio=5 3XMJAVALTHREAD (java/lang/Thread getId:0x102, isDaemon:false) 3XMTHREADINFO1 (native thread ID:0x125, native priority:0x5, native policy:UNKNOWN, vmstate:P, vm thread flags:0x000a0001) 3XMTHREADINFO2 (native stack address range from:0x00007F178569D000, to:0x00007F17856DE000, size:0x41000) 3XMCPUTIME CPU usage total: 0.002599323 secs, current category="Application"