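Tracing the problem
We found that the Java service at the customer site went down about once a day. The logs showed that it was running out of memory. Here is an excerpt: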
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuilder.append(StringBuilder.java:132)
at com.ztesoft.zsmart.zcm.agent.bss30.scripts.common.bigdata.util.HttpUtil.doGet(HttpUtil.java:52)
at com.ztesoft.zsmart.zcm.agent.bss30.scripts.common.bigdata.util.HttpUtil.doGet(HttpUtil.java:29)
at com.ztesoft.zsmart.zcm.agent.bss30.scripts.common.bigdata.util.HttpUtil$doGet.call(Unknown Source)
at com.ztesoft.zsmart.zcm.agent.bss30.scripts.common.bigdata.jmx.BdmYarnResourceMgrPerformance.doKpi(BdmYarnResourceMgrPerformance.groovy:72)
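The trace fails inside `StringBuilder.append`, which calls `Arrays.copyOf` when the builder is full. A larger backing array is allocated and the old contents are copied into it, so for a moment both arrays must fit in the heap. This is also why the frame that throws `OutOfMemoryError` is not necessarily the one that leaks. It is simply the allocation that failed once the heap was already nearly full, and the heap dump analysis later in this post is still needed to find the real culprit. The minimal sketch below (the class name `BuilderGrowthDemo` is illustrative) records each capacity change while appending one character at a time. The exact growth policy is a JDK implementation detail, so treat the printed sizes as indicative only.

```java
import java.util.ArrayList;
import java.util.List;

public class BuilderGrowthDemo {

    /** Appends {@code chars} characters one at a time and records every capacity the builder reaches. */
    static List<Integer> capacitySteps(int chars) {
        StringBuilder sb = new StringBuilder(); // default initial capacity is 16
        List<Integer> steps = new ArrayList<>();
        steps.add(sb.capacity());
        for (int i = 0; i < chars; i++) {
            sb.append('x');
            int cap = sb.capacity();
            // A capacity change means a new backing array was allocated and the old contents copied.
            if (cap != steps.get(steps.size() - 1)) {
                steps.add(cap);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        List<Integer> steps = capacitySteps(10_000);
        System.out.println("capacity after each resize: " + steps);
        System.out.println("resizes (array copies): " + (steps.size() - 1));
    }
}
```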
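Problem overview
First, look at how the Java service's memory usage is changing.
ps -ef|grep java # find the pid of the java process
jmap -heap 34674 # show the process's heap usage: the GC algorithm in use, the heap configuration, and how much of each generation is used
Output on the server: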
[zte@A5-401-NF5280M4-2017-201 ~]$ jmap -heap 34674
Attaching to process ID 34674, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.75-b04
using thread-local object allocation.
Mark Sweep Compact GC
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 536870912 (512.0MB)
NewSize = 1310720 (1.25MB)
MaxNewSize = 17592186044415 MB
OldSize = 5439488 (5.1875MB)
NewRatio = 2
SurvivorRatio = 8
PermSize = 21757952 (20.75MB)
MaxPermSize = 85983232 (82.0MB)
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 134545408 (128.3125MB)
used = 60550920 (57.74585723876953MB)
free = 73994488 (70.56664276123047MB)
45.004077731140406% used
Eden Space:
capacity = 119603200 (114.0625MB)
used = 53110136 (50.64977264404297MB)
free = 66493064 (63.41272735595703MB)
44.40528012628425% used
From Space:
capacity = 14942208 (14.25MB)
used = 7440784 (7.0960845947265625MB)
free = 7501424 (7.1539154052734375MB)
49.79708487527412% used
To Space:
capacity = 14942208 (14.25MB)
used = 0 (0.0MB)
free = 14942208 (14.25MB)
0.0% used
tenured generation:
capacity = 298254336 (284.4375MB)
used = 257619424 (245.68502807617188MB)
free = 40634912 (38.752471923828125MB)
86.37575146602396% used
Perm Generation:
capacity = 50921472 (48.5625MB)
used = 50913208 (48.55461883544922MB)
free = 8264 (0.00788116455078125MB)
99.98377108972812% used
22382 interned Strings occupying 2211032 bytes.
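The same per-generation figures can also be read from inside a running JVM through the standard `java.lang.management` API, which is handy if the service should log them periodically itself. The sketch below is a minimal example, assuming it runs in the JVM being observed (unlike `jmap`, it cannot attach to another pid). `jmap`'s "capacity" corresponds roughly to a pool's committed size, and "% used" is used divided by capacity. For the tenured generation above that is 257619424 / 298254336 ≈ 86.38%. The class name `HeapPoolReport` is hypothetical, and pool names vary with the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapPoolReport {

    /** Percentage of the committed size that is in use, or -1 when nothing is committed. */
    static double percentUsed(MemoryUsage u) {
        return u.getCommitted() == 0 ? -1 : 100.0 * u.getUsed() / u.getCommitted();
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap: used=%,d committed=%,d max=%,d%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax());

        // One line per memory pool (Eden, survivor, old generation, and non-heap pools).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue; // the pool is no longer valid
            }
            String kind = pool.getType() == MemoryType.HEAP ? "heap" : "non-heap";
            System.out.printf("%-9s %-30s used=%,d committed=%,d (%.2f%% used)%n",
                    kind, pool.getName(), u.getUsed(), u.getCommitted(), percentUsed(u));
        }
    }
}
```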
jstat -gcutil 34674 5s 30 # sample every 5 s, 30 times (jstat is the JVM statistics monitoring tool)
Syntax:
jstat [ generalOption | outputOptions vmid [interval[s|ms] [count]] ]
Output on the server:
[zte@A5-401-NF5280M4-2017-201 ~]$ jstat -gcutil 34674 5s 30
S0 S1 E O P YGC YGCT FGC FGCT GCT
50.98 0.00 86.45 87.26 99.98 786 7.247 8 1.341 8.588
50.98 0.00 86.77 87.26 99.98 786 7.247 8 1.341 8.588
50.98 0.00 87.03 87.26 99.98 786 7.247 8 1.341 8.588
50.98 0.00 87.29 87.26 99.98 786 7.247 8 1.341 8.588
50.98 0.00 87.56 87.26 99.98 786 7.247 8 1.341 8.588
50.98 0.00 87.78 87.26 99.98 786 7.247 8 1.341 8.588
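In this sample the young and full GC counts stay at 786 and 8 while Eden fills up, and the old generation sits at 87.26%. If a hook inside the service is preferred over running `jstat` by hand, the `GarbageCollectorMXBean`s expose the equivalent of the YGC/YGCT and FGC/FGCT columns. Below is a minimal sketch, assuming it runs inside the target JVM. `GcSampler` is a hypothetical name. Collector names depend on the GC in use; with the serial collector reported by `jmap` above, they are typically `Copy` and `MarkSweepCompact`.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcSampler {

    /** Snapshot of {collection count, accumulated collection time in ms} per collector. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if the JVM does not report it for this collector.
            result.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 6;
        long intervalMs = args.length > 1 ? Long.parseLong(args[1]) : 5000;
        for (int i = 0; i < samples; i++) {
            StringBuilder line = new StringBuilder();
            for (Map.Entry<String, long[]> e : snapshot().entrySet()) {
                line.append(String.format("%s: count=%d time=%.3fs  ",
                        e.getKey(), e.getValue()[0], e.getValue()[1] / 1000.0));
            }
            System.out.println(line.toString().trim());
            if (i < samples - 1) {
                Thread.sleep(intervalMs); // like jstat's interval argument
            }
        }
    }
}
```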
Troubleshooting
Dump the heap
jmap -dump:live,format=b,file=/home/zte/heap/heap-dump.bin 34674
jmap syntax:
jmap [option] pid
jmap [option] executable core
jmap [option] [server-id@]remote-hostname-or-ip
Output on the server:
[zte@A5-401-NF5280M4-2017-201 heap]$ jmap -dump:live,format=b,file=/home/zte/heap/heap-dump.bin 34674
Dumping heap to /home/zte/heap/heap-dump.bin ...
Heap dump file created
[zte@A5-401-NF5280M4-2017-201 heap]$ ls
agent2090408_bin.txt agent2090409.txt heap-dump2090409_02.bin heap-dump2090409.bin heap-dump.bin
[zte@A5-401-NF5280M4-2017-201 heap]$
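A dump can also be triggered from inside the JVM with `com.sun.management.HotSpotDiagnosticMXBean.dumpHeap(outputFile, live)`. Its `live` flag offers the same reachable-objects-only choice as `jmap -dump:live`. This is a HotSpot-specific API, so the sketch below assumes a HotSpot JVM. `HeapDumper` and the default file name are hypothetical. Newer JDKs reject file names that do not end in `.hprof` and refuse to overwrite an existing file, so the example generates a fresh `.hprof` name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {

    /** Writes a heap dump of the current JVM; live=true keeps only reachable objects. */
    static File dump(String path, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, live); // may fail if the file already exists
        return new File(path);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default name; a unique .hprof file in the working directory.
        String path = args.length > 0 ? args[0] : "heap-dump-" + System.currentTimeMillis() + ".hprof";
        File f = dump(path, true);
        System.out.println("wrote " + f.getAbsolutePath() + " (" + f.length() + " bytes)");
    }
}
```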
The dump is written in binary format (format=b), so give the file a .bin extension. I first used a .txt extension, and Memory Analyzer (MAT) could not parse the file.
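Analyzing with MAT
Edit MemoryAnalyzer.ini and set the heap parameter to -Xmx2048m (2 GB) so that MAT has enough memory to open the dump. This assumes your machine has more than 2 GB of RAM.
Open the dump via File -> Open Heap Dump.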
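Drill down step by step, focusing on the entries with a high Percentage (their share of the heap). These are usually the points where objects accumulate.
This shows that HashMap objects are piling up under the JmsServiceFactory class. The exact cause still has to be confirmed in the code; the goal here is only to locate where the memory is being exhausted.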
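The actual code of `JmsServiceFactory` is not shown here, so the following is only a hypothetical illustration of one common way a `HashMap` accumulates. In this pattern, a long-lived (for example static) map used as a cache or registry gains a new entry per request and never removes any. The sketch contrasts that with a size-bounded LRU map built on `LinkedHashMap.removeEldestEntry`. Whether bounding, explicit removal, or something else is the right fix has to be decided from the real code. `CacheLeakDemo` and the key format are invented for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class CacheLeakDemo {

    // Unbounded: every key ever added stays reachable for the lifetime of the class.
    static final Map<String, Object> UNBOUNDED = new HashMap<>();

    /** Access-ordered LinkedHashMap that evicts its least recently used entry past maxEntries (not thread-safe). */
    static <K, V> Map<K, V> lruCache(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Object> bounded = lruCache(1_000);
        for (int i = 0; i < 100_000; i++) {
            String key = "request-" + i; // hypothetical per-request key that is never reused
            UNBOUNDED.put(key, new byte[16]);
            bounded.put(key, new byte[16]);
        }
        System.out.println("unbounded size: " + UNBOUNDED.size()); // 100000
        System.out.println("bounded size:   " + bounded.size());   // 1000
    }
}
```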