记一次线上OOM排查及调优过程

通过异常邮件,发现线上定时任务服务产生了OOM: java.lang.OutOfMemoryError: Java heap space

发现异常后,先重启线上服务,然后把未执行的定时任务恢复。

查看gc信息,发现项目启动没多久,平均200秒一次FullGC
 

> jstat -gcutil 6
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  25.85  66.86  16.12  94.99  92.71    932   11.408    34   24.540   35.947
  
> jcmd 6 VM.uptime
6:
6877.407 s

查看内存信息,发现年轻代只有230m,而且占用非常高
 

> jcmd 6 GC.heap_info
6:
 def new generation   total 235968K, used 170108K [0x00000000d0000000, 0x00000000e0000000, 0x00000000e0000000)
  eden space 209792K,  77% used [0x00000000d0000000, 0x00000000d9f83a68, 0x00000000dcce0000)
  from space 26176K,  25% used [0x00000000de670000, 0x00000000ded0b668, 0x00000000e0000000)
  to   space 26176K,   0% used [0x00000000dcce0000, 0x00000000dcce0000, 0x00000000de670000)
 tenured generation   total 524288K, used 84496K [0x00000000e0000000, 0x0000000100000000, 0x0000000100000000)
   the space 524288K,  16% used [0x00000000e0000000, 0x00000000e5284118, 0x00000000e5284200, 0x0000000100000000)
 Metaspace       used 126246K, capacity 132522K, committed 132904K, reserved 1167360K
  class space    used 14759K, capacity 15854K, committed 15920K, reserved 1048576K

查看VM参数,发现MaxHeapSize只有805306368byte/1024/1024=768M
很明显,堆内存设置小了

jcmd 6 VM.flags
6:
-XX:CICompilerCount=2 -XX:InitialHeapSize=50331648 -XX:MaxHeapSize=805306368 -XX:MaxNewSize=268435456 -XX:MinHeapDeltaBytes=196608 -XX:NewSize=16777216 -XX:OldSize=33554432 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops

在预发布环境进行调参
调整启动参数,增加堆内存到2G,增加GC日志,设置OOM时导出堆内存

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=... -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -verbose:gc -Xloggc:... -Xms2G -Xmx2G

项目启动后,查看内存信息, 年轻代614M,应该够用

> jcmd 6 GC.run
6:
Command executed successfully
> jcmd 6 GC.heap_info
6:
 def new generation   total 629120K, used 13644K [0x0000000080000000, 0x00000000aaaa0000, 0x00000000aaaa0000)
  eden space 559232K,   2% used [0x0000000080000000, 0x0000000080d530d0, 0x00000000a2220000)
  from space 69888K,   0% used [0x00000000a6660000, 0x00000000a6660000, 0x00000000aaaa0000)
  to   space 69888K,   0% used [0x00000000a2220000, 0x00000000a2220000, 0x00000000a6660000)
 tenured generation   total 1398144K, used 49997K [0x00000000aaaa0000, 0x0000000100000000, 0x0000000100000000)
   the space 1398144K,   3% used [0x00000000aaaa0000, 0x00000000adb737e0, 0x00000000adb73800, 0x0000000100000000)
 Metaspace       used 86540K, capacity 90342K, committed 90496K, reserved 1128448K
  class space    used 10635K, capacity 11343K, committed 11392K, reserved 1048576K

项目启动后,查看gc信息,发现有4次FullGC,除去上面手动触发的FullGC,推测项目启动过程中发生了3次FullGC,很奇怪

> jstat -gcutil 6
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00  13.14   3.58  95.62  93.36      5    0.142     4    0.563    0.705

查看GC日志, 3次FullGC发生的时间2022-03-28T16:50:11,2022-03-28T16:50:15,2022-03-28T16:50:22,而且每次都是Metadata空间不足引发的

fgrep Full gc.log
2022-03-28T16:50:11.280+0800: 3.307: [Full GC (Metadata GC Threshold) 2022-03-28T16:50:11.280+0800: 3.307: [Tenured: 0K->8152K(1398144K), 0.0883040 secs] 536878K->8152K(2027264K), [Metaspace: 20515K->20515K(1069056K)], 0.0883757 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]
2022-03-28T16:50:15.447+0800: 7.474: [Full GC (Metadata GC Threshold) 2022-03-28T16:50:15.447+0800: 7.474: [Tenured: 8152K->16623K(1398144K), 0.0895848 secs] 291984K->16623K(2027264K), [Metaspace: 34153K->34153K(1081344K)], 0.0896720 secs] [Times: user=0.09 sys=0.00, real=0.08 secs]
2022-03-28T16:50:22.017+0800: 14.044: [Full GC (Metadata GC Threshold) 2022-03-28T16:50:22.017+0800: 14.044: [Tenured: 16623K->33960K(1398144K), 0.1533989 secs] 510804K->33960K(2027264K), [Metaspace: 56634K->56634K(1101824K)], 0.1534892 secs] [Times: user=0.15 sys=0.01, real=0.15 secs]
2022-03-28T16:53:18.692+0800: 190.719: [Full GC (System.gc()) 2022-03-28T16:53:18.692+0800: 190.719: [Tenured: 33960K->49997K(1398144K), 0.2315362 secs] 477467K->49997K(2027264K), [Metaspace: 86536K->86536K(1128448K)], 0.2316727 secs] [Times: user=0.21 sys=0.00, real=0.23 secs]

查看日志,发现3次FullGC的时间在项目启动过程中发生的,符合上述推断

2022-03-28 16:50:11.575  INFO SpringBootBanner Line:62 [||||] -  :: Spring Boot ::        (v2.2.5.RELEASE)
.
.
.
2022-03-28 16:50:29.669  INFO Application Line:61 [||||] - Started JobAdminApplication in 20.655 seconds (JVM running for 21.696)

3次FullGC都是Metadata不足引发的,说明初始Metadata太小了,发生了3次扩容
那么应该增加初始Metaspace大小
查看内存,发现Metaspace大小为89m,因此设置初始Metaspace大小为90M,可以避免启动过程中的FullGC

> jcmd 6 GC.heap_info
6:
 def new generation   total 629120K, used 256799K [0x0000000080000000, 0x00000000aaaa0000, 0x00000000aaaa0000)
  eden space 559232K,  45% used [0x0000000080000000, 0x000000008fac7fc8, 0x00000000a2220000)
  from space 69888K,   0% used [0x00000000a6660000, 0x00000000a6660000, 0x00000000aaaa0000)
  to   space 69888K,   0% used [0x00000000a2220000, 0x00000000a2220000, 0x00000000a6660000)
 tenured generation   total 1398144K, used 49997K [0x00000000aaaa0000, 0x0000000100000000, 0x0000000100000000)
   the space 1398144K,   3% used [0x00000000aaaa0000, 0x00000000adb737e0, 0x00000000adb73800, 0x0000000100000000)
 Metaspace       used 87689K, capacity 91522K, committed 91776K, reserved 1130496K
  class space    used 10638K, capacity 11347K, committed 11392K, reserved 1048576K

增加参数 -XX:MetaspaceSize=90m,重启服务,查看gc和内存
启动过程中没有发生FullGC

> jstat -gcutil 6
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  57.06  83.22   0.49  95.57  93.31      7    0.400     0    0.000    0.400
> jcmd 6 VM.uptime
6:
48.879 s
> jcmd 6 GC.heap_info
6:
 def new generation   total 629120K, used 545143K [0x0000000080000000, 0x00000000aaaa0000, 0x00000000aaaa0000)
  eden space 559232K,  90% used [0x0000000080000000, 0x000000009ed6ba70, 0x00000000a2220000)
  from space 69888K,  57% used [0x00000000a6660000, 0x00000000a8d52478, 0x00000000aaaa0000)
  to   space 69888K,   0% used [0x00000000a2220000, 0x00000000a2220000, 0x00000000a6660000)
 tenured generation   total 1398144K, used 6802K [0x00000000aaaa0000, 0x0000000100000000, 0x0000000100000000)
   the space 1398144K,   0% used [0x00000000aaaa0000, 0x00000000ab144820, 0x00000000ab144a00, 0x0000000100000000)
 Metaspace       used 85629K, capacity 89366K, committed 89728K, reserved 1128448K
  class space    used 10596K, capacity 11273K, committed 11392K, reserved 1048576K


至此,此次OOM和调优完成

  • 0
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值