在线服务的内存泄露问题

最近发现一台机器堆内存使用率大约100%,Web server相应极慢,导致客户端多次报告Read Timeout.
跟踪Resin的jvm-default log,发现已经持续告警:

[15-11-13 09:48:57.864] {CoordinatorThread[]-3} AlarmClock slow alarm Alarm[alarm[WebAppExpandDeployGenerator[null]]] 27901ms coordinator-delta 31071ms
[15-11-13 09:48:57.864] {CoordinatorThread[]-3} AlarmClock slow alarm Alarm[alarm[WebAppExpandDeployGenerator[null]]] 26071ms coordinator-delta 31071ms
[15-11-13 09:49:02.865] {CoordinatorThread[]-3} AlarmClock slow alarm Alarm[alarm[HostExpandDeployGenerator[null]]] 17393ms coordinator-delta 22394ms

1.通过jmap -heap查看一下堆内存使用情况,相当震惊:新生代和老年代使用率都接近100%.

[work@lg-b2c-kfs-webchat-app02 ~]$ jmap -heap 27000
Attaching to process ID 27000, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.12-b01

using thread-local object allocation.
Parallel GC with 8 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 4294967296 (4096.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 268435456 (256.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 1040121856 (991.9375MB)
   used     = 978730744 (933.3903732299805MB)
   free     = 61391112 (58.54712677001953MB)
   94.09770002948578% used
From Space:
   capacity = 26935296 (25.6875MB)
   used     = 0 (0.0MB)
   free     = 26935296 (25.6875MB)
   0.0% used
To Space:
   capacity = 22675456 (21.625MB)
   used     = 0 (0.0MB)
   free     = 22675456 (21.625MB)
   0.0% used
PS Old Generation
   capacity = 2863333376 (2730.6875MB)
   used     = 2863333288 (2730.68741607666MB)
   free     = 88 (8.392333984375E-5MB)
   99.99999692665895% used
PS Perm Generation
   capacity = 78249984 (74.625MB)
   used     = 78185600 (74.5635986328125MB)
   free     = 64384 (0.0614013671875MB)
   99.91772011097153% used

2.通过jmap -histo查看一下堆中的对象信息,发现业务对象Terminal和HttpChatContext实例数已经达到两百多万,正常情况下应该维持在几百到几千。更奇怪的是居然有大量的java.lang.ThreadLocal$ThreadLocalMap实例对象。

[work@lg-b2c-kfs-webchat-app02 log]$ jmap -histo 27000|less

 num     #instances         #bytes  class name
----------------------------------------------
   1:      11926272      920079152  [C
   2:      22979344      735339008  java.lang.ThreadLocal$ThreadLocalMap$Entry
   3:      22978102      551474448  java.util.concurrent.locks.ReentrantReadWriteLock$Sync$HoldCounter
   4:      11795560      377457920  java.lang.String
   5:           259      167801200  [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
   6:       4237670      135605440  java.util.concurrent.ConcurrentHashMap$HashEntry
   7:       1611294      128911920  [Ljava.util.HashMap$Entry;
   8:       2092579      100443792  com.xiaomi.kfs.chat.core.domain.Terminal
   9:       2350789       94031560  java.util.LinkedHashMap$Entry
  10:       2092583       83703320  java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync
  11:         28486       59237952  [I
  12:       1584956       50718592  java.util.HashMap$Entry
  13:       2092580       50221920  java.util.concurrent.locks.ReentrantReadWriteLock
  14:       2077571       49861704  com.xiaomi.kfs.chat.service.support.HttpChatContext
  15:        783207       43859592  java.util.LinkedHashMap
  16:        827778       39733344  java.util.HashMap
  17:         13066       34390136  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
  18:       2092583       33481328  java.util.concurrent.locks.ReentrantReadWriteLock$Sync$ThreadLocalHoldCounter
  19:       2092583       33481328  java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock
  20:       2092580       33481280  java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock
  21:       1564801       25036816  java.util.HashMap$EntrySet
  22:        163768       22473320  [B

3.可以确认出现了内存泄露,Terminal和HttpChatContext存在未被回收的现象。使用工具来观察一下这两种类对象是通过哪些路径和GC Roots关联上的。我们通过dump快照和JProfiler来进行进一步分析:

jmap -dump:format=b,live,file=/home/work/tmp/terminalMemoryLeak.hprof 27000

4.通过JProfiler,我们可以非常清晰地看到各类对象的数量和内存占用情况。
AllObjects.png

5.选择com.xiaomi.kfs.chat.core.domain.Terminal类,双击创建一个新的Object Set,选择Cumulated incoming references,可以看到Terminal对象的引用链。
TerminalReference.png

6.经过分析确认,ChatDomainManager中维护了Terminal的哈希表,而某些Terminal未被正确地从ConcurrentHashMap中删除,因此出现数量过大的情况。而Terminal对象中使用了ReentrantReadWriteLock,ReentrantReadWriteLock是通过ThreadLocal来记录当前线程所持有的读锁数量的。因此才有了上面诸多有关ReentrantReadWriteLock的对象实例。

更多内容请参考我的博客 清欢的个人博客

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值