jvm问题系列-----记压测系统宕机排查过程

背景

收银台系统压测过程中率先挂掉,紧急排查原因

步骤一 查看服务状态

服务已挂掉,但cpu一直标高

可以看到服务已挂掉,但cpu一直标高,查看进程号

[root@localhost pay]# ps -ef|grep java
root     19153     1 41 831 ?       3-04:39:01 java -Dserver.port=9090 -jar -Xmx256m -Xms256m -XX:+PrintGCDetails -Xloggc:/apps/pay/log/paycenter-cashier-gc.log -Ddbconfigpath=/apps/dbconfig/ -javaagent:/apps/pp-agent/pinpoint-bootstrap-1.7.1.jar -Dpinpoint.agentId=cashier-35-157 -Dpinpoint.applicationName=cashier -Djava.rmi.server.hostname=10.151.35.157 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9011 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false /apps/pay/cashier.jar

查看占有cpu线程

[root@localhost pay]# top -Hp 19153
top - 23:40:04 up 581 days, 20:25,  4 users,  load average: 2.11, 2.78, 2.92
Threads:  38 total,   1 running,  37 sleeping,   0 stopped,   0 zombie
%Cpu(s): 89.7 us,  2.5 sy,  0.0 ni,  7.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  3881308 total,  1031412 free,  1035152 used,  1814744 buff/cache
KiB Swap:  4063228 total,  3817716 free,   245512 used.  2301624 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
19159 root      20   0 3077448 804164   8096 S 79.7 20.7   2022:43 java
19158 root      20   0 3077448 804164   8096 S 78.3 20.7   2017:21 java
19160 root      20   0 3077448 804164   8096 R 18.0 20.7 459:14.59 java
19153 root      20   0 3077448 804164   8096 S  0.0 20.7   0:00.00 java
19157 root      20   0 3077448 804164   8096 S  0.0 20.7   0:08.66 java
19161 root      20   0 3077448 804164   8096 S  0.0 20.7   0:02.98 java
19162 root      20   0 3077448 804164   8096 S  0.0 20.7   0:01.37 java
19164 root      20   0 3077448 804164   8096 S  0.0 20.7   0:00.00 java

查看对应占cpu线程

[root@localhost pay]# printf "%x\n" 19159
4ad7
[root@localhost pay]# jstack 19153 |grep 4ad7
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007ff294022000 nid=0x4ad7 runnable
[root@localhost pay]# printf "%x\n" 19158
4ad6
[root@localhost pay]# jstack 19153 |grep 4ad6
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007ff294020000 nid=0x4ad6 runnable

发现占cpu的都是gc线程,那看来问题不在线程逻辑问题,继续排查

查看gc日志

启动脚本已经制定打印gc日志-XX:+PrintGCDetails -Xloggc:/apps/pay/log/paycenter-cashier-gc.log ,下载gc日志,利用gc分析工具https://gceasy.io/查看日志,但发现文件太大,在线工具查看有问题,改用离线工具gcviewer分析

在这里插入图片描述
下载gcviewer工具包

java -Xmx2048m -Xms2048m -jar gcviewer_1.3.4.jar paycenter-cashier-gc.log

查看gc日志情况
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述
参数备注说明

Summary
-------

- Footprint:
     o Maximal amount of memory allocated  分配的最大内存
- Max heap after conc GC:
     o Max used heap after concurrent gc.  conc gc之后最大使用的堆内存
- Max tenured after conc GC:
     o Max used tenured heap after concurrent gc (followed by % of max tenured / % of max total heap).   在conc gc之后老年代堆空间的最大使用量
- Max heap after full GC:
     o Max used heap after full gc. Indicates max live object size and can help to determine heap size.  

     在全gc之后堆空间的最大使用量,指示存活对象的最大数,可以帮助确定堆size的大小
- Freed Memory:
     o Total amount of memory that has been freed  已经被释放的总的内存
- Freed Mem/Min:
     o Amount of memory that has been freed per minute  每分钟被释放的内存的总数
- Total Time:
     o Time data was collected for (only if timestamp was present in log)  垃圾收集器工作的时间
- Acc Pauses:
     o Sum of all pauses due to GC  因为垃圾回收而停顿的统计数
- Throughput:
     o Time percentage the application was NOT busy with GC 垃圾回收时应用程序没有停顿卡死所占的时间比
- Full GC Performance:
     o Performance of full collections. Note that all collections
       that include a collection of the tenured generation or
       are marked with "Full GC" are considered Full GC.  所有垃圾回收的性能,包括老年代回收或者被标记为全GC被认为是全GC的
- GC Performance:
     o Performance of minor collections. These are collections
       that are not full according to the definition above.  minor类型回收的性能,这个集合不全是前面的minor类型

Memory
------

- Total heap (usage / alloc max):
     o Max memory usage / allocation in total heap (the last is the 
       same as "footprint" in Summary)                   最大内存使用量/分配的总堆内存
- Tenured heap (usage / alloc max):
     o Max memory usage / allocation in tenured space    最大内存使用量/分配的老年代空间
- Young heap (usage / alloc max):
     o Max memory usage / allocation in young space      最大内存使用量/分配的年轻代空间
- Perm heap (usage / alloc max):
     o Max memory usage / allocation in perm space       最大内存使用量/分配的持久代空间
- Max tenured after conc GC:
     o see in "summary" section                          
- Avg tenured after conc GC:
     o average size of tenured heap after concurrent collection    conc gc之后年老代堆平均大小
- Max heap after conc GC:
     o see in "summary" section
- Avg heap after conc GC:
     o average size of concurrent heap after concurrent collection  conc gc之后年并发堆平均大小
- Max heap after full GC:
     o see in "summary" section  
- Avg after full GC:
     o The average heap memory consumption after a full collection  全gc之后平均堆内存的消耗量
- Avg after GC:
     o The average heap memory consumption after a minor collection minor之后平均堆内存的消耗量
- Freed Memory:
     o Total amount of memory that has been freed  已经被释放的内存总量
- Freed by full GC:
     o Amount of memory that has been freed by full collections   通过全GC被释放的内存总量
- Freed by GC:
     o Amount of memory that has been freed by minor collections  通过minor GC被释放的内存总量
- Avg freed full GC:
     o Average amount of memory that has been freed by full
       collections   通过全GC被释放的内存平均量
- Avg freed GC:
     o Average amount of memory that has been freed by minor
       collections   通过minor GC被释放的内存平均量

- Slope GC:
     o Average of the slope of the regression lines for the memory
       consumption after minor collections in between full collections.
       That is, if you have two full collections and many minor
       collections in between, GCViewer will calculate the slope for
       the minor collections up to the first full collection, then the
       slope of the minor collections between the first and the second
       full collection. Then it will compute a weighted average (each
       slope wil be weighted with the number of measuring points it was
       computed with).



Pause
-----

- Acc Pauses:
     o Sum of all pauses due to any kind of GC       所有因为GC引起的中断时间的总计 
- Number of Pauses:
     o Count of all pauses due to any kind of GC     所有因为GC引起的中断次数的总计
- Avg Pause:
     o Average length of a GC pause of any kind      平均一次中断的时间
- Min / max Pause:
     o Shortest /longest pause of any kind           最短/最长中断的时间
- Avg pause interval:
     o avg interval between two pauses of any kind   两次中断之间平均的时间间隔
- Min / max pause interval:
     o Min / max interval between two pauses of any kind  最短/最长的两次中断之间平均的时间间隔

- Acc full GC:
     o Sum of all pauses due to full collections     所有因为全GC引起的中断时间的总计 
- Number of full GC pauses:
     o Count of all pauses due to full collections   所有因为全GC引起的中断次数的总计                   
- Avg full GC:
     o Average length of a full GC pause       全GC平均一次中断的时间
- Min / max full GC pause:
     o Shortest / longest full GC pause        全GC最短/最长中断的时间
     
- Acc GC:
     o Sum of all pauses due to minor collections   所有因为minor GC引起的中断时间的总计 
- Number of GC pauses:
     o Count of all pauses due to minor collections 所有因为minor GC引起的中断次数的总计 
- Avg GC:
     o Average length of a minor collection pause   minor GC平均一次中断的时间
- Min / max GC pause:
     o Shortest / longest minor GC pause            minor GC最短/最长中断的时间

查看GC日志Tenured heap(老年代)使用100%,young heap使用99.7%,gc次数:
full gc 569238次,minigc 4882次

因为我们压测过程中jvm配置为-Xmx256m -Xms256m,都不大,现在查看下堆栈信息

下载通过Eclipse Memory Analyze对应内存堆栈文件hprof

在这里插入图片描述
查看堆内存在的对象情况:
在这里插入图片描述

发现第一个对象占了60%
在这里插入图片描述

在这里插入图片描述
点开这个对象查看,发现这个对象中记录的都是应用中的地址信息,查看对应这个对象的引用信息:list objects->with outgoing references
在这里插入图片描述
找到了对应的map,然后继续分析他对应的引用路径path to GC Roots -》exclude all phantom/weak/soft etc.references
在这里插入图片描述
在这里插入图片描述
最后定位到org.springframework.aop.aspectj.annotation.AnnotationAwareAspectJAutoProxyCreator类的advisedBeans属性上
在这里插入图片描述

到这一步我们发现advisedBeans这个属性里面存了我们的代码列的很多url地址,对应代码查看

  redirectUrl = "/pc/index";
                redirectUrl += (BizOrderTypeEnum.POWER.equals(cashierOrderCacheDTO.getBizType()) ? "#/powercashier"
                    : "#/commoncashier");
                redirectUrl += "?token=" + token;
            }
            return "redirect:" + redirectUrl;

这里根据下单返回的参数的不同,每次生成新的token,然后根据通过redirect进行重定向,具体原因是:
因为redirectUrl是带参数,动态的链接,redirect会构建ViewResolver#create一个RedirectView,执行applyLifecycleMethods去initBean的时候,会使用了一个名为AdvisedBeans的ConcurrentHashMap去保存,这个map以redirect:redirectUrl为key,又没有做limit容量限制,而edirectUrl是一个动态的链接,所以每次redirect都会生成put一条数据到map中。最简单的修改方法就是HttpServletResponse的sendRedirect方法替换一下。

问题细节代码解释可以参考这篇博客https://blog.csdn.net/qq_26093341/article/details/85696884,里面作者做了代码层面的说明。

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值