JAVA线上环境线上问题排查步骤

今日线上机器发生服务卡顿现象,根据运维人员给的报警记录知道是那台机器出现的故障

解决思路

  • 是否内存溢出
  • 是否线程跑满无法接收新请求

根据这2个思路进行进一步的排查

内存溢出排查

内存溢出的排查步骤
第一步:进程是否存活
在这里插入图片描述
通过命令看到,tomcat的进程存在,则说明进程是存活的,如果是内存溢出导致的宕机的话,可以在

cat /var/log/messages

可以看到机器是在什么时候溢出的,这样的话,我们只能从dump文件来分析内存的大对象信息了

第二步:查看JVM内存是否正常

通过命令查看

jmap -heap pid
$ jmap -heap 2723
Attaching to process ID 2723, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.102-b14

using thread-local object allocation.
Garbage-First (G1) GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 4294967296 (4096.0MB)
   NewSize                  = 1363144 (1.2999954223632812MB)
   MaxNewSize               = 2575302656 (2456.0MB)
   OldSize                  = 5452592 (5.1999969482421875MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 2097152 (2.0MB)

Heap Usage:
G1 Heap:
   regions  = 2048
   capacity = 4294967296 (4096.0MB)
   used     = 3687839368 (3516.997688293457MB)
   free     = 607127928 (579.002311706543MB)
   85.86420137435198% used
G1 Young Generation:
Eden Space:
   regions  = 885
   capacity = 2006974464 (1914.0MB)
   used     = 1855979520 (1770.0MB)
   free     = 150994944 (144.0MB)
   92.47648902821317% used
Survivor Space:
   regions  = 15
   capacity = 31457280 (30.0MB)
   used     = 31457280 (30.0MB)
   free     = 0 (0.0MB)
   100.0% used
G1 Old Generation:
   regions  = 907
   capacity = 2256535552 (2152.0MB)
   used     = 1798305416 (1714.997688293457MB)
   free     = 458230136 (437.00231170654297MB)
   79.69320112887812% used

200785 interned Strings occupying 21207448 bytes.

从堆信息来看,内存充足,说明不是内存的问题

第三步:查看对象信息

如果是内存出了问题,我们一定可以查到是那个类的实例太多导致了我们系统的问题
同样是使用堆分析命令

jmap -histo:live  2723 >aaa.log

由于堆内存对象信息特别多因此建议输出在文本中查看,一般看前50行足够,如果没有我们自己写的类,那一般可以肯定问题不是出现在内存上了

 num     #instances         #bytes  class name
----------------------------------------------
   1:       1377882      721392080  [C
   2:       1140506      100364528  java.lang.reflect.Method
   3:         87973       52810296  [B
   4:       1368121       32834904  java.lang.String
   5:        980746       31383872  java.util.HashMap$Node
   6:        652719       31330512  org.aspectj.weaver.reflect.ShadowMatchImpl
   7:        762876       30515040  java.util.LinkedHashMap$Entry
   8:        769210       24614720  java.util.concurrent.ConcurrentHashMap$Node
   9:        260191       23543120  [Ljava.util.HashMap$Node;
  10:        924804       23440400  [Ljava.lang.String;
  11:        652719       20887008  org.aspectj.weaver.patterns.ExposedState
  12:        826190       18004968  [Z
  13:        293162       17204016  [Ljava.lang.Object;
  14:        652719       14069760  [Lorg.aspectj.weaver.ast.Var;
  15:        268954       12909792  java.util.HashMap
  16:        175403        9822568  java.util.LinkedHashMap
  17:         69779        9434712  [[Ljava.lang.String;
  18:        220558        8822320  java.lang.ref.SoftReference
  19:        298086        7154064  java.util.ArrayList
  20:        204250        6536000  java.lang.ref.WeakReference
  21:         52528        6363696  java.lang.Class
  22:           390        6136272  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  23:        141958        5649848  [I
  24:         59065        4725200  java.lang.reflect.Constructor
  25:        227110        4592400  [Ljava.lang.Class;
  26:        110306        4412240  java.util.WeakHashMap$Entry
  27:         55838        3837608  [Ljava.util.Hashtable$Entry;
  28:         57099        3654336  org.hibernate.mapping.Column
  29:        225098        3601568  java.lang.Integer
  30:         74124        3557952  org.apache.commons.collections.ReferenceMap$SoftRef
  31:         57097        3197432  org.hibernate.mapping.Property
  32:         57099        2740752  org.hibernate.mapping.SimpleValue
  33:        107268        2574432  java.beans.MethodRef
  34:         53218        2554464  java.util.Hashtable
  35:         45524        2549344  java.beans.MethodDescriptor
  36:         28447        2503336  org.apache.catalina.session.StandardSession
  37:         55435        2217400  org.hibernate.annotations.common.reflection.java.JavaXProperty
  38:         53884        2155360  org.hibernate.tuple.StandardProperty
  39:         28081        2021832  java.lang.reflect.Field
  40:         61638        1972416  org.apache.commons.collections.SequencedHashMap$Entry
  41:         29913        1914432  java.util.concurrent.ConcurrentHashMap
  42:         55406        1772992  org.hibernate.annotations.common.reflection.java.JavaXSimpleType
  43:         24472        1761984  sun.reflect.DelegatingClassLoader
  44:         72766        1746384  sun.reflect.annotation.AnnotationInvocationHandler
  45:        103277        1652432  java.util.HashSet
  46:         49103        1571296  java.util.Vector
  47:         22653        1449792  java.net.URL
  48:         43082        1378624  java.util.Hashtable$Entry
  49:         57097        1370328  org.hibernate.property.BasicPropertyAccessor$BasicGetter
  50:         57097        1370328  org.hibernate.property.BasicPropertyAccessor$BasicSetter

这么多实例中没有看到我们自己写的类,当然也有可能是框架故障导致,不过此类问题一般较少

第四步:排查线程信息

jstack pid

线程的信息也比较多,一般可以输出到文本中查看

jstack 2723 >a.log

下载文件之后我们看到,java.lang.Thread.State: WAITING (parking)线程处于WAITING状态

"http-nio-8190-exec-69" #126 daemon prio=5 os_prio=0 tid=0x00007fe9800de800 nid=0xbc3 waiting on condition [0x00007fe8e9402000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006fc703a10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:104)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:32)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"http-nio-8190-exec-68" #125 daemon prio=5 os_prio=0 tid=0x00007fe99502e800 nid=0xbc2 waiting on condition [0x00007fe8e9483000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000006fc703a10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:104)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:32)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

明确告知我们锁对象a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject
明确告知了ThreadPoolExecutor中出现了锁,如果能看到我们自己写的代码,则可以直接断定是哪里出了问题,如果不是,需要我们排查系统中所有使用到线程池的地方

统计一下进程中有多少线程处于等待状态

grep WAITING |wc -l
104

发现有104个线程处于等待,一共只有151个线程

150个是tomcat线程 在server.xml中配置
1个是守护线程

稍后我会丰富一下WAITING 等内容的含义

第五步:排查代码

经过代码排查发现,我们系统中有一段代码

ExecutorService savePool = Executors.newSingleThreadExecutor();

这段代码中的savePool是一个成员变量,且受Spring管理
在使用这段代码时候

savePool.execute(new Runnable() {
    @Override
    public void run() {
        try {
        	IgnoreLoginController.ignoreLogin();
            // 2016-01-28 by cary 来自于ECM的销售出库单审核成功后,要将状态反馈给上游ECM
            saveOtherInfo(orders, operate);
        } catch (Exception e) {
            log.error(e);
        } finally {
            IgnoreLoginController.cancelIgnore();
            orders.clear();
        }
    }
});

很明显这段代码没有执行线程池的shutdown()方法

shutdown() 的含义是,停止接收新任务,但是线程池中已有的任务人可以继续执行
shutdownNow() 的含义是,停止接收新任务,现有线程池中的任务也停止执行

因此只需要在这段代码最后加上 savePool.shutdown();即可

注意:jmap jstack命令只能在root用户,或者系统项目的用户自己可以执行,非本用户启动的项目可能出现以下错误

[yianju@prd-server ~]$jmap -heap 20846
Attaching to process ID 20846, please wait...
Error attaching to process: sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
sun.jvm.hotspot.debugger.DebuggerException: sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.execute(LinuxDebuggerLocal.java:163)
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach(LinuxDebuggerLocal.java:278)
        at sun.jvm.hotspot.HotSpotAgent.attachDebugger(HotSpotAgent.java:671)
        at sun.jvm.hotspot.HotSpotAgent.setupDebuggerLinux(HotSpotAgent.java:611)
        at sun.jvm.hotspot.HotSpotAgent.setupDebugger(HotSpotAgent.java:337)
        at sun.jvm.hotspot.HotSpotAgent.go(HotSpotAgent.java:304)
        at sun.jvm.hotspot.HotSpotAgent.attach(HotSpotAgent.java:140)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:185)
        at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
        at sun.jvm.hotspot.tools.HeapSummary.main(HeapSummary.java:49)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.tools.jmap.JMap.runTool(JMap.java:201)
        at sun.tools.jmap.JMap.main(JMap.java:130)
Caused by: sun.jvm.hotspot.debugger.DebuggerException: cannot open binary file
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.attach0(Native Method)
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.access$100(LinuxDebuggerLocal.java:62)
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$1AttachTask.doit(LinuxDebuggerLocal.java:269)
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal$LinuxDebuggerLocalWorkerThread.run(LinuxDebuggerLocal.java:138)
  • 3
    点赞
  • 9
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值