Jvm线上调优实战(1)

侦探问题根源


  • 通过top指令发现,当前5511的线程cup和内存的使用率过高:
top :
 5511 root      20   0 16.841g 6.088g   5584 S  47.0 39.2   4011:41 java
 9550 root      20   0 2516200  67892   2436 S   0.7  0.4 204:20.40 java
21271 root      20   0 11.579g 0.987g   5056 S   0.7  6.4  46:29.88 java
   13 root      20   0       0      0      0 S   0.3  0.0   0:35.21 ksoftirqd/1
 2128 root      20   0  9.833g 1.273g   6656 S   0.3  8.2  33:48.35 java
29464 root      20   0 11.578g 1.030g   5448 S   0.3  6.6  31:42.67 java
31721 root      20   0  157744   2264   1544 R   0.3  0.0   0:00.32 top 

  • 查询当前java进程详情,利用 jinfo pid 5511
Attaching to process ID 5511, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.151-b12
Java System Properties:

java.runtime.name = Java(TM) SE Runtime Environment
java.vm.version = 25.151-b12
sun.boot.library.path = /usr/local/java/jre/lib/amd64
java.vendor.url = http://java.oracle.com/
java.vm.vendor = Oracle Corporation
path.separator = :
file.encoding.pkg = sun.io
java.vm.name = Java HotSpot(TM) 64-Bit Server VM
sun.os.patch.level = unknown
sun.java.launcher = SUN_STANDARD
user.country = US
user.dir = /mnt/app/bdcenter-base/bdcenter-service-base-1.0-SNAPSHOT
java.vm.specification.name = Java Virtual Machine Specification
PID = 5511
java.runtime.version = 1.8.0_151-b12
java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
os.arch = amd64
java.endorsed.dirs = /usr/local/java/jre/lib/endorsed
line.separator = 

java.io.tmpdir = /tmp
java.vm.specification.vendor = Oracle Corporation
os.name = Linux
io.netty.noKeySetOptimization = true
sun.jnu.encoding = UTF-8
java.library.path = /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
spring.beaninfo.ignore = true
sun.nio.ch.bugLevel = 
java.specification.name = Java Platform API Specification
java.class.version = 52.0
sun.management.compiler = HotSpot 64-Bit Tiered Compilers
os.version = 3.10.0-693.2.2.el7.x86_64
user.home = /root
user.timezone = Asia/Shanghai
catalina.useNaming = false
java.awt.printerjob = sun.print.PSPrinterJob
file.encoding = UTF-8
@appId = bdcenter-service-base
java.specification.version = 1.8
io.netty.recycler.maxCapacityPerThread = 0
catalina.home = /tmp/tomcat.7050606280722271893.9025
user.name = root
java.class.path = bdcenter-service-base-1.0-SNAPSHOT.jar
java.vm.specification.version = 1.8
sun.arch.data.model = 64
sun.java.command = bdcenter-service-base-1.0-SNAPSHOT.jar --spring.profiles.active=prod 5
java.home = /usr/local/java/jre
user.language = en
java.specification.vendor = Oracle Corporation
io.netty.noUnsafe = true
awt.toolkit = sun.awt.X11.XToolkit
java.vm.info = mixed mode
java.version = 1.8.0_151
java.ext.dirs = /usr/local/java/jre/lib/ext:/usr/java/packages/lib/ext
sun.boot.class.path = /usr/local/java/jre/lib/resources.jar:/usr/local/java/jre/lib/rt.jar:/usr/local/java/jre/lib/sunrsasign.jar:/usr/local/java/jre/lib/jsse.jar:/usr/local/java/jre/lib/jce.jar:/usr/local/java/jre/lib/charsets.jar:/usr/local/java/jre/lib/jfr.jar:/usr/local/java/jre/classes
java.awt.headless = true
java.vendor = Oracle Corporation
catalina.base = /tmp/tomcat.7050606280722271893.9025
file.separator = /
java.vendor.url.bug = http://bugreport.sun.com/bugreport/
sun.io.unicode.encoding = UnicodeLittle
sun.cpu.endian = little
sun.cpu.isalist = 

VM Flags:
Non-default VM flags: -XX:+AggressiveOpts -XX:CICompilerCount=4 -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:InitialHeapSize=6442450944 -XX:MaxDirectMemorySize=1073741824 -XX:MaxHeapSize=6442450944 -XX:MaxNewSize=697892864 -XX:MaxTenuringThreshold=6 -XX:MinHeapDeltaBytes=196608 -XX:NewSize=697892864 -XX:OldPLABSize=16 -XX:OldSize=5744558080 -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:ThreadStackSize=256 -XX:+UseBiasedLocking -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseFastAccessorMethods -XX:+UseParNewGC 
Command line:  -Xms6G -Xmx6G -XX:MaxPermSize=256M -XX:MaxDirectMemorySize=1G -Xss256k -XX:+AggressiveOpts -XX:+UseBiasedLocking -XX:+UseFastAccessorMethods -XX:+DisableExplicitGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 -XX:CMSInitiatingOccupancyFraction=75 -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCTimeStamps -XX:+PrintGCDetails

Command line 可以看出JVM 用户输入参,对整个jvm的信息进行消息的描述


  • 查看当前gc情况: jstat gcacuse 5511
[root@bigdata-service-1:/root]# jstat -gccause 5511 1000
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC                 
  0.00   8.86  61.50  72.16  95.05  91.62   4577  326.410     8    8.855  335.265 Allocation Failure   No GC               
  0.00   8.86  64.31  72.16  95.05  91.62   4577  326.410     8    8.855  335.265 Allocation Failure   No GC               
  0.00   8.86  65.04  72.16  95.05  91.62   4577  326.410     8    8.855  335.265 Allocation Failure   No GC               
  0.00   8.86  65.17  72.16  95.05  91.62   4577  326.410     8    8.855  335.265 Allocation Failure   No GC               
  0.00   8.86  65.26  72.16  95.05  91.62   4577  326.410     8    8.855  335.265 Allocation Failure   No GC               
  0.00   8.86  65.26  72.16  95.05  91.62   4577  326.410     8    8.855  335.265 Allocation Failure   No GC               
  0.00   8.86  67.80  72.16  95.05  91.62   4577  326.410     8    8.855  335.265 Allocation Failure   No GC 

发现新生代频繁的进行gc操【YGC:gc 次数, YGCT: gc时间】,老年的几乎没有gc【FGC:full gc次数,FGCT :full gc时间】,可能原因是新生代的设计过小,导致新生代频繁gc


  • 查看当前jvm 内存分配: jmap -heap 5511
[root@bigdata-service-1:/root]# jmap -heap 5511
Attaching to process ID 5511, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.151-b12

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 6442450944 (6144.0MB)
   NewSize                  = 697892864 (665.5625MB)
   MaxNewSize               = 697892864 (665.5625MB)
   OldSize                  = 5744558080 (5478.4375MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 628162560 (599.0625MB)
   used     = 210724080 (200.96214294433594MB)
   free     = 417438480 (398.10035705566406MB)
   33.54610628178795% used
Eden Space:
   capacity = 558432256 (532.5625MB)
   used     = 203572032 (194.14141845703125MB)
   free     = 354860224 (338.42108154296875MB)
   36.45420367694519% used
From Space:
   capacity = 69730304 (66.5MB)
   used     = 7152048 (6.8207244873046875MB)
   free     = 62578256 (59.67927551269531MB)
   10.256728552337876% used
To Space:
   capacity = 69730304 (66.5MB)
   used     = 0 (0.0MB)
   free     = 69730304 (66.5MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 5744558080 (5478.4375MB)
   used     = 4145940376 (3953.876853942871MB)
   free     = 1598617704 (1524.560646057129MB)
   72.17161561016022% used

32720 interned Strings occupying 4008568 bytes.

从JVM的分配可以看出,新生代分配:capacity = 628162560 (599.0625MB),Eden Space:capacity = 558432256 (532.5625MB),老年代分配:capacity = 5744558080 (5478.4375MB)。从数据分析出,新生代设计过小,导致新生代频繁gc。其次,新生代设计过小,导致大对象无法分配,直接分配到老年代。导致老年代空间占用过大,从而占用整个系统内存空间。可能导致其他服务内存不足。


  • 线上手动GC,查看GC后的JVM详情信息

[root@bigdata-service-1:/root]# jmap -histo:live 5511
日志过大.....

[root@bigdata-service-1:/root]# jstat -gc 5511 1000
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT   
68096.0 68096.0 7588.1  0.0   545344.0 394801.0 5609920.0  4050654.1  87952.0 83637.7 11392.0 10437.9   4592  327.582   9      8.855  336.437
68096.0 68096.0  0.0    0.0   545344.0   0.0    5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 46918.8  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 46922.8  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 46937.1  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 46937.1  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 46937.1  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 64190.0  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 64190.0  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 64190.0  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 64190.0  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 64190.0  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 82152.3  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 82152.3  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 82152.3  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 82156.4  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 82156.4  5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 101737.5 5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 101737.5 5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 103581.5 5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 103581.5 5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 103581.5 5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 124812.4 5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 124812.4 5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578
68096.0 68096.0  0.0    0.0   545344.0 124814.5 5609920.0  4043257.0  87952.0 83637.7 11392.0 10437.9   4592  327.582   9     12.997  340.578

手动强制Full gc 以后发现,内存并没有被释放,说明当前系统可能存在内存泄漏,导致对象不能被清理释放【OC:老年代空间,OU :老年代使用空间(KB)】。
存在内存泄漏,主要是当前的对象引用没有被释放导致,比如僵尸线程的存在。说明当前系统可能存在大量的僵尸线程。进而进行进一步的排查


  • 查看当前系统线程情况:jstack。可以看到当前线程的详细情况
[root@bigdata-service-1:/root]# jstack 5511 | head -50
2018-07-20 11:59:27
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.151-b12 mixed mode):

"Keep-Alive-Timer" #19437 daemon prio=8 os_prio=0 tid=0x00007f8d749ce000 nid=0x4995 waiting on condition [0x00007f8bb7b2b000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at sun.net.www.http.KeepAliveCache.run(KeepAliveCache.java:172)
	at java.lang.Thread.run(Thread.java:748)

"http-nio-9025-exec-201" #19378 daemon prio=5 os_prio=0 tid=0x00007f8c8e3d3000 nid=0x7ffc waiting on condition [0x00007f8b3a724000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000679f4cbd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:103)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)

"http-nio-9025-exec-197" #19374 daemon prio=5 os_prio=0 tid=0x00007f8c8e3cc000 nid=0x7ff7 waiting on condition [0x00007f8b3a828000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000679f4cbd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:103)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:748)

"http-nio-9025-exec-196" #19373 daemon prio=5 os_prio=0 tid=0x00007f8c8a06d000 nid=0x7ff3 waiting on condition [0x00007f8b3a8aa000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000679f4cbd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:103)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:31)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)


  • 查看当前系统的线程总数:
jvm 开启的线程数:
[root@bigdata-service-1:/root]# jstack 5511 | wc -l
254132

当前服务器对不同IP的建立连接数:
[root@bigdata-service-1:/root]# netstat -nat|grep ESTABLISHED|awk '{print$5}'|awk -F : '{print$1}'|sort|uniq -c|sort -rn
  10244 10.27.70.185
     26 10.27.70.77
     26 10.24.237.150
     10 10.81.83.121
      7 10.29.148.127
      4 10.80.101.244
      3 10.27.4.125
      2 127.0.0.1
      2 10.80.112.35
      2 10.27.87.167
      2 10.25.0.162
      1 10.80.112.23
      1 106.11.248.20

可以分析出,当前系统对外建立了大量的连接,而且连接没有释放,导致系统开启的线程数过多。通过分析连接最多的ip,定位到出现问题的节点


总结问题

从前面的一系列的分析,可以发现,系统底层存在大量的线程,导致引用没法释放。通过对java的栈信息分析发现存在大量的ElasticSearch连接线程,导致JVM Full Gc 没法释放资源。

"elasticsearch[_client_][transport_client_boss][T#4]" #166 daemon prio=5 os_prio=0 tid=0x00007f8c9819e800 nid=0x20e3 runnable [0x00007f8c7403d000]
	- locked <0x0000000674995e98> (a sun.nio.ch.Util$3)
	- locked <0x0000000674995e80> (a java.util.Collections$UnmodifiableSet)
	- locked <0x0000000674986f58> (a sun.nio.ch.EPollSelectorImpl)
	at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)

进一步分析请看下篇博客:《JVM线上实战2》

相关推荐
©️2020 CSDN 皮肤主题: 书香水墨 设计师:CSDN官方博客 返回首页