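Memory usage on the api machines stays high at 60%-70%, which makes it the bottleneck among the server metrics; the others (CPU, load, disk) are all fairly low. Ideally memory would reach this level only at peak hours and drop to about half of it (roughly 30%) off-peak.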
1) Locating the cause of the high memory usage
Use top to find the process with the highest memory usage:
KiB Mem : 15731936 total, 2667520 free, 9458472 used, 3605944 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 5953500 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
980 work 20 0 18.4g 8.7g 31260 S 20.3 58.0 26:44.47 java
The machine has 15 GB of memory in total. The java process uses the most: 58%, or 8.7 GB resident.
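As a quick cross-check, the percentages can be recomputed from the raw top figures. This is a minimal sketch; the function names are illustrative and not part of any tool.

```python
KIB_PER_GIB = 1024 * 1024

def used_percent(used_kib: int, total_kib: int) -> float:
    """Share of physical memory that top reports as 'used'."""
    return used_kib / total_kib * 100

def res_percent(res_gib: float, total_kib: int) -> float:
    """Share of physical memory held by one process (RES column, in GiB)."""
    return res_gib * KIB_PER_GIB / total_kib * 100

total_kib = 15731936                                 # 'KiB Mem: total' from top
print(round(used_percent(9458472, total_kib), 1))    # machine-wide used: 60.1
print(round(res_percent(8.7, total_kib), 1))         # java process: 58.0
```

The second value matches the %MEM column for the java process, so almost all of the used memory belongs to that one process.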
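The next step was to inspect the java process, but running java failed with: -bash: java: command not found
At first this looked like a user permission problem; the actual cause was that the JDK had not been added to the profile file.
Add the following to /etc/profile: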
export JAVA_HOME=/usr/local/jdk-13.0.1
export PATH=$PATH:$JAVA_HOME/bin
Then run source /etc/profile
After that, commands such as java and jmap can be run.
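One way to confirm that the profile change took effect is to check that $JAVA_HOME/bin is actually an entry in $PATH. The sketch below does this against a dictionary of environment variables; the helper name jdk_bin_on_path is made up for illustration, and it assumes a POSIX-style PATH separated by colons.

```python
def jdk_bin_on_path(env: dict) -> bool:
    """True if $JAVA_HOME/bin appears as an entry in $PATH."""
    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        return False
    jdk_bin = java_home.rstrip("/") + "/bin"
    return jdk_bin in env.get("PATH", "").split(":")

# Environment as it should look after `source /etc/profile`
env = {
    "JAVA_HOME": "/usr/local/jdk-13.0.1",
    "PATH": "/usr/bin:/bin:/usr/local/jdk-13.0.1/bin",
}
print(jdk_bin_on_path(env))   # True
```

On the real machine, passing dict(os.environ) instead of the sample dictionary would check the current shell's environment.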
Use jps to get the java process ID:
16933 coding-cloud-web.jar
jhsdb jmap --heap --pid 16933
Attaching to process ID 16933, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 13.0.1+9
using thread-local object allocation.
Garbage-First (G1) GC with 8 thread(s)
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 12884901888 (12288.0MB)
NewSize = 1363144 (1.2999954223632812MB)
MaxNewSize = 7730102272 (7372.0MB)
OldSize = 5452592 (5.1999969482421875MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 4194304 (4.0MB)
Heap Usage:
G1 Heap:
regions = 3072
capacity = 12884901888 (12288.0MB)
used = 6758003184 (6444.934066772461MB)
free = 6126898704 (5843.065933227539MB)
52.44900770485401% used
G1 Young Generation:
Eden Space:
regions = 1552
capacity = 8107589632 (7732.0MB)
used = 6509559808 (6208.0MB)
free = 1598029824 (1524.0MB)
80.28970512157268% used
Survivor Space:
regions = 2
capacity = 12582912 (12.0MB)
used = 10926576 (10.420394897460938MB)
free = 1656336 (1.5796051025390625MB)
86.83662414550781% used
G1 Old Generation:
regions = 58
capacity = 4764729344 (4544.0MB)
used = 237516800 (226.513671875MB)
free = 4527212544 (4317.486328125MB)
4.984895947953345% used
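The Java heap is sized at 12 GB, and Eden alone holds about 6 GB of data (out of roughly 7.5 GB of Eden capacity). A young collection is normally triggered only when Eden fills up, so with a heap this large Eden grows to several GB before anything is reclaimed.
Add the heap's survivor and old regions, metaspace, thread stacks and so on, plus the memory used by other processes, and that is another 2-3 GB.
A total of around 10 GB is therefore to be expected, so the JVM parameters need to be adjusted.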
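The per-generation figures in the jhsdb output can be added up to see where the heap usage sits. This is a small sketch that only uses byte counts copied from that output; the helper heap_summary is illustrative.

```python
MB = 1024 * 1024

# 'used' bytes per generation, copied from the jhsdb output
used = {
    "eden": 6509559808,
    "survivor": 10926576,
    "old": 237516800,
}
heap_capacity = 12884901888   # MaxHeapSize, i.e. -Xmx12g

def heap_summary(used_bytes: dict, capacity: int) -> tuple:
    """Return (total used in MB, used as a percentage of capacity)."""
    total = sum(used_bytes.values())
    return total / MB, total / capacity * 100

total_mb, pct = heap_summary(used, heap_capacity)
eden_share = used["eden"] / sum(used.values()) * 100
print(f"heap used: {total_mb:.1f} MB ({pct:.1f}% of capacity)")
print(f"eden share of used heap: {eden_share:.1f}%")
```

The sum equals the reported heap usage (6758003184 bytes). Nearly all of it is Eden, while the old generation holds only a couple of hundred MB, which suggests the heap is far larger than the live data set requires.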
Current JVM configuration: -Xms12g -Xmx12g -XX:MaxGCPauseMillis=200
Changes:
Metaspace: MetaspaceSize defaults to about 20 MB, which is too small. Add the following:
-XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=512M
Set the number of concurrent GC threads:
cat /proc/cpuinfo| grep "processor"| wc -l
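The result is 8, so this is an 8-core server (8 logical processors). Set the number of concurrent GC threads to half of that, 4: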
-XX:ConcGCThreads=4
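The "half of the cores" rule can be written as a short helper. This sketch only encodes the rule of thumb used here; it is not how the JVM chooses its own default, and the function name is illustrative.

```python
import os

def conc_gc_threads(cores: int) -> int:
    """Half of the available cores, but never fewer than one thread."""
    return max(1, cores // 2)

cores = os.cpu_count() or 1      # logical processors visible to Python
print(f"-XX:ConcGCThreads={conc_gc_threads(cores)}")
```

On the 8-core machine described here this prints -XX:ConcGCThreads=4.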
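Operations steps:
All changes are trialed on the api-1 machine (also written api01) first; every step below is performed on that machine only.
1 Remove api-1 from nginx
2 Change the JVM startup parameters on api-1:
1) Remove the old -Xms and -Xmx parameters
2) Add the new parameters: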
-Xms6g
-Xmx8g
-XX:G1ReservePercent=20
-XX:ConcGCThreads=4
-XX:MetaspaceSize=256M -XX:MaxMetaspaceSize=512M
-XX:MaxGCPauseMillis=100
3 Restart the service on api01
4 Disable the nightly automatic restart on api01
5 Add api01 back to nginx
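Before restarting, the new flag set can be assembled and sanity-checked in one place. The sketch below assumes the service is launched with java -jar and uses the jar name shown by jps; the actual start script is not shown in these notes, and build_command and size_bytes are hypothetical helpers.

```python
import re

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}

def size_bytes(flag: str) -> int:
    """Parse a heap size flag such as '-Xmx8g' into bytes."""
    m = re.fullmatch(r"-Xm[sx](\d+)([kmg]?)", flag, re.IGNORECASE)
    if m is None:
        raise ValueError(f"not a heap size flag: {flag}")
    return int(m.group(1)) * UNITS.get(m.group(2).lower(), 1)

def build_command(flags: list, jar: str) -> list:
    """Return the argv for launching the service after a basic check."""
    xms = next(f for f in flags if f.startswith("-Xms"))
    xmx = next(f for f in flags if f.startswith("-Xmx"))
    if size_bytes(xms) > size_bytes(xmx):
        raise ValueError("initial heap (-Xms) exceeds maximum heap (-Xmx)")
    return ["java", *flags, "-jar", jar]

flags = [
    "-Xms6g", "-Xmx8g",
    "-XX:G1ReservePercent=20",
    "-XX:ConcGCThreads=4",
    "-XX:MetaspaceSize=256M", "-XX:MaxMetaspaceSize=512M",
    "-XX:MaxGCPauseMillis=100",
]
# Assumed launch form: java <flags> -jar <jar>
print(" ".join(build_command(flags, "coding-cloud-web.jar")))
```

Keeping the flags in a single list makes it easy to apply the same set to the other machines later.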
After the change, the heap report looks like this:
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 8589934592 (8192.0MB)
NewSize = 1363144 (1.2999954223632812MB)
MaxNewSize = 5152702464 (4914.0MB)
OldSize = 5452592 (5.1999969482421875MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 268435456 (256.0MB)
CompressedClassSpaceSize = 528482304 (504.0MB)
MaxMetaspaceSize = 536870912 (512.0MB)
G1HeapRegionSize = 2097152 (2.0MB)
Heap Usage:
G1 Heap:
regions = 4096
capacity = 8589934592 (8192.0MB)
used = 3540951824 (3376.914810180664MB)
free = 5048982768 (4815.085189819336MB)
41.222104616463184% used
G1 Young Generation:
Eden Space:
regions = 1608
capacity = 4051697664 (3864.0MB)
used = 3372220416 (3216.0MB)
free = 679477248 (648.0MB)
83.22981366459628% used
Survivor Space:
regions = 3
capacity = 8388608 (8.0MB)
used = 8188176 (7.8088531494140625MB)
free = 200432 (0.1911468505859375MB)
97.61066436767578% used
G1 Old Generation:
regions = 78
capacity = 2386558976 (2276.0MB)
used = 160543232 (153.10595703125MB)
free = 2226015744 (2122.89404296875MB)
6.72697526499341% used
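Results:
1 The concurrent collections that used to run at startup are avoided; the GC log confirms this
2 Machine memory usage has indeed come down
Before the change: 60%-70%; after: 30%-40%, roughly half
After 1 day: api01 35%, api02 67%
3 GC frequency and duration (from jstat -gc statistics)
1) Frequency (24-hour average): increased. api01 12 collections/hour, api02 10 collections/hour
2) Duration (average per GC): api01 14 ms, api02 23 ms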
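Frequency and average duration can be derived from two jstat -gc samples taken some time apart, using the cumulative count and time columns (for young collections, YGC and YGCT). This is a minimal sketch: the sample values are made up to reproduce the api01 averages and are not the measured jstat readings.

```python
def gc_stats(count_start: int, time_start_s: float,
             count_end: int, time_end_s: float, hours: float) -> tuple:
    """Return (collections per hour, average pause in ms) between two samples.

    count_* are cumulative GC counts (e.g. the YGC column of `jstat -gc`),
    time_* the matching cumulative GC time in seconds (e.g. YGCT).
    """
    collections = count_end - count_start
    if collections <= 0 or hours <= 0:
        raise ValueError("need a positive number of collections and hours")
    total_ms = (time_end_s - time_start_s) * 1000
    return collections / hours, total_ms / collections

# Illustrative samples only, chosen to reproduce the api01 averages
per_hour, avg_ms = gc_stats(100, 1.500, 388, 5.532, hours=24)
print(f"{per_hour:.0f} GCs/hour, {avg_ms:.0f} ms per GC")
```

With a smaller heap, collections happen slightly more often but each one has less to scan, which is consistent with the higher frequency and shorter pauses on api01.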
Conclusion:
The parameters above are suitable for all machines.
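Memory optimization rollout:
1 Apply the new JVM parameters to 50% of the machines
2 If no problems appear within 3 days, apply them to 100% of the machines
3 Keep observing for another 3 days
4 Disable the automatic restart on 50% of the machines
5 Disable the automatic restart on 100% of the machines
6 There were originally 10 api machines, and load and the other metrics were fairly normal at that time. Why were more machines added? Should the number of machines be reduced?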
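Machine optimization
1) Analyze the resource usage of each machine, paying particular attention to the peak values over the last half month, and record them
2) Where the peak is below 40%, reduce the number of machines by 10% at a time, until peak usage reaches 70%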
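The scale-down rule can be simulated before any machine is removed. The sketch below rests on a strong assumption that is not from these notes: total load is constant and spreads evenly, so per-machine peak usage grows in proportion as machines are removed. Memory in particular rarely behaves this way, so treat the result as a rough upper bound on how far to shrink. It also stops one step before the projected peak would pass 70%. The function name and defaults are illustrative.

```python
def plan_scale_down(machines: int, peak_pct: float,
                    start_below: float = 40.0, stop_at: float = 70.0,
                    step: float = 0.10) -> int:
    """Return the machine count after repeated 10% reductions.

    ASSUMPTION: total load is constant and spreads evenly, so the peak
    usage per machine scales with machines_before / machines_after.
    """
    if peak_pct >= start_below:
        return machines                      # peak too high, do not shrink
    total = machines * peak_pct              # load in "machine-percent"
    while machines > 1:
        remove = max(1, int(machines * step))
        candidate = machines - remove
        if candidate < 1 or total / candidate > stop_at:
            break                            # next step would pass the limit
        machines = candidate
    return machines

print(plan_scale_down(10, 30.0))   # 5
```

In practice each reduction would be followed by an observation period, as in the rollout steps, and the projected peak replaced with the measured one.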