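After a JVM crash, the error log can be written to a chosen location by configuring the following option (%p is replaced by the process ID):
-XX:ErrorFile=./hs_err_pid%p.log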
# Native memory allocation (malloc) failed to allocate 1048576 bytes for committing reserved memory.
For this OOM, the log itself already lists the possible causes:
# Possible reasons:
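# The system is out of physical RAM or swap space (physical memory or swap is exhausted)
# In 32 bit mode, the process size limit was hit (a 32-bit process ran out of address space; too many threads is one common way to get there)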
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
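The first line of the report already says which native call failed and how many bytes it asked for. A minimal sketch of pulling those two values out of an hs_err file follows; the function name parse_native_oom is made up for illustration, and the pattern only covers the two message shapes quoted in these notes.

```python
import re

# Matches the two header shapes quoted in these notes:
#   "... (malloc) failed to allocate N bytes ..."
#   "... (mmap) failed to map N bytes ..."
_ALLOC_RE = re.compile(
    r"Native memory allocation \((\w+)\) failed to (?:allocate|map) (\d+) bytes"
)


def parse_native_oom(header_text):
    """Return (allocator, size_in_bytes), or None if no such line is present."""
    match = _ALLOC_RE.search(header_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


sample = (
    "# Native memory allocation (malloc) failed to allocate 1048576 bytes "
    "for committing reserved memory."
)
print(parse_native_oom(sample))
```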
For cause 1, check the GC Heap History (10 events) section:
Heap after GC invocations=1262 (full 4):
garbage-first heap total 3569664K, used 1505360K [0x00000001e0000000, 0x00000002b9e00000, 0x00000007e0000000)
region size 1024K, 32 young (32768K), 32 survivors (32768K)
compacting perm gen total 463872K, used 463634K [0x00000007e0000000, 0x00000007fc500000, 0x0000000800000000)
the space 463872K, 99% used [0x00000007e0000000, 0x00000007fc4c4818, 0x00000007fc4c4a00, 0x00000007fc500000)
No shared spaces configured.
}
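Heap: about 3.4 GB total (3569664K), about 1.4 GB used (1505360K), so the heap has not overflowed.
The Permanent Generation is out of space: it is 99% used, so no additional classes can be loaded.
So increase PermGen with the -XX:PermSize= and -XX:MaxPermSize= parameters. Note that PermGen is sized separately from the Java heap: raising -Xmx does not enlarge it, so when class metadata fills it up, MaxPermSize has to be raised explicitly (or the class-loading leak fixed). PermGen exists only up to JDK 7; JDK 8 and later replace it with Metaspace.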
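The usage figures above can be read straight out of the "total ...K, used ...K" lines. The sketch below does that for the excerpt quoted in these notes; heap_usage is a hypothetical helper name, and the pattern assumes the line layout shown here, which differs between collectors and JDK versions.

```python
import re

# Captures "<space name> total <N>K, used <M>K" at the start of a line.
_USAGE_RE = re.compile(r"^\s*(.+?)\s+total (\d+)K, used (\d+)K", re.MULTILINE)


def heap_usage(history_text):
    """Map each space name to (total_kb, used_kb, used_percent)."""
    result = {}
    for name, total, used in _USAGE_RE.findall(history_text):
        total_kb, used_kb = int(total), int(used)
        percent = 100.0 * used_kb / total_kb if total_kb else 0.0
        result[name] = (total_kb, used_kb, percent)
    return result


sample = """Heap after GC invocations=1262 (full 4):
 garbage-first heap total 3569664K, used 1505360K [0x00000001e0000000, 0x00000002b9e00000, 0x00000007e0000000)
 region size 1024K, 32 young (32768K), 32 survivors (32768K)
 compacting perm gen total 463872K, used 463634K [0x00000007e0000000, 0x00000007fc500000, 0x0000000800000000)
"""

for space, (total_kb, used_kb, percent) in heap_usage(sample).items():
    print("%-22s %8dK of %8dK  %.1f%%" % (space, used_kb, total_kb, percent))
```

A space that sits near 100% while the others have room, as the perm gen does here, is the one to look at first.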
For cause 2, look at the section:
--------------- P R O C E S S ---------------
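It lists a very large number of Java threads, all in the blocked state.
There are more than 30,000 of them, so the cause is that the program keeps creating threads: a thread leak that led to the OOM.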
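Counting 30,000 thread entries by hand is impractical, so a small script helps. This is a sketch under assumptions: the sample lines are made-up data in the shape hs_err thread entries usually take (the original notes do not quote any), and count_thread_states is a hypothetical helper name.

```python
import re
from collections import Counter

# A thread entry typically looks like:
#   0x... JavaThread "name" daemon [_thread_blocked, id=123, stack(0x...,0x...)]
_THREAD_RE = re.compile(r'JavaThread "[^"]*".*?\[(_thread_\w+)')


def count_thread_states(process_section):
    """Count JavaThread entries per thread state."""
    return Counter(_THREAD_RE.findall(process_section))


# Made-up sample lines, for illustration only.
sample = """
  0x00007f0000001000 JavaThread "worker-1" daemon [_thread_blocked, id=101, stack(0x00007f0010000000,0x00007f0010100000)]
  0x00007f0000002000 JavaThread "worker-2" daemon [_thread_blocked, id=102, stack(0x00007f0010100000,0x00007f0010200000)]
  0x00007f0000003000 JavaThread "main" [_thread_in_native, id=100, stack(0x00007f0010200000,0x00007f0010300000)]
"""

for state, count in count_thread_states(sample).most_common():
    print(state, count)
```

Grouping by thread name prefix in the same way usually points at the pool or component that is leaking.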
# Native memory allocation (mmap) failed to map 262144 bytes for committing reserved memory.
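The log line above is also an OOM, but the heap, swap, thread count, and physical memory in that log are all at normal levels. So what actually ran out?
Analyze the section: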
--------------- S Y S T E M ---------------
PageTables: 231660 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 148815280 kB
Committed_AS: 148685332 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 676640 kB
VmallocChunk: 34290626040 kB
HardwareCorrupted: 0 kB
AnonHugePages: 84658176 kB
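The two values to note here are CommitLimit and Committed_AS. They belong to Linux memory overcommit accounting, so the kernel parameters need to be checked as well.
The key setting is vm.overcommit_memory.
It is set through the file /proc/sys/vm/overcommit_memory, the system default is 0, and it takes the following values: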
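0 Heuristic overcommit handling. Obvious overcommits of address space are refused. The root user is allowed to allocate slightly more memory than ordinary users.
1 Always overcommit. Requests are always granted. Suitable for some scientific computing applications.
2 Don't overcommit. The address space applications may commit cannot exceed swap + total physical memory * overcommit_ratio. In this mode, when the system cannot give an application more memory, the process is not killed; the allocation fails with an error instead. The ratio is set through /proc/sys/vm/overcommit_ratio and defaults to 50. For example, with 512MB of swap and 2GB of physical memory, the most that processes can commit is 512MB + 2GB * 50% = 1.5GB.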
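overcommit_memory=0: the kernel estimates whether enough memory is available for the request; if so, the request is granted, otherwise it fails and the error is returned to the process.
overcommit_memory=1: the kernel grants every request, regardless of the current memory state.
overcommit_memory=2: the kernel refuses any request that would push the total committed address space (Committed_AS) above CommitLimit, i.e. above swap plus the configured share of physical memory.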
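The 512MB + 2GB example for mode 2 can be written out as a short calculation. This is a simplified sketch: commit_limit_mb and would_be_refused are hypothetical helper names, and the formula ignores details such as huge pages that the kernel also takes into account.

```python
def commit_limit_mb(physical_mb, swap_mb, overcommit_ratio):
    """Commit limit under vm.overcommit_memory=2 (simplified)."""
    return swap_mb + physical_mb * overcommit_ratio / 100


def would_be_refused(committed_mb, request_mb, limit_mb):
    """In mode 2 a request fails once it would push the total past the limit."""
    return committed_mb + request_mb > limit_mb


# 2GB of physical memory, 512MB of swap, default overcommit_ratio of 50.
limit = commit_limit_mb(2048, 512, 50)
print(limit)  # 1536.0 MB, i.e. 1.5GB

print(would_be_refused(1500, 100, limit))  # True: 1600 MB is over the limit
print(would_be_refused(1000, 100, limit))  # False: 1100 MB still fits
```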
CommitLimit: 148815280 kB
Committed_AS: 148685332 kB
Memory: 4k page, physical 148249784k(37218532k free), swap 2047996k(2047996k free)
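CommitLimit = physical * overcommit_ratio + swap
= 148249784 kB * 99% + 2047996 kB = 148815280 kB (this matches the log, so vm.overcommit_ratio on this host is 99; the kernel does the calculation in 4 kB pages and rounds down)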
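When vm.overcommit_memory is set to 2, every allocation is checked against CommitLimit and Committed_AS.
CommitLimit - Committed_AS = 148815280 kB - 148685332 kB = 129948 kB (about 127 MB), so Committed_AS is already at 99.9% of CommitLimit. Note that the 262144 in the error message is in bytes (256 kB), so it cannot be compared directly with the kB figure. The snapshot was taken when the error file was written, with the system running right at the limit, so the request was most likely refused by this check, which triggered the OOM.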
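The same numbers can be recomputed from the log rather than by hand. The sketch below assumes the meminfo layout shown in the S Y S T E M section, a 4 kB page size as reported in the Memory line, and an overcommit ratio of 99 inferred from the figures; meminfo_kb and commit_limit_kb are hypothetical helper names, and huge page reservations are ignored.

```python
import re


def meminfo_kb(system_section, key):
    """Return the value in kB of a 'Key: N kB' line, or None if absent."""
    pattern = r"^\s*%s:\s+(\d+) kB" % re.escape(key)
    match = re.search(pattern, system_section, re.MULTILINE)
    return int(match.group(1)) if match else None


def commit_limit_kb(physical_kb, swap_kb, ratio, page_kb=4):
    """Commit limit computed in whole pages, rounding down (simplified)."""
    physical_pages = physical_kb // page_kb
    swap_pages = swap_kb // page_kb
    return (physical_pages * ratio // 100 + swap_pages) * page_kb


section = """CommitLimit: 148815280 kB
Committed_AS: 148685332 kB
"""

limit_kb = meminfo_kb(section, "CommitLimit")
committed_kb = meminfo_kb(section, "Committed_AS")
headroom_kb = limit_kb - committed_kb

print("recomputed limit:", commit_limit_kb(148249784, 2047996, 99), "kB")
print("headroom:", headroom_kb, "kB")
print("committed: %.1f%% of limit" % (100.0 * committed_kb / limit_kb))
```

Working in pages rather than kB is what makes the recomputed limit line up with the CommitLimit value in the log.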
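To skip this check, so that allocations succeed whenever physical memory is actually available,
edit /etc/sysctl.conf and set vm.overcommit_memory=1,
then run sysctl -p to apply the configuration.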
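P.S.
There are three ways to change the kernel parameter, all of which require root privileges:
(1) Edit /etc/sysctl.conf, set vm.overcommit_memory=1, then run sysctl -p to load the file. This setting survives a reboot.
(2) sysctl vm.overcommit_memory=1
(3) echo 1 > /proc/sys/vm/overcommit_memory
Methods (2) and (3) take effect immediately but are lost on reboot.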