OOM caused by exceeding max_map_count


Symptoms:

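One morning, the applications using (the old version of System A) kept running Full GC.
Applications B and C, which use (the new version of System A), kept throwing NPEs, while the other applications using System A 5 ran normally.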

Preliminary findings:

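System A's data volume grew overnight from 5 million records to more than 10 million. As a result, heap usage while loading System A 1's largest data package spiked to about 2 GB; together with the old data package still held in memory, that came to about 3 GB of heap, which is why the applications on System A 1 kept running Full GC.
The System A 5 failure happened because the Server hit an OOM while building the data packages for applications B and C, which left the packages incomplete. Through a mix of human and circumstantial factors, the broken package was pushed to the B and C machines, causing a flood of NPEs in B and, in turn, a site-wide outage.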

Tracking down the OOM

First, the memory configuration of the System A Server:
Production machines have 48 GB of physical memory. The main JVM startup options are:
-Xms25000m -Xmx25000m -Xss6m -XX:PermSize=300m -XX:MaxPermSize=560m -Xmn5000m -XX:+DisableExplicitGC -XX:MaxDirectMemorySize=20000m
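Staging machines have 16 GB of physical memory, 12 GB of which is allocated to the heap.
The application log on the System A Server contained a large number of OOM exceptions: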
Caused by: java.lang.OutOfMemoryError: Map failed
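The first reaction to an OOM is to check heap and physical memory usage. However, the CMS generation was only about 50% used, and both physical memory and swap had plenty of room. After recording the relevant information we restarted the application, but on the production machine the problem came back shortly afterwards, while the same program with the same data ran fine in staging, which was baffling.
Following the stack trace, the exception turned out to be thrown from FileChannel#map, the method that creates a memory-mapped file. To reduce heap usage and improve write throughput, System A splits a file into several segments and maps each one as a MappedByteBuffer for reading and writing: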
private void mapBuffer(int bufferCapacity) throws IOException {
    ByteBuffer old = byteBuffer;
    byteBuffer = fileChannel.map(mapMode, filePos, bufferCapacity);
    ...
}
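To make the segmented mapping concrete, here is a minimal sketch of the same idea. It is not System A's code: the class name `SegmentedWriter`, the method `write`, and the fixed `segmentSize` are assumptions made for illustration. It writes a byte array through consecutive mapped segments and reports how many `map` calls were needed.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not taken from System A.
public class SegmentedWriter {

    /**
     * Writes data to path through consecutive mapped segments of at most
     * segmentSize bytes and returns the number of map() calls made.
     */
    public static int write(Path path, byte[] data, int segmentSize) throws IOException {
        int mappings = 0;
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            long filePos = 0;
            while (filePos < data.length) {
                int len = (int) Math.min(segmentSize, data.length - filePos);
                // In READ_WRITE mode, mapping past the end of the file extends it.
                MappedByteBuffer segment =
                        channel.map(FileChannel.MapMode.READ_WRITE, filePos, len);
                mappings++;
                segment.put(data, (int) filePos, len);
                filePos += len;
                // The mapping stays alive until the buffer is garbage collected;
                // closing the channel does not unmap it.
            }
        }
        return mappings;
    }
}
```

The number of `map` calls grows linearly with the amount of data, so doubling the data roughly doubles the number of mappings that are alive until they are released.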
Tracing into fileChannel.map shows that the native method it finally calls is:
FileChannelImpl.java

// Creates a new mapping
private native long map0(int prot, long position, long length) throws IOException;
The corresponding OpenJDK source:
FileChannelImpl.c

JNIEXPORT jlong JNICALL
Java_sun_nio_ch_FileChannelImpl_map0(JNIEnv *env, jobject this, jint prot, jlong off, jlong len)
{
    ...
    mapAddress = mmap64(
        0,            /* Let OS decide location */
        len,          /* Number of bytes to map */
        protections,  /* File permissions */
        flags,        /* Changes are shared */
        fd,           /* File descriptor of mapped file */
        off);         /* Offset into file */

    if (mapAddress == MAP_FAILED) {
        if (errno == ENOMEM) {
            JNU_ThrowOutOfMemoryError(env, "Map failed");
            return IOS_THROWN;
        }
        return handle(env, -1, "Map failed");
    }
    return ((jlong) (unsigned long) mapAddress);
}
mmap64 is a system function. When it fails with the error code ENOMEM, an OOME is thrown up to the caller. The GNU manual explains ENOMEM as follows:
ENOMEM
Either there is not enough memory for the operation, or the process is out of address space.
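Two causes are mentioned here: not enough memory, or the process running out of address space. System memory and swap had plenty of headroom at the time, so the first should not apply. As for the second, everything I could find discussed address-space limits on 32-bit machines. We run a 64-bit OS with a 64-bit JVM, so how could the address space run out?
At this point I was rather stuck, but System A has to run every day and a fix had to ship that same day, so the only option was a last-ditch attempt: patch the spot I considered most suspicious. My guess was that the ByteBuffer was not being released, so, following the unmap method in FileChannelImpl, I changed the code above to: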
private void mapBuffer(int bufferCapacity) throws IOException {
    ByteBuffer old = byteBuffer;
    byteBuffer = fileChannel.map(mapMode, filePos, bufferCapacity);
    ...
    // Explicitly unmap the previous segment instead of waiting for GC.
    // sun.misc.Cleaner and sun.nio.ch.DirectBuffer are JDK-internal APIs (JDK 8 and earlier).
    if (old instanceof DirectBuffer) {
        try {
            sun.misc.Cleaner cleaner = ((DirectBuffer) old).cleaner();
            if (cleaner != null) {
                cleaner.clean();
            }
        } catch (Throwable t) {
            logger.error("Failed to unmap.", t);
        }
    }
}
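To my surprise the system went back to normal, so the OOM was indeed caused by ByteBuffers not being released. But why they were not released, and what had triggered the problem, was still unclear.
A colleague then suggested that the startup option -XX:+DisableExplicitGC was the cause, citing an article by 撒迦 that describes the conditions under which using DirectMemory can lead to an OOM: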
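1. The application's objects in the GC heap are well behaved, so under normal conditions a full GC does not happen for a long time;
2. The application makes heavy use of NIO direct memory, allocating DirectByteBuffers frequently and repeatedly;
3. -XX:+DisableExplicitGC is set.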
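I was skeptical at the time, because an mmap operation should not consume DirectMemory space. Later 毕大师 also explained that when MaxDirectMemorySize is present in the startup options, reclamation of DirectMemory does get triggered.
As it happened, the JVM on this machine also crashed once that day. Heap usage and physical memory in the crash log looked entirely normal, but one thing was odd: the "Dynamic libraries:" section was huge, and a count showed 65532 entries: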
40000000-40009000 r-xp 00000000 08:02 4685961 /opt/taobao/install/jdk-1.6.0_32/bin/java
40108000-4010a000 rwxp 00008000 08:02 4685961 /opt/taobao/install/jdk-1.6.0_32/bin/java
4010a000-4010b000 ---p 00000000 00:00 0
4010b000-4020b000 rwxp 00000000 00:00 0
4020b000-4020c000 ---p 00000000 00:00 0
4020c000-4030c000 rwxp 00000000 00:00 0
403be000-4046a000 rwxp 00000000 00:00 0 [heap]
...
7f6a5c070000-7f6a5c076000 r-xs 00095000 08:02 4687979 /opt/taobao/install/jdk-1.6.0_32/jre/lib/jsse.jar
7f6a5c076000-7f6a5c077000 r-xs 00003000 08:02 4736638 /opt/taobao/install/jboss-4.2.2.GA/client/getopt.jar
7f6a5c077000-7f6a5c079000 r-xs 00009000 08:02 4736611 /opt/taobao/install/jboss-4.2.2.GA/bin/run.jar
7f6a5c079000-7f6a5c081000 rwxs 00000000 08:02 3981400 /tmp/hsperfdata_admin/4800
7f6a5c081000-7f6a5c082000 rwxp 00000000 00:00 0
7f6a5c082000-7f6a5c083000 r-xp 00000000 00:00 0
7f6a5c083000-7f6a5c085000 rwxp 00000000 00:00 0
7fff0041f000-7fff00434000 rwxp 00000000 00:00 0 [stack]
7fff005a9000-7fff005aa000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
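After some digging, it turned out that this data comes from /proc/{pid}/maps, a file that shows how the process's virtual address space is used. That brought to mind the ENOMEM note about the process running out of address space, but the last address, 7fff005aa000, is nowhere near the limit, and the virtual memory in use added up to only a few GB.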
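The address-space calculation can be reproduced with a small parser. This is a sketch under assumptions: the class `MapsSummary` and its methods are hypothetical names, and it only relies on each maps line starting with a hexadecimal `start-end` range, as in the dump above.

```java
import java.util.List;

// Hypothetical helper for summing the address ranges listed in /proc/<pid>/maps.
public class MapsSummary {

    /** Returns the size in bytes of the address range at the start of one maps line. */
    public static long rangeBytes(String line) {
        String range = line.trim().split("\\s+")[0];
        int dash = range.indexOf('-');
        // Addresses can use the full 64 bits (e.g. [vsyscall]), so parse as unsigned.
        long start = Long.parseUnsignedLong(range.substring(0, dash), 16);
        long end = Long.parseUnsignedLong(range.substring(dash + 1), 16);
        return end - start;
    }

    /** Sums the range sizes of all non-blank lines. */
    public static long totalBytes(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                total += rangeBytes(line);
            }
        }
        return total;
    }
}
```

On Linux, passing the result of `Files.readAllLines(Paths.get("/proc/self/maps"))` to `totalBytes` gives the mapped virtual size of the current JVM, and the size of that list is the number of mappings.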
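Then the figure 65532 from earlier made me think of file-max, but that turned out to be 4889494. Following the same line of thought: is there also a cap on the number of virtual memory mappings? Sure enough, there is: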
max_map_count

The max_map_count file allows for the restriction of the number of VMAs (Virtual Memory Areas) that a particular process can own. A Virtual Memory Area is a contiguous area of virtual address space. These areas are created during the life of the process when the program attempts to memory map a file, links to a shared memory segment, or allocates heap space. Tuning this value limits the amount of these VMAs that a process can own. Limiting the amount of VMAs a process can own can lead to problematic application behavior because the system will return out of memory errors when a process reaches its VMA limit but can free up lowmem for other kernel uses. If your system is running low on memory in the NORMAL zone, then lowering this value will help free up memory for kernel use.
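Reference: http://www.redhat.com/magazine/001nov04/features/vm/
According to this description, max_map_count is the maximum number of VMAs (Virtual Memory Areas) that a single process may own. A VMA is a contiguous region of virtual address space. Each time a process creates a memory-mapped file, its number of VMAs grows, and once that number reaches max_map_count, out-of-memory errors are returned.
The value can be checked with the following command: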

cat /proc/sys/vm/max_map_count
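The value on the System A Server was indeed 65536, and in tests, changing max_map_count changed the maximum number of FileChannel#map mappings accordingly. So the OOM was definitely caused by hitting this system limit, which is what the ENOMEM description means by "the process is out of address space".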
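A process can also watch its own headroom against this limit. The following is a minimal sketch for Linux only; the class `VmaUsage` and its method names are assumptions for illustration, while the two /proc paths are the ones discussed above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that compares the current mapping count with vm.max_map_count.
public class VmaUsage {

    /** Reads a single integer such as the content of /proc/sys/vm/max_map_count. */
    public static long readLimit(Path limitFile) throws IOException {
        String text = new String(Files.readAllBytes(limitFile), StandardCharsets.US_ASCII);
        return Long.parseLong(text.trim());
    }

    /** Counts the non-blank lines of a maps file; each line is one mapping. */
    public static long countMappings(Path mapsFile) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(mapsFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        long limit = readLimit(Paths.get("/proc/sys/vm/max_map_count"));
        long used = countMappings(Paths.get("/proc/self/maps"));
        System.out.println("mappings in use: " + used + " of " + limit);
    }
}
```

Logging this ratio periodically, or alerting when it passes some threshold, would have pointed at the mapping count long before `map` started failing.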
With the trigger identified, finding the root cause became much easier. Here is the code of FileChannelImpl#map again:
...
try {
    // If no exception was thrown from map0, the address is valid
    addr = map0(imode, mapPosition, mapSize);
} catch (OutOfMemoryError x) {
    // An OutOfMemoryError may indicate that we've exhausted memory
    // so force gc and re-attempt map
    System.gc();
    try {
        Thread.sleep(100);
    } catch (InterruptedException y) {
        Thread.currentThread().interrupt();
    }
    try {
        addr = map0(imode, mapPosition, mapSize);
    } catch (OutOfMemoryError y) {
        // After a second OOME, fail
        throw new IOException("Map failed", y);
    }
}
...
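Note the call to System.gc(): when the first map attempt fails, a GC is forced so that unreachable MappedByteBuffers are collected and their mappings (VMAs) are released, and then the map operation is retried.
With DisableExplicitGC in the startup options, System.gc() is disabled, so the mappings are not reclaimed, the process runs out of VMAs, and the OOM follows. This also explains why production failed while staging succeeded: the staging heap is much smaller than the production one, so Full GC is triggered more often, VMAs are reclaimed in time, and staging narrowly escaped.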
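In summary, the main causes of this incident were:
1. The data volume doubled, and the number of mmap operations grew with it.
2. The JVM startup options included DisableExplicitGC, which disables System.gc() calls and therefore also disables the reclamation that normally happens when map fails.
3. The production heap is large, so Full GC is rare, which makes VMA reclamation even slower.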
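Since the second cause is a startup flag, an application that relies on memory-mapped files could check for it at startup. The sketch below is one possible guard, not something the original system did: the class `ExplicitGcCheck` is a hypothetical name, and it only inspects the arguments reported by `RuntimeMXBean.getInputArguments()`.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical startup guard: warns when explicit GC is disabled.
public class ExplicitGcCheck {

    /** Returns true if the last DisableExplicitGC switch in the arguments turns it on. */
    public static boolean isExplicitGcDisabled(List<String> jvmArgs) {
        boolean disabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+DisableExplicitGC")) {
                disabled = true;
            } else if (arg.equals("-XX:-DisableExplicitGC")) {
                disabled = false;
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (isExplicitGcDisabled(jvmArgs)) {
            System.err.println("Warning: -XX:+DisableExplicitGC is set; "
                    + "FileChannel.map cannot force a GC to release old mappings.");
        }
    }
}
```

A warning like this does not fix anything by itself; releasing old mappings explicitly, as in the patched mapBuffer, is still what removes the dependency on GC timing.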