java heap space: causes of heap memory overflow and how to fix it

weixin_33916256

Original link: https://my.oschina.net/wliming/blog/672964

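Causes: incorrect use of String's substring and split methods, reading files that are too large, storing too much data in a List or another collection, and so on. In one of my projects the error was thrown while log4j was writing logs, and the root cause turned out to be substring.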
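For the "file too large" cause, one common remedy is to stream the file instead of loading it into a collection first. The sketch below is an illustration, not the author's code: the class name StreamLargeFile, the helper countMatchingLines, the UTF-8 charset and the default path app.log are all hypothetical choices. It holds only one line in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamLargeFile {
    // Counts lines containing a keyword while keeping only the current line in memory,
    // instead of reading the whole file into a List<String> first.
    static long countMatchingLines(Path file, String keyword) throws IOException {
        long matches = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(keyword)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass a real log file as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "app.log");
        System.out.println(countMatchingLines(file, "ERROR"));
    }
}
```

Usage: java StreamLargeFile /path/to/your.log. Memory use stays roughly constant regardless of file size, because each line becomes garbage as soon as the next one is read.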

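Solution: first fix whatever bug in the program is causing the problem, then set the JVM memory options, for example -Xms512m -Xmx1024m (initial and maximum heap size; note there is no space between the flag and the value).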
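To check that the options actually took effect, a small program can print what the running JVM reports. This is a hedged sketch (the class name HeapSettings is made up for illustration); Runtime.maxMemory() usually reports a value close to, and sometimes slightly below, the -Xmx setting, depending on the garbage collector.

```java
public class HeapSettings {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        // maxMemory(): the most heap the JVM will try to use (roughly -Xmx).
        System.out.println("max heap (MB):   " + rt.maxMemory() / mb);
        // totalMemory(): heap currently committed; starts near -Xms and may grow toward maxMemory().
        System.out.println("total heap (MB): " + rt.totalMemory() / mb);
        // freeMemory(): unused part of the committed heap.
        System.out.println("free heap (MB):  " + rt.freeMemory() / mb);
    }
}
```

Run it as java -Xms512m -Xmx1024m HeapSettings and compare the first line with the -Xmx value.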

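The following JUnit 4 test reproduces the substring case:

import java.util.ArrayList;
import java.util.List;

import org.junit.Test;

public class SubstringTest {
	// Each instance decodes 1,000,000 zero bytes into a 1,000,000-character string.
	private String str = new String(new byte[1000000]);

	public String getStr() {
		// On JDK 6 this 2-character result still references str's entire char array.
		return str.substring(0, 2);
	}

	@Test
	public void test() {
		List<String> list = new ArrayList<String>();
		for (int i = 0; i < 10000; i++) {
			SubstringTest stringTest = new SubstringTest();
			list.add(stringTest.getStr());
		}
	}
}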

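On JDK 7 (from update 6 onward) and later, the code above has no memory problem, but on JDK 6 it fails with java heap space because of how substring is implemented there: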

public String substring(int beginIndex, int endIndex) {
  if (beginIndex < 0) {
      throw new StringIndexOutOfBoundsException(beginIndex);
  }
  if (endIndex > count) {
      throw new StringIndexOutOfBoundsException(endIndex);
  }
  if (beginIndex > endIndex) {
      throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
  }
  return ((beginIndex == 0) && (endIndex == count)) ? this :
      new String(offset + beginIndex, endIndex - beginIndex, value);
}

The constructor called on the last line is:

String(int offset, int count, char value[]) {
  this.value = value;   // no copy: the new String shares the caller's array
  this.offset = offset;
  this.count = count;
}
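The sharing in the two JDK 6 excerpts above can be modeled with a small stand-in class. This is a hypothetical sketch (SharedSliceDemo and SharedString are invented names, not JDK classes) that copies only the relevant idea: a string is a view (offset, count) over a char array, and substring creates a new view over the same array.

```java
public class SharedSliceDemo {
    // Hypothetical stand-in for the JDK 6 String layout: an (offset, count) view over a shared char[].
    static final class SharedString {
        final char[] value;
        final int offset;
        final int count;

        SharedString(char[] value, int offset, int count) {
            this.value = value;   // no copy, like the JDK 6 package-private constructor
            this.offset = offset;
            this.count = count;
        }

        // Same bounds checks as the JDK 6 excerpt; the result shares this.value.
        SharedString substring(int beginIndex, int endIndex) {
            if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
                throw new IndexOutOfBoundsException(beginIndex + ".." + endIndex);
            }
            return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        SharedString big = new SharedString(new char[1_000_000], 0, 1_000_000);
        SharedString small = big.substring(0, 2);
        // The 2-character slice still keeps the 1,000,000-element array reachable.
        System.out.println(small.count + " visible chars, backing array length " + small.value.length);
    }
}
```

Running it prints "2 visible chars, backing array length 1000000": as long as small is reachable, the whole array is too.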

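So every String created by substring keeps a reference to the original value array. Where does that array come from? For "str", it is built by this constructor: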

public String(byte bytes[], int offset, int length) {
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(bytes, offset, length);
}

which delegates to StringCoding.decode(bytes, offset, length):

static char[] decode(byte[] ba, int off, int len) {
        String csn = Charset.defaultCharset().name();
        try {
            // use charset name decode() variant which provides caching.
            return decode(csn, ba, off, len);
        } catch (UnsupportedEncodingException x) {
            warnUnsupportedCharset(csn);
        }
        try {
            return decode("ISO-8859-1", ba, off, len);
        } catch (UnsupportedEncodingException x) {
            // If this code is hit during VM initialization, MessageUtils is
            // the only way we will be able to get any kind of error message.
            MessageUtils.err("ISO-8859-1 charset not available: "
                             + x.toString());
            // If we can not find ISO-8859-1 (a required encoding) then things
            // are seriously wrong with the installation.
            System.exit(1);
            return null;
        }
    }

decode(csn, ba, off, len) ultimately turns the byte array into the value array (a char[]) of the newly created String "str". Now consider the loop in the test code above:

for (int i = 0; i < 10000; i++) {
	SubstringTest stringTest = new SubstringTest();
	list.add(stringTest.getStr());
}

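This loop creates 10,000 SubstringTest objects, and each one's "str" owns its own value array of 1,000,000 chars, so 10,000 such arrays are allocated. Each stringTest would normally become garbage after its iteration, but on JDK 6 the two-character string stored in the list still references the full array, so none of the arrays can be collected. At 2 bytes per char that is about 2 MB per array and roughly 20 GB in total, so a java heap space error is hardly surprising.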
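The arithmetic can be checked with a few lines. This rough sketch (RetainedMemoryEstimate and retainedBytes are illustrative names) assumes an ASCII-compatible charset such as ISO-8859-1, where each zero byte decodes to exactly one NUL character, and ignores array headers; it estimates what JDK 6 would keep reachable as char arrays.

```java
import java.nio.charset.StandardCharsets;

public class RetainedMemoryEstimate {
    // Bytes held by 'instances' char arrays of 'charsPerArray' chars each
    // (a Java char is 2 bytes; array headers are ignored in this rough estimate).
    static long retainedBytes(long charsPerArray, long instances) {
        return charsPerArray * Character.BYTES * instances;
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-1, so 1,000,000 zero bytes decode to 1,000,000 chars.
        String str = new String(new byte[1_000_000], StandardCharsets.ISO_8859_1);
        long total = retainedBytes(str.length(), 10_000L);
        System.out.println(str.length() + " chars per string, " + total + " bytes retained in total");
    }
}
```

It prints "1000000 chars per string, 20000000000 bytes retained in total", i.e. about 20 GB, far beyond a 1024 MB heap.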

JDK 7 (starting with update 6) changed substring:

public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

The constructor it now calls is:

 public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }
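For contrast with the sharing model, the copying behavior of the excerpt above reduces to Arrays.copyOfRange over just the requested range. This is a hypothetical sketch (CopiedSliceDemo and substringCopy are invented names) working on raw char arrays rather than the real String class.

```java
import java.util.Arrays;

public class CopiedSliceDemo {
    // Mirrors the JDK 7u6+ approach: copy only the requested range into a new array.
    static char[] substringCopy(char[] value, int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException(beginIndex + ".." + endIndex);
        }
        return Arrays.copyOfRange(value, beginIndex, endIndex);
    }

    public static void main(String[] args) {
        char[] big = new char[1_000_000];
        char[] small = substringCopy(big, 0, 2);
        // The result owns a fresh 2-element array, so 'big' can be garbage-collected.
        System.out.println("backing array length " + small.length);
    }
}
```

It prints "backing array length 2". The trade-off is that every substring call now allocates and copies, which is what the timing comparison below is about.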

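The characters are now copied instead of shared. The price is speed: copying is necessarily slower than sharing. On JDK 6 my run took 5 ms (with an adjusted version of the test, because the original runs out of heap there).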

The adjusted test is:

@Test
public void test() {
	long timeStart = System.currentTimeMillis();
	List<String> list = new ArrayList<String>();
	SubstringTest stringTest = new SubstringTest();
	for (int i = 0; i < 10000; i++) {
		list.add(stringTest.getStr());
	}
	long timeEnd = System.currentTimeMillis();
	System.out.println(timeEnd - timeStart);
}

Even though the object is now created once, outside the loop, the cost of copying is still visible.

Run time with character-array copying: 6274 ms

Note: the adjusted test above behaves the same as the following code:

@Test
public void test() {
	long timeStart = System.currentTimeMillis();
	List<String> list = new ArrayList<String>();
	for (int i = 0; i < 10000; i++) {
		list.add(this.getStr());
	}
	long timeEnd = System.currentTimeMillis();
	System.out.println(timeEnd - timeStart);
}

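Both versions use a single SubstringTest instance for all 10,000 calls, so every substring shares one 1,000,000-character array and the heap does not overflow. Another way to fix the original test is: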

private String str = new String(new byte[1000000]);

public String getStr() {
	// On JDK 6, new String(...) copies only the 2 visible characters,
	// so the result no longer references str's large array.
	return new String(str.substring(0, 2));
}

@Test
public void test() {
	long timeStart = System.currentTimeMillis();
	List<String> list = new ArrayList<String>();
	for (int i = 0; i < 10000; i++) {
		SubstringTest stringTest = new SubstringTest();
		list.add(stringTest.getStr());
	}
	long timeEnd = System.currentTimeMillis();
	System.out.println(timeEnd - timeStart);
}

Run time (on JDK 6): 42273 ms

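This is extremely slow, so I don't recommend it, although it no longer overflows the heap: the char array held by each new String has length 2. Keep in mind that this loop also constructs 10,000 SubstringTest objects, each decoding 1,000,000 bytes, so the figure does not measure the two-character copy alone.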

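In summary, JDK 6 string operations such as split, trim and subSequence share the underlying character array instead of copying it. That saves work for the JVM and is quite efficient, at the cost of the memory retention shown above.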
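When code must run on JDK 6 and keeps small pieces of large strings for a long time, a common defensive idiom is to copy those pieces explicitly. The sketch below is illustrative only: DetachTokens and keepFields are hypothetical names, and it relies on the JDK 6 String(String) constructor trimming an oversized backing array. On JDK 7u6 and later the extra copy is redundant but harmless.

```java
import java.util.ArrayList;
import java.util.List;

public class DetachTokens {
    // Hypothetical helper: keep only selected comma-separated fields of a (possibly huge) line.
    // On JDK 6, split() results share the line's char[]; new String(token) copies just the
    // token's characters, so the line itself can be garbage-collected.
    static List<String> keepFields(String line, int... indexes) {
        String[] parts = line.split(",");
        List<String> kept = new ArrayList<String>();
        for (int i : indexes) {
            kept.add(new String(parts[i]));
        }
        return kept;
    }

    public static void main(String[] args) {
        String line = "42,alice,admin,2016-05-09";
        System.out.println(keepFields(line, 0, 1)); // prints [42, alice]
    }
}
```

The cost is one small allocation per kept field, which only matters if the fields are retained long after the source string would otherwise be discarded.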
