The memory leak in Java 6's String.substring() method

substring(start, end) is used all the time in Java code, yet used carelessly it can cause a memory leak.

The best way to understand substring() is to read its source (JDK 6):

/**
 *
 * "hamburger".substring(4, 8) returns "urge"
 * "smiles".substring(1, 5) returns "mile"
 *
 *
 * @param beginIndex the beginning index, inclusive.
 * @param endIndex the ending index, exclusive.
 * @return the specified substring.
 * @exception IndexOutOfBoundsException if the
 *            beginIndex is negative, or
 *            endIndex is larger than the length of
 *            this String object, or
 *            beginIndex is larger than
 *            endIndex.
 */
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

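As an aside, this substring() source is a good example of how to write an API. Its parameter checks and exception handling reminded me of an article by 老赵 that takes a similar approach.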

Note that calling substring(i, i) (beginIndex == endIndex) or substring(stringLength) (beginIndex equal to the string's length) does not throw an exception. It returns an empty string, because the call reduces to new String(offset + beginIndex, 0, value).
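The edge cases above can be checked with a short sketch. The class and method names (SubstringEdgeCases, emptyRange, fromEnd, throwsPastEnd) are hypothetical and exist only for this illustration; the behavior shown is that of the public String API.

```java
// Hypothetical helper class illustrating the boundary behavior of substring().
class SubstringEdgeCases {

    // beginIndex == endIndex yields the empty string for any valid index.
    static String emptyRange(String s, int i) {
        return s.substring(i, i);
    }

    // beginIndex == length is still a valid index and yields the empty string.
    static String fromEnd(String s) {
        return s.substring(s.length());
    }

    // One position past the length is out of bounds and throws.
    static boolean throwsPastEnd(String s) {
        try {
            s.substring(s.length() + 1);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "hamburger";
        System.out.println(emptyRange(s, 4).isEmpty()); // true
        System.out.println(fromEnd(s).isEmpty());       // true
        System.out.println(throwsPastEnd(s));           // true
    }
}
```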

Back to the point: the string is actually created by the String(int, int, char[]) constructor, whose source is:

// Package private constructor which shares value array for speed.
String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

In Java 6, a string is in fact defined by three private fields:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence
{
    /** The value is used for character storage. */
    private final char value[];

    /** The offset is the first index of the storage that is used. */
    private final int offset;

    /** The count is the number of characters in the String. */
    private final int count;

}

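When a string is first allocated, the char array stores its characters, offset is 0, and count is the string's length. The problem is that when substring(start, end) calls the String(int, int, char[]) constructor, it obtains the substring merely by changing offset and count; the substring's value[] still points to the original string's array. Suppose the original string s occupies 1 GB and all we need is a small piece such as s.substring(1, 10). Because the substring's value[] still references the 1 GB array, that array cannot be reclaimed by the GC for as long as the substring is reachable, even after s itself is no longer used. The result is a memory leak.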
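The sharing can be made visible with a simplified model of the Java 6 layout. This is a sketch only: SharedString is a hypothetical class written for this illustration, not JDK code, and it keeps just the three fields described above.

```java
// Hypothetical model of the Java 6 String layout (not JDK code).
final class SharedString {
    final char[] value;  // backing storage, possibly shared with other instances
    final int offset;    // first index of value that belongs to this string
    final int count;     // number of characters in this string

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Java 6 style: nothing is copied, only offset and count change.
    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString big = new SharedString("hamburger");
        SharedString small = big.substring(4, 8);
        System.out.println(small);                    // urge
        System.out.println(small.value == big.value); // true: the same array
        System.out.println(small.value.length);       // 9, although count is 4
    }
}
```

In this model, as long as small is reachable the whole 9-character array stays reachable too, which is the same effect as the 1 GB case at a small scale.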

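Why was it designed this way? Because String is immutable, sharing a single character array has the following benefits:

Calling substring() does not need to copy the array and can reuse value[]; substring() also runs in constant rather than linear time, which improves performance (this is what the comment in the second code snippet means: shares value array for speed).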

How can the problem be avoided? One workaround is to use a constructor that copies the relevant part of the array:

/**
 * Initializes a newly created {@code String} object so that it represents
 * the same sequence of characters as the argument; in other words, the
 * newly created string is a copy of the argument string. Unless an
 * explicit copy of {@code original} is needed, use of this constructor is
 * unnecessary since Strings are immutable.
 *
 * @param original
 *        A {@code String}
 */
public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself. Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off + size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}

// smallStr no longer holds the 1GB value[]
String smallStr = new String(s.substring(1, 10));

This constructor copies the relevant range of the array into v and assigns v as the new string's array, which avoids the memory leak.
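The workaround can be wrapped in a small helper. This is a sketch, and SubstringCopy and detachedSubstring are hypothetical names. The trimming described above applies to Java 6; on JDKs where substring() already copies, the extra new String(...) is redundant but harmless.

```java
// Hypothetical helper wrapping the new String(s.substring(...)) idiom.
class SubstringCopy {

    // Returns the substring as a separate String object. On Java 6 the
    // String(String) constructor trims the shared backing array, so the
    // result no longer keeps the original array reachable.
    static String detachedSubstring(String s, int beginIndex, int endIndex) {
        return new String(s.substring(beginIndex, endIndex));
    }

    public static void main(String[] args) {
        String s = "hamburger";
        String small = detachedSubstring(s, 4, 8);
        System.out.println(small); // urge
    }
}
```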

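In Java 7 (starting with update 6) the String implementation changed: substring() went from sharing the array to a conventional copy, which eliminates the memory leak but also turns the running time from constant into linear: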

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > value.length) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    int subLen = endIndex - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}

/**
 * Allocates a new {@code String} that contains characters from a subarray
 * of the character array argument. The {@code offset} argument is the
 * index of the first character of the subarray and the {@code count}
 * argument specifies the length of the subarray. The contents of the
 * subarray are copied; subsequent modification of the character array does
 * not affect the newly created string.
 *
 * @param  value
 *         Array that is the source of characters
 *
 * @param  offset
 *         The initial offset
 *
 * @param  count
 *         The length
 *
 * @throws  IndexOutOfBoundsException
 *          If the {@code offset} and {@code count} arguments index
 *          characters outside the bounds of the {@code value} array
 */
public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset + count);
}

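This constructor copies the array on every call, unlike the Java 6 implementation. Which approach is better is hard to say: sharing makes substring() faster, while copying lets a large array be reclaimed once only small substrings remain.

There is reportedly a data structure called a Rope that handles strings more efficiently; it is worth a closer look.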
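The copying strategy can be modeled in the same simplified way. CopyingString is a hypothetical class written for this illustration, not JDK code; it only shows that each substring ends up with an array of its own size.

```java
import java.util.Arrays;

// Hypothetical model of the copy-based layout (not JDK code).
final class CopyingString {
    final char[] value; // always exactly as long as the string itself

    CopyingString(String s) {
        this.value = s.toCharArray();
    }

    // Copies the requested range, so the source array is not retained.
    private CopyingString(char[] source, int offset, int count) {
        this.value = Arrays.copyOfRange(source, offset, offset + count);
    }

    // Linear in the length of the substring because of the copy.
    CopyingString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new CopyingString(value, beginIndex, endIndex - beginIndex);
    }

    @Override
    public String toString() {
        return new String(value);
    }

    public static void main(String[] args) {
        CopyingString big = new CopyingString("hamburger");
        CopyingString small = big.substring(4, 8);
        System.out.println(small);                    // urge
        System.out.println(small.value == big.value); // false: separate arrays
        System.out.println(small.value.length);       // 4
    }
}
```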
