The substring() Method in JDK 6 and JDK 7

The substring(int beginIndex, int endIndex) method in JDK 6 and JDK 7 are different. Knowing the difference can help you better use them. For simplicity reasons, in the followingsubstring() represent the substring(int beginIndex, int endIndex) method.

1. What substring() does?

The substring(int beginIndex, int endIndex) method returns a string that starts with beginIndex and ends with endIndex-1.

String x = "abcdef";
x = x.substring(1,3);
System.out.println(x);
Output:

bc

2. What happens when substring() is called?

You may know that because x is immutable, when x is assigned with the result of x.substring(1,3), it points to a totally new string like the following:

string-immutability

However, this diagram is not exactly right or it represents what really happens in the heap. What really happens when substring() is called is different between JDK 6 and JDK 7.

3. substring() in JDK 6

String is supported by a char array. In JDK 6, the String class contains 3 fields: char value[], int offset, int count. They are used to store real character array, the first index of the array, the number of characters in the String.

When the substring() method is called, it creates a new string, but the string's value still points to the same array in the heap. The difference between the two Strings is their count and offset values.

string-substring-jdk6

The following code is simplified and only contains the key point for explain this problem.

在String类中存在这样三个属性:

1、value 字符数组,存储字符串实际的内容
2、offset 该字符串在字符数组value中的起始位置
3、count 字符串包含的字符的长度

//JDK 6
String(int offset, int count, char value[]) {
	this.value = value;
	this.offset = offset;
	this.count = count;
}
 
public String substring(int beginIndex, int endIndex) {
	//check boundary
	return  new String(offset + beginIndex, endIndex - beginIndex, value);
}


Java1.6中的源码实现:

 public String substring(int beginIndex, int endIndex) {
	if (beginIndex < 0) {
	    throw new StringIndexOutOfBoundsException(beginIndex);
	}
	if (endIndex > count) {
	    throw new StringIndexOutOfBoundsException(endIndex);
	}
	if (beginIndex > endIndex) {
	    throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
	}
	return ((beginIndex == 0) && (endIndex == count)) ? this :
	    new String(offset + beginIndex, endIndex - beginIndex, value);
    }


上述方法再调用本身的构造方法:

// Package private constructor which shares value array for speed.
    String(int offset, int count, char value[]) {
	this.value = value;
	this.offset = offset;
	this.count = count;
    }


当我们调用字符串a的substring得到字符串b,其实这个操作,无非就是调整了一下b的offset和count,用到的内容还是a之前的value字符数组,并没有重新创建新的专属于b的内容字符数组。


举个和上面重现代码相关的例子,比如我们有一个1G的字符串a,我们使用substring(0,2)得到了一个只有两个字符的字符串b,如果b的生命周期要长于a或者手动设置a为null,当垃圾回收进行后,a被回收掉,b没有回收掉,那么这1G的内存占用依旧存在,因为b持有这1G大小的字符数组的引用。就有可能会出现内存溢出了。


如何解决


对于之前比较不常见的1G字符串只截取2个字符的情况可以使用下面的代码,这样的话,就不会持有1G字符串的内容数组引用了


String littleString = new String(largeString.substring(0,2));


下面的这个构造方法,在源字符串内容数组长度大于字符串长度时,进行数组复制,新的字符串会创建一个只包含源字符串内容的字符数组。


public String(String original) {
  int size = original.count;
  char[] originalValue = original.value;
  char[] v;
  if (originalValue.length > size) {
      // The array representing the String is bigger than the new
      // String itself.  Perhaps this constructor is being called
      // in order to trim the baggage, so make a copy of the array.
      int off = original.offset;
      v = Arrays.copyOfRange(originalValue, off, off+size);
  } else {
      // The array representing the String is the same
      // size as the String, so no point in making a copy.
      v = originalValue;
  }
  this.offset = 0;
  this.count = size;
  this.value = v;
}


4. A problem caused by substring() in JDK 6

If you have a VERY long string, but you only need a small part each time by using substring(). This will cause a performance problem, since you need only a small part, you keep the whole thing. For JDK 6, the solution is using the following, which will make it point to a real sub string:

x = x.substring(x, y) + ""


上面这行代码其实也等价于:

StringBuffer buffer = new StringBuffer();
		buffer.append(x.substring(x, y));
		buffer.append("");
		x = buffer.toString();


如果改为:

x = new String(x.substring(x, y));

这样的话通过调用subString()方法所截取的字符串就会指向一个新的真正的子串了。


5. substring() in JDK 7

This is improved in JDK 7. In JDK 7, the substring() method actually create a new array in the heap.

string-substring-jdk7

JDK1.7中的源码实现:

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
      throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > value.length) {
      throw new StringIndexOutOfBoundsException(endIndex);
    }
    int subLen = endIndex - beginIndex;
    if (subLen < 0) {
      throw new StringIndexOutOfBoundsException(subLen);
    }
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
}

substring方法中调用的构造方法,进行内容字符数组复制。

public String(char value[], int offset, int count) {
    if (offset < 0) {
          throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
      throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
      throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}


原文出处:

http://www.programcreek.com/2013/09/the-substring-method-in-jdk-6-and-jdk-7/


评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值