Java String.substring usage

姜子牙_pp

Original article: http://www.cnblogs.com/tedzhao/archive/2012/07/31/Java_String_substring.html

The substring function in Java is one we use all the time. It extracts a substring of the current string and is declared as follows:

public final class String {
    public String substring(int beginIndex);
    public String substring(int beginIndex, int endIndex);
}

Both the declaration and the usage are very simple, but do you know the details behind them?
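As a quick reminder of the indexing rules, here is a minimal sketch of both overloads. The class name `SubstringBasics` and the sample text are made up for illustration. `beginIndex` is inclusive and `endIndex` is exclusive.

```java
// Illustrative only: SubstringBasics and the sample string are hypothetical.
class SubstringBasics {
    // Returns everything from beginIndex to the end of the string.
    static String tail(String s, int beginIndex) {
        return s.substring(beginIndex);
    }

    // Returns the characters in the half-open range [beginIndex, endIndex).
    static String slice(String s, int beginIndex, int endIndex) {
        return s.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        String text = "hello world";
        System.out.println(tail(text, 6));               // world
        System.out.println(slice(text, 0, 5));           // hello
        System.out.println(slice(text, 3, 3).isEmpty()); // true: equal indexes give ""
    }
}
```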

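Now let's look at how substring is implemented (this is the source from older JDKs such as JDK 6, where String still carries offset and count fields):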

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

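Line 12, the end of the return statement, creates a new string from three arguments: an offset, a count, and the value array (char[]) of the original String object.

Next, look at the String constructor that line 12 calls: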

String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

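This constructor is package-private, not public. Note that it holds on to the char[] passed in directly, without copying it, and only records a start position and a length.

As I understand it, the design was meant to save memory: the new string still holds a reference to the old string's value array and merely redefines the start position and the length.

The field declarations of the String class make this clearer: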

public final class String {
    /** The value is used for character storage. */
    private final char value[];

    /** The offset is the first index of the storage that is used. */
    private final int offset;

    /** The count is the number of characters in the String. */
    private final int count;

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    // ... remaining members omitted
}
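The sharing scheme can be reproduced outside the JDK. The sketch below is a hypothetical model, not the real String class: `SharedSlice`, `slice`, `compact`, and `retainedChars` are invented names. It shows that a slice created without copying keeps the entire backing array reachable, while a compacted copy does not.

```java
import java.util.Arrays;

// Hypothetical model of the sharing scheme described above; this is NOT the JDK's String.
final class SharedSlice {
    private final char[] value; // backing array, shared between slices
    private final int offset;   // index of the first character in use
    private final int count;    // number of characters in use

    SharedSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Mirrors the old substring: no copy, only a new offset and count.
    SharedSlice slice(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedSlice(value, offset + beginIndex, endIndex - beginIndex);
    }

    // Copies only the characters in use, dropping the reference to the large array.
    SharedSlice compact() {
        return new SharedSlice(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    int length() {
        return count;
    }

    // Number of chars this slice keeps reachable through its backing array.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSlice line = new SharedSlice("0123456789abcdefghijklmnopqrstuvwxyz");
        SharedSlice part = line.slice(10, 20);
        System.out.println(part + " retains " + part.retainedChars() + " chars");
        SharedSlice copy = part.compact();
        System.out.println(copy + " retains " + copy.retainedChars() + " chars");
    }
}
```

In this model `part` is 10 characters long but keeps all 36 characters of the original alive. The compacted copy keeps only its own 10.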

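Here is my use case:

I read a very large text file line by line. Each line is fairly long, and I extract a small part of it with substring, roughly 10 characters per line. There are a great many lines (the whole file can be tens of megabytes or more).

I expected memory consumption to be modest once the file had been read. To my surprise it was very high, and the program could even fail with an OutOfMemoryError.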

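// Requires: java.io.BufferedReader, java.io.FileInputStream, java.io.InputStream,
// java.io.InputStreamReader, java.util.ArrayList, java.util.List
List<String> results = new ArrayList<String>();
InputStream stream = new FileInputStream(filePath);
BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(stream));
while (true) {
    String line = bufferedReader.readLine();
    if (line == null) {
        break;
    }

    // Keeps only 10 characters, yet still references the whole line's char[]
    results.add(line.substring(10, 20));
}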

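Why? After reading the source of String's substring function and a good deal of Googling, I finally understood.

It turns out that each newly built String still holds on to the full text of its line and only adjusts the offset and the length (count). I had assumed that it would hold just the small piece of text and that the original long line would be garbage-collected, but that is not what happens.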
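To put rough numbers on this, the back-of-the-envelope sketch below compares the character data kept alive with and without sharing. The line count and line length are hypothetical figures chosen for illustration; the text above only says that the lines are long and numerous. The estimate counts 2 bytes per char and ignores object headers.

```java
// Rough estimate only; the lineCount and lineLength values used in main are assumed, not measured.
class RetainedMemoryEstimate {
    static final int BYTES_PER_CHAR = 2; // a Java char is 16 bits

    // Bytes of char data kept alive when each substring shares its line's array.
    static long sharedBytes(long lineCount, long lineLength) {
        return lineCount * lineLength * BYTES_PER_CHAR;
    }

    // Bytes of char data kept alive when each substring is an independent copy.
    static long copiedBytes(long lineCount, long extractedLength) {
        return lineCount * extractedLength * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        long lineCount = 100_000; // assumed
        long lineLength = 1_000;  // assumed
        long extracted = 10;      // matches substring(10, 20)
        System.out.println(sharedBytes(lineCount, lineLength)); // 200000000
        System.out.println(copiedBytes(lineCount, extracted));  // 2000000
    }
}
```

Under these assumed figures, sharing keeps about 200 MB of character data reachable where independent copies would need about 2 MB.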

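So what can be done? Only one statement needs to change:

List<String> results = new ArrayList<String>();
InputStream stream = new FileInputStream(filePath);
BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(stream));
while (true) {
    String line = bufferedReader.readLine();
    if (line == null) {
        break;
    }

    // Copy the substring so the result no longer references the whole line
    results.add(new String(line.substring(10, 20)));
}

Building a new String from the substring gives a result that no longer has any connection to the original string. So whenever you use substring, you must pay attention to the usage scenario.

One last reminder: the substrings returned by String's split function behave the same way.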