String Series: Key String Source Code Analysis

过粗涩

Source: http://www.cnblogs.com/xiaoxuetu/

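According to the official Java documentation, the String class represents character strings. Strings are constants: their values cannot be changed after they are created. So how does the String class actually achieve this?

The final keyword

Before exploring how String works, we should first be clear about how the final keyword is used:

1> If final modifies a class, that class cannot be inherited.

2> If final modifies a method, that method cannot be overridden.

3> If final modifies a variable, that variable can be assigned only once and cannot be reassigned afterwards. For a reference variable this fixes the reference, not the contents of the object it points to.

For the details of assignment and related operations, see the last point of "Objects and Memory Management"; I won't repeat them here.

The String class and its close ties to final

Now let's look at the String class itself. The following is only part of its source code (it comes from an older JDK in which String still carried offset and count fields):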

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence
{
    /** The value is used for character storage. */
    private final char value[]; // the character array that stores the string's characters

    /** The offset is the first index of the storage that is used. */
    private final int offset; // index in the array of the string's first character

    /** The count is the number of characters in the String. */
    private final int count; // number of characters in the string
}

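From the code above, we can see that:

1> The String class is declared final, so no other class can extend it. From a safety standpoint this protects the class: no subclass can override its methods and break its guarantees. final is also often said to help performance, because calls to a final class's methods are easier for the JVM to optimize, though any gain depends on the JVM.

2> value[], offset and count are also declared final. Each must be assigned exactly once, by the time the constructor finishes, and cannot be reassigned afterwards. Note that final only fixes the reference to the array; the contents stay unchanged because the array is private and no String method modifies it. Therefore, whenever we modify or concatenate strings, the current String object is not changed; a new object is created instead.

3> When we create a String object, it stores our string in a character array (char[]).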
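
To make the difference between a final reference and unchangeable contents concrete, here is a minimal sketch. The class FinalFieldDemo and its method names are made up for illustration and are not part of the JDK; the sketch only shows that final alone does not protect an array's elements, while String operations hand back new objects.

```java
public class FinalFieldDemo {
    // final fixes the reference, not the contents of the array it points to.
    private static final char[] LETTERS = {'a', 'b', 'c'};

    static String mutateFinalArray() {
        LETTERS[0] = 'X';             // legal: the array contents are still writable
        // LETTERS = new char[3];     // would not compile: a final variable cannot be reassigned
        return new String(LETTERS);
    }

    static boolean replaceLeavesOriginalAlone() {
        String s = "abc";
        String t = s.replace('a', 'z');   // returns a new String; s is untouched
        return s.equals("abc") && t.equals("zbc");
    }

    public static void main(String[] args) {
        System.out.println(mutateFinalArray());            // Xbc
        System.out.println(replaceLeavesOriginalAlone());  // true
    }
}
```

This is why String keeps its array private and never exposes it directly: the final modifier by itself would not be enough.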

CodePoint & CodeUnit

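1. CodePoint: one complete character is one code point, for example 'A', 'B', 'C'.

2. CodeUnit: one char is one code unit. Characters such as 'A' and 'B' use only one code unit, but a supplementary character (a code point above U+FFFF) needs two code units when represented in UTF-16.

Let's look at some code: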

public class CodePoint {
    public static void main(String[] args){
        /*
         * The intent here is a supplementary character of the Unicode table
         * (a mathematical symbol, U+1D56B). However, a unicode escape takes exactly
         * 4 hex digits, so the literal below is the character U+1D56 followed by
         * the letter B, and codePointCount reports 2 code points.
         */
        String str1="\u1D56B"; 
        /*
         * These are the two consecutive UTF-16 code units (a surrogate pair)
         * that encode U+1D56B. Although two code units are used, together
         * they represent a single Unicode code point.
         */
        String str2="\uD835\uDD6B";  

        System.out.println(str1);
        System.out.println(str1.length());  // prints 2
        System.out.println(str1.codePointCount(0,str1.length())); // prints 2
        
        System.out.println(str2);
        System.out.println(str2.length()); // prints 2
        System.out.println(str2.codePointCount(0,str2.length())); // prints 1
   }
}

From the printed results above we can see the difference between length() and codePointCount():

1> codePointCount() returns the number of code points.

2> length() returns the number of code units.
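
As a further sketch of the same distinction, the snippet below builds the supplementary character U+1D56B from its code point instead of typing the surrogate pair by hand. The class name CodePointDemo and the helper supplementary() are hypothetical; Character.toChars, codePointCount and codePointAt are standard JDK methods.

```java
public class CodePointDemo {
    // Builds a String holding the single code point U+1D56B.
    static String supplementary() {
        // Character.toChars converts a code point into its UTF-16 code units.
        return new String(Character.toChars(0x1D56B));
    }

    public static void main(String[] args) {
        String s = supplementary();
        System.out.println(s.length());                            // 2 code units
        System.out.println(s.codePointCount(0, s.length()));       // 1 code point
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1d56b
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Integer.toHexString(s.charAt(0)));      // d835
        System.out.println(Integer.toHexString(s.charAt(1)));      // dd6b
    }
}
```

Letting the JDK compute the surrogate pair avoids mistakes when the two halves are written out manually.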

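Commonly used String constructors

In practice, we rarely create String objects through constructors, because that is not recommended. You may not know, however, that

String str = "abc";  // creates only one String object

creating a String from a literal like this yields the same content as using the character-array constructor below.

char data[] = {'a', 'b', 'c'};
String str = new String(data);  // creates only one String object

The second form is too cumbersome for everyday use, so the first form is the recommended way to create String objects.

Now let's go through the commonly used constructors of the String class.

1> The no-argument constructor. The String object it creates has the value "" (the empty string, not null), which is equivalent to String str = "";. The meaning of the fields should be familiar by now; if not, look back at the code comments listed in the final section above.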

/**
 * Initializes a newly created {@code String} object so that it represents
 * an empty character sequence.  Note that use of this constructor is
 * unnecessary since Strings are immutable.
 */
public String() {
    this.offset = 0;
    this.count = 0;
    this.value = new char[0];
}

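2> Using a String object as the constructor argument. Note that creating a String this way from a literal involves two string objects (the literal itself and the new copy), so it is not recommended. For why there are two, see the post "Apprentice Advancement Series: How the JVM Handles String".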

public String(String original) {
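    int size = original.count;    // number of characters in the source string
    char[] originalValue = original.value;    // character array of the source string
    char[] v;    // character array the new string object will use
    if (originalValue.length > size) {
        int off = original.offset;    // index of the source string's first character in its array
        v = Arrays.copyOfRange(originalValue, off, off+size);    // copy only the used range into a new array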
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}

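Of all the lines in this constructor, the one readers most likely want explained is Arrays.copyOfRange(char[] original, int from, int to) and the meaning of its parameters. The official documentation describes it as follows: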

public static char[] copyOfRange(char[] original,
                                 int from,
                                 int to)
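Copies the specified range of the specified array into a new array. The initial index of the range (from) must lie between 0 and original.length, inclusive. The value at original[from] is placed into the initial element of the copy (unless from == original.length or from == to). Values from subsequent elements in the original array are placed into subsequent elements in the copy. The final index of the range (to), which must be greater than or equal to from, may be greater than original.length, in which case '\u0000' is placed in all elements of the copy whose index is greater than or equal to original.length - from. The length of the returned array will be to - from.

Parameters:
original - the array from which a range is to be copied
from - the initial index of the range to be copied, inclusive
to - the final index of the range to be copied, exclusive. (This index may lie outside the array.)

Returns:
a new array containing the specified range from the original array, truncated or padded with null characters to obtain the required length

Throws:
ArrayIndexOutOfBoundsException - if from < 0 or from > original.length
IllegalArgumentException - if from > to
NullPointerException - if original is null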
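
The rules above can be checked with a small sketch. The class CopyOfRangeDemo and its helper slice() are hypothetical names used only for this illustration; Arrays.copyOfRange itself is the real JDK method being exercised.

```java
import java.util.Arrays;

public class CopyOfRangeDemo {
    // Thin wrapper so the calls below read like the documentation.
    static char[] slice(char[] original, int from, int to) {
        return Arrays.copyOfRange(original, from, to);
    }

    public static void main(String[] args) {
        char[] src = {'h', 'e', 'l', 'l', 'o'};

        // Normal case: indices 1 (inclusive) to 4 (exclusive).
        System.out.println(new String(slice(src, 1, 4)));   // ell

        // "to" beyond the end: the copy is padded with null characters.
        char[] padded = slice(src, 3, 7);
        System.out.println(padded.length);                  // 4
        System.out.println(padded[2] == '\0');              // true

        try {
            slice(src, 6, 7);                               // from > original.length
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("from is out of bounds");
        }
        try {
            slice(src, 3, 2);                               // from > to
        } catch (IllegalArgumentException e) {
            System.out.println("from is greater than to");
        }
    }
}
```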

3> Using a character array as the constructor argument. As we saw earlier, String str = "abc" produces the same content as creating the object with this constructor.

public String(char value[]) {
    int size = value.length;
    this.offset = 0;
    this.count = size;
    this.value = Arrays.copyOf(value, size);
}

4> This constructor also takes a character array, but does not use all of it: only the count characters starting at offset become the value of the String object.

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.offset = 0;
    this.count = count;
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}
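
Both character-array constructors copy the data they are given, so later changes to the caller's array do not leak into the String. The following sketch demonstrates that through the public API only; the class CharArrayConstructorDemo and its methods are hypothetical, and the internal fields of newer JDKs differ from the listing above even though this observable behavior is the same.

```java
public class CharArrayConstructorDemo {
    // Returns {whole, part, viewOfMutatedArray} to show the defensive copy.
    static String[] build() {
        char[] data = {'a', 'b', 'c', 'd', 'e'};
        String whole = new String(data);        // copies all five characters
        String part = new String(data, 1, 3);   // copies 3 characters starting at index 1
        data[1] = 'X';                          // mutate the caller's array afterwards
        return new String[] {whole, part, new String(data)};
    }

    // offset + count beyond the array length is rejected.
    static boolean rejectsBadRange() {
        try {
            new String(new char[5], 3, 5);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] r = build();
        System.out.println(r[0]);               // abcde
        System.out.println(r[1]);               // bcd
        System.out.println(r[2]);               // aXcde
        System.out.println(rejectsBadRange());  // true
    }
}
```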

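Commonly used String methods

Here I focus on the commonly used methods of the String class and analyze their source code. How to call them and what they return is left for readers to try themselves. Let's begin.

1> public char charAt(int index)

Returns the char value at the specified index. An index ranges from 0 to length() - 1. The first char value of the sequence is at index 0, the next at index 1, and so on, as for array indexing. Here is the annotated source of the method: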

public char charAt(int index) {
    if ((index < 0) || (index >= count)) {    // if index < 0 or index >= the string length
        throw new StringIndexOutOfBoundsException(index);    // throw an out-of-bounds exception
    }
    return value[index + offset];    // the index is valid: read the character directly from the array
}

2> public String concat(String str)

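Concatenates the specified string to the end of this string. If the length of the argument string is 0, this String object is returned. Otherwise, a new String object is created that represents the character sequence formed by this object's characters followed by the argument's characters. (As noted earlier, a String is a constant whose value cannot change once created, so the only option is to create a new string and return a reference to it. This keeps strings unchangeable and safe to share.)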

public String concat(String str) {
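    int otherLen = str.length();    // length of the string to append
    if (otherLen == 0) {    // if it is empty
        return this;    // return this object directly
    }
    // Otherwise build a new string object and return it;
    // this is how String stays unchangeable once created
    char buf[] = new char[count + otherLen];    
    getChars(0, count, buf, 0);        // these two lines copy both character arrays into buf
    str.getChars(0, otherLen, buf, count);    // getChars() is covered below
    return new String(0, count + otherLen, buf);    // package-private constructor String(int offset, int count, char[] value), which wraps buf without copying it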
}
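
The two branches of concat() can be observed from the outside. In this sketch the class ConcatDemo and its helper are hypothetical names; the behavior shown (same object for an empty argument, new object otherwise) is what the method's documentation describes.

```java
public class ConcatDemo {
    // True when concat("") hands back the very same object.
    static boolean emptyArgumentReturnsSameObject(String s) {
        return s.concat("") == s;
    }

    public static void main(String[] args) {
        String a = "abc";
        String b = a.concat("def");

        System.out.println(a);                                  // abc (unchanged)
        System.out.println(b);                                  // abcdef
        System.out.println(a == b);                             // false: b is a new object
        System.out.println(emptyArgumentReturnsSameObject(a));  // true
    }
}
```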

3> public void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)

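Copies characters from this string into the destination character array. The first character to be copied is at index srcBegin; the last character to be copied is at index srcEnd-1 (so the total number of characters copied is srcEnd-srcBegin). The characters are copied into the subarray of dst starting at index dstBegin and ending at index: dstBegin + (srcEnd-srcBegin) - 1

Parameters:
srcBegin - index of the first character in the string to copy.
srcEnd - index after the last character in the string to copy.
dst - the destination array.
dstBegin - the start offset in the destination array.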

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
    // The if statements below are all sanity checks on the arguments
    if (srcBegin < 0) {
        throw new StringIndexOutOfBoundsException(srcBegin);
    }
    if (srcEnd > count) {
        throw new StringIndexOutOfBoundsException(srcEnd);
    }
    if (srcBegin > srcEnd) {
        throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
    }
    // A plain array copy
    System.arraycopy(value, offset + srcBegin, dst, dstBegin,    // copy from the source array starting at index offset + srcBegin
         srcEnd - srcBegin);                    // into dst starting at index dstBegin, for srcEnd - srcBegin characters
}
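
A short usage sketch of getChars() may help with the index arithmetic. The class GetCharsDemo and the helper copyInto() are hypothetical; the destination array is pre-filled with '-' only so that the untouched positions are visible.

```java
import java.util.Arrays;

public class GetCharsDemo {
    // Copies src[srcBegin, srcEnd) into a '-'-filled array at dstBegin.
    static String copyInto(String src, int srcBegin, int srcEnd,
                           int dstLength, int dstBegin) {
        char[] dst = new char[dstLength];
        Arrays.fill(dst, '-');
        src.getChars(srcBegin, srcEnd, dst, dstBegin);
        return new String(dst);
    }

    public static void main(String[] args) {
        // Copies "ell" (indices 1..3) into positions 2..4 of an 8-char array.
        System.out.println(copyInto("hello", 1, 4, 8, 2));   // --ell---
    }
}
```

The last index written is dstBegin + (srcEnd - srcBegin) - 1, which is 2 + 3 - 1 = 4 here.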

4> indexOf(..)

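There are several commonly used overloads of this method, such as public int indexOf(String str) and public int indexOf(String str, int fromIndex). Both return the index of the first occurrence of the specified substring within this string. The former searches the whole string, while the latter starts searching at the specified position.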

public int indexOf(String str)

public int indexOf(String str) {
    return indexOf(str, 0);
}

public int indexOf(String str, int fromIndex)

public int indexOf(String str, int fromIndex) {
     return indexOf(value, offset, count,
                       str.value, str.offset, str.count, fromIndex);
}

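Looking at the source of these two methods, we find that both are implemented on top of a package-private (default access) indexOf() method in the String class. Let's analyze it in detail.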

/**
  *
  * @param source character array of the source string
  * @param sourceOffset index in the array of the source string's first character
  * @param sourceCount length of the source string
  * @param target character array of the string to search for
  * @param targetOffset index in the array of the target string's first character
  * @param targetCount length of the string to search for
  * @param fromIndex position in the source string at which the search starts
  * @return index of the first occurrence of the target string, or -1 if it does not occur
  */
static int indexOf(char[] source, int sourceOffset, int sourceCount,
           char[] target, int targetOffset, int targetCount,
           int fromIndex) {
    if (fromIndex >= sourceCount) {    // the start position is at or beyond the end of the source
        return (targetCount == 0 ? sourceCount : -1);    // an empty target matches at the source length; otherwise -1
    }
    if (fromIndex < 0) {    // sanity correction: clamp the start index to 0
        fromIndex = 0;
    }
    if (targetCount == 0) {    // the target is the empty string
        return fromIndex;    // it matches right at the start position
    }

    char first  = target[targetOffset];    // first character of the target
    int max = sourceOffset + (sourceCount - targetCount);    // last array index at which a match can start

    for (int i = sourceOffset + fromIndex; i <= max; i++) {
        /* Look for first character. */
        if (source[i] != first) {    // advance i to the next occurrence of the target's first character
            while (++i <= max && source[i] != first);
        }

        /* Found first character, now look at the rest of v2 */
        if (i <= max) {
            int j = i + 1;    // start comparing at the character after the match
            int end = j + targetCount - 1;    // array index just past the last character to compare
            for (int k = targetOffset + 1; j < end && source[j] ==    // keep advancing while the characters agree
                 target[k]; j++, k++);

            if (j == end) {    // j reached end, so the whole target matched
                /* Found whole string. */
                return i - sourceOffset;    // convert the array index back to a string index
            }
        }
    }
    return -1;    // the source string does not contain the target
}
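
To see the same search logic without the offset bookkeeping, here is a simplified sketch. It is not the JDK implementation: naiveIndexOf and the class NaiveIndexOfDemo are hypothetical, and the method works on Strings via charAt instead of on raw arrays. It keeps the same three argument checks and the same "last possible start" bound.

```java
public class NaiveIndexOfDemo {
    // Simplified brute-force search mirroring the structure of the listing above.
    static int naiveIndexOf(String source, String target, int fromIndex) {
        int n = source.length();
        int m = target.length();
        if (fromIndex >= n) {
            return m == 0 ? n : -1;       // empty target matches at the end
        }
        if (fromIndex < 0) {
            fromIndex = 0;                // clamp negative start positions
        }
        if (m == 0) {
            return fromIndex;             // empty target matches immediately
        }
        int max = n - m;                  // last index where a match can start
        for (int i = fromIndex; i <= max; i++) {
            int k = 0;
            while (k < m && source.charAt(i + k) == target.charAt(k)) {
                k++;
            }
            if (k == m) {
                return i;                 // every character of target matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(naiveIndexOf("hello world", "o", 0));   // 4
        System.out.println(naiveIndexOf("hello world", "o", 5));   // 7
        System.out.println(naiveIndexOf("hello", "", 10));         // 5
        System.out.println(naiveIndexOf("hello", "lo", -3));       // 3
        System.out.println(naiveIndexOf("hello", "xyz", 0));       // -1
    }
}
```

For these inputs the results agree with String.indexOf(String, int), which makes the sketch easy to check against the real method.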

5> public boolean contains(CharSequence s)

Returns true if and only if this string contains the specified sequence of char values.

public boolean contains(CharSequence s) {
    return indexOf(s.toString()) > -1;
}

From the source we can see that this method is implemented with the indexOf(String str) method discussed above, so there is nothing more to add.

6> public boolean isEmpty()

Tests whether the string is empty, that is, whether its length is 0.

public boolean isEmpty() {
    return count == 0;    // a count of 0 means the string has no characters
}

7> startsWith(String prefix), endsWith(String suffix)

The former tests whether the string starts with prefix; the latter tests whether it ends with suffix.
public boolean startsWith(String prefix)

public boolean startsWith(String prefix) {
    return startsWith(prefix, 0);
}

public boolean endsWith(String suffix)

public boolean endsWith(String suffix) {
    return startsWith(suffix, count - suffix.count);
}

From the code above we can see that both methods delegate to the same method. Let's analyze it as well.

/**
 * @param prefix the string to test for
 * @param toffset index in this string at which the comparison starts
 */
public boolean startsWith(String prefix, int toffset) {
    char ta[] = value;    // character array of this string
    int to = offset + toffset;    // array index at which the comparison starts
    char pa[] = prefix.value;    // character array of prefix
    int po = prefix.offset;    
    int pc = prefix.count;
    // Note: toffset might be near -1>>>1.
    if ((toffset < 0) || (toffset > count - pc)) {    // start position < 0, or too close to the end for prefix to fit
        return false;                // prefix cannot occur at that position
    }
    while (--pc >= 0) {                // compare character by character, starting at toffset
        if (ta[to++] != pa[po++]) {
            return false;
        }
    }
    return true;
}
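
The relationship between endsWith() and the two-argument startsWith() can be reproduced with public methods only. This is a sketch: the class StartsWithDemo and the helper endsWithViaStartsWith() are hypothetical names introduced for the illustration.

```java
public class StartsWithDemo {
    // Re-creates endsWith() the way the source does: a prefix test at (length - suffixLength).
    static boolean endsWithViaStartsWith(String s, String suffix) {
        return s.startsWith(suffix, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        String name = "filename.txt";

        System.out.println(name.startsWith("name", 4));               // true: "name" begins at index 4
        System.out.println(name.startsWith("file", -1));              // false: negative offset
        System.out.println(name.endsWith(".txt"));                    // true
        System.out.println(endsWithViaStartsWith(name, ".txt"));      // true
        // A suffix longer than the string yields a negative offset, hence false.
        System.out.println(endsWithViaStartsWith("txt", name));       // false
    }
}
```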

8> copyValueOf(char data[])

More precisely, this method converts the data array into a String object.

public static String copyValueOf(char data[]) {
    return copyValueOf(data, 0, data.length);
}

9> copyValueOf(char data[], int offset, int count)

From the source we can see that this method creates and returns the String object by calling a String constructor.

public static String copyValueOf(char data[], int offset, int count) {
    // All public String constructors now copy the data.
    return new String(data, offset, count);
}

10> toCharArray()

Copies the characters of this string into a newly created array and returns that array.

public char[] toCharArray() {
    char result[] = new char[count];
    getChars(0, count, result, 0);    // copy this string's characters into the new result array
    return result;
}

11> trim()

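This method removes leading and trailing whitespace from the string (more precisely, every leading or trailing character whose code is less than or equal to ' '); whitespace in the middle is left alone.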

public String trim() {
    int len = count;
    int st = 0;
    int off = offset;      /* avoid getfield opcode */
    char[] val = value;    /* avoid getfield opcode */

    while ((st < len) && (val[off + st] <= ' ')) {
        st++;
    }
    while ((st < len) && (val[off + len - 1] <= ' ')) {
        len--;
    }
    return ((st > 0) || (len < count)) ? substring(st, len) : this;
}
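
Because the loop tests val[...] <= ' ', trim() strips tabs and newlines as well as spaces, but not wider Unicode spaces whose code is above U+0020. The sketch below illustrates this; the class TrimDemo and its helper are hypothetical names.

```java
public class TrimDemo {
    // True when trim() has nothing to strip and returns the same object.
    static boolean returnsSameObjectWhenClean(String s) {
        return s.trim() == s;
    }

    public static void main(String[] args) {
        // Tab, spaces and newline all have codes <= ' ' and are removed at both ends.
        System.out.println("[" + "\t hello world \n".trim() + "]");   // [hello world]

        // U+3000 (ideographic space) is greater than ' ', so it is kept.
        System.out.println("\u3000abc".trim().length());              // 4

        // Nothing to strip: the original object is returned.
        System.out.println(returnsSameObjectWhenClean("abc"));        // true
    }
}
```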

This code is simple, so I won't walk through it in detail; it is easy to follow on your own.

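equals() and == for the String class

equals is also one of String's commonly used methods, so why give it a section of its own? Because it is so important, and many beginners confuse it with ==. Let's look at how the two differ when comparing String objects.

== tests whether two references point to the same string object.

equals mainly tests whether the contents of two string objects are the same.

Here is a code example; running it once should make the difference clear.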

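public class StringEquals {    // class name chosen for illustration
    public static void main(String[] args) {

        String x = new String("java");    // create object x with the value "java"
        String y = new String("java");    // create object y with the value "java"

        System.out.println(x == y);        // false: == compares the references x and y
        System.out.println(x.equals(y));    // true: equals() compares the contents of x and y

        String m = "java";    // m refers to the literal "java"
        String n = "java";    // n refers to the same literal "java"

        System.out.println(m == n);        // true: == compares the references m and n
        System.out.println(m.equals(n));    // true: equals() compares the contents of m and n
    }
}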

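Why is m == n true in the second case? Mainly because of the string constant pool: identical literals refer to the same pooled object. For details, see the study notes on how Java objects are created and how the JVM handles strings.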
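
The effect of the constant pool can be explored a little further with intern(). This is a sketch only: the class InternDemo and the helper sameObject() are hypothetical names, while String.intern() is the standard method that returns the pooled instance for a given content.

```java
public class InternDemo {
    // Reference comparison, wrapped so the intent is explicit.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "java";
        String copy = new String("java");   // a distinct object with the same content
        String part = "ja";

        System.out.println(sameObject(literal, copy));            // false
        System.out.println(sameObject(literal, copy.intern()));   // true: intern() returns the pooled object
        System.out.println(sameObject(literal, "ja" + "va"));     // true: constant expression, folded at compile time
        System.out.println(sameObject(literal, part + "va"));     // false: built at run time
        System.out.println(literal.equals(part + "va"));          // true: same content
    }
}
```

In everyday code, comparing strings with equals() avoids having to reason about which of these cases applies.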