String Source Code Analysis

@从入门到入土

Column: Java Basics | Tags: Java Basics, String

Article link: https://blog.csdn.net/u011212394/article/details/85346274


Implemented Interfaces

/**
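 * Implements java.io.Serializable, marking String as serializable.
 * Implements Comparable<String>; this interface has a single method, compareTo(T o), used to order two objects.
 * Implements CharSequence, a read-only sequence of chars,
 * with methods such as length(), charAt(int index), and subSequence(int start, int end).
 * StringBuffer and StringBuilder implement this interface as well.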
 */
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {

Main Fields

    /**
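     * value[] stores the content of the String: for String str = "abc";, the characters of "abc" are stored in a char array.
     * (This is the JDK 8 source; since JDK 9, compact strings store the content in a byte[] plus a coder field.)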
     */
    private final char value[];

    /**
     * hash caches the hash code of this String instance.
     * Strings are compared very often (for example as HashMap keys), so caching the hash code avoids recomputing it on every lookup.
     */
    private int hash; // Default to 0

Nested Class

    /**
     * Singleton instance of the private static nested comparator class
     */
    public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                         = new CaseInsensitiveComparator();
    /**
     * Private static nested class
     * used to compare two strings while ignoring case
     */
    private static class CaseInsensitiveComparator
            implements Comparator<String>, java.io.Serializable {
        // use serialVersionUID from JDK 1.2.2 for interoperability
        private static final long serialVersionUID = 8575799808933029326L;

        public int compare(String s1, String s2) {
            int n1 = s1.length();
            int n2 = s2.length();
            int min = Math.min(n1, n2);
            for (int i = 0; i < min; i++) {
                char c1 = s1.charAt(i);
                char c2 = s2.charAt(i);
                if (c1 != c2) {
                    c1 = Character.toUpperCase(c1);
                    c2 = Character.toUpperCase(c2);
                    if (c1 != c2) {
                        c1 = Character.toLowerCase(c1);
                        c2 = Character.toLowerCase(c2);
                        if (c1 != c2) {
                            // No overflow because of numeric promotion
                            return c1 - c2;
                        }
                    }
                }
            }
            return n1 - n2;
        }

        /** Replaces the de-serialized object. */
        private Object readResolve() { return CASE_INSENSITIVE_ORDER; }
    }

    /**
     * String implements case-insensitive comparison by delegating to the nested comparator
     */
    public int compareToIgnoreCase(String str) {
        return CASE_INSENSITIVE_ORDER.compare(this, str);
    }
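To see the shared comparator in use, here is a minimal sketch that sorts the same list by natural order (compareTo()) and by String.CASE_INSENSITIVE_ORDER. The class CaseInsensitiveSortDemo and its helper methods are illustrative names, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CaseInsensitiveSortDemo {

    // Sorts a copy of the list by natural order, i.e. String.compareTo().
    static List<String> naturalSort(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(null); // a null comparator means "use the elements' Comparable order"
        return copy;
    }

    // Sorts a copy of the list with the shared case-insensitive comparator.
    static List<String> caseInsensitiveSort(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "Cherry", "apple");
        System.out.println(naturalSort(words));         // [Cherry, apple, banana]: 'C' (67) < 'a' (97)
        System.out.println(caseInsensitiveSort(words)); // [apple, banana, Cherry]
    }
}
```

Because CASE_INSENSITIVE_ORDER is a stateless singleton, it can be reused wherever a Comparator<String> is expected (for example as the comparator of a TreeMap) without allocating a new comparator each time.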

Constructors

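String provides many ways to create an instance, with constructors that accept a String, char[], byte[], StringBuffer, StringBuilder, and other parameter types. In essence, each one initializes the instance field value[] from its argument: String(String) shares the original's array (and copies its cached hash), the char[], StringBuilder, and StringBuffer constructors make a defensive copy with Arrays.copyOf() or Arrays.copyOfRange(), and the byte[] constructors decode the bytes into a new char[] via StringCoding.decode().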

    public String() {
        this.value = "".value;
    }

    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

    public String(byte bytes[], int offset, int length, String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null)
            throw new NullPointerException("charsetName");
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(charsetName, bytes, offset, length);
    }

    public String(StringBuilder builder) {
        this.value = Arrays.copyOf(builder.getValue(), builder.length());
    }

    public String(StringBuffer buffer) {
        synchronized(buffer) {
            this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
        }
    }

    ......
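The constructors above copy the char[], StringBuilder, or StringBuffer contents instead of keeping a reference to the caller's data. A minimal sketch, using the illustrative class name ConstructorCopyDemo, shows that mutating the source afterwards does not affect the String:

```java
public class ConstructorCopyDemo {

    // String(char[]) copies the array (Arrays.copyOf in the JDK 8 source),
    // so changing the array afterwards is not visible through the String.
    static String buildThenMutate() {
        char[] chars = {'j', 'a', 'v', 'a'};
        String s = new String(chars);
        chars[0] = 'l';
        return s;
    }

    // String(char[], offset, count) copies only the requested range.
    static String fromRange() {
        char[] chars = {'h', 'e', 'l', 'l', 'o'};
        return new String(chars, 1, 3);
    }

    // String(StringBuilder) also takes a copy of the builder's current content.
    static String fromBuilder() {
        StringBuilder sb = new StringBuilder("abc");
        String s = new String(sb);
        sb.setCharAt(0, 'x');
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // java
        System.out.println(fromRange());       // ell
        System.out.println(fromBuilder());     // abc
    }
}
```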

Common Methods

length(), isEmpty(), charAt()

Because String is implemented internally with a char[], the methods length(), isEmpty(), and charAt() simply read information from that char[].

    /**
     * Returns the length of the string, i.e. the length of the char array (the number of UTF-16 code units)
     */
    public int length() {
        return value.length;
    }

    /**
     * Returns whether the string is empty, i.e. whether the char array has length 0
     */
    public boolean isEmpty() {
        return value.length == 0;
    }

    /**
     * Returns the char at the specified index
     */
    public char charAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[index];
    }
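Since length() simply returns value.length, it counts UTF-16 code units rather than Unicode characters. The following sketch (LengthDemo and its helpers are illustrative names) shows the difference for a character outside the Basic Multilingual Plane, and checks that charAt() rejects out-of-range indices:

```java
public class LengthDemo {

    // length() returns value.length: the number of UTF-16 code units (chars),
    // which can be larger than the number of Unicode code points.
    static int charCount(String s) {
        return s.length();
    }

    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // charAt() rejects indices outside [0, length()); the JDK throws
    // StringIndexOutOfBoundsException, a subclass of IndexOutOfBoundsException.
    static boolean charAtThrows(String s, int index) {
        try {
            s.charAt(index);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // 'a' followed by U+1F600 encoded as a surrogate pair
        System.out.println(charCount(s));            // 3
        System.out.println(codePointCount(s));       // 2
        System.out.println("".isEmpty());            // true
        System.out.println(charAtThrows("abc", 3));  // true
        System.out.println(charAtThrows("abc", -1)); // true
    }
}
```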

getBytes()

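There are several ways to convert a String into a byte array: you can pass in a destination byte array, or let the method return a new one. The charset-based overloads all delegate to the static method StringCoding.encode(), and getBytes() without arguments uses the platform's default charset. The exception is the deprecated getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin), which, as its loop shows, simply casts each char to byte and therefore discards the high 8 bits of every char.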

    public void getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin) {
        if (srcBegin < 0) {
            throw new StringIndexOutOfBoundsException(srcBegin);
        }
        if (srcEnd > value.length) {
            throw new StringIndexOutOfBoundsException(srcEnd);
        }
        if (srcBegin > srcEnd) {
            throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
        }
        Objects.requireNonNull(dst);

        int j = dstBegin;
        int n = srcEnd;
        int i = srcBegin;
        char[] val = value;   /* avoid getfield opcode */

        while (i < n) {
            dst[j++] = (byte)val[i++];
        }
    }

    public byte[] getBytes(String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null) throw new NullPointerException();
        return StringCoding.encode(charsetName, value, 0, value.length);
    }

    public byte[] getBytes() {
        return StringCoding.encode(value, 0, value.length);
    }
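As a small illustration of how the charset affects the result, the sketch below encodes the same text with UTF-8 and ISO-8859-1. It uses the getBytes(Charset) overload, which exists alongside the getBytes(String charsetName) version shown above and does not throw the checked UnsupportedEncodingException. GetBytesDemo and its helpers are illustrative names.

```java
import java.nio.charset.StandardCharsets;

public class GetBytesDemo {

    // Number of bytes needed to encode s in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes produced by ISO-8859-1 (always one byte per char).
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Encoding and then decoding with the same charset restores the text.
    static boolean roundTrips(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(s);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";                // "cafe" ending with an e-acute
        System.out.println(utf8Length(cafe));     // 5: the e-acute takes 2 bytes in UTF-8
        System.out.println(latin1Length(cafe));   // 4: it fits in a single ISO-8859-1 byte
        System.out.println(utf8Length("\u4e2d")); // 3: a CJK character takes 3 bytes in UTF-8
        System.out.println(roundTrips(cafe));     // true
    }
}
```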

hashCode(), equals()

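String overrides Object's hashCode() and equals(). Both methods work on the string's content rather than its memory address: hashCode() computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] with int arithmetic, and equals() compares the two char arrays element by element. The hash is cached in the hash field after it is first computed, which is one reason String is so widely used as a Map key.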

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }
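The loop in hashCode() evaluates the polynomial above in int arithmetic, letting overflow wrap silently. A minimal sketch (HashDemo and polynomialHash are illustrative names) that reproduces it and contrasts == with equals():

```java
public class HashDemo {

    // Recomputes the hash with the loop from the source above:
    // h = 31 * h + c for each char, with int overflow wrapping silently.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String a = "abc";
        String b = new String("abc"); // distinct object with the same content
        System.out.println(polynomialHash(a));            // 96354 = 97*31*31 + 98*31 + 99
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a == b);                       // false: different objects
        System.out.println(a.equals(b));                  // true: same content
    }
}
```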

contentEquals()

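Besides equals(), there is contentEquals(), which compares content only. It is mainly used to check whether a String has the same content as a StringBuffer or StringBuilder; equals() would return false in that case because its argument is not a String. The parameter type is CharSequence, which also shows that StringBuffer and StringBuilder implement the CharSequence interface. When the argument is a StringBuffer, the comparison runs inside synchronized(cs), since StringBuffer is the thread-safe variant.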

    public boolean contentEquals(CharSequence cs) {
        // Argument is a StringBuffer, StringBuilder
        if (cs instanceof AbstractStringBuilder) {
            if (cs instanceof StringBuffer) {
                synchronized(cs) {
                   return nonSyncContentEquals((AbstractStringBuilder)cs);
                }
            } else {
                return nonSyncContentEquals((AbstractStringBuilder)cs);
            }
        }
        // Argument is a String
        if (cs instanceof String) {
            return equals(cs);
        }
        // Argument is a generic CharSequence
        char v1[] = value;
        int n = v1.length;
        if (n != cs.length()) {
            return false;
        }
        for (int i = 0; i < n; i++) {
            if (v1[i] != cs.charAt(i)) {
                return false;
            }
        }
        return true;
    }
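The sketch below exercises each branch of contentEquals() and contrasts it with equals(). ContentEqualsDemo and its helpers are illustrative names, and java.nio.CharBuffer stands in for a generic CharSequence that is neither a String nor an AbstractStringBuilder.

```java
import java.nio.CharBuffer;

public class ContentEqualsDemo {

    static boolean viaEquals(String s, Object other) {
        return s.equals(other);
    }

    static boolean viaContentEquals(String s, CharSequence other) {
        return s.contentEquals(other);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        System.out.println(viaEquals("abc", sb));                              // false: sb is not a String
        System.out.println(viaContentEquals("abc", sb));                       // true: AbstractStringBuilder branch
        System.out.println(viaContentEquals("abc", new StringBuffer("abc")));  // true: synchronized branch
        System.out.println(viaContentEquals("abc", CharBuffer.wrap("abc")));   // true: generic CharSequence branch
        System.out.println(viaContentEquals("abc", "abd"));                    // false: String branch uses equals()
    }
}
```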

compareTo(), compareToIgnoreCase()

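compareTo() is String's implementation of the method in the Comparable interface. It compares the two strings character by character in a while loop, starting from the first character. At the first pair of characters that differ, it returns their difference as an int. A return value less than 0 means anotherString is greater (this string sorts first), greater than 0 means anotherString is smaller, and 0 means the two are equal. If every character up to the length of the shorter string matches, it returns the difference in lengths, so a prefix sorts before the longer string.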

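compareToIgnoreCase() is the case-insensitive version and works on the same principle as compareTo(). It delegates to CASE_INSENSITIVE_ORDER, which, whenever two characters differ, compares them again after Character.toUpperCase() and then after Character.toLowerCase().

    public int compareTo(String anotherString) {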
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }

    public int compareToIgnoreCase(String str) {
        return CASE_INSENSITIVE_ORDER.compare(this, str);
    }
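A few concrete return values make these rules easy to check. This is a minimal sketch with an illustrative class name; the numbers follow directly from the character codes ('a' = 97, 'b' = 98, 'A' = 65).

```java
public class CompareToDemo {
    public static void main(String[] args) {
        System.out.println("apple".compareTo("banana"));      // -1: 'a' (97) - 'b' (98)
        System.out.println("abc".compareTo("abcde"));         // -2: common prefix, so 3 - 5
        System.out.println("ABC".compareTo("abc"));           // -32: 'A' (65) - 'a' (97)
        System.out.println("ABC".compareToIgnoreCase("abc")); // 0
        System.out.println("b".compareTo("a") > 0);           // true: "b" sorts after "a"
    }
}
```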

startsWith()

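startsWith() checks whether the current string begins with the given prefix. The two-argument overload starts the comparison at index toffset of the current string and returns false if toffset is negative or the prefix would run past the end.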

    public boolean startsWith(String prefix) {
        return startsWith(prefix, 0);
    }

    public boolean startsWith(String prefix, int toffset) {
        char ta[] = value;
        int to = toffset;
        char pa[] = prefix.value;
        int po = 0;
        int pc = prefix.value.length;
        // Note: toffset might be near -1>>>1.
        if ((toffset < 0) || (toffset > value.length - pc)) {
            return false;
        }
        while (--pc >= 0) {
            if (ta[to++] != pa[po++]) {
                return false;
            }
        }
        return true;
    }
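A short sketch of the boundary cases handled by startsWith(), using an illustrative class name. In the JDK 8 source, endsWith(suffix) is itself implemented as startsWith(suffix, value.length - suffix.value.length), so the same checks apply to it.

```java
public class StartsWithDemo {
    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(s.startsWith("hello"));       // true
        System.out.println(s.startsWith("world", 6));    // true: comparison starts at index 6
        System.out.println(s.startsWith(""));            // true: an empty prefix always matches
        System.out.println(s.startsWith("hello", -1));   // false: negative toffset
        System.out.println("hi".startsWith("hi there")); // false: prefix longer than the string
        System.out.println(s.endsWith("world"));         // true
    }
}
```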

indexOf()

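indexOf() returns the index of the first occurrence of the specified character or substring, or -1 if it does not occur. A negative fromIndex is treated as 0, a fromIndex at or beyond the end yields -1, and supplementary code points (characters outside the Basic Multilingual Plane) are handled separately by indexOfSupplementary().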

    public int indexOf(int ch) {
        return indexOf(ch, 0);
    }
    
    public int indexOf(int ch, int fromIndex) {
        final int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            // Note: fromIndex might be near -1>>>1.
            return -1;
        }

        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // handle most cases here (ch is a BMP code point or a
            // negative value (invalid code point))
            final char[] value = this.value;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        } else {
            return indexOfSupplementary(ch, fromIndex);
        }
    }

    public int indexOf(String str) {
        return indexOf(str, 0);
    }

    ......
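The following sketch (illustrative class name) shows how fromIndex is clamped and how a supplementary code point, stored as two chars, is found:

```java
public class IndexOfDemo {
    public static void main(String[] args) {
        String s = "banana";
        System.out.println(s.indexOf('a'));      // 1
        System.out.println(s.indexOf('a', 2));   // 3
        System.out.println(s.indexOf('a', -5));  // 1: a negative fromIndex is treated as 0
        System.out.println(s.indexOf('a', 100)); // -1: fromIndex past the end
        System.out.println(s.indexOf("nan"));    // 2
        System.out.println(s.indexOf('z'));      // -1
        // U+1F600 occupies two chars; indexOf(int) reports the index of the first one.
        System.out.println("a\uD83D\uDE00b".indexOf(0x1F600)); // 1
    }
}
```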

substring()

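substring() returns a substring of the string. The last line of each overload shows that it simply takes the begin and end indices and constructs a new String from that range, returning this only when the range covers the whole string. In the JDK 8 source, new String(value, offset, count) copies the range with Arrays.copyOfRange(), so the substring does not share the original's array.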

    public String substring(int beginIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        int subLen = value.length - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
    }

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }
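A minimal sketch of substring() with the index rules from the code above: beginIndex is inclusive, endIndex is exclusive, and beginIndex == length() yields an empty string. SubstringDemo and rejects() are illustrative names.

```java
public class SubstringDemo {

    // Returns true if substring(begin, end) rejects the indices.
    static boolean rejects(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("hamburger".substring(4, 8)); // urge
        System.out.println("unhappy".substring(2));      // happy
        System.out.println("emptiness".substring(9));    // empty line: beginIndex == length()
        System.out.println(rejects("abc", 2, 1));        // true: endIndex < beginIndex
        System.out.println(rejects("abc", 0, 4));        // true: endIndex > length()
    }
}
```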

concat()

concat() appends str to the end of the current string and returns a newly created String. If str is empty, it returns this unchanged.

    public String concat(String str) {
        int otherLen = str.length();
        if (otherLen == 0) {
            return this;
        }
        int len = value.length;
        char buf[] = Arrays.copyOf(value, len + otherLen);
        str.getChars(buf, len);
        return new String(buf, true);
    }
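The sketch below (illustrative names) shows concat() next to the + operator. One practical difference: + converts a null operand to "null", while concat() throws NullPointerException because it calls str.length() on its argument.

```java
public class ConcatDemo {

    // Returns true if a.concat(b) throws NullPointerException.
    static boolean concatRejectsNull(String a, String b) {
        try {
            a.concat(b);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("cares".concat("s"));              // caress
        System.out.println("to".concat("get").concat("her")); // together
        String s = "abc";
        System.out.println(s.concat("") == s);               // true: an empty argument returns this
        String missing = null;
        System.out.println("a" + missing);                   // anull: + converts null to "null"
        System.out.println(concatRejectsNull("a", missing)); // true: concat() calls str.length()
    }
}
```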

replace()

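replace() replaces every occurrence of oldChar in the string with newChar. It first scans for the first occurrence of oldChar; if there is none (or oldChar equals newChar), it returns this. Otherwise it copies the characters into a new char[], substituting newChar from that point on, and returns a new String.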

    public String replace(char oldChar, char newChar) {
        if (oldChar != newChar) {
            int len = value.length;
            int i = -1;
            char[] val = value; /* avoid getfield opcode */

            while (++i < len) {
                if (val[i] == oldChar) {
                    break;
                }
            }
            if (i < len) {
                char buf[] = new char[len];
                for (int j = 0; j < i; j++) {
                    buf[j] = val[j];
                }
                while (i < len) {
                    char c = val[i];
                    buf[i] = (c == oldChar) ? newChar : c;
                    i++;
                }
                return new String(buf, true);
            }
        }
        return this;
    }
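A minimal sketch of replace(char, char), plus the replace(CharSequence, CharSequence) overload, which matches literally, in contrast with replaceAll(), which treats its first argument as a regex. ReplaceDemo is an illustrative name.

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        System.out.println("mesquite in your cellar".replace('e', 'o')); // mosquito in your collar
        String s = "JonL";
        System.out.println(s.replace('q', 'x') == s);     // true: no 'q', so this is returned
        System.out.println("a.b.c".replace(".", "/"));    // a/b/c: literal match
        System.out.println("a.b.c".replaceAll(".", "/")); // /////: "." is a regex matching any char
    }
}
```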

split()

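split() splits the string around matches of the regular expression regex and stores the pieces in a new String[]. As the comment in the code explains, if regex is a single character that is not one of the metacharacters .$|()[{^?*+\ , or a backslash followed by a character that is not an ASCII letter or digit, split() takes a fast path based on indexOf(); otherwise it falls back to Pattern.compile(regex).split(this, limit). When limit is 0, trailing empty strings are removed from the result.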

    public String[] split(String regex) {
        return split(regex, 0);
    }

    public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
         (1)one-char String and this character is not one of the
            RegEx's meta characters ".$|()[{^?*+\\", or
         (2)two-char String and the first char is the backslash and
            the second is not the ascii digit or ascii letter.
         */
        char ch = 0;
        if (((regex.value.length == 1 &&
             ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last one
                    //assert (list.size() == limit - 1);
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // If no match was found, return this
            if (off == 0)
                return new String[]{this};

            // Add remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, value.length));

            // Construct result
            int resultSize = list.size();
            if (limit == 0) {
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        return Pattern.compile(regex).split(this, limit);
    }
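The sketch below (illustrative class name) shows the fast path, the regex path, and the effect of limit. Because "." is a regex metacharacter, "a.b.c".split(".") matches every character and, once trailing empty strings are removed, returns an empty array; escaping it as "\\." takes the fast path instead.

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        String s = "boo:and:foo";
        System.out.println(Arrays.toString(s.split(":")));     // [boo, and, foo]
        System.out.println(Arrays.toString(s.split(":", 2)));  // [boo, and:foo]
        System.out.println(Arrays.toString(s.split("o")));     // [b, , :and:f]: trailing empties dropped
        System.out.println(Arrays.toString(s.split("o", -1))); // [b, , :and:f, , ]: negative limit keeps them
        System.out.println("a.b.c".split(".").length);         // 0
        System.out.println(Arrays.toString("a.b.c".split("\\."))); // [a, b, c]
    }
}
```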

trim()

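trim() removes leading and trailing whitespace. More precisely, it strips every character whose code is less than or equal to ' ' (U+0020), which includes control characters such as '\t' and '\n', not only spaces. If nothing is removed, it returns this.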

    public String trim() {
        int len = value.length;
        int st = 0;
        char[] val = value;    /* avoid getfield opcode */

        while ((st < len) && (val[st] <= ' ')) {
            st++;
        }
        while ((st < len) && (val[len - 1] <= ' ')) {
            len--;
        }
        return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
    }
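Because trim() only looks at codes up to U+0020, it removes tabs and newlines but not the full-width space U+3000 that often appears in Chinese text. A minimal sketch with an illustrative class name:

```java
public class TrimDemo {
    public static void main(String[] args) {
        System.out.println("[" + " \t hello \n".trim() + "]"); // [hello]: tab and newline are <= ' '
        String fullWidth = "\u3000hello\u3000";               // wrapped in ideographic spaces (U+3000)
        System.out.println(fullWidth.trim().length());        // 7: U+3000 > ' ', so trim() keeps both
        String s = "abc";
        System.out.println(s.trim() == s);                    // true: nothing to remove, this is returned
    }
}
```

On Java 11 and later, String.strip() uses Character.isWhitespace() and does remove U+3000; the sketch sticks to trim() so that it also compiles on JDK 8.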

valueOf()

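valueOf() is a set of static methods in String and a classic example of method overloading: each overload converts an argument of a different type into a String.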

    /**
     * Converts an Object to a string ("null" if obj is null)
     */
    public static String valueOf(Object obj) {
        return (obj == null) ? "null" : obj.toString();
    }
    /**
     * Converts a boolean to a string
     */
    public static String valueOf(boolean b) {
        return b ? "true" : "false";
    }
    /**
     * Converts a char to a string
     */
    public static String valueOf(char c) {
        char data[] = {c};
        return new String(data, true);
    }
    /**
     * Converts a char array to a string (the array is copied)
     */
    public static String valueOf(char data[]) {
        return new String(data);
    }
    /**
     * Converts count elements of the char array, starting at data[offset], to a string
     */
    public static String valueOf(char data[], int offset, int count) {
        return new String(data, offset, count);
    }
    /**
     * Converts an int to a string
     */
    public static String valueOf(int i) {
        return Integer.toString(i);
    }
    /**
     * Converts a long to a string
     */
    public static String valueOf(long l) {
        return Long.toString(l);
    }
    /**
     * Converts a float to a string
     */
    public static String valueOf(float f) {
        return Float.toString(f);
    }
    /**
     * Converts a double to a string
     */
    public static String valueOf(double d) {
        return Double.toString(d);
    }
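A minimal sketch (illustrative names) of several overloads, including a common pitfall: passing a null char[] selects the char[] overload rather than valueOf(Object), so it throws instead of returning "null".

```java
public class ValueOfDemo {

    // valueOf((char[]) null) selects the char[] overload, which builds
    // new String(null) and throws NullPointerException instead of returning "null".
    static boolean nullCharArrayThrows() {
        try {
            String.valueOf((char[]) null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(String.valueOf(42));                                    // 42
        System.out.println(String.valueOf(3.0));                                   // 3.0
        System.out.println(String.valueOf(true));                                  // true
        System.out.println(String.valueOf(new char[] {'h', 'i'}));                 // hi
        System.out.println(String.valueOf(new char[] {'a', 'b', 'c', 'd'}, 1, 2)); // bc
        Object nothing = null;
        System.out.println(String.valueOf(nothing));                               // null
        System.out.println(nullCharArrayThrows());                                 // true
    }
}
```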

intern()

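intern() is a native method of String that returns the canonical copy of the string from the string constant pool: if the pool already contains a string equal to this one (per equals()), the pooled instance is returned; otherwise this string is added to the pool and a reference to it is returned.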

    public static void main(String[] args) {
        String str1 = "aaa";
        String str2 = "bbb";
        String str3 = "aaabbb";
        String str4 = str1 + str2;
        String str5 = "aaa" + "bbb";

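        System.out.println(str3 == str4); // false: str4 is built at run time, so it is a new object
        System.out.println(str3 == str4.intern()); // true: intern() returns the pooled "aaabbb"
        System.out.println(str3 == str5); // true: "aaa" + "bbb" is folded to "aaabbb" at compile time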

    }
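A related detail: concatenation is folded at compile time only when it is a constant expression, and a final local variable with a constant initializer counts as a constant. The sketch below (ConstantFoldingDemo and its methods are illustrative names) contrasts a final and a non-final operand:

```java
public class ConstantFoldingDemo {

    static boolean finalConcatIsPooled() {
        final String a = "aaa";    // constant variable: final with a constant initializer
        String folded = a + "bbb"; // constant expression, folded to "aaabbb" at compile time
        return folded == "aaabbb";
    }

    static boolean nonFinalConcatIsPooled() {
        String b = "aaa";           // not final, so not a constant variable
        String runtime = b + "bbb"; // evaluated at run time: a newly created String
        return runtime == "aaabbb";
    }

    static boolean internedIsPooled() {
        String b = "aaa";
        return (b + "bbb").intern() == "aaabbb"; // intern() returns the pooled literal
    }

    public static void main(String[] args) {
        System.out.println(finalConcatIsPooled());    // true
        System.out.println(nonFinalConcatIsPooled()); // false
        System.out.println(internedIsPooled());       // true
    }
}
```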

Why String Is Defined as final

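One reason String is designed to be immutable is security. Before certain sensitive operations, such as system-level calls, a program typically validates its arguments. If String were mutable, its value could be changed after the checks had passed, which could lead to serious problems, up to and including a system crash. This is an important reason String was made immutable.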

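Another reason is efficiency. Take the string constant pool in the JVM as an example, with these two variables:

    String s1 = "java";
    String s2 = "java";

The string constant pool, which caches strings and improves performance, can only exist because strings are immutable: s1 and s2 above refer to the same pooled object. If String were mutable, modifying s1 would also change s2, which is not what we would expect, so the pool could not work. Immutability is also what makes it safe to cache the hash code in the hash field.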
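A minimal sketch of this scenario (PoolDemo and its helpers are illustrative names): "changing" s1 rebinds it to a new String, and the pooled object that s2 refers to stays the same.

```java
public class PoolDemo {

    // Two identical literals refer to the same pooled String instance.
    static boolean literalsShared() {
        String s1 = "java";
        String s2 = "java";
        return s1 == s2;
    }

    // "Modifying" s1 creates a new String; the pooled object s2 refers to is untouched.
    static String[] modifyFirst() {
        String s1 = "java";
        String s2 = "java";
        s1 = s1.concat("!"); // rebinds s1 to a new object
        return new String[] { s1, s2 };
    }

    public static void main(String[] args) {
        System.out.println(literalsShared());  // true
        String[] r = modifyFirst();
        System.out.println(r[0] + " " + r[1]); // java! java
    }
}
```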

Summary

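Internally, String stores its content in a private final char value[], so most of its methods are built around this char array. String is also declared final, so it cannot be subclassed, and value[] is a private final array. Note that final only prevents the value reference from being reassigned; the array contents stay unchanged because String never modifies them after construction and never hands the array out to callers. Methods that appear to modify the string, such as substring(), replace(), and trim(), return a newly created String (or this when nothing needs to change), which preserves String's immutability.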