Exploring the Java Source: String


wolf小狼崽


Article link: https://blog.csdn.net/qq_30585743/article/details/89459796



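1. Preface

String, in the java.lang package, is the class for working with strings. Its methods cover the common string operations, such as getting the character at a given position with charAt(), comparing the contents of two strings with equals(), getting the length with length(), and extracting a substring with substring(). Knowing String well strengthens your Java fundamentals and saves you from reinventing the wheel. Reading its JDK source is also a chance to learn the style and habits of expert programmers and to build a feel for idiomatic, standards-conforming Java. The String source is fairly simple and easy to follow, which makes it a good entry point into the JDK source.

Source version: JDK 1.8

Viewer: IDEA

2. The String class

public final class String implements java.io.Serializable, Comparable<String>, CharSequence. String implements three interfaces, which mark it as serializable, comparable, and a character sequence respectively. String is also a final class, so it cannot be subclassed, and because its instances are immutable it is thread-safe (the final keyword can be applied to classes, methods, and variables, with a different effect in each case).

String has five fields (shown in a figure in the original post), described in turn below: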

/** The value is used for character storage. */
    private final char value[];

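char value[] is String's underlying data structure: the string is stored as an array of characters. Note that value is declared final, for the same reason String itself is a final class: to guarantee immutability from the inside. The final modifier only prevents the field from being pointed at a different array; the array's elements could still be written, so immutability also depends on value being private and on String never modifying it or handing it out. Immutability is what allows strings to be shared safely between threads and lets the hash code be cached. As a side benefit, final classes and methods cannot be overridden, which can make calls to them easier for the JVM to optimize.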
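To make the limits of final on an array field concrete, here is a minimal sketch. FinalArrayDemo and its methods leak(), copy(), and text() are hypothetical names made up for this illustration; they are not part of the JDK.

```java
final class FinalArrayDemo {
    // final fixes which array the field refers to, not the array's contents.
    private final char[] value = {'a', 'b', 'c'};

    // Hypothetical unsafe accessor: hands out the internal array itself.
    char[] leak() {
        return value;
    }

    // Safe accessor: hands out a copy, so callers cannot touch the internals.
    char[] copy() {
        return value.clone();
    }

    String text() {
        return new String(value);
    }

    public static void main(String[] args) {
        FinalArrayDemo demo = new FinalArrayDemo();
        demo.copy()[0] = 'X';              // modifies only the copy
        System.out.println(demo.text());   // internal state unchanged
        demo.leak()[0] = 'X';              // modifies the internal array
        System.out.println(demo.text());   // internal state changed despite final
    }
}
```

String avoids the problem shown by leak() by keeping value private and returning copies from methods such as toCharArray().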

/** Cache the hash code for the string */
    private int hash; // Default to 0

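int hash caches the string's hash code. It is computed on the first call to hashCode() and stored, so later calls return the cached value, and String(String original) copies it along with the array instead of recomputing it.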
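The value being cached follows the formula given in the Javadoc of String.hashCode(): s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]. The sketch below recomputes it by hand for comparison; HashCacheDemo and polynomialHash are hypothetical names used only for this illustration.

```java
final class HashCacheDemo {
    /** Recomputes the hash with the formula documented for String.hashCode(). */
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // Horner's rule; int overflow wraps as in the JDK
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(polynomialHash(s)); // hand-computed value
        System.out.println(s.hashCode());      // value computed (and cached) by String
    }
}
```

Because a String never changes, the result of this loop never changes either, which is what makes caching it in a field safe.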

/** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;

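long serialVersionUID is the serialization version ID. During deserialization it is compared against the ID recorded in the stream to check that the sender and receiver use compatible versions of the class; a mismatch causes an InvalidClassException.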

/**
     * Class String is special cased within the Serialization Stream Protocol.
     *
     * A String instance is written into an ObjectOutputStream according to
     * <a href="{@docRoot}/../platform/serialization/spec/output.html">
     * Object Serialization Specification, Section 6.2, "Stream Elements"</a>
     */
    private static final ObjectStreamField[] serialPersistentFields =
        new ObjectStreamField[0];

/**
     * A Comparator that orders {@code String} objects as by
     * {@code compareToIgnoreCase}. This comparator is serializable.
     * <p>
     * Note that this Comparator does <em>not</em> take locale into account,
     * and will result in an unsatisfactory ordering for certain locales.
     * The java.text package provides <em>Collators</em> to allow
     * locale-sensitive ordering.
     *
     * @see     java.text.Collator#compare(String, String)
     * @since   1.2
     */
    public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                         = new CaseInsensitiveComparator();

Comparator<String> CASE_INSENSITIVE_ORDER is a predefined comparator field of String that compares strings while ignoring case. In the source it is an instance of the nested class CaseInsensitiveComparator, whose implementation follows.

private static class CaseInsensitiveComparator
            implements Comparator<String>, java.io.Serializable {
        // use serialVersionUID from JDK 1.2.2 for interoperability
        private static final long serialVersionUID = 8575799808933029326L;

        public int compare(String s1, String s2) {
            int n1 = s1.length();
            int n2 = s2.length();
            int min = Math.min(n1, n2);
            for (int i = 0; i < min; i++) {
                char c1 = s1.charAt(i);
                char c2 = s2.charAt(i);
                if (c1 != c2) {
                    c1 = Character.toUpperCase(c1);
                    c2 = Character.toUpperCase(c2);
                    if (c1 != c2) {
                        c1 = Character.toLowerCase(c1);
                        c2 = Character.toLowerCase(c2);
                        if (c1 != c2) {
                            // No overflow because of numeric promotion
                            return c1 - c2;
                        }
                    }
                }
            }
            return n1 - n2;
        }

        /** Replaces the de-serialized object. */
        private Object readResolve() { return CASE_INSENSITIVE_ORDER; }
    }

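CaseInsensitiveComparator implements both Comparator and Serializable. The JDK documentation for Comparator recommends this: it is generally a good idea for comparators to also implement java.io.Serializable, as they may be used as ordering methods in serializable data structures (like TreeSet, TreeMap). In order for the data structure to serialize successfully, the comparator (if provided) must implement Serializable.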
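The readResolve() method in the listing above means that deserializing the comparator should hand back the shared CASE_INSENSITIVE_ORDER instance instead of a fresh object. The sketch below checks this with an in-memory round trip; ComparatorRoundTrip and roundTrip are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Comparator;

final class ComparatorRoundTrip {
    /** Serializes a comparator to a byte array and reads it back. */
    @SuppressWarnings("unchecked")
    static Comparator<String> roundTrip(Comparator<String> comparator) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(comparator);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Comparator<String>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Comparator<String> copy = roundTrip(String.CASE_INSENSITIVE_ORDER);
        // readResolve() replaces the deserialized object with the singleton.
        System.out.println(copy == String.CASE_INSENSITIVE_ORDER);
    }
}
```

The same round trip would fail with a NotSerializableException for a comparator that does not implement Serializable, which is the situation the JDK recommendation is meant to avoid.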

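Tip 1: Comparator declares two abstract methods, compare and equals, yet CaseInsensitiveComparator only implements compare. A class is supposed to implement every abstract method of the interfaces it implements, so is something wrong here? No. Every class is a subclass of Object, and CaseInsensitiveComparator implicitly inherits equals from Object, so the code compiles and runs without error. When writing your own Comparator, though, check whether the ordering defined by compare is consistent with equals on the elements. If it is not, sorted collections behave unexpectedly: a TreeSet decides whether two elements are duplicates by compare, not equals, so it may keep two elements that equals considers equal, or drop one of two elements that equals considers different, which defeats the purpose of a set.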
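CASE_INSENSITIVE_ORDER is itself a comparator whose ordering is not consistent with String.equals(), so it makes a convenient illustration of the point in tip 1. In the sketch below, ComparatorConsistencyDemo, treeSetSize, and hashSetSize are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class ComparatorConsistencyDemo {
    /** Number of elements kept by a TreeSet ordered by CASE_INSENSITIVE_ORDER. */
    static int treeSetSize(String... items) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(Arrays.asList(items));
        return set.size();
    }

    /** Number of elements kept by a HashSet, which relies on equals() and hashCode(). */
    static int hashSetSize(String... items) {
        Set<String> set = new HashSet<>(Arrays.asList(items));
        return set.size();
    }

    public static void main(String[] args) {
        // compare() reports 0 for these two strings, so the TreeSet keeps only one.
        System.out.println(treeSetSize("abc", "ABC"));
        // equals() reports false for them, so the HashSet keeps both.
        System.out.println(hashSetSize("abc", "ABC"));
    }
}
```

Neither result is wrong on its own terms; the point is that the two collections disagree about what counts as a duplicate whenever compare and equals disagree.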

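Tip 2: The body of compare in CaseInsensitiveComparator does something that looks odd at first: when two characters differ, it converts both to upper case, and if they still differ it converts them to lower case and compares again.

Why convert to lower case after already converting to upper case? The Javadoc for Character.toUpperCase() notes: "Note that Character.isUpperCase(Character.toUpperCase(ch)) does not always return true for some ranges of characters, particularly those that are symbols or ideographs." In other words, case conversion in Unicode is not a clean one-to-one mapping, and two characters that differ only in case do not always meet after a single conversion. A comment on the same pattern in String.regionMatches() in the JDK source says that conversion to upper case does not work properly for the Georgian alphabet, so one last check is needed. Comparing after both conversions covers these cases; that is my reading of this apparent redundancy (if you have a better explanation, leave a comment).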
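One character pair where the second conversion makes a difference is U+0130 (Latin capital letter I with dot above) and a plain lowercase i. The sketch below mirrors the per-character logic of compare from the listing above; CaseFoldDemo, compareChars, and equalAfterUpperCase are hypothetical names used only for this illustration.

```java
final class CaseFoldDemo {
    /** Mirrors the per-character test used by CaseInsensitiveComparator.compare. */
    static int compareChars(char c1, char c2) {
        if (c1 != c2) {
            c1 = Character.toUpperCase(c1);
            c2 = Character.toUpperCase(c2);
            if (c1 != c2) {
                c1 = Character.toLowerCase(c1);
                c2 = Character.toLowerCase(c2);
                if (c1 != c2) {
                    return c1 - c2;
                }
            }
        }
        return 0;
    }

    /** Upper-case-only variant, shown for contrast. */
    static boolean equalAfterUpperCase(char c1, char c2) {
        return Character.toUpperCase(c1) == Character.toUpperCase(c2);
    }

    public static void main(String[] args) {
        char dottedCapitalI = '\u0130';
        // Upper-casing alone leaves the two characters different ...
        System.out.println(equalAfterUpperCase(dottedCapitalI, 'i'));
        // ... but the extra lower-casing step maps both to a plain i.
        System.out.println(compareChars(dottedCapitalI, 'i'));
    }
}
```

With only the upper-case step, these two characters would be reported as different; the extra lower-case step is what makes the comparison treat them as the same letter.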

3. Common String methods explained

3.1. Constructors

String has many constructors. Only a few of them are covered here.

3.1.1. public String()

/**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = "".value;
    }

public String() constructs an empty string. Empty here does not mean null; it means "", a string with no characters, for which "".length() == 0.

3.1.2. public String(String original)

 /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

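public String(String original) creates a new string from an existing one by assigning the original's underlying array and hash value, the two things that characterize it, to the new string. No characters are copied: the new object shares the original's array, which is safe because the array is never modified. If you write this in IDEA, however, you get a warning that constructing the string this way is redundant. Why? Because String is immutable, creating a new string from an existing one with String(String original) gains nothing, as the source comment already says: "Unless an explicit copy of {@code original} is needed, use of this constructor is unnecessary since Strings are immutable."

The equivalences below show what happens behind the scenes for the two ways of building a string:

The JDK 1.8 documentation explains String str = "abc" as equivalent to char data[] = {'a', 'b', 'c'}; String str = new String(data);

String string = new String("abc") is equivalent to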

String original = "abc";

String s = new String(original);

This also shows why IDEA flags String string = new String("abc") as redundant: String original = "abc" has already created the string, so there is no need to create another one with new.
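The difference between the literal and the object created with new can be observed directly. In this sketch, NewStringDemo and sameObject are hypothetical names used only for this illustration.

```java
final class NewStringDemo {
    /** Reference comparison, wrapped in a method so the intent is explicit. */
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "abc";
        String copy = new String("abc");

        // new always creates a distinct object, even though the contents match.
        System.out.println(sameObject(literal, copy));
        System.out.println(literal.equals(copy));
        // intern() returns the pooled instance, which is the literal itself.
        System.out.println(sameObject(literal, copy.intern()));
    }
}
```

The extra object created by new carries no additional information, which is why the constructor is rarely needed.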

3.1.3. public String(char value[]) and public String(char value[], int offset, int count)


    /**
     * Allocates a new {@code String} so that it represents the sequence of
     * characters currently contained in the character array argument. The
     * contents of the character array are copied; subsequent modification of
     * the character array does not affect the newly created string.
     *
     * @param  value
     *         The initial value of the string
     */
    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

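public String(char value[]) calls Arrays.copyOf() to copy the array and build the string. public String(char value[], int offset, int count) validates offset and count, then calls Arrays.copyOfRange() to copy the requested range. In both cases the string owns a private copy, so later changes to the caller's array do not affect it.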
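The copy semantics and the bounds checks of these two constructors can be exercised as follows. CharArrayCtorDemo and throwsOutOfBounds are hypothetical names used only for this illustration.

```java
final class CharArrayCtorDemo {
    /** Reports whether new String(data, offset, count) rejects the given range. */
    static boolean throwsOutOfBounds(char[] data, int offset, int count) {
        try {
            new String(data, offset, count);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        char[] data = {'h', 'e', 'l', 'l', 'o'};
        String s = new String(data);
        data[0] = 'j';                                      // change the source array afterwards
        System.out.println(s);                              // the string kept its own copy
        System.out.println(new String(data, 1, 3));         // characters at indexes 1..3
        System.out.println(throwsOutOfBounds(data, 3, 5));  // range runs past the end
        System.out.println(throwsOutOfBounds(data, -1, 2)); // negative offset
    }
}
```

The defensive copy is what keeps a String immutable even though a char[] is mutable.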

3.1.4. public String(StringBuffer buffer)

 /**
     * Allocates a new string that contains the sequence of characters
     * currently contained in the string buffer argument. The contents of the
     * string buffer are copied; subsequent modification of the string buffer
     * does not affect the newly created string.
     *
     * @param  buffer
     *         A {@code StringBuffer}
     */
    public String(StringBuffer buffer) {
        synchronized(buffer) {
            this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
        }
    }



    public String(StringBuilder builder) {
        this.value = Arrays.copyOf(builder.getValue(), builder.length());
    }

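Comparing the two, public String(StringBuffer buffer) locks with synchronized(buffer). StringBuffer is thread-safe, so to stay consistent with it the designers take the buffer's lock while copying, which keeps another thread from modifying the buffer mid-copy. public String(StringBuilder builder), by contrast, takes no lock: StringBuilder is not thread-safe, so locking here would serve no purpose.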
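Both constructors copy the buffer's current contents, so the resulting string is a snapshot that later appends do not affect. A minimal sketch, in which BufferSnapshotDemo and its two helper methods are hypothetical names used only for this illustration:

```java
final class BufferSnapshotDemo {
    /** Takes a snapshot of the builder, then keeps appending to the builder. */
    static String snapshotThenAppend(StringBuilder builder, String suffix) {
        String snapshot = new String(builder); // copies the current characters
        builder.append(suffix);                // does not affect the snapshot
        return snapshot;
    }

    /** Same idea for StringBuffer; the constructor locks the buffer while copying. */
    static String snapshotThenAppend(StringBuffer buffer, String suffix) {
        String snapshot = new String(buffer);
        buffer.append(suffix);
        return snapshot;
    }

    public static void main(String[] args) {
        StringBuilder builder = new StringBuilder("ab");
        System.out.println(snapshotThenAppend(builder, "cd")); // snapshot taken before the append
        System.out.println(builder);                            // builder after the append

        StringBuffer buffer = new StringBuffer("xy");
        System.out.println(snapshotThenAppend(buffer, "z"));
        System.out.println(buffer);
    }
}
```

In everyday code, calling toString() on the builder or buffer achieves the same result and is the more common form.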

3.2. Common String methods

3.2.1. equals()

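The JDK documentation for equals() says: "Compares this string to the specified object. The result is true if and only if the argument is not null and is a String object that represents the same sequence of characters as this object." The design of equals() is simple but clear and practical. The idea is to go from coarse to fine, narrowing the comparison step by step. First it uses == to check whether the two references point to the same object. Then it checks whether the argument is comparable at all (whether it is a String), then whether the two lengths match, and finally it compares the strings character by character.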
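The steps described above can be sketched as follows. This is an illustration written against the public String API, not the JDK's own code (which reads the private value array directly); EqualsSketch and sketchEquals are hypothetical names.

```java
final class EqualsSketch {
    /** Follows the same coarse-to-fine steps as String.equals(). */
    static boolean sketchEquals(String self, Object other) {
        if (self == other) {                 // 1. same reference
            return true;
        }
        if (other instanceof String) {       // 2. comparable type (null fails here too)
            String that = (String) other;
            int n = self.length();
            if (n == that.length()) {        // 3. same length
                for (int i = 0; i < n; i++) { // 4. character by character
                    if (self.charAt(i) != that.charAt(i)) {
                        return false;
                    }
                }
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(sketchEquals("abc", new String("abc")));        // same contents
        System.out.println(sketchEquals("abc", "abd"));                    // differs at the last char
        System.out.println(sketchEquals("abc", new StringBuilder("abc"))); // not a String
        System.out.println(sketchEquals("abc", null));                     // null argument
    }
}
```

Ordering the checks from cheapest to most expensive means the character loop only runs when the reference, type, and length checks cannot already decide the answer.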