Java Basics 01 --- String in Detail

Johnson8702

Tags: java

Article link: https://blog.csdn.net/Johnson8702/article/details/120223793

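String is one of the most frequently used classes in day-to-day development, so every developer should understand its underlying mechanics well, including its characteristics and how comparison works.

This article walks through String from the perspective of its underlying implementation.

I. Source Code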

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    private static final long serialVersionUID = -6849794470754667710L;

    /**
     * Class String is special cased within the Serialization Stream Protocol.
     *
     * A String instance is written into an ObjectOutputStream according to
     * <a href="{@docRoot}/../platform/serialization/spec/output.html">
     * Object Serialization Specification, Section 6.2, "Stream Elements"</a>
     */
    private static final ObjectStreamField[] serialPersistentFields =
        new ObjectStreamField[0];

    /**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = "".value;
    }

    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }
    ......
}

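The above is part of the String source code (the listing appears to be from JDK 8). From it we can see that:

1. The String class is declared final, so it cannot be subclassed; once a String object is created, its content cannot be modified.

2. It implements the Serializable, Comparable<String>, and CharSequence interfaces.

3. The characters are stored in a private final character array, char[] value. In JDK 6 and earlier, a String also carried an offset (start index) and a count (length), and different String objects could share one array, for example helloworld, hello, and world. Those two fields do not appear in the listing above: they were removed in JDK 7u6, and since JDK 9 the storage is a byte[] plus a coder field.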
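Point 1 can be checked directly. The following is a minimal sketch (the class and method names are hypothetical, chosen only for illustration) showing that String methods return new objects and leave the receiver unchanged:

```java
// Sketch: "modifying" methods on String return a new String;
// the original object keeps its content.
class ImmutabilityDemo {

    // Calls two transforming methods and returns the original variable.
    static String unchangedAfterCalls() {
        String s = "hello";
        s.toUpperCase();     // result discarded; s is not modified
        s.concat("world");   // result discarded; s is not modified
        return s;
    }

    public static void main(String[] args) {
        String s = "hello";
        String upper = s.toUpperCase();
        System.out.println(s);                     // hello
        System.out.println(upper);                 // HELLO
        System.out.println(unchangedAfterCalls()); // hello
    }
}
```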

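II. Memory

Three areas are involved: the JVM string constant pool, the heap, and the stack.

String constant pool (part of the heap since JDK 7): holds the String objects for literals.

Heap: holds objects created with new.

Stack: holds local variables, including references to objects on the heap.

Case 1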

String s1 = "aaa";

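This creates at most one String object: the pooled "aaa" (together with its backing character array), and only if the pool does not already contain it. s1 is a reference held on the stack, not an object.

Case 2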

String s2 = new String("bbb");

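This creates up to two String objects: the pooled "bbb" (if it is not already in the pool) and the String object that new allocates on the heap. s2 is a reference to the heap object, not an object itself. In the constructor shown in the source above, the new object reuses the literal's value array.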
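The two cases can be observed through reference comparison. This is a small sketch with hypothetical class and method names; it relies only on the language rules that equal literals refer to one pooled object and that new always allocates a distinct object:

```java
// Sketch: literals share one pooled object, while new String(...) always
// creates a separate object on the heap.
class PoolDemo {

    // Two identical literals refer to the same pooled String.
    static boolean literalsShareOneObject() {
        String a = "aaa";
        String b = "aaa";
        return a == b;
    }

    // new String(...) yields a different object with equal content.
    static boolean newCreatesDistinctObject() {
        String literal = "bbb";
        String created = new String("bbb");
        return literal != created && literal.equals(created);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());   // true
        System.out.println(newCreatesDistinctObject()); // true
    }
}
```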

III. Creating Strings

1. Literal

String s1 = "aaa";

2. Using new

String s2 = new String("bbb");

3. Operations such as concatenation and substring

String s3 = s2.substring(0, 2);
String s4 = s2 + "ccc";

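IV. Common Operations

1. Equality

To check whether two strings have the same content, use the equals() method.

== does not compare content; it only tells you whether the two references point to the same object.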
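A short sketch of the difference follows. The class and the sameContent helper are hypothetical; Objects.equals is the standard java.util helper and is used here only to make the comparison null-safe:

```java
import java.util.Objects;

// Sketch: == compares references, equals() compares content.
class EqualityDemo {

    // Null-safe content comparison; true when both are null or both are equal.
    static boolean sameContent(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String x = "java";
        String y = new String("java");
        System.out.println(x == y);                  // false: different objects
        System.out.println(x.equals(y));             // true: same characters
        System.out.println(sameContent(null, x));    // false, and no NullPointerException
    }
}
```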

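2. Checking for empty and null

2.1 Empty

"" is a string of length 0; it is a real object with its own length (0) and content ("").

Check: str.length() == 0, str.isEmpty(), or str.equals("").

2.2 null

null means the variable does not refer to any object.

Check: str == null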

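3. Checking for non-null and non-empty

3.1 Plain Java

str != null && str.length() != 0

3.2 Using a utility class

!Strings.isNullOrEmpty(str)   // for example Guava's com.google.common.base.Strings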
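If no utility library is available, the same checks are easy to write by hand. This is a minimal sketch; the class and both helper names are hypothetical and only mirror the checks described above:

```java
// Sketch: hand-written null/empty checks, no third-party dependency.
class EmptyCheckDemo {

    // True when str is null or has length 0.
    static boolean isNullOrEmpty(String str) {
        return str == null || str.isEmpty();
    }

    // True when str is neither null nor empty. The null test must come first,
    // otherwise calling a method on a null reference throws NullPointerException.
    static boolean isNotEmpty(String str) {
        return str != null && str.length() != 0;
    }

    public static void main(String[] args) {
        System.out.println(isNullOrEmpty(null)); // true
        System.out.println(isNullOrEmpty(""));   // true
        System.out.println(isNotEmpty("abc"));   // true
    }
}
```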

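4. Concatenation

4.1 The "+" operator

Java gives "+" special support: a String can be joined with any other value using "+". With javac targeting Java 8 or earlier, this compiles to StringBuilder calls: each concatenation expression creates a new StringBuilder, appends the operands with append, and calls toString to produce the resulting String. (Since JDK 9, javac by default emits an invokedynamic call instead; the resulting string is the same.)

Case 1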

String s5 = "hello" + "world";
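To make the StringBuilder translation concrete, here is a sketch of a "+" expression with non-constant operands next to a hand-written equivalent. The class and method names are hypothetical, and the builder version is only an approximation of what a Java 8 era compiler generates, not its exact bytecode:

```java
// Sketch: a "+" expression and a roughly equivalent StringBuilder chain.
class ConcatDesugarDemo {

    // What the programmer writes.
    static String withPlus(String name, int count) {
        return "name=" + name + ", count=" + count;
    }

    // Approximately what the compiler produces for the method above
    // when targeting Java 8 or earlier.
    static String withBuilder(String name, int count) {
        return new StringBuilder()
                .append("name=")
                .append(name)
                .append(", count=")
                .append(count)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("a", 3));    // name=a, count=3
        System.out.println(withBuilder("a", 3)); // name=a, count=3
    }
}
```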

4.2 The append() method

Create a StringBuilder yourself, call its append method to do the concatenation, and finally call toString to get a String.

Case 1

StringBuilder stringBuilder = new StringBuilder();
stringBuilder.append("hello");
stringBuilder.append("world");
String s6 = stringBuilder.toString();

4.3 The join() method

Joins the given strings with a fixed delimiter.

Case 1

String s7 = String.join("|", "a", "b"); // returns a|b
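String.join (available since Java 8) also accepts an Iterable, which is handy when the parts are already in a collection. A brief sketch with hypothetical class and method names:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: the two overloads of String.join.
class JoinDemo {

    // Varargs overload: delimiter followed by the elements.
    static String joinVarargs() {
        return String.join("|", "a", "b", "c");
    }

    // Iterable overload: joins the elements of any collection of strings.
    static String joinList(List<String> parts) {
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        System.out.println(joinVarargs());                     // a|b|c
        System.out.println(joinList(Arrays.asList("x", "y"))); // x, y
    }
}
```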

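4.4 Efficiency of "+" versus append

For simple concatenation, "+" is convenient: the object creation, appending, and conversion happen behind the scenes and you get the concatenated string back. When concatenating many times in a loop, however, "+" is inefficient, because every iteration performs these three steps:

① a new StringBuilder object is created;

② append is called to concatenate;

③ toString is called to convert the StringBuilder into a String, which copies everything accumulated so far.

In that case, create one StringBuilder outside the loop, call append inside the loop, and call toString once after the loop finishes.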
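The recommended pattern looks like this in practice. The sketch below uses hypothetical class and method names and puts the two loop styles side by side; both return the same string, only the amount of intermediate work differs:

```java
// Sketch: repeated concatenation with "+" versus a single StringBuilder.
class LoopConcatDemo {

    // Each iteration builds a brand-new String from the previous one.
    static String repeatWithPlus(String piece, int times) {
        String s = "";
        for (int k = 0; k < times; k++) {
            s += piece;
        }
        return s;
    }

    // One StringBuilder created outside the loop, one toString at the end.
    static String repeatWithBuilder(String piece, int times) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < times; k++) {
            sb.append(piece);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(repeatWithPlus("a", 5));    // aaaaa
        System.out.println(repeatWithBuilder("a", 5)); // aaaaa
    }
}
```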

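The two approaches were timed as follows:

Experiment ①

Build 1000 strings, each by appending the fixed string "a" 1000 times in a loop, once with the "+" operator and once with StringBuilder.append; run the test 10 times and compute the average time in ms.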

for (int i = 0; i < 10; i++) {
    long t1 = System.currentTimeMillis();
    for (int j = 0; j < 1000; j++) {
        String s = "";
        for (int k = 0; k < 1000; k++) {
            s += "a";
        }
    }
    long t2 = System.currentTimeMillis();

    long t3 = System.currentTimeMillis();
    for (int j = 0; j < 1000; j++) {
        StringBuilder stringBuilder = new StringBuilder();
        for (int k = 0; k < 1000; k++) {
            stringBuilder.append("a");
        }
        String s = stringBuilder.toString();
    }
    long t4 = System.currentTimeMillis();

    System.out.println("Round " + (i + 1));
    System.out.println("+ operator time (ms): " + (t2 - t1));
    System.out.println("append time (ms): " + (t4 - t3));
    System.out.println();
}

Round	"+" operator (ms)	append (ms)
1	226	10
2	168	21
3	264	10
4	251	12
5	155	6
6	128	8
7	125	7
8	108	6
9	96	5
10	102	6
Average	162.3	9.1

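In this run, the "+" operator took about 18 times as long as append.

Experiment ②

Build a single string by appending the fixed string "a" 100000 times in a loop, once with the "+" operator and once with StringBuilder.append; run the test 10 times and compute the average time in ms.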

Round	"+" operator (ms)	append (ms)
1	3883	1
2	944	0
3	829	0
4	828	0
5	853	0
6	845	0
7	835	1
8	846	0
9	840	2
10	837	0
Average	1154	0.4

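In this run, the "+" operator took about 2885 times as long as append. The append times are close to the 1 ms resolution of the timer, so the ratio is only a rough figure.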
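The article does not list the code for experiment ②. The following is a sketch of how it could be written, assuming the same structure as experiment ①; the class and method names are hypothetical, and System.nanoTime is swapped in for finer resolution. Timings depend on the machine and JVM, so no expected numbers are given:

```java
// Sketch of experiment 2: one string, many appends, timed per approach.
class ConcatTimingSketch {

    // Appends "a" `iterations` times with "+"; returns elapsed milliseconds.
    static long timePlusMillis(int iterations) {
        long start = System.nanoTime();
        String s = "";
        for (int k = 0; k < iterations; k++) {
            s += "a";
        }
        long elapsedNanos = System.nanoTime() - start;
        checkLength(s.length(), iterations);
        return elapsedNanos / 1_000_000;
    }

    // Appends "a" `iterations` times with StringBuilder; returns elapsed milliseconds.
    static long timeAppendMillis(int iterations) {
        long start = System.nanoTime();
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < iterations; k++) {
            sb.append("a");
        }
        String s = sb.toString();
        long elapsedNanos = System.nanoTime() - start;
        checkLength(s.length(), iterations);
        return elapsedNanos / 1_000_000;
    }

    // Uses the result so the work cannot be skipped, and sanity-checks it.
    private static void checkLength(int actual, int expected) {
        if (actual != expected) {
            throw new IllegalStateException("unexpected length " + actual);
        }
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 10; round++) {
            System.out.println("Round " + round
                    + ": + took " + timePlusMillis(100_000) + " ms, "
                    + "append took " + timeAppendMillis(100_000) + " ms");
        }
    }
}
```

A harness like this has no JIT warm-up phase, which is one plausible reason the first round in the table is much slower than the rest; a benchmarking framework would give more reliable numbers.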

In summary, when strings are concatenated many times in a loop, StringBuilder.append is far more efficient.

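4.5 Special cases of "+" concatenation

① Adding string literals directly

String s = "hello" + "world";

Here the result is simply "helloworld": both operands are known at compile time, so the compiler folds them into a single constant.

② Value known at compile time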

String s1 = "ab";
final String b = "b";
String s2 = "a" + b;
System.out.println(s1 == s2); // prints true

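b is declared final and initialized with a literal, so both operands of s2 are known at compile time and the compiler produces the constant "ab" directly. s1 and s2 refer to the same pooled object, so the result is true.

③ Value not known at compile time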

String s1 = "ab";
final String b = getString();
String s2 = "a" + b;
System.out.println(s1 == s2); // prints false

// helper method, declared in the same class
public static String getString() {
    return "b";
}

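Although b is also final, its value comes from a method call, so it cannot be determined at compile time, only at run time. No folding is possible, s2 is built at run time as a new object, and s1 and s2 refer to different objects, so the result is false.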
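The three scenarios can be gathered into one runnable sketch. The class and method names are hypothetical; the behavior follows from the language rule that constant expressions are computed at compile time and interned, while other concatenations create a new String at run time:

```java
// Sketch: when "+" is folded at compile time and when it is not.
class ConstantFoldingDemo {

    // Returns "b", but only at run time; the compiler does not treat it as a constant.
    static String runtimeB() {
        return "b";
    }

    // Case 1: two literals, folded into the constant "ab".
    static boolean literalPlusLiteral() {
        String s1 = "ab";
        String s2 = "a" + "b";
        return s1 == s2;
    }

    // Case 2: a final variable initialized with a literal is a constant variable.
    static boolean literalPlusFinalConstant() {
        String s1 = "ab";
        final String b = "b";
        String s2 = "a" + b;
        return s1 == s2;
    }

    // Case 3: final, but initialized from a method call, so not a constant.
    static boolean literalPlusRuntimeValue() {
        String s1 = "ab";
        final String b = runtimeB();
        String s2 = "a" + b; // new String created at run time
        return s1 == s2;
    }

    public static void main(String[] args) {
        System.out.println(literalPlusLiteral());       // true
        System.out.println(literalPlusFinalConstant()); // true
        System.out.println(literalPlusRuntimeValue());  // false
    }
}
```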

5. Substring

5.1 substring(int beginIndex)

public String substring(int beginIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    int subLen = value.length - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
}

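Takes only a start index, no end index.

If the start index is less than 0, an exception is thrown.

The new string's length is the array length minus the start index; if that length is less than 0, an exception is thrown.

If the start index is 0, the original string itself is returned. Otherwise a new String is created from the original array, the start index, and the length. In the JDK version shown here, that constructor copies the range into a new array, so the result does not share storage with the original; sharing the array was the behavior in JDK 6 and earlier.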

5.2 substring(int beginIndex, int endIndex)

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > value.length) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    int subLen = endIndex - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}

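Takes both a start index and an end index.

If the start index is less than 0, an exception is thrown.

If the end index is greater than the array length, an exception is thrown.

The new string's length is the end index minus the start index.

If that length is less than 0, an exception is thrown.

If the start index is 0 and the end index equals the array length, the original string itself is returned. Otherwise a new String is created from the original array, the start index, and the length. As with the one-argument version, the constructor in the JDK version shown here copies the range into a new array; the result shared the original array only in JDK 6 and earlier.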
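The boundary rules above can be exercised with a few calls. This is a small sketch with hypothetical class and method names; it checks only the documented results and exceptions, not the internal array handling:

```java
// Sketch: substring results and the three out-of-bounds conditions.
class SubstringDemo {

    // Everything from beginIndex to the end of the string.
    static String tail(String s, int beginIndex) {
        return s.substring(beginIndex);
    }

    // True when substring(begin, end) rejects the indexes.
    static boolean throwsOutOfBounds(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "helloworld";
        System.out.println(tail(s, 5));                   // world
        System.out.println(s.substring(0, 5));            // hello
        System.out.println(throwsOutOfBounds(s, -1, 3));  // true: begin < 0
        System.out.println(throwsOutOfBounds(s, 0, 11));  // true: end > length
        System.out.println(throwsOutOfBounds(s, 6, 5));   // true: end < begin
    }
}
```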

6. intern

This method is less commonly seen; it returns the reference to the string's canonical instance in the string constant pool.

String s1 = "aaa";
String s2 = new String("aaa");
System.out.println(s1 == s2);  // false

s2 = s2.intern();
System.out.println(s1 == s2);  // true

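In the first comparison, s1 refers to "aaa" in the string constant pool, while s2 refers to the newly created String on the heap. They point to different objects, so the result is false.

In the second comparison, s2 has been reassigned to s2.intern(), which returns the reference to the pooled instance, that is, the "aaa" in the string constant pool. That is the same reference s1 holds, so the result is true.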
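The same holds for a string assembled at run time rather than created with new String("..."). A minimal sketch, with hypothetical class and method names:

```java
// Sketch: intern() maps a runtime-built string to the pooled instance.
class InternDemo {

    static boolean internReturnsPooledInstance() {
        String literal = "xy";                            // pooled
        String built = new String(new char[] {'x', 'y'}); // built at run time, not pooled
        return built != literal            // different objects before interning
                && built.equals(literal)   // same content
                && built.intern() == literal; // intern() returns the pooled one
    }

    public static void main(String[] args) {
        System.out.println(internReturnsPooledInstance()); // true
    }
}
```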

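V. Differences between String, StringBuilder, and StringBuffer

The main difference shows up when concatenating many times. With the "+" operator, every loop iteration creates new objects, which is slow. StringBuilder with append works on a single object, which is fast. StringBuffer is similar to StringBuilder, except that its methods are synchronized, which makes it thread-safe but somewhat slower.

For repeated concatenation in a loop, efficiency ranks: StringBuilder > StringBuffer > "+" operator.

In summary: for concatenating literals directly, or for simple one-off concatenation, the "+" operator is the most efficient choice; for repeated concatenation in a loop in a single thread, where thread safety is not a concern, StringBuilder is the most efficient; when multiple threads share the buffer and thread safety matters, use StringBuffer.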
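The thread-safety point can be illustrated with several threads appending to one shared StringBuffer. The sketch below uses hypothetical class and method names. Because every append is synchronized, no character is lost; the same code with StringBuilder carries no such guarantee:

```java
// Sketch: concurrent appends to a shared StringBuffer are not lost.
class BufferThreadSafetyDemo {

    // Starts `threads` workers that each append one character
    // `appendsPerThread` times, then returns the final length.
    static int appendFromThreads(int threads, int appendsPerThread) {
        StringBuffer buffer = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < appendsPerThread; k++) {
                    buffer.append('a'); // synchronized in StringBuffer
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait for every worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return buffer.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromThreads(4, 10_000)); // 40000
    }
}
```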

That completes this walkthrough of String's underlying logic. If anything here is imprecise, corrections are welcome!