The toCharArray Method of the String Class [Source Code Appreciation]

Latest recommended article published on 2024-06-24 13:38:53

Mr. 良爷

Article link: https://blog.csdn.net/liangcheng0523/article/details/112341776


Ramblings
JDK 1.8
- Source code
- Analysis
JDK 11 and JDK 15
- Source code
- Analysis

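Every article is the fruit of its author's labor. Respect original work and respect knowledge, starting with me. Reposting, plagiarism, close imitation, and borrowing in any form are prohibited. Thank you for your cooperation.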

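The Java source code appreciation column compares the source code of JDK 1.8, JDK 11, and JDK 15, looking at the improvements each JDK version made and the design ideas behind each piece of source.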

Ramblings

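String is Java's workhorse class for handling text. A class used this heavily deserves careful study; that is how we get full value from each of its tools and pick up better programming techniques.
The toCharArray() method returns the characters of a String, in order, in a char[] array, which is very handy.
We can already visit each character with String's own charAt(int index) method, but for those of us used to working with arrays that never feels quite right. Iterating over the characters in an array is more comfortable, and that is what toCharArray() is for.
Anyone who has read a little of the String source knows that (in JDK 1.8) the characters of a String are stored in a char[] array declared final, which makes the logic of char[] toCharArray() easy to implement. Let's take a look.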
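To make the comparison concrete, here is a minimal sketch that walks the same string twice, once through the array returned by toCharArray() and once through charAt(int). The class name ToCharArrayDemo and the vowel-counting task are made up for illustration; only toCharArray(), charAt(), length(), and indexOf(int) are real String methods.

```java
// Hypothetical demo class: counts vowels in two equivalent ways.
class ToCharArrayDemo {
    private static final String VOWELS = "aeiouAEIOU";

    // Iterate over the char[] copy returned by toCharArray().
    static int countVowelsViaArray(String s) {
        int count = 0;
        for (char c : s.toCharArray()) {
            if (VOWELS.indexOf(c) >= 0) {
                count++;
            }
        }
        return count;
    }

    // Iterate by index with charAt(int); no array copy is made.
    static int countVowelsViaCharAt(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (VOWELS.indexOf(s.charAt(i)) >= 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countVowelsViaArray("Hello, String"));  // prints 3
        System.out.println(countVowelsViaCharAt("Hello, String")); // prints 3
    }
}
```

Both loops give the same answer. The array version pays for one copy of the characters and in return lets you use the enhanced for loop and any array-based utility.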

JDK 1.8

Source code

/**
     * Converts this string to a new character array.
     *
     * @return  a newly allocated character array whose length is the length
     *          of this string and whose contents are initialized to contain
     *          the character sequence represented by this string.
     */
    public char[] toCharArray() {
        // Cannot use Arrays.copyOf because of class initialization order issues
        char result[] = new char[value.length];
        System.arraycopy(value, 0, result, 0, value.length);
        return result;
    }

Analysis

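The 1.8 implementation is simple: it copies the char array (char[]) that holds the String's character sequence into a new char array and returns it. The value here is the member variable declared private final char[] that stores the character sequence. The returned array result is brand new, so you can modify it freely; it has no connection to the char array inside the String.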
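The independence of the returned array is easy to check from the outside. The following is a small sketch; the class name CopyIndependenceDemo and the helper mutateFirst are hypothetical names used only for this illustration.

```java
// Hypothetical demo class: shows that toCharArray() hands back a fresh copy.
class CopyIndependenceDemo {
    // Overwrite the first character of the copy and build a new String from it.
    static String mutateFirst(String s) {
        char[] chars = s.toCharArray();
        if (chars.length > 0) {
            chars[0] = '#';
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String original = "java";
        String changed = mutateFirst(original);
        System.out.println(original); // prints java (the String itself is untouched)
        System.out.println(changed);  // prints #ava
        // Two calls return two distinct array objects.
        System.out.println(original.toCharArray() == original.toCharArray()); // prints false
    }
}
```

Because every call allocates a new array, calling toCharArray() repeatedly inside a hot loop costs a copy each time; taking the array once before the loop avoids that.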

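Notice the comment at the top of the method body: System.arraycopy() is used instead of Arrays.copyOf() to copy the array because of class initialization order issues. My understanding is that toCharArray() is needed while the JVM is starting up, at a point where the Arrays class may not have been initialized yet (loaded and so on) while System already has been. I will dig into this when I run into it again.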
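For ordinary application code, where class initialization order is not a concern, the two approaches produce the same result. The sketch below puts them side by side; ArrayCopyDemo and its two helper methods are hypothetical names, while System.arraycopy and Arrays.copyOf are the real JDK calls mentioned in the comment.

```java
import java.util.Arrays;

// Hypothetical demo class: two equivalent ways to copy a char[].
class ArrayCopyDemo {
    // The style used by String.toCharArray() in JDK 1.8:
    // allocate the target array, then bulk-copy into it.
    static char[] copyWithArraycopy(char[] src) {
        char[] result = new char[src.length];
        System.arraycopy(src, 0, result, 0, src.length);
        return result;
    }

    // The shorter form that the JDK comment says cannot be used inside String.
    static char[] copyWithCopyOf(char[] src) {
        return Arrays.copyOf(src, src.length);
    }

    public static void main(String[] args) {
        char[] src = {'a', 'b', 'c'};
        char[] first = copyWithArraycopy(src);
        char[] second = copyWithCopyOf(src);
        System.out.println(Arrays.equals(first, second)); // prints true
        System.out.println(first != src && second != src); // prints true
    }
}
```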

JDK 11 and JDK 15

Source code

/**
     * Converts this string to a new character array.
     *
     * @return  a newly allocated character array whose length is the length
     *          of this string and whose contents are initialized to contain
     *          the character sequence represented by this string.
     */
    public char[] toCharArray() {
        return isLatin1() ? StringLatin1.toChars(value)
                          : StringUTF16.toChars(value);
    }

	private boolean isLatin1() {
        return COMPACT_STRINGS && coder == LATIN1;
    }
	/**
     * If String compaction is disabled, the bytes in {@code value} are
     * always encoded in UTF16.
     *
     * For methods with several possible implementation paths, when String
     * compaction is disabled, only one code path is taken.
     *
     * The instance field value is generally opaque to optimizing JIT
     * compilers. Therefore, in performance-sensitive place, an explicit
     * check of the static boolean {@code COMPACT_STRINGS} is done first
     * before checking the {@code coder} field since the static boolean
     * {@code COMPACT_STRINGS} would be constant folded away by an
     * optimizing JIT compiler. The idioms for these cases are as follows.
     *
     * For code such as:
     *
     *    if (coder == LATIN1) { ... }
     *
     * can be written more optimally as
     *
     *    if (coder() == LATIN1) { ... }
     *
     * or:
     *
     *    if (COMPACT_STRINGS && coder == LATIN1) { ... }
     *
     * An optimizing JIT compiler can fold the above conditional as:
     *
     *    COMPACT_STRINGS == true  => if (coder == LATIN1) { ... }
     *    COMPACT_STRINGS == false => if (false)           { ... }
     *
     * @implNote
     * The actual value for this field is injected by JVM. The static
     * initialization block is used to set the value here to communicate
     * that this static final field is not statically foldable, and to
     * avoid any possible circular dependency during vm initialization.
     */
    static final boolean COMPACT_STRINGS;
    
	static {
        COMPACT_STRINGS = true;
    }
    /**
     * The identifier of the encoding used to encode the bytes in
     * {@code value}. The supported values in this implementation are
     *
     * LATIN1
     * UTF16
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     */
    private final byte coder;
    @Native static final byte LATIN1 = 0;
    @Native static final byte UTF16  = 1;

Analysis

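Note in particular that by JDK 11 the type of value in the String class has changed from char[] to byte[]. (The change itself arrived in JDK 9 with compact strings, JEP 254.)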

/**
     * The value is used for character storage.
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     *
     * Additionally, it is marked with {@link Stable} to trust the contents
     * of the array. No other facility in JDK provides this functionality (yet).
     * {@link Stable} is safe here, because value is never null.
     */
    @Stable
    private final byte[] value;

JDK 11 and JDK 15 implement this method identically. Let's see what was optimized relative to JDK 1.8.

First, there are now two encodings: LATIN1 and UTF16. How does the method decide which one is in use?

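It first checks the boolean constant COMPACT_STRINGS. If it is true, string compaction is enabled, so the bytes in the byte[] may be LATIN1-encoded; otherwise they are always UTF16-encoded. ("Compaction" here means compression.)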

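COMPACT_STRINGS is initialized when the class is loaded: the static initializer block sets it to true. According to the @implNote in the source, the actual value is injected by the JVM, so it can be changed from outside the class (for example, HotSpot's -XX:-CompactStrings option turns compaction off).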

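When COMPACT_STRINGS is true, the method still has to check whether coder == LATIN1. LATIN1 is a constant initialized to 0, while coder is assigned in the String constructors to 0 or 1 depending on the actual content. Because coder is declared final, its value cannot change afterwards.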

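In short, the method calls the copy routine of one of two helper classes depending on the encoding recorded in the String object (LATIN1 or UTF16). In the next article we will continue with the StringLatin1.toChars(byte[] value) and StringUTF16.toChars(byte[] value) methods.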
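As a rough mental model of this coder-based dispatch, here is a simplified sketch. It is not the JDK's code: the class CompactStringSketch and the methods chooseCoder, encode, and toChars are hypothetical, the UTF16 branch assumes big-endian byte order purely for illustration, and the real StringLatin1 and StringUTF16 implementations may differ in byte order and in how the copy is performed.

```java
// Hypothetical model of a byte[]-backed string with a one-byte coder flag.
class CompactStringSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Latin-1 is possible only if every char fits in one byte (<= 0xFF).
    static byte chooseCoder(char[] chars) {
        for (char c : chars) {
            if (c > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Pack chars into bytes: 1 byte per char for LATIN1, 2 for UTF16.
    static byte[] encode(char[] chars, byte coder) {
        if (coder == LATIN1) {
            byte[] value = new byte[chars.length];
            for (int i = 0; i < chars.length; i++) {
                value[i] = (byte) chars[i];
            }
            return value;
        }
        byte[] value = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            value[2 * i] = (byte) (chars[i] >> 8); // high byte first (assumed order)
            value[2 * i + 1] = (byte) chars[i];    // low byte second
        }
        return value;
    }

    // The dispatch: pick the decoding path from the coder, as toCharArray() does.
    static char[] toChars(byte[] value, byte coder) {
        if (coder == LATIN1) {
            char[] result = new char[value.length];
            for (int i = 0; i < value.length; i++) {
                result[i] = (char) (value[i] & 0xFF); // mask to avoid sign extension
            }
            return result;
        }
        char[] result = new char[value.length / 2];
        for (int i = 0; i < result.length; i++) {
            result[i] = (char) (((value[2 * i] & 0xFF) << 8) | (value[2 * i + 1] & 0xFF));
        }
        return result;
    }

    public static void main(String[] args) {
        char[] ascii = "abc".toCharArray();
        byte coder = chooseCoder(ascii);
        byte[] packed = encode(ascii, coder);
        System.out.println(packed.length);                      // prints 3
        System.out.println(new String(toChars(packed, coder))); // prints abc
    }
}
```

The point of the sketch is the storage saving: a string whose characters all fit in one byte needs half the bytes, and the coder flag tells the decoding side which of the two layouts it is looking at.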