The toCharArray Method of the String Class [Source Code Walkthrough]

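Every article is the product of its author's work. Please respect original work and respect knowledge, starting with yourself. Reposting, plagiarism, near-duplicate copying, and borrowing in any form are prohibited. Thank you for your cooperation.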

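The Java source code walkthrough column compares source code from JDK 1.8, JDK 11, and JDK 15, looking at what each JDK version improved and at the design ideas behind each piece of source.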

Ramblings

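String is Java's workhorse class for handling text. Because it is used so constantly, it deserves careful study: that is how we get full value out of each of its tools and pick up better programming technique along the way.
The toCharArray() method returns the characters of a String, in order, in a char[] array. It is very handy.
We can already walk through the characters with String's own charAt(int index) method, but for those of us who are used to working with arrays that never feels quite right. Being able to put every character of the string into an array and process it there is more comfortable, and that is what toCharArray() is for.
Anyone who has read a little of the String source knows that, in JDK 1.8, the characters are stored in a char[] field declared final, which makes the logic of char[] toCharArray() easy to implement. Let's take a look.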
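To make the comparison concrete, here is a small sketch of the two iteration styles. The class and method names (CharIterationDemo, countWithCharAt, countWithToCharArray) are made up for illustration; only charAt(int), length() and toCharArray() come from String itself.

```java
class CharIterationDemo {
    // Counts occurrences of target using charAt(int); no array is allocated.
    static int countWithCharAt(String s, char target) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    // Same count, but iterating over the copy returned by toCharArray().
    static int countWithToCharArray(String s, char target) {
        int count = 0;
        for (char c : s.toCharArray()) {
            if (c == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(countWithCharAt(s, 'o'));      // prints 2
        System.out.println(countWithToCharArray(s, 'l')); // prints 3
    }
}
```

Both versions give the same answer. The array version reads more naturally in a for-each loop, at the cost of one extra array allocation per call.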

JDK 1.8

Source code

/**
     * Converts this string to a new character array.
     *
     * @return  a newly allocated character array whose length is the length
     *          of this string and whose contents are initialized to contain
     *          the character sequence represented by this string.
     */
    public char[] toCharArray() {
        // Cannot use Arrays.copyOf because of class initialization order issues
        char result[] = new char[value.length];
        System.arraycopy(value, 0, result, 0, value.length);
        return result;
    }

Analysis

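The 1.8 implementation is simple: it copies the char array (char[]) that stores the String's character sequence into a new char array and returns it. Here value is the member field declared as private final char[] that holds the character sequence. The returned array result is entirely new, so you can modify it freely; it shares no storage with the char array inside the String.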
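The independence of the returned array is easy to check from the outside. The following is a minimal sketch; DefensiveCopyDemo and capitalizeViaCopy are hypothetical names used only for this demonstration.

```java
class DefensiveCopyDemo {
    // Upper-cases the first character by editing the array returned by
    // toCharArray(), then builds a new String from the edited array.
    static String capitalizeViaCopy(String s) {
        char[] chars = s.toCharArray();
        if (chars.length > 0) {
            chars[0] = Character.toUpperCase(chars[0]);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String original = "java";
        String changed = capitalizeViaCopy(original);
        System.out.println(original); // prints java (the String is untouched)
        System.out.println(changed);  // prints Java
        // Every call allocates a fresh array, so the two references differ.
        System.out.println(original.toCharArray() != original.toCharArray()); // prints true
    }
}
```

Editing the array never affects the String it came from, which is what keeps String immutable even though it hands out its contents.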

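Note the comment at the top of the method body. It says that System.arraycopy() is used instead of Arrays.copyOf() to copy the array because of class initialization order issues. My understanding is that the JVM needs String's toCharArray() during startup, at a point when the Arrays class may not have been initialized yet while System already has been. I will dig into this further when I run into it again.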
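In ordinary application code, where class initialization order is not a concern, the two copying styles should be interchangeable. The sketch below puts them side by side; ArrayCopyDemo and its two methods are illustrative names, not JDK code.

```java
import java.util.Arrays;

class ArrayCopyDemo {
    // The style used in the JDK 1.8 source: allocate first, then bulk copy.
    static char[] copyWithArraycopy(char[] src) {
        char[] result = new char[src.length];
        System.arraycopy(src, 0, result, 0, src.length);
        return result;
    }

    // The shorter equivalent that the JDK comment says it cannot use.
    static char[] copyWithCopyOf(char[] src) {
        return Arrays.copyOf(src, src.length);
    }

    public static void main(String[] args) {
        char[] src = {'a', 'b', 'c'};
        char[] a = copyWithArraycopy(src);
        char[] b = copyWithCopyOf(src);
        System.out.println(Arrays.equals(a, b)); // prints true
        System.out.println(a != src && b != src); // prints true
    }
}
```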

JDK 11 and JDK 15

Source code

/**
     * Converts this string to a new character array.
     *
     * @return  a newly allocated character array whose length is the length
     *          of this string and whose contents are initialized to contain
     *          the character sequence represented by this string.
     */
    public char[] toCharArray() {
        return isLatin1() ? StringLatin1.toChars(value)
                          : StringUTF16.toChars(value);
    }

    private boolean isLatin1() {
        return COMPACT_STRINGS && coder == LATIN1;
    }
    /**
     * If String compaction is disabled, the bytes in {@code value} are
     * always encoded in UTF16.
     *
     * For methods with several possible implementation paths, when String
     * compaction is disabled, only one code path is taken.
     *
     * The instance field value is generally opaque to optimizing JIT
     * compilers. Therefore, in performance-sensitive place, an explicit
     * check of the static boolean {@code COMPACT_STRINGS} is done first
     * before checking the {@code coder} field since the static boolean
     * {@code COMPACT_STRINGS} would be constant folded away by an
     * optimizing JIT compiler. The idioms for these cases are as follows.
     *
     * For code such as:
     *
     *    if (coder == LATIN1) { ... }
     *
     * can be written more optimally as
     *
     *    if (coder() == LATIN1) { ... }
     *
     * or:
     *
     *    if (COMPACT_STRINGS && coder == LATIN1) { ... }
     *
     * An optimizing JIT compiler can fold the above conditional as:
     *
     *    COMPACT_STRINGS == true  => if (coder == LATIN1) { ... }
     *    COMPACT_STRINGS == false => if (false)           { ... }
     *
     * @implNote
     * The actual value for this field is injected by JVM. The static
     * initialization block is used to set the value here to communicate
     * that this static final field is not statically foldable, and to
     * avoid any possible circular dependency during vm initialization.
     */
    static final boolean COMPACT_STRINGS;
    
    static {
        COMPACT_STRINGS = true;
    }
    /**
     * The identifier of the encoding used to encode the bytes in
     * {@code value}. The supported values in this implementation are
     *
     * LATIN1
     * UTF16
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     */
    private final byte coder;
    @Native static final byte LATIN1 = 0;
    @Native static final byte UTF16  = 1;

Analysis

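Note in particular that in JDK 11 the type of value in the String class is byte[] rather than char[]. The change was introduced in JDK 9 by JEP 254 (Compact Strings).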

/**
     * The value is used for character storage.
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     *
     * Additionally, it is marked with {@link Stable} to trust the contents
     * of the array. No other facility in JDK provides this functionality (yet).
     * {@link Stable} is safe here, because value is never null.
     */
    @Stable
    private final byte[] value;

JDK 11 and JDK 15 implement this method identically. Let's look at what was optimized relative to JDK 1.8.

First, there are now two encodings, LATIN1 and UTF16. How does the method decide which one is in use?

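It first checks the boolean constant COMPACT_STRINGS. If it is true, string compaction is enabled, so the bytes in the byte[] value may be encoded as LATIN1 (one byte per character). If it is false, they are always encoded as UTF16. That is the sense of "compaction" here.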

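COMPACT_STRINGS is initialized when the class is loaded: the static initializer block sets it to true. According to the @implNote, the actual value is injected by the JVM. As far as I understand, this is how it can be changed, for example by disabling compact strings with the HotSpot option -XX:-CompactStrings.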

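When COMPACT_STRINGS is true, the method still has to check whether coder == LATIN1. LATIN1 is a constant defined as 0, while coder is assigned in the String constructors and receives 0 or 1 depending on the content. Because coder is declared final, it cannot be changed afterwards.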
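How the constructors arrive at 0 or 1 can be modelled roughly as follows. This is a simplified sketch and not the JDK code: CoderSketch and chooseCoder are hypothetical names, and the rule assumed here is that a string can use LATIN1 only when every character fits in one byte (at most 0xFF).

```java
class CoderSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Hypothetical helper: picks the coder a string with these chars would get.
    // With compaction disabled the answer is always UTF16.
    static byte chooseCoder(char[] chars, boolean compactStrings) {
        if (!compactStrings) {
            return UTF16;
        }
        for (char c : chars) {
            if (c > 0xFF) {
                return UTF16; // at least one char does not fit in a single byte
            }
        }
        return LATIN1;
    }

    public static void main(String[] args) {
        System.out.println(chooseCoder("abc".toCharArray(), true));          // prints 0
        System.out.println(chooseCoder("\u4e2d\u6587".toCharArray(), true)); // prints 1
        System.out.println(chooseCoder("abc".toCharArray(), false));         // prints 1
    }
}
```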

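In short, depending on the encoding recorded for the String object (LATIN1 or UTF16), toCharArray() calls the conversion routine of one of two helper classes. In the next part we will continue with StringLatin1.toChars(byte[] value) and StringUTF16.toChars(byte[] value).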
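Before looking at the real helpers, a rough model of the dispatch may help. The sketch below is my own simplification and not the JDK implementation. ToCharsSketch and its methods are hypothetical names, and the UTF16 branch assumes big-endian byte order purely for illustration.

```java
class ToCharsSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // LATIN1: one byte per char; the mask prevents sign extension.
    static char[] latin1ToChars(byte[] value) {
        char[] dst = new char[value.length];
        for (int i = 0; i < value.length; i++) {
            dst[i] = (char) (value[i] & 0xFF);
        }
        return dst;
    }

    // UTF16: two bytes per char. Big-endian order is an assumption of this sketch.
    static char[] utf16ToChars(byte[] value) {
        char[] dst = new char[value.length / 2];
        for (int i = 0; i < dst.length; i++) {
            int hi = value[2 * i] & 0xFF;
            int lo = value[2 * i + 1] & 0xFF;
            dst[i] = (char) ((hi << 8) | lo);
        }
        return dst;
    }

    // Mirrors the shape of the ternary in toCharArray(): pick a path by coder.
    static char[] toCharArray(byte[] value, byte coder) {
        return coder == LATIN1 ? latin1ToChars(value) : utf16ToChars(value);
    }

    public static void main(String[] args) {
        char[] a = toCharArray(new byte[] {'a', (byte) 0xE9}, LATIN1);
        System.out.println((int) a[1]); // prints 233
        char[] b = toCharArray(new byte[] {0x4E, 0x2D}, UTF16);
        System.out.println(Integer.toHexString(b[0])); // prints 4e2d
    }
}
```

Either way the caller gets back a fresh char[], so code written against JDK 1.8 behaves the same even though the internal storage is different.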

Note: value is the private final member field that stores the String's character sequence. It is a char[] in JDK 1.8 and a byte[] in JDK 11 and JDK 15.
