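Every article is the product of its author's labor. Please respect original work and respect knowledge, starting with yourself. Reposting, plagiarism, near-duplicate copying, and borrowing in any form are prohibited. Thank you for your cooperation.
The Java Source Code Appreciation column picks source code from JDK 1.8, JDK 11, and JDK 15 and compares them, looking at what each JDK version improved and at the design ideas behind each piece of source.
Ramblings
String is Java's powerful "weapon" class for handling text. A class used this often deserves careful study: only then can we get full value out of each of its tools and pick up better programming techniques.
The toCharArray() method returns the characters of a String, in order, in a char[] array. It is extremely practical.
We can already walk through the characters with String's own charAt(int index) method, but for those used to working with arrays that never feels quite right. Having every character of the string in an array to iterate over and process is much more comfortable, and that is what toCharArray() is for.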
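To make the comparison concrete, here is a minimal sketch of the two iteration styles. The class and method names (CharIterationDemo, countUpperWithCharAt, countUpperWithArray) are made up for this illustration; only charAt(int) and toCharArray() come from String itself.

```java
class CharIterationDemo {
    // Counts uppercase letters by indexing into the string with charAt(int).
    static int countUpperWithCharAt(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isUpperCase(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Same count, but iterating over the array returned by toCharArray().
    static int countUpperWithArray(String s) {
        int count = 0;
        for (char c : s.toCharArray()) {
            if (Character.isUpperCase(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "Hello World";
        System.out.println(countUpperWithCharAt(s)); // 2
        System.out.println(countUpperWithArray(s));  // 2
    }
}
```

Both versions give the same answer. The array version reads more naturally in a for-each loop, at the cost of one extra array allocation per call.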
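Anyone who has read a little of the String source knows that, in JDK 1.8, the chars of a String are stored in a char[] array declared final, which makes the logic of char[] toCharArray() easy to implement. Let's take a look.
JDK 1.8
Source code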
/**
 * Converts this string to a new character array.
 *
 * @return  a newly allocated character array whose length is the length
 *          of this string and whose contents are initialized to contain
 *          the character sequence represented by this string.
 */
public char[] toCharArray() {
    // Cannot use Arrays.copyOf because of class initialization order issues
    char result[] = new char[value.length];
    System.arraycopy(value, 0, result, 0, value.length);
    return result;
}
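Analysis
The 1.8 implementation is simple: it copies the char array (char[]) that holds the String's character sequence into a new char array and returns it. Here value is the member field, declared private final char[], that stores the character sequence. The returned array result is brand new, so you can do anything you like to it without affecting the char array inside the String.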
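The independence of the returned array is easy to check from user code. The sketch below uses a hypothetical helper, replaceFirstChar, that modifies the copy and leaves the original String alone.

```java
class ToCharArrayCopyDemo {
    // Builds a new string from a modified copy of the characters of s.
    // The array returned by toCharArray() is a fresh copy, so s is unaffected.
    static String replaceFirstChar(String s, char replacement) {
        char[] chars = s.toCharArray();
        chars[0] = replacement; // changes only the copy
        return new String(chars);
    }

    public static void main(String[] args) {
        String original = "hello";
        String changed = replaceFirstChar(original, 'j');
        System.out.println(original); // hello
        System.out.println(changed);  // jello
        // Every call allocates a new array, so the references differ.
        System.out.println(original.toCharArray() != original.toCharArray()); // true
    }
}
```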
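Note the comment at the top of the method body: it says System.arraycopy() is used rather than Arrays.copyOf() to copy the array because of class initialization order issues. My understanding is that String's toCharArray() is needed while the JVM is starting up, at a point where the Arrays class may not have been initialized yet but System already has. I will dig into this when I run into it again.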
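Outside of String itself the initialization-order concern does not arise, and the two approaches give the same result. The sketch below compares them; the class and method names are invented for this example, and it says nothing about what actually happens during JVM startup.

```java
import java.util.Arrays;

class ArrayCopyDemo {
    // Mirrors the JDK 1.8 approach: allocate the target, then System.arraycopy.
    static char[] copyWithArraycopy(char[] src) {
        char[] result = new char[src.length];
        System.arraycopy(src, 0, result, 0, src.length);
        return result;
    }

    // The alternative that the JDK comment rules out inside String.
    static char[] copyWithCopyOf(char[] src) {
        return Arrays.copyOf(src, src.length);
    }

    public static void main(String[] args) {
        char[] src = {'a', 'b', 'c'};
        char[] first = copyWithArraycopy(src);
        char[] second = copyWithCopyOf(src);
        System.out.println(Arrays.equals(first, second)); // true
    }
}
```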
JDK 11 and JDK 15
Source code
/**
 * Converts this string to a new character array.
 *
 * @return  a newly allocated character array whose length is the length
 *          of this string and whose contents are initialized to contain
 *          the character sequence represented by this string.
 */
public char[] toCharArray() {
    return isLatin1() ? StringLatin1.toChars(value)
                      : StringUTF16.toChars(value);
}

private boolean isLatin1() {
    return COMPACT_STRINGS && coder == LATIN1;
}

/**
 * If String compaction is disabled, the bytes in {@code value} are
 * always encoded in UTF16.
 *
 * For methods with several possible implementation paths, when String
 * compaction is disabled, only one code path is taken.
 *
 * The instance field value is generally opaque to optimizing JIT
 * compilers. Therefore, in performance-sensitive place, an explicit
 * check of the static boolean {@code COMPACT_STRINGS} is done first
 * before checking the {@code coder} field since the static boolean
 * {@code COMPACT_STRINGS} would be constant folded away by an
 * optimizing JIT compiler. The idioms for these cases are as follows.
 *
 * For code such as:
 *
 *    if (coder == LATIN1) { ... }
 *
 * can be written more optimally as
 *
 *    if (coder() == LATIN1) { ... }
 *
 * or:
 *
 *    if (COMPACT_STRINGS && coder == LATIN1) { ... }
 *
 * An optimizing JIT compiler can fold the above conditional as:
 *
 *    COMPACT_STRINGS == true  => if (coder == LATIN1) { ... }
 *    COMPACT_STRINGS == false => if (false)           { ... }
 *
 * @implNote
 * The actual value for this field is injected by JVM. The static
 * initialization block is used to set the value here to communicate
 * that this static final field is not statically foldable, and to
 * avoid any possible circular dependency during vm initialization.
 */
static final boolean COMPACT_STRINGS;

static {
    COMPACT_STRINGS = true;
}

/**
 * The identifier of the encoding used to encode the bytes in
 * {@code value}. The supported values in this implementation are
 *
 * LATIN1
 * UTF16
 *
 * @implNote This field is trusted by the VM, and is a subject to
 * constant folding if String instance is constant. Overwriting this
 * field after construction will cause problems.
 */
private final byte coder;

@Native static final byte LATIN1 = 0;
@Native static final byte UTF16  = 1;
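Analysis
Pay particular attention to the type of value in the String class: it changed from char[] to byte[]. The change arrived with compact strings in JDK 9 (JEP 254), so it is already in place in JDK 11 and JDK 15.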
/**
 * The value is used for character storage.
 *
 * @implNote This field is trusted by the VM, and is a subject to
 * constant folding if String instance is constant. Overwriting this
 * field after construction will cause problems.
 *
 * Additionally, it is marked with {@link Stable} to trust the contents
 * of the array. No other facility in JDK provides this functionality (yet).
 * {@link Stable} is safe here, because value is never null.
 */
@Stable
private final byte[] value;
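To see why a byte[] can save space, here is a simplified model of the storage cost. It is not the JDK's code: fitsInLatin1 and modelByteLength are hypothetical helpers, and the model assumes that a string is stored with one byte per character only when every char is at most 0xFF, and with two bytes per character otherwise.

```java
class CompactStorageDemo {
    // True if every char fits in a single byte (0x00..0xFF), meaning the
    // text could be stored as LATIN1 under this simplified model.
    static boolean fitsInLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Bytes needed under the model: 1 per char for LATIN1, 2 per char for UTF16.
    static int modelByteLength(String s) {
        return fitsInLatin1(s) ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        String ascii = "hello";           // 5 chars, all below 0x100
        String chinese = "\u4F60\u597D";  // 2 chars, both above 0xFF
        System.out.println(modelByteLength(ascii));   // 5
        System.out.println(modelByteLength(chinese)); // 4
    }
}
```

Under the old char[] layout the ASCII string would have taken 10 bytes of character data, so text that is mostly Latin-1 is where the byte[] layout pays off.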
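JDK 11 and JDK 15 implement this method identically. Let's look at what they optimized relative to JDK 1.8.
First, there are now two encodings, LATIN1 and UTF16. How does the code decide which one is in use?
It first checks the boolean constant COMPACT_STRINGS. If it is true, string compaction is enabled, so the bytes in byte[] value may be encoded as LATIN1 (one byte per character); if it is false, they are always UTF16 (two bytes per character). That is where the word "compaction" comes from.
COMPACT_STRINGS is assigned in the static initializer block shown above, which sets it to true. According to the @implNote, the actual value is injected by the JVM; as far as I understand, that is how it can be changed, for example with the HotSpot option -XX:-CompactStrings.
When COMPACT_STRINGS is true, coder == LATIN1 still has to be checked. LATIN1 is the constant 0, while coder is assigned in the String constructors to 0 or 1 depending on the actual content. Because coder is declared final, it cannot be changed afterwards.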
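The check described above can be sketched in user code as follows. This is an illustration under stated assumptions, not the JDK implementation: CoderDispatchSketch, latin1ToChars and utf16ToChars are hypothetical names, COMPACT_STRINGS is hard-coded to true, and the UTF16 branch assumes big-endian byte order purely to keep the example readable.

```java
class CoderDispatchSketch {
    static final boolean COMPACT_STRINGS = true; // stand-in for the JVM-injected flag
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // LATIN1: each byte is one char; mask with 0xFF to avoid sign extension.
    static char[] latin1ToChars(byte[] value) {
        char[] dst = new char[value.length];
        for (int i = 0; i < value.length; i++) {
            dst[i] = (char) (value[i] & 0xFF);
        }
        return dst;
    }

    // UTF16: two bytes per char. Big-endian order is an assumption of this sketch.
    static char[] utf16ToChars(byte[] value) {
        char[] dst = new char[value.length / 2];
        for (int i = 0; i < dst.length; i++) {
            int hi = value[2 * i] & 0xFF;
            int lo = value[2 * i + 1] & 0xFF;
            dst[i] = (char) ((hi << 8) | lo);
        }
        return dst;
    }

    // Same shape as the JDK 11 method: check the flag first, then the coder.
    static char[] toCharArray(byte[] value, byte coder) {
        boolean isLatin1 = COMPACT_STRINGS && coder == LATIN1;
        return isLatin1 ? latin1ToChars(value) : utf16ToChars(value);
    }

    public static void main(String[] args) {
        byte[] latin1Bytes = {0x68, 0x69};            // 'h', 'i'
        byte[] utf16Bytes = {0x4F, 0x60, 0x59, 0x7D}; // U+4F60, U+597D
        System.out.println(new String(toCharArray(latin1Bytes, LATIN1))); // hi
        System.out.println(toCharArray(utf16Bytes, UTF16).length);        // 2
    }
}
```

Either way the caller gets back a char[], so the change of internal representation stays invisible to code that calls toCharArray().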
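In short, toCharArray() calls the copy routine of one of two helper classes depending on the encoding the String object uses (LATIN1 or UTF16). In the next installment we will go through StringLatin1.toChars(byte[] value) and StringUTF16.toChars(byte[] value).
In JDK 1.8, value is the member field declared private final char[] that stores the String's character sequence; in JDK 11 and JDK 15 it is private final byte[].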