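Every article is the product of its author's work. Please respect original writing and respect knowledge, starting with me. Reposting, plagiarism, near-duplicate copying, and borrowing in any form are prohibited. Thank you for your cooperation.
The Java source code appreciation column picks source code from JDK 1.8, JDK 11, and JDK 15 and compares them, looking at what each JDK version improved and at the design thinking behind each piece of source.
Ramblings
String's constructors are where the final field value gets initialized, so it is worth understanding how that initialization works and which kinds of input a String can be constructed from.
String is Java's workhorse for handling text. A class used this often deserves close study: that is how we get full value out of each tool it offers and pick up better programming techniques along the way.
The type of String's main field, value, changed after JDK 1.8 (from char[] to byte[], starting with JDK 9), so the constructors changed substantially as well.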
jdk 1.8
Source code & analysis
Constructor 1:
public String() {
    this.value = "".value;
}
public String(byte bytes[]) {
    this(bytes, 0, bytes.length);
}
public String(byte bytes[], int offset, int length) {
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(bytes, offset, length);
}
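Analysis of constructor 1:
The no-argument constructor simply reuses the value array of the empty string. The byte[] overloads take no charset, so the bytes are decoded into chars with the platform's default charset.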
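To see why the default charset matters, here is a minimal sketch that decodes the same bytes once with the platform default and once with an explicit charset. The class name DefaultCharsetDemo and its two helper methods are made up for illustration; only the String constructors and the Charset classes come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {
    // Decodes with whatever charset the running JVM reports as its default.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes);
    }

    // Decodes with an explicit charset, so the result is the same on every machine.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is the two bytes 0xC3 0xA9 in UTF-8.
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};
        System.out.println("default charset: " + Charset.defaultCharset());
        // The length of the default decode depends on the platform setting.
        System.out.println("default decode length: " + decodeWithDefault(utf8).length());
        System.out.println("UTF-8 decode length: " + decodeAsUtf8(utf8).length());
    }
}
```

If the bytes come from a file or the network, passing the charset explicitly is usually the safer habit, because the default can differ between machines.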
Constructor 2:
public String(byte bytes[], String charsetName)
        throws UnsupportedEncodingException {
    this(bytes, 0, bytes.length, charsetName);
}
public String(byte bytes[], int offset, int length, String charsetName)
        throws UnsupportedEncodingException {
    if (charsetName == null)
        throw new NullPointerException("charsetName");
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(charsetName, bytes, offset, length);
}
public String(byte bytes[], Charset charset) {
    this(bytes, 0, bytes.length, charset);
}
public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(charset, bytes, offset, length);
}
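Analysis of constructor 2:
These overloads call the decode method of the StringCoding class to decode the bytes into char values, using the charset the caller specifies either by name or as a Charset object. A null charset is rejected with a NullPointerException before any decoding happens.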
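The two ways of naming a charset differ in how failure shows up. The sketch below is only an illustration (NamedCharsetDemo and its methods are hypothetical names): the String-name overload forces the caller to handle the checked UnsupportedEncodingException, while the Charset overload does not.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class NamedCharsetDemo {
    // Returns the decoded text, or null when the JVM does not know the charset name.
    static String decodeByName(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            return null;
        }
    }

    // The Charset overload has no checked exception to handle.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] hi = {104, 105}; // ASCII codes for 'h' and 'i'
        System.out.println(decodeByName(hi, "UTF-8"));
        System.out.println(decodeByName(hi, "no-such-charset"));
        System.out.println(decodeLatin1(hi));
    }
}
```

When the charset is known at compile time, the constants in StandardCharsets avoid both the lookup by name and the try/catch.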
Constructor 3:
public String(char value[]) {
    this.value = Arrays.copyOf(value, value.length);
}
String(char[] value, boolean share) {
    // assert share : "unshared not supported";
    this.value = value;
}
public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= value.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}
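Analysis of constructor 3:
Because String itself stores a char[] in JDK 1.8, char[] input is the easiest case: the public constructors only have to copy the array (or the requested range), and charsets play no part. The package-private String(char[] value, boolean share) overload is the exception, since it stores the caller's array directly without copying.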
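The copy is what keeps a String immutable even when the caller holds on to the source array. A small sketch of that behaviour follows; the class CharArrayCopyDemo and its method names are my own, and only the public String constructors are JDK API.

```java
class CharArrayCopyDemo {
    // Builds a String from an array, then mutates the array afterwards.
    static String buildThenMutate() {
        char[] source = {'j', 'a', 'v', 'a'};
        String s = new String(source);
        source[0] = 'J'; // changes only the caller's array, not the String
        return s;
    }

    // Copies count chars starting at offset: here indexes 1, 2 and 3.
    static String subRange() {
        char[] source = {'a', 'b', 'c', 'd', 'e'};
        return new String(source, 1, 3);
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // prints "java"
        System.out.println(subRange());        // prints "bcd"
    }
}
```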
Constructor 4:
public String(int[] codePoints, int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= codePoints.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > codePoints.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    final int end = offset + count;
    // Pass 1: Compute precise size of char[]
    int n = count;
    for (int i = offset; i < end; i++) {
        int c = codePoints[i];
        if (Character.isBmpCodePoint(c))
            continue;
        else if (Character.isValidCodePoint(c))
            n++;
        else throw new IllegalArgumentException(Integer.toString(c));
    }
    // Pass 2: Allocate and fill in char[]
    final char[] v = new char[n];
    for (int i = offset, j = 0; i < end; i++, j++) {
        int c = codePoints[i];
        if (Character.isBmpCodePoint(c))
            v[j] = (char)c;
        else
            Character.toSurrogates(c, v, j++);
    }
    this.value = v;
}
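Analysis of constructor 4:
Converts an int[] of Unicode code points into a char[]. No charset is involved, but this is not a plain int-to-char cast: a code point in the Basic Multilingual Plane becomes one char, while a supplementary code point becomes a surrogate pair of two chars, which is why the first pass computes the exact array size. An invalid code point causes an IllegalArgumentException. A large part of the remaining code is bounds checking that throws exceptions.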
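The effect of the two passes is easy to observe from outside. In the sketch below (CodePointDemo and its helpers are illustrative names, not JDK code), one BMP code point plus one supplementary code point produce a String of three chars.

```java
class CodePointDemo {
    // Thin wrapper around the public String(int[], int, int) constructor.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    // True when the constructor refuses the value as an invalid code point.
    static boolean rejects(int codePoint) {
        try {
            fromCodePoints(codePoint);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // U+0041 is in the BMP (one char); U+1F600 is supplementary (two chars).
        String s = fromCodePoints(0x41, 0x1F600);
        System.out.println(s.length());                       // prints 3
        System.out.println(s.codePointCount(0, s.length()));  // prints 2
        System.out.println(rejects(0x110000));                // prints true
    }
}
```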
Constructor 5:
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}
public String(StringBuffer buffer) {
    synchronized(buffer) {
        this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
    }
}
public String(StringBuilder builder) {
    this.value = Arrays.copyOf(builder.getValue(), builder.length());
}
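Analysis of constructor 5:
The constructor that takes a StringBuffer holds the buffer's lock while copying, so another thread cannot modify the buffer mid-copy and leave the data inconsistent.
The StringBuilder version has nothing special: it copies the corresponding array in the same way, just without locking.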
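Because both constructors copy the underlying array, the resulting String is a snapshot of the builder at that moment. A minimal sketch, with BuilderSnapshotDemo as a made-up class name:

```java
class BuilderSnapshotDemo {
    // The String keeps the content the builder had when the constructor ran.
    static String snapshotOfBuilder() {
        StringBuilder sb = new StringBuilder("abc");
        String s = new String(sb);
        sb.append("def"); // later changes do not reach the String
        return s;
    }

    // Same idea with the thread-safe StringBuffer.
    static String snapshotOfBuffer() {
        StringBuffer buf = new StringBuffer("abc");
        String s = new String(buf);
        buf.reverse();
        return s;
    }

    public static void main(String[] args) {
        System.out.println(snapshotOfBuilder()); // prints "abc"
        System.out.println(snapshotOfBuffer());  // prints "abc"
    }
}
```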
jdk 11 & jdk 15
Source code & analysis
Constructor 1:
public String() {
    this.value = "".value;
    this.coder = "".coder;
}
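Analysis of constructor 1:
This constructor does no real construction work; it simply reuses the empty string's data. Because value cannot be changed once it is initialized, the constructor has little practical use.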
Constructor 2:
public String(byte[] bytes) {
    this(bytes, 0, bytes.length);
}
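Analysis of constructor 2:
This constructor takes only a byte array and delegates to the three-argument overload with this(bytes, 0, bytes.length), so the whole array is decoded.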
Constructor 3:
String(byte[] value, byte coder) {
    this.value = value;
    this.coder = coder;
}
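Analysis of constructor 3:
This package-private constructor takes a byte array together with a coder and stores both as given, without copying. The coder records how the bytes are encoded (LATIN1 or UTF16).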
Constructor 4:
@Deprecated(since="1.1")
public String(byte ascii[], int hibyte) {
    this(ascii, hibyte, 0, ascii.length);
}
@Deprecated(since="1.1")
public String(byte ascii[], int hibyte, int offset, int count) {
    checkBoundsOffCount(offset, count, ascii.length);
    if (count == 0) {
        this.value = "".value;
        this.coder = "".coder;
        return;
    }
    if (COMPACT_STRINGS && (byte)hibyte == 0) {
        this.value = Arrays.copyOfRange(ascii, offset, offset + count);
        this.coder = LATIN1;
    } else {
        hibyte <<= 8;
        byte[] val = StringUTF16.newBytesFor(count);
        for (int i = 0; i < count; i++) {
            StringUTF16.putChar(val, i, hibyte | (ascii[offset++] & 0xff));
        }
        this.value = val;
        this.coder = UTF16;
    }
}
Analysis of constructor 4:
Both constructors are deprecated, so they are not covered here.
Constructor 5:
public String(byte bytes[], int offset, int length) {
    checkBoundsOffCount(offset, length, bytes.length);
    StringCoding.Result ret = StringCoding.decode(bytes, offset, length);
    this.value = ret.value;
    this.coder = ret.coder;
}
// Helper method:
static void checkBoundsOffCount(int offset, int count, int length) {
    if (offset < 0 || count < 0 || offset > length - count) {
        throw new StringIndexOutOfBoundsException(
            "offset " + offset + ", count " + count + ", length " + length);
    }
}
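Analysis of constructor 5:
Takes a byte array without a caller-specified charset. The StringCoding class decodes the bytes and returns an object of its nested Result class; the decoded byte[] and its coder are then assigned to String's fields. The helper checkBoundsOffCount validates offset and count before any decoding takes place.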
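One detail in checkBoundsOffCount deserves a closer look: it compares offset > length - count instead of adding offset + count. The sketch below reimplements both variants as plain boolean methods to show the difference; BoundsCheckDemo and its method names are hypothetical, and inBoundsNaive is deliberately the wrong version.

```java
class BoundsCheckDemo {
    // Same comparison shape as checkBoundsOffCount: no addition, so no int overflow.
    static boolean inBoundsSafe(int offset, int count, int length) {
        return !(offset < 0 || count < 0 || offset > length - count);
    }

    // Tempting but wrong: offset + count can wrap around to a negative int.
    static boolean inBoundsNaive(int offset, int count, int length) {
        return offset >= 0 && count >= 0 && offset + count <= length;
    }

    public static void main(String[] args) {
        int offset = Integer.MAX_VALUE;
        int count = 1;
        int length = 10;
        // offset + count overflows to Integer.MIN_VALUE, which is <= 10.
        System.out.println(inBoundsNaive(offset, count, length)); // prints true (a false accept)
        System.out.println(inBoundsSafe(offset, count, length));  // prints false
    }
}
```

Since count and length are both non-negative by the time the subtraction is evaluated, length - count cannot overflow, which is what makes the rearranged form safe.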
Constructor 6:
public String(byte bytes[], int offset, int length, String charsetName)
        throws UnsupportedEncodingException {
    if (charsetName == null)
        throw new NullPointerException("charsetName");
    checkBoundsOffCount(offset, length, bytes.length);
    StringCoding.Result ret =
        StringCoding.decode(charsetName, bytes, offset, length);
    this.value = ret.value;
    this.coder = ret.coder;
}
Analysis of constructor 6:
Uses the charset name supplied by the caller, so decoding does not fall back to the default charset.
Constructor 7:
public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBoundsOffCount(offset, length, bytes.length);
    StringCoding.Result ret =
        StringCoding.decode(charset, bytes, offset, length);
    this.value = ret.value;
    this.coder = ret.coder;
}
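Analysis of constructor 7:
Much the same as constructor 6, except that the charset is passed as a Charset object rather than by name, so no UnsupportedEncodingException is declared.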
Constructor 8:
public String(byte bytes[], String charsetName)
        throws UnsupportedEncodingException {
    this(bytes, 0, bytes.length, charsetName);
}
public String(byte bytes[], Charset charset) {
    this(bytes, 0, bytes.length, charset);
}
Analysis of constructor 8:
These reuse constructors 6 and 7, with the starting offset set to 0 and the length set to bytes.length.
Constructor 9:
public String(char value[]) {
    this(value, 0, value.length, null);
}
public String(char value[], int offset, int count) {
    this(value, offset, count, rangeCheck(value, offset, count));
}
String(char[] value, int off, int len, Void sig) {
    if (len == 0) {
        this.value = "".value;
        this.coder = "".coder;
        return;
    }
    if (COMPACT_STRINGS) {
        byte[] val = StringUTF16.compress(value, off, len);
        if (val != null) {
            this.value = val;
            this.coder = LATIN1;
            return;
        }
    }
    this.coder = UTF16;
    this.value = StringUTF16.toBytes(value, off, len);
}
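Analysis of constructor 9:
When COMPACT_STRINGS (whether compact storage is enabled) is true, the constructor first tries to compress the chars into one byte each. If compress returns a non-null array, the string is stored with the LATIN1 coder; otherwise, or when compact strings are disabled, it is stored as UTF16.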
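The coder field is not visible through the public API, so the decision cannot be observed directly. As a rough illustration of the idea only, here is a simplified stand-in for the "try Latin-1 first" step. CompressSketch, compressToLatin1 and storedBytes are hypothetical names, this is not the JDK's StringUTF16.compress implementation, and it assumes compact strings are enabled.

```java
class CompressSketch {
    // Packs each char into one byte, or returns null if any char needs more than 8 bits.
    static byte[] compressToLatin1(char[] value, int off, int len) {
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            char c = value[off + i];
            if (c > 0xFF) {
                return null; // not representable in Latin-1, caller falls back to UTF-16
            }
            out[i] = (byte) c;
        }
        return out;
    }

    // Bytes needed under this sketch: one per char for Latin-1, two per char otherwise.
    static int storedBytes(String text) {
        char[] chars = text.toCharArray();
        byte[] latin1 = compressToLatin1(chars, 0, chars.length);
        return latin1 != null ? latin1.length : chars.length * 2;
    }

    public static void main(String[] args) {
        System.out.println(storedBytes("abc"));      // prints 3
        System.out.println(storedBytes("a\u4e2d"));  // prints 4
    }
}
```

A single char outside Latin-1 is enough to push the whole string to two bytes per char, which is the trade-off behind the two-coder design.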
Constructor 10:
public String(int[] codePoints, int offset, int count) {
    checkBoundsOffCount(offset, count, codePoints.length);
    if (count == 0) {
        this.value = "".value;
        this.coder = "".coder;
        return;
    }
    if (COMPACT_STRINGS) {
        byte[] val = StringLatin1.toBytes(codePoints, offset, count);
        if (val != null) {
            this.coder = LATIN1;
            this.value = val;
            return;
        }
    }
    this.coder = UTF16;
    this.value = StringUTF16.toBytes(codePoints, offset, count);
}
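Analysis of constructor 10:
Converts the int[] into a byte[] in whichever of the two encodings applies. The offset and count parameters mean that every element of codePoints from index offset through index offset+count-1 is converted.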
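The offset and count semantics can be checked with a short example. This is a sketch with made-up names (CodePointRangeDemo, slice, outOfRange); it catches the general IndexOutOfBoundsException so that it does not depend on the exact exception subclass a particular JDK version throws.

```java
class CodePointRangeDemo {
    // Converts codePoints[offset] through codePoints[offset + count - 1].
    static String slice(int[] codePoints, int offset, int count) {
        return new String(codePoints, offset, count);
    }

    // True when the requested range does not fit inside the array.
    static boolean outOfRange(int[] codePoints, int offset, int count) {
        try {
            slice(codePoints, offset, count);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] cps = {'a', 'b', 'c', 'd'};
        System.out.println(slice(cps, 1, 2));      // prints "bc"
        System.out.println(outOfRange(cps, 3, 2)); // prints true
    }
}
```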
Constructor 11:
public String(StringBuilder builder) {
    this(builder, null);
}
String(AbstractStringBuilder asb, Void sig) {
    byte[] val = asb.getValue();
    int length = asb.length();
    if (asb.isLatin1()) {
        this.coder = LATIN1;
        this.value = Arrays.copyOfRange(val, 0, length);
    } else {
        if (COMPACT_STRINGS) {
            byte[] buf = StringUTF16.compress(val, 0, length);
            if (buf != null) {
                this.coder = LATIN1;
                this.value = buf;
                return;
            }
        }
        this.coder = UTF16;
        this.value = Arrays.copyOfRange(val, 0, length << 1);
    }
}
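Analysis of constructor 11:
This constructor is a convenience for callers that hold a StringBuilder, so they do not have to pay for converting the StringBuilder into an intermediate String first. The copying logic again branches on the coder: a Latin-1 builder is copied byte for byte, and a UTF16 builder is compressed to Latin-1 when COMPACT_STRINGS is enabled and the content allows it. Otherwise it does not differ much from the previous constructors.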
Constructor 12:
public String(StringBuffer buffer) {
    this(buffer.toString());
}
@HotSpotIntrinsicCandidate
public String(String original) {
    this.value = original.value;
    this.coder = original.coder;
    this.hash = original.hash;
}
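Analysis of constructor 12:
The StringBuffer constructor calls StringBuffer's own toString() method to produce a String, and the String(String) constructor then initializes value, coder, and hash from that string. This is also what happens with the familiar String str = new String("this"): the JVM has already turned the literal "this" into an internal String object whose value holds "this", and the new object takes over that object's value, coder, and hash.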
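A quick way to convince yourself that new String("this") creates a second object on top of the literal is to compare references and contents separately. The class StringCopyDemo below is only an illustrative sketch.

```java
class StringCopyDemo {
    // The constructor always yields a new object, distinct from the literal.
    static boolean sameObject() {
        String literal = "this";
        String copy = new String(literal);
        return literal == copy;
    }

    // The content is still equal, because the copy takes over the original's data.
    static boolean sameContent() {
        String literal = "this";
        String copy = new String(literal);
        return literal.equals(copy);
    }

    // intern() maps the copy back to the pooled literal object.
    static boolean internReturnsLiteral() {
        return new String("this").intern() == "this";
    }

    public static void main(String[] args) {
        System.out.println(sameObject());           // prints false
        System.out.println(sameContent());          // prints true
        System.out.println(internReturnsLiteral()); // prints true
    }
}
```

In everyday code the literal alone is enough; the extra object from new String(...) only matters when a distinct identity is really wanted.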
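Comparison and summary
String could already convert byte data into char data in JDK 1.8, decoding through a wide range of charsets. From JDK 11 on, the decoded result is classified into just two internal formats, LATIN1 and UTF16.
From JDK 11 on, many of the bounds checks in String's constructors are pulled out into separate methods, which makes the code more concise.
After JDK 1.8, the constructor that takes a StringBuffer no longer contains a synchronized block of its own; it delegates to buffer.toString(), which is itself a synchronized method.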