A Look at Java 9's Compact Strings

This article takes a look at Compact Strings in Java 9.

Compressed Strings(Java 6)

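Java 6 introduced Compressed Strings: a string whose characters each fit in one byte was stored in a byte[], while a string that needs two bytes per character kept using a char[]. The feature was switched on with -XX:+UseCompressedStrings, but it was deprecated in Java 7 and removed in Java 8.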

Compact Strings(Java 9)

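Java 9 introduced Compact Strings to replace Java 6's Compressed Strings. The implementation is more thorough: String now always stores its characters in a byte[] instead of a char[], and a new field, coder, records whether those bytes are encoded as LATIN1 (one byte per character) or UTF16 (two bytes per character).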
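To make the LATIN1/UTF16 distinction concrete, here is a minimal sketch that predicts which encoding a string would get, using only public String methods. The class name CoderSketch and its two methods are hypothetical helpers written for this article, not part of the JDK, and the sketch does not read the real internal coder field; it only applies the rule that a string is Latin-1 encodable when every char is at most 0xFF.

```java
final class CoderSketch {

    /** True when every char fits in one byte, i.e. the string is Latin-1 encodable. */
    static boolean fitsLatin1(String s) {
        return s.chars().allMatch(c -> c <= 0xFF);
    }

    /** Size of the character payload under the compact layout: 1 byte per char for Latin-1, else 2. */
    static int payloadBytes(String s) {
        return fitsLatin1(s) ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only: "caf\u00e9" is "cafe" with an
        // accented e (still Latin-1), "\u4f60\u597d" is a two-character Chinese greeting.
        String[] samples = {"tomcat", "caf\u00e9", "\u4f60\u597d"};
        for (String s : samples) {
            System.out.println(s + " -> latin1=" + fitsLatin1(s) + ", payloadBytes=" + payloadBytes(s));
        }
    }
}
```

Note that a single character outside Latin-1 is enough to push the whole string to two bytes per character.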

String

java.base/java/lang/String.java

public final class String

implements java.io.Serializable, Comparable<String>, CharSequence,

Constable, ConstantDesc {

/**

* The value is used for character storage.

*

* @implNote This field is trusted by the VM, and is a subject to

* constant folding if String instance is constant. Overwriting this

* field after construction will cause problems.

*

* Additionally, it is marked with {@link Stable} to trust the contents

* of the array. No other facility in JDK provides this functionality (yet).

* {@link Stable} is safe here, because value is never null.

*/

@Stable

private final byte[] value;

/**

* The identifier of the encoding used to encode the bytes in

* {@code value}. The supported values in this implementation are

*

* LATIN1

* UTF16

*

* @implNote This field is trusted by the VM, and is a subject to

* constant folding if String instance is constant. Overwriting this

* field after construction will cause problems.

*/

private final byte coder;

/** Cache the hash code for the string */

private int hash; // Default to 0

/** use serialVersionUID from JDK 1.0.2 for interoperability */

private static final long serialVersionUID = -6849794470754667710L;

/**

* If String compaction is disabled, the bytes in {@code value} are

* always encoded in UTF16.

*

* For methods with several possible implementation paths, when String

* compaction is disabled, only one code path is taken.

*

* The instance field value is generally opaque to optimizing JIT

* compilers. Therefore, in performance-sensitive place, an explicit

* check of the static boolean {@code COMPACT_STRINGS} is done first

* before checking the {@code coder} field since the static boolean

* {@code COMPACT_STRINGS} would be constant folded away by an

* optimizing JIT compiler. The idioms for these cases are as follows.

*

* For code such as:

*

* if (coder == LATIN1) { ... }

*

* can be written more optimally as

*

* if (coder() == LATIN1) { ... }

*

* or:

*

* if (COMPACT_STRINGS && coder == LATIN1) { ... }

*

* An optimizing JIT compiler can fold the above conditional as:

*

* COMPACT_STRINGS == true => if (coder == LATIN1) { ... }

* COMPACT_STRINGS == false => if (false) { ... }

*

* @implNote

* The actual value for this field is injected by JVM. The static

* initialization block is used to set the value here to communicate

* that this static final field is not statically foldable, and to

* avoid any possible circular dependency during vm initialization.

*/

static final boolean COMPACT_STRINGS;

static {

COMPACT_STRINGS = true;

}

/**

* Class String is special cased within the Serialization Stream Protocol.

*

* A String instance is written into an ObjectOutputStream according to

*

* Object Serialization Specification, Section 6.2, "Stream Elements"

*/

private static final ObjectStreamField[] serialPersistentFields =

new ObjectStreamField[0];

/**

* Initializes a newly created {@code String} object so that it represents

* an empty character sequence. Note that use of this constructor is

* unnecessary since Strings are immutable.

*/

public String() {

this.value = "".value;

this.coder = "".coder;

}

//......

public char charAt(int index) {

if (isLatin1()) {

return StringLatin1.charAt(value, index);

} else {

return StringUTF16.charAt(value, index);

}

}

public boolean equals(Object anObject) {

if (this == anObject) {

return true;

}

if (anObject instanceof String) {

String aString = (String)anObject;

if (coder() == aString.coder()) {

return isLatin1() ? StringLatin1.equals(value, aString.value)

: StringUTF16.equals(value, aString.value);

}

}

return false;

}

public int compareTo(String anotherString) {

byte v1[] = value;

byte v2[] = anotherString.value;

if (coder() == anotherString.coder()) {

return isLatin1() ? StringLatin1.compareTo(v1, v2)

: StringUTF16.compareTo(v1, v2);

}

return isLatin1() ? StringLatin1.compareToUTF16(v1, v2)

: StringUTF16.compareToLatin1(v1, v2);

}

public int hashCode() {

int h = hash;

if (h == 0 && value.length > 0) {

hash = h = isLatin1() ? StringLatin1.hashCode(value)

: StringUTF16.hashCode(value);

}

return h;

}

public int indexOf(int ch, int fromIndex) {

return isLatin1() ? StringLatin1.indexOf(value, ch, fromIndex)

: StringUTF16.indexOf(value, ch, fromIndex);

}

public String substring(int beginIndex) {

if (beginIndex < 0) {

throw new StringIndexOutOfBoundsException(beginIndex);

}

int subLen = length() - beginIndex;

if (subLen < 0) {

throw new StringIndexOutOfBoundsException(subLen);

}

if (beginIndex == 0) {

return this;

}

return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)

: StringUTF16.newString(value, beginIndex, subLen);

}

//......

byte coder() {

return COMPACT_STRINGS ? coder : UTF16;

}

byte[] value() {

return value;

}

private boolean isLatin1() {

return COMPACT_STRINGS && coder == LATIN1;

}

@Native static final byte LATIN1 = 0;

@Native static final byte UTF16 = 1;

//......

}

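COMPACT_STRINGS is set to true in the static initializer, so the feature is enabled by default. As the @implNote says, the actual value is injected by the JVM; it can be switched off with -XX:-CompactStrings.

The coder() method returns the coder field when COMPACT_STRINGS is true and UTF16 otherwise; isLatin1() returns true only when COMPACT_STRINGS is true and coder is LATIN1.

Methods such as charAt, equals, hashCode, indexOf and substring all rely on isLatin1() to decide whether to delegate to StringLatin1 or to StringUTF16.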
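The dispatch pattern described above can be modelled in a few lines. The following is a simplified sketch, not the JDK code: CompactText is a hypothetical class, and the big-endian byte order used for the two-byte case is an arbitrary choice for illustration (the real StringUTF16 uses the platform's native byte order).

```java
final class CompactText {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    private final byte[] value; // character storage, 1 or 2 bytes per char
    private final byte coder;   // LATIN1 or UTF16

    CompactText(String s) {
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        coder = latin1 ? LATIN1 : UTF16;
        value = new byte[latin1 ? s.length() : s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (latin1) {
                value[i] = (byte) c;
            } else {
                value[2 * i] = (byte) (c >> 8); // high byte first (assumption of this sketch)
                value[2 * i + 1] = (byte) c;    // low byte
            }
        }
    }

    boolean isLatin1() {
        return coder == LATIN1;
    }

    int length() {
        // Same idea as String.length(): byte count shifted right by the coder value.
        return value.length >> coder;
    }

    int storedBytes() {
        return value.length;
    }

    char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new StringIndexOutOfBoundsException(index);
        }
        if (isLatin1()) {
            // One byte per char; mask to avoid sign extension.
            return (char) (value[index] & 0xFF);
        }
        // Two bytes per char; reassemble high and low byte.
        return (char) (((value[2 * index] & 0xFF) << 8) | (value[2 * index + 1] & 0xFF));
    }

    public static void main(String[] args) {
        CompactText latin = new CompactText("tom");
        CompactText wide = new CompactText("\u4f60\u597d");
        System.out.println("latin1=" + latin.isLatin1() + ", bytes=" + latin.storedBytes());
        System.out.println("latin1=" + wide.isLatin1() + ", bytes=" + wide.storedBytes());
    }
}
```

Every accessor branches on the coder once and then works on the raw bytes, which is the same shape as the isLatin1() ? StringLatin1.x(...) : StringUTF16.x(...) calls in the listing.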

StringConcatFactory

Example

public class Java9StringDemo {

public static void main(String[] args){

String stringLiteral = "tom";

String stringObject = stringLiteral + "cat";

}

}

In this code, stringObject is built by concatenating the variable stringLiteral with the constant "cat".

javap

javac src/main/java/com/example/javac/Java9StringDemo.java

javap -v src/main/java/com/example/javac/Java9StringDemo.class

Last modified Apr 7, 2019; size 770 bytes

MD5 checksum fecfca9c829402c358c4d5cb948004ff

Compiled from "Java9StringDemo.java"

public class com.example.javac.Java9StringDemo

minor version: 0

major version: 56

flags: (0x0021) ACC_PUBLIC, ACC_SUPER

this_class: #4 // com/example/javac/Java9StringDemo

super_class: #5 // java/lang/Object

interfaces: 0, fields: 0, methods: 2, attributes: 3

Constant pool:

#1 = Methodref #5.#14 // java/lang/Object."<init>":()V

#2 = String #15 // tom

#3 = InvokeDynamic #0:#19 // #0:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;

#4 = Class #20 // com/example/javac/Java9StringDemo

#5 = Class #21 // java/lang/Object

#6 = Utf8 <init>

#7 = Utf8 ()V

#8 = Utf8 Code

#9 = Utf8 LineNumberTable

#10 = Utf8 main

#11 = Utf8 ([Ljava/lang/String;)V

#12 = Utf8 SourceFile

#13 = Utf8 Java9StringDemo.java

#14 = NameAndType #6:#7 // "<init>":()V

#15 = Utf8 tom

#16 = Utf8 BootstrapMethods

#17 = MethodHandle 6:#22 // REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;

#18 = String #23 // \u0001cat

#19 = NameAndType #24:#25 // makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;

#20 = Utf8 com/example/javac/Java9StringDemo

#21 = Utf8 java/lang/Object

#22 = Methodref #26.#27 // java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;

#23 = Utf8 \u0001cat

#24 = Utf8 makeConcatWithConstants

#25 = Utf8 (Ljava/lang/String;)Ljava/lang/String;

#26 = Class #28 // java/lang/invoke/StringConcatFactory

#27 = NameAndType #24:#32 // makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;

#28 = Utf8 java/lang/invoke/StringConcatFactory

#29 = Class #34 // java/lang/invoke/MethodHandles$Lookup

#30 = Utf8 Lookup

#31 = Utf8 InnerClasses

#32 = Utf8 (Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;

#33 = Class #35 // java/lang/invoke/MethodHandles

#34 = Utf8 java/lang/invoke/MethodHandles$Lookup

#35 = Utf8 java/lang/invoke/MethodHandles

{

public com.example.javac.Java9StringDemo();

descriptor: ()V

flags: (0x0001) ACC_PUBLIC

Code:

stack=1, locals=1, args_size=1

0: aload_0

1: invokespecial #1 // Method java/lang/Object."<init>":()V

4: return

LineNumberTable:

line 8: 0

public static void main(java.lang.String[]);

descriptor: ([Ljava/lang/String;)V

flags: (0x0009) ACC_PUBLIC, ACC_STATIC

Code:

stack=1, locals=3, args_size=1

0: ldc #2 // String tom

2: astore_1

3: aload_1

4: invokedynamic #3, 0 // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;

9: astore_2

10: return

LineNumberTable:

line 11: 0

line 12: 3

line 13: 10

}

SourceFile: "Java9StringDemo.java"

InnerClasses:

public static final #30= #29 of #33; // Lookup=class java/lang/invoke/MethodHandles$Lookup of class java/lang/invoke/MethodHandles

BootstrapMethods:

0: #17 REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;

Method arguments:

#18 \u0001cat

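The javap output shows that, from Java 9 on, javac compiles the concatenation to an invokedynamic instruction whose bootstrap method is StringConcatFactory.makeConcatWithConstants; the constant part is passed as the recipe \u0001cat, where \u0001 marks the position of the argument. Java 8, by contrast, compiled the same expression into a chain of StringBuilder calls. (The class file above has major version 56, so it was compiled with JDK 12, but the behaviour is the same as in Java 9.)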
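For comparison, this is roughly what the Java 8 translation of the same expression looks like when written out as source. It is a sketch of the general shape only; ConcatJava8Style is a hypothetical class name, and the exact bytecode javac 8 emits may differ in details.

```java
final class ConcatJava8Style {

    /** Source-level equivalent of: stringLiteral + "cat" as compiled by javac 8. */
    static String concat(String stringLiteral) {
        return new StringBuilder()
                .append(stringLiteral)
                .append("cat")
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("tom"));
    }
}
```

Because this StringBuilder chain is fixed in the class file, improving it requires recompiling the code. With invokedynamic the strategy is chosen by the JDK at run time, so existing class files can pick up a better implementation without being recompiled.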

StringConcatFactory.makeConcatWithConstants

java.base/java/lang/invoke/StringConcatFactory.java

public final class StringConcatFactory {

//......

/**

* Concatenation strategy to use. See {@link Strategy} for possible options.

* This option is controllable with -Djava.lang.invoke.stringConcat JDK option.

*/

private static Strategy STRATEGY;

/**

* Default strategy to use for concatenation.

*/

private static final Strategy DEFAULT_STRATEGY = Strategy.MH_INLINE_SIZED_EXACT;

private enum Strategy {

/**

* Bytecode generator, calling into {@link java.lang.StringBuilder}.

*/

BC_SB,

/**

* Bytecode generator, calling into {@link java.lang.StringBuilder};

* but trying to estimate the required storage.

*/

BC_SB_SIZED,

/**

* Bytecode generator, calling into {@link java.lang.StringBuilder};

* but computing the required storage exactly.

*/

BC_SB_SIZED_EXACT,

/**

* MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.

* This strategy also tries to estimate the required storage.

*/

MH_SB_SIZED,

/**

* MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.

* This strategy also estimate the required storage exactly.

*/

MH_SB_SIZED_EXACT,

/**

* MethodHandle-based generator, that constructs its own byte[] array from

* the arguments. It computes the required storage exactly.

*/

MH_INLINE_SIZED_EXACT

}

static {

// In case we need to double-back onto the StringConcatFactory during this

// static initialization, make sure we have the reasonable defaults to complete

// the static initialization properly. After that, actual users would use

// the proper values we have read from the properties.

STRATEGY = DEFAULT_STRATEGY;

// CACHE_ENABLE = false; // implied

// CACHE = null; // implied

// DEBUG = false; // implied

// DUMPER = null; // implied

Properties props = GetPropertyAction.privilegedGetProperties();

final String strategy =

props.getProperty("java.lang.invoke.stringConcat");

CACHE_ENABLE = Boolean.parseBoolean(

props.getProperty("java.lang.invoke.stringConcat.cache"));

DEBUG = Boolean.parseBoolean(

props.getProperty("java.lang.invoke.stringConcat.debug"));

final String dumpPath =

props.getProperty("java.lang.invoke.stringConcat.dumpClasses");

STRATEGY = (strategy == null) ? DEFAULT_STRATEGY : Strategy.valueOf(strategy);

CACHE = CACHE_ENABLE ? new ConcurrentHashMap<>() : null;

DUMPER = (dumpPath == null) ? null : ProxyClassesDumper.getInstance(dumpPath);

}

public static CallSite makeConcatWithConstants(MethodHandles.Lookup lookup,

String name,

MethodType concatType,

String recipe,

Object... constants) throws StringConcatException {

if (DEBUG) {

System.out.println("StringConcatFactory " + STRATEGY + " is here for " + concatType + ", {" + recipe + "}, " + Arrays.toString(constants));

}

return doStringConcat(lookup, name, concatType, false, recipe, constants);

}

private static CallSite doStringConcat(MethodHandles.Lookup lookup,

String name,

MethodType concatType,

boolean generateRecipe,

String recipe,

Object... constants) throws StringConcatException {

Objects.requireNonNull(lookup, "Lookup is null");

Objects.requireNonNull(name, "Name is null");

Objects.requireNonNull(concatType, "Concat type is null");

Objects.requireNonNull(constants, "Constants are null");

for (Object o : constants) {

Objects.requireNonNull(o, "Cannot accept null constants");

}

if ((lookup.lookupModes() & MethodHandles.Lookup.PRIVATE) == 0) {

throw new StringConcatException("Invalid caller: " +

lookup.lookupClass().getName());

}

int cCount = 0;

int oCount = 0;

if (generateRecipe) {

// Mock the recipe to reuse the concat generator code

char[] value = new char[concatType.parameterCount()];

Arrays.fill(value, TAG_ARG);

recipe = new String(value);

oCount = concatType.parameterCount();

} else {

Objects.requireNonNull(recipe, "Recipe is null");

for (int i = 0; i < recipe.length(); i++) {

char c = recipe.charAt(i);

if (c == TAG_CONST) cCount++;

if (c == TAG_ARG) oCount++;

}

}

if (oCount != concatType.parameterCount()) {

throw new StringConcatException(

"Mismatched number of concat arguments: recipe wants " +

oCount +

" arguments, but signature provides " +

concatType.parameterCount());

}

if (cCount != constants.length) {

throw new StringConcatException(

"Mismatched number of concat constants: recipe wants " +

cCount +

" constants, but only " +

constants.length +

" are passed");

}

if (!concatType.returnType().isAssignableFrom(String.class)) {

throw new StringConcatException(

"The return type should be compatible with String, but it is " +

concatType.returnType());

}

if (concatType.parameterSlotCount() > MAX_INDY_CONCAT_ARG_SLOTS) {

throw new StringConcatException("Too many concat argument slots: " +

concatType.parameterSlotCount() +

", can only accept " +

MAX_INDY_CONCAT_ARG_SLOTS);

}

String className = getClassName(lookup.lookupClass());

MethodType mt = adaptType(concatType);

Recipe rec = new Recipe(recipe, constants);

MethodHandle mh;

if (CACHE_ENABLE) {

Key key = new Key(className, mt, rec);

mh = CACHE.get(key);

if (mh == null) {

mh = generate(lookup, className, mt, rec);

CACHE.put(key, mh);

}

} else {

mh = generate(lookup, className, mt, rec);

}

return new ConstantCallSite(mh.asType(concatType));

}

private static MethodHandle generate(Lookup lookup, String className, MethodType mt, Recipe recipe) throws StringConcatException {

try {

switch (STRATEGY) {

case BC_SB:

return BytecodeStringBuilderStrategy.generate(lookup, className, mt, recipe, Mode.DEFAULT);

case BC_SB_SIZED:

return BytecodeStringBuilderStrategy.generate(lookup, className, mt, recipe, Mode.SIZED);

case BC_SB_SIZED_EXACT:

return BytecodeStringBuilderStrategy.generate(lookup, className, mt, recipe, Mode.SIZED_EXACT);

case MH_SB_SIZED:

return MethodHandleStringBuilderStrategy.generate(mt, recipe, Mode.SIZED);

case MH_SB_SIZED_EXACT:

return MethodHandleStringBuilderStrategy.generate(mt, recipe, Mode.SIZED_EXACT);

case MH_INLINE_SIZED_EXACT:

return MethodHandleInlineCopyStrategy.generate(mt, recipe);

default:

throw new StringConcatException("Concatenation strategy " + STRATEGY + " is not implemented");

}

} catch (Error | StringConcatException e) {

// Pass through any error or existing StringConcatException

throw e;

} catch (Throwable t) {

throw new StringConcatException("Generator failed", t);

}

}

//......

}

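makeConcatWithConstants delegates to doStringConcat. After validating the lookup, the recipe and the argument counts, doStringConcat calls generate to produce a MethodHandle and wraps it in a ConstantCallSite. generate picks the generator according to STRATEGY, which is one of BC_SB, BC_SB_SIZED, BC_SB_SIZED_EXACT, MH_SB_SIZED, MH_SB_SIZED_EXACT and MH_INLINE_SIZED_EXACT. The default is MH_INLINE_SIZED_EXACT, and it can be overridden with the system property -Djava.lang.invoke.stringConcat.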
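The bootstrap method is a public API, so the call that the JVM makes for the invokedynamic instruction can also be made by hand. The sketch below does that for the recipe \u0001cat from the javap output, and for a second recipe that uses \u0002 to pull in a constant. ConcatBootstrapSketch and its method names are hypothetical; the name argument "makeConcatWithConstants" simply mirrors what javac passes.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

final class ConcatBootstrapSketch {

    // Signature of the concatenation: takes one String argument, returns a String.
    private static final MethodType CONCAT_TYPE =
            MethodType.methodType(String.class, String.class);

    /** Equivalent of: arg + "cat", using the recipe javac generated above. */
    static String concatWithCat(String arg) {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),      // must carry private access to the caller
                    "makeConcatWithConstants",
                    CONCAT_TYPE,
                    "\u0001cat");                // \u0001 marks where the argument goes
            MethodHandle mh = site.dynamicInvoker();
            return (String) mh.invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException("concat bootstrap failed", t);
        }
    }

    /** Equivalent of: arg + " and " + constant, with the constant supplied via \u0002. */
    static String concatWithConstant(String arg, String constant) {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "makeConcatWithConstants",
                    CONCAT_TYPE,
                    "\u0001 and \u0002",         // one argument tag, one constant tag
                    constant);
            return (String) site.dynamicInvoker().invokeExact(arg);
        } catch (Throwable t) {
            throw new IllegalStateException("concat bootstrap failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concatWithCat("tom"));
        System.out.println(concatWithConstant("tom", "jerry"));
    }
}
```

In real code the JVM links each call site once and reuses the resulting MethodHandle; bootstrapping on every call, as this sketch does, is only for demonstration. The number of \u0001 tags must match the parameter count of the MethodType and the number of \u0002 tags must match the number of constants, otherwise doStringConcat throws a StringConcatException.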

Summary

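Java 9 introduced Compact Strings to replace Java 6's Compressed Strings. The implementation is more thorough: String always uses a byte[] instead of a char[], and the new coder field records whether the bytes are LATIN1 or UTF16.

isLatin1() returns true only when COMPACT_STRINGS is true and coder is LATIN1. Methods such as charAt, equals, hashCode, indexOf and substring all rely on it to choose between StringLatin1 and StringUTF16.

Java 9 compiles string concatenation to an invokedynamic call bootstrapped by StringConcatFactory.makeConcatWithConstants. Where Java 8 always translated concatenation into StringBuilder calls, Java 9 offers several strategies: BC_SB (equivalent to the Java 8 approach), BC_SB_SIZED, BC_SB_SIZED_EXACT, MH_SB_SIZED, MH_SB_SIZED_EXACT and MH_INLINE_SIZED_EXACT. The default is MH_INLINE_SIZED_EXACT, and it can be changed with -Djava.lang.invoke.stringConcat.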
