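Introduction
Today a classmate sent me a snippet of code about String and asked me to work out what it does. Here is the code first; try to predict the output before reading on.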
public class InternTest {
    public static void main(String[] args) {
        String s = new String("1");
        s.intern();
        String s2 = "1";
        System.out.println(s == s2);
        System.out.println(s.equals(s2));
        String s3 = new String("1") + new String("1");
        s3.intern();
        String s4 = "11";
        System.out.println(s3 == s4);
        System.out.println(s3.equals(s4));
    }
}
What Is a String
The Java Language Specification (Java SE 8 Edition) describes the String type as follows:
1. Instances of class String represent sequences of Unicode code points.
2. A String object has a constant (unchanging) value.
3. String literals are references to instances of class String.
4. The string concatenation operator + implicitly creates a new String object when the result is not a constant expression.
With these definitions in hand, we have a rough idea of the constraints on using String. In general there are three ways to initialize a String variable:
1. With the new keyword, e.g. String str = new String("spring");
2. With a literal, e.g. String str = "spring";
3. By concatenation, e.g. String str = "spr" + new String("ing");
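To make the three forms concrete, here is a small sketch of my own (the class name StringCreationDemo and the method compare are just names I picked for illustration). It also includes a fourth case, a concatenation of two literals, because the specification treats a constant expression differently from a run-time concatenation.

```java
import java.util.Arrays;

public class StringCreationDemo {

    // Returns { viaNew == literal, concat == literal, folded == literal, concat.equals(literal) }.
    static boolean[] compare() {
        String viaNew = new String("spring");      // new: always a fresh object on the heap
        String literal = "spring";                 // literal: reference to the pooled instance
        String concat = "spr" + new String("ing"); // not a constant expression: new object at run time
        String folded = "spr" + "ing";             // constant expression: same pooled instance as the literal
        return new boolean[] {
            viaNew == literal,
            concat == literal,
            folded == literal,
            concat.equals(literal)
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare())); // [false, false, true, true]
    }
}
```

Only the constant expression shares a reference with the literal; the other two forms produce separate objects whose content happens to be equal.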
== & equals
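Java gives us == and equals for comparing two objects. equals is originally implemented in Object:
public boolean equals(Object obj) {
    return (this == obj); // here equals is equivalent to ==
}
When we define a class of our own, though, we usually override Object's hashCode and equals methods. String is no exception: its overridden equals tests whether two strings have the same content.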
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}
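In short, == compares the values held on the JVM stack (the stack holds primitive values and references to objects), so for objects it tests whether two references point to the same object. equals, as overridden by String, tests whether the contents on the heap are equal.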
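The difference is easy to see in a few lines. The sketch below is my own illustration; Version is a hypothetical value class invented for this example, with equals and hashCode overridden the usual way.

```java
import java.util.Objects;

public class EqualityDemo {

    // Hypothetical value class, used only to illustrate overriding equals/hashCode.
    static final class Version {
        final int major;
        final int minor;

        Version(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;                 // same reference
            if (!(o instanceof Version)) return false;  // also rejects null
            Version other = (Version) o;
            return major == other.major && minor == other.minor;
        }

        @Override
        public int hashCode() {
            return Objects.hash(major, minor);          // equal objects give equal hash codes
        }
    }

    public static void main(String[] args) {
        Object o1 = new Object();
        Object o2 = new Object();
        System.out.println(o1.equals(o2));  // false: Object.equals is just ==

        Version v1 = new Version(1, 8);
        Version v2 = new Version(1, 8);
        System.out.println(v1 == v2);       // false: two distinct objects
        System.out.println(v1.equals(v2));  // true: same content

        String a = new String("abc");
        String b = new String("abc");
        System.out.println(a == b);         // false: two distinct objects
        System.out.println(a.equals(b));    // true: same characters
    }
}
```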
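The String intern() Method
public native String intern(); returns a canonical representation of the string object. A pool of strings, initially empty, is maintained privately by the String class. When intern() is invoked, if the pool already contains a string equal to this String object as determined by equals(Object), the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.
It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.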
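A quick way to convince yourself of this rule is the sketch below (my own example; InternContractDemo and sameAfterIntern are names made up for it). Two strings built in different ways are distinct objects, yet their interned forms are the same reference exactly when their contents are equal.

```java
public class InternContractDemo {

    // True when both strings map to the same canonical instance.
    static boolean sameAfterIntern(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        String s = new String("hello");
        String t = new StringBuilder("hel").append("lo").toString();

        System.out.println(s == t);                      // false: two separate heap objects
        System.out.println(s.equals(t));                 // true: same content
        System.out.println(sameAfterIntern(s, t));       // true: equal content, one canonical instance
        System.out.println(sameAfterIntern(s, "world")); // false: different content
    }
}
```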
intern() is a native method. It calls into the JVM's C++ code through JNI, and the implementation source is as follows:
JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
JVMWrapper("JVM_InternString");
JvmtiVMObjectAllocEventCollector oam;
if (str == NULL) return NULL;
oop string = JNIHandles::resolve_non_null(str);
oop result = StringTable::intern(string, CHECK_NULL);
return (jstring) JNIHandles::make_local(env, result);
JVM_END
Next, let's look at StringTable::intern(string, CHECK_NULL):
oop StringTable::intern(oop string, TRAPS)
{
if (string == NULL) return NULL;
ResourceMark rm(THREAD);
int length;
Handle h_string (THREAD, string);
jchar* chars = java_lang_String::as_unicode_string(string, length);
oop result = intern(h_string, chars, length, CHECK_NULL);
return result;
}
oop StringTable::intern(Handle string_or_null, jchar* name,
int len, TRAPS) {
unsigned int hashValue = hash_string(name, len);
int index = the_table()->hash_to_index(hashValue);
oop found_string = the_table()->lookup(index, name, len, hashValue); // calls lookup()
// Found
if (found_string != NULL) return found_string;
debug_only(StableMemoryChecker smc(name, len * sizeof(name[0])));
assert(!Universe::heap()->is_in_reserved(name) || GC_locker::is_active(),
"proposed name of symbol must be stable");
Handle string;
// try to reuse the string if possible
if (!string_or_null.is_null() && (!JavaObjectsInPerm || string_or_null()->is_perm())) {
string = string_or_null;
} else {
string = java_lang_String::create_tenured_from_unicode(name, len, CHECK_NULL);
}
// Grab the StringTable_lock before getting the_table() because it could
// change at safepoint.
MutexLocker ml(StringTable_lock, THREAD);
// Otherwise, add to symbol to table
return the_table()->basic_add(index, string, name, len,
hashValue, CHECK_NULL);
}
Symbol* SymbolTable::lookup(int index, const char* name,
int len, unsigned int hash) {
int count = 0;
for (HashtableEntry<Symbol*, mtSymbol>* e = bucket(index); e != NULL; e = e->next()) {
count++; // count all entries in this bucket, not just ones with same hash
if (e->hash() == hash) {
Symbol* sym = e->literal();
if (sym->equals(name, len)) { // as described above, the comparison is by content (equals)
// something is referencing this symbol now.
sym->increment_refcount();
return sym;
}
}
}
// If the bucket size is too deep check if this hash code is insufficient.
if (count >= BasicHashtable<mtSymbol>::rehash_count && !needs_rehashing()) {
_needs_rehashing = check_rehash_table(count);
}
return NULL;
}
Below is the StringTable data structure. Note that StringTable is not the constant pool.
class StringTable : public Hashtable<oop, mtSymbol> {
friend class VMStructs;
private:
// The string table
static StringTable* _the_table;
// Set if one bucket is out of balance due to hash algorithm deficiency
static bool _needs_rehashing;
// Claimed high water mark for parallel chunked scanning
static volatile int _parallel_claimed_idx;
static oop intern(Handle string_or_null, jchar* chars, int length, TRAPS);
oop basic_add(int index, Handle string_or_null, jchar* name, int len,
unsigned int hashValue, TRAPS);
oop lookup(int index, jchar* chars, int length, unsigned int hashValue);
// Apply the give oop closure to the entries to the buckets
// in the range [start_idx, end_idx).
static void buckets_do(OopClosure* f, int start_idx, int end_idx);
StringTable() : Hashtable<oop, mtSymbol>((int)StringTableSize,
sizeof (HashtableEntry<oop, mtSymbol>)) {} ....}
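StringTable is a hash table of the same kind we use every day in Java. It first computes the string's hash code, uses it to pick a bucket in the array, and then walks that bucket's linked list, comparing the strings character by character until it finds a match. As the number of entries grows, lookups get slower. When a bucket's chain becomes too long, the JVM checks at a safepoint whether a rehash is needed. Judging from the source quoted here (the table is constructed with StringTableSize, and _needs_rehashing is documented as a hash algorithm deficiency), the rehash rebuilds the table with a different hash function to shorten the chains rather than enlarging the bucket array.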
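The lookup-then-add flow is easier to follow in Java than in the HotSpot sources, so here is a toy model of it. This is only a sketch under my own assumptions: ToyStringTable and everything in it are invented for illustration, it uses String.hashCode() instead of HotSpot's hash function, and it leaves out locking and rehashing entirely.

```java
public class ToyStringTable {

    // One node in a bucket's linked list.
    private static final class Entry {
        final int hash;
        final String literal;
        final Entry next;

        Entry(int hash, String literal, Entry next) {
            this.hash = hash;
            this.literal = literal;
            this.next = next;
        }
    }

    private final Entry[] buckets;
    private int size;

    ToyStringTable(int bucketCount) {
        buckets = new Entry[bucketCount];
    }

    // Returns the canonical instance for s, adding s itself if none exists yet.
    String intern(String s) {
        int hash = s.hashCode();
        int index = (hash & 0x7fffffff) % buckets.length;  // hash -> bucket index

        // Walk the chain: compare the hash first, then the content.
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (e.hash == hash && e.literal.equals(s)) {
                return e.literal;                          // found: return the pooled string
            }
        }

        // Not found: store the reference to s itself (no copy), as JDK 7 does.
        buckets[index] = new Entry(hash, s, buckets[index]);
        size++;
        return s;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        ToyStringTable table = new ToyStringTable(16);
        String a = new String("11");
        String b = new String("11");
        System.out.println(table.intern(a) == a); // true: a becomes the canonical instance
        System.out.println(table.intern(b) == a); // true: b is equal to a, so a is returned
        System.out.println(table.size());         // 1
    }
}
```

With a fixed number of buckets, every additional entry makes some chain longer, which is why a table that holds many strings gets slow.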
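Detailed Analysis of the Opening Example
1. On the command line, switch to the directory containing the class and compile it: javac InternTest.java
2. Inspect the compiled bytecode: javap -verbose InternTest
First, the constant pool: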
public class InternTest
SourceFile: "InternTest.java"
minor version: 0
major version: 52
flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
#1 = Methodref #16.#29 // java/lang/Object."<init>":()V
#2 = Class #30 // java/lang/String
#3 = String #31 // 1
#4 = Methodref #2.#32 // java/lang/String."<init>":(Ljava/lang/String;)V
#5 = Methodref #2.#33 // java/lang/String.intern:()Ljava/lang/String;
#6 = Fieldref #34.#35 // java/lang/System.out:Ljava/io/PrintStream;
#7 = Methodref #36.#37 // java/io/PrintStream.println:(Z)V
#8 = Methodref #2.#38 // java/lang/String.equals:(Ljava/lang/Object;)Z
#9 = Methodref #36.#39 // java/io/PrintStream.println:()V
#10 = Class #40 // java/lang/StringBuilder
#11 = Methodref #10.#29 // java/lang/StringBuilder."<init>":()V
#12 = Methodref #10.#41 // java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
#13 = Methodref #10.#42 // java/lang/StringBuilder.toString:()Ljava/lang/String;
#14 = String #43 // 11
#15 = Class #44 // InternTest
#16 = Class #45 // java/lang/Object
#17 = Utf8 <init>
#18 = Utf8 ()V
#19 = Utf8 Code
#20 = Utf8 LineNumberTable
#21 = Utf8 main
#22 = Utf8 ([Ljava/lang/String;)V
#23 = Utf8 StackMapTable
#24 = Class #46 // "[Ljava/lang/String;"
#25 = Class #30 // java/lang/String
#26 = Class #47 // java/io/PrintStream
#27 = Utf8 SourceFile
#28 = Utf8 InternTest.java
#29 = NameAndType #17:#18 // "<init>":()V
#30 = Utf8 java/lang/String
#31 = Utf8 1
#32 = NameAndType #17:#48 // "<init>":(Ljava/lang/String;)V
#33 = NameAndType #49:#50 // intern:()Ljava/lang/String;
#34 = Class #51 // java/lang/System
#35 = NameAndType #52:#53 // out:Ljava/io/PrintStream;
#36 = Class #47 // java/io/PrintStream
#37 = NameAndType #54:#55 // println:(Z)V
#38 = NameAndType #56:#57 // equals:(Ljava/lang/Object;)Z
#39 = NameAndType #54:#18 // println:()V
#40 = Utf8 java/lang/StringBuilder
#41 = NameAndType #58:#59 // append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
#42 = NameAndType #60:#50 // toString:()Ljava/lang/String;
#43 = Utf8 11
#44 = Utf8 InternTest
#45 = Utf8 java/lang/Object
#46 = Utf8 [Ljava/lang/String;
#47 = Utf8 java/io/PrintStream
#48 = Utf8 (Ljava/lang/String;)V
#49 = Utf8 intern
#50 = Utf8 ()Ljava/lang/String;
#51 = Utf8 java/lang/System
#52 = Utf8 out
#53 = Utf8 Ljava/io/PrintStream;
#54 = Utf8 println
#55 = Utf8 (Z)V
#56 = Utf8 equals
#57 = Utf8 (Ljava/lang/Object;)Z
#58 = Utf8 append
#59 = Utf8 (Ljava/lang/String;)Ljava/lang/StringBuilder;
#60 = Utf8 toString
Now let's look at our main method:
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=4, locals=5, args_size=1 // operand stack depth 4, 5 local variable slots, 1 argument
0: new #2 // class java/lang/String
3: dup // duplicate the value on top of the stack and push the copy
4: ldc #3 // String 1
6: invokespecial #4 // Method java/lang/String."<init>":(Ljava/lang/String;)V (initializes the String object for s)
9: astore_1 // store the reference to the new String in slot 1, i.e. variable s
10: aload_1
11: invokevirtual #5 // Method java/lang/String.intern:()Ljava/lang/String;
14: pop
15: ldc #3 // String 1
17: astore_2
18: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
21: aload_1
22: aload_2
23: if_acmpne 30
26: iconst_1
27: goto 31
30: iconst_0
31: invokevirtual #7 // Method java/io/PrintStream.println:(Z)V
34: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
37: aload_1
38: aload_2
39: invokevirtual #8 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
42: invokevirtual #7 // Method java/io/PrintStream.println:(Z)V
45: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
48: invokevirtual #9 // Method java/io/PrintStream.println:()V
51: new #10 // class java/lang/StringBuilder
54: dup
55: invokespecial #11 // Method java/lang/StringBuilder."<init>":()V
58: new #2 // class java/lang/String
61: dup
62: ldc #3 // String 1
64: invokespecial #4 // Method java/lang/String."<init>":(Ljava/lang/String;)V
67: invokevirtual #12 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
70: new #2 // class java/lang/String
73: dup
74: ldc #3 // String 1
76: invokespecial #4 // Method java/lang/String."<init>":(Ljava/lang/String;)V
79: invokevirtual #12 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
82: invokevirtual #13 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
85: astore_3
86: aload_3
87: invokevirtual #5 // Method java/lang/String.intern:()Ljava/lang/String;
90: pop
91: ldc #14 // String 11
93: astore 4
95: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
98: aload_3
99: aload 4
101: if_acmpne 108
104: iconst_1
105: goto 109
108: iconst_0
109: invokevirtual #7 // Method java/io/PrintStream.println:(Z)V
112: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
115: aload_3
116: aload 4
118: invokevirtual #8 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
121: invokevirtual #7 // Method java/io/PrintStream.println:(Z)V
124: return
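The statement String s2 = "1"; corresponds to the bytecode ldc #3, astore_2. The ldc instruction pushes an int, float, or String constant from the constant pool onto the top of the operand stack.
The execution of ldc can be seen in interpreterRuntime.cpp: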
IRT_ENTRY(void, InterpreterRuntime::ldc(JavaThread* thread, bool wide))
// access constant pool
constantPoolOop pool = method(thread)->constants();
int index = wide ? get_index_u2(thread, Bytecodes::_ldc_w) : get_index_u1(thread, Bytecodes::_ldc);
constantTag tag = pool->tag_at(index);
if (tag.is_unresolved_klass() || tag.is_klass()) {
klassOop klass = pool->klass_at(index, CHECK);
oop java_class = klass->java_mirror();
thread->set_vm_result(java_class);
} else {
#ifdef ASSERT
// If we entered this runtime routine, we believed the tag contained
// an unresolved string, an unresolved class or a resolved class.
// However, another thread could have resolved the unresolved string
// or class by the time we go there.
assert(tag.is_unresolved_string()|| tag.is_string(), "expected string");
#endif
oop s_oop = pool->string_at(index, CHECK);
thread->set_vm_result(s_oop);
}
IRT_END
Because this is a string constant, the code calls pool->string_at(index, CHECK), which eventually calls string_at_impl:
oop constantPoolOopDesc::string_at_impl(constantPoolHandle this_oop, int which, TRAPS) {
oop str = NULL;
CPSlot entry = this_oop->slot_at(which);
if (entry.is_metadata()) {
ObjectLocker ol(this_oop, THREAD);
if (this_oop->tag_at(which).is_unresolved_string()) {
// Intern string
Symbol* sym = this_oop->unresolved_string_at(which);
str = StringTable::intern(sym, CHECK_(constantPoolOop(NULL)));
this_oop->string_at_put(which, str);
} else {
// Another thread beat us and interned string, read string from constant pool
str = this_oop->resolved_string_at(which);
}
} else {
str = entry.get_oop();
}
assert(java_lang_String::is_instance(str), "must be string");
return str;
}
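The code shows that before ldc has run, the string constant is represented by a Symbol. When ldc first executes, StringTable::intern produces the String reference, which is then stored back into the constant pool. Any later ldc on the same entry fetches the String reference straight from the constant pool by index (this_oop->resolved_string_at(which)), which avoids another lookup in the StringTable.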
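From the Java side we cannot watch the constant pool entry being resolved, but we can observe the consequence: the same literal always evaluates to the same reference, and that reference is the interned one. The sketch below is my own (LiteralCacheDemo and literal are illustrative names).

```java
public class LiteralCacheDemo {

    // Returns the string literal "1"; javac compiles the body to ldc followed by areturn.
    static String literal() {
        return "1";
    }

    public static void main(String[] args) {
        String first = literal();   // first ldc on this constant pool entry
        String second = literal();  // later ldc on the same entry

        System.out.println(first == second);                   // true: same String reference every time
        System.out.println(first == new String("1").intern()); // true: the literal is the pooled instance
    }
}
```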
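Let's analyze the example along these lines.
1. new String("1") creates a String object on the heap, and s holds a reference to it. The literal "1" also gets its own String object in the string pool.
2. s.intern() ends up in StringTable::intern(), which tries to add the string referenced by s to the pool and finds that "1" is already there. The pooled reference is returned, but the bytecode discards it (pop), so s is unchanged.
3. s2 = "1" looks for "1" in the pool, finds it, and stores the pooled reference in s2.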
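4. So s == s2 is false: s refers to the heap object created by new, while s2 refers to the pooled object. s.equals(s2) is true because the contents match.
5. s3 = new String("1") + new String("1"); again creates String objects on the heap, with "1" in the pool as before. As the bytecode shows, the compiler turns the + concatenation into StringBuilder calls. Once the statement has run, there is a String "11" on the heap, but no "11" in the pool.
6. s3.intern() adds it to the pool. Since JDK 7, intern() no longer copies the string into the pool; the pool simply stores a reference to the same heap object that s3 refers to.
7. s4 = "11" executes ldc, finds "11" in the pool, and returns that reference. So s3 == s4 is true.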
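The example throws away the value returned by intern(), which is why its output depends on what happens to be in the pool and on the JDK version. If you actually want the canonical instance, keep the return value. A small sketch of my own to show the difference (InternReturnValueDemo and its method names are invented for illustration):

```java
public class InternReturnValueDemo {

    // Keeps the return value of intern(): this is the pooled instance.
    static boolean pooledEqualsLiteral() {
        String s3 = new String("1") + new String("1");
        String pooled = s3.intern();
        return pooled == "11";
    }

    // Discards the return value: s still refers to the object created by new.
    static boolean newObjectEqualsLiteral() {
        String s = new String("1");
        s.intern();
        return s == "1";
    }

    public static void main(String[] args) {
        System.out.println(pooledEqualsLiteral());    // true
        System.out.println(newObjectEqualsLiteral()); // false
    }
}
```

Comparing the returned reference with the literal does not depend on whether "11" was already in the pool, so it does not rely on the JDK 7 change described in step 6.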
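That is roughly how it works. Searching around afterwards, I found that my classmate had come across the example in a blog post,
深入解析String#intern (an in-depth analysis of String#intern). It is very thorough and I recommend reading it; this article draws on it as well. I also consulted
Java (JDK7)中的String常量和String.intern的实现 (String constants and the implementation of String.intern in JDK 7). Because intern() is backed by a hash table, a large number of interned strings leads to many hash collisions, and walking long chains is slow, so intern() can become a performance problem. I won't go into that here; the posts above cover it, so go read them and explore for yourself.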