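Strings are everywhere in day-to-day development. Understanding how String is implemented is what lets us write better, more robust code.
Constant pool
When Java code is compiled into a class file, the compiler generates a constant pool, a data structure that stores literal constants and symbolic references (class names, method names, interface names, field names, and so on).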
public class Test {
public static void main(String []args)
{
String s = "micky";
}
}
Write the code above in an editor, then run javac Test.java in a terminal to compile it into Test.class.
Then run
javap -verbose Test.class
to inspect the constant pool of Test.class:
Constant pool:
#1 = Methodref #4.#13 // java/lang/Object."<init>":()V
#2 = String #14 // micky
#3 = Class #15 // com/mingzhao/www/myapplication/Test
#4 = Class #16 // java/lang/Object
#5 = Utf8 <init>
#6 = Utf8 ()V
#7 = Utf8 Code
#8 = Utf8 LineNumberTable
#9 = Utf8 main
#10 = Utf8 ([Ljava/lang/String;)V
#11 = Utf8 SourceFile
#12 = Utf8 Test.java
#13 = NameAndType #5:#6 // "<init>":()V
#14 = Utf8 micky
#15 = Utf8 com/mingzhao/www/myapplication/Test
#16 = Utf8 java/lang/Object
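This shows how the string "micky" is defined in the constant pool: a String entry (#2) that points to a Utf8 entry (#14) holding the characters: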
#2 = String #14 // micky
#14 = Utf8 micky
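This also explains the earlier statement that a String variable is a reference variable that stores the address of a string: in String s = "micky", s holds the address of "micky".
Here are the bytecode instructions of the main() method: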
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=2, args_size=1
0: ldc #2 // String micky
2: astore_1
3: return
LineNumberTable:
line 11: 0
line 12: 3
}
String s = "micky" corresponds to:
0: ldc #2 // String micky
2: astore_1
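- When the Test class is loaded by the virtual machine, the string "micky" is represented in the constant pool by a symbolic reference (a symbol). When ldc #2 executes and the symbol at #2 has not been resolved yet, the JVM calls the native C++ method StringTable::intern to create the string, backed by a char array, and stores the reference in both the StringTable and the constant pool. The next time ldc #2 executes, the reference is read directly from #2, so no further lookup is needed.
- astore_1 stores the reference to "micky" in the local variable table.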
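The caching behavior described above can be observed from plain Java code. The following is a minimal sketch, not HotSpot's implementation: the class name LdcCacheDemo and the helper literal() are made up for illustration. It checks that executing the same ldc twice yields the same reference, and that a string assembled at run time maps back to that pooled instance through String.intern().

```java
final class LdcCacheDemo {
    // Hypothetical helper: every call executes the same ldc instruction.
    static String literal() {
        return "micky";
    }

    // True if two executions of the same ldc return the same object.
    static boolean sameOnRepeatedLoad() {
        return literal() == literal();
    }

    // True if a string built at run time is a distinct object,
    // yet interns to the instance the literal resolves to.
    static boolean internFindsPooledString() {
        String built = new String(new char[] {'m', 'i', 'c', 'k', 'y'});
        return built != literal() && built.intern() == literal();
    }

    public static void main(String[] args) {
        System.out.println(sameOnRepeatedLoad());
        System.out.println(internFindsPooledString());
    }
}
```

Both lines print true: the literal is resolved once, and later lookups for equal content return the reference already stored in the string table.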
Extension: a question that often comes up in interviews:
public class Test {
public static void main(String []args)
{
String s = "micky";
String s2 = "micky";
System.out.println(s==s2);
}
}
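What does it print?
The answer is true. The string "micky" is stored at a single location in the constant pool; s is assigned that address and s2 is assigned the very same address, so both variables hold the same reference. Here is the constant pool: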
Constant pool:
#1 = Methodref #6.#19 // java/lang/Object."<init>":()V
#2 = String #20 // micky
#3 = Fieldref #21.#22 // java/lang/System.out:Ljava/io/PrintStream;
#4 = Methodref #23.#24 // java/io/PrintStream.println:(Z)V
#5 = Class #25 // com/mingzhao/www/myapplication/Test
#6 = Class #26 // java/lang/Object
#7 = Utf8 <init>
#8 = Utf8 ()V
#9 = Utf8 Code
#10 = Utf8 LineNumberTable
#11 = Utf8 main
#12 = Utf8 ([Ljava/lang/String;)V
#13 = Utf8 StackMapTable
#14 = Class #27 // "[Ljava/lang/String;"
#15 = Class #28 // java/lang/String
#16 = Class #29 // java/io/PrintStream
#17 = Utf8 SourceFile
#18 = Utf8 Test.java
#19 = NameAndType #7:#8 // "<init>":()V
#20 = Utf8 micky
#21 = Class #30 // java/lang/System
#22 = NameAndType #31:#32 // out:Ljava/io/PrintStream;
#23 = Class #29 // java/io/PrintStream
#24 = NameAndType #33:#34 // println:(Z)V
#25 = Utf8 com/mingzhao/www/myapplication/Test
#26 = Utf8 java/lang/Object
#27 = Utf8 [Ljava/lang/String;
#28 = Utf8 java/lang/String
#29 = Utf8 java/io/PrintStream
#30 = Utf8 java/lang/System
#31 = Utf8 out
#32 = Utf8 Ljava/io/PrintStream;
#33 = Utf8 println
#34 = Utf8 (Z)V
As shown, assigning the two variables does not allocate a second "micky" in the constant pool; only one "micky" exists.
The bytecode of main() makes this even clearer.
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=3, locals=3, args_size=1
0: ldc #2 // String micky
2: astore_1
3: ldc #2 // String micky
5: astore_2
6: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
9: aload_1
10: aload_2
11: if_acmpne 18
14: iconst_1
15: goto 19
18: iconst_0
19: invokevirtual #4 // Method java/io/PrintStream.println:(Z)V
22: return
LineNumberTable:
line 11: 0
line 12: 3
line 13: 6
line 14: 22
StackMapTable: number_of_entries = 2
frame_type = 255 /* full_frame */
offset_delta = 18
locals = [ class "[Ljava/lang/String;", class java/lang/String, class java/lang/String ]
stack = [ class java/io/PrintStream ]
frame_type = 255 /* full_frame */
offset_delta = 0
locals = [ class "[Ljava/lang/String;", class java/lang/String, class java/lang/String ]
stack = [ class java/io/PrintStream, int ]
}
String s = "micky" corresponds to:
0: ldc #2 // String micky
2: astore_1
String s2 = "micky" corresponds to:
3: ldc #2 // String micky
5: astore_2
Both assignments obtain the string reference through #2, which explains why the program prints true.
Printed result: true
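The same sharing holds when equal literals appear in different classes, since each class has its own constant pool entry but the entries resolve to one interned string. The sketch below illustrates this; the class names SharedLiteralDemo, A, and B are hypothetical and exist only for this example.

```java
final class SharedLiteralDemo {
    // Two separate classes, each compiled with its own constant pool.
    static final class A {
        static String value() {
            return "micky";
        }
    }

    static final class B {
        static String value() {
            return "micky";
        }
    }

    // True if the literal in A and the literal in B are the same object.
    static boolean sameAcrossClasses() {
        return A.value() == B.value();
    }

    public static void main(String[] args) {
        String s = "micky";
        String s2 = "micky";
        System.out.println(s == s2);              // same class, same pool entry
        System.out.println(sameAcrossClasses());  // different classes
    }
}
```

Both comparisons print true, because string literals with equal content always refer to the same String instance.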
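Constant pool allocation
1. In JDK 6 and earlier, the string constant pool is allocated in the permanent generation (PermGen), so it is limited by the size of PermGen.
2. In JDK 7, the string constant pool is allocated on the Java heap, so it is no longer constrained by the fixed PermGen size and is bounded by the heap instead.
3. In JDK 8, the JVM team removed PermGen altogether and replaced it with Metaspace.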
String initialization
A string can be initialized in two ways: as a literal (direct assignment) or as a String object.
Literal (direct assignment)
public class Test {
public static void main(String []args)
{
String s = "micky";
String s1 = "micky";
String s2 = "mi" + "cky";
}
}
Use javap -c to view the bytecode:
public static void main(java.lang.String[]);
Code:
0: ldc #2 // String micky
2: astore_1
3: ldc #2 // String micky
5: astore_2
6: ldc #2 // String micky
8: astore_3
9: return
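The instructions show that s, s1, and s2 all refer to the constant pool string "micky". For s2, the compiler evaluates "mi" + "cky" at compile time and assigns the resulting "micky" directly.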
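Compile-time folding can be checked with reference comparisons. This is a small sketch with a made-up class name, ConstantFoldingDemo; the second method assumes, as the language rules for constant expressions allow, that int and char literals take part in folding too.

```java
final class ConstantFoldingDemo {
    // "mi" + "cky" is a constant expression, so javac emits one ldc for "micky".
    static boolean foldedEqualsLiteral() {
        String s = "micky";
        String s2 = "mi" + "cky";
        return s == s2;
    }

    // Primitive literals inside the concatenation are folded as well.
    static boolean mixedConstantsFold() {
        String s = "micky" + 1 + '!';
        return s == "micky1!";
    }

    public static void main(String[] args) {
        System.out.println(foldedEqualsLiteral());
        System.out.println(mixedConstantsFold());
    }
}
```

Both methods return true, because in each case the compiler stores only the final string in the constant pool.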
String object
public class Test {
public static void main(String []args)
{
String s1 = "micky";
String s2 = new String("micky");
}
}
Is s1 == s2 in the code above? Again, look at the bytecode:
public static void main(java.lang.String[]);
Code:
0: ldc #2 // String micky
2: astore_1
3: new #3 // class java/lang/String
6: dup
7: ldc #2 // String micky
9: invokespecial #4 // Method java/lang/String."<init>":(Ljava/lang/String;)V
12: astore_2
13: return
String s1 = "micky" corresponds to:
0: ldc #2 // String micky
2: astore_1
String s2 = new String("micky") corresponds to:
3: new #3 // class java/lang/String
6: dup
7: ldc #2 // String micky
9: invokespecial #4 // Method java/lang/String."<init>":(Ljava/lang/String;)V
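- The new instruction allocates memory on the Java heap for the String object that s2 will refer to.
- The instruction at offset 7 (ldc #2) fetches the string "micky" from the constant pool; if no such string exists yet, it is created in the constant pool and returned.
- invokespecial calls the constructor to initialize the String object.
The string's characters are stored in a char array. s1 refers to the "micky" in the constant pool, whereas s2 refers to a separate object on the Java heap, and it is that heap object which in turn refers to the pooled "micky". So clearly s1 != s2. A diagram found online illustrates this vividly: initialization by literal and initialization by String object differ only by one intermediary.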
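The difference between the two initialization styles can be verified directly. The sketch below uses a hypothetical class NewStringDemo and adds a call to String.intern(), which the example above does not use, to show how to get back from the heap object to the pooled instance.

```java
import java.util.Arrays;

final class NewStringDemo {
    // Returns {identity, content equality, identity after intern()}.
    static boolean[] compare() {
        String s1 = "micky";
        String s2 = new String("micky");
        return new boolean[] {
            s1 == s2,          // false: s2 is a separate heap object
            s1.equals(s2),     // true: same characters
            s1 == s2.intern()  // true: intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare()));
    }
}
```

Running it prints [false, true, true], which is why string content should be compared with equals() rather than ==.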
Extension
The code below mixes string literals and String objects. What is the result?
public class Test {
public static void main(String []args)
{
String s1 = "micky";
String s2 = " is great";
String s3 = s1 + s2;
String s4 = "micky is great";
}
}
Is s3 == s4 in this case? Look at the bytecode:
public static void main(java.lang.String[]);
Code:
0: ldc #2 // String micky
2: astore_1
3: ldc #3 // String is great
5: astore_2
6: new #4 // class java/lang/StringBuilder
9: dup
10: invokespecial #5 // Method java/lang/StringBuilder."<init>":()V
13: aload_1
14: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
17: aload_2
18: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
21: invokevirtual #7 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
24: astore_3
25: ldc #8 // String micky is great
27: astore 4
29: return
Let's analyze it step by step:
- String s1 = "micky";
0: ldc #2 // String micky
2: astore_1
- String s2 = " is great";
3: ldc #3 // String is great
5: astore_2
- String s3 = s1 + s2;
6: new #4 // class java/lang/StringBuilder
9: dup
10: invokespecial #5 // Method java/lang/StringBuilder."<init>":()V
13: aload_1
14: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
17: aload_2
18: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
21: invokevirtual #7 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
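a. Offset 6: allocates memory on the Java heap for a StringBuilder object.
b. Offset 10: calls the constructor to initialize the StringBuilder object.
c. Offsets 14 and 18: the invokevirtual instructions call append to add the strings s1 and s2.
d. Offset 21: the invokevirtual instruction calls toString(), which converts the StringBuilder into a String object.
- String s4 = "micky is great";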
25: ldc #8 // String micky is great
27: astore 4
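String s3 = s1 + s2 goes through a series of steps: a StringBuilder object is first created on the Java heap, the strings are added with append, and finally toString() converts it into a String object. In the end s3 refers to an object on the Java heap, while s4 refers directly to the "micky is great" string in the constant pool, so s3 != s4.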
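The following sketch reproduces the comparison at run time. The class RuntimeConcatDemo and its two helper methods are hypothetical names for illustration; viaBuilder spells out by hand the StringBuilder sequence seen in the bytecode. Note that javac from JDK 9 onward compiles + with invokedynamic instead of an explicit StringBuilder, but the result is still a newly created String, so the comparison behaves the same.

```java
final class RuntimeConcatDemo {
    // The operands are not compile-time constants here, so + runs at run time.
    static String viaOperator(String a, String b) {
        return a + b;
    }

    // Hand-written equivalent of the StringBuilder bytecode sequence.
    static String viaBuilder(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    public static void main(String[] args) {
        String s1 = "micky";
        String s2 = " is great";
        String s4 = "micky is great";
        String s3 = viaOperator(s1, s2);
        System.out.println(s3 == s4);                  // false: new heap object
        System.out.println(s3.equals(s4));             // true: same content
        System.out.println(viaBuilder(s1, s2) == s4);  // false: also a new object
        System.out.println(s3.intern() == s4);         // true: pooled instance
    }
}
```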
When the strings are declared final, the situation changes again.
public class Test {
public static void main(String []args)
{
final String s1 = "micky";
final String s2 = " is great";
String s3 = s1 + s2;
String s4 = "micky is great";
}
}
Look at the bytecode:
public static void main(java.lang.String[]);
Code:
0: ldc #2 // String micky is great
2: astore_3
3: ldc #2 // String micky is great
5: astore 4
7: return
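With the final modifier, final String s1 = "micky" behaves like the constant "micky" and final String s2 = " is great" behaves like the constant " is great", so String s3 = s1 + s2 is equivalent to String s3 = "micky" + " is great". As mentioned earlier, the compiler assigns the reference to "micky is great" directly to s3 at compile time, so s3 == s4.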
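One caveat is worth illustrating: final alone is not enough, because the variable must also be initialized with a compile-time constant. The sketch below contrasts the two cases; the class FinalConstantDemo and the helper runtimeValue() are hypothetical and serve only as an example of a non-constant initializer.

```java
final class FinalConstantDemo {
    // Hypothetical helper: a method call is never a compile-time constant.
    static String runtimeValue() {
        return "micky";
    }

    // Both operands are constant variables, so s1 + s2 is folded by javac.
    static boolean constantVariables() {
        final String s1 = "micky";
        final String s2 = " is great";
        String s3 = s1 + s2;
        String s4 = "micky is great";
        return s3 == s4;
    }

    // s1 is final but initialized at run time, so + creates a new object.
    static boolean finalButNotConstant() {
        final String s1 = runtimeValue();
        final String s2 = " is great";
        String s3 = s1 + s2;
        String s4 = "micky is great";
        return s3 == s4;
    }

    public static void main(String[] args) {
        System.out.println(constantVariables());    // true
        System.out.println(finalButNotConstant());  // false
    }
}
```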