Creating Strings
public class Test {
    private static final String ABC = "abc";
    public static void main(String[] args) {
        String s1 = "abc";
        String s2 = new String("abc");
        String s3 = "a" + "bc";
        String s4 = "a" + new String("bc");
        final String s5 = "a";
        final String s6 = "bc";
        String s7 = "a" + new String("bc");
        System.out.println("s1 == ABC ? " + (s1==ABC));
        System.out.println("s2 == ABC ? " + (s2==ABC));
        System.out.println("s3 == ABC ? " + (s3==ABC));
        System.out.println("s4 == ABC ? " + (s4==ABC));
        System.out.println("(s5+s6) == ABC ? " + ((s5+s6) == ABC));
        System.out.println("s7.intern() == ABC ? " + (s7.intern() == ABC));
    }
}/* output:
s1 == ABC ? true
s2 == ABC ? false
s3 == ABC ? true
s4 == ABC ? false
(s5+s6) == ABC ? true
s7.intern() == ABC ? true
*///:~
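String s1 = "abc"; The JVM first looks up "abc" in the string constant pool. If it is already there, s1 is pointed at the pooled string; if not, the string is created in the pool and s1 is pointed at it.
String s2 = new String("abc"); First "abc" is looked up or created in the constant pool. Then the String constructor creates a new string object on the heap (internally sharing the character data of the pooled "abc"). Finally s2 is pointed at the heap object. The constructor's source: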
/**
* Initializes a newly created {@code String} object so that it represents
* the same sequence of characters as the argument; in other words, the
* newly created string is a copy of the argument string. Unless an
* explicit copy of {@code original} is needed, use of this constructor is
* unnecessary since Strings are immutable.
*/
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}
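String s3 = "a" + "bc"; When the right-hand side consists only of string constants joined with +, the compiler folds it into String s3 = "abc"; (the decompiled code is identical), so s3 refers to the same object as s1.
String s4 = "a" + new String("bc"); Here new String("bc") behaves like the s2 case. The + concatenation, however, differs from s3: the compiler creates a StringBuilder, uses append in place of +, and finally assigns the result of toString to s4. The listing below comes from an older compiler; javac from JDK 9 onward emits an invokedynamic call instead, but the result is still a new object rather than the pooled string. The corresponding bytecode: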
0: new #16 // class java/lang/StringBuilder
3: dup
4: ldc #18 // String a
6: invokespecial #20 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
9: new #23 // class java/lang/String
12: dup
13: ldc #25 // String bc
15: invokespecial #27 // Method java/lang/String."<init>":(Ljava/lang/String;)V
18: invokevirtual #28 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
21: invokevirtual #32 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
24: astore_1
25: return
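Except when every operand is a string constant, joining strings with + makes the compiler introduce a StringBuilder automatically (javac before JDK 9; later versions emit an invokedynamic call instead). The operands are joined with append, and toString returns the concatenated string.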
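The rule above matters most inside loops, where every += produces a fresh string. A minimal sketch of the two styles follows; the class and method names are illustrative and do not come from the original notes.

```java
class ConcatDemo {
    // Each += builds a new String from the previous result plus the next part.
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // A single StringBuilder is reused for every append.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(joinWithPlus(parts));    // abc
        System.out.println(joinWithBuilder(parts)); // abc
    }
}
```

Both methods return the same text; the explicit StringBuilder simply avoids creating an intermediate string per iteration.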
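s5 + s6 == ABC; Both s5 and s6 are final variables initialized with constants, so the compiler substitutes their values directly and s5 + s6 is equivalent to "abc".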
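To see that it is the final modifier that makes the difference, the same concatenation can be tried with and without it. This is a small sketch with made-up names (ConstantFoldingDemo, prefix, joined), not code from the original.

```java
class ConstantFoldingDemo {
    static boolean finalOperandIsFolded() {
        final String prefix = "a";      // compile-time constant
        String joined = prefix + "bc";  // folded by the compiler into the literal "abc"
        return joined == "abc";
    }

    static boolean nonFinalOperandIsFolded() {
        String prefix = "a";            // ordinary variable, not a constant
        String joined = prefix + "bc";  // concatenated at run time into a new object
        return joined == "abc";
    }

    public static void main(String[] args) {
        System.out.println(finalOperandIsFolded());    // true
        System.out.println(nonFinalOperandIsFolded()); // false
    }
}
```

In both cases joined.equals("abc") is true; only the reference comparison differs.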
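s7.intern() == ABC; This involves the intern method.
(Note on intern: from JDK 1.7 on, intern still first checks whether an equal string already exists in the constant pool and, if so, returns the pooled reference, exactly as before. The difference is what happens when no such string is found: the string is no longer copied into the pool; instead the pool records a reference to the original string. In short, what goes into the pool has changed: before 1.7 it was a copy of the string, from 1.7 on it is a reference to the object on the heap.)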
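A short sketch of the part of intern that holds on every JDK version: a string built at run time is a different object from the literal with the same content, and intern returns the pooled one. The class and method names are illustrative.

```java
class InternDemo {
    static boolean builtIsPooledObject() {
        String built = new StringBuilder("ab").append("c").toString();
        return built == "abc";          // a run-time string is not the literal's object
    }

    static boolean internedIsPooledObject() {
        String built = new StringBuilder("ab").append("c").toString();
        return built.intern() == "abc"; // intern returns the pooled reference
    }

    public static void main(String[] args) {
        System.out.println(builtIsPooledObject());    // false
        System.out.println(internedIsPooledObject()); // true
    }
}
```

Cases where the run-time string is interned before any equal literal has been loaded are the ones whose result depends on the JDK version, so they are left out of this sketch.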
Further reading: https://blog.csdn.net/soonfly/article/details/70147205
String Methods
public class Test {
    private static final String ABC = "abc";
    public static void main(String[] args) {
        // s1 starts with three spaces (length 11, 'a' at index 3)
        String s1 = "   abced fg";
        String s2 = "h ijk mln";
        StringBuilder sb = new StringBuilder("   abced fg");
        System.out.println("s1.length() = " + s1.length());
        System.out.println("s1.charAt() = " + s1.charAt(s1.length()-1));
        char[] cb = new char[20];
        byte[] bb;
        s1.getChars(3, s1.length(), cb, 0);
        bb = s1.getBytes();
        // Concatenating an array with a string prints its type and hash code, not its contents
        System.out.println("s1.getChars() = " + cb);
        System.out.println("s1.getBytes() = " + bb);
        System.out.println("s1.toCharArray() = " + s1.toCharArray());
        System.out.println("s1.equals() = " + s1.equals(sb));
        System.out.println("s1.contentEquals() = " + s1.contentEquals(sb));
        System.out.println("s1.compareTo() = " + s1.compareTo(s2));
        System.out.println("s1.startsWith() = " + s1.startsWith("abc", 3));
        System.out.println("s1.endsWith() = " + s1.endsWith("g"));
        System.out.println("s1.indexOf() = " + s1.indexOf(97));
        System.out.println("s1.lastIndexOf() = " + s1.lastIndexOf(99));
        System.out.println("s1.subString() = " + s1.substring(3));
        System.out.println("s1.concat() = " + s1.concat(s2));
        System.out.println("s1.replace() = " + s1.replace("abc", "123"));
        System.out.println("s1.replaceAll() = " + s1.replaceAll(" ", ""));
        System.out.println("s1.trim() = " + s1.trim());
        System.out.println("String.valueOf() = " + String.valueOf(123));
    }
}/* output:
s1.length() = 11
s1.charAt() = g
s1.getChars() = [C@1db9742
s1.getBytes() = [B@106d69c
s1.toCharArray() = [C@52e922
s1.equals() = false
s1.contentEquals() = true
s1.compareTo() = -72
s1.startsWith() = true
s1.endsWith() = true
s1.indexOf() = 3
s1.lastIndexOf() = 5
s1.subString() = abced fg
s1.concat() =    abced fgh ijk mln
s1.replace() =    123ed fg
s1.replaceAll() = abcedfg
s1.trim() = abced fg
String.valueOf() = 123
*///:~
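The [C@... and [B@... lines in the output above are array identities, not contents. A hedged sketch of how the contents could be printed instead, and of equals versus contentEquals, is shown below; the class and helper names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ArrayPrintDemo {
    // Arrays.toString renders the elements rather than the array's identity.
    static String charsOf(String s) {
        return Arrays.toString(s.toCharArray());
    }

    static String bytesOf(String s) {
        // An explicit charset keeps the result independent of the platform default.
        return Arrays.toString(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(charsOf(s));                  // [a, b, c]
        System.out.println(bytesOf(s));                  // [97, 98, 99]
        System.out.println(new String(s.toCharArray())); // abc

        StringBuilder sb = new StringBuilder("abc");
        System.out.println(s.equals(sb));                // false: sb is not a String
        System.out.println(s.contentEquals(sb));         // true: same character sequence
        System.out.println(s.equals(sb.toString()));     // true
    }
}
```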
Formatted Output
Format specifier: %[argument_index$][flags][width][.precision]conversion
(1) System.out.format
Java's format method is available on PrintStream and PrintWriter objects and works like printf in C.
(2) The Formatter class
import java.util.Formatter;
public class Test {
    public static void main(String[] args) {
        String s = "formatted";
        int x = 10;
        float y = 20.0f;
        Formatter f = new Formatter(System.out);
        f.format("%s : %d,%.2f\n", s, x, y);
    }
}/* output:
formatted : 10,20.00
*///:~
Formatter's format method returns the Formatter itself and writes the formatted string to the underlying stream.
(3) String.format()
Returns the formatted string.
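The parts of the format specifier (argument_index, flags, width, precision) can be illustrated with String.format. This is a minimal sketch; the class name, the row helper, and the sample values are made up for illustration, and Locale.ROOT is passed so the decimal separator does not depend on the machine's locale.

```java
import java.util.Locale;

class FormatDemo {
    // %-10s: left-justified in 10 columns; %5d: right-justified in 5 columns;
    // %8.2f: 8 columns wide with 2 digits after the decimal point.
    static String row(String name, int qty, double price) {
        return String.format(Locale.ROOT, "%-10s|%5d|%8.2f", name, qty, price);
    }

    public static void main(String[] args) {
        System.out.println(row("item", 42, 3.14159));                        // item      |   42|    3.14
        System.out.println(String.format(Locale.ROOT, "%2$s %1$s", "a", "b")); // b a
        System.out.println(String.format(Locale.ROOT, "%05d", 42));           // 00042
        System.out.println(String.format(Locale.ROOT, "%x", 255));            // ff
    }
}
```

System.out.format and Formatter.format accept the same specifiers; String.format only differs in returning the result instead of writing it.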
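Regular Expressions
A regular expression uses one "string" to describe a pattern and then checks whether another "string" fits that pattern.
Regular expressions can be used for validation, searching, and replacement.
Reference: https://www.cnblogs.com/yw0219/p/8047938.html
1. Rules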
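(1) Characters
Ordinary characters: x the character x; [aeiou] any one vowel; [0-9] same as \d; [^0-9] same as \D
Special characters: \t \r \f \n \\ \xhh \uhhhh; inside a Java string literal, \b (backspace), \" and \' are escapes as well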
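(2) Metacharacters
. any character except a line terminator; ^ start of the string; $ end of the string; \b word boundary
\w a letter, digit, or underscore ([a-zA-Z_0-9] by default in Java; Chinese characters match only when Pattern.UNICODE_CHARACTER_CLASS is set); \d a digit; \s any whitespace character
(3) Negation
\W \D \S \B are the exact opposites of \w \d \s \b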
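(4) Escaping
\^ \$ \. and so on: escape a special character so that it matches literally
(5) Repetition
* zero or more times; + one or more times; ? zero or one time
{n} exactly n times; {n,} at least n times; {m,n} between m and n times
(6) Greedy and lazy
Appending ? to a quantifier makes it lazy. For example, a.*b matches the longest string that satisfies the pattern, whereas a.*?b matches the shortest.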
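The difference between the two quantifiers is easiest to see on an input with more than one possible end point. A small sketch, with an illustrative helper named firstMatch and a made-up input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyLazyDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("a.*b", "aXbYb"));  // aXbYb (greedy: up to the last b)
        System.out.println(firstMatch("a.*?b", "aXbYb")); // aXb   (lazy: up to the first b)
    }
}
```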
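(7) Grouping
(X) capturing group; \i refers back to the i-th capturing group.
Groups are numbered by their opening parenthesis, left to right. In ((A)(B(C))): 1 is ((A)(B(C))); 2 is (A); 3 is (B(C)); 4 is (C)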
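The numbering can be checked by listing every group of a match. In this sketch the helper name groups is illustrative; index 0 of the returned list is the whole match, which is not counted by groupCount.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns group 0 (the whole match) followed by groups 1..groupCount of the first match.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groups("((A)(B(C)))", "ABC")); // [ABC, ABC, A, BC, C]
        // \1 must repeat whatever group 1 captured, so this finds a doubled character.
        System.out.println(groups("(\\w)\\1", "hello"));  // [ll, l]
    }
}
```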
(8) And / or
&& intersection inside a character class (e.g. [a-z&&[abc]] is equivalent to a|b|c)
| alternation (e.g. cat|dog matches cat or dog)
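(9) Zero-width assertions
(?=exp) asserts that exp matches immediately after this position;
(?<=exp) asserts that exp matches immediately before this position;
(?!exp) asserts that exp does not match immediately after this position;
(?<!exp) asserts that exp does not match immediately before this position;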
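Because these assertions consume no characters, the text they test never becomes part of the match. The sketch below tries each of the four forms on made-up inputs; firstMatch and groupThousands are illustrative helper names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Inserts a comma at every position that has a digit before it and
    // a whole number of three-digit groups after it, up to the end.
    static String groupThousands(String digits) {
        return digits.replaceAll("(?<=\\d)(?=(\\d{3})+$)", ",");
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d+(?=px)", "20em 100px"));            // 100
        System.out.println(firstMatch("(?<=\\$)\\d+", "id 7, cost $45"));      // 45
        System.out.println(firstMatch("q(?!u)\\w*", "quit qat"));              // qat
        System.out.println(firstMatch("(?<!\\$)\\b\\d+", "cost $45, qty 7"));  // 7
        System.out.println(groupThousands("1234567"));                         // 1,234,567
    }
}
```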
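(10) Comments
(?#comment) is the Perl-style inline comment, where comment is the comment text. java.util.regex does not support this syntax; in Java, use Pattern.COMMENTS, or (?x), and start each comment with #.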
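(11) Pattern flags (the flags argument)
Pattern.CANON_EQ: two characters match if and only if their full canonical decompositions match;
Pattern.CASE_INSENSITIVE (?i): case-insensitive matching. By default only US-ASCII characters are affected; combine with Pattern.UNICODE_CASE for Unicode text;
Pattern.COMMENTS (?x): whitespace is ignored, and so is everything from # to the end of the line;
Pattern.DOTALL (?s): "." matches any character, including line terminators;
Pattern.MULTILINE (?m): ^ and $ match at the start and end of each line;
Pattern.UNICODE_CASE (?u): together with CASE_INSENSITIVE, makes case-insensitive matching Unicode-aware;
Pattern.UNIX_LINES (?d): ., ^ and $ recognize only \n as a line terminator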
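A few of these flags can be compared side by side. The following is a sketch with made-up inputs; countMatches is an illustrative helper, not part of java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsDemo {
    // Counts how many times the pattern is found in the input.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE as a flag and as the embedded form (?i)
        System.out.println(Pattern.compile("hello", Pattern.CASE_INSENSITIVE).matcher("HeLLo").matches()); // true
        System.out.println(Pattern.matches("(?i)hello", "HeLLo"));                                        // true

        // DOTALL lets "." match the line terminator
        System.out.println(Pattern.compile("a.b").matcher("a\nb").matches());                 // false
        System.out.println(Pattern.compile("a.b", Pattern.DOTALL).matcher("a\nb").matches()); // true

        // MULTILINE makes ^ match at the start of every line
        String lines = "one\ntwo\nthree";
        System.out.println(countMatches(Pattern.compile("^\\w+"), lines));                    // 1
        System.out.println(countMatches(Pattern.compile("^\\w+", Pattern.MULTILINE), lines)); // 3

        // COMMENTS ignores whitespace and #-comments inside the pattern
        Pattern commented = Pattern.compile("\\d+   # digits\n\\s*px  # unit", Pattern.COMMENTS);
        System.out.println(commented.matcher("12 px").matches());                             // true
    }
}
```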
2. Applications
Reference: https://blog.csdn.net/u013836857/article/details/52251358
3. The String, Pattern, and Matcher classes
(1) String methods that take a regex (validation, splitting, replacement)
public boolean matches(String regex); // validate
public String replaceAll(String regex, String replacement); // replace
public String replaceFirst(String regex, String replacement);
public String[] split(String regex); // split
public String[] split(String regex, int limit);
(These methods are implemented in terms of Pattern and Matcher.)
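A quick sketch of the five methods listed above on made-up inputs; the class name and the isIsoDate helper are illustrative.

```java
import java.util.Arrays;

class StringRegexDemo {
    // matches() succeeds only if the whole string fits the pattern.
    static boolean isIsoDate(String s) {
        return s.matches("\\d{4}-\\d{2}-\\d{2}");
    }

    public static void main(String[] args) {
        System.out.println(isIsoDate("2024-01-31"));                       // true
        System.out.println(isIsoDate("on 2024-01-31"));                    // false
        System.out.println("a  b   c".replaceAll("\\s+", " "));            // a b c
        System.out.println("a1b22c".replaceFirst("\\d+", "#"));            // a#b22c
        System.out.println(Arrays.toString("a1b22c".split("\\d+")));       // [a, b, c]
        // limit = 2 stops after the first split, leaving the rest intact
        System.out.println(Arrays.toString("a,b,c".split(",", 2)));        // [a, b,c]
    }
}
```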
(2) Pattern and Matcher
Validation
boolean Pattern.matches(String regex, CharSequence input);
boolean m.matches();
Searching
m.find(); m.find(int start); m.lookingAt();
Groups are supported: m.groupCount() and m.group(i)
Splitting
p.split(s, limit);
(3) Other replacement operations
m.appendReplacement(StringBuffer sbuf, String replacement) performs incremental replacement;
m.appendTail(StringBuffer sbuf) is called after one or more appendReplacement calls to copy the rest of the input into sbuf.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        String s = "Wuhan University of Technology.";
        // Pattern.matches()
        System.out.println("1: " + Pattern.matches("[A-Z].*\\.", s));
        // Pattern.split()
        Pattern p = Pattern.compile("\\W+");
        String[] ss = p.split(s, 0);
        System.out.print("2: ");
        for(String s0 : ss) {
            System.out.print(s0 + "/");
        }
        System.out.println();
        // Matcher: find()/m.group()
        p = Pattern.compile("\\b[A-Z][a-zA-Z]*\\b");
        Matcher m = p.matcher(s);
        System.out.println("3:");
        while(m.find()) {
            System.out.println("\tMatch \"" + m.group() + "\" at positions "
                    + m.start() + "-" + (m.end()-1));
        }
        System.out.println("4: " + m.matches());
        System.out.println("5: " + m.lookingAt());
        // Matcher: find(i)
        p = Pattern.compile("\\w+");
        m = p.matcher(s);
        System.out.print("6: ");
        for(int i = 0; m.find(i); i++) {
            if((i & 3) == 0)
                System.out.print("\n\t");
            System.out.print(m.group() + " ");
        }
        System.out.println();
        // Matcher: appendReplacement and appendTail
        p = Pattern.compile("\\b[A-Z]");
        m = p.matcher(s);
        StringBuffer sb = new StringBuffer();
        while(m.find()) {
            m.appendReplacement(sb, m.group().toLowerCase());
        }
        m.appendTail(sb);
        System.out.println("7: " + sb);
    }
}/* output:
1: true
2: Wuhan/University/of/Technology/
3:
Match "Wuhan" at positions 0-4
Match "University" at positions 6-15
Match "Technology" at positions 20-29
4: false
5: true
6:
Wuhan uhan han an
n University University niversity
iversity versity ersity rsity
sity ity ty y
of of f Technology
Technology echnology chnology hnology
nology ology logy ogy
gy y
7: wuhan university of technology.
*///:~
(4) reset
m.reset(String s); applies the Matcher to a new string;
m.reset(); resets the Matcher to the start of the current character sequence.
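Both forms of reset can be seen in one short run. In this sketch the class name, the demo method, and the inputs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Records the match found after each step.
    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher("a1 b22");
        m.find();
        seen.add(m.group());   // first number in "a1 b22"
        m.find();
        seen.add(m.group());   // second number
        m.reset();             // back to the start of the same input
        m.find();
        seen.add(m.group());   // first number again
        m.reset("c333");       // same pattern, new input
        m.find();
        seen.add(m.group());
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [1, 22, 1, 333]
    }
}
```

Reusing one Matcher this way avoids creating a new object for every input that is checked against the same pattern.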
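4. The Scanner class
JDK 5 added the Scanner class, which takes much of the work out of scanning input.
Scanner constructors accept File, InputStream, String, and Readable inputs.
Scanner also supports scanning with regular expressions.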
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.MatchResult;
public class Test {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner in = new Scanner(new File("Test.java"));
        in.useDelimiter("\r\n");
        String pattern = "import\\s+(([a-z]+[.])+)([A-Z][a-zA-Z]*);";
        while(in.hasNext(pattern)) {
            in.next(pattern);
            MatchResult mr = in.match();
            System.out.print("package: " +
                    mr.group(1).substring(0, mr.group(1).length()-1));
            System.out.println("; class: " + mr.group(3));
        }
    }
}/* output:
package: java.io; class: File
package: java.io; class: FileNotFoundException
package: java.util; class: Scanner
package: java.util.regex; class: MatchResult
*///:~
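Note: when scanning with a regular expression, the pattern is matched only against the next input token, so a pattern that contains the delimiter always fails to match. Also, Scanner's default delimiter is whitespace, which includes spaces, tabs, and line breaks. On Windows the line terminator is \r\n; on Unix it is \n.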
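Both points in the note can be reproduced on an in-memory string. The sketch below reuses the import pattern from the example above but takes the delimiter as a parameter; the class name, the importedClasses helper, and the sample source text are assumptions made for illustration. The delimiter "\r?\n" is used so that both Windows and Unix line endings split the input into lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;

class ScannerRegexDemo {
    // Collects the class names of leading import lines, splitting tokens on the given delimiter.
    static List<String> importedClasses(String source, String delimiter) {
        List<String> names = new ArrayList<>();
        String pattern = "import\\s+(([a-z]+[.])+)([A-Z][a-zA-Z]*);";
        try (Scanner in = new Scanner(source)) {
            in.useDelimiter(delimiter);
            while (in.hasNext(pattern)) {
                in.next(pattern);
                MatchResult mr = in.match();
                names.add(mr.group(3));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String source = "import java.io.File;\r\nimport java.util.Scanner;\nclass X {}";
        // Line delimiter: each token is a whole line, so the pattern can match.
        System.out.println(importedClasses(source, "\r?\n")); // [File, Scanner]
        // Whitespace delimiter: the first token is just "import", so nothing matches.
        System.out.println(importedClasses(source, "\\s+"));  // []
    }
}
```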