Chapter 13: Strings

Tags: Thinking in Java


String manipulation is arguably one of the most common activities in computer programming.

13.1 Immutable Strings

String objects are immutable. Every method in the String class that appears to modify a string's value actually creates and returns a brand-new String object.

package com.string;

/**
 * @author zhulongkun20@163.com
 * @since 2018-06-05 11:00
 */
public class Immutable {
    public static String upcase(String s) {
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String a = "abc";
        System.out.println("a : " + a);
        System.out.println("upcase(a) : " + upcase(a));
        System.out.println("a : " + a);
    }
}

output:

a : abc
upcase(a) : ABC
a : abc

Process finished with exit code 0

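What is actually passed is a copy of the reference. The copy, s, exists only while upcase() is executing and disappears as soon as the method returns. The return value of upcase() is a reference to the result, and that reference points to a new object; the original a is left where it was, unchanged.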

13.2 Overloading "+" vs. StringBuilder

A String object is read-only, so no reference to it can change its value. Immutability can, however, raise efficiency concerns, for example:

package com.string;

/**
 * @author zhulongkun20@163.com
 * @since 2018-06-05 11:55
 */
public class Concatenation {
    public static void main(String[] args) {
        String mango = "mango";
        String s = "abc" + mango + "def" + 47;
        System.out.println(s);
    }
}

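If each "+" were implemented naively as a call that returns a new String, this expression would create many intermediate objects and run inefficiently.

Disassembled with javap -c: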

E:\ThinkInJava\ThinkInJavaDemo\src\com\string>javap -c Concatenation
Warning: Binary file Concatenation contains com.string.Concatenation
Compiled from "Concatenation.java"
public class com.string.Concatenation {
  public com.string.Concatenation();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: ldc           #2                  // String mango
       2: astore_1
       3: new           #3                  // class java/lang/StringBuilder
       6: dup
       7: invokespecial #4                  // Method java/lang/StringBuilder."<init>":()V
      10: ldc           #5                  // String abc
      12: invokevirtual #6                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      15: aload_1
      16: invokevirtual #6                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      19: ldc           #7                  // String def
      21: invokevirtual #6                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      24: bipush        47
      26: invokevirtual #8                  // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
      29: invokevirtual #9                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      32: astore_2
      33: getstatic     #10                 // Field java/lang/System.out:Ljava/io/PrintStream;
      36: aload_2
      37: invokevirtual #11                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      40: return
}

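The bytecode shows that the compiler introduced a StringBuilder on its own and called append() once for each piece being concatenated. (Compilers from JDK 9 onward typically emit an invokedynamic-based concatenation instead, so the listing may look different on newer JDKs.)
Do not conclude, however, that because the compiler optimizes your code you can use String and StringBuilder interchangeably: the automatic optimization only goes so far.
Consider two methods: implicit() relies on the compiler's automatic optimization, while explicit() uses a StringBuilder directly.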

package com.string;

/**
 * @author zhulongkun20@163.com
 * @since 2018-06-05 12:10
 */
public class WhitherStringBuilder {
    public String implicit(String[] fields) {
        String result = "";
        for (int i = 0; i < fields.length; i++) {
            result += fields[i];
        }
        return result;
    }

    public String explicit(String[] fields) {
        StringBuilder stringBuilder = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            stringBuilder.append(fields[i]);
        }
        return stringBuilder.toString();
    }
}

Disassembled bytecode:

E:\ThinkInJava\ThinkInJavaDemo\src\com\string>javap -c WhitherStringBuilder
Warning: Binary file WhitherStringBuilder contains com.string.WhitherStringBuilder
Compiled from "WhitherStringBuilder.java"
public class com.string.WhitherStringBuilder {
  public com.string.WhitherStringBuilder();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public java.lang.String implicit(java.lang.String[]);
    Code:
       0: ldc           #2                  // String
       2: astore_2
       3: iconst_0
       4: istore_3
       5: iload_3
       6: aload_1
       7: arraylength
       8: if_icmpge     38
      11: new           #3                  // class java/lang/StringBuilder
      14: dup
      15: invokespecial #4                  // Method java/lang/StringBuilder."<init>":()V
      18: aload_2
      19: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      22: aload_1
      23: iload_3
      24: aaload
      25: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      28: invokevirtual #6                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      31: astore_2
      32: iinc          3, 1
      35: goto          5
      38: aload_2
      39: areturn

  public java.lang.String explicit(java.lang.String[]);
    Code:
       0: new           #3                  // class java/lang/StringBuilder
       3: dup
       4: invokespecial #4                  // Method java/lang/StringBuilder."<init>":()V
       7: astore_2
       8: iconst_0
       9: istore_3
      10: iload_3
      11: aload_1
      12: arraylength
      13: if_icmpge     30
      16: aload_2
      17: aload_1
      18: iload_3
      19: aaload
      20: invokevirtual #5                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      23: pop
      24: iinc          3, 1
      27: goto          10
      30: aload_2
      31: invokevirtual #6                  // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      34: areturn
}

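The disassembly shows that in implicit() the compiler-generated StringBuilder is constructed inside the loop body, so a new one is created on every iteration, whereas explicit() creates exactly one StringBuilder.
If you can estimate roughly how long the final string will be, StringBuilder lets you specify its initial capacity up front, which avoids repeated buffer reallocation.
The final result is assembled piece by piece with append(). If you try to take a shortcut such as append(a + "string" + b), the compiler will create yet another StringBuilder to handle a + "string" + b.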
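
As a minimal sketch of the two points above (pre-sizing and chaining append() calls), the following joins an array of fields with a single StringBuilder. The class name PresizedJoin, the helper join(), and the "index=value" output format are hypothetical and used only for illustration; the capacity passed to the constructor is a rough estimate, not an exact length.

```java
class PresizedJoin {
    // Joins the fields as "index=value" pairs using one pre-sized StringBuilder.
    static String join(String[] fields, String separator) {
        // Rough estimate of the final length, so the buffer rarely has to grow.
        int estimated = 0;
        for (String field : fields) {
            estimated += field.length() + separator.length() + 2;
        }
        StringBuilder sb = new StringBuilder(estimated);
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            // Chained append() calls; writing append(i + "=" + fields[i]) instead
            // would make the compiler build a second, temporary StringBuilder.
            sb.append(i).append('=').append(fields[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"}, ", "));
    }
}
```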

The methods provided by StringBuilder include:
- append()
- insert()
- replace()
- substring()
- reverse()
- toString()
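
The methods listed above can be tried out in a few lines. This is only an illustrative sketch; the class name StringBuilderMethods and the sample text are made up for the example.

```java
class StringBuilderMethods {
    static String demo() {
        StringBuilder sb = new StringBuilder("world");
        sb.insert(0, "hello ");          // sb is now "hello world"
        sb.replace(0, 5, "HELLO");       // sb is now "HELLO world"
        sb.append('!');                  // sb is now "HELLO world!"
        String tail = sb.substring(6);   // "world!" (substring() does not modify sb)
        sb.reverse();                    // sb is now "!dlrow OLLEH"
        return tail + " | " + sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Unlike the String methods, insert(), replace(), append() and reverse() modify the StringBuilder in place and return the same object, which is what makes chaining possible.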

13.3 Unintended Recursion

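Every class in Java ultimately inherits from Object, and the standard container classes are no exception, so they all have a toString() method. The containers override it so that they can describe not only themselves but also the elements they hold, by calling toString() on each element.
Suppose you want toString() to print the object's memory address. It is tempting to use the this keyword: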

package com.string;

import java.util.ArrayList;
import java.util.List;

/**
 * @author zhulongkun20@163.com
 * @since 2018-06-05 12:34
 */
public class InfiniteRecursion {
    public String toString() {
        return "InfiniteRecursion: " + this + "\n";
    }

    public static void main(String[] args) {
        List<InfiniteRecursion> infiniteRecursions = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            infiniteRecursions.add(new InfiniteRecursion());
        }
        System.out.println(infiniteRecursions);
    }
}

output:

Exception in thread "main" java.lang.StackOverflowError

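The result is a very long exception trace: a stack overflow. An automatic type conversion takes place in the code above. Because what follows the "+" is not a String, Java tries to convert this to a String by calling its toString() method, which is the very method being executed, so the call recurses without end.
What you want here is Object.toString(): calling super.toString() instead of referring to this gives the correct result.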
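
A minimal sketch of the corrected version follows. The class name SafeToString is hypothetical. Note that Object.toString() returns the class name, an "@", and the hash code in hexadecimal, which identifies the object but is not necessarily a real memory address.

```java
import java.util.ArrayList;
import java.util.List;

class SafeToString {
    @Override
    public String toString() {
        // super.toString() is Object.toString(), so no recursive call occurs.
        return "SafeToString: " + super.toString();
    }

    public static void main(String[] args) {
        List<SafeToString> items = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            items.add(new SafeToString());
        }
        // The list calls toString() on each element; this now terminates normally.
        System.out.println(items);
    }
}
```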

13.4 Operations on Strings

Commonly used members of the String class:
1. Constructors (overloaded for different argument types)
2. length()
3. charAt()
4. compareTo()
5. contains()
6. equalsIgnoreCase()
7. startsWith()
8. endsWith()
9. substring()
10. concat()
11. replace()
12. trim()
13. valueOf()

13.5 Formatting Output

13.5.1 printf()

Output using format specifiers:

System.out.printf("row 1 : [%d %f]\n", x, y);

13.5.2 System.out.format()

Equivalent to printf(); the two are used in the same way.

13.5.3 Formatter类

package com.string;

import java.io.PrintStream;
import java.util.Formatter;

/**
 * @author zhulongkun20@163.com
 * @since 2018-06-05 13:05
 */
public class Turtle {
    private String name;
    private Formatter formatter;

    public Turtle(String name, Formatter formatter) {
        this.name = name;
        this.formatter = formatter;
    }

    public void move(int x, int y) {
        formatter.format("%s the turtle is at (%d, %d)\n", name, x, y);
    }

    public static void main(String[] args) {
        PrintStream printStream = System.out;
        Turtle turtle = new Turtle("turtle", new Formatter(printStream));
        turtle.move(0, 5);
    }
}

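13.5.5 Formatter Conversions

Character   Type
d           Integer (decimal)
c           Unicode character
b           Boolean value
s           String
f           Floating point (decimal)
e           Floating point (scientific notation)
x           Integer (hexadecimal)
h           Hash code (hexadecimal)
%           Literal "%"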
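
The conversions in the table can be checked with a short sketch like the one below. The class name Conversions and the sample values are illustrative assumptions; Locale.ROOT is passed only so that the decimal separator does not depend on the machine's locale.

```java
import java.util.Formatter;
import java.util.Locale;

class Conversions {
    static String demo() {
        StringBuilder out = new StringBuilder();
        // A Formatter can write into any Appendable, here a StringBuilder.
        try (Formatter f = new Formatter(out, Locale.ROOT)) {
            f.format("%d %x %c %b %s %f %e %%", 255, 255, 'a', true, "str", 3.5, 3.5);
        }
        return out.toString();
    }

    static String hash(Object o) {
        // %h prints the argument's hashCode() in hexadecimal.
        return String.format("%h", o);
    }

    public static void main(String[] args) {
        System.out.println(demo());
        System.out.println(hash("hi"));
    }
}
```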

13.5.6 String.format()

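String.format() is a static method that returns a String. Internally it also creates a Formatter and passes its arguments along to it, but calling String.format() is more convenient.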
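
As a brief sketch, the Turtle message from 13.5.3 could be produced as a String instead of being written to a stream. The class name FormatDemo and the method position() are hypothetical.

```java
class FormatDemo {
    // Builds the message as a String rather than printing it directly.
    static String position(String name, int x, int y) {
        return String.format("%s the turtle is at (%d, %d)", name, x, y);
    }

    public static void main(String[] args) {
        System.out.println(position("Tommy", 0, 5));
    }
}
```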

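13.6 Regular Expressions

13.6.1 Basics

Backslashes within string literals in Java source code are interpreted, as required by the Java Language Specification, as either Unicode escapes or other character escapes. It is therefore necessary to double the backslashes in string literals that represent regular expressions, to protect them from interpretation by the Java compiler. For example, when interpreted as a regular expression, the string literal "\b" matches a single backspace character, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and causes a compile-time error; to match the string (hello), the string literal "\\(hello\\)" must be used.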

package com.string;

/**
 * @author zhulongkun20@163.com
 * @since 2018-06-05 13:29
 */
public class IntegerMatch {
    public static void main(String[] args) {
        System.out.println("-1234".matches("-?\\d+"));
        System.out.println("5678".matches("-?\\d+"));
        System.out.println("+5434".matches("-?\\d+"));
        System.out.println("+909".matches("([-+])?\\d+"));
    }
}

output:

true
true
false
true

Process finished with exit code 0

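split(): splits a string around matches of a regular expression and returns a String array.
replaceFirst() and replaceAll(): regex-based replacement. (replace() itself treats its arguments as literal text, not as a regular expression.)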
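
The difference between these String methods can be seen in a short sketch. The class name SplitReplace and the sample strings are made up for the example.

```java
import java.util.Arrays;

class SplitReplace {
    // A comma with optional whitespace on either side.
    static final String COMMA = "\\s*,\\s*";

    static String[] fields(String s) {
        return s.split(COMMA);
    }

    public static void main(String[] args) {
        String s = "one, two,three ,  four";
        System.out.println(Arrays.toString(fields(s)));    // [one, two, three, four]
        System.out.println(s.replaceFirst(COMMA, ";"));    // one;two,three ,  four
        System.out.println(s.replaceAll(COMMA, ";"));      // one;two;three;four
        // replace() is literal: "." means a dot. replaceAll() is regex: "." means any char.
        System.out.println("a.b.c".replace(".", "-"));     // a-b-c
        System.out.println("a.b.c".replaceAll(".", "-"));  // -----
    }
}
```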

13.6.2 Creating Regular Expressions && 13.6.3 Quantifiers

See the JDK documentation for java.util.regex.Pattern:

Character classes
[abc]   a, b, or c (simple class)
[^abc]  Any character except a, b, or c (negation)
[a-zA-Z]    a through z or A through Z, inclusive (range)
[a-d[m-p]]  a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z](subtraction)

Predefined character classes
.   Any character (may or may not match line terminators)
\d  A digit: [0-9]
\D  A non-digit: [^0-9]
\h  A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]
\H  A non-horizontal whitespace character: [^\h]
\s  A whitespace character: [ \t\n\x0B\f\r]
\S  A non-whitespace character: [^\s]
\v  A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
\V  A non-vertical whitespace character: [^\v]
\w  A word character: [a-zA-Z_0-9]
\W  A non-word character: [^\w]

Greedy quantifiers
X?  X, once or not at all
X*  X, zero or more times
X+  X, one or more times
X{n}    X, exactly n times
X{n,}   X, at least n times
X{n,m}  X, at least n but not more than m times

Reluctant quantifiers
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}?   X, exactly n times
X{n,}?  X, at least n times
X{n,m}? X, at least n but not more than m times

Possessive quantifiers
X?+ X, once or not at all
X*+ X, zero or more times
X++ X, one or more times
X{n}+   X, exactly n times
X{n,}+  X, at least n times
X{n,m}+ X, at least n but not more than m times

Logical operators
XY  X followed by Y
X|Y Either X or Y
(X) X, as a capturing group
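
The three quantifier families behave differently on the same input. The sketch below, with a hypothetical class Quantifiers and helper first(), returns the first match of each pattern against "<a><b>": the greedy form takes as much as it can and then backs off, the reluctant form takes as little as possible, and the possessive form never gives back what it consumed, so here it fails to match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Quantifiers {
    // Returns the first match of regex in input, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        System.out.println(first("<.+>", input));   // greedy:     <a><b>
        System.out.println(first("<.+?>", input));  // reluctant:  <a>
        System.out.println(first("<.++>", input));  // possessive: null (".++" swallows the final ">")
    }
}
```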

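13.6.4 Pattern and Matcher

  1. Import the java.util.regex package;
  2. Compile the regular expression with Pattern.compile();
  3. Pass the string to be searched to the Pattern object's matcher() method, which returns a Matcher.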
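
The three steps above can be sketched as follows. The class name FindAll and the helper findAll() are hypothetical; the Matcher methods used (find(), group(), matches()) are part of java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAll {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);   // step 2: compile
        Matcher matcher = pattern.matcher(input);   // step 3: create a Matcher
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {                    // advance to the next match
            matches.add(matcher.group());           // text of the current match
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "a1 bb22 ccc333"));
        // matches() succeeds only if the entire input matches the pattern.
        System.out.println(Pattern.compile("\\d+").matcher("123").matches());
        System.out.println(Pattern.compile("\\d+").matcher("123abc").matches());
    }
}
```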

13.6.5 split()

package com.string;

import java.util.Arrays;
import java.util.regex.Pattern;

/**
 * @author zhulongkun20@163.com
 * @since 2018-06-05 13:55
 */
public class SplitDemo {
    public static void main(String[] args) {
        String input = "This!!unusual use!!of exclamation!!points";
        System.out.println(Arrays.toString(Pattern.compile("!!").split(input)));
        System.out.println(Arrays.toString(Pattern.compile("!!").split(input, 3)));
    }
}

output:

[This, unusual use, of exclamation, points]
[This, unusual use, of exclamation!!points]

Process finished with exit code 0

The second form of split() takes a limit argument that caps the number of strings produced.

13.6.6 Replace Operations

  1. replaceFirst(String replacement);
  2. replaceAll(String replacement);
  3. appendReplacement(StringBuffer sb, String replacement);
  4. appendTail(StringBuffer sb);
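
appendReplacement() and appendTail() are useful when the replacement has to be computed for each match. The sketch below upper-cases every match of a pattern; the class name IncrementalReplace and the helper upperCaseMatches() are assumptions for illustration. Matcher.quoteReplacement() is used so that any "$" or "\" in the computed text is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IncrementalReplace {
    // Replaces each match of regex with its upper-case form.
    static String upperCaseMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Copies the text before the match, then the computed replacement.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        m.appendTail(sb);  // copies whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "regular expressions";
        System.out.println(upperCaseMatches("[aeiou]", input));
        // The simpler forms use one fixed replacement string:
        System.out.println(Pattern.compile("[aeiou]").matcher(input).replaceFirst("*"));
        System.out.println(Pattern.compile("[aeiou]").matcher(input).replaceAll("*"));
    }
}
```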

13.6.7 reset()

The reset() method lets an existing Matcher object be applied to a new character sequence.

package com.string;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author zhulongkun20@163.com
 * @since 2018-06-05 14:15
 */
public class Reset {
    public static void main(String[] args) {
        Matcher matcher = Pattern.compile("[frb][aiu][gx]").matcher("fix the rug with bags.");
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
        System.out.println("--------");
        matcher.reset("fix the fig with rags.");
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}

output:

fix
rug
bag
--------
fix
fig
rag

Process finished with exit code 0

13.6.8 Regular Expressions and Java I/O

Regular expressions are not limited to static strings; they can also be combined with file I/O.
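
A minimal grep-like sketch of this idea follows. The class name MiniGrep and its grep() helper are hypothetical, and a StringReader stands in for a real file here so the example is self-contained; a FileReader could be passed instead. A single Matcher is reused for every line via reset().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MiniGrep {
    // Returns "lineNumber: match" for every match of regex in the source.
    static List<String> grep(Reader source, String regex) {
        Matcher m = Pattern.compile(regex).matcher("");
        List<String> hits = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = in.readLine()) != null) {
                lineNumber++;
                m.reset(line);  // point the same Matcher at the new line
                while (m.find()) {
                    hits.add(lineNumber + ": " + m.group());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        Reader text = new StringReader("alpha beta\ngamma\nbeta box");
        // Words that start with "b".
        System.out.println(grep(text, "\\bb\\w+"));
    }
}
```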

13.7 Scanning Input

package com.string;

import java.util.Scanner;

/**
 * @author zhulongkun20@163.com
 * @since 2018-06-05 14:24
 */
public class BetterRead {
    public static void main(String[] args) {
        Scanner input = new Scanner(System.in, "UTF-8");
        int a = input.nextInt();
        float b = input.nextFloat();
        double c = input.nextDouble();
        String d = input.next();
    }
}

13.7.1 Scanner Delimiters

By default, a Scanner splits its input into tokens on whitespace, but you can specify a different delimiter with a regular expression.

package com.string;

import java.util.Scanner;

/**
 * @author zhulongkun20@163.com
 * @since 2018-06-05 14:48
 */
public class ScannerDelimiter {
    public static void main(String[] args) {
        Scanner scanner = new Scanner("12, 42, 78, 99, 42");
        scanner.useDelimiter("\\s*,\\s*");
        while (scanner.hasNextInt()) {
            System.out.println(scanner.nextInt());
        }
        System.out.println(scanner.delimiter());
    }
}

output:

12
42
78
99
42
\s*,\s*

Process finished with exit code 0

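Here a comma, together with any surrounding whitespace, serves as the delimiter; the same technique can be used to read comma-separated files.
The delimiter() method returns the Pattern currently in use as the delimiter.

13.7.2 Scanning with Regular Expressions

Besides scanning for primitive types, you can scan with your own regular expressions, which is very useful for complex data such as log entries.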
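
A small sketch of pattern-based scanning follows. The class name LogScan and the "LEVEL@line" entry format are invented for the example. Scanner matches the pattern against one token at a time, so the pattern must not contain the delimiter (whitespace by default).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;

class LogScan {
    // Parses whitespace-separated tokens of the (hypothetical) form LEVEL@lineNumber.
    static List<String> parse(String data) {
        List<String> entries = new ArrayList<>();
        String pattern = "(\\w+)@(\\d+)";
        try (Scanner scanner = new Scanner(data)) {
            while (scanner.hasNext(pattern)) {       // does the next token match?
                scanner.next(pattern);               // consume it
                MatchResult match = scanner.match(); // groups of the token just read
                entries.add(match.group(1) + " at line " + match.group(2));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Scanning stops at "[end]" because that token does not match the pattern.
        System.out.println(parse("ERROR@12 WARN@7 ERROR@3 [end]"));
    }
}
```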

13.8 StringTokenizer

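Regular expressions and the Scanner class can be used instead. StringTokenizer is a legacy class: it is not formally deprecated, but its use is discouraged in new code.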
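
For comparison, here is a sketch that tokenizes the same sentence three ways. The class name TokenizeThreeWays and its helper methods are hypothetical; for simple whitespace-separated input all three produce the same tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.StringTokenizer;

class TokenizeThreeWays {
    static List<String> withTokenizer(String s) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s);  // splits on whitespace by default
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    static List<String> withSplit(String s) {
        return Arrays.asList(s.split("\\s+"));        // regex: one or more whitespace chars
    }

    static List<String> withScanner(String s) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(s)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "But I'm not dead yet!";
        System.out.println(withTokenizer(input));
        System.out.println(withSplit(input));
        System.out.println(withScanner(input));
    }
}
```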

Appendix: Using Matcher.group()

package cn.mingyuan.regexp.singlecharacter;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexAndStartEndIndexTest {

    public static void main(String[] args) {
        String str = "Hello,World! in Java.";
        Pattern pattern = Pattern.compile("W(or)(ld!)");
        Matcher matcher = pattern.matcher(str);
        while (matcher.find()) {
            System.out.println("Group 0:" + matcher.group(0)); // group 0: the entire match
            System.out.println("Group 1:" + matcher.group(1)); // group 1: the text matched by (or)
            System.out.println("Group 2:" + matcher.group(2)); // group 2: the text matched by (ld!); a group is a subexpression
            System.out.println("Start 0:" + matcher.start(0) + " End 0:" + matcher.end(0)); // indices of the entire match
            System.out.println("Start 1:" + matcher.start(1) + " End 1:" + matcher.end(1)); // indices of group 1
            System.out.println("Start 2:" + matcher.start(2) + " End 2:" + matcher.end(2)); // indices of group 2
            // Substring from the start of the entire match to the end of group 1: "Wor"
            System.out.println(str.substring(matcher.start(0), matcher.end(1)));
        }
    }
}

output:

Group 0:World!  
Group 1:or  
Group 2:ld!  
Start 0:6 End 0:12  
Start 1:7 End 1:9  
Start 2:9 End 2:12  
Wor 