Notes from Thinking in Java (Chapter 13: Strings)

20230716

Article link: https://blog.csdn.net/keepkind/article/details/118613014

Immutable Strings

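String objects are immutable. If you look through the JDK documentation, you will find that every method in the String class that appears to modify a String actually creates and returns a brand-new String object containing the modified contents, while the original String is left untouched. (From a method's point of view, an argument is there to supply information to the method, not to be changed by it.)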
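
A minimal sketch of this behavior, assuming a hypothetical helper named upcase(): the method returns an upper-cased copy, and the String that was passed in keeps its original value.

```java
public class ImmutableDemo {
    // Hypothetical helper: returns an upper-cased copy of its argument.
    // The argument itself is never modified.
    static String upcase(String s) {
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String q = "howdy";
        String qq = upcase(q);
        System.out.println(qq); // HOWDY
        System.out.println(q);  // howdy (the original is unchanged)
    }
}
```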

Overloading "+" vs. StringBuilder

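Because String objects are immutable, you can alias a String as many times as you like. A String is read-only, so no reference to it can change its value, and therefore no reference can affect any of the others.
Immutability can have efficiency costs. The "+" operator overloaded for String is a case in point. The "+" and "+=" for String are the only overloaded operators in Java, and Java does not allow programmers to overload any operators themselves.
For string concatenation, the compiler described in the book automatically brings in the java.lang.StringBuilder class. Even though the source code never mentions StringBuilder, the compiler decides to use it on its own because it is more efficient.
Creating a StringBuilder explicitly also lets you preset its capacity. If you know roughly how long the final string will be, presizing the StringBuilder avoids repeated reallocation of the buffer.
So when you write a toString() method, you can rely on the compiler to build the result sensibly if the string operations are simple. If toString() involves a loop, however, it is better to create a StringBuilder explicitly and use it to build the final result (inside a loop, the compiler-generated code creates a new StringBuilder on every iteration):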

import java.util.Random;
public class test
{
    public static Random rand= new Random(100);
    public String toString()
    {
        StringBuilder result =new StringBuilder("[");
        for(int i=0;i<5;i++)
        {
            result.append(rand.nextInt(100));
            result.append(". ");
        }
        result.delete(result.length()-2,result.length());
        result.append("]");
        return result.toString();
    }
    public static void main(String[] args) {
        test useStringBuilder= new test();
        System.out.println(useStringBuilder);
    }
}
/*
Output:
[15. 50. 74. 88. 91]
*/

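StringBuilder offers a full set of methods, including insert(), replace(), substring(), and even reverse(), but the ones used most often are append() and toString(). There is also delete(), which the example above uses to remove the trailing period and space before appending the closing bracket.
StringBuilder was introduced in Java SE5. Before that, Java used StringBuffer, which is thread-safe and therefore more expensive, so string operations in Java SE5/6 should be somewhat faster.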
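
The less common StringBuilder methods mentioned above can be sketched as follows. The class name BuilderOps and the capacity of 32 are arbitrary choices for illustration; the comment on each line shows the buffer contents after that call.

```java
public class BuilderOps {
    static String build() {
        StringBuilder sb = new StringBuilder(32); // initial capacity hint, not a length limit
        sb.append("abc").append(123);  // abc123
        sb.insert(0, "[");             // [abc123
        sb.append("]");                // [abc123]
        sb.replace(1, 4, "XYZ");       // [XYZ123]  (replaces indices 1 to 3)
        sb.delete(4, 7);               // [XYZ]     (removes indices 4 to 6)
        sb.reverse();                  // ]ZYX[
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build()); // ]ZYX[
    }
}
```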

Unintended Recursion

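Every class in Java ultimately inherits from Object, and the standard container classes are no exception. They all have a toString() method, which they override so that the resulting String describes both the container itself and the objects it holds. ArrayList.toString(), for example, iterates over all the objects in the ArrayList and calls toString() on each element: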

import java.util.ArrayList;
import java.util.Random;
public class test
{
    public static void main(String[] args) {
        ArrayList<Integer> Num=new ArrayList<Integer>();
        Random rand= new Random(100);
        for(int i=0;i<10;i++)
            Num.add(rand.nextInt(100));
        System.out.print(Num.toString());
    }
}
/*
Output:
[15, 50, 74, 88, 91, 66, 36, 88, 23, 13]
*/

public class test
{
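    /* Printing with the commented-out toString() below produces a very long
    chain of exceptions (thousands of lines): "..." + this converts this to a
    String by calling toString() again, which recurses until the stack overflows.
    To print the object's identity string (class name plus hash code, loosely its
    "address"), call Object.toString(), i.e. use super.toString() rather than this. */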
    /*
    public String toString()
    {
        return "test's address:"+this;
    }
    */
    public String toString()
    {
        return "test's address:"+super.toString();
    }
    public static void main(String[] args) {
        test temp=new test();
        System.out.print(temp.toString());
    }
}
/*
Output:
test's address:test@8efb846
*/

Operations on Strings

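[Table of String methods; the image is not preserved in this copy]
As the table shows, every String method returns a new String object when the contents need to change. If the contents do not change, the method simply returns a reference to the original String, which saves storage and avoids extra overhead.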
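
A small sketch of the "same reference" behavior, using two cases that the String Javadoc documents explicitly: concat() with an empty argument, and replace(char, char) when the old character does not occur. The class and method names are made up for illustration, and == is used deliberately here to compare references rather than contents.

```java
public class SameReference {
    // concat("") has nothing to add, so the original object is returned
    static boolean concatEmptyIsSame(String s) {
        return s.concat("") == s;
    }

    // 'z' does not occur in the input used below, so nothing changes
    static boolean replaceAbsentIsSame(String s) {
        return s.replace('z', 'q') == s;
    }

    // 'l' does occur, so a new String has to be created
    static boolean replacePresentIsSame(String s) {
        return s.replace('l', 'L') == s;
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(concatEmptyIsSame(s));    // true
        System.out.println(replaceAbsentIsSame(s));  // true
        System.out.println(replacePresentIsSame(s)); // false
    }
}
```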

Formatted Output

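Java SE5 introduced formatted output in the style of C's printf(). Special placeholders mark where the data will go; these placeholders are called format specifiers. They indicate not only where the data is inserted but also what type of variable is inserted and how to format it: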

public class test
{
    public static void main(String[] args) {
        int x=5;
        double y=5.123456;
        // ordinary output
        System.out.println("Row 1:["+x+" "+y+"]");
        // the new way
        System.out.format( "Row 1:[%d %f]\n", x,y);
        System.out.printf("Row 1:[%d %f]\n", x,y);  // for C nostalgics; equivalent to format() above
    }
}
/*
Output:
Row 1:[5 5.123456]
Row 1:[5 5.123456]
Row 1:[5 5.123456]
*/

The Formatter Class

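In Java, all the new formatting functionality is handled by the java.util.Formatter class. When you create a Formatter object, you pass its constructor some information telling it where the final result should be sent: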

import java.io.PrintStream;
import java.util.Formatter;
public class test
{
    private String name;
    private Formatter f;
    public test(String name,Formatter f)
    {
        this.name=name; this.f=f;
    }
    public void move(int x,int y)
    {
        f.format("%s The Turtle is at (%d,%d)\n",name,x,y);
    }
    public static void main(String[] args) {
        PrintStream out=System.out;
        test Tom= new test("Tom",new Formatter(System.out));
        test Jerry= new test("Jerry",new Formatter(out));
        Tom.move(0,0);
        Jerry.move(4,8);
        Tom.move(3,4);
        Jerry.move(2,5);
    }
}
/*
Output:
Tom The Turtle is at (0,0)
Jerry The Turtle is at (4,8)
Tom The Turtle is at (3,4)
Jerry The Turtle is at (2,5)
*/

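Everything for Tom goes to System.out, while everything for Jerry goes to an alias of System.out. The Formatter constructor is overloaded to accept a range of output destinations, the most commonly used being PrintStream, OutputStream, and File.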
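
Formatter can also write into any Appendable, such as a StringBuilder, which is handy when the formatted text should end up in a String rather than on a stream. The sketch below assumes a hypothetical describe() helper that reuses the turtle message from the example above.

```java
import java.util.Formatter;

public class FormatToBuilder {
    // Hypothetical helper: formats one turtle position into a String
    static String describe(String name, int x, int y) {
        StringBuilder sb = new StringBuilder();
        // Formatter(Appendable) sends the formatted output into sb
        try (Formatter f = new Formatter(sb)) {
            f.format("%s The Turtle is at (%d,%d)", name, x, y);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("Tom", 3, 4)); // Tom The Turtle is at (3,4)
    }
}
```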

Format Specifiers

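To control spacing and alignment when inserting data, you need more elaborate format specifiers.
The most common use is controlling the minimum size of a field, which is done by specifying a width. The Formatter guarantees that a field is at least a certain number of characters wide by padding it with spaces where necessary. By default the data is right-justified; the "-" flag switches to left justification.
The counterpart of width is precision, which specifies a maximum. Width applies to every type of data conversion and behaves the same way for each. Precision does not: applied to a String, it gives the maximum number of characters to print; applied to a floating-point number, it gives the number of decimal places to display (6 by default), rounding if there are too many digits and padding with trailing zeros if there are too few. Applied to an integer conversion, precision throws an exception.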
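
The width and precision rules above can be sketched in a few lines. The class name, the row() helper, and the sample values are made up for illustration; Locale.ROOT is passed only so that the decimal separator does not depend on the machine's default locale.

```java
import java.util.IllegalFormatPrecisionException;
import java.util.Locale;

public class WidthPrecision {
    // %-10.5s : at most 5 characters, left-justified in a field at least 10 wide
    // %5d     : right-justified in a field at least 5 wide
    // %8.2f   : 2 decimal places, right-justified in a field at least 8 wide
    static String row(String name, int qty, double price) {
        return String.format(Locale.ROOT, "%-10.5s|%5d|%8.2f", name, qty, price);
    }

    // Precision on an integer conversion is rejected with an exception
    static boolean precisionOnIntThrows() {
        try {
            String.format("%.2d", 5);
            return false;
        } catch (IllegalFormatPrecisionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(row("Snickers", 3, 1.5)); // Snick     |    3|    1.50
        System.out.println(precisionOnIntThrows());  // true
    }
}
```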

Regular Expressions

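Basics: in a regular expression, \d stands for one digit. Java handles backslashes differently from most other languages. In other languages, \\ means "I want to insert a plain (literal) backslash in the regular expression." In a Java string, \\ means "I am inserting a regular-expression backslash, so the character that follows has special meaning." The regex for a digit is therefore written "\\d" in Java source. For example, -?\d+ (written "-?\\d+" in Java) means "possibly a minus sign, followed by one or more digits" (the + means one or more):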

public class test
{
    public static void main(String[] args) {
        System.out.println("-1234".matches("-?\\d+"));
        System.out.println("5678".matches("-?\\d+"));
        System.out.println("+911".matches("-?\\d+"));
        System.out.println("+911".matches("(-|\\+)?\\d+"));
    }
}
/*
Output:
true
true
false
true
*/

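In a regular expression, parentheses group parts of the expression and the vertical bar | means OR. (-|\+)? says that the string may start with a - or a +, or with neither (because of the trailing ?). Because + has a special meaning in regular expressions, it must be escaped with \ to make it an ordinary character; in a Java string literal that escape is written \\+.
String has a built-in split() method, which splits the string at every place the regular expression matches: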

import java.util.Arrays;

public class test
{
    public static String knights="Then, when you have found "+
    "the shrubby, you must cut down the mightiest tree in "+
    "the forest... with...a herring!"; 
    public static void split(String regex)
    {
        System.out.println(Arrays.toString(knights.split(regex)));
    }
    public static void main(String[] args) {
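        split(" "); // split on single spaces only
        split("\\W+");  // split on runs of non-word characters, which strips the punctuation
        split("n\\W+"); // the letter n followed by one or more non-word characters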
    }
}
/*
Output:
[Then,, when, you, have, found, the, shrubby,, you, must, cut, down, the, mightiest, tree, in, the, forest..., with...a, herring!]
[Then, when, you, have, found, the, shrubby, you, must, cut, down, the, mightiest, tree, in, the, forest, with, a, herring]
[The, whe, you have found the shrubby, you must cut dow, the mightiest tree i, the forest... with...a herring!]
*/

String.split() also has an overloaded version that lets you limit the number of splits.
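
A short sketch of the limited form, split(regex, limit): with a positive limit the result has at most that many elements, and the last element keeps the rest of the input unsplit. The class name, the splitAtMost() wrapper, and the sample string are made up for illustration.

```java
import java.util.Arrays;

public class SplitLimit {
    // Thin wrapper around String.split(regex, limit), just to name the parameters
    static String[] splitAtMost(String s, String regex, int limit) {
        return s.split(regex, limit);
    }

    public static void main(String[] args) {
        String[] parts = splitAtMost("Then, when you have found", " ", 3);
        // The pattern is applied at most limit - 1 = 2 times
        System.out.println(Arrays.toString(parts)); // [Then,, when, you have found]
    }
}
```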
The last regular-expression tool built into String is replacement. You can replace either the first match only or all matches:

public class test
{
    public static String knights="Then, when you have found "+
    "the shrubby, you must cut down the mightiest tree in "+
    "the forest... with...a herring!"; 
    public static void main(String[] args) {
        // the letter f followed by one or more word characters; only the first match is replaced
        System.out.println(knights.replaceFirst("f\\w+", "located"));
        // match any of the three words, separated by | for OR, and replace every match
        System.out.println(knights.replaceAll("shrubby|tree|herring", "banana"));
    }
}
/*
Output:
Then, when you have located the shrubby, you must cut down the mightiest tree in the forest... with...a herring!
Then, when you have found the banana, you must cut down the mightiest banana in the forest... with...a banana!
*/


Quantifiers

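1. Greedy: quantifiers are greedy unless otherwise altered. A greedy expression finds as many matches for the pattern as possible.
2. Reluctant: specified with a question mark, this quantifier matches the minimum number of characters needed to satisfy the pattern. It is also called lazy, minimal matching, non-greedy, or ungreedy.
3. Possessive: when the book was written these were available only in Java; several other regex engines have since added them. When a regular expression is applied to a string, it generates many states so that it can backtrack if the match fails. Possessive quantifiers do not keep those intermediate states, which prevents backtracking. They are often used to keep a regular expression from running away, and they can make it execute more efficiently.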
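
The three kinds of quantifier can be compared on one input. This is a minimal sketch with a made-up helper, firstMatch(), and a made-up input string; the only difference between the three patterns is the suffix after + (nothing, ?, or +).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ grabs as much as it can, then backtracks to the last '>'
        System.out.println(firstMatch("<.+>", input));  // <a><b>
        // Reluctant: .+? stops at the first '>' that completes the match
        System.out.println(firstMatch("<.+?>", input)); // <a>
        // Possessive: .++ consumes the final '>' too and never gives it back,
        // so the trailing '>' in the pattern can never match
        System.out.println(firstMatch("<.++>", input)); // null
    }
}
```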

Pattern and Matcher

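To build more powerful regular-expression objects, import java.util.regex and compile the regular expression with the static Pattern.compile() method. Calling matcher() on the resulting Pattern with the input string produces a Matcher object: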

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class test
{
    public static void main(String[] args) {
        Matcher m= Pattern.compile("\\w+").matcher("Evening is full of the linnet's wings");
        while(m.find()) System.out.print(m.group()+" ");
        System.out.println();
        int i=0;
        // find(int) takes the character position in the input at which the search starts
        while(m.find(i))
        {
            System.out.print(m.group()+" ");    i++;
        }
    }
}
/*
Output:
Evening is full of the linnet s wings 
Evening vening ening ning ing ng g is is s full full ull ll l of of f the the he e linnet linnet innet nnet net et t s s wings wings ings ngs gs s 
*/

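public int groupCount() returns the number of groups in this matcher's pattern. Group 0 is not included in the count.
public String group(int i) returns the part of the input captured by the given group during the previous match operation. If the match succeeded but the specified group did not match any part of the input string, it returns null.
public int start(int group) returns the start index of the group found in the previous match operation.
public int end(int group) returns the index of the last character of the group found in the previous match operation, plus one.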
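
The group methods listed above can be seen together in a small sketch. The describe() helper, the patterns, and the inputs are made up for illustration; the second call shows an optional group that takes no part in the match, for which group() returns null and start()/end() return -1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical helper: summarizes the first match as
    // "<whole match> g1=<text>[start,end) g2=..."
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) return "no match";
        StringBuilder sb = new StringBuilder(m.group()); // group() is the same as group(0)
        for (int i = 1; i <= m.groupCount(); i++) {
            sb.append(" g").append(i).append('=').append(m.group(i))
              .append('[').append(m.start(i)).append(',').append(m.end(i)).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("(\\d+)-(\\d+)", "range 10-25 ok"));
        // 10-25 g1=10[6,8) g2=25[9,11)
        System.out.println(describe("(a)?(b)", "b"));
        // b g1=null[-1,-1) g2=b[0,1)
    }
}
```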

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class test
{
    public static void main(String[] args) {
        // multiple flags can be combined with the OR (|) operator
        Pattern p= Pattern.compile("^java",Pattern.CASE_INSENSITIVE|Pattern.MULTILINE);
        Matcher m= p.matcher("java has regex\nJava has regex\n"+
        "JAVA has pretty good regular expression;\n"+
        "Regular expressions are in Java");
        while(m.find()) System.out.println(m.group());
    }
}
/*
Output:
java
Java
JAVA
*/

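Note: group() returns only the matched portion of the input. The next example also uses reset() to apply an existing Matcher to a new character sequence: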

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class test
{
    public static void main(String[] args) {
        Matcher m= Pattern.compile("[frb][aiu][gx]").matcher("fix the rug with bags");
        while(m.find()) System.out.print(m.group()+" ");
        System.out.println();
        m.reset("fix the rig with rags");
        while(m.find()) System.out.print(m.group()+" ");
    }
}
/*
Output:
fix rug bag 
fix rig rag
*/

Scanning Input

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
public class test
{
    /* StringReader turns the String into a readable stream, which is then used to
    construct a BufferedReader, because we need BufferedReader's readLine() method.
    The input object can then be read one line at a time,
    just as if it were standard input from the console.
    */
    public static BufferedReader input=new BufferedReader(
        new StringReader("Sir Robin of Camelot\n22 1.61830")
    );
    public static void main(String[] args) {
       try
       {
            System.out.println("What is your name?");
            String name= input.readLine();
            System.out.println("How old are you? What is your favorite double?");
            System.out.println("(input:<age> <double>)");
            String numbers=input.readLine();
            System.out.println(numbers);
            String[] numArray= numbers.split(" ");
            int age= Integer.parseInt(numArray[0]);
            double favorite=Double.parseDouble(numArray[1]);
            System.out.format("Hi %s.\n",name);
            System.out.format("In 5 years you will be %d.\n",age+5);
            System.out.format("My favorite double is %f.",favorite/2);
       }
       catch(IOException e){ System.out.println("I/O exception");}
    }
}
/*
Output:
What is your name?
How old are you? What is your favorite double?
(input:<age> <double>)
22 1.61830
Hi Sir Robin of Camelot.
In 5 years you will be 27.
My favorite double is 0.809150.
*/

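When two input values are on the same line, things get trickier: the line has to be broken up, which is done here with split() into numArray. Since Java SE5 you can use the Scanner class instead: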

import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Scanner;
public class test
{
    /* StringReader turns the String into a readable stream, which is wrapped in a
    BufferedReader as in the previous example. Scanner accepts any Readable,
    so it can read from input directly, one line or one token at a time,
    just as if it were standard input from the console.
    */
    public static BufferedReader input=new BufferedReader(
        new StringReader("Sir Robin of Camelot\n22 1.61830")
    );
    public static void main(String[] args) {
        Scanner stdin =new Scanner(input);
        System.out.println("What is your name?");
        String name= stdin.nextLine();
        System.out.println(name);
        System.out.println("How old are you? What is your favorite double?");
        System.out.println("(input:<age> <double>)");
        int age=stdin.nextInt();
        double favorite=stdin.nextDouble();
        System.out.println(age);
        System.out.println(favorite);
        System.out.format("Hi %s.\n",name);
        System.out.format("In 5 years you will be %d.\n",age+5);
        System.out.format("My favorite double is %f.",favorite/2);
       stdin.close();
    }
}
/*
Output:
What is your name?
Sir Robin of Camelot
How old are you? What is your favorite double?
(input:<age> <double>)
22
1.6183
Hi Sir Robin of Camelot.
In 5 years you will be 27.
My favorite double is 0.809150.
*/

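Scanner assumes that an IOException signals the end of input, so it swallows the IOException. The most recent exception is still available through the ioException() method.

Scanner Delimiters

By default, Scanner splits input into tokens on whitespace, but you can specify your own delimiter as a regular expression: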

import java.util.Scanner;

public class test
{
    public static void main(String[] args) {
        Scanner scanner= new Scanner("12,42,78,99,42");
        scanner.useDelimiter("\\s*,\\s*");
        while(scanner.hasNextInt()) System.out.println(scanner.nextInt());
        scanner.close();
    }
}
/*
Output:
12
42
78
99
42
*/

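Here the delimiter is a comma, together with any whitespace around it.
Besides scanning for primitive types, you can also scan with your own regular expressions. The following example scans threat data recorded in a firewall log file: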

import java.util.Scanner;
import java.util.regex.MatchResult;

public class test
{
    static String threatData=
    "12.23.34.45@01/01/2001\n"+
    "23.34.45.56@02/01/2002\n"+
    "34.45.56.67@03/02/2003\n"+
    "45.56.67.78@04/04/2003\n"+
    "56.67.78.789@12/12/2009\n"+
    "[Next log section with different data format]";
    public static void main(String[] args) {
        Scanner scanner=new Scanner(threatData);
        String pattern="(\\d+[.]\\d+[.]\\d+[.]\\d+)@"+
        "(\\d{2}/\\d{2}/\\d{4})";
        while(scanner.hasNext(pattern))
        {
            scanner.next(pattern);
            MatchResult match=scanner.match();
            String ip= match.group(1);
            String date=match.group(2);
            System.out.printf("Threat on %s from %s\n",date,ip);
        }
        scanner.close();
    }
}
/*
Output:
Threat on 01/01/2001 from 12.23.34.45
Threat on 02/01/2002 from 23.34.45.56
Threat on 03/02/2003 from 34.45.56.67
Threat on 04/04/2003 from 45.56.67.78
Threat on 12/12/2009 from 56.67.78.789
*/