A summary of classic regular expression usage in Java: capturing groups

Casionx

Article link: https://blog.csdn.net/summerxiachen/article/details/79780002

Common regex tasks: matching, splitting, replacing, and extracting (pulling substrings of a given format out of a string).

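【Using regex through the String class】

The String class has several commonly used methods that take a regular expression:

// Returns true if the whole string matches regex, otherwise false
boolean  matches(String regex)

// Replaces every part of the string that matches regex with replacement
String   replaceAll(String regex, String replacement)

// Uses every match of regex as a delimiter and splits the string into a String array
String[] split(String regex)

These three methods cover matching, replacing, and splitting. String's capabilities are limited, so the regex classes Pattern and Matcher provide more powerful features.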
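
As a minimal sketch of the three String methods above: the class name StringRegexDemo, its helper methods, and the sample inputs are all made up for illustration and do not come from the original post.

```java
import java.util.Arrays;

// Hypothetical demo class; names and inputs are illustrative only.
class StringRegexDemo {
    // matches(): true only if the WHOLE string fits the regex
    static boolean isDigits(String s) {
        return s.matches("\\d+");
    }

    // replaceAll(): every run of spaces becomes a single space
    static String collapseSpaces(String s) {
        return s.replaceAll(" +", " ");
    }

    // split(): each comma is a cut point
    static String[] splitOnCommas(String s) {
        return s.split(",");
    }

    public static void main(String[] args) {
        System.out.println(isDigits("12345"));                        // true
        System.out.println(isDigits("123a5"));                        // false
        System.out.println(collapseSpaces("a   b  c"));               // a b c
        System.out.println(Arrays.toString(splitOnCommas("a,b,c")));  // [a, b, c]
    }
}
```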

【Using regex through Pattern and Matcher objects】
Steps:

1. Compile the regex into a Pattern object
    Pattern p = Pattern.compile(regex);
    p.split(str); // split
2. Get a Matcher object from the Pattern
    Matcher m = p.matcher(str);
3. Operate on the string with the Matcher's methods
    // Match: attempts to match the entire region against the pattern.
    m.matches();

    // Replace: replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
    m.replaceAll(String replacement)

    // Extract: attempts to find the next subsequence of the input sequence that matches the pattern.
    m.find();

In general, the String methods are enough for everything except extraction.

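Let's look at what each method means, together with its return type:

// Match: attempts to match the entire region against the pattern.
boolean m.matches();

// Replace: replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
String m.replaceAll(String replacement)

// Extract: attempts to find the next subsequence of the input sequence that matches the pattern.
boolean m.find();

A match attempt stops as soon as it fails and returns false.
Splitting and replacing operate on the whole string: every place that satisfies the regex is processed, so a single call covers the entire string.
Extraction is different. Each call checks whether there is a next matching substring and returns a boolean, so it is usually combined with a while loop: first call boolean m.find() to test for a next match, then call String m.group() to get the matched substring.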
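
The find()/group() loop described above can be sketched as follows. The class FindLoopDemo, the method extractNumbers, and the sample sentence are hypothetical and only serve as an illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and inputs are illustrative only.
class FindLoopDemo {
    // Collects every run of digits found in the text, in order of appearance.
    static List<String> extractNumbers(String text) {
        Matcher m = Pattern.compile("\\d+").matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {          // is there a next matching substring?
            found.add(m.group());   // the substring matched by the last find()
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extractNumbers("order 12, item 7, total 340")); // [12, 7, 340]
    }
}
```

Unlike matches(), which must cover the whole input, each find() call resumes where the previous match ended.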

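【Summary of common regex constructs】

1. Character classes

[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z or A through Z, inclusive (range)

2. Predefined character classes

. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]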

3. Greedy quantifiers

X?       X, once or not at all
X*       X, zero or more times
X+       X, one or more times
X{n}     X, exactly n times
X{n,}    X, at least n times
X{n,m}   X, at least n but not more than m times
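
To make the quantifier table concrete, here is a small sketch that checks a few whole-string matches. The class QuantifierDemo, the helper fullMatch, and the sample patterns are assumptions made for illustration; Pattern.matches is the standard static helper.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; patterns and inputs are illustrative only.
class QuantifierDemo {
    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("colou?r", "color"));     // true:  u appears zero times
        System.out.println(fullMatch("colou?r", "colour"));    // true:  u appears once
        System.out.println(fullMatch("\\d{3}", "1234"));       // false: exactly 3 digits required
        System.out.println(fullMatch("[a-z]{2,4}", "abcde"));  // false: at most 4 letters allowed
        System.out.println(fullMatch("\\w*", ""));             // true:  zero times is allowed
        System.out.println(fullMatch("\\w+", ""));             // false: at least one is required
    }
}
```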

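4. Boundary matchers

^ The beginning of a line
$ The end of a line
\b A word boundary
\B A non-word boundary
\A The beginning of the input
\G The end of the previous match
\Z The end of the input, but for the final terminator, if any
\z The end of the input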
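
As a rough illustration of the boundary matchers, the sketch below uses \b to count whole-word occurrences and ^...$ to anchor a pattern. The class BoundaryDemo, its methods, and the sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and inputs are illustrative only.
class BoundaryDemo {
    // Counts occurrences of word that are not part of a longer word.
    static int countWholeWord(String text, String word) {
        // Pattern.quote treats the word literally, even if it contains regex metacharacters
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // With ^ and $ the pattern is anchored, so find() cannot match just a fragment.
    static boolean isAllDigits(String s) {
        return Pattern.compile("^\\d+$").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(countWholeWord("cat concat cat.", "cat")); // 2 ("concat" is skipped)
        System.out.println(isAllDigits("123"));                       // true
        System.out.println(isAllDigits("12a"));                       // false
    }
}
```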

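【Capturing groups in regex】
In a regex, parentheses ( ) wrap several elements into a group. The group can be treated as a single larger element, so a quantifier can be applied to it as a whole.

[ATCG]+ // [ATCG] matches any one of A, T, C, G; [ATCG]+ means one or more of them, i.e. a string made up only of A, T, G, C, such as ATGGCTAGCGATG

(ATGC)+ means ATGC as a whole appears one or more times, e.g. ATGCATGCATGC....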
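
The difference between quantifying a character class and quantifying a group can be checked with a short sketch. The class GroupQuantifierDemo and its method names are made up for illustration.

```java
// Hypothetical demo class; names are illustrative only.
class GroupQuantifierDemo {
    // One or more characters, each of which is A, T, C, or G (any order).
    static boolean onlyBases(String s) {
        return s.matches("[ATCG]+");
    }

    // The exact sequence ATGC repeated one or more times.
    static boolean repeatsAtgc(String s) {
        return s.matches("(ATGC)+");
    }

    public static void main(String[] args) {
        System.out.println(onlyBases("ATGGCTAGCGATG"));   // true
        System.out.println(repeatsAtgc("ATGCATGCATGC"));  // true
        System.out.println(repeatsAtgc("ATGGCTAGCGATG")); // false: not whole ATGC units
        System.out.println(onlyBases(""));                // false: + needs at least one
    }
}
```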

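The expression ((A)(B(C))) contains four such groups, numbered by the position of their opening parenthesis, counting from left to right:

1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)

During extraction, pass a group number to m.group(int num) to get what that group matched. Group 0 is the whole match.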

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class B {
    public static void main(String[] args) {
        String pattern = "((A)(B(C)))D";
        // Create the Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create the Matcher object
        Matcher m = r.matcher("ABCDEABCDF");
        while (m.find()) {
            System.out.println("group(0) : " + m.group(0)); // the whole match
            System.out.println("group(1) : " + m.group(1)); // matched by the 1st parenthesis ((A)(B(C)))
            System.out.println("group(2) : " + m.group(2)); // matched by the 2nd parenthesis (A)
            System.out.println("group(3) : " + m.group(3)); // matched by the 3rd parenthesis (B(C))
            System.out.println("group(4) : " + m.group(4)); // matched by the 4th parenthesis (C)
            System.out.println("---------------");
        }
    }
}

The output is:

group(0) : ABCD
group(1) : ABC
group(2) : A
group(3) : BC
group(4) : C
---------------
group(0) : ABCD
group(1) : ABC
group(2) : A
group(3) : BC
group(4) : C
---------------

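Groups can also be referenced inside the regex itself. In (Ax)\\1+(BM)\\2+, \\1+ means that the text matched by group 1 (Ax) must repeat one or more times, and \\2+ means the same for the text matched by group 2 (BM).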

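import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class name chosen here only so the snippet compiles on its own
public class BackReferenceDemo {
    public static void main(String[] args) {
        String pattern = "(Ax)\\1+(BM)\\2+";
        //String pattern = "(Ax)(BM)\\1+\\2+";    // does not match the input below
        //String pattern = "(Ax)(BM)\\1+\\2{1,}"; // does not match it either
        // Create the Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create the Matcher object
        Matcher m = r.matcher("AxAxBMBMBMBM");
        while (m.find()) {
            System.out.println("group(0) : " + m.group(0)); // the whole match
            System.out.println("group(1) : " + m.group(1)); // matched by the 1st parenthesis (Ax)
            System.out.println("group(2) : " + m.group(2)); // matched by the 2nd parenthesis (BM)
            System.out.println("---------------");
        }
    }
}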

Output:

group(0) : AxAxBMBMBMBM
group(1) : Ax
group(2) : BM
---------------

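Word documents also support wildcards like these for advanced find-and-replace, which is a very handy trick.
For details, see another post: "Advanced replacement with wildcards in Word documents under WPS and Office".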

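When using replaceAll(), the replacement string can contain $num.
For example, in str = str.replaceAll("(.)\\1+","$1"); $1 stands for whatever group 1 of the regex in the first argument matched.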

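import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class name chosen here only so the snippet compiles on its own
public class GroupReplaceDemo {
    public static void main(String[] args) {
        String pattern = "(.)\\1+(BB)\\2+";

        // Create the Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create the Matcher object
        Matcher m = r.matcher("...BBBBCC..BBBBCC");
        String res = m.replaceAll("$1$2"); // each match is replaced with .BB
        System.out.println(res);
    }
}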

Output:

.BBCC.BBCC
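
Group references in the replacement string can also reorder the captured parts, not only collapse repeats. The following is a small sketch under assumed names: the class ReplacementGroupsDemo, the method reorderDates, and the sample dates are all made up for illustration.

```java
// Hypothetical demo class; names and sample dates are illustrative only.
class ReplacementGroupsDemo {
    // Rewrites every yyyy-MM-dd date as MM/dd/yyyy:
    // $1 = year, $2 = month, $3 = day, as captured by the three groups.
    static String reorderDates(String text) {
        return text.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$2/$3/$1");
    }

    public static void main(String[] args) {
        System.out.println(reorderDates("due 2018-04-01 and 2019-12-31"));
        // due 04/01/2018 and 12/31/2019
    }
}
```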

【Example 1】

/* Curing a stutter:
   我我...我我...我我我我...要要要要...要要要要...学学学学学...学学编编...编编编编..编..编...程程...程程程
   -> 我要学编程 ("I want to learn programming") */
    public class RegexTest
    {
            public static void main(String[] args){
                    test();
            }

            /*
             * 1. Cure the stutter
             */
            public static void test(){
                    String str = "我我...我我...我我我我...要要要要...要要要要...学学学学学...学学编编...编编编编..编..编...程程...程程程";

                    // 1. Remove the dots from the string by replacing them with nothing.
                    str = str.replaceAll("\\.+","");

                    // 2. Collapse each run of a repeated character into a single character.
                    str = str.replaceAll("(.)\\1+","$1");
                    System.out.println(str);
            }
    }

【Example 2】

import java.util.TreeSet;

public class RegexTest
{
        public static void main(String[] args){
                test();
        }

        /*
         * Sorting IP addresses.
         * 192.168.10.34 127.0.0.1 3.3.3.3 105.70.11.55
         */

        public static void test(){
                String ip_str = "192.168.10.34 127.0.0.1 3.3.3.3 105.70.11.55";

                // 1. To compare the IPs as plain strings, every segment must have the same number of digits.
                // So pad with zeros: a segment needs at most two, so prepend two zeros to every segment.

                ip_str = ip_str.replaceAll("(\\d+)","00$1");
                System.out.println(ip_str);

                // 2. Then keep only the last three digits of every segment.
                ip_str = ip_str.replaceAll("0*(\\d{3})","$1");
                System.out.println(ip_str);

                // 3. Split out the individual IP addresses.
                String[] ips = ip_str.split(" +");

                // 4. A TreeSet keeps its elements in sorted (string) order.
                TreeSet<String> ts = new TreeSet<String>();

                for(String ip : ips){
                        ts.add(ip);
                }

                // 5. Strip the leading zeros again when printing.
                for(String ip : ts){
                        System.out.println(ip.replaceAll("0*(\\d+)","$1"));
                }
        }
}
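
The same pad, sort, and strip steps can be sketched as a reusable method that returns its result, which makes the outcome easier to check. The class IpSortDemo and the method sortIps are hypothetical names. Note one deliberate difference from the example above: this sketch sorts a List instead of using a TreeSet, so duplicate addresses are kept rather than dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; a List is used instead of a TreeSet so duplicates survive.
class IpSortDemo {
    static List<String> sortIps(String ipStr) {
        // Pad every segment with two zeros, then keep only its last three digits
        String padded = ipStr.replaceAll("(\\d+)", "00$1")
                             .replaceAll("0*(\\d{3})", "$1");

        // Fixed-width segments make plain string order equal to numeric order
        List<String> ips = new ArrayList<>(Arrays.asList(padded.split(" +")));
        Collections.sort(ips);

        // Strip the leading zeros again
        List<String> result = new ArrayList<>();
        for (String ip : ips) {
            result.add(ip.replaceAll("0*(\\d+)", "$1"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortIps("192.168.10.34 127.0.0.1 3.3.3.3 105.70.11.55"));
        // [3.3.3.3, 105.70.11.55, 127.0.0.1, 192.168.10.34]
    }
}
```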