Regular Expressions

can_chen

Category: Java Basics; Tags: regular expressions

Article link: https://blog.csdn.net/can_chen/article/details/121057701

Runoob regular expression tutorial: https://www.runoob.com/regexp/regexp-tutorial.html
Java regular expressions: https://www.runoob.com/java/java-regular-expressions.html

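I. Full Matching and Partial Matching

Partial matching is typically used to extract the substrings that satisfy a pattern, while full matching is used to validate that an entire string satisfies it. A full match must cover the whole string, from its first character to its last, which is equivalent to wrapping the regular expression in the anchors ^ and $.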

Partial match example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        String content = "https://www.baidu.com123";
        // With the ^ and $ anchors added, nothing in this string would match
        // String regStr = "^(http|https)://([a-zA-Z]+\\.)+([a-zA-Z]+)+$";
        String regStr = "(http|https)://([a-zA-Z]+\\.)+([a-zA-Z]+)+";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println(matcher.group(0)); // prints https://www.baidu.com
        }
    }
}

Full match example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        // If the string were https://www.baidu.com instead, the result would be true
        // String content = "https://www.baidu.com";
        String content = "https://www.baidu.com123";
        String regStr = "(http|https)://([a-zA-Z]+\\.)+([a-zA-Z]+)+";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        System.out.println(matcher.matches()); // prints false
    }
}

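Note: the String class also has a matches method. It performs a full match as well and is commonly used to validate whether a string satisfies a pattern.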
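To put the two modes side by side, here is a minimal sketch that runs one URL pattern through three Matcher entry points: matches() (the whole input), lookingAt() (must start at the beginning but may stop early), and find() (anywhere in the input). The class name MatchModes and its helper methods are illustrative names only, and the final label is written as [a-zA-Z]+ without the extra group, which accepts the same strings.

```java
import java.util.regex.Pattern;

class MatchModes {
    // Same URL shape as the examples above (final label simplified; an assumption of this sketch).
    static final Pattern URL = Pattern.compile("(http|https)://([a-zA-Z]+\\.)+[a-zA-Z]+");

    // Full match: the entire input must fit the pattern.
    static boolean whole(String s) {
        return URL.matcher(s).matches();
    }

    // Prefix match: the pattern must match at index 0, but may leave a tail unmatched.
    static boolean prefix(String s) {
        return URL.matcher(s).lookingAt();
    }

    // Partial match: the pattern may match anywhere in the input.
    static boolean anywhere(String s) {
        return URL.matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "https://www.baidu.com123";
        System.out.println(whole(s));    // false: the trailing 123 is not covered
        System.out.println(prefix(s));   // true
        System.out.println(anywhere("see " + s)); // true
    }
}
```

String.matches behaves like whole() here, so it suits validation, while find() suits extraction.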

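II. Greedy and Non-Greedy Matching

Quantifiers in regular expressions are greedy by default: they match as long a string as possible. In the following example the pattern asks for three to six consecutive digits, and the greedy quantifier returns a single six-digit match rather than two three-digit ones.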

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        String content = "112233哈哈哈";
        String regStr = "\\d{3,6}";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println(matcher.group(0)); // prints 112233
        }
    }
}

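Non-greedy (reluctant) matching is the opposite: it matches as few characters as possible. Appending a ? directly after a quantifier makes that quantifier non-greedy, for example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        String content = "112233哈哈哈";
        String regStr = "\\d{3,6}?";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println(matcher.group(0)); // prints two lines: 112 and 233
        }
    }
}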

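In the following example the quantifier + is followed by a ?, which makes it non-greedy. Without the ?, there would be a single match: Aa1_

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        String content = "Aa1_啊啊";
        String regStr = "\\w+?";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println(matcher.group(0)); // prints four lines: A, a, 1, _
        }
    }
}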
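The difference matters most when a quantifier sits between two delimiters. The sketch below is an illustration under assumed inputs: the class GreedyVsReluctant, the helper findAll, and the tag-like sample string are all made up for this example. It compares a greedy and a non-greedy quantifier on the same text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "<b>bold</b> and <i>italic</i>";
        // Greedy: .+ runs to the last '>' in the input, so there is one long match.
        System.out.println(findAll("<.+>", input));
        // Non-greedy: .+? stops at the first '>' it can, so each tag is matched on its own.
        System.out.println(findAll("<.+?>", input));
    }
}
```

The greedy pattern returns the whole input as one match, while the non-greedy one returns the four tags separately.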

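III. Grouping, Capturing, and Backreferences

Capturing groups and backreferences are usually used together. A captured group can be referenced inside the regular expression or outside it, for example in a replacement string. Inside the expression, write \\N in a Java string literal, where N is the group number; outside it, write $N. For example, \\1 refers to the first group, \\2 to the second, and so on.

Note: groups are created with parentheses, for example (\\d\\d)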
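As a small illustration of the two reference styles, the following sketch uses \\1 inside a pattern to detect a doubled word and $1 in a replacement string to collapse it. The class name BackrefDemo, its method names, and the sample sentences are assumptions made for this example.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // Inside the pattern: \\1 must repeat exactly what group 1 captured.
    private static final Pattern DOUBLED = Pattern.compile("\\b(\\w+) \\1\\b");

    static boolean hasDoubledWord(String s) {
        return DOUBLED.matcher(s).find();
    }

    // Outside the pattern: $1 in the replacement stands for the text of group 1.
    static String collapseDoubledWords(String s) {
        return s.replaceAll("\\b(\\w+)( \\1\\b)+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledWord("this is is a test"));       // true
        System.out.println(collapseDoubledWords("this is is a test")); // this is a test
    }
}
```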

Capturing group example: match four consecutive digits in the string, split into two groups of two digits each

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        String content = "我爱Java 123456";
        String regStr = "(\\d\\d)(\\d\\d)";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println("Whole match: " + matcher.group(0));
            System.out.println("Group 1: " + matcher.group(1));
            System.out.println("Group 2: " + matcher.group(2));
        }
    }
}

The program prints:

Whole match: 1234
Group 1: 12
Group 2: 34

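Backreference example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        String content = "1122 1234 1221 1332 13133-111222333";
        // Requirement 1: find two consecutive identical digits, e.g. 11, 22, 33
        //String regStr = "(\\d)\\1";
        // Requirement 2: find three consecutive identical digits, e.g. 111, 222
        //String regStr = "(\\d)\\1{2}";
        // Requirement 3: find a four-digit number whose first and fourth digits are equal
        // and whose second and third digits are equal, e.g. 1221
        // String regStr = "(\\d)(\\d)\\2\\1";
        // Requirement 4: find a five-digit number, then a -, then a nine-digit number
        // in which each run of three digits is identical, e.g. 13133-111222333
        String regStr = "\\d{5}-(\\d)\\1{2}(\\d)\\2{2}(\\d)\\3{2}";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}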

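Use grouping, capturing, and backreferences to remove repeated characters from a sentence and strip out the . characters. For example, 我我我...爱.爱中国国国 (a stuttered "I love China") becomes 我爱中国

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        String content = "我我我...爱.爱中国国国";
        // Step 1: remove every literal dot
        String regStr = "\\.";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        content = matcher.replaceAll("");
        // Step 2: collapse each run of a repeated character into a single one
        content = Pattern.compile("(.)\\1+").matcher(content).replaceAll("$1");
        System.out.println(content); // prints 我爱中国
    }
}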

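IV. Regular Expressions in the String Class

1. String replacement: the difference between replace, replaceAll, and replaceFirst
    replace replaces every occurrence and treats its argument as literal text (no regular expressions)
    replaceAll also replaces every occurrence, and its first argument is a regular expression
    replaceFirst replaces only the first match, and its first argument is a regular expression

2. Note: inside [], a - placed between two characters denotes a range, as in [a-z], so a literal - in that position must be escaped
        [#\\-~] needs the escape, because [#-~] would mean the range from # to ~
        [#~\\-] and [\\w-] work with or without it, because a - at the end of the class cannot form a range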
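The hyphen rule in item 2 can be checked directly. The sketch below is a minimal experiment, with the class name HyphenInClass and the helper fits chosen only for illustration; it tests a few single-character strings against the character classes discussed above.

```java
class HyphenInClass {
    // True if the whole string s matches regex (String.matches is a full match).
    static boolean fits(String regex, String s) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        // Trailing hyphen: literal, with or without the escape.
        System.out.println(fits("[#~-]", "-"));    // true
        System.out.println(fits("[#~\\-]", "-"));  // true
        // Hyphen between two characters: a range from '#' (0x23) to '~' (0x7E).
        System.out.println(fits("[#-~]", "A"));    // true, 'A' lies inside the range
        System.out.println(fits("[#-~]", "!"));    // false, '!' (0x21) lies below it
        // Escaped hyphen between two characters: only '#', '-' and '~' are allowed.
        System.out.println(fits("[#\\-~]", "A"));  // false
    }
}
```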

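import java.util.Arrays;

public class RegExp_Test {
    public static void main(String[] args) {
        // Requirement 1: replace JDK1.3 and JDK1.4 with JDK; the pattern could also be written "JDK1\\.3|JDK1\\.4"
        String content1 = "JDK1.3和JDK1.4都是属于JDK";
        // The . is escaped so that it matches only a literal dot, not any character
        String s1 = content1.replaceAll("JDK1\\.[34]", "JDK");
        System.out.println(s1); // prints JDK和JDK都是属于JDK
        // Requirement 2: validate that the string is an 11-digit number starting with 138 or 139
        String content2 = "138123456789";
        boolean flag = content2.matches("(138|139)\\d{9}"); // full match
        System.out.println(flag); // prints false, because this sample has 12 digits
        // Requirement 3: split on #, -, ~, or a run of digits; the pattern could also be written "#|-|~|\\d+"
        String content3 = "hello#mar~ry-哈哈11我是";
        String[] split = content3.split("[#~\\-]|\\d+");
        System.out.println(Arrays.toString(split)); // prints [hello, mar, ry, 哈哈, 我是]
    }
}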
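The three replacement methods are easiest to tell apart on an input where the search text is itself a regex metacharacter. The following sketch uses an assumed sample string "a.b.c"; the class ReplaceVariants and its method names are hypothetical and exist only to label each call.

```java
class ReplaceVariants {
    // replace: the target is literal text, every occurrence is replaced.
    static String literalAll(String s) {
        return s.replace(".", "-");
    }

    // replaceAll with an unescaped dot: the dot is a regex and matches every character.
    static String regexAllUnescaped(String s) {
        return s.replaceAll(".", "-");
    }

    // replaceAll with an escaped dot: only literal dots are replaced.
    static String regexAllEscaped(String s) {
        return s.replaceAll("\\.", "-");
    }

    // replaceFirst: regex as well, but only the first match is replaced.
    static String regexFirst(String s) {
        return s.replaceFirst("\\.", "-");
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(literalAll(s));        // a-b-c
        System.out.println(regexAllUnescaped(s)); // -----
        System.out.println(regexAllEscaped(s));   // a-b-c
        System.out.println(regexFirst(s));        // a-b.c
    }
}
```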

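V. Matching Any Character: [\s\S] versus .

In a regular expression, "." is the usual way to match an arbitrary character, but "." matches any character except a line terminator. To match every character, line breaks included, use [\\s\\S]: \\s matches every whitespace character and \\S matches every non-whitespace character, so together they cover all characters.
Usage example: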

public class RegExp_Test {
    public static void main(String[] args) {
        String content1 = "abc123\n111";
        String regStr1 = "[\\s\\S]*";
        System.out.println(content1.matches(regStr1)); // true
        
        String content2 = "abc123\n111";
        String regStr2 = ".*";
        System.out.println(content2.matches(regStr2)); // false
    }
}
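Java also offers a flag for the same purpose: with Pattern.DOTALL, or the inline form (?s), the dot matches line terminators too. The sketch below compares the three spellings on the same input; the class name DotAllDemo and its method names are illustrative.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // Default mode: "." stops at line terminators.
    static boolean dotDefault(String s) {
        return Pattern.compile(".*").matcher(s).matches();
    }

    // DOTALL flag: "." matches line terminators as well.
    static boolean dotAllFlag(String s) {
        return Pattern.compile(".*", Pattern.DOTALL).matcher(s).matches();
    }

    // Inline flag (?s): same effect, usable with String.matches.
    static boolean dotAllInline(String s) {
        return s.matches("(?s).*");
    }

    public static void main(String[] args) {
        String content = "abc123\n111";
        System.out.println(dotDefault(content));   // false
        System.out.println(dotAllFlag(content));   // true
        System.out.println(dotAllInline(content)); // true
    }
}
```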

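VI. Other Regular Expression Examples

Validating the format of an email address

Format requirements:
1. Exactly one @
2. The part before @ is the user name, which may contain a-z, A-Z, 0-9, _ and -
3. The part after @ is the domain name, made up only of letters or digits, e.g. qq.com, 163.com, tsinghua.org.cn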

public class RegExp_Test {
    public static void main(String[] args) {
        String content = "2416871211@163.com";
        // A plain @ already means exactly one @, so no {1} quantifier is needed
        String regStr = "[\\w-]+@([a-zA-Z0-9]+\\.)+[a-zA-Z0-9]+";
        System.out.println(content.matches(regStr)); // String.matches is a full match; prints true
    }
}

Validating that a string is an integer or a decimal, taking both positive and negative numbers into account

Examples: 123, -345, 34.89, -87.9, -0.01, 0.45

public class RegExp_Test {
    public static void main(String[] args) {
        String content = "-0.45";
        // With a full match, adding the ^ and $ anchors makes no difference
        // String regStr = "^[-+]?([1-9]\\d*|0)(\\.\\d+)?$";
        String regStr = "[-+]?([1-9]\\d*|0)(\\.\\d+)?";
        System.out.println(content.matches(regStr)); // prints true
    }
}
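A validation pattern like this is easier to trust once it has been run against both the listed samples and a few inputs that should fail. The sketch below does that; the class name NumberCheck, the method isNumber, and the rejected samples (007, 1., .5) are assumptions added for this example.

```java
import java.util.regex.Pattern;

class NumberCheck {
    // Same pattern as above, compiled once for reuse.
    private static final Pattern NUMBER = Pattern.compile("[-+]?([1-9]\\d*|0)(\\.\\d+)?");

    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"123", "-345", "34.89", "-87.9", "-0.01", "0.45", "007", "1.", ".5"};
        for (String sample : samples) {
            // The first six are accepted; 007 (leading zeros), 1. and .5 are rejected.
            System.out.println(sample + " -> " + isNumber(sample));
        }
    }
}
```

Whether inputs such as 007 or .5 should be accepted is a design decision; this pattern rejects them because the integer part must be either 0 or start with a non-zero digit.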

Parsing a URL into its protocol, domain name, port, and file name

For example: http://www.baidu.com:8080/abc/index.html
    Protocol: http
    Domain name: www.baidu.com
    Port: 8080
    File name: index.html

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        // Capturing groups are needed here
        String content = "http://www.baidu.com:8080/abc/index.html";
        // The - is placed last in [\\w/-] so that it cannot be read as a range
        String regStr = "([a-zA-Z]+)://([a-zA-Z\\.]+):(\\d+)[\\w/-]*/([\\w\\.]+)";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        // First do a full match to validate the URL; on success, read the groups
        if (matcher.matches()) {
            System.out.println("Match succeeded");
            System.out.println("URL: " + matcher.group(0));
            System.out.println("Protocol: " + matcher.group(1));
            System.out.println("Domain name: " + matcher.group(2));
            System.out.println("Port: " + matcher.group(3));
            System.out.println("File name: " + matcher.group(4));
        } else {
            System.out.println("Match failed");
        }
    }
}

The program prints:

Match succeeded
URL: http://www.baidu.com:8080/abc/index.html
Protocol: http
Domain name: www.baidu.com
Port: 8080
File name: index.html
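As a cross-check on the regex, the same four parts can be read with the standard java.net.URI class. This swaps in a different technique (a URI parser rather than a regular expression) and is only a sketch; the class name UriParts and the helper fileName are hypothetical.

```java
import java.net.URI;

class UriParts {
    // The file name is taken to be whatever follows the last '/' of the path.
    static String fileName(URI uri) {
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://www.baidu.com:8080/abc/index.html");
        System.out.println(uri.getScheme()); // http
        System.out.println(uri.getHost());   // www.baidu.com
        System.out.println(uri.getPort());   // 8080
        System.out.println(fileName(uri));   // index.html
    }
}
```

Comparing the two outputs on a handful of URLs is a cheap way to find inputs the regex does not cover, such as a URL without an explicit port.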

Handling quoted content in a string

Extract the quoted parts of a string (both single-quoted and double-quoted)

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        String content = "select * from student where name = 'cc' and city like '%深圳%' and sex = \"男的\" and age = ''";
        String regStr = "('|\")[^'\"]*('|\")";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}

The program prints:

'cc'
'%深圳%'
"男的"
''

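Extract only the content inside the quotes, so that the extracted strings do not include the quotes themselves
(this is where capturing groups come in handy)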

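import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        String content = "select * from student where name = 'cc' and city like '%深圳%' and sex = \"男的\" and age = ''";
        // Groups are wrapped in (); the " inside the character class must be escaped in a Java string
        String regStr = "('|\")([^'\"]*)('|\")";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            // matcher.group(0) is the whole matched substring, quotes included;
            // group(2) is the text between the quotes
            System.out.println(matcher.group(2));
        }
    }
}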

The program prints (the empty pair '' produces an empty fourth line):

cc
%深圳%
男的
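The pattern ('|\")[^'\"]*('|\") lets either quote character close either opener and does not allow the other quote character inside the quoted text. One possible refinement, sketched below under the assumption that quoted text never contains its own delimiter, uses a backreference so that the closing quote must equal the opening one. The class name QuotePairs, the method extract, and the sample input are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotePairs {
    // Group 1 captures the opening quote; \\1 requires the same character to close it.
    // Group 2 is the text in between, matched non-greedily so it stops at the first closer.
    private static final Pattern QUOTED = Pattern.compile("(['\"])(.*?)\\1");

    static List<String> extract(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String content = "a = 'cc' and b = \"it's\" and c = ''";
        // The apostrophe inside the double-quoted value no longer ends the match.
        System.out.println(extract(content)); // [cc, it's, ]
    }
}
```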

Replace each quoted part of the string with the fixed string "uuid", implementation 1:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        String content = "select * from student where name = 'cc' and city like '%深圳%' and sex = \"男的\" and age = ''";
        String regStr = "('|\")[^'\"]*('|\")";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        content = matcher.replaceAll("uuid");
        System.out.println(content);
    }
}

The program prints:

select * from student where name = uuid and city like uuid and sex = uuid and age = uuid

Replace each quoted part of the string with the fixed string "uuid", implementation 2:

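import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        // String s = "select * from test where name = ccj and age = 18";
        String content = "select * from test where name = 'ccj' and age = \"18\"";
        String regStr = "[\\s\\S]*((\'|\").*(\'|\"))[\\s\\S]*";
        // Compile once; no flags are needed because the pattern contains no letters
        Pattern pattern = Pattern.compile(regStr);
        while (true) {
            Matcher matcher = pattern.matcher(content);
            if (matcher.matches()) {
                // The string still contains a quoted part; group 1 is the last one,
                // because the leading [\\s\\S]* is greedy.
                // Pattern.quote makes replaceFirst treat the captured text literally
                content = content.replaceFirst(Pattern.quote(matcher.group(1)), "uuid");
            } else {
                break;
            }
        }
        System.out.println(content); // prints select * from test where name = uuid and age = uuid
    }
}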

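Replace each quoted part of the string with a randomly generated UUID string (UUID.randomUUID().toString()), and record each replaced quoted part together with its UUID in a Map so that the originals can be substituted back later.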

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExp_Test {
    public static void main(String[] args) {
        List<String> res = new ArrayList<>();
        Map<String, String> map = new HashMap<>();
        String content = "select * from student where name = 'cc' and city like '%深圳%' and sex = \"男的\" and age = ''";
        String regStr = "('|\")[^'\"]*('|\")";
        Pattern pattern = Pattern.compile(regStr);
        Matcher matcher = pattern.matcher(content);
        // Collect every quoted part, quotes included
        while (matcher.find()) {
            res.add(matcher.group(0));
        }
        for (String str : res) {
            String uuid = UUID.randomUUID().toString();
            map.put(str, uuid);
            // Pattern.quote: the quoted text may contain regex metacharacters, so match it literally
            content = content.replaceFirst(Pattern.quote(str), uuid);
        }
        System.out.println(content);
        for (String key : map.keySet()) {
            System.out.println(key + "," + map.get(key));
        }
    }
}

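The program prints (the UUIDs differ on every run, and HashMap does not guarantee the order of the last four lines):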

select * from student where name = 883e44a2-2040-4bcf-9ae7-02daf5b0b46f and city like 4fb6d622-f75b-4fef-b211-40a9947e486f and sex = b23be302-205d-45d4-879c-7882edc85f0b and age = 1bb2b213-fc33-421a-ae6a-a6f81ea536dc
'',1bb2b213-fc33-421a-ae6a-a6f81ea536dc
'cc',883e44a2-2040-4bcf-9ae7-02daf5b0b46f
'%深圳%',4fb6d622-f75b-4fef-b211-40a9947e486f
"男的",b23be302-205d-45d4-879c-7882edc85f0b