Regular Expressions

空壳_

Last modified 2024-02-01 00:54:13

Tags: regular expressions, java

First published 2023-12-07 13:06:10

Article link: https://blog.csdn.net/qq_61848060/article/details/134852869

More content on my personal blog: https://kongke7.github.io/

I. Syntax and Metacharacters

1. Basic Syntax and Metacharacters

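The patterns here are written as they appear inside a Java string literal, so \\s in the source code is the regex \s.
\\s: matches any whitespace character (space, tab, newline, and so on)
\\S: the negation of \\s; matches any non-whitespace character
.: matches any character except a line terminator such as \n (unless Pattern.DOTALL is set)
(?i): ignore case
- a(?i)bc: case is ignored for bc only
- a((?i)b)c: case is ignored for b only

  // In Java, passing Pattern.CASE_INSENSITIVE also makes the match ignore case
  Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

|: alternation
- ab|cd: matches ab or cd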
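
To see these metacharacters in action, here is a minimal runnable sketch. The class name `MetacharDemo`, the helper `findAll`, and the sample inputs are all made up for illustration; only `Pattern` and `Matcher` come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetacharDemo {
    // Hypothetical helper: collects every substring of input that the regex matches.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Runs of non-whitespace characters
        System.out.println(findAll("\\S+", "ab  cd\te"));          // [ab, cd, e]
        // (?i) applies from where it appears onward: only bc ignores case
        System.out.println(findAll("a(?i)bc", "abc aBC ABC"));     // [abc, aBC]
        // Inside a group, (?i) stops at the closing parenthesis: only b ignores case
        System.out.println(findAll("a((?i)b)c", "abc aBc aBC"));   // [abc, aBc]
        // Alternation
        System.out.println(findAll("ab|cd", "abxcd"));             // [ab, cd]
        // The dot skips the line terminator
        System.out.println(findAll(".", "a\nb"));                  // [a, b]
    }
}
```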

2. Quantifiers

A quantifier specifies how many consecutive times the character or group before it must occur.
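
As a rough illustration, the sketch below runs the standard quantifiers (`*`, `+`, `?`, `{n}`, `{n,}`, `{n,m}`) through Java's regex engine. The class name `QuantifierDemo`, the helper `findAll`, and the inputs are hypothetical, chosen only to make the behaviour visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("ab*", "a ab abbb"));       // * zero or more:  [a, ab, abbb]
        System.out.println(findAll("ab+", "a ab abbb"));       // + one or more:   [ab, abbb]
        System.out.println(findAll("ab?", "a ab abbb"));       // ? zero or one:   [a, ab, ab]
        System.out.println(findAll("a{2}", "aaaaa"));          // exactly 2:       [aa, aa]
        System.out.println(findAll("a{3,}", "aa aaa aaaa"));   // 3 or more:       [aaa, aaaa]
        System.out.println(findAll("a{2,3}", "aaaaa"));        // 2 to 3, greedy:  [aaa, aa]
    }
}
```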

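3. Anchors

Anchors constrain where in the string a match may occur.

\\b: matches at a word boundary, that is, the start or end of a word (the position between a word character and a non-word character or the edge of the input). If the string contains spaces, the spaces act as the separators between words.
\\B: matches at a non-boundary position, for example between two characters in the middle of a word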
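
The difference between the two is easiest to see from the positions where a match starts. The following sketch is only an illustration; `BoundaryDemo`, `starts`, and the sample sentence are assumptions, not part of the original article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: returns the start index of every match.
    static List<Integer> starts(String regex, String input) {
        List<Integer> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "cat concat catalog";
        System.out.println(starts("\\bcat", text));  // cat at the start of a word:   [0, 11]
        System.out.println(starts("cat\\b", text));  // cat at the end of a word:     [0, 7]
        System.out.println(starts("\\Bcat", text));  // cat not at a word start:      [7]
        System.out.println(starts("cat\\B", text));  // cat not at a word end:        [11]
    }
}
```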

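4. Capturing Groups

(pattern): an unnamed capturing group; groups are numbered from 1 in the order of their opening parentheses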

  String regex = "([0-9]{2})(\\d)(\\d)";

(?<name>pattern): a named capturing group, retrievable by name as well as by number

String regex = "(?<g1>[0-9]{2})(?<g2>\\d)(?<g3>\\d)";
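
To show how the captured text is read back, here is a small sketch using the two patterns above. The class `GroupDemo`, its helper methods, and the input `abc1234def` are hypothetical; `Matcher.group(int)` and `Matcher.group(String)` are the standard JDK calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: text of group n in the first match, or null if there is no match.
    static String group(String regex, String input, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(n) : null;
    }

    // Hypothetical helper: text of the named group in the first match, or null if there is no match.
    static String namedGroup(String regex, String input, String name) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        String input = "abc1234def";
        String numbered = "([0-9]{2})(\\d)(\\d)";
        System.out.println(group(numbered, input, 0)); // whole match: 1234
        System.out.println(group(numbered, input, 1)); // first group: 12
        System.out.println(group(numbered, input, 2)); // second group: 3
        System.out.println(group(numbered, input, 3)); // third group: 4

        String named = "(?<g1>[0-9]{2})(?<g2>\\d)(?<g3>\\d)";
        System.out.println(namedGroup(named, input, "g1")); // 12
        System.out.println(group(named, input, 1));         // named groups keep their number too: 12
    }
}
```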

5. Non-capturing Groups

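(?:pattern): groups the subexpression without capturing it

  String str = "jack=10 bob=19 tom=10";
  
  String regex = "\\w*=(?:10)";
  // yields jack=10 tom=10

(?=pattern): positive lookahead; the position must be followed by pattern, but pattern is not part of the match

String regex = "\\w*=(?=10)";
// yields jack= tom=

(?!pattern): negative lookahead; the position must not be followed by pattern

String regex = "\\w*=(?!10)";
// yields bob=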
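
The three fragments above can be run side by side with a sketch like the following. `LookaheadDemo` and `findAll` are made-up names; the patterns and the input string are the ones from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "jack=10 bob=19 tom=10";
        System.out.println(findAll("\\w*=(?:10)", str)); // [jack=10, tom=10]
        System.out.println(findAll("\\w*=(?=10)", str)); // [jack=, tom=]
        System.out.println(findAll("\\w*=(?!10)", str)); // [bob=]

        // None of these constructs creates a capturing group.
        System.out.println(Pattern.compile("\\w*=(?:10)").matcher(str).groupCount()); // 0
    }
}
```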

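6. Backreferences

Internal: the reference is made inside the regular expression itself

External: the reference is made from another method, such as in a replacement string

\\n: internal backreference
- stands for the text matched by the n-th group of the regex
- n is the index of a capturing group, the same number that is passed to group(n)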

String str = "12312-111222333";
// Matches serial numbers such as 15237-333444555
String regex = "\\d{5}-(\\d)\\1{2}(\\d)\\2{2}(\\d)\\3{2}";
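
A quick way to try the pattern is to wrap it in a validator. This is only a sketch: the class `SerialCheck` and the method `isSerial` are hypothetical, and using a whole-string match (`Matcher.matches()`) is an assumption, since the text does not say how the pattern is applied.

```java
import java.util.regex.Pattern;

class SerialCheck {
    // Five digits, a dash, then three triples of one repeated digit each.
    private static final Pattern SERIAL =
            Pattern.compile("\\d{5}-(\\d)\\1{2}(\\d)\\2{2}(\\d)\\3{2}");

    // Hypothetical helper: true when the whole string has the serial-number shape.
    static boolean isSerial(String s) {
        return SERIAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSerial("12312-111222333")); // true
        System.out.println(isSerial("15237-333444555")); // true
        System.out.println(isSerial("12312-111222334")); // false: the last triple is not one repeated digit
    }
}
```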

$n: external backreference

  // s is the input string
  Pattern pattern1 = Pattern.compile("(.)\\1+");
  Matcher matcher1 = pattern1.matcher(s);
  // External backreference: $1 is the text captured by the first group of the regex
  String res = matcher1.replaceAll("$1");

7. Greedy and Lazy Matching

Quantifiers are greedy by default: they match as much as possible.

Appending ? to a quantifier makes it lazy: it matches as little as possible.

String str = "asd123123ds";
// Greedy (the default)
// String regex = "\\d+";
// yields 123123

// Lazy
// String regex = "\\d+?";
// yields 1 2 3 1 2 3

String str1 = "<b name=123/><b name=321/>";
// Greedy
// String regex = "<\\w.+>";
// yields <b name=123/><b name=321/> as a single match
// Lazy
String regex = "<\\w.+?>";
// yields <b name=123/> and <b name=321/>
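
The commented-out variants above can be compared directly with a small harness such as this one. `GreedyLazyDemo` and `findAll` are hypothetical names; the inputs and patterns are taken from the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyLazyDemo {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "asd123123ds";
        System.out.println(findAll("\\d+", str));   // greedy: [123123]
        System.out.println(findAll("\\d+?", str));  // lazy:   [1, 2, 3, 1, 2, 3]

        String str1 = "<b name=123/><b name=321/>";
        System.out.println(findAll("<\\w.+>", str1));   // greedy: one match spanning both tags
        System.out.println(findAll("<\\w.+?>", str1));  // lazy:   one match per tag
    }
}
```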

III. Three Commonly Used Classes in Java

1. Pattern

matches(regex, url): whole-string match; it can only return a boolean
```
boolean isMatch = Pattern.matches(regex, url);
```

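compile(regex): returns a compiled Pattern object, which is used to create a Matcher. With find(), the input only needs to contain a substring that fits the pattern, and the matched text can be read back.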

Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);

while (matcher.find()){
    System.out.println(matcher.group(0));
}
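
The practical difference between the two entry points is whole-string versus substring matching. A minimal sketch, with the hypothetical names `PatternDemo`, `wholeMatch`, and `containsMatch`:

```java
import java.util.regex.Pattern;

class PatternDemo {
    // Whole-string match, as done by the static Pattern.matches.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Substring match: true if any part of input fits the regex.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("\\d+", "abc123"));    // false: the letters are not digits
        System.out.println(containsMatch("\\d+", "abc123")); // true: 123 is found inside
        System.out.println(wholeMatch("\\d+", "123"));       // true
    }
}
```

When the same regex is applied many times, compiling it once with `Pattern.compile` and reusing the `Pattern` avoids recompiling on every call.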

2. Matcher

A Matcher is the object that performs the matching of one Pattern against one input string.
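
As a sketch of the most frequently used Matcher methods (`find`, `group`, `start`, `end`, `lookingAt`, `matches`), consider the example below. The class `MatcherDemo`, the helper `locate`, and the input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherDemo {
    // Hypothetical helper: each match rendered as text@start-end (end is exclusive).
    static List<String> locate(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "jdk1.3 and jdk1.4";
        String regex = "jdk\\d\\.\\d";
        System.out.println(locate(regex, input)); // [jdk1.3@0-6, jdk1.4@11-17]

        Matcher m = Pattern.compile(regex).matcher(input);
        System.out.println(m.lookingAt()); // true: the input starts with a match
        System.out.println(m.matches());   // false: the whole input is not a single match
    }
}
```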

3. String

replaceAll(regex, s): replaces every match of regex with s

String str = "jdk1.3dadasdasdjdk1.4asdas3423dfsjdk1.5";
String res = str.replaceAll("jdk\\d+\\.\\d+", "JDK");

matches(regex): whole-string match

split(regex): splits the string around the matches of regex

String str2 = "AAA#CCC&AAA~CCC12GGG";
// Split on #, ~, & or a run of digits
str2.split("[#&~]|\\d+");
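
Putting the three String methods together gives a sketch like this. `StringRegexDemo` and its helper names are hypothetical; the inputs and patterns are the ones shown above.

```java
import java.util.Arrays;
import java.util.List;

class StringRegexDemo {
    // Replaces every jdk<major>.<minor> occurrence with JDK.
    static String upperJdk(String s) {
        return s.replaceAll("jdk\\d+\\.\\d+", "JDK");
    }

    // Splits on #, &, ~ or a run of digits.
    static List<String> splitTokens(String s) {
        return Arrays.asList(s.split("[#&~]|\\d+"));
    }

    public static void main(String[] args) {
        System.out.println(upperJdk("jdk1.3dadasdasdjdk1.4asdas3423dfsjdk1.5"));
        // JDKdadasdasdJDKasdas3423dfsJDK

        System.out.println(splitTokens("AAA#CCC&AAA~CCC12GGG"));
        // [AAA, CCC, AAA, CCC, GGG]

        System.out.println("123".matches("\\d+"));    // true
        System.out.println("abc123".matches("\\d+")); // false: matches() needs the whole string to fit
    }
}
```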

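II. Practical Applications

1. Checking for Chinese Characters

Note: common Chinese characters fall in the range \u4e00-\u9fa5 (CJK Unified Ideographs). The wider range \u0391-\uffe5 is sometimes used as well, but it starts at the Greek letter Α and ends at the full-width ￥, so it also admits many characters that are not Chinese.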

String str = "你好你好你好";
String regex = "^[\u4e00-\u9fa5]+$";
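
Wrapped in a method, the check could look like the sketch below. The names `ChineseCheck` and `isAllChinese` are hypothetical, and the sample strings are written as Unicode escapes only so that the file compiles the same way under any source encoding.

```java
class ChineseCheck {
    // True when the string is non-empty and every character lies in \u4e00-\u9fa5.
    static boolean isAllChinese(String s) {
        return s.matches("^[\u4e00-\u9fa5]+$");
    }

    public static void main(String[] args) {
        String hello = "\u4f60\u597d"; // the two characters of "你好"
        System.out.println(isAllChinese(hello));          // true
        System.out.println(isAllChinese(hello + "abc"));  // false: contains Latin letters
        System.out.println(isAllChinese(""));             // false: + requires at least one character
    }
}
```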

2. Checking a Postal Code

Six digits, with a non-zero first digit (1-9)

String str = "110203";
String regex = "^[1-9]\\d{5}$";
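
A few sample inputs make the rule concrete. In this sketch, `PostalCodeCheck` and `isPostalCode` are made-up names and the test values other than 110203 are invented.

```java
class PostalCodeCheck {
    // Six digits in total, the first one in 1-9.
    static boolean isPostalCode(String s) {
        return s.matches("^[1-9]\\d{5}$");
    }

    public static void main(String[] args) {
        System.out.println(isPostalCode("110203"));  // true
        System.out.println(isPostalCode("010203"));  // false: leading zero
        System.out.println(isPostalCode("11020"));   // false: only five digits
        System.out.println(isPostalCode("1102030")); // false: seven digits
    }
}
```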

3. Checking a URL

^((http)s?://)?([\\w-]+\\.)+[a-zA-Z0-9]+((/[\\w-#]+)+\\?([\\w-]+=[\\w-]+&?)*)?$

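^((http)s?://)?
- the s is optional
- the whole http:// or https:// prefix is optional
([\\w-]+\\.)+
- there may be several domain levels
  - abc.dc.aaa.com.cn
[a-zA-Z0-9]+
- the last label of the domain
((/[\\w-#]+)+\\?([\\w-]+=[\\w-]+&?)*)?$
- (/[\\w-#]+)+
  - there may be a multi-level path
    - /video/aa/bb
- \\?
  - the path ends with ?, which introduces the parameters
- ([\\w-]+=[\\w-]+&?)*
  - [\\w-]+ the characters allowed in a parameter name
  - = joins the name and the value
  - [\\w-]+ the characters allowed in a parameter value
  - &? pairs are separated by &; a single pair needs no &
  - * there may be zero, one, or many pairs
- ?$ everything after the domain (path and parameters) is optional as a unit, and whatever comes last must end the URL. As written, a path only matches when it is followed by ?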

/**
 * Checks whether a URL is well-formed
 * (needs java.util.regex.Pattern and a JUnit @Test annotation on the classpath)
 */
@Test
public void isUrl(){
    String url = "https://www.bilibili.com" +
            "/video/BV1Eq4y1E79W?p=17&spm_id_from=pageDriver" +
            "&vd_source=1515d4ece87146a640eebb6175354668";
    
    String regex = "^((http)s?://)?([\\w-]+\\.)+[a-zA-Z0-9]+((/[\\w-#]+)+\\?([\\w-]+=[\\w-]+&?)*)?$";
    
    boolean isMatch = Pattern.matches(regex, url);
    if (isMatch) {
        System.out.println("Match!");
    } else {
        System.out.println("No match!");
    }
}
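
To get a feel for what this pattern accepts and rejects, it can be tried against a few more inputs. The sketch below reuses the regex unchanged; the class `UrlCheck`, the method `isUrl`, and every sample URL except the bilibili host are assumptions for illustration.

```java
import java.util.regex.Pattern;

class UrlCheck {
    private static final Pattern URL = Pattern.compile(
            "^((http)s?://)?([\\w-]+\\.)+[a-zA-Z0-9]+((/[\\w-#]+)+\\?([\\w-]+=[\\w-]+&?)*)?$");

    // True when the whole string fits the URL pattern from the article.
    static boolean isUrl(String s) {
        return URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUrl("www.bilibili.com"));                        // true: scheme is optional
        System.out.println(isUrl("https://www.bilibili.com/video/BV1?p=17")); // true: path plus parameters
        System.out.println(isUrl("https://www.bilibili.com/video"));          // false: a path must be followed by ?
        System.out.println(isUrl("ftp://www.bilibili.com"));                  // false: only http and https are allowed
    }
}
```

The third case shows a limitation of the pattern as written, not of URLs in general: a path without a query string is rejected.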

4. The Stuttering Problem

Turn "我我我…要要…吃吃吃吃…饭" into "我要吃饭" ("I want to eat")

public void spla() {
    String str = "我我我....要要....吃吃吃吃..饭";
    
    Pattern pattern = Pattern.compile("\\.");
    Matcher matcher = pattern.matcher(str);
    // Replace every . with the empty string
    String s = matcher.replaceAll("");

    // (.)\\1+ matches a character followed by one or more repeats of itself
    Pattern pattern1 = Pattern.compile("(.)\\1+");
    Matcher matcher1 = pattern1.matcher(s);
    // External backreference: $1 is the text captured by the first group of the regex
    String res = matcher1.replaceAll("$1");
    System.out.println(res);
}
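
The same two steps can also be written with `String.replaceAll`, which compiles the pattern internally. This is a compact alternative sketch rather than the article's own code; `Destutter` and `destutter` are hypothetical names.

```java
class Destutter {
    // Step 1: drop every literal dot. Step 2: collapse each run of a repeated character to one copy.
    static String destutter(String s) {
        return s.replaceAll("\\.", "").replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(destutter("我我我....要要....吃吃吃吃..饭"));
        System.out.println(destutter("aaa....bb..c")); // abc
    }
}
```

Note that the second step collapses any doubled character, so a word that legitimately contains a double letter would be shortened as well.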