Java Regular Expressions
Regular expression syntax
1. Common metacharacters
. * + ? ( ) [ ] { } ^ $ \ |
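2. Quantifiers:
? * + {}
? means zero or one occurrence
* means zero or more occurrences
+ means one or more occurrences
{} means a specified number of occurrences: {2} means exactly 2, {2,3} means 2 or 3, and {3,} means 3 or more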
3. Alternation:
|
For example, a(ac|bc) matches the strings aac and abc
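4. Character classes:
[] []+ [^]
For example, [abcd] matches any one of a, b, c, d. Do not separate the members with commas: [a,b,c,d] would also match a literal comma
[0-9a-zA-Z]+ matches one or more letters or digits
[^abc] matches any single character other than a, b, c, including digits and special symbols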
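5. Predefined character classes and anchors
\d is equivalent to [0-9] and matches any single digit
\w matches a letter, digit, or underscore
\s matches a whitespace character (space, tab, newline, and so on)
\D matches any non-digit character
\W matches any non-word character
\S matches any non-whitespace character, so it does not match a newline either
**.** matches any character except line terminators (unless the DOTALL flag is set)
**^** matches the start of the input (and the start of each line when the MULTILINE flag is set)
**$** matches the end of the input (and the end of each line when the MULTILINE flag is set)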
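A few of the rules above are easy to misread, so here is a small sketch that checks them directly. The class name `CharClassPitfalls` and the helper `whole` are illustrative names, not part of any library; the only real API used is `Pattern.matches`.

```java
import java.util.regex.Pattern;

public class CharClassPitfalls {
    // Hypothetical helper: true only if the whole input matches the regex.
    public static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Inside [], a comma is just another member of the set.
        System.out.println(whole("[a,b]", ","));        // true
        System.out.println(whole("[ab]", ","));         // false

        // A caret negates the class only when it is the first character.
        System.out.println(whole("[^ab]", "c"));        // true
        System.out.println(whole("[a^b]", "^"));        // true, literal caret
        System.out.println(whole("[a^b]", "c"));        // false

        // {3,} means three or more repetitions.
        System.out.println(whole("ab{3,}c", "abbbc"));  // true
        System.out.println(whole("ab{3,}c", "abbc"));   // false

        // A newline is whitespace, so \S rejects it.
        System.out.println(whole("\\S", "\n"));         // false
    }
}
```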
Test code
package org.example.test01;

import java.util.regex.Pattern;

public class test02 {
    public static void main(String[] args) {
        // I. Quantifiers
        // 1. ? (zero or one)
        System.out.println("==================?================= expected: tt");
        System.out.println(Pattern.matches("used?", "use"));     // true
        System.out.println(Pattern.matches("used?", "used"));    // true
        System.out.println("==================?=================");
        // 2. * (zero or more)
        System.out.println("==================*================= expected: tttf");
        System.out.println(Pattern.matches("ab*c", "ac"));       // true
        System.out.println(Pattern.matches("ab*c", "abc"));      // true
        System.out.println(Pattern.matches("ab*c", "abbc"));     // true
        System.out.println(Pattern.matches("ab*c", "bbc"));      // false: the leading a is missing
        System.out.println("==================*=================");
        // 3. + (one or more)
        System.out.println("==================+================= expected: ftt");
        System.out.println(Pattern.matches("ab+c", "ac"));       // false: at least one b is required
        System.out.println(Pattern.matches("ab+c", "abc"));      // true
        System.out.println(Pattern.matches("ab+c", "abbc"));     // true
        System.out.println("==================+=================");
        // 4. {} (explicit repetition count)
        System.out.println("=================={}================= expected: tttt");
        System.out.println(Pattern.matches("ab{2}c", "abbc"));     // true
        System.out.println(Pattern.matches("ab{2,3}c", "abbc"));   // true
        System.out.println(Pattern.matches("ab{2,3}c", "abbbc"));  // true
        System.out.println(Pattern.matches("ab{2,}c", "abbbbc"));  // true
        System.out.println("=================={}=================");
        // II. Alternation "|"
        System.out.println("==================|================= expected: ttf");
        System.out.println(Pattern.matches("a(bc|cd)", "abc"));    // true
        System.out.println(Pattern.matches("a(bc|cd)", "acd"));    // true
        System.out.println(Pattern.matches("a(bc&cd)", "abccd"));  // false: & is a literal character, not an "and" operator
        System.out.println("==================|=================");
        // III. Character classes
        System.out.println("==================[]================= expected: tf");
        System.out.println(Pattern.matches("a[bcd]", "ab"));     // true
        System.out.println(Pattern.matches("a[bcd]", "abcd"));   // false: a class matches exactly one character
        System.out.println("==================[]=================");
        System.out.println("==================[]+================= expected: tt");
        System.out.println(Pattern.matches("a[bcd]+", "abcd"));  // true
        System.out.println(Pattern.matches("a[bcd]+", "abbd"));  // true
        System.out.println("==================[]+=================");
        System.out.println("==================[^]================= expected: ftft");
        System.out.println(Pattern.matches("a[^bcd]", "ac"));    // false: c is excluded
        System.out.println(Pattern.matches("a[^bcd]", "ae"));    // true
        System.out.println(Pattern.matches("a[^bcd]", "aef"));   // false: one character too many
        System.out.println(Pattern.matches("a[bc^d]", "ac"));    // true: ^ is literal when it is not first
        System.out.println("==================[^]=================");
        System.out.println("==================[]-================= expected: tf");
        System.out.println(Pattern.matches("a[0-9]", "a9"));     // true
        System.out.println(Pattern.matches("a[0-9]", "ab"));     // false
        System.out.println("==================[]-=================");
        // IV. Predefined character classes and anchors (\d, \w, \s, \D, \W, \S, ., ^, $)
        System.out.println("==================\\d matches any digit================= expected: ft");
        System.out.println(Pattern.matches("a\\d", "a123"));  // false: \d matches exactly one digit
        System.out.println(Pattern.matches("a\\d", "a1"));    // true
        System.out.println("==================\\d=================");
        System.out.println("==================\\w matches a letter, digit, or underscore================= expected: tt");
        System.out.println(Pattern.matches("\\w", "_"));      // true
        System.out.println(Pattern.matches("\\w", "Z"));      // true
        System.out.println("==================\\w=================");
        System.out.println("==================\\s matches a whitespace character================= expected: ftt");
        System.out.println(Pattern.matches("\\s", "  "));     // false: two spaces, but \s matches exactly one character
        System.out.println(Pattern.matches("\\s", " "));      // true
        System.out.println(Pattern.matches("\\s", "\t"));     // true
        System.out.println("==================\\s=================");
        System.out.println("==================\\D matches a non-digit character================= expected: t");
        System.out.println(Pattern.matches("\\D", "\n"));     // true: a newline is not a digit
        System.out.println("==================\\D=================");
        System.out.println("==================\\W matches a non-word character================= expected: tt");
        System.out.println(Pattern.matches("\\W", "\n"));     // true
        System.out.println(Pattern.matches("\\W", "\t"));     // true
        System.out.println("==================\\W=================");
        System.out.println("==================\\S matches a non-whitespace character================= expected: ff");
        System.out.println(Pattern.matches("\\S", "\n"));     // false: a newline is whitespace
        System.out.println(Pattern.matches("\\S", "\t"));     // false: a tab is whitespace
        System.out.println("==================\\S=================");
        System.out.println("==================. matches any character except line terminators================= expected: ft");
        System.out.println(Pattern.matches(".", "\n"));       // false
        System.out.println(Pattern.matches(".", "\t"));       // true
        System.out.println("==================.=================");
        System.out.println("==================^ matches the start================= expected: fttf");
        System.out.println(Pattern.matches("^\\d\\w*", "as"));        // false: must start with a digit
        System.out.println(Pattern.matches("^\\d\\w*", "1a"));        // true
        System.out.println(Pattern.matches("^\\d{2}\\w*", "13jha"));  // true
        System.out.println(Pattern.matches("^\\d{2}\\w*", "1jha"));   // false: needs two leading digits
        System.out.println("==================^=================");
        System.out.println("==================$ matches the end================= expected: tt");
        System.out.println(Pattern.matches("\\d+a$", "11a"));         // true
        System.out.println(Pattern.matches("\\w*\\d$", "1a1"));       // true
        System.out.println("==================$=================");
    }
}
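The ^ and $ checks in the test class all go through Pattern.matches, which succeeds only when the entire input matches, so the anchors never change the result there. The following sketch uses find() instead, together with the Pattern.MULTILINE and Pattern.DOTALL flags, to show what the anchors and the dot do on multi-line input. The class name `AnchorDemo` and the helper `findAll` are illustrative names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {
    // Hypothetical helper: collects every match of the pattern in the input.
    public static List<String> findAll(Pattern pattern, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "12ab\n34cd";

        // Default mode: ^ matches only at the very start of the input.
        System.out.println(findAll(Pattern.compile("^\\d+"), text));                        // [12]
        // MULTILINE: ^ also matches right after a line terminator.
        System.out.println(findAll(Pattern.compile("^\\d+", Pattern.MULTILINE), text));     // [12, 34]

        // Default mode: $ matches only at the end of the input.
        System.out.println(findAll(Pattern.compile("[a-z]+$"), text));                      // [cd]
        // MULTILINE: $ also matches just before a line terminator.
        System.out.println(findAll(Pattern.compile("[a-z]+$", Pattern.MULTILINE), text));   // [ab, cd]

        // Default mode: . stops at the newline, so each line is a separate match.
        System.out.println(findAll(Pattern.compile(".+"), text).size());                    // 2
        // DOTALL: . also matches the newline, so the whole input is one match.
        System.out.println(findAll(Pattern.compile(".+", Pattern.DOTALL), text).size());    // 1
    }
}
```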
Steps for using regular expressions in Java
1. Requirement: match IPv4 addresses
    String content = "192.234.65.12 port and 12.4.5.6 and also 189.256.23.9 etc.";
2. Design the regular expression
    ((25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)
In a Java string literal every backslash has to be doubled:
    String regStr = "((25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]?\\d)\\.){3}(25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]?\\d)";
3. Create the pattern object, i.e. the compiled regular expression (Pattern)
    Pattern pattern = Pattern.compile(regStr);
4. Create the matcher, which applies the pattern to the input
    Matcher matcher = pattern.matcher(content);
5. Start matching
    while (matcher.find()) {
        System.out.println(matcher.group());
    }
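The five steps above can be assembled into one runnable class. This is a minimal sketch: the class name `Ipv4Finder`, the constants `OCTET` and `IPV4`, and the method `findAddresses` are names invented for the example, and the sample input is plain ASCII for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Ipv4Finder {
    // One octet: 250-255, 200-249, 100-199, or 0-99.
    private static final String OCTET = "(25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]?\\d)";
    // Steps 2 and 3: three "octet." repetitions plus a final octet, compiled once.
    private static final Pattern IPV4 = Pattern.compile("(" + OCTET + "\\.){3}" + OCTET);

    public static List<String> findAddresses(String content) {
        List<String> found = new ArrayList<>();
        Matcher matcher = IPV4.matcher(content);  // step 4: create the matcher
        while (matcher.find()) {                  // step 5: scan for the next match
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Step 1: the input to search.
        String content = "192.234.65.12, then 12.4.5.6, and also 189.256.23.9";
        System.out.println(findAddresses(content)); // [192.234.65.12, 12.4.5.6]
    }
}
```

The third address is skipped because 256 is not a valid octet and no shorter run of digits around it completes four dot-separated octets.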
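Source code analysis
    Matcher matcher = pattern.matcher(content);
This first creates the matcher object: Pattern.matcher calls the Matcher constructor.
Matcher keeps the input in a field of type CharSequence, an interface that String implements.
The constructor assigns the input to that CharSequence field, so a reference of the parent (interface) type points to an object of the implementing class.
    matcher.group(0);
In the OpenJDK source, group() goes through Matcher's getSubSequence(), which calls text.subSequence(). That call is polymorphic: because text refers to a String here, it is String's implementation of the method that runs.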
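Because the matcher only depends on the CharSequence interface, any implementation can be passed in, not just String. The sketch below illustrates this with a String and a StringBuilder; `CharSequenceDemo` and `firstNumber` are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSequenceDemo {
    // Accepts any CharSequence, mirroring the parameter type of Pattern.matcher.
    public static String firstNumber(CharSequence input) {
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        CharSequence fromString = "port 8080";                      // String implements CharSequence
        CharSequence fromBuilder = new StringBuilder("port 9090");  // so does StringBuilder
        System.out.println(firstNumber(fromString));   // 8080
        System.out.println(firstNumber(fromBuilder));  // 9090
    }
}
```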
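Analysis of Matcher's find and group methods
- What happens during matching (matcher.find())
  - Locate the next substring that satisfies the pattern.
  - Once it is found, record its position in the matcher's int[] groups field: groups[0] holds the index where the substring starts, and groups[1] holds the index of its last character + 1.
  - Also record that same end position (index of the last character + 1) in the oldLast field; the next call to find() resumes from that position.
- What matcher.group() does
  - Using the positions recorded in groups[0] and groups[1], cut the substring out of content and return it. The character at index groups[1] itself is not included.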
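The groups and oldLast fields are internal to Matcher, but the public start() and end() methods report the same boundaries for the current match, which makes the bookkeeping described above observable. A small sketch, with `FindGroupDemo` and `trace` as illustrative names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindGroupDemo {
    // Hypothetical helper: returns one "start-end:text" entry per match.
    public static List<String> trace(String regex, String content) {
        List<String> steps = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(content);
        while (matcher.find()) {
            int start = matcher.start();  // index of the first matched character
            int end = matcher.end();      // index just past the last matched character
            // Cutting content at [start, end) yields the same text as group().
            String cut = content.substring(start, end);
            if (!cut.equals(matcher.group())) {
                throw new IllegalStateException("substring and group() disagree");
            }
            steps.add(start + "-" + end + ":" + cut);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(trace("\\d{4}", "1932 and 2021")); // [0-4:1932, 9-13:2021]
    }
}
```

The end index is exclusive in both matches, which is why the second search can start at index 4 without re-reading any matched character.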
Common uses of regular expressions in Java (IPv4 address matching)
package org.example.test01;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class test03 {
    public static void main(String[] args) {
        String content = "year 1932, 1024 through year 2021, 1025, etc.";
        // String regStr = "\\d{4}";
        // Two capture groups, each matching two digits
        String regStr = "(\\d\\d)(\\d\\d)";
        // Create the pattern object, i.e. the compiled regular expression
        Pattern pattern = Pattern.compile(regStr);
        // Create the matcher, which applies the pattern to the input
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println(matcher.group(0)); // the whole match, e.g. 1932
            System.out.println(matcher.group(1)); // first group, e.g. 19
            System.out.println(matcher.group(2)); // second group, e.g. 32
        }

        String content1 = "192.234.65.12 port and 12.4.5.6 and also 189.256.23.9 etc.";
        String regStr1 = "((25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]?\\d)\\.){3}(25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]?\\d)";
        // Compile the IPv4 pattern
        Pattern pattern1 = Pattern.compile(regStr1);
        // Create a matcher for the IPv4 pattern over content1
        Matcher matcher1 = pattern1.matcher(content1);
        while (matcher1.find()) {
            System.out.println(matcher1.group(0));
        }
    }
}
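One limitation of the IPv4 pattern as written is that nothing stops a match from starting or ending in the middle of a longer run of digits, so find() can pull a "valid" address out of an invalid one. The sketch below compares the original pattern with a variant that adds a negative lookbehind `(?<!\d)` and a negative lookahead `(?!\d)`. The lookarounds are an addition for illustration, not part of the original pattern, and `StrictIpv4Finder`, `LOOSE`, `STRICT`, and `findAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictIpv4Finder {
    private static final String OCTET = "(25[0-5]|2[0-4]\\d|1\\d{2}|[1-9]?\\d)";
    private static final String CORE = "(" + OCTET + "\\.){3}" + OCTET;

    // The pattern from the text, unchanged.
    public static final Pattern LOOSE = Pattern.compile(CORE);
    // Assumption for this sketch: reject matches that touch another digit on either side.
    public static final Pattern STRICT = Pattern.compile("(?<!\\d)" + CORE + "(?!\\d)");

    public static List<String> findAll(Pattern pattern, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "256.1.1.1 and 10.0.0.1";
        System.out.println(findAll(LOOSE, input));   // [56.1.1.1, 10.0.0.1]
        System.out.println(findAll(STRICT, input));  // [10.0.0.1]
    }
}
```

Whether the stricter behavior is wanted depends on the input; for text where addresses are always delimited by non-digits, the two patterns return the same results.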