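Regular expressions in Java serve two main purposes: checking whether a string matches a pattern, and extracting parts of a string with a pattern. Both use the Pattern and Matcher classes from the java.util.regex package.
Checking whether a string matches a pattern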
String str = "33as";
String pattern = "\\d{2}.*$";
System.out.println(Pattern.matches(pattern, str)); // true
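Pattern.matches only succeeds when the entire input matches the regex. That is why the pattern above needs .* to accept the trailing "as", and why the $ is redundant there. The following minimal sketch contrasts whole-string matching with Matcher.find, which accepts any matching substring. The class and method names (MatchVsFind, wholeMatch, containsMatch) are hypothetical helpers for illustration.

```java
import java.util.regex.Pattern;

class MatchVsFind {
    // Pattern.matches succeeds only if the WHOLE input matches the regex
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Matcher.find succeeds if ANY substring of the input matches
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "\\d{2}";
        System.out.println(wholeMatch(regex, "33"));      // true
        System.out.println(wholeMatch(regex, "33as"));    // false: "as" is left over
        System.out.println(containsMatch(regex, "33as")); // true: "33" is found
        System.out.println("33".matches(regex));          // true, same as Pattern.matches
    }
}
```

String.matches(regex) behaves the same as Pattern.matches. Both recompile the pattern on every call, so compile the Pattern once when it is used repeatedly.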
Extracting parts of a string with a regex
Simple extraction
String str = "a112b234c543d";
String pattern = "\\d+";
Pattern r = Pattern.compile(pattern);
Matcher matcher = r.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group());
}
Output
112
234
543
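If you want the matches as a collection rather than printing them one by one, a sketch assuming Java 9 or later can use Matcher.results(), which streams one MatchResult per find(). The helper name findAll is hypothetical.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CollectMatches {
    // Returns every substring of input that matches regex, in order.
    // Matcher.results() requires Java 9 or later.
    static List<String> findAll(String regex, String input) {
        return Pattern.compile(regex)
                .matcher(input)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "a112b234c543d")); // [112, 234, 543]
    }
}
```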
Capturing groups
This mainly uses the find and group methods.
private static void common3() {
    // extract matches together with their groups
    String str = "a123b456c7890d";
    String pattern = "(\\d)\\d(\\d+)";
    Pattern r = Pattern.compile(pattern);
    Matcher matcher = r.matcher(str);
    while (matcher.find()) {
        System.out.println(matcher.group());
        // numbered groups run from 1 to groupCount()
        for (int i = 1; i <= matcher.groupCount(); i++) {
            System.out.println(matcher.group(i));
        }
    }
}
Output
123
1
3
456
4
6
7890
7
90
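group() returns the text matched by the whole regular expression and is equivalent to group(0). group(int i) returns the text matched by the i-th parenthesized group, where groups are numbered from 1 by counting opening parentheses from the left. In the example above, the first group in (\\d)\\d(\\d+) captures the first digit and is group(1). The second group captures the third digit and every digit after it, and is group(2).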
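To make the numbering concrete, here is a small sketch that collects group(0) through group(groupCount()) for every match of the same pattern. The class and method names are hypothetical. The first entry of each inner list is always the whole match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // For each match, returns [group(0), group(1), ..., group(groupCount())]
    static List<List<String>> groupsOfEachMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<List<String>> result = new ArrayList<>();
        while (m.find()) {
            List<String> groups = new ArrayList<>();
            // group 0 is the whole match; numbered groups start at 1
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
            result.add(groups);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groupsOfEachMatch("(\\d)\\d(\\d+)", "a123b456c7890d"));
        // [[123, 1, 3], [456, 4, 6], [7890, 7, 90]]
    }
}
```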
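Non-capturing group (?:pattern)
Matches pattern but does not capture the matched subexpression, so the group gets no number.
Combined with |, it limits the alternation to the part inside the parentheses, so it can be read as a compact way of writing several alternatives: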
asd(?:g|fg) is equivalent to asdg|asdfg
asd(?:g|fg)(?:gh|h) is equivalent to asdggh|asdgh|asdfggh|asdfgh
private static void v1() {
    String str = "asdfgh";
    // equivalent to asdggh|asdgh|asdfggh|asdfgh
    String pattern = "asd(?:g|fg)(?:gh|h)";
    Pattern r = Pattern.compile(pattern);
    Matcher matcher = r.matcher(str);
    while (matcher.find()) {
        System.out.println(matcher.group());
        System.out.println(matcher.groupCount());
    }
}
Output
asdfgh
0
As you can see, the parenthesized expressions take part in the match, but nothing is stored: groupCount() returns 0.
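For comparison, the following sketch counts the groups in the same expression written with plain capturing parentheses. Both versions match the same text, but only the capturing version exposes group(1) and group(2). NonCapturingDemo and groupCount(String) are hypothetical names. Matcher.groupCount() depends only on the pattern, so an empty input is enough.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Number of capturing groups a regex defines (independent of any input)
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount("asd(g|fg)(gh|h)"));     // 2
        System.out.println(groupCount("asd(?:g|fg)(?:gh|h)")); // 0

        // The capturing version matches the same text and also stores the parts
        Matcher m = Pattern.compile("asd(g|fg)(gh|h)").matcher("asdfgh");
        if (m.find()) {
            System.out.println(m.group() + " " + m.group(1) + " " + m.group(2)); // asdfgh fg h
        }
    }
}
```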
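Positive lookahead (?=pattern)
windows(?=\\d{3}) matches windows only when it is immediately followed by three digits. The lookahead consumes no characters, so the digits are not part of the match and the code below prints windows.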
private static void v3() {
    String str = "windows123";
    String pattern = "windows(?=\\d{3})";
    Pattern r = Pattern.compile(pattern);
    Matcher matcher = r.matcher(str);
    while (matcher.find()) {
        System.out.println(matcher.group());
    }
}
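A lookahead only checks what follows without consuming it, so later parts of the pattern can still match those characters. The following sketch shows this. The helper firstMatch is a hypothetical name, and it returns null when nothing is found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns the first matched text, or null if the regex is not found in input
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String regex = "windows(?=\\d{3})";
        System.out.println(firstMatch(regex, "windows123")); // windows (digits not included)
        System.out.println(firstMatch(regex, "windows12"));  // null: only two digits follow
        // The lookahead consumes nothing, so \d+ can still match the digits afterwards
        System.out.println(firstMatch("windows(?=\\d{3})\\d+", "windows2000")); // windows2000
    }
}
```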
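Negative lookahead (?!pattern)
Here "negative" does not refer to a direction (it is still a lookahead). It means the condition of the positive lookahead is negated.
linux(?!\\d{2}) matches linux only when it is not followed by two digits. find() locates linux in linux1 but finds nothing in linux12 or linux123.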
private static void v4() {
    String str = "linux1";
    String pattern = "linux(?!\\d{2})";
    Pattern r = Pattern.compile(pattern);
    Matcher matcher = r.matcher(str);
    while (matcher.find()) {
        System.out.println(matcher.group());
    }
}
Output
linux
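To see the negation on several inputs at once, the sketch below runs the same pattern with find() over a few strings. The inputs "linux" and "linuxA9" are added here for illustration, and NegativeLookaheadDemo.found is a hypothetical helper.

```java
import java.util.regex.Pattern;

class NegativeLookaheadDemo {
    private static final Pattern LINUX = Pattern.compile("linux(?!\\d{2})");

    // true if "linux" occurs somewhere NOT followed by two digits
    static boolean found(String input) {
        return LINUX.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"linux", "linux1", "linux12", "linux123", "linuxA9"}) {
            System.out.println(s + " -> " + found(s));
        }
        // linux -> true, linux1 -> true, linux12 -> false, linux123 -> false, linuxA9 -> true
    }
}
```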
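Backreferences
\1 refers, inside the regular expression itself, to the text matched by the first group. In a Java string literal it is written "\\1".
For example, to match two identical consecutive characters: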
String content = "aa";
String pattern = "(.)\\1";
boolean isMatch = Pattern.matches(pattern, content);
System.out.println(isMatch);//true
To check whether a string contains any run of repeated characters:
String content = "asdffjhkjkk";
String pattern = ".*(.)\\1+.*";
boolean isMatch = Pattern.matches(pattern, content);
System.out.println(isMatch);//true
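Pattern.matches needs the .* wrappers because it must match the whole string. With find() the bare backreference is enough, and you can also see which runs were found. This is a sketch, and BackreferenceDemo.repeatedRuns is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // Lists every run of a repeated character, e.g. "ff", "kk"
    static List<String> repeatedRuns(String input) {
        Matcher m = Pattern.compile("(.)\\1+").matcher(input);
        List<String> runs = new ArrayList<>();
        while (m.find()) {
            runs.add(m.group()); // whole run; group(1) would be the single repeated char
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(repeatedRuns("asdffjhkjkk")); // [ff, kk]
        System.out.println(repeatedRuns("abcdef"));      // []
    }
}
```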
Practice exercises
Extracting the href of <a> tags
private static void test1() {
    String str = "<a href='localhost:9999'></a>\r\r\n ";
    str += "<a id='a1' href=\"https://www.baidu.com\" class=\"a-cs1\">bbb</a>";
    // expected output:
    // localhost:9999
    // https://www.baidu.com
    str = str.replace("\"", "'"); // normalize double quotes to single quotes
    String pattern = "<a(?:[^>]*)href='([^']*)";
    Pattern r = Pattern.compile(pattern);
    Matcher matcher = r.matcher(str);
    while (matcher.find()) {
        for (int i = 1; i <= matcher.groupCount(); i++) {
            System.out.println(matcher.group(i));
        }
    }
}
Extracting the href of every <a> tag whose class contains a-cs1
private static void test2() {
    String str = "<a href='localhost:9999' class=\"a-cs2\"></a>\r\r\n ";
    str += "<a id='a1' href=\"https://www.baidu.com\" class=\"a-cs1\">bbb</a>";
    str += "<a id='a2' href=\"https://bbs.csdn.net/forums/Java\" class=\"a-cs1; rrwer\">c</a>";
    // expected output:
    // https://www.baidu.com
    // https://bbs.csdn.net/forums/Java
    str = str.replace("\"", "'");
    // Match the whole opening <a ...> tag whose class value contains the word a-cs1.
    // Note: a character class such as [^a-cs1] excludes the single characters a to c, s and 1;
    // it does not exclude the string "a-cs1", so \b word boundaries are used instead.
    String pattern = "<a[^>]*class='[^']*\\ba-cs1\\b[^']*'[^>]*>";
    Pattern r = Pattern.compile(pattern);
    Pattern r2 = Pattern.compile("href='(.*?)'"); // compiled once, reused for every tag
    Matcher matcher = r.matcher(str);
    while (matcher.find()) {
        Matcher matcher2 = r2.matcher(matcher.group());
        while (matcher2.find()) {
            System.out.println(matcher2.group(1));
        }
    }
}
Snake case (underscores) to camel case
private static void test3() {
    String str = "aaa_bbb_c_ddd"; // -> aaaBbbCDdd
    String pattern = "_[a-z]";
    Pattern r = Pattern.compile(pattern);
    Matcher matcher = r.matcher(str);
    StringBuffer sb = new StringBuffer();
    while (matcher.find()) {
        matcher.appendReplacement(sb, matcher.group().toUpperCase().replace("_", ""));
    }
    matcher.appendTail(sb);
    System.out.println(sb);
}
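On Java 9 and later, Matcher.replaceAll also accepts a Function<MatchResult, String>, which replaces the explicit appendReplacement/appendTail loop. In this sketch, a capturing group holds just the letter, so the underscore does not need to be stripped afterwards. SnakeToCamel.toCamel is a hypothetical name. The returned string is still interpreted as a replacement string, so results containing $ or \ would need Matcher.quoteReplacement.

```java
import java.util.regex.Pattern;

class SnakeToCamel {
    private static final Pattern UNDERSCORE_LETTER = Pattern.compile("_([a-z])");

    // Matcher.replaceAll(Function<MatchResult, String>) requires Java 9+
    static String toCamel(String snake) {
        return UNDERSCORE_LETTER.matcher(snake)
                .replaceAll(mr -> mr.group(1).toUpperCase());
    }

    public static void main(String[] args) {
        System.out.println(toCamel("aaa_bbb_c_ddd")); // aaaBbbCDdd
    }
}
```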
Camel case to snake case
private static void test4() {
    String str = "aaaBbbCDdd"; // -> aaa_bbb_c_ddd
    String pattern = "[A-Z]";
    Pattern r = Pattern.compile(pattern);
    Matcher matcher = r.matcher(str);
    StringBuffer sb = new StringBuffer();
    while (matcher.find()) {
        matcher.appendReplacement(sb, "_" + matcher.group().toLowerCase());
    }
    matcher.appendTail(sb);
    System.out.println(sb);
}
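When the replacement does not depend on custom logic, String.replaceAll with a group reference does the same job in one line. The sketch below captures each capital letter, re-inserts it after an underscore via $1, and then lower-cases the whole string. CamelToSnake.toSnake is a hypothetical name. Locale.ROOT avoids locale-specific case rules such as the Turkish dotless i. Like the loop above, an input that starts with a capital letter gets a leading underscore.

```java
import java.util.Locale;

class CamelToSnake {
    // "$1" re-inserts the captured capital letter after a new underscore;
    // the whole result is then lower-cased.
    static String toSnake(String camel) {
        return camel.replaceAll("([A-Z])", "_$1").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toSnake("aaaBbbCDdd")); // aaa_bbb_c_ddd
    }
}
```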
Formatting a number in accounting style (a comma every three digits)
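private static void test6() {
    String str = "23456789.12545"; // -> 23,456,789.13
    // an integer input such as 23456789 gives 23,456,789.00
    // round to 2 decimals; RoundingMode (import java.math.RoundingMode) replaces
    // the deprecated BigDecimal.ROUND_HALF_UP constant
    str = new BigDecimal(str).setScale(2, RoundingMode.HALF_UP).toString();
    // insert a comma after every digit followed by a multiple of 3 digits and the decimal point
    str = str.replaceAll("(\\d)(?=(\\d{3})+\\.)", "$1,");
    System.out.println(str);
}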
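The lookahead (?=(\d{3})+\.) only succeeds when a multiple of three digits sits between the current digit and the decimal point, and setScale(2) guarantees that a decimal point is present. Because each match consumes a single digit, every position is checked. The following sketch wraps the same steps in a hypothetical ThousandsSeparator.format helper and runs it on a few inputs. The extra inputs are illustrative, and toPlainString() is used to rule out scientific notation.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class ThousandsSeparator {
    // Round to 2 decimals, then insert a comma after every digit that is
    // followed by a multiple of 3 digits before the decimal point.
    static String format(String number) {
        String plain = new BigDecimal(number).setScale(2, RoundingMode.HALF_UP).toPlainString();
        return plain.replaceAll("(\\d)(?=(\\d{3})+\\.)", "$1,");
    }

    public static void main(String[] args) {
        System.out.println(format("23456789.12545")); // 23,456,789.13
        System.out.println(format("23456789"));       // 23,456,789.00
        System.out.println(format("999"));            // 999.00
        System.out.println(format("-1234567.5"));     // -1,234,567.50
    }
}
```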
Extracting query parameters from a URL
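private static void test7() {
    String str = "http://aaa.bbb.com?aa=12&b1=b2b&ccc=123";
    // expected output:
    // aa = 12
    // b1 = b2b
    // ccc = 123
    // [?&] matches one '?' or '&'; inside a character class '|' would be a literal
    // character, so it is left out, and '?' and '&' need no escaping
    String pattern = "[?&]([^=&]+)=([^&]+)";
    Pattern r = Pattern.compile(pattern);
    Matcher matcher = r.matcher(str);
    while (matcher.find()) {
        System.out.println(matcher.group(1) + " = " + matcher.group(2));
    }
}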
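A variant of the same idea collects the parameters into a map and stops values at a # fragment. It also allows empty values such as q=. This is a sketch under those assumptions, and QueryParams.parse is a hypothetical name. The values are returned exactly as they appear in the URL. Percent-encoded text would still need decoding, for example with java.net.URLDecoder.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueryParams {
    // [?&] starts a parameter; the name stops at '=' and the value stops at '&' or '#'
    private static final Pattern PARAM = Pattern.compile("[?&]([^=&#]+)=([^&#]*)");

    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order
        Matcher m = PARAM.matcher(url);
        while (m.find()) {
            params.put(m.group(1), m.group(2));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("http://aaa.bbb.com?aa=12&b1=b2b&ccc=123"));
        // {aa=12, b1=b2b, ccc=123}
        System.out.println(parse("http://x.com/p?q=&lang=en#top"));
        // {q=, lang=en}
    }
}
```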
Collapsing runs of repeated characters
String str = "aaa...123321222ggg";
String regex = "(.)\\1+";
Matcher matcher = Pattern.compile(regex).matcher(str);
String res = matcher.replaceAll("$1");
System.out.println(res);//a.123212g
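A quick explanation: $1 refers to the text matched by the first group. Matcher.replaceAll replaces every match of the expression "(.)\\1+" (a character followed by one or more copies of itself) with that group's value, which leaves a single copy of the character.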
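The same "(.)\\1+" pattern can also drive a computed replacement. The following sketch does a simple run-length encoding, turning each run into the character followed by its length, with the appendReplacement loop used in the camel-case exercises. RunLength.encode is a hypothetical name. Matcher.quoteReplacement matters here: without it, an input like "$$ok" would produce the replacement "$2", which appendReplacement would read as a reference to a nonexistent group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunLength {
    private static final Pattern RUN = Pattern.compile("(.)\\1+");

    // Replaces each run of 2+ identical characters with the character and its count
    static String encode(String input) {
        Matcher m = RUN.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1) + m.group().length();
            // quoteReplacement stops '$' or '\' in the text from being read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("aaabccdddd")); // a3bc2d4
        System.out.println(encode("$$ok"));       // $2ok
    }
}
```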