Java Regular Expressions in Practice - CSDN Blog

Article link: https://blog.csdn.net/wyyrockking/article/details/84032912

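1. Java regex example

In Java, the text matched by the whole regular expression on each match is stored in "group 0", and the text matched by each capturing group is stored in groups 1, 2, 3, ... numbered by the order of their opening "(". All of them are read through the public String group(int group) method of java.util.regex.Matcher.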

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaPattern {
	public static void main(String[] args) {
		String input = "<p>hello</p>";
		String regex = "<p>(.*)</p>";
		println(input, regex);
	}

	/**
	 * Prints information about every match of the regex in the input.
	 * 
	 * @param input
	 *            the string to search
	 * @param regex
	 *            the regular expression
	 */
	public static void println(String input, String regex) {
		Pattern p = Pattern.compile(regex);
		Matcher m = p.matcher(input);
		StringBuilder sb = new StringBuilder();
		while (m.find()) {
			sb.append("groupCount:")
				.append(m.groupCount())
				.append("\t");
			for (int i = 0; i <= m.groupCount(); i++) {
				sb.append(m.group(i))
				.append("\t");
			}
			sb.append("\r\n");
		}
		System.out.println(sb.toString());
		// matches() resets the matcher and tests whether the ENTIRE input matches
		System.out.println(m.matches());
	}
}

Program output:

groupCount:1	<p>hello</p>	hello	

true

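Regular expression:

Metacharacter	Description
.	Matches any single character except a line terminator ("\n", "\r", and a few others such as \u2028), unless DOTALL mode is enabled.
*	Matches the preceding subexpression zero or more times.
( )	Defines the expression between ( and ) as a "group" and saves the text matched by that group. In a replacement string the saved text is referenced as $1, $2, and so on; Java does not limit this to 9 groups. (See the details below.)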
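As a minimal sketch of how groups are numbered, the example below nests one group inside another and then uses a pattern with ten groups. The class name GroupNumberingDemo, the helper groups, and the sample inputs are assumptions made for illustration; they are not part of the original article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns group 0 followed by every capturing group of the first match,
    // or an empty list when nothing matches.
    static List<String> groups(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Opening parentheses, left to right:
        // 1 = year-month, 2 = year, 3 = month, 4 = day
        System.out.println(groups("2018-11-13", "((\\d{4})-(\\d{2}))-(\\d{2})"));
        // prints [2018-11-13, 2018-11, 2018, 11, 13]

        // Ten groups: because group 10 exists, "$10" refers to it
        System.out.println("abcdefghij".replaceAll(
                "(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)", "$10"));
        // prints j
    }
}
```

The nested case shows that numbering follows the position of each opening parenthesis, not the nesting depth or the order in which groups finish matching.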

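2. Non-greedy matching

Metacharacter	Description
?	When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), matching becomes "non-greedy". Non-greedy matching takes the shortest string that still lets the overall match succeed, while the default greedy matching takes the longest. See the examples below.

Change 1: default greedy matching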

		String input = "<p>hello</p><p>world</p>";
		String regex = "<p>(.*)</p>";

Output 1:

groupCount:1	<p>hello</p><p>world</p>	hello</p><p>world	

true

Change 2: non-greedy matching

		String input = "<p>hello</p><p>world</p>";
		String regex = "<p>(.*?)</p>";

Output 2:

groupCount:1	<p>hello</p>	hello	
groupCount:1	<p>world</p>	world	

true

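In change 1, .* matches any single character except a line terminator, any number of times. Matching is greedy by default, meaning it consumes as much as possible, so the first match runs all the way to the final </p>. In change 2, the ? after .* switches to non-greedy mode, which consumes as little as possible, so find() returns each <p>...</p> pair separately.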
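One detail in the non-greedy output is worth a sketch: find() returns the short matches, yet matches() still prints true. That is because matches() requires the whole input to match, which forces .*? to expand across both paragraphs. The class name LazyFindVsMatches and its two helper methods are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyFindVsMatches {
    // Group 1 of the first match found by scanning with find().
    static String firstFind(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Group 1 when the whole input must match, as matches() requires.
    static String wholeMatch(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<p>hello</p><p>world</p>";
        String regex = "<p>(.*?)</p>";
        System.out.println(firstFind(input, regex));  // prints hello
        System.out.println(wholeMatch(input, regex)); // prints hello</p><p>world
    }
}
```

Non-greedy therefore means "as short as possible while the overall match still succeeds", not "always the shortest substring".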

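Non-capturing parentheses

Symbol	Description
(?:pattern)	Non-capturing match. The only difference from (pattern) is that the text matched by the group is not saved.

Advantages: 1. It avoids unnecessary capturing, which can make matching more efficient. 2. Choosing the right kind of parentheses makes the intent clear, so readers are not distracted by groups that are never used.

Change 3: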

		String input = "<p>hello</p>";
		String regex = "<p>(?:.*)</p>";

Output 3:

groupCount:0	<p>hello</p>	

true

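Compared with the original example, the text hello matched inside the parentheses is no longer captured into group 1 ($1). Nothing else changes.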
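A common practical use of (?:...) is grouping alternatives without disturbing the numbering of the groups you actually need. The sketch below compares a capturing and a non-capturing version of the same pattern; the class name NonCapturingDemo, the helpers, and the file name img_001.png are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Number of capturing groups declared in the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text of group n in the first match, or null when nothing matches.
    static String group(String input, String regex, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        String input = "img_001.png"; // hypothetical file name
        String capturing = "(img|pic)_(\\d+)\\.(png|jpg)";
        String nonCapturing = "(?:img|pic)_(\\d+)\\.(?:png|jpg)";

        System.out.println(groupCount(capturing));         // prints 3
        System.out.println(group(input, capturing, 2));    // prints 001
        System.out.println(groupCount(nonCapturing));      // prints 1
        System.out.println(group(input, nonCapturing, 1)); // prints 001
    }
}
```

With the non-capturing version the number is always group 1, even if more alternatives are added later.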

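Key topic: zero-width assertions

This part is easy to mix up and the terminology is easy to forget, so I will go straight to usage.

Symbol	Description
(?=pattern)	Matches a position where the text that follows matches pattern (positive lookahead)
(?!pattern)	Matches a position where the text that follows does not match pattern (negative lookahead)
(?<=pattern)	Matches a position where the text before it matches pattern (positive lookbehind)
(?<!pattern)	Matches a position where the text before it does not match pattern (negative lookbehind)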

Change 4: match digits immediately followed by aa

		String input = "123aa456bb789";
		String regex = "\\d+(?=aa)";

Output 4:

groupCount:0	123	

false

Change 5: match digits not immediately followed by aa

		String input = "123aa456bb789";
		String regex = "\\d+(?!aa)";

Output 5 (note: \d+ is greedy, but it backtracks from 123 to 12, because 12 is followed by 3a rather than aa):

groupCount:0	12	
groupCount:0	456	
groupCount:0	789	

false
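If the partial match 12 in the output above is not wanted, one option is to stop \d+ from giving digits back. The sketch below uses Java's possessive quantifier \d++ for that; this is a technique swapped in for illustration and is not used in the original article, and the class name PossessiveLookaheadDemo and the helper findAll are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveLookaheadDemo {
    // Collects group 0 of every match found by repeated find() calls.
    static List<String> findAll(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "123aa456bb789";
        // Greedy: backtracks from 123 to 12 so the lookahead can succeed
        System.out.println(findAll(input, "\\d+(?!aa)"));  // prints [12, 456, 789]
        // Possessive: keeps all digits it consumed, so 123, 23 and 3 all fail
        System.out.println(findAll(input, "\\d++(?!aa)")); // prints [456, 789]
    }
}
```

An equivalent alternative is \d+(?!\d|aa), which rejects any position that is still followed by a digit.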

Change 6: match digits preceded by aa

		String input = "123aa456bb789";
		String regex = "(?<=aa)\\d+";

Output 6:

groupCount:0	456	

false

Change 7: match digits not preceded by aa

		String input = "123aa456bb789";
		String regex = "(?<!aa)\\d+";

Output 7 (the match 56 starts after the 4, where the two preceding characters are a4 rather than aa):

groupCount:0	123	
groupCount:0	56	
groupCount:0	789	

false
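The partial match 56 in the output above appears because a match may start in the middle of a digit run. A minimal sketch of one way to avoid it, assuming only whole digit runs are wanted, is to add a second negative lookbehind that forbids a preceding digit. The class name WholeNumberLookbehindDemo and the helper findAll are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeNumberLookbehindDemo {
    // Collects group 0 of every match found by repeated find() calls.
    static List<String> findAll(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "123aa456bb789";
        // A match may start at 5, because "a4" precedes it, not "aa"
        System.out.println(findAll(input, "(?<!aa)\\d+"));
        // prints [123, 56, 789]

        // (?<!\d) also requires that the match does not start mid-number
        System.out.println(findAll(input, "(?<!aa)(?<!\\d)\\d+"));
        // prints [123, 789]
    }
}
```

Both lookbehinds are zero-width, so stacking them simply tests two conditions at the same position.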

Captured group example

Suppose you have a piece of HTML and need to add italic styling to the text inside <b></b>. How would you do it?

		String input = "我叫<b>张三</b>，性别<b>男</b>，爱好<b>编程</b>。";
		String regex = "(?<=<b>)(.*?)(?=</b>)";
		Pattern pattern = Pattern.compile(regex);
		Matcher matcher = pattern.matcher(input);
		System.out.println(matcher.replaceAll("<i>$1</i>"));

Output:

我叫<b><i>张三</i></b>，性别<b><i>男</i></b>，爱好<b><i>编程</i></b>。

Here $1 is the string matched by (.*?).
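The same result can be reached without lookarounds by matching the tags themselves and writing them back in the replacement. The sketch below also uses a named group, which Java supports with (?<name>...) in the pattern and ${name} in the replacement. The class name ItalicizeBoldDemo, the method italicize, the group name text, and the English sample string are assumptions made for illustration.

```java
class ItalicizeBoldDemo {
    // Wraps the text inside every <b>...</b> pair in <i>...</i>.
    // The tags are consumed by the match, so the replacement restores them.
    static String italicize(String html) {
        return html.replaceAll("<b>(?<text>.*?)</b>", "<b><i>${text}</i></b>");
    }

    public static void main(String[] args) {
        String input = "Name: <b>Zhang San</b>, hobby: <b>coding</b>.";
        System.out.println(italicize(input));
        // prints Name: <b><i>Zhang San</i></b>, hobby: <b><i>coding</i></b>.
    }
}
```

A named reference stays valid if more groups are added to the pattern later, whereas $1 depends on the group's position.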