A Summary of Java Regular Expression Usage

我喺小VIE

Published 2024-07-23 09:25:09

Tags: java, regular expressions, learning methods, backend

Article link: https://blog.csdn.net/xiaovie/article/details/140626194

1. Expression Metacharacters

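\: escape character. It marks the next character as special or, conversely, as a literal; for example, "n" matches the character "n", while "\n" matches a newline.
^: matches the start of the string
$: matches the end of the string
*: matches zero or more times
+: matches one or more times
?: matches once or not at all (zero or one time)
{n}: matches exactly n times
{n,}: matches n or more times
{n,m}: matches at least n and at most m times (n <= m; n and m are non-negative integers)
.: matches any single character (by default, line terminators are excluded)
?: placed directly after a quantifier such as * or +, makes it non-greedy (reluctant), so that it matches as few characters as possible
x|y: matches x or y; alternatives are tried from left to right
[xyz]: matches any one character in the set
[^xyz]: matches any one character not in the set
[a-z]: matches any one character in the range
[^a-z]: matches any one character not in the range
\d: matches a digit; equivalent to [0-9]
\D: matches a non-digit character
\f: matches a form feed
\n: matches a newline
\r: matches a carriage return
\t: matches a tab
\s: matches any whitespace character, including space, tab, form feed, carriage return, and newline
\S: matches any non-whitespace character
\W-style negations aside, \w: matches any word character, including the underscore; equivalent to [a-zA-Z0-9_]
\W: matches any non-word character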
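
A few of the metacharacters above can be tried out with a short program. This is only a minimal sketch: the class name MetacharDemo and the helper firstMatch are hypothetical names chosen for illustration, and the sample inputs are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetacharDemo {
    // Hypothetical helper: returns the first substring of input matched by regex,
    // or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: .* takes as many characters as possible.
        System.out.println(firstMatch("<.*>", "<a><b>"));        // <a><b>
        // Non-greedy: .*? takes as few characters as possible.
        System.out.println(firstMatch("<.*?>", "<a><b>"));       // <a>
        // {n,m} bounds the number of repetitions.
        System.out.println(firstMatch("\\d{2,3}", "12345"));     // 123
        // ^ and $ anchor the pattern to the start and end of the string.
        System.out.println("abc_1".matches("^\\w+$"));           // true
        // A negated character class matches anything not listed.
        System.out.println(firstMatch("[^a-z]+", "abc123def"));  // 123
    }
}
```

The greedy and non-greedy lines differ only by the trailing ?, which is usually the easiest way to see what that modifier does.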

2. Representing Special Characters

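Parentheses "(" and ")" have a special meaning in regular expressions: they delimit capturing groups. To match a literal "(" or ")", escape them as "\(" or "\)".
The hyphen "-" denotes a range inside a character class; to match a literal "-" there, write "\-".
In a regular expression the backslash "\" is the escape character. The Java compiler, however, also treats a backslash in source code as the start of a Unicode escape or another character escape (for example, "\n" in a string literal becomes a newline), so a regex backslash must be written as two backslashes in a string literal to reach the regex engine intact.
A literal backslash "\" is matched by the regex \\, which is written as "\\\\" in a Java string literal.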

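The escaping rules above are easy to get wrong, so here is a small sketch that exercises each of them. The class name EscapeDemo and the helper fullMatch are hypothetical; Pattern.quote is a standard method that escapes an entire literal and is shown as an alternative to escaping by hand.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The regex \(\d+\) is written "\\(\\d+\\)" in Java source.
        System.out.println(fullMatch("\\(\\d+\\)", "(42)"));      // true
        // An escaped hyphen inside a character class is a literal "-".
        System.out.println(fullMatch("[a\\-z]+", "a-z-a"));       // true
        System.out.println(fullMatch("[a\\-z]+", "b"));           // false
        // Four backslashes in source give the regex \\, which matches one backslash.
        System.out.println(fullMatch("C:\\\\temp", "C:\\temp"));  // true
        // Pattern.quote turns an arbitrary string into a literal pattern.
        System.out.println(fullMatch(Pattern.quote("1+1=(2)"), "1+1=(2)")); // true
    }
}
```
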
3. Pattern and Matcher

java.util.regex.Pattern

Pattern represents a compiled regular expression, in other words a matching pattern.
An instance is created with Pattern.compile(String regex):

Pattern pattern = Pattern.compile("\\d+");
pattern.pattern();    // returns \d+

Splitting a string with a regular expression:

String[] Pattern.split(CharSequence input);
Pattern pattern = Pattern.compile("\\d+");
String[] array = pattern.split("123A456B789C");
// The input starts with a match, so the result has a leading empty string:
// array[0] = "", array[1] = "A", array[2] = "B", array[3] = "C"
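
The behaviour of split at the edges of the input is worth checking in a runnable form. The following is a sketch with a hypothetical class SplitDemo and helper splitOnDigits: a match at the very start of the input produces a leading empty string, while trailing empty strings are dropped.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // Hypothetical helper: splits the input around runs of digits.
    static String[] splitOnDigits(String input) {
        return Pattern.compile("\\d+").split(input);
    }

    public static void main(String[] args) {
        // The input begins with a delimiter, so element 0 is the empty string.
        System.out.println(Arrays.toString(splitOnDigits("123A456B789C"))); // [, A, B, C]
        // The input ends with a delimiter; the trailing empty string is removed.
        System.out.println(Arrays.toString(splitOnDigits("A123B456C789"))); // [A, B, C]
    }
}
```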

Matching a string against a Pattern:

Matcher Pattern.matcher(CharSequence input);
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("123");

java.util.regex.Matcher

Matcher's three matching methods

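boolean Matcher.matches(); // the entire input must match
boolean Matcher.lookingAt(); // the match must start at the beginning of the input
boolean Matcher.find(); // finds the next matching substring anywhere in the input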

Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("A1B2C3");
matcher.find();    // returns true
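
To make the difference between the three methods concrete, the sketch below runs all of them against the same inputs. The class MatchMethodsDemo and its helper names are hypothetical; each helper creates a fresh Matcher so that the calls do not affect one another.

```java
import java.util.regex.Pattern;

class MatchMethodsDemo {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // The whole input must consist of digits.
    static boolean matchesAll(String s) { return DIGITS.matcher(s).matches(); }

    // The input must begin with digits; anything may follow.
    static boolean startsWithMatch(String s) { return DIGITS.matcher(s).lookingAt(); }

    // Digits may appear anywhere in the input.
    static boolean containsMatch(String s) { return DIGITS.matcher(s).find(); }

    public static void main(String[] args) {
        for (String s : new String[] {"123", "123A", "A123"}) {
            System.out.println(s + ": matches=" + matchesAll(s)
                    + " lookingAt=" + startsWithMatch(s)
                    + " find=" + containsMatch(s));
        }
        // 123: matches=true lookingAt=true find=true
        // 123A: matches=false lookingAt=true find=true
        // A123: matches=false lookingAt=false find=true
    }
}
```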

Matcher's three methods for inspecting a match

int Matcher.start(); // index in the input of the first character of the match
int Matcher.end(); // index in the input just past the last character of the match
String Matcher.group(); // the matched substring

Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("A123B");
matcher.find();
matcher.start();    // returns 1
matcher.group();    // returns "123"
matcher.end();    // returns 4 (one past the index of the last matched character)
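
Because end() is exclusive, start() and end() can be passed straight to String.substring. The sketch below walks through every match with find() and records its text and position; the class FindAllDemo and the helper describeMatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Hypothetical helper: lists each match as text@[start,end).
    static List<String> describeMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {  // each call advances to the next match
            out.add(m.group() + "@[" + m.start() + "," + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeMatches("\\d+", "A123B45")); // [123@[1,4), 45@[5,7)]
        // start() and end() line up with substring's arguments.
        System.out.println("A123B45".substring(1, 4));          // 123
    }
}
```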

4. Capturing Groups

Parentheses () in a pattern divide the matched text into groups, one group per pair of parentheses.

Pattern pattern = Pattern.compile("(\\d+A)(\\d+B)(\\d+C)");
Matcher matcher = pattern.matcher("123A456B789C000");
while(matcher.find()) {
  matcher.group();    // the whole match: "123A456B789C" (the loop body runs once)
}

/*
matcher.find();
matcher.groupCount();    // 3
matcher.group(1);    // "123A"
matcher.group(2);    // "456B"
matcher.group(3);    // "789C"
*/

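The contents of each pair of parentheses form a sub-expression of the regular expression.
In the example above, the matched part of "123A456B789C000" is divided into the groups (123A)(456B)(789C); the trailing "000" is not part of the match.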
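
The grouping can be checked with a small runnable sketch. The class GroupDemo and the helper groups are hypothetical; the helper returns group 0 (the whole match) followed by each numbered group.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: returns group 0 (the whole match) followed by
    // groups 1..groupCount(), or an empty array if nothing matches.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("(\\d+A)(\\d+B)(\\d+C)", "123A456B789C000");
        System.out.println(Arrays.toString(g)); // [123A456B789C, 123A, 456B, 789C]
    }
}
```

Note that groupCount() does not count group 0, which is why the array has groupCount() + 1 elements.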

Syntax of named capturing groups
To name a capturing group, begin the sub-expression with "?<xxx>", where xxx is the group name.

Pattern pattern = Pattern.compile("(?<a>\\d+A)(?<b>\\d+B)(?<c>\\d+C)");
Matcher matcher = pattern.matcher("123A456B789C000");
matcher.find();
matcher.group("a");    // returns "123A"
matcher.group("b");    // returns "456B"
matcher.group("c");    // returns "789C"
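
Named groups tend to pay off when a pattern has several parts whose order may change. As a sketch, assuming a made-up date format and the hypothetical names NamedGroupDemo, yearOf and toDayFirst, the example below reads a group by name and also refers to groups by name in a replacement string using the ${name} syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Illustrative pattern for dates such as 2024-07-23.
    static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Returns the year of the first date found in the input, or null if there is none.
    static String yearOf(String input) {
        Matcher m = DATE.matcher(input);
        return m.find() ? m.group("year") : null;
    }

    // Rewrites every date in the input as day/month/year.
    static String toDayFirst(String input) {
        return DATE.matcher(input).replaceAll("${day}/${month}/${year}");
    }

    public static void main(String[] args) {
        System.out.println(yearOf("Published 2024-07-23"));  // 2024
        System.out.println(toDayFirst("2024-07-23"));        // 23/07/2024
    }
}
```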