Learning Java Regular Expressions (Part 3)

Latest recommended article published on 2021-11-04 23:30:27

AlanLiu1988

Article link: https://blog.csdn.net/AlanLiu1988/article/details/84225261

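8. Capturing Groups

A capturing group treats multiple characters as a single unit. You create one by placing the characters to be grouped inside a pair of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g". The portion of the input string that matches a capturing group is saved in memory and can be recalled later through a backreference.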

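8.1 Numbering

As described in the Pattern API documentation, capturing groups are numbered by counting their opening parentheses from left to right. For example, the expression ((A)(B(C))) contains the following four groups: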

1.((A)(B(C)))

2.(A)

3.(B(C))

4.(C)

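To find out how many groups an expression contains, call the groupCount method on a Matcher object. It returns an int giving the number of capturing groups in the matcher's pattern. For the expression above, groupCount returns 4.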

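There is also a special group, group 0, which always represents the entire expression. It is not included in the total reported by groupCount. Groups beginning with (?, such as (?:X), are pure non-capturing groups: they capture no text and do not count toward the group total. The one exception is the named capturing group (?<name>X), available since Java 7, which does capture and is counted.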
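As a minimal sketch of the counting rules above, the following program asks a Matcher how many capturing groups its pattern has. The class name GroupCountDemo and the helper countGroups are made up for this illustration; only Pattern, Matcher and groupCount come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Hypothetical helper: returns the number of capturing groups in the regex.
    // groupCount() depends only on the pattern, so no match attempt is needed.
    static int countGroups(String regex) {
        Matcher matcher = Pattern.compile(regex).matcher("");
        return matcher.groupCount();
    }

    public static void main(String[] args) {
        System.out.println(countGroups("((A)(B(C)))")); // 4
        System.out.println(countGroups("(?:A)(B)"));    // 1: (?:A) is non-capturing
        System.out.println(countGroups("dog"));         // 0: group 0 is not counted
    }
}
```

Note that groupCount describes the pattern, not a particular match, which is why an empty input string is enough here.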

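Several Matcher methods accept an int group number as a parameter, so it is important to understand how groups are numbered:

public int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation.

public int end(int group): Returns the index of the last character, plus one, of the subsequence captured by the given group during the previous match operation.

public String group(int group): Returns the input subsequence captured by the given group during the previous match operation.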
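To see the three methods working together, here is a small sketch that walks over every group of the first match and reports its text and index range. The class GroupIndexDemo and the method describe are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Hypothetical helper: lists each group of the first match as
    // "number: text [start,end)", one group per line.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int g = 0; g <= m.groupCount(); g++) {
            sb.append(g).append(": ").append(m.group(g))
              .append(" [").append(m.start(g)).append(',')
              .append(m.end(g)).append(")\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Group 0 is the whole match; groups 1 to 4 follow the
        // left-to-right order of their opening parentheses.
        System.out.print(describe("((A)(B(C)))", "ABC"));
    }
}
```

For the input "ABC", groups 0 and 1 both cover the range [0,3), group 2 is "A", group 3 is "BC", and group 4 is "C".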

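8.2 Backreferences

The portion of the input string that matches a capturing group is saved in memory and can be recalled later through a backreference. In a regular expression, a backreference is written as a backslash (\) followed by the number of the group to recall. For example, the expression (\d\d) defines a capturing group that matches two digits in a row, and the backreference \1 later in the expression recalls whatever that group matched.

To match any two digits followed by exactly the same two digits, use (\d\d)\1 as the regular expression: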

Enter your regex: (\d\d)\1
Enter input string to search: 1212
I found the text "1212" starting at index 0 and ending at index 4.

If you change the last two digits, the match fails:

Enter your regex: (\d\d)\1
Enter input string to search: 1234
No match found.

Backreferences work in exactly the same way for nested capturing groups: write a backslash followed by the number of the group to recall.
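The same checks can be run directly from Java code. This is only a sketch: the class BackreferenceDemo and the helper found are invented for the example, and the nested pattern ((a)b)\2\1 is an illustration that does not come from the original tutorial. Remember that each backslash must be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class BackreferenceDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \1 must repeat exactly what group 1 captured.
        System.out.println(found("(\\d\\d)\\1", "1212")); // true
        System.out.println(found("(\\d\\d)\\1", "1234")); // false

        // Nested groups: group 1 is "ab", group 2 is "a",
        // so the pattern requires "ab" + "a" + "ab".
        System.out.println(found("((a)b)\\2\\1", "abaab")); // true
        System.out.println(found("((a)b)\\2\\1", "ababa")); // false
    }
}
```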

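9. Boundary Matchers

You can make pattern matches more precise by specifying where in the input they must occur, using boundary matchers. For example, you may be interested in a particular word only when it appears at the beginning or end of a line. Or you may want to know whether a match occurs on a word boundary, or at the end of the previous match.

The following table lists all of the boundary matchers with a description of each.

Boundary Matchers
`^`	The beginning of a line
`$`	The end of a line
`\b`	A word boundary
`\B`	A non-word boundary
`\A`	The beginning of the input
`\G`	The end of the previous match
`\Z`	The end of the input but for the final terminator, if any
`\z`	The end of the input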
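The difference between \Z and \z, and between \A and ^, is easiest to see with input that contains line terminators. The following is a rough sketch with invented names (InputBoundaryDemo, found); it relies only on Pattern.compile with an optional Pattern.MULTILINE flag.

```java
import java.util.regex.Pattern;

class InputBoundaryDemo {
    // Hypothetical helper: true if the regex, compiled with the given
    // flags, matches anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // \Z tolerates one final line terminator; \z does not.
        System.out.println(found("dog\\Z", 0, "dog\n")); // true
        System.out.println(found("dog\\z", 0, "dog\n")); // false
        System.out.println(found("dog\\z", 0, "dog"));   // true

        // In MULTILINE mode ^ matches at the start of every line,
        // while \A still matches only at the start of the input.
        System.out.println(found("^dog", Pattern.MULTILINE, "cat\ndog"));   // true
        System.out.println(found("\\Adog", Pattern.MULTILINE, "cat\ndog")); // false
    }
}
```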

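The following examples demonstrate the boundary matchers ^ and $. As noted in the table above, ^ matches the beginning of a line and $ matches the end of a line.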

Enter your regex: ^dog$
Enter input string to search: dog
I found the text "dog" starting at index 0 and ending at index 3.

Enter your regex: ^dog$
Enter input string to search:      dog
No match found.

Enter your regex: \s*dog$
Enter input string to search:                           dog
I found the text "                          dog" starting at index 0 and ending at index 29.

Enter your regex: ^dog\w*
Enter input string to search: dogblahblah
I found the text "dogblahblah" starting at index 0 and ending at index 11.

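The first example succeeds because the pattern occupies the entire input string. The second example fails because the input string contains extra whitespace at the beginning. The third example specifies an expression that allows any amount of whitespace, followed by "dog" at the end of the line. The fourth example requires "dog" at the beginning of the line, followed by any number of word characters.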
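The four examples can be reproduced without the interactive test harness. Below is a minimal sketch, assuming a made-up class AnchorDemo with a helper firstMatch that reports the first match as "text@start-end". It also shows one point the harness output does not: by default Java applies ^ and $ to the whole input, and they only match at individual line breaks when Pattern.MULTILINE is set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: returns "text@start-end" for the first match,
    // or null when the regex does not match the input at all.
    static String firstMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() + "@" + m.start() + "-" + m.end() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("^dog$", 0, "dog"));            // dog@0-3
        System.out.println(firstMatch("^dog$", 0, "   dog"));         // null
        System.out.println(firstMatch("\\s*dog$", 0, "   dog"));      // "   dog@0-6"
        System.out.println(firstMatch("^dog\\w*", 0, "dogblahblah")); // dogblahblah@0-11

        // Without MULTILINE, ^ and $ anchor to the whole input.
        String lines = "cat\ndog\nbird";
        System.out.println(firstMatch("^dog$", 0, lines));                 // null
        System.out.println(firstMatch("^dog$", Pattern.MULTILINE, lines)); // dog@4-7
    }
}
```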

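To check whether a pattern begins and ends on a word boundary (as opposed to being a substring within a longer string), place \b on either side, for example \bdog\b: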

Enter your regex: \bdog\b
Enter input string to search: The dog plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.

Enter your regex: \bdog\b
Enter input string to search: The doggie plays in the yard.
No match found.

To match the expression on a non-word boundary, use \B instead:

Enter your regex: \bdog\B
Enter input string to search: The dog plays in the yard.
No match found.

Enter your regex: \bdog\B
Enter input string to search: The doggie plays in the yard.
I found the text "dog" starting at index 4 and ending at index 7.
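For reference, here is a sketch of the same four word-boundary checks in plain Java. WordBoundaryDemo and firstMatch are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: returns "text@start-end" for the first match,
    // or null when the regex does not match the input at all.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() + "@" + m.start() + "-" + m.end() : null;
    }

    public static void main(String[] args) {
        String dog = "The dog plays in the yard.";
        String doggie = "The doggie plays in the yard.";

        // \bdog\b: "dog" must be a whole word.
        System.out.println(firstMatch("\\bdog\\b", dog));    // dog@4-7
        System.out.println(firstMatch("\\bdog\\b", doggie)); // null

        // \bdog\B: "dog" must start a word and be followed by a word character.
        System.out.println(firstMatch("\\bdog\\B", dog));    // null
        System.out.println(firstMatch("\\bdog\\B", doggie)); // dog@4-7
    }
}
```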

To require that a match occur only at the end of the previous match, use \G:

Enter your regex: dog
Enter input string to search: dog dog
I found the text "dog" starting at index 0 and ending at index 3.
I found the text "dog" starting at index 4 and ending at index 7.

Enter your regex: \Gdog
Enter input string to search: dog dog
I found the text "dog" starting at index 0 and ending at index 3.

Here the second example finds only one match, because the second occurrence of "dog" does not start at the end of the previous match.
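The effect of \G shows up when you loop over find(). The sketch below collects the start index of every match; the class PreviousMatchDemo and the helper matchStarts are assumptions made for this illustration, as is the extra input "dogdog", which does not appear in the examples above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PreviousMatchDemo {
    // Hypothetical helper: returns the start index of every match in order.
    static List<Integer> matchStarts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("dog", "dog dog"));    // [0, 4]
        // The space at index 3 breaks the chain, so \G stops after one match.
        System.out.println(matchStarts("\\Gdog", "dog dog")); // [0]
        // With nothing in between, each match starts where the last one ended.
        System.out.println(matchStarts("\\Gdog", "dogdog"));  // [0, 3]
    }
}
```

On the first find() call there is no previous match, so \G matches at the beginning of the input.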
