Java regex: look-behind group does not have an obvious maximum length

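I always thought that a look-behind assertion in Java's regex API (and in many other languages) must have an obvious length. So the STAR and PLUS quantifiers are not allowed inside look-behinds.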

“[…] Java takes things a step further by allowing finite repetition. You still cannot use the star or plus, but you can use the question mark and the curly braces with the max parameter specified. Java recognizes the fact that finite repetition can be rewritten as an alternation of strings with different, but fixed lengths. Unfortunately, the JDK 1.4 and 1.5 have some bugs when you use alternation inside lookbehind. These were fixed in JDK 1.6. […]”

— 07001
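
To make the quoted claim concrete, here is a minimal sketch comparing a look-behind that uses bounded repetition with the same look-behind written out as an alternation of fixed-length strings. The class name, the helper `finds`, and the sample inputs are made up for illustration; the sketch assumes a JDK where alternation inside look-behind works (1.6 or later, per the quote).

```java
import java.util.regex.Pattern;

final class FiniteLookBehindDemo {
    // Returns true if the regex matches somewhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String bounded = "(?<=Ca{0,2})B";      // at most two 'a' between C and B
        String alternation = "(?<=C|Ca|Caa)B"; // the same look-behind spelled out
        String[] inputs = {"CB", "CaB", "CaaB", "CaaaB", "aaB"}; // illustrative inputs
        for (String input : inputs) {
            System.out.println(input + " -> bounded=" + finds(bounded, input)
                    + ", alternation=" + finds(alternation, input));
        }
    }
}
```

Both patterns should agree on every input: they accept B only when it is preceded by C and at most two a's.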

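Using curly braces works as long as the total length of the range of characters inside the look-behind is smaller than or equal to Integer.MAX_VALUE. So these regexes are valid: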

"(?<=a{0," +(Integer.MAX_VALUE) + "})B"

"(?<=Ca{0," +(Integer.MAX_VALUE-1) + "})B"

"(?<=CCa{0," +(Integer.MAX_VALUE-2) + "})B"

But these aren't:

"(?<=Ca{0," +(Integer.MAX_VALUE) +"})B"

"(?<=CCa{0," +(Integer.MAX_VALUE-1) +"})B"

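The arithmetic behind the five patterns above can be sketched as follows. This only models the observation stated here (fixed prefix length plus maximum repetition count must not exceed Integer.MAX_VALUE); it is not the JDK's actual check, and the class and method names are hypothetical.

```java
final class LookBehindLengthCheck {
    // Hypothetical helper: the widest text the look-behind could span,
    // computed in long arithmetic so the sum itself cannot overflow.
    static long maxWidth(int literalPrefixLength, int maxRepetitions) {
        return (long) literalPrefixLength + maxRepetitions;
    }

    // True if that width still fits in an int.
    static boolean fitsInInt(int literalPrefixLength, int maxRepetitions) {
        return maxWidth(literalPrefixLength, maxRepetitions) <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        System.out.println(fitsInInt(0, max));     // a{0,MAX}     -> true
        System.out.println(fitsInInt(1, max - 1)); // Ca{0,MAX-1}  -> true
        System.out.println(fitsInInt(2, max - 2)); // CCa{0,MAX-2} -> true
        System.out.println(fitsInInt(1, max));     // Ca{0,MAX}    -> false
        System.out.println(fitsInInt(2, max - 1)); // CCa{0,MAX-1} -> false
    }
}
```

Whether a given JDK accepts or rejects the real patterns may differ from this model, so treat it as an illustration of the limit rather than a predictor.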
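However, I don't understand the following:

When I run a test using the * and + quantifiers inside a look-behind, all goes well (see output Test 1 and Test 2).

But when I add a single character at the start of the look-behind from Test 1 and Test 2, it breaks (see output Test 3).

Making the greedy * from Test 3 reluctant has no effect; it still breaks (see Test 4).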
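
For comparison, here is a minimal sketch in which the look-behind from Test 3 is given an explicit upper bound, which is the form the quote above says Java accepts. The bound of 100, the class name, and the shortened input are arbitrary assumptions for illustration. It does not explain why the unbounded forms in Test 1 and Test 2 are accepted.

```java
import java.util.Arrays;

final class BoundedLookBehindDemo {
    // Assumption: never more than 100 'a' characters between C and B.
    static final String REGEX = "(?<=Ca{0,100})B";

    public static void main(String[] args) {
        String input = "CaaaBaa"; // shortened, illustrative input
        System.out.println(input.replaceAll(REGEX, "FOO"));      // CaaaFOOaa
        System.out.println(Arrays.toString(input.split(REGEX))); // [Caaa, aa]
    }
}
```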

Here's the test harness:

public class Main {

    // Compiles the regex and reports whether it is found in the input.
    private static String testFind(String regex, String input) {
        try {
            boolean returned = java.util.regex.Pattern.compile(regex).matcher(input).find();
            return "testFind : Valid -> regex = "+regex+", input = "+input+", returned = "+returned;
        } catch(Exception e) {
            return "testFind : Invalid -> "+regex+", "+e.getMessage();
        }
    }

    // Replaces every match of the regex in the input with "FOO".
    private static String testReplaceAll(String regex, String input) {
        try {
            String returned = input.replaceAll(regex, "FOO");
            return "testReplaceAll : Valid -> regex = "+regex+", input = "+input+", returned = "+returned;
        } catch(Exception e) {
            return "testReplaceAll : Invalid -> "+regex+", "+e.getMessage();
        }
    }

    // Splits the input around matches of the regex.
    private static String testSplit(String regex, String input) {
        try {
            String[] returned = input.split(regex);
            return "testSplit : Valid -> regex = "+regex+", input = "+input+", returned = "+java.util.Arrays.toString(returned);
        } catch(Exception e) {
            return "testSplit : Invalid -> "+regex+", "+e.getMessage();
        }
    }

    public static void main(String[] args) {
        String[] regexes = {"(?<=a*)B", "(?<=a+)B", "(?<=Ca*)B", "(?<=Ca*?)B"};
        String input = "CaaaaaaaaaaaaaaaBaaaa";
        int test = 0;
        for(String regex : regexes) {
            test++;
            System.out.println("********************** Test "+test+" **********************");
            System.out.println(" "+testFind(regex, input));
            System.out.println(" "+testReplaceAll(regex, input));
            System.out.println(" "+testSplit(regex, input));
            System.out.println();
        }
    }
}

The output:

********************** Test 1 **********************
 testFind : Valid -> regex = (?<=a*)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = true
 testReplaceAll : Valid -> regex = (?<=a*)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = CaaaaaaaaaaaaaaaFOOaaaa
 testSplit : Valid -> regex = (?<=a*)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = [Caaaaaaaaaaaaaaa, aaaa]

********************** Test 2 **********************
 testFind : Valid -> regex = (?<=a+)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = true
 testReplaceAll : Valid -> regex = (?<=a+)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = CaaaaaaaaaaaaaaaFOOaaaa
 testSplit : Valid -> regex = (?<=a+)B, input = CaaaaaaaaaaaaaaaBaaaa, returned = [Caaaaaaaaaaaaaaa, aaaa]

********************** Test 3 **********************
 testFind : Invalid -> (?<=Ca*)B, Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
      ^
 testReplaceAll : Invalid -> (?<=Ca*)B, Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
      ^
 testSplit : Invalid -> (?<=Ca*)B, Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
      ^

********************** Test 4 **********************
 testFind : Invalid -> (?<=Ca*?)B, Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
       ^
 testReplaceAll : Invalid -> (?<=Ca*?)B, Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
       ^
 testSplit : Invalid -> (?<=Ca*?)B, Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
       ^

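My question may be obvious, but I'll ask it anyway: can anyone explain to me why Tests 1 and 2 succeed while Tests 3 and 4 fail? I would have expected all of them to fail, not half of them to work and half of them to fail.

Thanks.

PS. I'm using: Java version 1.6.0_14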
