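While debugging some Java regex matching code, I noticed something odd. When the regex and the input string are identical, calling matches() and then find() on the same Matcher means only matches() reports a match. If find() is called on its own, it matches as well. Sample code: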
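import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo { // class name chosen for illustration
    public static void main(String[] args) {
        Pattern p = Pattern.compile("aa");
        Matcher a = p.matcher("aa");
        // matches() succeeds: prints 0, 2, aa
        if (a.matches()) {
            System.out.println(a.start());
            System.out.println(a.end());
            System.out.println(a.group());
        }
        // find() on the same Matcher returns false here, so "---" is never printed;
        // without the matches() call above it would print ---, 0, 2, aa
        if (a.find()) {
            System.out.println("---");
            System.out.println(a.start());
            System.out.println(a.end());
            System.out.println(a.group());
        }
    }
}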
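The behaviour can be reproduced in isolation. The sketch below is a hypothetical class (MatchesThenFind, with helper names of my own) built only on public Matcher methods. It assumes, as the demo above suggests, that matches() and find() share the matcher's position state, so after matches() has consumed all of "aa", find() resumes at index 2 and fails. Calling reset(), or using find(int), which resets the matcher before searching from the given index, brings the match back.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: find() continues from where a successful matches() stopped.
class MatchesThenFind {

    // Result of find() when it is called right after matches() on the same Matcher.
    static boolean findAfterMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.matches();      // on success, consumes the whole input
        return m.find();  // resumes the search at the end of the previous match
    }

    // Same sequence, but reset() clears the matcher state before find().
    static boolean findAfterReset(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.matches();
        m.reset();        // the next find() starts at the beginning of the input again
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(findAfterMatches("aa", "aa")); // false
        System.out.println(findAfterReset("aa", "aa"));   // true

        Matcher m = Pattern.compile("aa").matcher("aa");
        m.matches();
        // find(int) resets the matcher and then searches from the given index
        if (m.find(0)) {
            System.out.println(m.start() + " " + m.end() + " " + m.group()); // 0 2 aa
        }
    }
}
```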
To get to the bottom of this, let's step into the Pattern and Matcher classes.
First comes the static compile() method, which creates a Pattern object:
public static Pattern compile(String regex) {
    return new Pattern(regex, 0);
}
private Pattern(String p, int f) {
    pattern = p;
    flags = f;
    // to use UNICODE_CASE if UNICODE_CHARACTER_CLASS present
    if ((flags & UNICODE_CHARACTER_CLASS) != 0)
        flags |= UNICODE_CASE;
    // Reset group index count
    capturingGroupCount = 1;
    localCount = 0;
    // The pattern is non-empty: compile it right away
    if (pattern.length() > 0) {
        try {
            compile();
        } catch (StackOverflowError soe) {
            throw error("Stack overflow during pattern compilation");
        }
    } else {
        root = new Start(lastAccept);
        matchRoot = lastAccept;
    }
}
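Both constructor branches can be observed without reflection. In the hypothetical CompileDemo class below, compile("aa") reports flags() == 0, consistent with the compile(regex, 0) call above. The empty pattern skips compile() entirely (root is Start(lastAccept) and matchRoot is lastAccept), so it matches an empty input and yields a zero-length find() hit at index 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: observes the two constructor branches through the public API.
class CompileDemo {

    // compile(regex) passes flags = 0 to the private constructor.
    static int defaultFlags(String regex) {
        return Pattern.compile(regex).flags();
    }

    // Empty pattern: no parsing, it accepts at the first position tried,
    // so find() reports a zero-length match. Returns {start, end} or null.
    static int[] emptyPatternFind(String input) {
        Matcher m = Pattern.compile("").matcher(input);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        System.out.println(defaultFlags("aa"));                        // 0
        int[] span = emptyPatternFind("abc");
        System.out.println(span[0] + " " + span[1]);                   // 0 0
        System.out.println(Pattern.compile("").matcher("").matches()); // true
    }
}
```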
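The private compile() method is the key step: it initializes the pattern's working fields and builds the node structures that the later matching operations rely on.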
private void compile() {
    // Handle canonical equivalences
    if (has(CANON_EQ) && !has(LITERAL)) {
        normalize();
    } else {
        normalizedPattern = pattern;
    }
    patternLength = normalizedPattern.length();
    // Copy pattern to int array for convenience
    // Use double zero to terminate pattern
    temp = new int[patternLength + 2];
    hasSupplementary = false;
    int c, count = 0;
    // Convert all chars into code points
    for (int x = 0; x < patternLength; x += Character.charCount(c)) {
        c = normalizedPattern.codePointAt(x);
        if (isSupplementary(c)) {
            hasSupplementary = true;
        }
        temp[count++] = c;
    }
    patternLength = count; // patternLength now in code points
    if (! has(LITERAL))
        RemoveQEQuoting();
    // Allocate all temporary objects here.
    buffer = new int[32];
    groupNodes = new GroupHead[10];
    namedGroups = null;
    if (has(LITERAL)) {
        // Literal pattern handling
        matchRoot = newSlice(temp, patternLength, hasSupplementary);
        matchRoot.next = lastAccept;
    } else {
        // Start recursive descent parsing
        matchRoot = expr(lastAccept);
        // Check extra pattern characters
        if (patternLength != cursor) {
            if (peek() == ')') {
                throw error("Unmatched closing ')'");
            } else {
                throw error("Unexpected internal error");
            }
        }
    }
    // Peephole optimization
    if (matchRoot instanceof Slice) {
        root = BnM.optimize(matchRoot);
        if (root == matchRoot) {
            root = hasSupplementary ? new StartS(matchRoot) : new Start(matchRoot);
        }
    } else if (matchRoot instanceof Begin || matchRoot instanceof First) {
        root = matchRoot;
    } else {
        root = hasSupplementary ? new StartS(matchRoot) : new Start(matchRoot);
    }
    // Release temporary storage
    temp = null;
    buffer = null;
    groupNodes = null;
    patternLength = 0;
    compiled = true;
}
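Two effects of this method can be checked from the public API. The hypothetical CompileEffects class below (helper names are mine, not the JDK's) assumes the reading of the code above: with Pattern.LITERAL the recursive descent parser is skipped and the pattern becomes a plain Slice, so '.' loses its special meaning; and because the pattern is first converted to code points, a quantifier after a supplementary character such as U+1F600 applies to the whole character rather than to its low surrogate.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: observable effects of the private compile() method.
class CompileEffects {

    // With LITERAL, compile() builds a Slice instead of parsing, so metacharacters are plain text.
    static boolean literalMatches(String regex, String input) {
        return Pattern.compile(regex, Pattern.LITERAL).matcher(input).matches();
    }

    // Normal compilation: the pattern goes through expr(lastAccept).
    static boolean regexMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(regexMatches("a.b", "axb"));   // true: '.' matches any char
        System.out.println(literalMatches("a.b", "axb")); // false: '.' is taken literally
        System.out.println(literalMatches("a.b", "a.b")); // true

        // U+1F600 occupies two UTF-16 chars, but the pattern is stored as code points,
        // so '+' repeats the whole character.
        String grinning = new String(Character.toChars(0x1F600));
        System.out.println(grinning.length());                               // 2
        System.out.println(regexMatches(grinning + "+", grinning + grinning)); // true
    }
}
```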
// flags is 0 by default, so by default has() always returns false
private boolean has(int f) {
    return (flags & f) != 0;
}
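has() is an ordinary bit test. The sketch below re-implements it as a hypothetical static helper in a class named FlagCheck (the real method is private to Pattern) and applies it to the value returned by the public flags() method.

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring the private Pattern.has(int): a flag is set when its bit is on.
class FlagCheck {

    static boolean has(int flags, int f) {
        return (flags & f) != 0;
    }

    public static void main(String[] args) {
        int defaults = Pattern.compile("aa").flags();        // 0, as with compile(String)
        System.out.println(has(defaults, Pattern.LITERAL));  // false
        System.out.println(has(defaults, Pattern.CANON_EQ)); // false

        int combined = Pattern.compile("aa", Pattern.CASE_INSENSITIVE | Pattern.LITERAL).flags();
        System.out.println(has(combined, Pattern.LITERAL));   // true
        System.out.println(has(combined, Pattern.MULTILINE)); // false
    }
}
```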
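The matchRoot and root objects built in compile() are what matches() and find() later run on: in the JDK source, matches() starts matching from matchRoot, while find() starts from root. matchRoot itself is created by expr(lastAccept), where the argument lastAccept is a LastNode instance.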
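The difference between the two entry points shows up on an input where the pattern occurs only in the middle. In the sketch below (a hypothetical EntryPoints class), matches() fails on "xaax" because the match must span the whole input, while find() succeeds at index 1 because its search is retried at successive start positions. That this retrying is done by root (a Start node wrapping matchRoot for a short pattern like "aa") is my reading of the JDK source, not something the public API exposes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: the same pattern through matches() and find().
class EntryPoints {

    // matches(): the match must start at the region start and end at the region end.
    static boolean wholeInput(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find(): the search is retried at successive start indexes; returns {start, end} or null.
    static int[] firstHit(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        System.out.println(wholeInput("aa", "xaax")); // false
        int[] hit = firstHit("aa", "xaax");
        System.out.println(hit[0] + " " + hit[1]);    // 1 3
        System.out.println(wholeInput("aa", "aa"));   // true
    }
}
```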
Next, let's look at the expr() method:
private Node expr(Node end) {
    Node prev = null;
    Node firstTail = null;
    Branc