Repeated calls to the find method of the Matcher class in Java regular expressions

二木成林

Article link: https://blog.csdn.net/cnds123321/article/details/121195090

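This article explains how the find method of Java's Matcher class works and uses example code to show why consecutive calls to find() can return different results. The key point is that find() resumes at the first character after the previous successful match, so if the Matcher is neither reset nor recreated, repeated calls eventually find no match. The fixes are to create a new Matcher or to call reset() after each call.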

First, consider the code below. I expected all three calls to find() to return true, but they do not.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String[] args) {
        String str = "hello 123!";
        String regex = "(\\d+)";
        Matcher matcher = Pattern.compile(regex).matcher(str);
        System.out.println(matcher.find()); // true: matches "123"
        System.out.println(matcher.find()); // false: no digits remain after "123"
        System.out.println(matcher.find()); // false
    }
}

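After reading the article "The pairing problem of matcher() and find() in Java regex matching", I concluded that each matcher corresponds to exactly one find() call. I later found that this is not the case. Consider the following code: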

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String[] args) {
        String str = "hello 123! hello 456! hello 789!";
        String regex = "(\\d+)";
        Matcher matcher = Pattern.compile(regex).matcher(str);
        System.out.println(matcher.find()); // true: matches "123"
        System.out.println(matcher.find()); // true: matches "456"
        System.out.println(matcher.find()); // true: matches "789"
        System.out.println(matcher.find()); // false: no digits remain
    }
}

The output is true, true, true, false. If the conclusion of the article mentioned above were correct, the output would be true, false, false, false.

Why does this happen? The source code of find() offers a clue:

    /**
     * Attempts to find the next subsequence of the input sequence that matches
     * the pattern.
     *
     * <p> This method starts at the beginning of this matcher's region, or, if
     * a previous invocation of the method was successful and the matcher has
     * not since been reset, at the first character not matched by the previous
     * match.
     *
     * <p> If the match succeeds then more information can be obtained via the
     * <tt>start</tt>, <tt>end</tt>, and <tt>group</tt> methods.  </p>
     *
     * @return  <tt>true</tt> if, and only if, a subsequence of the input
     *          sequence matches this matcher's pattern
     */
    public boolean find() {
        int nextSearchIndex = last;
        if (nextSearchIndex == first)
            nextSearchIndex++;

        // If next search starts before region, start it at region
        if (nextSearchIndex < from)
            nextSearchIndex = from;

        // If next search starts beyond region then it fails
        if (nextSearchIndex > to) {
            for (int i = 0; i < groups.length; i++)
                groups[i] = -1;
            return false;
        }
        return search(nextSearchIndex);
    }

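The Javadoc comment on the method describes the behavior. The sentence that matters here is in its second paragraph: find() starts at the beginning of the matcher's region, or, if a previous call succeeded and the matcher has not been reset since, at the first character not matched by the previous match. In other words, the next search begins at the character immediately after the end of the previous match. In the source, that position is the field last, which is copied into nextSearchIndex.

Our examples fall into the second case: a previous call succeeded and the matcher has not been reset since.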

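Our test text is hello 123! hello 456! hello 789!. Counting from index 0, "123" occupies indices 6 to 8, "456" occupies indices 17 to 19, and "789" occupies indices 28 to 30; the "!" after each number sits at index 9, 20, and 31.
The first call to find() starts at index 0 and matches "123".
The second call starts at index 9 and finds "456", which matches the regular expression.
The third call starts at index 20 and finds "789".
The fourth call starts at index 31, finds nothing that matches the regular expression, and returns false.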

Stepping into find() with a debugger confirms this: the local variable nextSearchIndex holds the values described above.
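The same positions can be checked without a debugger, because end() of each successful match is the index where the next search begins. The sketch below is only an illustration; the class name FindTrace and the method trace are made up for this example and are not part of the original code. It calls find() a fixed number of times and records what each call matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class FindTrace {
    // Calls find() exactly `calls` times on one matcher and records each outcome.
    static List<String> trace(String input, String regex, int calls) {
        List<String> lines = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        for (int i = 0; i < calls; i++) {
            if (matcher.find()) {
                // start() is inclusive, end() is exclusive; end() is where the next find() resumes.
                lines.add("true: " + matcher.group()
                        + " [" + matcher.start() + ", " + matcher.end() + ")");
            } else {
                lines.add("false");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : trace("hello 123! hello 456! hello 789!", "(\\d+)", 4)) {
            System.out.println(line);
        }
    }
}
```

For the test text, the recorded end indices are 9, 20, and 31, which are the start positions of the second, third, and fourth calls.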
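This explains why the first example prints true, false, false. find() resumes at the first character after the previous successful match, and the string in that example contains only one match, so every later call returns false.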
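Because each call resumes after the previous match, a single matcher can walk through every match in a string with a plain while loop. The following is a minimal sketch of that common pattern; the class name AllMatches and the method findAll are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class AllMatches {
    // Collects the text of every match by calling find() until it returns false.
    static List<String> findAll(String input, String regex) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("hello 123!", "(\\d+)"));
        System.out.println(findAll("hello 123! hello 456! hello 789!", "(\\d+)"));
    }
}
```

The loop ends on the first false, so the number of collected items equals the number of matches in the input: one for the first example's string and three for the second.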

To fix the first example, you can create a new matcher each time, as the referenced article suggests:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String[] args) {
        String str = "hello 123!";
        String regex = "(\\d+)";
        Pattern pattern = Pattern.compile(regex); // compile once, reuse for every matcher
        Matcher matcher = pattern.matcher(str);
        System.out.println(matcher.find()); // true
        matcher = pattern.matcher(str); // a new matcher searches from the beginning
        System.out.println(matcher.find()); // true
        matcher = pattern.matcher(str);
        System.out.println(matcher.find()); // true
    }
}

Or call reset() after each call so that the next search starts from the beginning again:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String[] args) {
        String str = "hello 123!";
        String regex = "(\\d+)";
        Matcher matcher = Pattern.compile(regex).matcher(str);
        System.out.println(matcher.find()); // true
        matcher.reset(); // discard match state; the next find() starts at the beginning
        System.out.println(matcher.find()); // true
        matcher.reset();
        System.out.println(matcher.find()); // true
    }
}
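A third option, not covered above, is the overload find(int start) of Matcher, which resets the matcher and then searches from the given index. The sketch below assumes you always want to search from index 0; the class name FindFromStart and the method findThreeTimes are made-up names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class FindFromStart {
    // Calls find(0) three times on the same matcher and returns each result.
    static boolean[] findThreeTimes(String input, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        boolean[] results = new boolean[3];
        for (int i = 0; i < results.length; i++) {
            // find(int) resets the matcher first, so every call searches from index 0.
            results[i] = matcher.find(0);
        }
        return results;
    }

    public static void main(String[] args) {
        for (boolean found : findThreeTimes("hello 123!", "(\\d+)")) {
            System.out.println(found);
        }
    }
}
```

As with recreating the matcher or calling reset(), every call here finds the same first match again. If the goal is to visit each match once, keep calling find() on one matcher without resetting it.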