A Brief Look at the Pattern and Matcher Classes in Java Regex

Latest recommended article published 2023-02-06 16:45:42

潭影空人心

Category: Java. Tags: java, programming languages, backend

Article link: https://blog.csdn.net/zlk252620068/article/details/109534254

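This article examines how the Pattern and Matcher classes behind Java regular expressions work. By analyzing compile(), matches(), and find(), it explains why matches() and find() behave differently when the regex string and the input string are identical. The key lies in how the matchRoot and root objects are created and how the match method is invoked, both of which directly determine the results of matches() and find().

Summary generated automatically by CSDN.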

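While debugging some Java regex code, I noticed the following: when the regex string and the input string are identical, calling matches() and then find() on the same Matcher produces a match only for matches(). If find() is called on its own, it matches as well. Sample code: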

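    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class MatchesThenFind {  // class name chosen for illustration
        public static void main(String[] args) {
            Pattern p = Pattern.compile("aa");
            Matcher a = p.matcher("aa");
            // Whole-input match succeeds: prints 0, 2, aa
            if (a.matches()) {
                System.out.println(a.start());
                System.out.println(a.end());
                System.out.println(a.group());
            }
            // Same Matcher, called right after matches(): this branch is not entered
            if (a.find()) {
                System.out.println("---");
                System.out.println(a.start());
                System.out.println(a.end());
                System.out.println(a.group());
            }
        }
    }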
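The observation can be reproduced in isolation with a small sketch. The class and method names below are made up for illustration. It relies only on documented Matcher behavior: find() continues from the end of the previous successful match unless the matcher has been reset, and reset() puts the matcher back at the start of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesThenFindDemo {

    /** Calls matches() and then find() on the same Matcher, as in the sample code. */
    static boolean findAfterMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.matches();      // a successful whole-input match ends at input.length()
        return m.find();  // find() resumes from where the previous match ended
    }

    /** Same sequence, but with reset() between the two calls. */
    static boolean findAfterReset(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.matches();
        m.reset();        // discard match state and go back to the start of the input
        return m.find();
    }

    /** Calls find() on a fresh Matcher. */
    static boolean findAlone(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(findAfterMatches("aa", "aa")); // false
        System.out.println(findAfterReset("aa", "aa"));   // true
        System.out.println(findAlone("aa", "aa"));        // true
    }
}
```

If both checks are needed on the same input, calling reset() in between or creating a second Matcher from the same Pattern avoids the surprise.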

To get to the bottom of this, let's step into the Pattern and Matcher classes.

First comes the compile() method, which creates a Pattern object:

    public static Pattern compile(String regex) {
        return new Pattern(regex, 0);
    }

    private Pattern(String p, int f) {
        pattern = p;
        flags = f;

        // to use UNICODE_CASE if UNICODE_CHARACTER_CLASS present
        if ((flags & UNICODE_CHARACTER_CLASS) != 0)
            flags |= UNICODE_CASE;

        // Reset group index count
        capturingGroupCount = 1;
        localCount = 0;
        
        // Pattern length is greater than 0: compile it right away
        if (pattern.length() > 0) {
            try {
                compile();
            } catch (StackOverflowError soe) {
                throw error("Stack overflow during pattern compilation");
            }
        } else {
            root = new Start(lastAccept);
            matchRoot = lastAccept;
        }
    }
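The else branch of the constructor covers the empty pattern, which skips compile() entirely. A minimal sketch of what that looks like from the public API follows; the class and helper names are hypothetical. An empty pattern matches the empty string, and find() reports a zero-length match at every position of the input, including the position just past the last character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyPatternDemo {

    /** Collects the start index of every match that find() reports. */
    static List<Integer> matchStarts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // The empty pattern matches the empty input as a whole.
        System.out.println(Pattern.compile("").matcher("").matches()); // true
        // Zero-length matches before 'a', before 'b', and at the end of the input.
        System.out.println(matchStarts("", "ab")); // [0, 1, 2]
    }
}
```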

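The internal compile() method is the crucial part: it assigns initial values to a number of fields and sets up everything the later matching operations rely on.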

    private void compile() {
        // Handle canonical equivalences
        if (has(CANON_EQ) && !has(LITERAL)) {
            normalize();
        } else {
            normalizedPattern = pattern;
        }
        patternLength = normalizedPattern.length();

        // Copy pattern to int array for convenience
        // Use double zero to terminate pattern
        temp = new int[patternLength + 2];

        hasSupplementary = false;
        int c, count = 0;
        // Convert all chars into code points
        for (int x = 0; x < patternLength; x += Character.charCount(c)) {
            c = normalizedPattern.codePointAt(x);
            if (isSupplementary(c)) {
                hasSupplementary = true;
            }
            temp[count++] = c;
        }

        patternLength = count;   // patternLength now in code points

        if (! has(LITERAL))
            RemoveQEQuoting();

        // Allocate all temporary objects here.
        buffer = new int[32];
        groupNodes = new GroupHead[10];
        namedGroups = null;

        if (has(LITERAL)) {
            // Literal pattern handling
            matchRoot = newSlice(temp, patternLength, hasSupplementary);
            matchRoot.next = lastAccept;
        } else {
            // Start recursive descent parsing
            matchRoot = expr(lastAccept);
            // Check extra pattern characters
            if (patternLength != cursor) {
                if (peek() == ')') {
                    throw error("Unmatched closing ')'");
                } else {
                    throw error("Unexpected internal error");
                }
            }
        }

        // Peephole optimization
        if (matchRoot instanceof Slice) {
            root = BnM.optimize(matchRoot);
            if (root == matchRoot) {
                root = hasSupplementary ? new StartS(matchRoot) : new Start(matchRoot);
            }
        } else if (matchRoot instanceof Begin || matchRoot instanceof First) {
            root = matchRoot;
        } else {
            root = hasSupplementary ? new StartS(matchRoot) : new Start(matchRoot);
        }

        // Release temporary storage
        temp = null;
        buffer = null;
        groupNodes = null;
        patternLength = 0;
        compiled = true;
    }

    // compile(String) passes flags = 0, so by default has() always returns false
    private boolean has(int f) {
        return (flags & f) != 0;
    }
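The branches that compile() takes can be observed from outside the class. The sketch below is an illustration using only the public API; hasFlag and compiles are hypothetical helpers, with hasFlag mirroring the private has(int) check through Pattern.flags(). It shows the LITERAL branch, the effect of \Q...\E quoting, and the "Unmatched closing ')'" error path, which surfaces as a PatternSyntaxException.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CompileBranchesDemo {

    /** Public-API analogue of the private has(int) check. */
    static boolean hasFlag(Pattern p, int flag) {
        return (p.flags() & flag) != 0;
    }

    /** Returns false if the regex is rejected during compilation. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Pattern plain = Pattern.compile("a.c");                    // '.' is a metacharacter
        Pattern literal = Pattern.compile("a.c", Pattern.LITERAL); // whole pattern taken literally
        Pattern quoted = Pattern.compile("\\Qa.c\\E");             // quoted section taken literally

        System.out.println(hasFlag(plain, Pattern.LITERAL));   // false
        System.out.println(hasFlag(literal, Pattern.LITERAL)); // true

        System.out.println(plain.matcher("abc").matches());    // true
        System.out.println(literal.matcher("abc").matches());  // false
        System.out.println(literal.matcher("a.c").matches());  // true
        System.out.println(quoted.matcher("abc").matches());   // false

        System.out.println(compiles("a)"));                    // false
    }
}
```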

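The matchRoot and root objects created in compile() are crucial: they are the basis on which matches() and find() later run. matchRoot is created by expr(lastAccept), where the lastAccept argument is a LastNode instance.

Now let's look at the expr method: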
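The two node graphs are private, so they cannot be inspected directly, but their effect shows up in how the public methods anchor a match. As far as I can tell from the JDK sources, matches() and lookingAt() start from matchRoot at a fixed position, while find() goes through root, whose Start node tries successive positions; treat that mapping as an assumption to verify against your JDK version. The sketch below, with a hypothetical describe helper, only demonstrates the visible behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoringDemo {

    /** Summarizes how matches(), lookingAt(), and find() treat the same input. */
    static String describe(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        // Separate Matcher instances keep the three calls independent of each other.
        boolean whole = p.matcher(input).matches();     // must cover the entire input
        boolean prefix = p.matcher(input).lookingAt();  // must start at index 0
        Matcher m = p.matcher(input);
        String found = m.find() ? m.start() + "-" + m.end() : "none"; // may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + found;
    }

    public static void main(String[] args) {
        System.out.println(describe("aa", "aa"));   // matches=true lookingAt=true find=0-2
        System.out.println(describe("aa", "aab"));  // matches=false lookingAt=true find=0-2
        System.out.println(describe("aa", "baab")); // matches=false lookingAt=false find=1-3
    }
}
```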

    private Node expr(Node end) {
        Node prev = null;
        Node firstTail = null;
        Branch bra
