JDK 7 Regular Expressions: Named Capture Groups

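Up to and including JDK 6, Java regular expressions did not support named capture groups, so a capturing group could only be accessed by its index. In a complex regex with many capturing and non-capturing groups, working out a group's number by counting opening parentheses from left to right is tedious. It also hurts readability, and whenever the regex is edited the group numbers can shift. The fix is to give capture groups names, as the regex engines of Python, PHP, .NET and Perl already allow.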
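JDK 7 introduces the following support for named capture groups:
(1) (?<NAME>X) defines a named group "NAME" (the name must start with a letter and may contain only the letters a-z, A-Z and the digits 0-9)
(2) \k<NAME> back-references the named group "NAME"
(3) ${NAME} refers to a captured group in a Matcher replacement string
(4) group(String NAME) returns the input subsequence captured by the named group "NAME"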
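To see all four features working together, here is a minimal sketch. Only the syntax comes from the list above; the class name NamedGroupFeatures, the date pattern and the doubled-word pattern are hypothetical examples chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: exercises (?<NAME>X), \k<NAME>, ${NAME} and group(String).
class NamedGroupFeatures {

    // (1) (?<year>...) etc. define named groups; (3) ${year} etc. use them in the replacement
    static String reformatDate(String isoDate) {
        return isoDate.replaceAll(
                "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})",
                "${day}/${month}/${year}");
    }

    // (2) \k<word> back-references the named group "word", so this finds a doubled word;
    // (4) group("word") returns what that named group captured, or null if nothing matched
    static String findDoubledWord(String text) {
        Matcher m = Pattern.compile("\\b(?<word>\\w+)\\s+\\k<word>\\b").matcher(text);
        return m.find() ? m.group("word") : null;
    }

    public static void main(String[] args) {
        System.out.println(reformatDate("2011-07-28"));           // 28/07/2011
        System.out.println(findDoubledWord("this is is a typo")); // is
    }
}
```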

Let's look at two examples:

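```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedCaptureDemo {
    // JDK 6 and earlier: groups can only be accessed by index
    public static void indexedCaptureTest() {
        String names = "fred or barney";
        Matcher m = Pattern.compile("(\\w+) or (\\w+)").matcher(names);
        if (m.find()) {
            System.out.println(m.group(1) + "," + m.group(2));
        }
    }
    // JDK 7 and later: capture groups can be named and looked up by name
    public static void namedCaptureTest() {
        String names = "fred or barney";
        Matcher m = Pattern.compile("(?<name1>\\w+) or (?<name2>\\w+)").matcher(names);
        if (m.find()) {
            System.out.println(m.group("name1") + "," + m.group("name2"));
        }
    }
    public static void main(String[] args) {
        indexedCaptureTest(); // fred,barney
        namedCaptureTest();   // fred,barney
    }
}
```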
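The maintenance argument from the introduction can be made concrete. The sketch below assumes a hypothetical edit to the patterns from the examples above: an extra capturing group for a title such as "Mr." is inserted at the front. The indexed lookup silently starts returning different groups, while the named lookup is unaffected. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scenario: a regex is later edited to also capture a leading title.
class GroupRenumbering {

    static String byIndex(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) + "," + m.group(2) : null;
    }

    static String byName(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group("name1") + "," + m.group("name2") : null;
    }

    public static void main(String[] args) {
        String input = "Mr. fred or barney";
        String indexed = "(\\w+) or (\\w+)";
        String named = "(?<name1>\\w+) or (?<name2>\\w+)";
        // Edited versions: a new capturing group for the title is added in front
        String indexedEdited = "(\\w+\\.) (\\w+) or (\\w+)";
        String namedEdited = "(\\w+\\.) (?<name1>\\w+) or (?<name2>\\w+)";

        System.out.println(byIndex(indexed, input));       // fred,barney
        System.out.println(byIndex(indexedEdited, input)); // Mr.,fred  (indices shifted)
        System.out.println(byName(named, input));          // fred,barney
        System.out.println(byName(namedEdited, input));    // fred,barney
    }
}
```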


Next, an example using backreferences and replacement strings:
String input = "aabbbccdddef";
How can this string be split into the array [aa, bbb, cc, ddd, e, f]?

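```java
public class NamedCaptureReplaceDemo {
    // Indexed version: group 1 is a whole run, group 2 its last character.
    // The lazy +? grows the run until the next character differs from group 2.
    public static void indexedCaptureReplace() {
        String input = "aabbbccdddef";
        String regex = "((.)+?)(?!\\2)";
        String temp = input.replaceAll(regex, "$1,"); // "aa,bbb,cc,ddd,e,f,"
        String[] arr = temp.split(",");                // split drops the trailing empty string
        System.out.println(java.util.Arrays.toString(arr));
    }
    // Named version (JDK 7+): same logic with named groups; it works, but what an ugly regex!
    public static void namedCaptureReplace() {
        String input = "aabbbccdddef";
        String regex = "(?<name2>(?<name1>.)+?)(?!\\k<name1>)";
        String temp = input.replaceAll(regex, "${name2},");
        String[] arr = temp.split(",");
        System.out.println(java.util.Arrays.toString(arr));
    }
    public static void main(String[] args) {
        indexedCaptureReplace(); // [aa, bbb, cc, ddd, e, f]
        namedCaptureReplace();   // [aa, bbb, cc, ddd, e, f]
    }
}
```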
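The replaceAll()-plus-split() approach assumes the input never contains a comma. An alternative, not from the original post, is to match each run directly: capture one character in a named group, let a named backreference consume its repeats, and collect each match with find(). This is a sketch; RunSplitter and splitRuns are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Alternative sketch: collect runs of identical characters without a separator character.
class RunSplitter {

    static List<String> splitRuns(String input) {
        // (?<ch>.) captures one character; \k<ch>* consumes any immediate repeats of it.
        // Note: '.' does not match line terminators unless Pattern.DOTALL is used.
        Matcher m = Pattern.compile("(?<ch>.)\\k<ch>*").matcher(input);
        List<String> runs = new ArrayList<>();
        while (m.find()) {
            runs.add(m.group()); // the whole run, e.g. "bbb"
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(splitRuns("aabbbccdddef")); // [aa, bbb, cc, ddd, e, f]
    }
}
```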

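Reference: http://www.iteye.com/news/6195. Note, however, that the JDK implementation as actually released differs from what that article describes, mainly in the ${} replacement syntax.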
From the Javadoc of the Pattern class:

Back references
\n          Whatever the nth capturing group matched
\k<name>    Whatever the named-capturing group "name" matched
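The Pattern Javadoc also states that named-capturing groups are still numbered (by counting opening parentheses from the left), so a named group can be referenced either way: \1 and \k<first> below refer to the same group. A small sketch under that rule; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative demo: a named group is also reachable through its number.
class NamedAndNumbered {

    static boolean doubledByNumber(String s) {
        return Pattern.matches("(?<first>\\w)\\1", s);
    }

    static boolean doubledByName(String s) {
        return Pattern.matches("(?<first>\\w)\\k<first>", s);
    }

    // group(1) and group("first") return the same captured text, likewise 2 and "second"
    static String[] byNumberAndName(String s) {
        Matcher m = Pattern.compile("(?<first>\\w)(?<second>\\w)").matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group("first"), m.group(2), m.group("second") };
    }

    public static void main(String[] args) {
        System.out.println(doubledByNumber("aa"));                    // true
        System.out.println(doubledByName("aa"));                      // true
        System.out.println(doubledByName("ab"));                      // false
        System.out.println(Arrays.toString(byNumberAndName("xy")));   // [x, x, y, y]
    }
}
```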


From the Javadoc of Matcher.appendReplacement(StringBuffer sb, String replacement):
The replacement string may contain references to subsequences captured during the previous match:
Each occurrence of ${name} or $g will be replaced by the result of evaluating the corresponding group(name) or group(g) respectively.
For $g, the first number after the $ is always treated as part of the group reference. Subsequent numbers are incorporated into g if they would form a legal group reference. Only the numerals '0' through '9' are considered as potential components of the group reference.
If the second group matched the string "foo", for example, then passing the replacement string "$2bar" would cause "foobar" to be appended to the string buffer. A dollar sign ($) may be included as a literal in the replacement string by preceding it with a backslash (\$).
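The replacement rules quoted above can be checked with a short program. It uses the usual appendReplacement()/appendTail() loop for ${name}, and String.replaceAll(), which follows the same replacement-string rules, for $g, multi-digit group parsing and the escaped dollar sign. The inputs and the ReplacementDemo name are hypothetical; the "$2bar" case mirrors the Javadoc's own example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of the replacement-string rules from the Matcher Javadoc.
class ReplacementDemo {

    // ${name}: swap each key=value pair to value:key
    static String swapPairs(String input) {
        Matcher m = Pattern.compile("(?<key>\\w+)=(?<val>\\w+)").matcher(input);
        StringBuffer sb = new StringBuffer(); // JDK 7 only offers the StringBuffer overload
        while (m.find()) {
            m.appendReplacement(sb, "${val}:${key}");
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    // $g: group 2 matches "foo", so "$2bar" produces "foobar"
    static String dollarGroup() {
        return "xfoo".replaceAll("(x)(foo)", "$2bar");
    }

    // Only 2 groups exist, so "$10" is read as group 1 followed by a literal '0'
    static String greedyDigits() {
        return "xfoo".replaceAll("(x)(foo)", "$10");
    }

    // \$ inserts a literal dollar sign (written "\\$" in Java source)
    static String literalDollar() {
        return "price: 5".replaceAll("(?<amount>\\d+)", "\\$${amount}");
    }

    public static void main(String[] args) {
        System.out.println(swapPairs("a=1 b=2")); // 1:a 2:b
        System.out.println(dollarGroup());        // foobar
        System.out.println(greedyDigits());       // x0
        System.out.println(literalDollar());      // price: $5
    }
}
```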
