Groovy Journey Series, Part 5 (Regex Grouping)

jaogun

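One of the most useful features of Groovy regular expressions is the ability to capture
pieces of data from the string that a regular expression matches. Take the following example, where we want to pull out exactly the parts describing Liverpool, England: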

locationData = "Liverpool, England: 53° 25? 0? N 3° 0? 0?"

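We could use the String split() method to cut out the Liverpool, England part (the
comma would have to be removed). Alternatively, we can use a regular expression. The syntax in the example below may look a little unfamiliar.
As a first step, we define a regular expression and put everything we are interested in inside parentheses: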

myRegularExpression = /([a-zA-Z]+), ([a-zA-Z]+): ([0-9]+). ([0-9]+). ([0-9]+). ([A-Z]) ([0-9]+). ([0-9]+). ([0-9]+)./

Next we define a matcher, which is done with the =~ operator.

matcher = (locationData =~ myRegularExpression)

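The variable matcher holds a java.util.regex.Matcher, which Groovy enhances. You can access your data exactly as you would with a Matcher object on the Java platform. An even nicer way is to index the matcher as if it were a two-dimensional array.
Let's look at the first dimension, matcher[0]: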

["Liverpool, England: 53° 25? 0? N 3° 0? 0?", "Liverpool", "England", "53", "25", "0", "N", "3", "0", "0"]

The whole matched string comes first, followed by the text captured by each group, all combined into one list.

This makes it easy to print the data we want:

if (matcher.matches()) {
    println(matcher.getCount() + " occurrence of the regular expression was found in the string.")
    println(matcher[0][1] + " is in the " + matcher[0][6] + " hemisphere. (According to: " + matcher[0][0] + ")")
    // matcher[0] is a list, so its length comes from the size() method
    for (int i = 0; i < matcher[0].size(); i++) {
        println(matcher[0][i])
    }
}
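
The two access styles mentioned above, Java-style group(n) calls and Groovy-style indexing, can be compared side by side. The following is only a minimal sketch: locationData and myRegularExpression follow the article, while m, city, hemisphere and fields are names chosen here for illustration.

```groovy
// Sketch only: m, city, hemisphere and fields are illustrative names.
def locationData = "Liverpool, England: 53° 25? 0? N 3° 0? 0?"
def myRegularExpression = /([a-zA-Z]+), ([a-zA-Z]+): ([0-9]+). ([0-9]+). ([0-9]+). ([A-Z]) ([0-9]+). ([0-9]+). ([0-9]+)./

def m = (locationData =~ myRegularExpression)
assert m.matches()

// Java-style access: group(0) is the whole match, group(n) the n-th capture.
def city = m.group(1)
def hemisphere = m.group(6)

// Groovy-style access: m[0] is a list holding the whole match plus every group.
def fields = m[0]
assert fields.size() == 10
assert fields[1] == city
assert fields[6] == hemisphere

println "${fields[1]}, ${fields[2]} is in the ${fields[6]} hemisphere"
```

Both styles read the same underlying groups, so which one to use is mostly a matter of taste. The indexed form is shorter when several groups are needed at once.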

Non-capturing groups:

Sometimes we need to define a non-capturing group to get at the data we want. In the example below, our goal is
to filter out the middle names:

names = [
    "Graham James Edward Miller",
    "Andrew Gregory Macintyre"
]

printClosure = {
    matcher = (it =~ /(.*?)(?: .+)+ (.*)/)   // notice the non-capturing group in the middle
    if (matcher.matches())
        println(matcher[0][2] + ", " + matcher[0][1])
}
names.each(printClosure)

Output:

Miller, Graham
Macintyre, Andrew

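Non-capturing groups may not be obvious at first. Put simply, a non-capturing group still takes part in matching, but the text it matches is not stored as a group, so it lets you skip over the characters or symbols you do not want.
For example: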

names = [
    "ZDW love beijing",
    "Angel love beijing",
    "Ghost hate beijing"
]

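We only want the name at the start and the city at the end, filtering out the word in between (love or hate). This is where
a non-capturing group comes in. It is written by putting ?: right after the opening parenthesis, in front of the pattern you want to skip.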

nameClosure = {
    myMatcher = (it =~ /(.*?)(?: .+)+ (.*)/)
    if (myMatcher.matches()) {
        println(myMatcher[0][1] + " " + myMatcher[0][2])
    }
}

names.each(nameClosure)

Let's analyze this part:

(?: .+)

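Every group is wrapped in (), and ?: marks this one as a non-capturing group. Note that there is a space right after ?:. It corresponds
to the space separating the words in the original string. If the separator were a comma or some other symbol, use that symbol instead.
.+ matches any character, one or more times.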
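
To see what ?: changes, it can help to run the same pattern with and without it. The sketch below is illustrative only: capturing and nonCapturing are names chosen here, and the first pattern is simply the article's pattern with ?: removed.

```groovy
// Sketch only: capturing and nonCapturing are illustrative names.
def name = "Graham James Edward Miller"

def capturing    = (name =~ /(.*?)( .+)+ (.*)/)
def nonCapturing = (name =~ /(.*?)(?: .+)+ (.*)/)
assert capturing.matches() && nonCapturing.matches()

// Without ?: the middle group is numbered, so the last name is group 3.
assert capturing.groupCount() == 3
assert capturing.group(3) == "Miller"

// With ?: the middle group still matches but is not numbered,
// so the last name moves up to group 2.
assert nonCapturing.groupCount() == 2
assert nonCapturing.group(2) == "Miller"
```

Both patterns match the same strings. The only difference is how many groups are stored and how they are numbered.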

Replacement:

We may need to replace a given substring or symbol within a string with something else.
For example:

excerpt = "At school, Harry had no one. Everybody knew that Dudley's gang hated that odd Harry Potter " +
    "in his baggy old clothes and broken glasses, and nobody liked to disagree with Dudley's gang."
matcher = (excerpt =~ /Harry Potter/)
excerpt = matcher.replaceAll("Tanya Grotter")

matcher = (excerpt =~ /Harry/)
excerpt = matcher.replaceAll("Tanya")
println("Publish it! " + excerpt)

This example does two things. First it replaces every Harry Potter with Tanya Grotter, and then
it replaces every remaining Harry with Tanya.
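
The order of the two replacements matters, and the same result can also be reached with String.replaceAll. The following is a minimal sketch on a shortened sentence. The names shortExcerpt, viaMatcher, viaString and wrongOrder are made up for illustration and do not come from the original example.

```groovy
// Sketch only: all variable names here are illustrative.
def shortExcerpt = "Nobody liked that odd Harry Potter. Harry had no one."

// The same two-step replacement as above, done through a Matcher...
def viaMatcher = (shortExcerpt =~ /Harry Potter/).replaceAll("Tanya Grotter")
viaMatcher = (viaMatcher =~ /Harry/).replaceAll("Tanya")

// ...and through String.replaceAll, which takes the pattern as its first argument.
def viaString = shortExcerpt.replaceAll(/Harry Potter/, "Tanya Grotter").replaceAll(/Harry/, "Tanya")
assert viaMatcher == viaString
assert viaString == "Nobody liked that odd Tanya Grotter. Tanya had no one."

// Replacing the shorter pattern first leaves "Potter" behind,
// because "Harry Potter" no longer occurs in the text.
def wrongOrder = shortExcerpt.replaceAll(/Harry/, "Tanya").replaceAll(/Harry Potter/, "Tanya Grotter")
assert wrongOrder == "Nobody liked that odd Tanya Potter. Tanya had no one."
```

Replacing the longer, more specific pattern first avoids this problem.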

Reluctant Operators

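Reluctant operators are also known as lazy or non-greedy operators.
The operators ?, * and + are greedy by default, which means they match as much text as they can, so they sometimes
pull in text we did not want. When that happens, we need reluctant operators.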

We only want each pope's name and the years of his reign.

/Pope (.*)(?: .*)? ([0-9]+)-([0-9]+)/

The expression above is the ordinary (greedy) grouping expression. To make an operator reluctant, simply append a ? to it, as in .*? or .+?.

Try it yourself and see what it prints:

popesArray = [
    "Pope Anastasius I 399-401",
    "Pope Innocent I 401-417",
    "Pope Zosimus 417-418",
    "Pope Boniface I 418-422",
    "Pope Celestine I 422-432",
    "Pope Sixtus III 432-440",
    "Pope Leo I the Great 440-461",
    "Pope Hilarius 461-468",
    "Pope Simplicius 468-483",
    "Pope Felix III 483-492",
    "Pope Gelasius I 492-496",
    "Pope Anastasius II 496-498",
    "Pope Symmachus 498-514"
]

myClosure = {
    myMatcher = (it =~ /Pope (.*?)(?: .*)? ([0-9]+)-([0-9]+)/)
    if (myMatcher.matches())
        println(myMatcher[0][1] + ": " + myMatcher[0][2] + " to " + myMatcher[0][3])
}
popesArray.each(myClosure)

This basically does what we wanted.
Try removing the ? after .* and see what goes wrong with the output.
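
To make the greedy versus reluctant difference concrete, the two patterns can be run against a single entry from the list. This is only a sketch: pope, greedy and reluctant are names chosen here for illustration.

```groovy
// Sketch only: pope, greedy and reluctant are illustrative names.
def pope = "Pope Anastasius I 399-401"

def greedy    = (pope =~ /Pope (.*)(?: .*)? ([0-9]+)-([0-9]+)/)
def reluctant = (pope =~ /Pope (.*?)(?: .*)? ([0-9]+)-([0-9]+)/)
assert greedy.matches() && reluctant.matches()

// Greedy (.*) grabs as much as it can, so the numeral stays in the name.
assert greedy.group(1) == "Anastasius I"

// Reluctant (.*?) stops as early as it can and leaves " I" to the optional group.
assert reluctant.group(1) == "Anastasius"

// The years come out the same either way.
assert greedy.group(2) == "399" && greedy.group(3) == "401"
assert reluctant.group(2) == "399" && reluctant.group(3) == "401"
```

Neither pattern fails to match. The greedy version simply captures more of the name than intended.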