Python Regular Expression HOWTO (2)

Repeating Things

Being able to match varying sets of characters is the first thing regular expressions can do that isn’t already possible with the methods available on strings. However, if that was the only additional capability of regexes, they wouldn’t be much of an advance. Another capability is that you can specify that portions of the RE must be repeated a certain number of times.

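The first thing regular expressions do is match varying sets of characters, which ordinary string operations cannot do. However, if that were all regular expressions could do, they would not be much of an advance. Another capability is that you can specify that certain parts of the pattern repeat a given number of times, possibly without an upper bound.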

The first metacharacter for repeating things that we’ll look at is *. * doesn’t match the literal character *; instead, it specifies that the previous character can be matched zero or more times, instead of exactly once.

For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a characters), and so forth. The RE engine has various internal limitations stemming from the size of C’s int type that will prevent it from matching over 2 billion a characters; you probably don’t have enough memory to construct a string that large, so you shouldn’t run into that limit.
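As a quick check of the ca*t example, here is a minimal sketch using the standard re module. The test string cbt and the name results are additions for illustration, not part of the original text.

```python
import re

pattern = re.compile(r"ca*t")

# fullmatch requires the whole string to fit the pattern.
results = {text: pattern.fullmatch(text) is not None
           for text in ["ct", "cat", "caaat", "cbt"]}

for text, ok in results.items():
    print(f"{text!r}: {'match' if ok else 'no match'}")
```

The first three strings match because a* accepts zero, one, or three a characters; cbt does not, because b is not an a.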


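Let's look at * first. * is a special character: it does not match the literal character *. Instead, it lets the preceding character match zero or more times rather than exactly once.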

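For example, ca*t matches ct (no a), cat (one a), caaat (three a characters), and so on. Python's re module does limit the number of repetitions, mainly because of C's int type. A C int is typically 32 bits, so the repetition count is capped at roughly 2^31. Your computer's memory is the other likely limit; if you have enough memory, this is not a problem. In short, * is unlimited in theory, but the actual implementation does impose some limits.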

Repetitions such as * are greedy; when repeating a RE, the matching engine will try to repeat it as many times as possible. If later portions of the pattern don’t match, the matching engine will then back up and try again with fewer repetitions.

A step-by-step example will make this more obvious. Let’s consider the expression a[bcd]*b. This matches the letter 'a', zero or more letters from the class [bcd], and finally ends with a 'b'. Now imagine matching this RE against the string abcbd.

Step  Matched  Explanation
1     a        The a in the RE matches.
2     abcbd    The engine matches [bcd]*, going as far as it can, which is to the end of the string.
3     Failure  The engine tries to match b, but the current position is at the end of the string, so it fails.
4     abcb     Back up, so that [bcd]* matches one less character.
5     Failure  Try b again, but the current position is at the last character, which is a 'd'.
6     abc      Back up again, so that [bcd]* is only matching bc.
7     abcb     Try b again. This time the character at the current position is 'b', so it succeeds.
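The outcome of these steps can be checked directly with the re module. This is only a small sketch; the variable name m is chosen here for illustration.

```python
import re

# Greedy [bcd]* first consumes "bcbd", then backs up until a final 'b' fits.
m = re.match(r"a[bcd]*b", "abcbd")

print(m.group())  # abcb
print(m.span())   # (0, 4)
```

The trailing d is left unmatched: the engine stops at the longest prefix that still ends in b.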

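Matching with * is greedy. When the engine handles *, it repeats as many times as it can. If the later part of the pattern does not match, the engine backtracks and retries with fewer repetitions.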

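A step-by-step example makes this easier to follow. Consider a[bcd]*b: the pattern first matches the letter a, then zero or more characters from [bcd], and finally a b. Suppose the source string is abcbd.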

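Step  Matched  Explanation
1     a        The a is matched because the first character of the pattern is a.
2     abcbd    The engine matches bcbd with [bcd]*, running to the end of the string.
3     Failure  The engine then tries to match b but fails, because the whole string has already been consumed.
4     abcb     Backtrack, so that [bcd]* gives up one character.
5     Failure  Try b again; the current character is d, not b.
6     abc      Backtrack again, so that [bcd]* matches only bc.
7     abcb     Try b again; the current character is b, so the match succeeds.

The end of the RE has now been reached, and it has matched abcb. This demonstrates how the matching engine goes as far as it can at first, and if no match is found it will then progressively back up and retry the rest of the RE again and again. It will back up until it has tried zero matches for [bcd]*, and if that subsequently fails, the engine will conclude that the string doesn’t match the RE at all.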
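To make the back-up-and-retry behaviour concrete, here is a hand-written sketch that mimics it for this one pattern only. The function match_with_backtracking is hypothetical and is not how the re module is implemented; it is compared against re.match purely as a sanity check.

```python
import re


def match_with_backtracking(text):
    """Toy matcher for a[bcd]*b only; returns the matched prefix or None."""
    if not text.startswith("a"):
        return None
    # Greedy phase: consume as many [bcd] characters as possible.
    end = 1
    while end < len(text) and text[end] in "bcd":
        end += 1
    # Backtracking phase: give back one character at a time until 'b' fits.
    # end == 1 corresponds to [bcd]* matching zero characters.
    while end >= 1:
        if end < len(text) and text[end] == "b":
            return text[:end + 1]
        end -= 1
    return None


for sample in ["abcbd", "ab", "acd"]:
    expected = re.match(r"a[bcd]*b", sample)
    print(sample, match_with_backtracking(sample),
          expected.group() if expected else None)
```

For ab the loop only succeeds once [bcd]* has been reduced to zero characters, and for acd every retry fails, so the string does not match at all.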

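The pattern has now matched successfully. This example shows how the engine handles *: when a match fails, it backtracks and patiently retries the rest of the pattern. It keeps going until [bcd]* matches zero characters. If even that fails, the engine reports that the pattern does not match the string.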

