How can I write a regex which matches non-greedily? [duplicate]

Translated from: How can I write a regex which matches non greedy? [duplicate]

This question already has an answer here:

I need help with regular expression matching using the non-greedy option.

The match pattern is:

<img\s.*>

The text to match is:

<html>
<img src="test">
abc
<img
  src="a" src='a' a=b>
</html>

I tested on http://regexpal.com

This expression matches all the text from <img to the last >. I need it to stop at the first > encountered after the initial <img, so here I should get two matches instead of the single one that I get.

I tried all combinations of the non-greedy ?, with no success.


#1

Reference: https://stackoom.com/question/nvTK/如何编写与非贪婪匹配的正则表达式-重复


#2

The non-greedy ? works perfectly fine. You just need to enable the "dot matches all" option in the regex engine you are testing with (regexpal, the engine you used, also has this option). This is because regex engines generally don't let . match line breaks by default; you need to tell them explicitly that . should match line breaks too.

For example,

<img\s.*?>

works fine!

Check the results here.

Also, read about how the dot behaves in various regex flavours.
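
To make the effect of the "dot matches all" option concrete, here is a minimal sketch in Python, which is an assumption of this example since the question itself used regexpal. Python's re module calls the option re.DOTALL. The variable names are illustrative only; the sample text is the one from the question.

```python
import re

# The sample text from the question.
TEXT = """<html>
<img src="test">
abc
<img
  src="a" src='a' a=b>
</html>"""

# Greedy .* with "dot matches all": runs from the first <img
# to the very last > in the input, so there is a single match.
greedy_dotall = re.findall(r"<img\s.*>", TEXT, flags=re.DOTALL)

# Non-greedy .*? with "dot matches all": stops at the first >
# after each <img, so each tag is matched separately.
lazy_dotall = re.findall(r"<img\s.*?>", TEXT, flags=re.DOTALL)

print(len(greedy_dotall))  # 1
print(len(lazy_dotall))    # 2
```

Other engines spell the same option differently (for example as an s flag), so check the documentation of the flavour you use.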


#3

The ? modifier makes a match non-greedy. E.g. .* is greedy while .*? isn't. So you can use something like <img.*?> to match the whole tag. Or <img[^>]*>.

But remember that HTML as a whole can't actually be parsed with regular expressions.


#4

The other answers here presuppose that you have a regex engine which supports non-greedy matching, which is an extension introduced in Perl 5 and widely copied to other modern languages; but it is by no means ubiquitous.

Many older or more conservative languages and editors only support traditional regular expressions, which have no mechanism for controlling the greediness of the repetition operator *: it always matches the longest possible string.

The trick then is to limit what it's allowed to match in the first place. Instead of .* you seem to be looking for

[^>]*

which still matches as many of something as possible; but the something is not just . ("any character"), but instead "any character which isn't >".

Depending on your application, you may or may not want to enable an option to permit "any character" to include newlines.
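
As a rough illustration of the negated character class, the following sketch uses Python's re module (an assumption; the answer is engine-agnostic). In Python a negated class such as [^>] already matches newlines without any flag, so the variant [^\n>] is shown for the case where a tag must stay on one line. SAMPLE is a made-up input, not the text from the question.

```python
import re

# Hypothetical sample: one single-line tag and one tag whose
# attributes are split across two lines.
SAMPLE = '<img src="one">\n<img src="two"\n     alt="2">\n'

# [^>]* is still greedy, but it can never run past the closing >.
# No non-greedy quantifier and no "dot matches all" flag is needed.
tags = re.findall(r"<img\s[^>]*>", SAMPLE)

# Excluding \n as well rejects tags that contain a line break
# between their attributes.
one_line_tags = re.findall(r"<img\s[^\n>]*>", SAMPLE)

print(tags)           # both tags
print(one_line_tags)  # only the single-line tag
```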

Even if your regular expression engine supports non-greedy matching, it's better to spell out what you actually mean. If this is what you mean, you should probably say this, instead of relying on non-greedy matching to (hopefully, probably) Do What I Mean.

For example, a regular expression with a trailing context after the wildcard, like .*?><br/>, will jump over any nested > until it finds the trailing context (here, ><br/>), even if that requires straddling multiple > instances (and newlines, if you let it). By contrast, [^>]*><br/> (or even [^\n>]*><br/> if you have to explicitly disallow newlines) obviously can't and won't do that.
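
The difference is easy to see on a small input. The sketch below again assumes Python's re module, and TEXT is an invented example in which the <img> tag is not directly followed by <br/>.

```python
import re

# Hypothetical input: the <img> tag is followed by other markup,
# and only the closing </i> tag is directly followed by <br/>.
TEXT = '<img src="a"> some text <i>x</i><br/>'

# Non-greedy wildcard: keeps expanding across several > characters
# until the trailing context ><br/> is finally found.
lazy = re.search(r"<img.*?><br/>", TEXT)

# Negated class: cannot cross the first >, so there is no match,
# because the <img> tag itself is not followed by <br/>.
strict = re.search(r"<img[^>]*><br/>", TEXT)

print(lazy.group(0) == TEXT)  # True: the match swallowed the whole input
print(strict)                 # None
```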

Of course, this is still not what you want if you need to cope with <img title="quoted string with > in it" src="other attributes"> and perhaps <img title="nested tags">, but at that point, you should finally give up on using regular expressions for this, like we all told you in the first place.
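
One possible way to "give up on regular expressions" is to hand the input to an HTML parser. The sketch below uses html.parser from the Python standard library, which is only one option and is not named in the answer. ImgCollector and collect_img_attrs are hypothetical helper names introduced for this illustration.

```python
import re
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collects the attributes of every <img> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with quotes removed.
        if tag == "img":
            self.images.append(dict(attrs))


def collect_img_attrs(html):
    """Return one attribute dict per <img> tag found in html."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.images


HTML = '<p><img title="quoted string with > in it" src="other attributes"></p>'

# The negated-class regex stops at the > inside the quoted title.
regex_match = re.search(r"<img\s[^>]*>", HTML).group(0)

# The parser understands quoting and returns the full attribute values.
parsed = collect_img_attrs(HTML)

print(regex_match)
print(parsed)
```

The regex result is cut off in the middle of the title attribute, while the parser reports both attributes intact.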
