A Look at Greedy and Non-Greedy Modes in Python Regular Expressions

Python进阶者

Original article: https://mp.weixin.qq.com/s?__biz=MzU3MzQxMjE2NA==&mid=2247501146&idx=1&sn=92851422cb5b4fc4a009cca47fbce4d1&chksm=fcc08371cbb70a673369ee4fab37f55eaa0d0312b5eb07d1a53e32735f36a56271e5cb4da194&scene=126&&sessionid=0

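Today's quote: "The tide ebbs on the night river under a slanting moon; two or three glimmers of light in the distance, that is Guazhou."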

Hello everyone, I'm Pipi (皮皮).

1. Introduction

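A few days ago, a fan named 【杰】 in the Python最强王者交流群 chat group asked a question about Python regular expressions that came down to greedy versus non-greedy matching. It sparked a lively discussion, so I am sharing it here so we can all learn from it.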

2. The Solution

Here is the answer from the expert 【小王】. Let's start with the example code he provided.

import re

txt = "This is an HTML tag: <head>HEADER</head>. It means the head of the whole HTML document."
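pattern1 = re.compile(r"<.*>")   # greedy: .* grabs as many characters as it can
pattern2 = re.compile(r"<.*?>")  # non-greedy: .*? grabs as few characters as it can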
result1 = re.findall(pattern1, txt)
result2 = re.findall(pattern2, txt)
print(result1)
print(result2)

Running it prints:

['<head>HEADER</head>']
['<head>', '</head>']
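re.findall returns only the matched text. To see exactly where each match starts and ends, a small sketch can use re.finditer and Match.span() instead. The helper name match_spans is hypothetical and introduced only for this illustration. It is not part of the original code.

```python
import re

txt = "This is an HTML tag: <head>HEADER</head>. It means the head of the whole HTML document."


def match_spans(pattern, text):
    """Hypothetical helper: return (span, matched_text) for each non-overlapping match."""
    # re.finditer yields Match objects; m.span() gives the (start, end) indices in text
    return [(m.span(), m.group()) for m in re.finditer(pattern, text)]


for pattern in (r"<.*>", r"<.*?>"):
    print(pattern, match_spans(pattern, txt))
```

With the sample txt, the greedy pattern should report a single span (21, 40) covering the whole <head>HEADER</head>. The non-greedy pattern should report two spans, (21, 27) for <head> and (33, 40) for </head>.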

Here is an explanation of why the two outputs differ:

The goal is to match the HTML tags, that is, the text enclosed between < and >.

pattern1 = re.compile(r"<.*>")
pattern2 = re.compile(r"<.*?>")

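The two patterns differ by only a single ?, yet they behave very differently.

In pattern1, .* is greedy. After the opening <, it first consumes everything up to the end of the line (. does not match a newline). The engine then backtracks one character at a time until the closing > in the pattern can match. So when does it stop? It stops at the last > in the string, which is why <head>HEADER</head> comes back as one single match.

That is how greedy mode works. What about non-greedy mode? In pattern2, .*? is lazy. It starts by matching zero characters and takes one more character only when the > that follows fails to match, so it stops at the first > it reaches. That gives <head>. re.findall then resumes scanning right after that match and finds </head> as a second, separate match.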
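The same ? suffix also works on the other quantifiers, such as + and {m,n}. The sketch below compares greedy and lazy forms on a made-up string "aaaaa", chosen only for illustration. It also shows a negated character class, <[^>]*>. This is a common alternative and not the approach from the original answer: because [^>] can never cross a >, the match stops at the first > without relying on lazy matching at all.

```python
import re

txt = "This is an HTML tag: <head>HEADER</head>. It means the head of the whole HTML document."
s = "aaaaa"  # made-up sample string for the quantifier comparison

# Appending "?" makes any quantifier lazy, not only "*"
results = {
    "a+": re.findall(r"a+", s),            # greedy: one run of all five a's
    "a+?": re.findall(r"a+?", s),          # lazy: one "a" per match
    "a{2,4}": re.findall(r"a{2,4}", s),    # greedy: takes 4, the leftover "a" is too short
    "a{2,4}?": re.findall(r"a{2,4}?", s),  # lazy: takes the minimum of 2 each time
    # A negated character class cannot cross a ">", so it stops at the first ">"
    "<[^>]*>": re.findall(r"<[^>]*>", txt),
}

for pattern, found in results.items():
    print(f"{pattern:10} -> {found}")
```

On this sample, <[^>]*> returns the same two tags as <.*?>. Which one to prefer is mostly a matter of taste and intent.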

Bonus

Here is one more piece of code from 【小王】. It shows how to rewrite the regular expression using a named group.

The conventional version, using a numbered group and the backreference \1, looks like this:

import re

txt = "This is an HTML tag: <head>HEADER</head>. It means the head of the whole HTML document."

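# ([A-Za-z0-9]+) captures the opening tag name as group 1;
# \1 requires the closing tag to repeat exactly that name
tag = re.compile(r"<([A-Za-z0-9]+)>.*?</\1>.*")
print(re.findall(tag, txt))  # findall returns only the captured group: ['head']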
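The \1 backreference means the closing tag must repeat the exact name captured by the opening tag. The sketch below checks this on a made-up HTML string. It also shows what the trailing .* in the pattern above does: it consumes the rest of the line, so findall can report at most one tag pair per line. The variant without the trailing .* is added here only for comparison and is not 【小王】's original.

```python
import re

html = "<b>bold</b> <i>broken</b> <p>text</p>"  # made-up sample input

# Pattern from above: the trailing ".*" swallows the rest of the line after the first pair
with_tail = re.compile(r"<([A-Za-z0-9]+)>.*?</\1>.*")
# Same pattern without the trailing ".*", so findall keeps scanning after each pair
paired = re.compile(r"<([A-Za-z0-9]+)>.*?</\1>")

print(with_tail.findall(html))  # only the first tag name is found
print(paired.findall(html))     # <i> is skipped: \1 demands </i>, but only </b> follows it
```

The first call should find only b. The second should find b and p, while rejecting the mismatched <i>...</b> pair.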

Rewritten with a named group, the same pattern looks like this:

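import re

txt = "This is an HTML tag: <head>HEADER</head>. It means the head of the whole HTML document."

# (?P<tag_mark>...) names the group "tag_mark"; (?P=tag_mark) is a backreference to it by name
tag = re.compile(r"<(?P<tag_mark>[A-Za-z0-9]+)>.*?</(?P=tag_mark)>.*")
print(re.findall(tag, txt))  # also prints ['head']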
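Named groups also make the match object easier to work with, because groups can be looked up by name instead of by position. The following is a brief sketch. It assumes a second named group, content, which is hypothetical and added here only for illustration, and it drops the trailing .* so the tag content can be captured on its own.

```python
import re

txt = "This is an HTML tag: <head>HEADER</head>. It means the head of the whole HTML document."

# "content" is a hypothetical extra named group added for illustration
tag = re.compile(r"<(?P<tag_mark>[A-Za-z0-9]+)>(?P<content>.*?)</(?P=tag_mark)>")

m = tag.search(txt)
print(m.group("tag_mark"))  # look up a group by name instead of by number
print(m.group("content"))
print(m.groupdict())        # all named groups as a dict

# With two groups, findall returns one tuple per match
print(tag.findall(txt))

# In re.sub, \g<name> inserts a named group into the replacement string
print(tag.sub(r"[\g<tag_mark>: \g<content>]", txt))
```

Here groupdict() should give {'tag_mark': 'head', 'content': 'HEADER'}. The re.sub call should rewrite the tag pair as [head: HEADER].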