Python 3.8.1 re module, findall vs. finditer: the apparent findall bug

我就叫陌了这还能重名

Tags: python, regex, regular expressions

Article link: https://blog.csdn.net/weixin_42628449/article/details/108369046

Python 3.8.1 re module: findall and finditer

Conclusion

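Conclusion first: when the pattern contains a capture group, re.findall returns the content of that group rather than the whole match. This looks like a bug at first sight, although it is documented behavior. findall also cannot return detailed match positions, so I recommend using finditer instead.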

How the "bug" shows up

Let's look at this code:

import re

pattern = re.compile(r'(asdf)*')
string = 'asdfasdf'

print(pattern.findall(string))
print(list(pattern.finditer(string)))

Now, guess the result.

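For readers unfamiliar with regular expressions: a regular expression is a notation for matching text against a fixed pattern, refined over time by the countless programmers who use it. The pattern above matches zero or more consecutive occurrences of asdf (* means "zero or more repetitions" and applies to the preceding character, or to the whole parenthesized group that precedes it).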
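To make the scope of * concrete, here is a minimal sketch using re.fullmatch, which succeeds only when the entire string fits the pattern. The variable names are my own and purely illustrative.

```python
import re

pattern = re.compile(r'(asdf)*')

# fullmatch returns a match object only if the whole string fits the pattern.
zero_repeats = pattern.fullmatch('') is not None          # zero copies of asdf
two_repeats = pattern.fullmatch('asdfasdf') is not None   # two copies of asdf
partial = pattern.fullmatch('asdfas') is not None         # trailing 'as' does not fit

# Without parentheses, * applies only to the preceding character 'f'.
char_only = re.fullmatch(r'asdf*', 'asdfff') is not None

print(zero_repeats, two_repeats, partial, char_only)
```

With the parentheses the whole word is repeated; without them only the final f is.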

The official Python documentation describes it as follows (translated from the Chinese edition of the docs, so ignore minor wording issues):

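The '*', '+', and '?' quantifiers are all greedy; they match as much text as possible. Sometimes this behavior is not wanted. If the regular expression <.*> is matched against '<a> b <c>', it will match the entire string, not just '<a>'. Adding ? after the quantifier makes it match in non-greedy or minimal fashion; as few characters as possible will be matched. Using the regular expression <.*?> will match only '<a>'.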
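The example from the documentation can be checked directly. This is a small sketch; the names text, greedy, and lazy are only for illustration.

```python
import re

text = '<a> b <c>'

# Greedy: .* grabs as much as possible, so the match runs to the last '>'.
greedy = re.search(r'<.*>', text).group(0)

# Non-greedy: .*? grabs as little as possible, so the match stops at the first '>'.
lazy = re.search(r'<.*?>', text).group(0)

print(greedy)  # <a> b <c>
print(lazy)    # <a>
```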

According to that description, the output of the code above should be

# ['asdfasdf', '']
# [<re.Match object; span=(0, 8), match='asdfasdf'>, <re.Match object; span=(8, 8), match=''>]

but the actual output is

# ['asdf', '']
# [<re.Match object; span=(0, 8), match='asdfasdf'>, <re.Match object; span=(8, 8), match=''>]

What is going on??
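A likely explanation, based on the documented behavior of findall rather than on anything specific to 3.8.1: when the pattern contains exactly one capture group, findall returns that group's text instead of the whole match, and a repeated group keeps only its last repetition. The sketch below compares the whole match with group 1 and shows two ways to get the full text back from findall (a non-capturing group, or an outer capture group). The variable names are my own.

```python
import re

string = 'asdfasdf'

first = re.search(r'(asdf)*', string)
whole_match = first.group(0)   # the entire matched text
last_repeat = first.group(1)   # last repetition of the group; this is what findall reports

# A non-capturing group (?:...) leaves no groups, so findall returns whole matches.
non_capturing = re.findall(r'(?:asdf)*', string)

# Alternatively, wrap the whole repetition in a single outer capture group.
outer_group = re.findall(r'((?:asdf)*)', string)

print(whole_match, last_repeat)
print(non_capturing)
print(outer_group)
```

finditer sidesteps the issue because each match object exposes both group(0) and the individual groups.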

Switching to non-greedy mode, the pattern and results are as follows:

pattern = re.compile(r'(asdf)*?')

print(pattern.findall(string))
# ['', 'asdf', '', 'asdf', '']
print(list(pattern.finditer(string)))
# [<re.Match object; span=(0, 0), match=''>, <re.Match object; span=(0, 4), match='asdf'>, <re.Match object; span=(4, 4), match=''>, <re.Match object; span=(4, 8), match='asdf'>, <re.Match object; span=(8, 8), match=''>]
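Since the recommendation is to use finditer, here is a minimal sketch of how it can replace findall while also returning positions. find_with_spans is a hypothetical helper of my own, not part of the re module, and the results assume the empty-match handling of Python 3.7 and later (the version used in this post is 3.8.1).

```python
import re

def find_with_spans(pattern, string):
    """Hypothetical helper: return (start, end, whole match text) for every match."""
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, string)]

greedy_hits = find_with_spans(r'(asdf)*', 'asdfasdf')
lazy_hits = find_with_spans(r'(asdf)*?', 'asdfasdf')

print(greedy_hits)
print(lazy_hits)
```

Using m.group(0) always yields the whole match, regardless of how many capture groups the pattern contains.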
