I think the core of the problem boils
down to: Is there a Python RegEx
notation that e.g. involves curly
braces repetitions and allows me to
capture 'some string, another string,
' ?
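I don't think there is such a notation.
But regexes are not only a matter of NOTATION, that is to say the RE string used to define a regex. They are also a matter of TOOLS, that is to say the functions.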
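To illustrate why the notation alone does not help, here is a minimal sketch, assuming the sample string quoted from the question. In Python's re module, a capturing group placed inside a repetition such as {2} matches every repetition, but only the text of the last one stays available through group(1). The variable names are mine, chosen for the illustration.

```python
import re

# Sample string taken from the question.
text = 'some string, another string, '

# The inner capturing group is repeated twice by {2}.
mat = re.match(r'(?:(.+?), *){2}', text)

# group(0) is the whole match; group(1) keeps only the LAST repetition.
whole = mat.group(0)
last_only = mat.group(1)

print(repr(whole))
print(repr(last_only))
```

The first captured item is overwritten by the second, which is why a function such as findall() or finditer() is needed to collect every item.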
Unfortunately, I can't use findall as
the string from the initial question
is only a part of the problem, the
real string is a lot longer, so
findall only works if I do multiple
regex findalls / matches / searches.
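You should have given this information from the start: we would have understood the constraints sooner. Because in my opinion, for the problem as you described it, findall() is indeed fine: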
import re
for line in ('string one, string two, ',
             'some string, another string, third string, ',
             # the following two lines are only one string
             'Topaz, Turquoise, Moss Agate, Obsidian, '
             'Tigers-Eye, Tourmaline, Lapis Lazuli, '):
    # lazily capture each item up to the next comma and optional spaces
    print(re.findall(r'(.+?), *', line))
Result:
['string one', 'string two']
['some string', 'another string', 'third string']
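['Topaz', 'Turquoise', 'Moss Agate', 'Obsidian', 'Tigers-Eye', 'Tourmaline', 'Lapis Lazuli']
Now, since you have "omitted a lot of complexity" in your question, findall() may turn out to be insufficient to handle that complexity. In that case finditer() can be used, because it allows more flexibility in selecting the groups of each match: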
import re
for line in ('string one, string two, ',
             'some string, another string, third string, ',
             # the following two lines are only one string
             'Topaz, Turquoise, Moss Agate, Obsidian, '
             'Tigers-Eye, Tourmaline, Lapis Lazuli, '):
    # each mat is a match object, so any of its groups can be selected
    print([mat.group(1) for mat in re.finditer(r'(.+?), *', line)])
This gives the same result, and it can be made more elaborate by writing other expressions in place of mat.group(1).
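As a rough sketch of what "other expressions" could look like, the example below reuses the same pattern with a named group and pairs each item with the offset where it starts. The group name item and the shortened sample line are assumptions made for this illustration, not part of the question.

```python
import re

# Shortened sample line, assumed for this illustration.
line = 'Topaz, Turquoise, Moss Agate, '

# Same pattern as before, with the group given a (hypothetical) name.
pattern = re.compile(r'(?P<item>.+?), *')

# Any expression built on the match object can replace mat.group(1):
# here each item is upper-cased and paired with its start offset.
items = [(mat.group('item').upper(), mat.start('item'))
         for mat in pattern.finditer(line)]

print(items)
# [('TOPAZ', 0), ('TURQUOISE', 7), ('MOSS AGATE', 18)]
```

Because finditer() yields match objects rather than plain strings, positions, several groups, or named groups are all reachable in one pass over the string.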