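If you can use the alternative regex module, you can do this with a single regular expression. It is complex and hard to read, but it handles dangling braces correctly.
The regex module gives access to all previous matches of a capture group, not just the last one, which is essential for the following to work: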
>>> import regex
>>> # The regex behavior version seems to make no difference in this case, so both '(?V0)...' and '(?V1)...' will work.
>>> pattern = r'(?V0)[{] (?P<u>\s+)? (?: (?: [^\s}]+ (?P<u>\s+) )* [^\s}]+ (?P<u>\s+)? )? [}]'
>>> string = 'abc and 123 {foo-bar bar baz } bit {yummi tummie} byte.'
>>> [s for m in regex.finditer(pattern, string, regex.VERBOSE) for s in m.captures('u')]
[' ', ' ', ' ', ' ']
Put simply, this regex finds matches of the form '{' blanks? ((nonblanks blanks)* nonblanks blanks?)? '}' and assigns every blank part to the same capture group, named u ((?P<u>...)).
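To illustrate why the regex module matters here, the following is a minimal sketch of how the standard library's re module behaves with a comparable pattern. The simplified pattern and the sample string are made up for illustration and are not part of the answer above. It shows two things: re keeps only the last capture of a repeated group, and re rejects a pattern that reuses a group name.

```python
import re

# Hypothetical sample: the first gap is two spaces, the second gap is one space.
sample = "{a  b c}"

# Simplified stand-in for the pattern above, with one numbered group
# inside a repetition. Each iteration overwrites the previous capture.
stdlib_pattern = re.compile(r"\{(?:[^\s}]+(\s+))*[^\s}]+\}")

match = stdlib_pattern.fullmatch(sample)
last_only = match.group(1)
print(repr(last_only))  # only the final gap survives; the two-space gap is lost

# The standard library also refuses duplicate group names, which the
# single-regex approach relies on (three groups all named u).
try:
    re.compile(r"(?P<u>a)(?P<u>b)")
    duplicate_names_allowed = True
except re.error:
    duplicate_names_allowed = False
print(duplicate_names_allowed)
```

With regex instead of re, m.captures('u') returns every blank run that the group matched, which is what the list comprehensions in this answer collect.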
The pattern also works for strings containing unmatched { and }:
>>> # Even works with dangling braces:
>>> badstring = '}oo} { ab a b}} xy {xy x y}cd {{ cd } e{e }f{ f} { }{} }{'
>>> # Fully flattened result:
>>> [s for m in regex.finditer(pattern, badstring, regex.VERBOSE) for s in m.captures('u')]
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
>>> # Less flattened (e.g. for verification):
>>> [v for m in regex.finditer(pattern, badstring, regex.VERBOSE) for v in m.capturesdict().values()]
[[' ', ' ', ' '], [' ', ' '], [' ', ' '], [' '], [' '], [' '], []]
Tested with Python 3.5.1 x64, regex 2016.3.2.