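As I said above, regex is primarily a linear, single-rule engine: you can choose between greedy and non-greedy capture, but you can't have both at once. In addition, most regex engines don't support overlapping matches (and even those that do tend to fake it with substrings or by forcibly advancing the match position), because overlapping matches don't fit the regex philosophy either.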
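To illustrate this with Python's standard `re` module, here is a small sketch using the same sample string as the example further down. Greedy and non-greedy patterns each return only one kind of match, and the common zero-width lookahead workaround `(?=(...))` still yields at most one match per starting position. None of these produces the full list `['BA', 'BADA', 'BADACBA', 'BA']`.

```python
import re

text = "BADACBA"

# Non-greedy: stops at the first "A" after each "B", then resumes scanning
# after that match, so the matches never overlap.
lazy = re.findall(r"B.*?A", text)  # ['BA', 'BA']

# Greedy: extends to the last reachable "A", again without overlaps.
greedy = re.findall(r"B.*A", text)  # ['BADACBA']

# Lookahead workaround: the capture group inside (?=...) records a match
# without consuming input, so the engine moves forward one position at a time.
# It still finds at most one match per starting position.
lazy_overlap = re.findall(r"(?=(B.*?A))", text)  # ['BA', 'BA']
greedy_overlap = re.findall(r"(?=(B.*A))", text)  # ['BADACBA', 'BA']

print(lazy, greedy, lazy_overlap, greedy_overlap)
```

The lookahead version is the sort of "forced position advancing" described above. It gets closer, but it still can't return both the shortest and the longest match that begin at the same `B`.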
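If all you need is simple overlapping matches of substrings that start with one string and end with another, you can implement it yourself: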
def find_substrings(data, start, end):
    result = []
    s_len = len(start)  # a shortcut for `start` length
    e_len = len(end)  # a shortcut for `end` length
    current_pos = data.find(start)  # find the first occurrence of `start`
    while current_pos != -1:  # loop while we can find `start` in our data
        # find the first occurrence of `end` after the current occurrence of `start`
        end_pos = data.find(end, current_pos + s_len)
        while end_pos != -1:  # loop while we can find `end` after the current `start`
            end_pos += e_len  # just so we include the selected substring
            result.append(data[current_pos:end_pos])  # add the current substring
            end_pos = data.find(end, end_pos)  # find the next `end` after the current `start`
        current_pos = data.find(start, current_pos + s_len)  # find the next `start`
    return result
Which produces:
substrings = find_substrings("BADACBA", "B", "A")
# ['BA', 'BADA', 'BADACBA', 'BA']
But you'll have to modify it for more complex matching.
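As one possible modification, here is a sketch that assumes you also want to catch delimiter occurrences that overlap each other, which matters for multi-character delimiters such as `"AA"`. The name `find_substrings_overlapping` is hypothetical and is not part of the answer above. The only change is that each search resumes one character past the previous hit rather than past the whole delimiter.

```python
def find_substrings_overlapping(data, start, end):
    """Hypothetical variant of find_substrings that also finds `start`/`end`
    occurrences overlapping each other (e.g. "AA" inside "AAAAA")."""
    result = []
    s_pos = data.find(start)  # first occurrence of `start`
    while s_pos != -1:
        # `end` must begin after the current `start` has finished
        e_pos = data.find(end, s_pos + len(start))
        while e_pos != -1:
            result.append(data[s_pos:e_pos + len(end)])  # include `end` itself
            e_pos = data.find(end, e_pos + 1)  # step by one char to allow overlapping `end`s
        s_pos = data.find(start, s_pos + 1)  # step by one char to allow overlapping `start`s
    return result


print(find_substrings_overlapping("BADACBA", "B", "A"))  # ['BA', 'BADA', 'BADACBA', 'BA']
print(find_substrings_overlapping("AAAAA", "AA", "AA"))  # ['AAAA', 'AAAAA', 'AAAA']
```

With single-character delimiters it gives the same result as `find_substrings`. With `"AAAAA"` and `"AA"`/`"AA"`, the original version returns only `['AAAA']`, because it skips `end` and `start` positions that overlap the previous hit.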