Overlapping matches with finditer() in Python

I am using a regex to match Bible verse references in a text. The current regex is:

import re

REF_REGEX = re.compile(r'''
    (?<!\w)                          # Not preceded by a word character
    (?P<quote>q(?:uote)?\s+)?        # Match optional 'q' or 'quote' followed by one or more spaces
    (?P<book>
        (?:(?:[1-3]|I{1,3})\s*)?     # Match an optional arabic or roman number between 1 and 3
        [A-Za-z]+                    # Match any alphabetics
    )\.?                             # Followed by an optional dot
    (?:
        \s*(?P<chapter>\d+)          # Match the chapter number
        (?:
            [:\.](?P<startverse>\d+) # Match the starting verse number, preceded by ':' or '.'
            (?:-(?P<endverse>\d+))?  # Match the optional ending verse number, preceded by '-'
        )?                           # Verse numbers are optional
    )
    (?:
        \s+(?:                       # Here be spaces
            (?:from\s+)|(?:in\s+)|(?P<lbrace>\()) # Match 'from[:space:]', 'in[:space:]' or '('
        \s*(?P<version>\w+)          # Match a word preceded by optional spaces
        (?(lbrace)\))                # Close the '(' if found earlier
    )?                               # The whole 'in|from|()' is optional
''', re.IGNORECASE | re.VERBOSE | re.UNICODE)

It matches the following expressions fine:

"jn 3:16": (None, 'jn', '3', '16', None, None, None),
"matt. 18:21-22": (None, 'matt', '18', '21', '22', None, None),
"q matt. 18:21-22": ('q ', 'matt', '18', '21', '22', None, None),
"QuOTe jn 3:16": ('QuOTe ', 'jn', '3', '16', None, None, None),
"q 1co13:1": ('q ', '1co', '13', '1', None, None, None),
"q 1 co 13:1": ('q ', '1 co', '13', '1', None, None, None),
"quote 1 co 13:1": ('quote ', '1 co', '13', '1', None, None, None),
"quote 1co13:1": ('quote ', '1co', '13', '1', None, None, None),
"jean 3:18 (PDV)": (None, 'jean', '3', '18', None, '(', 'PDV'),
"quote malachie 1.1-2 fRom Colombe": ('quote ', 'malachie', '1', '1', '2', None, 'Colombe'),
"quote malachie 1.1-2 In Colombe": ('quote ', 'malachie', '1', '1', '2', None, 'Colombe'),
"cinq jn 3:16 (test)": (None, 'jn', '3', '16', None, '(', 'test'),
"Q IIKings5.13-58 from wolof": ('Q ', 'IIKings', '5', '13', '58', None, 'wolof'),
"This text is about lv5.4-6 in KJV only": (None, 'lv', '5', '4', '6', None, 'KJV'),

But it fails to parse:

"Found in 2 Cor. 5:18-21 ( Ministers": (None, '2 Cor', '5', '18', '21', None, None),

because it returns (None, 'in', '2', None, None, None, None) instead.
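
The behaviour can be reproduced with a cut-down pattern. The sketch below keeps only the book, chapter and verse parts; `SIMPLE_REF` is a name made up for this illustration and is an assumption, not the full regex. It suggests that finditer() consumes "in 2" first, so the next match can only start at "Cor" and the leading "2" is lost.

```python
import re

# Hypothetical, simplified stand-in for the full regex:
# book, chapter and optional verse range only.
SIMPLE_REF = re.compile(r'''
    (?<!\w)                           # not preceded by a word character
    (?P<book>(?:[1-3]\s*)?[A-Za-z]+)  # optional leading 1-3, then letters
    \.?                               # optional dot
    \s*(?P<chapter>\d+)               # chapter number
    (?:[:.](?P<startverse>\d+)        # optional starting verse
       (?:-(?P<endverse>\d+))?)?      # optional ending verse
''', re.IGNORECASE | re.VERBOSE)

text = "Found in 2 Cor. 5:18-21 ( Ministers"

# finditer() yields non-overlapping matches, scanning left to right.
matches = [m.group(0) for m in SIMPLE_REF.finditer(text)]
print(matches)  # ['in 2', 'Cor. 5:18-21']
```

Because "in" is a valid run of letters and "2" a valid chapter number, the word "in" is taken as the book name before "2 Cor" is ever tried.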

Is there a way to make finditer() return all matches, even if they overlap, or is there a way to improve my regex so that it matches this last case correctly?
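
For the first option, one technique that might apply is to wrap the pattern in a zero-width lookahead, `(?=(...))`, so that each match consumes no text and candidates starting at different positions can overlap. The following is only a rough sketch under assumptions: it uses a simplified pattern rather than the full regex, the names `OVERLAP_REF` and `overlapping_refs` are invented here, and picking the longest candidate is a naive heuristic rather than a proven rule.

```python
import re

# Hypothetical simplified pattern wrapped in a lookahead. Group 1 holds the
# text that a normal (consuming) match would have covered.
OVERLAP_REF = re.compile(r'''
    (?<!\w)                                   # not preceded by a word character
    (?=(                                      # zero-width: nothing is consumed
        (?P<book>(?:[1-3]\s*)?[A-Za-z]+)\.?   # book name, optional dot
        \s*(?P<chapter>\d+)                   # chapter number
        (?:[:.](?P<startverse>\d+)(?:-(?P<endverse>\d+))?)?  # optional verses
    ))
''', re.IGNORECASE | re.VERBOSE)


def overlapping_refs(text):
    """Return every candidate reference, including overlapping ones."""
    return [m.group(1) for m in OVERLAP_REF.finditer(text)]


candidates = overlapping_refs("Found in 2 Cor. 5:18-21 ( Ministers")
print(candidates)  # ['in 2', '2 Cor. 5:18-21', 'Cor. 5:18-21']

# Naive assumption: the longest candidate is the most complete reference.
best = max(candidates, key=len)
print(best)  # 2 Cor. 5:18-21
```

The overlapping candidates still need some filtering step afterwards, since "in 2" is returned alongside the reference that is actually wanted.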

Thanks.
