Backreferences to matched groups in Python regular expressions

Latest recommended article published 2024-02-27 06:45:00

caimouse

Category: milang (小语). Tags: python, tensorflow, regular expressions

Article link: https://blog.csdn.net/caimouse/article/details/78533174

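Earlier we covered group matching: whatever a pair of parentheses encloses is a group. A slightly more complex regular expression such as (1)(2)(3) therefore contains three groups. What if, inside the same expression, you want to refer to the text that an earlier group already matched? The simplest way is to reference the group by its number, as in (1)(2)(3)\1. A backreference is written with the "\num" syntax, as the following example shows.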
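Before the full example, here is a minimal sketch of the "\num" syntax on its own. It looks for a word that is immediately repeated; the pattern, the helper name find_doubled_words and the sample sentence are made up for illustration and are not part of the original article.

```python
import re

# (\w+) captures a word as group 1; \1 then requires exactly the same
# text to appear again right after the whitespace.
doubled = re.compile(r'\b(\w+)\s+\1\b')


def find_doubled_words(text):
    """Return every word that is immediately repeated in text."""
    return [m.group(1) for m in doubled.finditer(text)]


print(find_doubled_words('this is is a test test of the the pattern'))
```

Note that \1 matches the text the group captured, not the group's pattern again: (\w+)\s+\1 accepts "is is" but not "is a".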

#python 3.6
#蔡军生 
#http://blog.csdn.net/caimouse/article/details/51749579
#
import re

address = re.compile(
    r'''

    # The regular name
    (\w+)               # first name
    \s+
    (([\w.]+)\s+)?      # optional middle name or initial
    (\w+)               # last name

    \s+

    <

    # The address: first_name.last_name@domain.tld
    (?P<email>
      \1               # first name
      \.
      \4               # last name
      @
      ([\w.]+\.)+      # domain name prefix (\w already includes digits)
      (com|org|edu)    # limit the allowed top-level domains
    )

    >
    ''',
    re.VERBOSE | re.IGNORECASE)

candidates = [
    'First Last <first.last@example.com>',
    'Different Name <first.last@example.com>',
    'First Middle Last <first.last@example.com>',
    'First M. Last <first.last@example.com>',
]

for candidate in candidates:
    print('Candidate:', candidate)
    match = address.search(candidate)
    if match:
        print('  Match name :', match.group(1), match.group(4))
        print('  Match email:', match.group(5))
    else:
        print('  No match')

The output is as follows:

Candidate: First Last <first.last@example.com>
  Match name : First Last
  Match email: first.last@example.com
Candidate: Different Name <first.last@example.com>
  No match
Candidate: First Middle Last <first.last@example.com>
  Match name : First Last
  Match email: first.last@example.com
Candidate: First M. Last <first.last@example.com>
  Match name : First Last
  Match email: first.last@example.com
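The last name is group 4 rather than group 2 because groups are numbered by the position of their opening parenthesis, and the optional middle name uses two nested groups. The sketch below checks this using only the name part of the pattern; the helper name numbered_groups is hypothetical.

```python
import re

# Only the name part of the pattern, without the email check.
name = re.compile(r'(\w+)\s+(([\w.]+)\s+)?(\w+)')


def numbered_groups(text):
    """Return the captured groups in order: index 0 holds group 1, and so on."""
    return name.match(text).groups()


# Print each group number next to the text it captured.
for number, value in enumerate(numbered_groups('First M. Last'), start=1):
    print(number, repr(value))
```

When the middle name is absent, groups 2 and 3 are None, but the last name is still group 4, so \4 keeps pointing at the same part of the name.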

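This example refers back to the values of group 1 (first name) and group 4 (last name) inside the email part of the pattern, so any entry whose email address does not agree with the name in front of it is discarded.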
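Counting parentheses to find the right number becomes error-prone as a pattern grows. As a possible variation, not taken from the original article, Python's re module also lets a named group be referenced with (?P=name). The sketch below assumes the group names first_name and last_name and a hypothetical helper matched_email, and makes the remaining groups non-capturing with (?:...).

```python
import re

address = re.compile(
    r'''
    (?P<first_name>\w+)       # first name
    \s+
    (?:[\w.]+\s+)?            # optional middle name, not captured
    (?P<last_name>\w+)        # last name
    \s+
    <
    (?P<email>
      (?P=first_name)         # same text as the first name
      \.
      (?P=last_name)          # same text as the last name
      @
      (?:[\w.]+\.)+           # domain name prefix
      (?:com|org|edu)         # allowed top-level domains
    )
    >
    ''',
    re.VERBOSE | re.IGNORECASE)


def matched_email(candidate):
    """Return the email if it agrees with the name, otherwise None."""
    match = address.search(candidate)
    return match.group('email') if match else None


for candidate in ['First M. Last <first.last@example.com>',
                  'Different Name <first.last@example.com>']:
    print(candidate, '->', matched_email(candidate))
```

With names in place, adding or removing a group elsewhere in the pattern no longer shifts the references.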