Python's built-in module re, part 2

Python研究所

Article link: https://blog.csdn.net/qq_40442753/article/details/109717371

Continuing from the previous post, this part carries on with the introduction to Python's built-in re module.

4. Built-in objects

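SRE_Match: the match object returned by a successful search (exposed as re.Match in Python 3.7+). Its attributes include the start and end positions of the searched region, the name and index of the last matched group, the spans of the match and its groups, the compiled pattern, and the string that was searched.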

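import re

s = 'Hello, Mr.Gumby : 2016/10/26'
m = re.search(r', (?P<name>\w+\.\w+).*?(\d+)', s)

# end index of the searched region (defaults to len(s))
print(m.endpos)
# output> 28

# name of the last matched group
# None here, because the last group in this match has no name
print(m.lastgroup)
# output> None

# index of the last matched group
print(m.lastindex)
# output> 2

# start index of the searched region
print(m.pos)
# output> 0

# the compiled pattern (SRE_Pattern) used for this search
print(m.re)
# output> re.compile(', (?P<name>\\w+\\.\\w+).*?(\\d+)')

# tuple of (start, end) pairs: the first pair is the span of the whole match
# (which starts at the ", " at index 5), followed by one pair per group
print(m.regs)
# output> ((5, 22), (7, 15), (18, 22))

# the string that was searched
print(m.string)
# output> Hello, Mr.Gumby : 2016/10/26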
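The regs attribute used above is a CPython implementation detail that the official re documentation does not list. A minimal sketch, assuming Python 3, of reading the same positions through the documented span(), start() and end() methods:

```python
import re

s = 'Hello, Mr.Gumby : 2016/10/26'
m = re.search(r', (?P<name>\w+\.\w+).*?(\d+)', s)

# span(i) returns (start, end) for group i; group 0 is the whole match,
# and m.re.groups is the number of capturing groups in the pattern
spans = tuple(m.span(i) for i in range(m.re.groups + 1))
print(spans)
# output> ((5, 22), (7, 15), (18, 22))

# start() and end() also accept a group name
print(m.start('name'), m.end('name'))
# output> 7 15

# slicing the searched string with a group's start/end gives the group's text
print(s[m.start(2):m.end(2)])
# output> 2016
```

Using span() keeps the code on the documented API, so it does not depend on how a particular interpreter implements match objects.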

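SRE_Pattern: the compiled regular expression object (exposed as re.Pattern in Python 3.7+). Its attributes include the flags, the mapping from group names to group numbers, the number of capturing groups, and the original pattern string.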

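import re

s = 'Hello, Mr.Gumby : 2016/10/26'
p = re.compile(r'''(?:                  # non-capturing group, so that | applies inside it
              (?P<name>\w+\.\w+)    # matches Mr.Gumby
              |                     # or
              (?P<no>\s+\.\w+)      # a named group that does not match in s
              )
              .*?                   # matches " : "
              (\d+)                 # matches 2016
              ''', re.X)

# flags: re.VERBOSE is 64; Python 3 also adds re.UNICODE (32) for str patterns
print(p.flags)
# output> 96
# mapping from group names to group numbers
print(dict(p.groupindex))
# output> {'name': 1, 'no': 2}
# number of capturing groups
print(p.groups)
# output> 3
# the original pattern string, including whitespace and comments
print(p.pattern)
# output> (?:                  # non-capturing group, so that | applies inside it
#               (?P<name>\w+\.\w+)    # matches Mr.Gumby
#               |                     # or
#               (?P<no>\s+\.\w+)      # a named group that does not match in s
#               )
#               .*?                   # matches " : "
#               (\d+)                 # matches 2016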
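Pattern objects also accept optional pos and endpos arguments in search() and match(); these are the values that end up in the match object's pos and endpos attributes. Because flags is a bit field, a single flag is best tested with a bitwise AND. A small sketch, assuming Python 3 (the positions 23 and 28 are chosen only for illustration):

```python
import re

s = 'Hello, Mr.Gumby : 2016/10/26'
p = re.compile(r'\d+', re.X)

# search only s[23:28], i.e. '10/26'; the first run of digits there is '10'
m = p.search(s, 23, 28)
print(m.group(), m.pos, m.endpos)
# output> 10 23 28

# test individual flags with &
print(bool(p.flags & re.VERBOSE), bool(p.flags & re.IGNORECASE))
# output> True False

# for a str pattern, Python 3 stores VERBOSE plus the implicit UNICODE flag
print(p.flags == re.VERBOSE | re.UNICODE)
# output> True
```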

5. Using groups

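Python regular expressions use parentheses "(" to form groups. Groups are numbered in the order in which their opening "(" appears, starting from 1 (group 0 is the whole match). A group can be accessed either by its index or, if it has one, by its name.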

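import re

s = 'Hello, Mr.Gumby : 2016/10/26'
# (?#comment) is an inline comment and is ignored by the regex engine
p = re.compile(r"(?P<name>\w+\.\w+).*?(\d+)(?#comment)")
m = p.search(s)

# access by group name
print(m.group('name'))
# output> Mr.Gumby
# access by group index
print(m.group(2))
# output> 2016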
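To make the numbering rule concrete: with nested groups, the outer group's "(" comes first, so it gets the lower number. The sketch below (sample strings made up for illustration) also shows group() with several arguments and groupdict(), both part of the standard match object API:

```python
import re

# the outer group opens first, so it is group 1; the nested groups are 2 and 3
date_match = re.search(r'((\d+)/(\d+))', 'date: 2016/10/26')
print(date_match.group(1), date_match.group(2), date_match.group(3))
# output> 2016/10 2016 10

# group() with several arguments returns a tuple
print(date_match.group(0, 2))
# output> ('2016/10', '2016')

# groupdict() maps every named group to the text it captured
name_match = re.search(r'(?P<name>\w+\.\w+).*?(?P<year>\d+)',
                       'Hello, Mr.Gumby : 2016/10/26')
print(name_match.groupdict())
# output> {'name': 'Mr.Gumby', 'year': '2016'}
```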

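Sometimes you only need parentheses to group part of a pattern (for example, to limit the scope of |) without capturing what it matches. In that case, use a non-capturing group, written (?:...).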

import re

s = 'Hello, Mr.Gumby : 2016/10/26'
p = re.compile(r"""
                (?:  # non-capturing group, used so that | applies inside it
                    (?P<name>\w+\.\w+)
                    |
                    (\d+/)
                )
                """, re.X)
m = p.search(s)
# the non-capturing group is not counted in the pattern's group count
print(p.groups)
# output> 2

# nor does it appear among the match's groups
print(m.groups())
# output> ('Mr.Gumby', None)
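One place where the difference is easy to see is re.findall(): when a pattern contains capturing groups, findall() returns the groups' text instead of the whole match. A small sketch (the sample string is made up for illustration):

```python
import re

s = 'Hello, Mr.Gumby and Ms.Smith'

# capturing group: findall returns only what (Mr|Ms) captured
capturing = re.findall(r'(Mr|Ms)\.\w+', s)
print(capturing)
# output> ['Mr', 'Ms']

# non-capturing group: | is still limited to Mr/Ms, but findall returns whole matches
non_capturing = re.findall(r'(?:Mr|Ms)\.\w+', s)
print(non_capturing)
# output> ['Mr.Gumby', 'Ms.Smith']
```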

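If something needs to appear more than once in a pattern, you can use a backreference to an earlier group. Note that a backreference matches the text the group actually captured, not the group's regular expression, and the backreference itself is not a group, so it does not add to the group count.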

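import re

s = 'Hello, Mr.Gumby : 2016/2016/26'
p = re.compile(r"""
                (?:  # non-capturing group, used so that | applies inside it
                    (?P<name>\w+\.\w+)
                    |
                    (\d+/)
                )
                .*?(?P<number>\d+)/(?P=number)/   # (?P=number) repeats the text captured by the group named number
                """, re.X)
m = p.search(s)
# the backreference (?P=number) is not a group, so the count is 3:
# name, the unnamed (\d+/) group, and number
print(p.groups)
# output> 3

# likewise, m.groups() has only three entries
print(m.groups())
# output> ('Mr.Gumby', None, '2016')

# the whole matched text
print(m.group())
# output> Mr.Gumby : 2016/2016/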
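Numbered groups can be referenced as \1, \2 and so on. The sketch below shows that the reference must repeat the captured text exactly, even where the group's own pattern would match something else. It also shows a related use, \g<name> in an re.sub() replacement string; the date-reordering pattern is only an illustration:

```python
import re

# \1 must repeat exactly what (\d+) captured
p = re.compile(r'(\d+)/\1')
print(p.search('2016/2016') is not None)   # the same digits repeated
# output> True
print(p.search('2016/10') is not None)     # \d+ could match '10', but \1 cannot
# output> False

# in a replacement string, \g<name> inserts the text of a named group
reordered = re.sub(r'(?P<y>\d+)/(?P<m>\d+)/(?P<d>\d+)', r'\g<d>.\g<m>.\g<y>',
                   '2016/10/26')
print(reordered)
# output> 26.10.2016
```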

6. Lookaround

Lookarounds also go by other names, such as delimiters, assertions or pre-searches; the terminology varies.

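A lookaround is a special regex construct that matches a position rather than characters. It uses a sub-pattern to state what should, or should not, appear to the left or right of a position, and the engine then looks for a position that satisfies it. Because it consumes no characters, the text it checks is not part of the match.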

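There are four lookaround forms (see the metacharacters in section 1): (?=...) lookahead, (?!...) negative lookahead, (?<=...) lookbehind and (?<!...) negative lookbehind. Basic usage: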

import re

s = 'Hello, Mr.Gumby : 2016/10/26  Hello,r.Gumby : 2016/10/26'

# without a lookaround
print(re.compile(r"(?P<name>\w+\.\w+)").findall(s))
# output> ['Mr.Gumby', 'r.Gumby']

# lookbehind: the text to the left of the position is "Hello, "
print(re.compile(r"(?<=Hello, )(?P<name>\w+\.\w+)").findall(s))
# output> ['Mr.Gumby']

# negative lookbehind: the character to the left of the position is not ","
print(re.compile(r"(?<!,)(?P<name>\w+\.\w+)").findall(s))
# output> ['Mr.Gumby']

# lookahead: the character to the right of the position is "M"
print(re.compile(r"(?=M)(?P<name>\w+\.\w+)").findall(s))
# output> ['Mr.Gumby']

# negative lookahead: the character to the right of the position is not "r"
print(re.compile(r"(?!r)(?P<name>\w+\.\w+)").findall(s))
# output> ['Mr.Gumby']
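Because a lookaround matches a position and consumes nothing, a pattern made only of lookarounds matches empty strings, and re.sub() can use it to insert text at those positions. A common illustration is adding thousands separators; the function name add_commas is hypothetical, and the sketch assumes the input is a string of digits only. The last lines show a Python-specific restriction: a lookbehind must have a fixed width, so a pattern like (?<=\d+) is rejected with re.error.

```python
import re

def add_commas(digits: str) -> str:
    # hypothetical helper: match empty positions that have a digit on the left
    # (?<=\d) and a multiple of three digits up to the end on the right
    # (?=(?:\d{3})+$); nothing is consumed, so sub() inserts ',' at each one
    return re.sub(r'(?<=\d)(?=(?:\d{3})+$)', ',', digits)

print(add_commas('1234567'))
# output> 1,234,567
print(add_commas('100'))
# output> 100

# a variable-width lookbehind is not supported by Python's re module
try:
    re.compile(r'(?<=\d+)x')
except re.error as exc:
    print('re.error:', exc)
```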