Python Regular Expressions: Working with Groups

人气小姜

Article link: https://blog.csdn.net/windyJ809/article/details/116451732

The group() method

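Tip: group(n) with n >= 1 can only extract a sub-match if the pattern contains () capturing groups. Calling group() with no argument (or group(0)) always returns the whole match.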

🌰 Example 1

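import re

# Extract the area code and the local number from a phone number
phone = '029-85860577'

result = re.match(r'(\d{3}|\d{4})-(\d{8})$', phone)  # '$' anchors the match at the end of the string
print(result)

# Extract each part separately
print(result.group())   # with no argument, group() returns the whole match
print(result.group(1))  # contents of the first pair of parentheses
print(result.group(2))  # contents of the second pair of parentheses

# Output:
# <re.Match object; span=(0, 12), match='029-85860577'>
# 029-85860577
# 029
# 85860577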
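
As a small supplementary sketch (the variable names `whole`, `area_code`, `number`, and `all_groups` are illustrative and not from the original), `group()` also accepts several indexes at once, and `groups()` returns every captured group as a tuple:

```python
import re

phone = '029-85860577'
result = re.match(r'(\d{3}|\d{4})-(\d{8})$', phone)

# group(0) is equivalent to group(): the whole match
whole = result.group(0)

# Passing several indexes returns a tuple of those groups
area_code, number = result.group(1, 2)

# groups() returns all captured groups, in order
all_groups = result.groups()

print(whole)
print(area_code, number)
print(all_groups)
```

Tuple unpacking keeps the call site short when every group is needed at once.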

🌰 Example 2

import re

# Extract the text inside an HTML tag
msg = "<html>hello</html>"

result = re.match(r'<[0-9a-zA-Z]+>(.+)</[0-9a-zA-Z]+>', msg)
print(result.group())   # the whole match
print(result.group(1))  # the text between the tags

# Output:
# <html>hello</html>
# hello

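\number: a backreference that matches the text captured by group number. Groups are numbered from 1, and Python's re module supports \1 through \99.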

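The pattern in Example 2 above would also match msg = "<h1>hello</html>". An HTML opening tag normally has to be paired with a closing tag of the same name, so that pattern is too permissive. This is the problem that \number solves.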
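
The following sketch makes the problem concrete. It assumes the Example 2 pattern with its closing tag written as `</[0-9a-zA-Z]+>`; the names `loose` and `mismatched` are illustrative. The mismatched input is still accepted, because nothing ties the closing tag name to the opening one:

```python
import re

# Example 2 pattern: opening and closing tag names are matched independently
loose = r'<[0-9a-zA-Z]+>(.+)</[0-9a-zA-Z]+>'

mismatched = re.match(loose, '<h1>hello</html>')

print(mismatched.group())   # the whole string is accepted
print(mismatched.group(1))  # the text between the tags
```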

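To reject input such as msg = "<h1>hello</html>", wrap the tag name that has to repeat in (), then use \number later in the pattern to require exactly the same text again, as in the example below.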

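import re

msg = "<h1>hello</html>"

# Capture only the tag name (without the angle brackets) so that \1 can repeat it in the closing tag
result = re.match(r'<([0-9a-zA-Z]+)>(.+)</\1>$', msg)  # backreference syntax: \ followed by the group number
print(result)

# Output:
# None
# <h1> and </html> do not pair up, so the pattern does not match.
# re.match returns None here, and calling result.group() would raise AttributeError.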


msg1 = "<html>hello</html>"

result = re.match(r'<([0-9a-zA-Z]+)>(.+)</\1>$', msg1)
print(result.group())   # the whole match
print(result.group(2))  # the text between the tags

# Output:
# <html>hello</html>
# hello

The same approach works when several nested tags each need a matching closing tag:

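import re

msg_f = '<html><h1>hello</html></h1>'  # closing tags in the wrong order
msg_t = '<html><h1>hello</h1></html>'  # correctly nested

# \2 must repeat the inner tag name, \1 the outer tag name
pattern = r'<([0-9a-zA-Z]+)><([0-9a-zA-Z]+)>(.+)</\2></\1>$'
result1 = re.match(pattern, msg_f)
result2 = re.match(pattern, msg_t)

print(result1)          # no match, so print the result itself rather than calling group()
print(result2.group())

# Output:
# None
# <html><h1>hello</h1></html>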
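
Because `re.match` returns `None` when the pattern fails, calling `.group()` directly on a failed match raises `AttributeError`. One way to guard against that is a small helper. The names `NESTED` and `inner_text` below are hypothetical and used only for illustration, and the pattern captures tag names without their angle brackets:

```python
import re

# Two nested tags; \2 and \1 require the closing tags to repeat the captured names
NESTED = re.compile(r'<([0-9a-zA-Z]+)><([0-9a-zA-Z]+)>(.+)</\2></\1>$')

def inner_text(markup):
    """Return the text inside two correctly nested tags, or None if they do not pair up."""
    result = NESTED.match(markup)
    if result is None:
        return None
    return result.group(3)

print(inner_text('<html><h1>hello</html></h1>'))
print(inner_text('<html><h1>hello</h1></html>'))
```

Returning `None` for a failed match lets the caller decide how to handle bad input instead of crashing on it.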

Named groups: (?P<name>pattern) defines a group, and (?P=name) refers back to it

Tip: use named groups when a pattern contains several groups.

msg = '<html><h1>abd</h1></html>'

result = re.match(r'<(?P<name1>\w+)><(?P<name2>\w+)>(.+)</(?P=name2)></(?P=name1)>', msg)
print(result.group())
print(result.group(1))
print(result.group(2))
print(result.group(3))

# Output:
# <html><h1>abd</h1></html>
# html
# h1
# abd
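
Named groups can also be read back by name rather than by position. A brief sketch using the same pattern follows; the variable names `outer`, `inner`, and `named` are illustrative, while `group('name')` and `groupdict()` are standard match-object methods in `re`:

```python
import re

msg = '<html><h1>abd</h1></html>'
result = re.match(r'<(?P<name1>\w+)><(?P<name2>\w+)>(.+)</(?P=name2)></(?P=name1)>', msg)

# Look up groups by the names given in (?P<name>...)
outer = result.group('name1')
inner = result.group('name2')

# groupdict() maps every named group to its captured text
named = result.groupdict()

print(outer, inner)
print(named)
```

The unnamed third group does not appear in `groupdict()`; it is still available as `result.group(3)`.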