Regular Expressions: Pattern Matching

没名字的菜狗子

Article link: https://blog.csdn.net/qq_39196408/article/details/113404981

Pattern Matching

Grouping with ()

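Suppose you want to separate the area code from the rest of a phone number. Adding parentheses () to the regex creates groups: (\d\d\d)-(\d\d\d)-(\d\d\d\d). The group() method of the resulting Match object then returns the text matched by a group. The integer argument passed to group() works as follows:
① Pass 0, or no argument: returns the entire matched text.
② Pass 1 to get group 1, 2 to get group 2, 3 to get group 3.

In addition, the groups() method returns a tuple containing all the groups.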
Example code:

>>> import re
# group() method example
>>> testReg = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
>>> test = testReg.search('Test number is 123-456-7895 ')
>>> test.group()
'123-456-7895'
>>> test.group(1)
'123'
>>> test.group(2)
'456'
>>> test.group(3)
'7895'
>>> test.group(0)
'123-456-7895'
>>> 
# groups() method example
>>> test.groups()
('123', '456', '7895')
>>> t1,t2,t3 = test.groups()
>>> t1
'123'
>>> t2
'456'
>>> t3
'7895'
>>>
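The same group()/groups() calls can be wrapped in a small function. This is a minimal sketch: the helper name split_phone and the fixed ddd-ddd-dddd format are assumptions carried over from the transcript above. It also checks for None first, because search() returns None when nothing matches.

```python
import re

# Three groups: area code, prefix, line number (format assumed: ddd-ddd-dddd)
PHONE_RE = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')

def split_phone(text):
    """Hypothetical helper: return (area, prefix, line) for the first number, or None."""
    match = PHONE_RE.search(text)
    if match is None:        # search() returns None when there is no match
        return None
    return match.groups()    # same as (match.group(1), match.group(2), match.group(3))

print(split_phone('Test number is 123-456-7895 '))  # prints: ('123', '456', '7895')
print(split_phone('no number here'))                # prints: None
```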

Matching one of several expressions with |

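The | character is called a pipe. Use it when you want to match any one of several expressions; for example, r'test1|test2' matches either test1 or test2.

Note: if more than one of the alternatives occurs in the searched string, search() returns a Match object for the first occurrence only.
Use findall() to find every match.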

Example code:

>>> import re
>>> testReg = re.compile(r'test1|test2')
>>> test = testReg.search('A message to check if there are test1 test2 test3.')
>>> test.group()
'test1'

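# Alternatives sharing a common prefix; findall() lists all matches.
# Because the pattern contains a group, findall() returns only the group's text.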
>>> import re
>>> testReg = re.compile(r'test(1|2|3)')
>>> test = testReg.search('This is a test message for test1 test2 test3')
>>> test.group()
'test1'
>>> testReg.findall('This is a test message for test1 test2 test3')
['1', '2', '3']
# Iterate over the results:
>>> resultList = testReg.findall('This is a test message for test1 test2 test3')
>>> for s in resultList:
	print('test'+str(s))

	
test1
test2
test3
>>>
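Because test(1|2|3) contains a group, findall() gives back only '1', '2', '3', and the loop above has to rebuild the full text by hand. One alternative (a sketch, not the approach used above) is finditer() from the same re module, which yields a Match object for every match, so both the whole match and the group stay available:

```python
import re

testReg = re.compile(r'test(1|2|3)')
message = 'This is a test message for test1 test2 test3'

# findall() with one group: returns only what the group captured
digits = testReg.findall(message)

# finditer() yields Match objects, so group(0) and group(1) are both available
full = [m.group() for m in testReg.finditer(message)]
pairs = [(m.group(0), m.group(1)) for m in testReg.finditer(message)]

print(digits)  # prints: ['1', '2', '3']
print(full)    # prints: ['test1', 'test2', 'test3']
print(pairs)   # prints: [('test1', '1'), ('test2', '2'), ('test3', '3')]
```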

Optional matching with ?

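Sometimes part of a pattern is optional: the match should succeed whether or not that part is present. A ? marks the preceding group as optional.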

Example code:

# (wo)? marks the group wo as optional: it may appear 0 or 1 times, so the pattern matches both woman and man
>>> import re
>>> testReg = re.compile(r'(wo)?man')
>>> test1 = testReg.search('A woman is speaking.')
>>> test2 = testReg.search('A man is speaking.')
>>> test1.group()
'woman'
>>> test2.group()
'man'
>>>
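A practical use of ? is a phone number whose area code may be missing. In this sketch the pattern is an assumption built from the earlier (\d\d\d)-(\d\d\d)-(\d\d\d\d) example. When the optional group is absent it simply does not take part in the match, and group(1) then returns None:

```python
import re

# (\d\d\d-)? : the area code plus its hyphen may appear 0 or 1 times
phoneReg = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')

with_area = phoneReg.search('Call 123-456-7895 today')
without_area = phoneReg.search('Call 456-7895 today')

print(with_area.group())      # prints: 123-456-7895
print(with_area.group(1))     # prints: 123-
print(without_area.group())   # prints: 456-7895
print(without_area.group(1))  # prints: None (the optional group did not take part)
```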

Matching zero or more times with *

* means match zero or more times: the group before * may appear any number of times, or not at all.

Example code:

>>> import re
>>> testReg = re.compile(r'(Hello)*test')
>>> test1 = testReg.search('A message for Hellotest')
>>> test2 = testReg.search('A message for HelloHellotest')
>>> test3 = testReg.search('A message for HelloHelloHellotest')

>>> test1.group()
'Hellotest'
>>> test2.group()
'HelloHellotest'
>>> test3.group()
'HelloHelloHellotest'
>>>
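The transcript above only shows Hello repeated one or more times. Two further details can be checked with a small sketch: * also accepts zero repetitions, and when a group is repeated, group(1) holds only the text of its last repetition.

```python
import re

testReg = re.compile(r'(Hello)*test')

zero = testReg.search('A message for test')
three = testReg.search('A message for HelloHelloHellotest')

print(zero.group())    # prints: test  (the group matched zero times)
print(zero.group(1))   # prints: None  (the group never took part)
print(three.group())   # prints: HelloHelloHellotest
print(three.group(1))  # prints: Hello (only the last repetition is kept)
```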

Matching one or more times with +

Unlike *, + requires the preceding group to appear at least once; it is not optional.

Example code:

>>> import re
>>> testReg = re.compile(r'test(must)+included')
>>> test = []
>>> for i in range(3):
	message = 'A message for test'+'must'*(i+1)+'included'
	test.append(testReg.search(message))
	print('+ matched must '+str(i+1)+' time(s): '+test[i].group())

	
+ matched must 1 time(s): testmustincluded
+ matched must 2 time(s): testmustmustincluded
+ matched must 3 time(s): testmustmustmustincluded
>>> 
# Searching a string where must occurs 0 times: search() returns None, so calling group() raises AttributeError
>>> test = testReg.search('A message for testincluded')
>>> test.group()
Traceback (most recent call last):
  File "<pyshell#102>", line 1, in <module>
    test.group()
AttributeError: 'NoneType' object has no attribute 'group'
>>>
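To see the difference between * and + side by side, the sketch below runs both patterns against the string from the failing example (the * variant is added here only for comparison). It tests for None before calling group(), instead of letting group() raise AttributeError:

```python
import re

starReg = re.compile(r'test(must)*included')  # group may appear 0 or more times
plusReg = re.compile(r'test(must)+included')  # group must appear at least once

message = 'A message for testincluded'

for name, reg in (('*', starReg), ('+', plusReg)):
    match = reg.search(message)
    if match is None:
        # Nothing found; calling match.group() here would raise AttributeError
        print(name, '-> no match')
    else:
        print(name, '->', match.group())
# prints:
# * -> testincluded
# + -> no match
```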

Matching a specific number of times with {}

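To repeat a group a specific number of times, put {} after the group in the regex. For example, (test){3} matches the string 'testtesttest'. Besides an exact count, you can also give a range: (t){3,5} matches 'ttt', 'tttt', and 'ttttt'. Either number may be omitted, leaving the minimum or maximum unbounded: {,5} means 0 to 5 times, and {3,} means 3 or more times.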

Example code:

# Repeat a specific number of times
>>> import re
>>> testReg = re.compile(r'(Hello){3}')
>>> test = testReg.search('A message for HelloHelloHello and Hello test.')
>>> test.group()
'HelloHelloHello'
>>> 

# Specify a range
>>> import re
>>> results = []
>>> testReg = re.compile(r'(Hello){2,4}')
>>> for i in range(3):
	message = 'A test message for '+'Hello'*(i+2)+'.'
	results.append(testReg.search(message))
	print('Search Hello for '+str(i+2)+' times: '+results[i].group())

	
Search Hello for 2 times: HelloHello
Search Hello for 3 times: HelloHelloHello
Search Hello for 4 times: HelloHelloHelloHello
# Inspect the stored Match objects:
>>> results
[<re.Match object; span=(19, 29), match='HelloHello'>, <re.Match object; span=(19, 34), match='HelloHelloHello'>, <re.Match object; span=(19, 39), match='HelloHelloHelloHello'>]
>>> 
# Leave the minimum or maximum unbounded
# Match 0, 1, 2, or 3 'Hello's
>>> import re
>>> testReg = re.compile(r'(Hello){,3}')
>>> test1 = testReg.search('nothing')
>>> test2 = testReg.search('Hello')
>>> test3 = testReg.search('HelloHelloHelloHelloHello')
>>> test1.group()
''
>>> test2.group()
'Hello'
>>> test3.group()
'HelloHelloHello'
>>> 

# Match two or more 'Hello's
>>> import re
>>> testReg = re.compile(r'(Hello){2,}')
>>> test1 = testReg.search('HelloHello')
>>> test1.group()
'HelloHello'
>>> test2 = testReg.search('HelloHelloHello')
>>> test2.group()
'HelloHelloHello'
>>> test3 = testReg.search('Hello')
>>> test3.group()
Traceback (most recent call last):
  File "<pyshell#93>", line 1, in <module>
    test3.group()
AttributeError: 'NoneType' object has no attribute 'group'
>>>
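Two details about {} are worth checking. First, because {,3} allows zero repetitions, search() can succeed with an empty match at the very start of the string, before it ever reaches a later Hello. Second, when the whole string must consist of a given number of repetitions, re.fullmatch() (available since Python 3.4) anchors the pattern at both ends. The helper name is_hello_run and the test strings are assumptions for illustration only.

```python
import re

# Gotcha: {,3} permits zero repetitions, so an empty match at index 0 wins
upTo3 = re.compile(r'(Hello){,3}')
m = upTo3.search('say Hello')
print(repr(m.group()), m.span())  # prints: '' (0, 0)

# Requiring at least one repetition finds the real occurrence
oneTo3 = re.compile(r'(Hello){1,3}')
print(oneTo3.search('say Hello').group())  # prints: Hello

def is_hello_run(text, low, high):
    """Hypothetical helper: True if text is exactly low..high copies of 'Hello'."""
    pattern = '(Hello){' + str(low) + ',' + str(high) + '}'
    return re.fullmatch(pattern, text) is not None

print(is_hello_run('HelloHello', 2, 4))   # prints: True
print(is_hello_run('Hello' * 5, 2, 4))    # prints: False (five copies)
print(is_hello_run('Hello', 2, 4))        # prints: False (one copy)
```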
