Python Regular Expressions: Concepts and Usage

Author: (account deactivated)

Article link: https://blog.csdn.net/Programmer_Mt/article/details/104349767

Regex objects and using the re module:

>>> import re
>>> text = 'mt is 25 years old . mt is 18 years old'
>>> pattern = re.compile(r'\d+')  # Form 1: if the pattern will be reused, compile it once and keep it in a variable
>>> pattern.findall(text)
['25', '18']
>>> text = 'mt is 25 years old . mt is 18 years old'
>>> re.findall(r'\d+', text)  # Form 2: for one-off use, call the module-level function directly: (pattern, string)
['25', '18']
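
As a minimal sketch of where precompiling helps, the snippet below reuses one compiled pattern across several strings. The sample sentences and the name `number_pattern` are made up for illustration and are not from the original article.

```python
import re

# Hypothetical sample data for illustration.
lines = ['mt is 25 years old', 'order 66 shipped in 3 days', 'no digits here']

# Compile once, then reuse the same pattern object for every line.
number_pattern = re.compile(r'\d+')
numbers_per_line = [number_pattern.findall(line) for line in lines]

print(numbers_per_line)  # [['25'], ['66', '3'], []]
```

The re module also caches recently used patterns internally, so for a handful of patterns the choice between the two forms is mostly about readability.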

re.findall(): finds every non-overlapping match of the pattern in the given string and returns them in a list

>>> import re
>>> text = 'Tom is 8 years old. Mike is 23 years old. Peter is 28 years old'
>>> p_age = re.compile(r'\d+')
>>> p_age.findall(text)
['8', '23', '28']
>>> p_name = re.compile(r'[A-Z]\w+')  # an uppercase letter followed by one or more word characters (letters, digits, or underscore)
>>> p_name.findall(text)
['Tom', 'Mike', 'Peter']
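
One detail worth knowing: when the pattern contains groups, findall() returns the group contents instead of the whole match. The sketch below reuses the sentence from above; the combined pattern is an illustration, not something from the original article.

```python
import re

text = 'Tom is 8 years old. Mike is 23 years old. Peter is 28 years old'

# With two groups in the pattern, findall returns one (name, age) tuple per match.
pairs = re.findall(r'([A-Z]\w+) is (\d+)', text)

print(pairs)  # [('Tom', '8'), ('Mike', '23'), ('Peter', '28')]
```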

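re.match(): by default matches only at the start of the string; it does not return the matched text directly but a Match object that carries extra information, such as the span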

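>>> import re
>>> pattern = re.compile(r'mateng0531')
>>> text = 'mateng053112345678'
>>> pattern.match(text)  # the Match object reports the index range of the match
<re.Match object; span=(0, 10), match='mateng0531'>
>>> text2 = ' mateng053112345678'
>>> pattern.match(text2)  # matching starts at index 0 by default, so this returns None and the REPL prints nothing
>>> pattern.match(text2, 1)  # the second argument (pos) sets the index where matching starts
<re.Match object; span=(1, 11), match='mateng0531'>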
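
Because match() returns None when nothing matches, calling .group() on the result without checking raises AttributeError. A minimal sketch of the usual guard, using a hypothetical helper named `matched_prefix`:

```python
import re

pattern = re.compile(r'mateng0531')

def matched_prefix(text):
    """Return the matched text if `text` starts with the pattern, else None."""
    m = pattern.match(text)
    return m.group() if m is not None else None

print(matched_prefix('mateng053112345678'))   # mateng0531
print(matched_prefix(' mateng053112345678'))  # None
```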

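re.search(): returns a Match object just like match(), but scans the whole string for the first position where the pattern matches instead of checking only the start (or a given position)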

>>> import re
>>> text2 = ' mateng053112345678'
>>> pattern = re.compile(r'mateng0531')
>>> pattern.search(text2)
<re.Match object; span=(1, 11), match='mateng0531'>
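
To make the difference concrete, here is a small side-by-side sketch of match(), search(), and fullmatch(). The sample string 'id 42' is invented for illustration, and fullmatch() is not covered in the original article; it is included only for contrast.

```python
import re

pattern = re.compile(r'\d+')
text = 'id 42'

at_start = pattern.match(text)    # None: the string does not start with a digit
anywhere = pattern.search(text)   # finds '42' further into the string
whole = pattern.fullmatch('42')   # succeeds only if the entire string matches

print(at_start)         # None
print(anywhere.span())  # (3, 5)
print(whole.group())    # 42
```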

re.finditer(): returns an iterator that yields one Match object per match, so you can loop over the results one at a time

>>> import re
>>> text = 'Tom is 8 years old. Mike is 23 years old. Peter is 28 years old'
>>> pattern = re.compile(r'\d+')
>>> it = pattern.finditer(text)  # returns an iterator
>>> for i in it:  # loop over it to print each match
...     print(i)
...
<re.Match object; span=(7, 8), match='8'>
<re.Match object; span=(28, 30), match='23'>
<re.Match object; span=(51, 53), match='28'>
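
Since each item is a Match object, finditer() is handy when you need positions as well as the matched text. A short sketch on the same sentence:

```python
import re

text = 'Tom is 8 years old. Mike is 23 years old. Peter is 28 years old'

# Collect each number together with the index where it starts.
ages_with_pos = [(m.group(), m.start()) for m in re.finditer(r'\d+', text)]

print(ages_with_pos)  # [('8', 7), ('23', 28), ('28', 51)]
```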

Match object (re.Match):

>>> import re
>>> text = 'Tom is 8 years old. Mike is 23 years old. Peter is 28 years old'
>>> pattern = re.compile(r'(\d+).*?(\d+)')  # () defines a group; both groups match digits, and .*? between them lazily matches any characters except \n
>>> m = pattern.search(text)
>>> m
<re.Match object; span=(7, 30), match='8 years old. Mike is 23'>
>>> m.group()  # the whole match
'8 years old. Mike is 23'
>>> m.group(0)  # 0 also means the whole match, same as passing no argument
'8 years old. Mike is 23'
>>> m.group(1)  # 1 selects the first group
'8'
>>> m.group(2)
'23'
>>> m.groups()  # all groups as a tuple
('8', '23')
>>> m.start()
7
>>> m.start(1)  # index where the first group starts
7
>>> m.start(2)
28
>>> m.end()
30
>>> m.end(1)  # index just past the end of the first group
8
>>> m.end(2)
30
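
The start and end indices line up with ordinary string slicing. The sketch below shows span(), which returns both indices at once, and group() called with several indices; neither call appears in the original walkthrough, but both are standard Match methods.

```python
import re

text = 'Tom is 8 years old. Mike is 23 years old. Peter is 28 years old'
m = re.search(r'(\d+).*?(\d+)', text)

# span(n) returns (start(n), end(n)) as one tuple.
start, end = m.span(2)
second_number = text[start:end]  # slicing with the span recovers group 2

# group() accepts several indices and then returns a tuple.
both = m.group(1, 2)

print(second_number, both)  # 23 ('8', '23')
```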

Naming and referencing groups: the (?P<name>...) syntax

>>> import re
>>> text = 'Tom:98'
>>> pattern = re.compile(r'(?P<name>\w+):(?P<score>\d+)')  # put ?P<name> in front of \w+ inside the group; "name" becomes the name of that group
>>> m = pattern.search(text)
>>> m
<re.Match object; span=(0, 6), match='Tom:98'>
>>> m.group('name')  # a group can be fetched directly by name
'Tom'
>>> m.group('score')
'98'
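
Two related features are sketched below: groupdict(), which returns all named groups as a dictionary, and (?P=name), which refers back to an earlier named group inside the same pattern. The duplicated-word sentence is an invented example.

```python
import re

m = re.search(r'(?P<name>\w+):(?P<score>\d+)', 'Tom:98')
record = m.groupdict()  # every named group as a dict

# (?P=word) must match the same text that the group named "word" matched,
# so this pattern finds a word that is immediately repeated.
repeated = re.search(r'\b(?P<word>\w+) (?P=word)\b', 'this is is a test')

print(record)                  # {'name': 'Tom', 'score': '98'}
print(repeated.group('word'))  # is
```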

Practical applications:

split: splitting a string

>>> import re
>>> text = 'beautiful is better than ugly.\nexplicit is better than implicit.\nsimple is better than complex'
>>> p = re.compile(r'\n')
>>> p.split(text)  # use the pattern as the delimiter and split on it
['beautiful is better than ugly.', 'explicit is better than implicit.', 'simple is better than complex']
>>> re.split(r'\W', 'Good morning')
['Good', 'morning']
>>> re.split(r'-', 'Good-morning')
['Good', 'morning']
>>> re.split(r'(-)', 'Good-morning')  # if the delimiter is wrapped in a group, the delimiters are included in the result
['Good', '-', 'morning']
>>> text = 'beautiful is better than ugly.\nexplicit is better than implicit.\nsimple is better than complex'
>>> re.split(r'\n', text, maxsplit=1)  # maxsplit caps the number of splits; the rest of the string comes back as the final element
['beautiful is better than ugly.', 'explicit is better than implicit.\nsimple is better than complex']
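
Where re.split() goes beyond str.split() is in splitting on several delimiters at once. The sketch below uses invented sample strings; it also shows that a delimiter at the very start or end of the string produces empty strings, which you may want to filter out.

```python
import re

raw = 'apple, banana;cherry  date'

# One or more commas, semicolons, or whitespace characters act as a single delimiter.
parts = re.split(r'[,;\s]+', raw)

# Leading or trailing delimiters yield empty strings at the edges.
padded = re.split(r'[,;\s]+', ' apple, banana ')
cleaned = [p for p in padded if p]

print(parts)    # ['apple', 'banana', 'cherry', 'date']
print(padded)   # ['', 'apple', 'banana', '']
print(cleaned)  # ['apple', 'banana']
```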

sub (pattern to replace, replacement, string to search in):

>>> import re
>>> text = 'ord000\nord001\nord002'
>>> re.sub(r'\d+', '-', text)
'ord-\nord-\nord-'

>>> import re
>>> text = 'beautiful is *better* than ugly'
>>> re.sub(r'\*(.*?)\*', '<strong></strong>', text)
'beautiful is <strong></strong> than ugly'
>>> re.sub(r'\*(.*?)\*', r'<strong>\g<1></strong>', text)  # referencing the group keeps the text that was matched
'beautiful is <strong>better</strong> than ugly'
>>> text2 = 'ord000\nord001\nord002'
>>> re.sub(r'([a-z]+)(\d+)', r'\g<2>-\g<1>', text2)  # groups can be used to rearrange the original format
'000-ord\n001-ord\n002-ord'
>>> re.subn(r'([a-z]+)(\d+)', r'\g<2>-\g<1>', text2)  # subn: replace and also return the number of replacements
('000-ord\n001-ord\n002-ord', 3)
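
The replacement argument can also be a function, which is called with each Match object and returns the replacement string. This is not shown in the original article; the sketch below uses a hypothetical helper named `bump` that increments each order number while keeping its zero padding.

```python
import re

text = 'ord000\nord001\nord002'

def bump(match):
    """Return the matched number plus one, padded to the original width."""
    digits = match.group()
    return str(int(digits) + 1).zfill(len(digits))

result = re.sub(r'\d+', bump, text)

print(repr(result))  # 'ord001\nord002\nord003'
```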

Compilation flags (commonly used: re.I, re.M, re.S):

>>> import re
>>> re.findall(r'python', 'Python python PYTHON')
['python']
>>> re.findall(r'python', 'Python python PYTHON', re.I)  # re.I: ignore case
['Python', 'python', 'PYTHON']
>>> re.findall(r'^<html>', '\n<html>')
[]
>>> re.findall(r'^<html>', '\n<html>', re.M)  # re.M: ^ matches at the start of every line, not only at the start of the string
['<html>']
>>> re.findall(r'\d(.)', '1\ne')
[]
>>> re.findall(r'\d(.)', '1\ne', re.S)  # re.S: "." matches every character, including \n
['\n']
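
Flags can be combined with the | operator, or written inline at the very start of the pattern. A small sketch with an invented three-line string:

```python
import re

text = 'Python\npython\nPYTHON'

# re.I ignores case; re.M lets ^ and $ match at every line boundary.
combined = re.findall(r'^python$', text, re.I | re.M)

# The same flags written inline at the start of the pattern.
inline = re.findall(r'(?im)^python$', text)

print(combined)  # ['Python', 'python', 'PYTHON']
print(inline)    # ['Python', 'python', 'PYTHON']
```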

Module-level helpers: re.escape() escapes special characters so that they lose their special meaning

>>> import re
>>> re.findall(r'^', '^python^')
['']
>>> re.findall(re.escape('^'), '^python^')  # ^ normally anchors to the start; once escaped by re.escape() it loses that meaning and matches the literal character ^
['^', '^']
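
re.escape() matters most when part of the pattern comes from outside, for example a search keyword typed by a user. In the sketch below the keyword and the sentence are invented; without escaping, the + is read as a quantifier and the search finds the wrong text.

```python
import re

keyword = '1+1=2'  # hypothetical user input containing the metacharacter +
text = 'We know 1+1=2, but 11=2 is wrong.'

# Unescaped: "1+" means "one or more 1s", so the pattern matches '11=2'.
unescaped = re.findall(keyword, text)

# Escaped: every character is matched literally.
escaped = re.findall(re.escape(keyword), text)

print(unescaped)  # ['11=2']
print(escaped)    # ['1+1=2']
```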
