Using Regular Expressions in Python

panda-star

Article link: https://blog.csdn.net/chinabestchina/article/details/96912882

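1. Introduction

This article covers regular expressions in Python: the common matching rules, the common methods of the re module, greedy versus non-greedy matching, groups, and assertions.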

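2. Usage

All examples below use the following string as the text to match against:

import re

# Text to match against (note: the name str shadows Python's built-in str type)
str = 'make progress everyday ! 123456, and good aNd yes and haha AND 123 '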

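2.1 Common rules

2.1.1 Writing the pattern string

A regular expression is written as a string. Prefixing the literal with r (a raw string) stops Python from interpreting the backslash \ as the start of an escape sequence, so the backslash reaches the regex engine unchanged. For example: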

pat = re.compile(r'\d+')
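To make the effect of the r prefix concrete, here is a minimal sketch. The variable names and the sample text are illustrative assumptions, not part of the original article. It compares the same pattern written as a normal string and as a raw string.

```python
import re

# In a normal string, "\b" is the backspace character (one character).
plain = '\bgood\b'
# In a raw string the backslash is kept, so the regex engine sees \b (word boundary).
raw = r'\bgood\b'

text = 'and good aNd yes'   # hypothetical sample text

plain_hits = re.findall(plain, text)   # looks for literal backspace characters
raw_hits = re.findall(raw, text)       # looks for the whole word "good"

print(len(plain), len(raw))  # 6 8
print(plain_hits)            # []
print(raw_hits)              # ['good']
```

Writing every pattern as a raw string is a simple habit that avoids this class of mistake.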

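2.1.2 Common matching rules

# Symbols
. : matches any character except \n
[] : matches any one of the characters listed inside the brackets
[^] : matches any one character not listed inside the brackets
() : grouping

# Shorthand character classes
\d : matches a digit
\D : matches a non-digit
\w : matches a digit, letter, or underscore
\W : matches any character other than a digit, letter, or underscore
\s : matches whitespace (space, tab, newline, and similar)
\S : matches a non-whitespace character
\b : matches a word boundary (the position between a \w character and a \W character or an end of the string)
\B : matches a position that is not a word boundary

# Repetition
* : matches 0 or more times
+ : matches 1 or more times
? : matches 0 or 1 time
{min,max} : matches between min and max times, inclusive

# Flags
re.I : ignore case; same as re.IGNORECASE, or the inline group form (?i:pattern). The flags below have analogous long and inline forms
re.S : makes . match every character, including newlines
re.M : multi-line mode; ^ and $ match at the start and end of each line
re.X : for readability, ignores whitespace in the pattern and comments after #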
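The rules above can be tried out with a short sketch like the one below. The sample strings and variable names are assumptions chosen for illustration; only the re functions and flags come from the standard library.

```python
import re

sample = 'id_42 = 3.14\tok'   # hypothetical sample text

# Shorthand character classes
digits = re.findall(r'\d+', sample)       # ['42', '3', '14']
non_spaces = re.findall(r'\S+', sample)   # ['id_42', '=', '3.14', 'ok']

# \b keeps "ok" from matching inside "okay"
whole_words = re.findall(r'\bok\b', 'ok okay ok')   # ['ok', 'ok']

# {min,max} is greedy: "4444" yields "444", and the leftover "4" is too short
bounded = re.findall(r'\d{2,3}', '1 22 333 4444')   # ['22', '333', '444']

lines = 'first line\nsecond line'

# re.S lets "." cross the newline
dotall = re.findall(r'first.*', lines, re.S)     # ['first line\nsecond line']
# re.M lets ^ match at the start of every line
line_starts = re.findall(r'^\w+', lines, re.M)   # ['first', 'second']

# re.X allows whitespace and comments inside the pattern
number = re.compile(r'''
    \d+   # integer part
    \.    # literal dot
    \d+   # fractional part
''', re.X)
decimals = number.findall('pi is 3.14, e is 2.71')   # ['3.14', '2.71']

print(digits, non_spaces, whole_words, bounded)
print(dotall, line_starts, decimals)
```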

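2.1.3 Greedy and non-greedy matching

Matching is greedy by default: a quantifier consumes as much of the string as it can. Appending ? to a quantifier makes it non-greedy, as in .*? or \d+?.
For example:

# Non-greedy
obj = re.findall(r'\d+?', str)
print(obj) # ['1', '2', '3', '4', '5', '6', '1', '2', '3']
# Greedy
obj = re.findall(r'\d+', str)
print(obj) # ['123456', '123']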
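The difference matters most when a pattern has a closing delimiter. The sketch below uses a made-up HTML fragment (an assumption for illustration) to show how the greedy form runs to the last delimiter while the non-greedy form stops at the first.

```python
import re

html = '<b>bold</b> and <i>italic</i>'   # hypothetical sample text

# Greedy: .* runs to the last ">" in the string
greedy = re.findall(r'<.*>', html)
# Non-greedy: .*? stops at the first ">" it can
lazy = re.findall(r'<.*?>', html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```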

2.2 Common methods

2.2.1 Compiling

# re.compile precompiles a pattern so it can be reused; the compiled object offers the same functions as methods
pat = re.compile(r'\d+')
print(pat.findall(str)) # ['123456', '123']
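As a small sketch of reuse, a compiled pattern can carry its flags with it and be applied through several methods. The pattern and the sample text here are illustrative assumptions.

```python
import re

# Compile once, with the flag attached to the pattern object
word = re.compile(r'[a-z]+', re.I)

sample = '123 abc DEF'   # hypothetical sample text

first = word.search(sample).group()   # first match only
all_words = word.findall(sample)      # every match
masked = word.sub('#', sample)        # replace every match

print(first)      # abc
print(all_words)  # ['abc', 'DEF']
print(masked)     # 123 # #
```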

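2.2.2 Matching

# re.match matches only at the start of the text; if the start does not match, it returns None
obj = re.match('and', str)
print(obj.group() if obj else 'None') # None

# re.fullmatch succeeds only if the entire string matches the pattern
obj = re.fullmatch(r'.*?(\d+).*', str)
print(obj.groups() if obj else 'None') # ('123456',)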
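A common use of re.fullmatch is input validation, where a partial match should not count. The helper below is a hypothetical example (the name is_integer is not from the article) and contrasts it with re.match.

```python
import re

def is_integer(text):
    """Return True only if the whole string is an optionally signed integer."""
    return re.fullmatch(r'[+-]?\d+', text) is not None

print(is_integer('-42'))     # True
print(is_integer('42abc'))   # False
print(is_integer(''))        # False

# re.match only anchors the start, so trailing junk is accepted
prefix_only = re.match(r'\d+', '42abc') is not None
print(prefix_only)           # True
```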

2.2.3 Searching

# re.search scans the whole text and returns the first match it finds
obj = re.search('and', str)
print(obj.group() if obj else 'None') # and

# re.findall returns all matches as a list
obj = re.findall('and', str, re.I)
print(obj) # ['and', 'aNd', 'and', 'AND']

# re.finditer returns an iterator over the match objects
obj = re.finditer('and', str)
print(list(map(lambda m: m.group(), obj))) # ['and', 'and']
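One reason to prefer re.finditer over re.findall is that each item is a match object, so positions are available as well as text. A minimal sketch with an assumed sample string:

```python
import re

text = 'one and two and three'   # hypothetical sample text

# Collect the matched text together with its start and end offsets
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r'and', text)]

print(positions)  # [('and', 4, 7), ('and', 12, 15)]
```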

2.2.4 Replacing

# re.sub with a plain replacement string
obj = re.sub('and', '*and*', str)
print(obj)  # make progress everyday ! 123456, *and* good aNd yes *and* haha AND 123

# re.sub with a plain replacement string and a limit on the number of replacements
obj = re.sub('and', '*and*', str, count=1)
print(obj)  # make progress everyday ! 123456, *and* good aNd yes and haha AND 123

# re.sub with a plain replacement string, ignoring case
obj = re.sub('and', '*and*', str, flags=re.IGNORECASE)
print(obj)  # make progress everyday ! 123456, *and* good *and* yes *and* haha *and* 123

# re.sub with a replacement function, which receives each match object
def repl(m):
    return '--{}--'.format(m.group())
obj = re.sub('and', repl, str)
print(obj)  # make progress everyday ! 123456, --and-- good aNd yes --and-- haha AND 123

# re.sub with the replacement function written as a lambda
obj = re.sub('and', lambda m: '--{}--'.format(m.group()), str)
print(obj)  # make progress everyday ! 123456, --and-- good aNd yes --and-- haha AND 123

# re.subn works like re.sub but returns a tuple of (new string, number of replacements)
obj = re.subn('and', '*and*', str, flags=re.I)
print(obj)  # ('make progress everyday ! 123456, *and* good *and* yes *and* haha *and* 123 ', 4)
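The replacement string can also refer back to captured groups, which the examples above do not show. The following sketch assumes a made-up date string; \1, \2 and \g<name> are the standard replacement-template references.

```python
import re

date = '2019-07-22'   # hypothetical sample text

# Numbered references: reorder year-month-day into day/month/year
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', date)

# Named references use \g<name> in the replacement template
named = re.sub(r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})',
               r'\g<d>.\g<m>.\g<y>', date)

print(swapped)  # 22/07/2019
print(named)    # 22.07.2019
```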

2.2.5 Splitting

# re.split splits the string at every match of the pattern
obj = re.split('and', str, flags=re.I)
print(obj) # ['make progress everyday ! 123456, ', ' good ', ' yes ', ' haha ', ' 123 ']
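Two further behaviours of re.split are worth a quick sketch: a capturing group in the pattern keeps the delimiters in the result, and maxsplit limits the number of splits. The inputs below are assumptions for illustration.

```python
import re

# Split on any run of commas, semicolons, or whitespace
parts = re.split(r'[,;\s]+', 'a, b;c   d')

# A capturing group keeps the delimiters in the result list
with_delims = re.split(r'([,;])', 'a,b;c')

# maxsplit stops after the given number of splits
limited = re.split(r',', 'a,b,c', maxsplit=1)

print(parts)        # ['a', 'b', 'c', 'd']
print(with_delims)  # ['a', ',', 'b', ';', 'c']
print(limited)      # ['a', 'b,c']
```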

2.3 Groups

2.3.1 Using groups

Wrap part of a pattern in () to make it a group. For example:

obj = re.match(r'\D*(\d+)\D*(\d+)\D*', str)
# The entire matched text (group 0); here the pattern matches the whole string
print(obj.group()) # make progress everyday ! 123456, and good aNd yes and haha AND 123
# Tuple of the text captured by each group
print(obj.groups()) # ('123456', '123')
# Index of the last group that took part in the match
print(obj.lastindex) # 2
print(obj.group(1)) # 123456
print(obj.group(2)) # 123

# A backreference matches exactly the text the group captured; it refers to that text, not to the group's pattern
# Numbered backreference, written as \N, e.g. \1, \2
obj = re.match(r'.*(\d).*\1.*', str)
print(obj.groups()) # ('3',)

# Named group, written as (?P<name>pattern)
obj = re.match(r'\D*(?P<numOne>\d+)\D*(?P<numTwo>\d+)\D*', str)
# Dictionary mapping each group name to its captured text
print(obj.groupdict())  # {'numOne': '123456', 'numTwo': '123'}
# Look up the captured text by group name
print(obj.group('numOne'))  # 123456

# Named backreference, written as (?P=name)
obj = re.match(r'.*(?P<numOne>and).*(?P=numOne).*', str)
print(obj.groups()) # ('and',)

# A named group can still be referenced by its number, e.g. \1
obj = re.match(r'.*(?P<numOne>and).*\1.*', str)
print(obj.groups()) # ('and',)
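As a sketch of how named groups and backreferences are typically put to use, the example below parses a made-up log line into fields and finds doubled words. The log format, field names, and sample sentences are assumptions for illustration.

```python
import re

log_line = '2019-07-22 ERROR disk full'   # hypothetical log format

# Named groups turn a match into a small record
m = re.match(r'(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)', log_line)
fields = m.groupdict()

# A backreference finds a word that is immediately repeated;
# \b stops "this is" from counting as a repeated "is"
repeated = re.findall(r'\b(\w+)\s+\1\b', 'this is is a test test')

print(fields)    # {'date': '2019-07-22', 'level': 'ERROR', 'msg': 'disk full'}
print(repeated)  # ['is', 'test']
```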

2.3.2 Non-capturing groups

Starting a group with ?: means its match is not captured. For example:

# The inner group is captured
obj = re.findall(r'((and)\s+\w+)', str)
print(obj)  # [('and good', 'and'), ('and haha', 'and')]
# The inner group is not captured
obj = re.findall(r'((?:and)\s+\w+)', str)
print(obj)  # ['and good', 'and haha']
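Non-capturing groups are especially handy with alternation, because re.findall returns group contents whenever the pattern has a capturing group. A minimal sketch with an assumed word list:

```python
import re

text = 'cat dog catfish dogs'   # hypothetical sample text

# With a capturing group, findall returns only what the group captured
captured = re.findall(r'\b(cat|dog)s?\b', text)
# With a non-capturing group, findall returns the whole match
whole = re.findall(r'\b(?:cat|dog)s?\b', text)

print(captured)  # ['cat', 'dog', 'dog']
print(whole)     # ['cat', 'dog', 'dogs']
```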

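2.3.3 Flags inside a group

To turn a flag on for one group, start the group with ? followed by the flag letter, such as i (same as re.I, ignore case), m, s, or x, e.g. (?i:pattern).
To turn a flag off for one group, start the group with ?- followed by the flag letter, e.g. (?-i:pattern).
For example:

# Ignore case inside the group only
obj = re.findall('(?i:and)', str)
print(obj) # ['and', 'aNd', 'and', 'AND']

# The pattern as a whole ignores case, but the group stays case-sensitive
obj = re.findall('(?-i:and)', str, re.I)
print(obj) # ['and', 'and']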
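Scoped flags become more visible when only part of a pattern should be case-insensitive. The sketch below uses an invented sample string (an assumption) and also shows the global inline form (?i) placed at the start of a pattern.

```python
import re

text = 'Error: disk, ERROR: Disk, error: disk'   # hypothetical sample text

# Only "error" ignores case; ": disk" must match exactly
scoped_on = re.findall(r'(?i:error): disk', text)

# The whole pattern ignores case, except "ERROR", which must match exactly
scoped_off = re.findall(r'(?-i:ERROR): disk', text, re.I)

# (?i) at the very start applies to the entire pattern
global_flag = re.findall(r'(?i)error', text)

print(scoped_on)    # ['Error: disk', 'error: disk']
print(scoped_off)   # ['ERROR: Disk']
print(global_flag)  # ['Error', 'ERROR', 'error']
```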

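2.4 Assertions

An assertion places a condition on the text before or after the match without making that text part of the match.

# Positive lookahead, written as (?=pattern): the match must be followed by pattern
obj = re.findall(r'(?P<numOne>\s*\w+\s*)(?=progress)', str)
print(obj) # ['make ']

# Positive lookbehind, written as (?<=pattern): the match must be preceded by pattern
obj = re.findall(r'(?<=progress)(?P<numOne>\s*\w+\s*)', str)
print(obj) # [' everyday ']

# Negative lookahead, written as (?!pattern): the match must not be followed by pattern
obj = re.findall(r'(?P<numOne>\w+\s+)(?!progress)', str)
print(obj) # ['progress ', 'everyday ', 'and ', 'good ', 'aNd ', 'yes ', 'and ', 'haha ', 'AND ', '123 ']

# Negative lookbehind, written as (?<!pattern): the match must not be preceded by pattern
obj = re.findall(r'(?<!progress)(?P<numOne>\s+\w+)', str)
print(obj) # [' progress', ' 123456', ' and', ' good', ' aNd', ' yes', ' and', ' haha', ' AND', ' 123']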
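A typical use of assertions is extracting a value based on its surroundings while leaving those surroundings out of the result. The price list below is a made-up sample (an assumption) used to sketch all four forms.

```python
import re

prices = 'apple $30, pear 25 EUR, plum $7'   # hypothetical sample text

# Positive lookbehind: digits preceded by "$"
dollars = re.findall(r'(?<=\$)\d+', prices)
# Positive lookahead: digits followed by " EUR"
euros = re.findall(r'\d+(?= EUR)', prices)
# Negative lookbehind: digits not preceded by "$" or by another digit
not_dollars = re.findall(r'(?<![\$\d])\d+', prices)
# Negative lookahead: whole numbers not followed by " EUR"
not_euros = re.findall(r'\b\d+\b(?! EUR)', prices)

print(dollars)      # ['30', '7']
print(euros)        # ['25']
print(not_dollars)  # ['25']
print(not_euros)    # ['30', '7']
```

The extra \d in the negative lookbehind and the \b anchors in the negative lookahead keep the engine from matching only part of a number, which is a common surprise with negative assertions.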
