Learning Regular Expressions in Python Systematically: A Condensed Guide to Methods and Content

Latest recommended article published on 2021-02-01 14:02:37

Di_Panda

Category: Python basics

Article link: https://blog.csdn.net/Di_Panda/article/details/105612593

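How to learn regular expressions systematically

Condensed version

This version summarizes the basic matching syntax and explains in detail the parts that the tutorials at the links above leave unexplained. It is best read alongside them.

Matching rules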

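Metacharacter	Description
.	Matches any single character except a newline.
[ ]	Character class. Matches any one character listed inside the brackets.
[^ ]	Negated character class. Matches any one character not listed inside the brackets.
*	Matches 0 or more repetitions of the element before the *.
+	Matches 1 or more repetitions of the element before the +.
?	Makes the element before the ? optional (0 or 1 occurrence).
{n,m}	Matches num repetitions of the preceding character or group, where n <= num <= m.
(xyz)	Group. Matches the characters xyz in exactly that order and captures the result.
|	Alternation. Matches either the expression before the | or the one after it.
\	Escape character. Used to match reserved characters literally: [ ] ( ) { } . * + ? ^ $ \ |
^	Matches at the start of the string (in multiline mode, at the start of each line).
$	Matches at the end of the string (in multiline mode, at the end of each line).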
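The table above can be tried out directly with `re.findall` and `re.search`. The following is a minimal sketch; the sample strings and variable names are made up for illustration and do not come from the original tutorial.

```python
import re

sentence = 'The car parked in the garage.'

dot = re.findall(r'.ar', sentence)             # any character followed by "ar"
char_class = re.findall(r'[Tt]he', sentence)   # "T" or "t", then "he"
negated = re.findall(r'[^c]ar', sentence)      # "ar" preceded by anything except "c"
star = re.findall(r'ab*', 'a ab abb')          # "a" followed by zero or more "b"
plus = re.findall(r'ab+', 'a ab abb')          # "a" followed by one or more "b"
optional = re.findall(r'colou?r', 'color colour')   # the "u" is optional
counted = re.findall(r'\d{2,3}', '1 22 333 4444')   # runs of 2 to 3 digits
either = re.findall(r'cat|dog', 'cat dog bird')     # alternation
literal_dot = re.findall(r'\.', 'a.b.c')            # escaped dot matches only "."
starts = re.search(r'^The', sentence) is not None        # anchored at the start
ends = re.search(r'garage\.$', sentence) is not None     # anchored at the end

print(dot)          # ['car', 'par', 'gar']
print(char_class)   # ['The', 'the']
print(negated)      # ['par', 'gar']
print(star)         # ['a', 'ab', 'abb']
print(plus)         # ['ab', 'abb']
print(optional)     # ['color', 'colour']
print(counted)      # ['22', '333', '444']
print(either)       # ['cat', 'dog']
print(literal_dot)  # ['.', '.']
print(starts, ends) # True True
```

Note that `\d{2,3}` takes only three digits out of `4444`: the quantifier is greedy but capped at m, and the single digit left over is too short to match again.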

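Shorthand	Description
.	Any character except a newline
\w	Word characters (letters, digits, underscore); equivalent to [a-zA-Z0-9_] for ASCII text
\W	Non-word characters, i.e. symbols and whitespace; equivalent to [^\w]
\d	Digits; equivalent to [0-9] for ASCII text
\D	Non-digits; equivalent to [^\d]
\s	Whitespace characters; equivalent to [ \t\n\r\f\v] for ASCII text
\S	Non-whitespace characters; equivalent to [^\s]
\f	Matches a form feed
\n	Matches a newline
\r	Matches a carriage return
\t	Matches a tab
\v	Matches a vertical tab
\p	Listed in some references as matching CR/LF; Python's re module does not support \p, so write \r\n to match a DOS line terminator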
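A short sketch of the shorthand classes in Python follows. The sample strings are invented for illustration. One point worth knowing: for `str` patterns in Python 3, `\w`, `\d` and `\s` are Unicode-aware by default, and the `re.ASCII` flag restricts them to the ASCII ranges.

```python
import re

sample = 'Room 101,\tfloor_2\n'

digits = re.findall(r'\d+', sample)    # runs of digits
words = re.findall(r'\w+', sample)     # runs of letters, digits, underscore
spaces = re.findall(r'\s', sample)     # each whitespace character
symbols = re.findall(r'\W', sample)    # everything that is not a word character

# \w is Unicode-aware by default; re.ASCII limits it to [a-zA-Z0-9_]
unicode_words = re.findall(r'\w+', '正则 regex')
ascii_words = re.findall(r'\w+', '正则 regex', re.ASCII)

# A DOS line terminator is matched with \r\n (Python has no \p escape)
dos_lines = re.split(r'\r\n', 'line1\r\nline2')

print(digits)         # ['101', '2']
print(words)          # ['Room', '101', 'floor_2']
print(spaces)         # [' ', '\t', '\n']
print(symbols)        # [' ', ',', '\t', '\n']
print(unicode_words)  # ['正则', 'regex']
print(ascii_words)    # ['regex']
print(dos_lines)      # ['line1', 'line2']
```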

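Zero-width assertions

Symbol	Description
?=	Positive lookahead: what follows must match
?!	Negative lookahead: what follows must not match
?<=	Positive lookbehind: what precedes must match
?<!	Negative lookbehind: what precedes must not match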
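The four assertions check the surrounding text without consuming it, so the asserted part never appears in the match itself. Here is a minimal sketch with an invented sample string.

```python
import re

price = 'cost: $42, discount: 7%, id 99'

# Positive lookahead: digits that are followed by "%"
followed_by_percent = re.findall(r'\d+(?=%)', price)

# Negative lookahead: whole numbers that are NOT followed by "%"
not_followed_by_percent = re.findall(r'\b\d+\b(?!%)', price)

# Positive lookbehind: digits that are preceded by "$"
preceded_by_dollar = re.findall(r'(?<=\$)\d+', price)

# Negative lookbehind: whole numbers that are NOT preceded by "$"
not_preceded_by_dollar = re.findall(r'(?<!\$)\b\d+', price)

print(followed_by_percent)      # ['7']
print(not_followed_by_percent)  # ['42', '99']
print(preceded_by_dollar)       # ['42']
print(not_preceded_by_dollar)   # ['7', '99']
```

The `\b` word boundaries in the negative variants keep the engine from matching only part of a number (for example the `2` of `42`) in order to satisfy the assertion.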

Flags

Flag	Description
i	Ignore case
g	Global search
m	Multiline modifier: the anchors ^ and $ match at the start and end of every line.

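The re module

In Python, regular expressions are available through the built-in re module.
The usual workflow with the re module is:

Use the compile function to compile the regular expression string into a Pattern object;
Use the methods of the Pattern object to search the text and obtain the result (a Match object);
Finally, use the attributes and methods of the Match object to extract information and do whatever else is needed.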
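The three steps above can be sketched as follows. The date pattern and sample sentences are assumptions chosen for illustration; the point is the compile, search, inspect sequence, plus the fact that `search` returns `None` when nothing matches.

```python
import re

# Step 1: compile the pattern string into a Pattern object
date_pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

# Step 2: search the text; the result is a Match object, or None if nothing matches
match = date_pattern.search('Released on 2020-04-19.')
no_match = date_pattern.search('no date here')

# Step 3: read information from the Match object
whole = match.group()             # the full matched text
position = match.span()           # (start, end) indices of the match
year, month, day = match.groups() # the three captured groups

print(whole)             # 2020-04-19
print(position)          # (12, 22)
print(year, month, day)  # 2020 04 19
print(no_match)          # None
```

Because a failed search returns `None`, real code usually checks the result (for example with `if match:`) before calling `group()`.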

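Quantifiers in Python regular expressions are greedy by default: they match as much text as possible. Adding a ? after a quantifier (*?, +?, ??) makes it lazy, so it matches as little as possible.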
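A minimal sketch of the difference, using an invented HTML-like string: the greedy `.+` runs to the last `>` in the text, while the lazy `.+?` stops at the first one.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

greedy = re.findall(r'<.+>', html)   # .+ grabs as much as it can
lazy = re.findall(r'<.+?>', html)    # .+? stops at the first possible ">"

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```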

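Match objects

A Match object represents the result of a successful match.
.group(): with no argument or 0, returns the whole match; with an argument, returns the text of that group. The argument can be a group number or a group name.
.groups(): returns a tuple containing the text of every group.
.start(): returns the start index of the given group (of the whole match by default)
.end(): returns the end index of the given group (of the whole match by default)
.span(): returns the (start, end) index tuple of the given group
.groupdict(): returns a dictionary mapping each group name to its matched text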

import re

text = 'Tom is 8 years old. Jerry is 23 years old.'
pattern = re.compile(r'\d+')
print(pattern.findall(text))
pattern = re.compile(r'(\d+).*?(\d+)')
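# search() returns a Match object for the first match (or None)
m = pattern.search(text)
print(m)
# The whole match
print(m.group())
# The content of the first pair of parentheses
print(m.group(1))
# The content of the second pair of parentheses
print(m.group(2))
# Where the first group starts and ends (start inclusive, end exclusive)
print(m.start(1))
print(m.end(1))
# Where the second group starts and ends (start inclusive, end exclusive)
print(m.start(2))
print(m.end(2))
# groups(): findall collects every match in a list, whereas groups returns a tuple of all groups of this one match
print(m.groups())
*********************Output***************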
['8', '23']
<re.Match object; span=(7, 31), match='8 years old. Jerry is 23'>
8 years old. Jerry is 23
8
23
7
8
29
31
('8', '23')

.findall(): returns a list of all matches
.finditer(): returns an iterator of Match objects

import re

text = 'Beautiful is better than ugly.'
pattern = re.compile(r'(\w+) (\w+)')
print(pattern.findall(text))

it = pattern.finditer(text)
for i in it:
    print(i.group())
***********************Output*************
[('Beautiful', 'is'), ('better', 'than')]
Beautiful is
better than

Groups

Naming the groups inside the parentheses

import re

text = 'Tom:98 Di:100'
pattern = re.compile(r'(\w+):(\d+)')
m = pattern.search(text)
print(m.group())
# Give each pair of parentheses a name
pattern = re.compile(r'(?P<name>\w+):(?P<score>\d+)')
m = pattern.search(text)
print(m.group())
print(m.group(1))
# Now there is no need to remember the order, only the names
print(m.group('name'))
print(m.group('score'))
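Named groups also work with `groupdict()` and `finditer()`. The sketch below reuses the same text and pattern; the variable names `first` and `scores` are introduced here only for illustration.

```python
import re

text = 'Tom:98 Di:100'
pattern = re.compile(r'(?P<name>\w+):(?P<score>\d+)')

# groupdict() maps each group name to its matched text for one match
first = pattern.search(text).groupdict()

# finditer() visits every match, so all name/score pairs can be collected
scores = {m.group('name'): int(m.group('score')) for m in pattern.finditer(text)}

print(first)   # {'name': 'Tom', 'score': '98'}
print(scores)  # {'Tom': 98, 'Di': 100}
```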

Putting it together

Using split

import re

text = 'Beautiful is better than ugly.\nExplicit is better than implict.\nSimple is better than complex.'
pattern = re.compile(r'\n')
# Split on newlines
print(pattern.split(text))
# Split on runs of non-word characters
print(re.split(r'\W+', 'Good morning'))
print(re.split(r'-', 'Good-morning'))
# A capturing group keeps the delimiter in the result
print(re.split(r'(-)', 'Good-morning'))
# Split at most once
print(re.split(r'\n', text, 1))
#######################Output##################
['Beautiful is better than ugly.', 'Explicit is better than implict.', 'Simple is better than complex.']
['Good', 'morning']
['Good', 'morning']
['Good', '-', 'morning']
['Beautiful is better than ugly.', 'Explicit is better than implict.\nSimple is better than complex.']

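Using sub

import re

text = 'Beautiful is *better* than ugly.'
# Lazy group: capture as little as possible between the two asterisks
pattern = re.compile(r'\*(.*?)\*')
print(pattern.sub(r'<strong></strong>', text))
# Keep the captured text ("better") by referring to group 1
print(pattern.sub(r'<strong>\g<1></strong>', text))
# The group name goes at the start of the parentheses: (?P<name>...)
pattern = re.compile(r'\*(?P<html>.*?)\*')
print(pattern.sub(r'<strong>\g<html></strong>', text))

p = re.compile(r'(\w+) (\w+)')
s = 'hello 123, hello 456'
# Raw string, so that \g is not treated as an invalid string escape
print(p.sub(r'hi \g<2>', s))
**************************Output*******************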
Beautiful is <strong></strong> than ugly.
Beautiful is <strong>better</strong> than ugly.
Beautiful is <strong>better</strong> than ugly.
hi 123, hi 456

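Changing the default behavior of a regex

Flag	Effect
`re.I`	Ignore case
`re.M`	Multiline search: ^ and $ match at the start and end of every line
(none)	Global search: Python has no `re.G` flag; `findall()` and `finditer()` already return every match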
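In Python these flags are constants of the re module (`re.I`, also spelled `re.IGNORECASE`, and `re.M`, also spelled `re.MULTILINE`), passed as an extra argument and combined with `|`. The sample strings below are invented for illustration.

```python
import re

# re.I: ignore case
any_case = re.findall(r'the', 'The cat and the hat', re.I)

# re.M: ^ matches at the start of every line, not only at the start of the string
lines = 'first line\nsecond line'
without_m = re.findall(r'^\w+', lines)
with_m = re.findall(r'^\w+', lines, re.M)

# Flags are combined with the | operator
both = re.findall(r'^the \w+', 'The cat\nthe dog', re.I | re.M)

print(any_case)   # ['The', 'the']
print(without_m)  # ['first']
print(with_m)     # ['first', 'second']
print(both)       # ['The cat', 'the dog']
```

There is no counterpart to a global flag here: `findall()` and `finditer()` return every match, while `search()` and `match()` return only the first.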
