[python] 20. Python regular expressions in detail

Latest recommended article published on 2024-01-15 19:00:00

FanMY_71

Category: python  Tags: regular expressions

Article link: https://blog.csdn.net/m0_48638643/article/details/126286015

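1. What is a regular expression?

Purpose of regular expressions

Data mining
- Finding a small piece of text inside a large body of text, for example extracting email addresses, IP addresses, or telephone numbers.
Validation
- Using a regular expression to confirm that the data received has the expected form, for example whether an email address, username, or IP address is valid.
Use regular expressions sparingly: when a simpler method can do the matching, prefer it.
A regular expression specifies a matching rule, which is then used to determine whether text following that rule occurs in a larger string.
It can tell whether text matching the rule exists.
It can also break a rule into one or more sub-rules and show the text matched by each sub-rule.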
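
The two purposes above can be sketched roughly as follows. The sample text, the simplified IP-like pattern, and the username rule are assumptions made for illustration, not rules taken from this article; `is_valid_username` is a hypothetical helper.

```python
import re

# Hypothetical sample text. The pattern is deliberately simplified and
# does not check that each number is in the range 0-255.
text = "login from 192.168.0.1 failed, retry from 10.0.0.25 ok"

# Data mining: pull every IPv4-looking substring out of a larger text.
ips = re.findall(r"\d+\.\d+\.\d+\.\d+", text)
print(ips)

# Validation: check that the WHOLE string fits an assumed username rule
# (3 to 16 ASCII letters, digits, or underscores).
def is_valid_username(name):
    return re.fullmatch(r"[A-Za-z0-9_]{3,16}", name) is not None

print(is_valid_username("fan_my71"))
print(is_valid_username("a b"))
```

For validation, `re.fullmatch` is used here so that the pattern must cover the entire string rather than just part of it.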

1.1 Pros and cons of regular expressions

Pros: they improve productivity and save code.
Cons: they are complex and hard to read.

2. The re module

re is part of the standard library, so nothing needs to be installed.

Official documentation: https://docs.python.org/3/library/re.html

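2.1 match and search

re.search
- Looks for a match anywhere in the string.
- Takes a regular expression and a string, and returns the first match found.
- Returns None if no match is found at all.
re.match
- Looks for a match only at the beginning of the string.
- Takes a regular expression and a string, tries to match starting at the first character of the string, and returns the match found there.
- Returns None if the beginning of the string does not match the regular expression.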

>>> import re
>>> result = re.search("sanchuang","hello world,this is sanchuang")   # pattern first, then the string to search
>>> result
<_sre.SRE_Match object; span=(20, 29), match='sanchuang'>  # a match object (Python 3.7+ prints it as re.Match)
>>> result = re.search("sanchuang1","hello world,this is sanchuang")  # nothing matches here
>>> result
>>> print(result)
None

>>> result = re.search("san.*$","hello world,this is sanchuang")  # matches from "san" through the end of the string
>>> result
<_sre.SRE_Match object; span=(20, 29), match='sanchuang'>

>>> result = re.match("san.*$","hello world,this is sanchuang")
>>> result
>>> result = re.match("hello","hello world,this is sanchuang")
>>> result
<_sre.SRE_Match object; span=(0, 5), match='hello'>

# A successful match returns a match object; otherwise None is returned.
# match only tries at the beginning of the string: if the pattern does not match there, match fails.

>>> result = re.search("sanchuang","hello world,this is sanchuang  sanchuang") # the string is scanned left to right and only the first match is returned
>>> result
<_sre.SRE_Match object; span=(20, 29), match='sanchuang'>
>>> dir(result)
['__class__', '__copy__', '__deepcopy__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'end', 'endpos', 'expand', 'group', 'groupdict', 'groups', 'lastgroup', 'lastindex', 'pos', 're', 'regs', 'span', 'start', 'string']
>>> result.start()
20
>>> result.end()  # start and end form a half-open interval [start, end)
29
>>> result.group()  # show the matched text
'sanchuang'
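
Because both functions return None on failure, calling `group()` directly on the result can raise an AttributeError. A minimal sketch of guarding against that, where `first_match` is a hypothetical helper and not part of the re module:

```python
import re

def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    # search returns None on failure, so check before calling group()
    return m.group() if m else None

text = "hello world,this is sanchuang"
found = first_match("sanchuang", text)      # matched in the middle of the string
missing = first_match("sanchuang1", text)   # no match anywhere
at_start = re.match("sanchuang", text)      # None: the string does not start with it
print(found, missing, at_start)
```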

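2.1.1 The match object

As the code above shows:

match.group(n), with n defaulting to 0: returns the matched string.
- It is called group because a regular expression can be split into several subgroups, each of which picks out a subset of the match.
- 0 is the default argument and means the entire match; n means the n-th group.
match.start()
- start returns the index in the original string where the match begins.
match.end()
- end returns the index in the original string where the match ends, that is, the index just past the last matched character.
- start() and end() together form a half-open interval: start is included, end is excluded.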
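
As an illustrative sketch of these methods, the following uses a made-up pattern with two groups; the pattern and the sample string are assumptions chosen only for the example.

```python
import re

# Hypothetical pattern with two capture groups: a word and a number.
m = re.search(r"(\w+)-(\d+)", "order id: item-42 shipped")

whole = m.group()     # same as m.group(0): the entire match
name = m.group(1)     # text captured by the first group
number = m.group(2)   # text captured by the second group

# start() is inclusive and end() is exclusive, so slicing the original
# string with them reproduces the matched text.
same = m.string[m.start():m.end()] == whole
print(whole, name, number, m.span(), same)
```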

2.2 findall and finditer

Purpose: find multiple matches

re.findall
- Finds all matches and returns them as a list (a list of the matched strings, for a pattern without capture groups).
re.finditer
- Finds all matches and returns an iterator that yields a match object for each one.

>>> msg = "i love pythonpython1python2"
>>> re.findall("python",msg)
['python', 'python', 'python']

>>> re.finditer("python",msg)   # finditer returns an iterator
<callable_iterator object at 0x7f9d520a3470>
>>> for i in re.finditer("python", msg):
...     print(i)
... 
<_sre.SRE_Match object; span=(7, 13), match='python'>
<_sre.SRE_Match object; span=(13, 19), match='python'>
<_sre.SRE_Match object; span=(20, 26), match='python'>

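2.3 sub

re.sub('pattern', 'replacement', 'string')

Replaces the parts of string that match the pattern with the new content.

>>> msg = "i love python1 pythonyy python123 pythontt"

# \d matches a single digit, like [0-9]
>>> re.sub(r"python\d","**",msg)  # 1st argument: the pattern to match; 2nd: the replacement text; 3rd: the string to operate on
'i love ** pythonyy **23 pythontt'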
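
The leftover "23" in the result comes from `\d` matching exactly one digit. As a rough sketch of two variations (the `+` quantifier and the `count` parameter are not covered in the text above and are introduced here only for illustration):

```python
import re

msg = "i love python1 pythonyy python123 pythontt"

# \d matches exactly one digit, so only "python1" inside "python123"
# is replaced and "23" is left behind.
one_digit = re.sub(r"python\d", "**", msg)

# \d+ matches one or more digits, so the whole "python123" is replaced.
all_digits = re.sub(r"python\d+", "**", msg)

# count limits how many replacements are made (here: only the first).
first_only = re.sub(r"python\d+", "**", msg, count=1)

print(one_digit)
print(all_digits)
print(first_only)
```

Writing the pattern as a raw string (`r"..."`) keeps Python from treating `\d` as a string escape.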

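2.4 Compiling a regex: re.compile("pattern")

Characteristics of compiled regular expressions:

A complex regular expression can be reused.
Calling methods on a compiled pattern is more convenient, because the pattern argument is omitted.
The re module caches the regular expressions it compiles on the fly, so in most cases using compile brings no large performance advantage.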

>>> msg = "i love python1 pythonyy python123 pythontt"
>>> reg = re.compile("python[0-9]")
>>> reg.findall(msg)
['python1', 'python1']
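
To illustrate the reuse point, here is a small sketch in which one compiled pattern object serves several operations; the variable names are chosen only for this example.

```python
import re

# Compile once, then reuse the same pattern object for several operations.
reg = re.compile(r"python[0-9]")

msg = "i love python1 pythonyy python123 pythontt"

first = reg.search(msg)         # same as re.search(r"python[0-9]", msg)
found = reg.findall(msg)        # same as re.findall(r"python[0-9]", msg)
replaced = reg.sub("**", msg)   # same as re.sub(r"python[0-9]", "**", msg)

print(first.span(), found, replaced)
```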

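3. Basic regular expression matching

The simplest regular expressions consist only of plain alphanumeric characters, which match themselves literally; more complex expressions enable much more powerful matching.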