Python built-in library: regular expressions with the re module

温小八

Last modified 2023-02-17 15:07:58


Column: Python automation. Tags: regular expressions, python

First published 2022-09-05 21:51:01

Original link: https://blog.csdn.net/wenxiaoba/article/details/126692082


Compiling regular expression objects

compile()

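Syntax: re.compile(pattern, flags=0)
Purpose: compiles a pattern string into a regular expression object
Parameters: the regular expression (the optional flags are the same as for the matching functions)
Returns: a regular expression object
When to use: when the same regular expression is needed many times, i.e. several strings must be matched against the same pattern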

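After compile() returns a regular expression object, you can call its match(), search() and findall() methods. match() and search() return a Match object, or None if nothing matches. findall() returns a list.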

import re

prog = re.compile(r"\d+")   # one or more digits

# match() only matches at the start: it returns a Match object if the beginning matches, otherwise None
result = prog.match("122.56log32")
print(type(result))
print(result)
print(prog.match("e22.56log32"))

print("------------------------------")
# search() scans the whole string and returns the first match
result2 = prog.search("122.56log32")
print(type(result2), "+++", result2)

print("------------------------------")
# findall() scans the whole string and returns a list of every substring that matches
result3 = prog.findall("122.56log32")
print(type(result3), "+++", result3)

Output:

<class 're.Match'>
<re.Match object; span=(0, 3), match='122'>
None
------------------------------
<class 're.Match'> +++ <re.Match object; span=(0, 3), match='122'>
------------------------------
<class 'list'> +++ ['122', '56', '32']

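As you can see, match() and search() return Match objects. Take <re.Match object; span=(0, 3), match='122'> as an example. Here span gives the start and end positions of the matched text. The end position is exclusive, so span=(0, 3) covers indices 0, 1 and 2.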
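A compiled pattern's match() and search() methods also accept optional pos and endpos arguments. The module-level functions do not take these arguments, which is one practical reason to keep the compiled object around. The following minimal sketch reuses the string from the example above:

```python
import re

prog = re.compile(r"\d+")  # one or more digits
text = "122.56log32"

# pos=4: start matching at index 4 (the "5" after the dot) instead of 0
m = prog.match(text, 4)
print(m)  # <re.Match object; span=(4, 6), match='56'>

# endpos=2: behave as if the string ended at index 2
m2 = prog.search(text, 0, 2)
print(m2)  # <re.Match object; span=(0, 2), match='12'>
```

Note that pos is not quite the same as slicing the string. A pattern that starts with ^ still only matches at the real beginning of the string.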

Match objects

When match() or search() finds a match, it returns a Match object. The Match object holds information about the match:

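Member	Description
start()	Start position of the matched text within the searched string
end()	End position of the matched text (exclusive: the index just after the last matched character)
span()	Tuple made of start() and end()
group()	The text captured by a group; with no argument, the whole match
string	Attribute: the string that was searched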

import re

prog = re.compile(r"\d+")   # one or more digits

result = prog.match("122.56log32")
print(result)

print("Match details:")
print("Start position:", result.start())
print("End position:", result.end())
print("Span tuple:", result.span())   # (start position, end position)
print("Matched text:", result.group())
print("Searched string:", result.string)

Output:

<re.Match object; span=(0, 3), match='122'>
Match details:
Start position: 0
End position: 3
Span tuple: (0, 3)
Matched text: 122
Searched string: 122.56log32

The output shows how to read <re.Match object; span=(0, 3), match='122'>. Here span=(0, 3) gives the start and end positions of the match, and the 122 in match='122' is the matched text.
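start(), end() and span() also accept a group number. Because the end position is exclusive, slicing the original string with start() and end() gives back exactly what group() returns. The following short sketch uses a hypothetical input string:

```python
import re

m = re.search(r"(\d+)\.(\d+)", "price: 122.56")

print(m.span())              # (7, 13) -> whole match '122.56'
print(m.span(1))             # (7, 10) -> group 1 '122'
print(m.start(2), m.end(2))  # 11 13  -> group 2 '56'

# end() is exclusive, so slicing reproduces the matched text
assert m.string[m.start():m.end()] == m.group()
```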

group()

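Method	Description
group(num=0)	Returns the text matched by group num; group(0), or group() with no argument, is the whole match. Several group numbers can be passed at once, in which case a tuple of the corresponding values is returned.
groups()	Returns a tuple of the strings matched by all groups, from group 1 up to the last group.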

In a regular expression, parentheses () define groups. group() looks up the group given by its argument and returns the text that group captured.

import re

match = re.search(r"([0-9]*)([a-z]*)([0-9]*)", "123cctv678action")
print(type(match.group()), match.group())
print(type(match.group(1)), match.group(1))
print(type(match.group(2)), match.group(2))
print(type(match.group(3)), match.group(3))
print(type(match.group(1, 3)), match.group(1, 3))
print(type(match.groups()), match.groups())

Output:

<class 'str'> 123cctv678
<class 'str'> 123
<class 'str'> cctv
<class 'str'> 678
<class 'tuple'> ('123', '678')
<class 'tuple'> ('123', 'cctv', '678')

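In this example the pattern r"([0-9]*)([a-z]*)([0-9]*)" has three groups: [0-9]*, [a-z]* and [0-9]*. Each group matches a different part of the string. Groups are numbered 1, 2 and 3 from left to right, so group(1) returns what the first [0-9]* matched. Likewise, group(2) returns what [a-z]* matched, and group(3) returns what the [0-9]* in the third pair of parentheses matched. group(0), which is also what group() with no argument returns, is the text matched by the whole expression regardless of grouping. groups() returns the text of every group, in order.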
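Groups can also be named with the (?P<name>...) syntax, so they can be fetched by name instead of by position. groupdict() returns all named groups as a dictionary. The following sketch rewrites the pattern above with names; the names num1, word and num2 are arbitrary choices for illustration:

```python
import re

pattern = r"(?P<num1>[0-9]*)(?P<word>[a-z]*)(?P<num2>[0-9]*)"
m = re.search(pattern, "123cctv678action")

print(m.group("word"))  # cctv
print(m.group(2))       # cctv (named groups are still numbered 1, 2, 3)
print(m.groupdict())    # {'num1': '123', 'word': 'cctv', 'num2': '678'}
```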

Matching strings

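Function	Description
match(pattern, string, flags=0)	Matches only at the beginning of the string
search(pattern, string, flags=0)	Scans the whole string and returns the first match
findall(pattern, string, flags=0)	Scans the whole string and returns a list of all substrings that match the pattern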

All three functions take the same parameters:

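pattern: the regular expression
string: the string to match
flags: controls how matching is done. Common values:
re.A: ASCII-only matching (\w, \d, \s and similar classes match ASCII characters only)
re.I: case-insensitive matching
re.M: multiline mode, where ^ and $ match at the start and end of every line, not only of the whole string
re.S: makes . match any character, including a newline
re.X: verbose mode, where unescaped whitespace and # comments in the pattern are ignored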
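Flags can be combined with the | operator. The following sketch uses a made-up two-line string to show the effect of re.M and re.S, whose behavior is the least obvious from the descriptions above:

```python
import re

text = "first line\nsecond line"

# Without re.M, ^ only matches at the very start of the string
starts = re.findall(r"^\w+", text)
# With re.M, ^ also matches right after every newline
starts_m = re.findall(r"^\w+", text, re.M)
print(starts, starts_m)  # ['first'] ['first', 'second']

# By default '.' does not match '\n'; re.S lifts that restriction
print(re.search(r"line.second", text))                      # None
print(repr(re.search(r"line.second", text, re.S).group()))  # 'line\nsecond'

# Combining flags: multiline and case-insensitive at once
combined = re.findall(r"^SECOND", text, re.M | re.I)
print(combined)  # ['second']
```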

import re

pattern = r"hog\w+"

match1 = re.match(pattern, "Hogwarts is a magic school", re.I)
print(match1)
match2 = re.match(pattern, "eHogwarts is a magic school", re.I)
print(match2)

print("----------------------")
match3 = re.search(pattern, "I like Hogwarts, and hogwarts's teaching magic", re.I)
print(match3)
match4 = re.search(pattern, "I like hotdog")
print(match4)

print("----------------------")
result = re.findall(pattern, "I like Hogwarts, and hogwarts's teaching magic", re.I)
print(type(result), "+++", result)

Output:

<re.Match object; span=(0, 8), match='Hogwarts'>
None
----------------------
<re.Match object; span=(7, 15), match='Hogwarts'>
None
----------------------
<class 'list'> +++ ['Hogwarts', 'hogwarts']
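One behavior of findall() is worth knowing. When the pattern contains groups, findall() returns the group contents rather than the whole match. With more than one group, it returns a list of tuples. The following sketch uses the string from the compile() example:

```python
import re

data = "122.56log32"

no_group = re.findall(r"\d+\.\d+", data)        # no groups: whole matches
one_group = re.findall(r"(\d+)\.\d+", data)     # one group: that group's text
two_groups = re.findall(r"(\d+)\.(\d+)", data)  # several groups: tuples

print(no_group)    # ['122.56']
print(one_group)   # ['122']
print(two_groups)  # [('122', '56')]
```

If the parentheses are only needed for structure, use a non-capturing group (?:...). This keeps findall() returning whole matches.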

Replacing strings: sub()

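Syntax: sub(pattern, repl, string, count=0, flags=0)
Replaces the parts of the string matched by the regular expression and returns the new string.

pattern: the regular expression
repl: the replacement string (a function may also be passed)
string: the original string to search and replace in
count: optional, the maximum number of replacements; the default 0 means replace every match
flags: optional, controls matching behavior, same as in the matching section above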

import re

pattern = r"hog\w+"
data = "I like Hogwarts, and hogwarts's teaching magic"
result = re.sub(pattern, "money", data, flags=re.I)
print("Original:", data)
print("Replaced:", result)

Output:

Original: I like Hogwarts, and hogwarts's teaching magic
Replaced: I like money, and money's teaching magic
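There are two further points about sub(), sketched below on the same data. First, pass flags by keyword as above, because the fourth positional parameter is count, not flags. Second, repl may contain group references such as \1, or it may be a function that receives each Match object. The "2022-09" input is a made-up example:

```python
import re

data = "I like Hogwarts, and hogwarts's teaching magic"

# count=1: only the first match is replaced
first_only = re.sub(r"hog\w+", "money", data, count=1, flags=re.I)
print(first_only)  # I like money, and hogwarts's teaching magic

# \1, \2 in repl refer to the groups of each match (raw string avoids double escaping)
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "2022-09")
print(swapped)  # 09-2022

# repl as a function: called once per match, its return value is inserted
upper = re.sub(r"hog\w+", lambda m: m.group().upper(), data, flags=re.I)
print(upper)  # I like HOGWARTS, and HOGWARTS's teaching magic
```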

Splitting strings: split()

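Syntax: split(pattern, string, maxsplit=0, flags=0)
Splits the string wherever the regular expression matches and returns a list.

pattern: the regular expression
string: the original string to search and split
maxsplit: optional, the maximum number of splits; the default 0 means no limit
flags: optional, controls matching behavior, same as in the matching section above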

import re

pattern = r"hog\w+"
data = "I like Hogwarts, and hogwarts's teaching magic"
result = re.split(pattern, data, flags=re.I)
print("Original:", data)
print("Split result:", result)

Output:

Original: I like Hogwarts, and hogwarts's teaching magic
Split result: ['I like ', ', and ', "'s teaching magic"]
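In practice, split() is most often used to split on several different separators at once. The following sketch uses a made-up input. Note that if the pattern contains a capturing group, the separators themselves are kept in the result:

```python
import re

data = "a, b;c  d"

# any run of commas, semicolons or whitespace counts as one separator
parts = re.split(r"[,;\s]+", data)
print(parts)  # ['a', 'b', 'c', 'd']

# maxsplit=1: split only at the first separator
head_tail = re.split(r"[,;\s]+", data, maxsplit=1)
print(head_tail)  # ['a', 'b;c  d']

# a capturing group keeps the separators in the output list
with_seps = re.split(r"([,;])", "a,b;c")
print(with_seps)  # ['a', ',', 'b', ';', 'c']
```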
