Regular expressions: what they do

Author: 十七光年

Column: Python learning | Tags: python, regular expressions

Article link: https://blog.csdn.net/weixin_42684559/article/details/118608311

Regular expressions match strings that follow a specified pattern.
Common methods
findall(): returns every substring matched by the RE as a list

import re  # all later examples assume re has been imported

res = re.findall('www', 'www.baidu.www.com')
print(res)
Output: ['www', 'www']
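match(): checks whether the RE matches at the beginning of the string (only the start position is tried); if the match does not succeed at the start, match() returns None
Match www at the start of the string; group() returns the string matched by the RE
res = re.match('www', 'www.baidu.com').group()
print(res)
Output: www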
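Because match() returns None when the pattern is not at the start of the string, chaining .group() directly raises AttributeError in that case. A minimal sketch of checking the result first; the helper name match_or_none is hypothetical, not part of the re module:

```python
import re

def match_or_none(pattern, text):
    """Hypothetical helper: return the matched text if pattern matches at the start of text, else None."""
    m = re.match(pattern, text)
    # m is a Match object on success and None on failure
    return m.group() if m else None

print(match_or_none('www', 'www.baidu.com'))    # www
print(match_or_none('baidu', 'www.baidu.com'))  # None ('baidu' is not at the start)
```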
search(): scans the entire string and returns the first successful match
Find www in 'www.baidu.com' and return the first match
result = re.search('www', 'www.baidu.com').group()
print(result)
Output: www
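To make the difference between match() and search() concrete, here is a small sketch using a pattern that does not sit at the start of the string (the choice of 'baidu' is just illustrative):

```python
import re

text = 'www.baidu.com'

# match() only tries position 0, so 'baidu' is not found
print(re.match('baidu', text))   # None
# search() scans the whole string and stops at the first match
m = re.search('baidu', text)
print(m.group(), m.span())       # baidu (4, 9)
```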
finditer(): finds every substring matched by the RE and returns them as an iterator of match objects
res = re.finditer('www', 'www.baidu.www.com')
print(res)
for i in res:
    print(i.group())
Output:
<callable_iterator object at 0x10e6df8d0>  (the memory address differs from run to run)
www
www
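Each item finditer() yields is a Match object, so the result-handling methods such as span() are available on it too. A short sketch collecting the positions of every match:

```python
import re

text = 'www.baidu.www.com'
# finditer yields Match objects lazily; span() gives (start, end) for each one
positions = [m.span() for m in re.finditer('www', text)]
print(positions)  # [(0, 3), (10, 13)]
```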
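Result-handling methods (called on the match object returned by match(), search() or finditer())
group(): returns the string matched by the RE
span(): returns a tuple (start, end) with the start and end indices of the match
start(): returns the start index of the match
end(): returns the end index of the match (exclusive, i.e. the index just past the last matched character)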
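A quick sketch applying these four methods to one match object; searching for a run of digits in 'www.5555.com' is an illustrative choice:

```python
import re

m = re.search(r'\d+', 'www.5555.com')
print(m.group())  # 5555
print(m.span())   # (4, 8)
print(m.start())  # 4
print(m.end())    # 8, one past the index of the last '5'
```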
sub(): replaces every match and returns the resulting string
res3 = re.sub(r'\d+', 'I replaced the digits', 'www.5555.com')
print(res3)
Output: www.I replaced the digits.com
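Besides a plain string, sub() also accepts a function as the replacement: it is called with each Match object and must return the replacement text. A sketch in which the hypothetical helper double() doubles every number found:

```python
import re

def double(m):
    # hypothetical helper; m.group() is the matched digit run, e.g. '22'
    return str(int(m.group()) * 2)

result = re.sub(r'\d+', double, 'a1b22c333')
print(result)  # a2b44c666
```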
subn(): replaces every match and returns a tuple of the resulting string and the number of replacements
res = re.subn(r'\d+', 'I replaced the digits', 'www.5555.com')
print(res)
Output: ('www.I replaced the digits.com', 1)
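sub() and subn() also take an optional count argument that caps the number of replacements; subn() then reports how many were actually made. A sketch with an illustrative input:

```python
import re

# count=2 replaces only the first two digits; subn reports the actual number
res = re.subn(r'\d', '#', 'a1b2c3', count=2)
print(res)  # ('a#b#c3', 2)
```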
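split(): splits a string wherever the pattern matches (for example on +, -, * or \); characters with a special meaning in regular expressions, such as +, * and \ itself, must be escaped with a backslash to be matched literally
Split on '.'; because . is a special character, it is escaped as \.
res2 = re.split(r'\.', 'www.baidu.com')
print(res2)
Output: ['www', 'baidu', 'com']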
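To split on several delimiters at once, such as the +, -, * and \ mentioned above, one option is a character class; inside [ ], + and * lose their special meaning, while - and \ are still escaped here. re.escape() escapes a literal delimiter automatically. A sketch with an illustrative input string:

```python
import re

s = 'a+b-c*d\\e'  # the Python literal '\\' is a single backslash
parts = re.split(r'[+\-*\\]', s)
print(parts)  # ['a', 'b', 'c', 'd', 'e']

# re.escape('.') returns '\.', so the dot is matched literally
print(re.split(re.escape('.'), 'www.baidu.com'))  # ['www', 'baidu', 'com']
```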
Categories of regex matching
Single-character matching (each token matches exactly one character)

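Metacharacter  Description
.  Matches any single character except a newline (\n); it does not raise an error on \n, it simply skips it unless the re.DOTALL flag is set
[ ]  Matches any one of the characters listed inside the brackets
\d  Matches a digit, i.e. 0-9 (for str patterns, other Unicode decimal digits as well)
\D  Matches any character that is not a digit
\s  Matches a whitespace character, such as a space, a tab (\t) or a newline (\n)
\S  Matches any character that is not whitespace
\w  Matches a word character, i.e. a-z, A-Z, 0-9, _ and Unicode letters such as Chinese characters
\W  Matches a non-word character, i.e. anything other than letters, digits, Chinese characters and the underscore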
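The newline rule for . and the broader meaning of \s can be checked directly; this sketch assumes only the standard re module and its re.DOTALL flag:

```python
import re

text = 'a\nb'
print(re.findall('.', text))             # ['a', 'b']  ('.' skips the newline)
print(re.findall('.', text, re.DOTALL))  # ['a', '\n', 'b']
print(re.findall(r'\s', ' \t\n'))        # [' ', '\t', '\n']
```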

Code examples:
.: matches any single character except \n
Task: get every two-character substring that starts with h
# h followed by any one character
res = 'h.'
ss = 'hello python'
result = re.findall(res, ss)
print(result)
Output: ['he', 'ho']
[ ]: matches any one of the characters listed inside the brackets
Task: get every h, e and o
# any one of h, e, o
res = '[heo]'
ss = 'hello python'
result = re.findall(res, ss)
print(result)
Output: ['h', 'e', 'o', 'h', 'o']
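Inside [ ] a hyphen can also express a range, and a leading ^ negates the set (it then matches any character not listed). A short sketch on the same sample string:

```python
import re

ss = 'hello python'
print(re.findall('[a-h]', ss))    # ['h', 'e', 'h']  (range a through h)
print(re.findall('[^heo ]', ss))  # ['l', 'l', 'p', 'y', 't', 'n']  (not h, e, o or space)
```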
\d: matches a digit, i.e. 0-9
Task: get all the digits
# match digits
res = r'\d'
ss = 'he4l5lo1pyt0h3on5'
result = re.findall(res, ss)
print(result)
Output: ['4', '5', '1', '0', '3', '5']
\D: matches any character that is not a digit
Task: get all the non-digit characters
# match non-digits
res = r'\D'
ss = 'he4l5lo1pyt0h3o&n5'
result = re.findall(res, ss)
print(result)
Output: ['h', 'e', 'l', 'l', 'o', 'p', 'y', 't', 'h', 'o', '&', 'n']
\s: matches whitespace, such as a space or a tab
# match whitespace
res = r'\s'
ss = 'hello python\t'
result = re.findall(res, ss)
print(result)
Output: [' ', '\t']
\S: matches any character that is not whitespace
# match non-whitespace
res = r'\S'
ss = 'hello python '
result = re.findall(res, ss)
print(result)
Output: ['h', 'e', 'l', 'l', 'o', 'p', 'y', 't', 'h', 'o', 'n']
\w: matches a word character, i.e. a-z, A-Z, 0-9, _ and Chinese characters
# match word characters
res = r'\w'
ss = 'hello 好好学 python'
result = re.findall(res, ss)
print(result)
Output: ['h', 'e', 'l', 'l', 'o', '好', '好', '学', 'p', 'y', 't', 'h', 'o', 'n']
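Whether \w matches Chinese characters depends on the pattern type: for str patterns Python uses Unicode word characters by default, while the re.ASCII flag restricts \w to [a-zA-Z0-9_]. A sketch with an illustrative string:

```python
import re

ss = 'hello好好学_1'
print(re.findall(r'\w', ss))            # ['h', 'e', 'l', 'l', 'o', '好', '好', '学', '_', '1']
print(re.findall(r'\w', ss, re.ASCII))  # ['h', 'e', 'l', 'l', 'o', '_', '1']
```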
\W: matches a non-word character, i.e. anything other than letters, digits, Chinese characters and the underscore
# match non-word characters
res = r'\W'
ss = 'hello好好学python_-*&^%$#@'
result = re.findall(res, ss)
print(result)
Output: ['-', '*', '&', '^', '%', '$', '#', '@']

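Multi-character matching

Metacharacter  Description (with example)
*  Matches the preceding character 0 or more times, i.e. it is optional (greedy)
+  Matches the preceding character 1 or more times, i.e. at least once (greedy). Example: re.match("t.+o", "two") requires the first character to be t, the last to be o, and at least one character in between
?  Matches the preceding character 0 or 1 times, i.e. at most once; placed after another quantifier (*?, +?, {m,n}?), it makes that quantifier non-greedy. Example: the pattern 'https?' matches both http and https
{n}  Matches the preceding character exactly n times in a row
{m,n}  Matches the preceding character m to n times in a row (at least m, at most n); omitting n, as in {m,}, means at least m times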
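The examples named in the table can be run as-is; the extra {m,} line illustrates omitting n. A sketch with illustrative inputs:

```python
import re

print(re.match('t.+o', 'two').group())             # two
print(re.findall('https?', 'http://a https://b'))  # ['http', 'https']
print(re.findall('l{2,}', 'hel hell helll'))       # ['ll', 'lll']  (two or more l's)
```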

Code examples
*: matches the preceding character 0 or more times, i.e. it is optional; at positions where nothing matches, findall() returns an empty string
# h zero or more times
res = 'h*'
ss = 'hello好好学python_-*&^%$#@'
result = re.findall(res, ss)
print(result)
Output: ['h', '', '', '', '', '', '', '', '', '', '', 'h', '', '', '', '', '', '', '', '', '', '', '', '']
+: matches the preceding character 1 or more times, i.e. at least once
# o at least once; greedy, so it takes as many o's as possible
res = 'ho+'
ss = 'hello python hoo'
result = re.findall(res, ss)
print(result)
Output: ['ho', 'hoo']
?: matches the preceding character 0 or 1 times, i.e. at most once
# o zero times or once
res = 'ho?'
ss = 'hello python hoo'
result = re.findall(res, ss)
print(result)
Output: ['h', 'ho', 'ho']
{n}: matches the preceding character exactly n times in a row
# l exactly twice in a row
res = 'hel{2}'
ss = 'hello python helll'
result = re.findall(res, ss)
print(result)
Output: ['hell', 'hell']
{m,n}: matches the preceding character m to n times in a row
# l between 2 and 4 times in a row
res = 'hel{2,4}'
ss = 'hello python helll hellllll'
result = re.findall(res, ss)
print(result)
Output: ['hell', 'helll', 'hellll']
Logical operations
Metacharacter  Description
| (or)  Combines two patterns with a logical OR: the text matches if either alternative matches

# he or py
res = 'he|py'
ss = 'hello python helll hellllll'
result = re.findall(res, ss)
print(result)
Output: ['he', 'py', 'he', 'he']
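| has the lowest precedence of the regex operators, so it splits the whole pattern into alternatives: 'hel|o' means 'hel' or 'o', not 'he' followed by l or o. Parentheses narrow its scope; this sketch uses the non-capturing form (?:...) so that findall() still returns whole matches rather than group contents:

```python
import re

print(re.findall('hel|o', 'hello python'))   # ['hel', 'o', 'o']
print(re.findall('he(?:l|o)', 'hello heo'))  # ['hel', 'heo']
```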
Anchors (for reference)

Metacharacter  Description
^  Matches at the start of the input string
$  Matches at the end of the input string

Code examples
^: matches at the start of the input string
# strings that start with he
res = '^he'
ss = 'hello python'
result = re.findall(res, ss)
print(result)
Output: ['he']
$: matches at the end of the input string
# strings that end with on
res = 'on$'
ss = 'hello python'
result = re.findall(res, ss)
print(result)
Output: ['on']
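By default ^ and $ refer to the start and end of the whole string; with the re.MULTILINE flag they also match at the start and end of each line. A sketch with an illustrative two-line string:

```python
import re

text = 'hello python\nhello world'
print(re.findall(r'^hello', text))                # ['hello']  (only the very start)
print(re.findall(r'^hello', text, re.MULTILINE))  # ['hello', 'hello']
print(re.findall(r'\w+$', text, re.MULTILINE))    # ['python', 'world']
```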
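Grouping
Metacharacter  Description
( )  Captures the part of the match inside the parentheses; findall() then returns only the captured value

Code example
Match member_id and use ( ) to extract the value without the surrounding # signs
re_str = r'#(\w.+?)#'
ss = '{"member_id":"#member_id#","key2":"val2","bid_id":"#bid_id#"}'
test_re = re.findall(re_str, ss)
print(test_re)
Output: ['member_id', 'bid_id']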
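When a pattern contains more than one group, findall() returns a list of tuples with one element per group, and a single Match object exposes each group through group(n). A sketch reusing a shortened version of the JSON-like string above:

```python
import re

ss = '{"member_id":"#member_id#","key2":"val2"}'
# two groups, so findall returns (key, value) tuples
print(re.findall(r'"(\w+)":"(\w+)"', ss))  # [('key2', 'val2')]

m = re.search(r'#(\w+)#', ss)
print(m.group(0), m.group(1))  # #member_id# member_id  (whole match, then group 1)
```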
Quantifier matching modes
Greedy mode: match as many characters as possible (e.g. +)
# match as many digits as possible
ss = 'hell123opy456thon'
res = re.findall(r'\d+', ss)
print(res)
Output: ['123', '456']
Non-greedy mode: match as few characters as possible, by appending a question mark (?) to a quantifier
ss = 'hell123opy456thon'
# +? stops as soon as one digit has matched
res = re.findall(r'\d+?', ss)
print(res)
Output: ['1', '2', '3', '4', '5', '6']
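The contrast between greedy and non-greedy matching is easiest to see when the two forms can end at different places. A sketch using an illustrative tag-like string:

```python
import re

html = '<a><b>'
print(re.findall('<.+>', html))   # ['<a><b>']  greedy: runs to the last '>'
print(re.findall('<.+?>', html))  # ['<a>', '<b>']  non-greedy: stops at the first '>'
```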