Advanced Python: Regular Expressions with re

一城烟雨452

Article link: https://blog.csdn.net/weixin_65436886/article/details/136356275

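This article introduces the basics of Python's re module, covering the match, findall, search, fullmatch, finditer, split, sub, and subn functions, along with examples of pattern matching, boundaries, groups, and greedy versus non-greedy matching.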

1. The re module in Python

In Python, the re module provides regular expression support. In short, it is used to match, search, and replace text in strings.

2. Common functions of the re module

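match: matches from the start of the string; returns a Match object or None.

findall: finds all non-overlapping matches; returns a list, which can be iterated over and printed.

search: scans the whole string and returns the first match as a Match object, or None if nothing matches.

fullmatch: the pattern must match the entire string, from start to end; returns a Match object or None.

finditer: finds all matches; returns an iterator whose elements are Match objects.

split: splits the string at each match; returns a list. The maximum number of splits can be set with maxsplit.

sub: replaces matches and returns a new string. The number of replacements can be limited with count; if count is omitted, all matches are replaced.

subn: replaces like sub, but returns a tuple (new string, number of replacements made).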
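To make the difference between the three matching functions concrete, here is a minimal sketch that applies one pattern with match, search, and fullmatch. The sample string and variable names are made up for illustration.

```python
import re

text = "abc123"     # hypothetical sample input
pattern = r"\d+"    # one or more digits

# match is anchored at the start; the string begins with letters, so no match
m_match = re.match(pattern, text)
# search scans forward and returns the first match anywhere in the string
m_search = re.search(pattern, text)
# fullmatch requires the whole string to match the pattern
m_full = re.fullmatch(pattern, text)

print(m_match)                               # None
print(m_search.group())                      # 123
print(m_full)                                # None
print(re.fullmatch(pattern, "123").group())  # 123
```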

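"." matches any single character (except a newline by default)

\d digit; \D non-digit

\w letter, digit, or underscore; \W any other character

\s whitespace (space, tab, newline); \S non-whitespace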

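* zero or more occurrences; + one or more occurrences; ? zero or one occurrence. Placed after another quantifier, ? makes that quantifier non-greedy (it matches as little as possible)

.* is greedy and takes as many characters as possible; .*? is non-greedy and takes as few as possible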

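{n} matches exactly n times; {m,n} matches between m and n times

() defines a group. A backslash followed by a group number (\1, \2, ...) refers back to the text that group matched, so (x)\1 matches two consecutive identical pieces

( | ) a group whose content can be either side of the |. A following backreference must repeat exactly what the group matched, otherwise there is no match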

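Boundaries: ^ start of string; $ end of string; \b word boundary; \B non-word boundary

Flags: re.I ignore case

re.M multi-line mode (^ and $ also match at the start and end of each line)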
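The character classes listed above (\d, \w, \s and their negations) have no example of their own in this article, so here is a small sketch that runs them against one made-up string.

```python
import re

sample = "id_7 = 42\tok"   # hypothetical sample; \t is a tab character

digits = re.findall(r"\d", sample)     # every single digit
words = re.findall(r"\w+", sample)     # runs of letters, digits, underscores
spaces = re.findall(r"\s", sample)     # whitespace characters
non_word = re.findall(r"\W", sample)   # everything that is not \w

print(digits)    # ['7', '4', '2']
print(words)     # ['id_7', '42', 'ok']
print(spaces)    # [' ', ' ', '\t']
print(non_word)  # [' ', '=', ' ', '\t']
```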

3. Usage examples

Using match:

import re

# match starts at the beginning of the string; returns a Match object or None
# r.group() returns the matched text
r = re.match("hello", "hello world")
print(type(r), r, r.group())
#<class 're.Match'> <re.Match object; span=(0, 5), match='hello'> hello

# re.I ignores case
r = re.match("hello", "Hello world", re.I)
print(type(r), r, r.group())
#<class 're.Match'> <re.Match object; span=(0, 5), match='Hello'> Hello

# . stands for any single character
r = re.match(".....", "Hello world", re.I)
print(type(r), r, r.group())
#<class 're.Match'> <re.Match object; span=(0, 5), match='Hello'> Hello
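The calls above use r.group() directly, which only works because each pattern does match. When match returns None, calling group() raises AttributeError, so a guard is usually worth adding. A minimal sketch, with a made-up input:

```python
import re

# "world" is present, but not at the start, so match returns None
r = re.match("world", "hello world")
print(r)  # None

# r.group() would raise AttributeError here, so check the result first
if r is not None:
    result = r.group()
else:
    result = "no match"
print(result)  # no match
```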

Using findall:

# findall finds every match and returns them as a list
r = re.findall(r".", "hello world")
print(type(r), r)
#<class 'list'> ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']

Using search:

# search scans the whole string and stops at the first match
# returns a Match object, or None if nothing matches
r = re.search(r"\d", "a2b3c")
print(r, type(r))
if r:
    print(r.group())
#<re.Match object; span=(1, 2), match='2'> <class 're.Match'>
#2

Using fullmatch:

# fullmatch requires the whole string to match; returns a Match object or None
# Unlike match, which only anchors at the start, fullmatch also anchors at the end
# For example, "1a2b" is matched by \d\w\d\w, but \d\w alone returns None because "2b" is left over
r = re.fullmatch(r"\d{5}", "12345")
print(r, type(r))
if r:
    print(r.group())
r = re.fullmatch(r"\d\w\d\w", "1a2b")
print(r, type(r))
if r:
    print(r.group())
#<re.Match object; span=(0, 5), match='12345'> <class 're.Match'>
#12345
#<re.Match object; span=(0, 4), match='1a2b'> <class 're.Match'>
#1a2b

Using finditer:

# finditer finds every match and returns an iterator; each element is a Match object
r = re.finditer(r"\d", "hel4lo123")
print(r, type(r))
for e in r:
    print(type(e), e.group())
# the memory address in the first line varies between runs
#<callable_iterator object at 0x0000023C46D84FA0> <class 'callable_iterator'>
#<class 're.Match'> 4
#<class 're.Match'> 1
#<class 're.Match'> 2
#<class 're.Match'> 3
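Because finditer yields Match objects rather than plain strings, each result also carries its position. The sketch below collects the matched text together with its start and end index; the pattern \d+ and the variable name are chosen here for illustration.

```python
import re

# collect (text, start, end) for each run of digits
positions = [(m.group(), m.start(), m.end())
             for m in re.finditer(r"\d+", "hel4lo123")]
print(positions)  # [('4', 3, 4), ('123', 6, 9)]
```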

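Using split:

# split cuts the string at each match and returns a list; maxsplit limits the number of splits
# maxsplit is passed by keyword because passing it positionally is deprecated since Python 3.13
r = re.split(r"\d", "1a2b3c4d5a", maxsplit=3)
print(r, type(r))
#['', 'a', 'b', 'c4d5a'] <class 'list'>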
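For comparison, here is a short sketch of the same split without a limit and with a limit of one. It also shows one possible way to drop the empty string that appears when the input starts with a delimiter; the variable names are illustrative.

```python
import re

s = "1a2b3c4d5a"

all_parts = re.split(r"\d", s)              # no limit: split at every digit
two_parts = re.split(r"\d", s, maxsplit=1)  # split only at the first digit
non_empty = [p for p in all_parts if p]     # drop the leading empty string

print(all_parts)  # ['', 'a', 'b', 'c', 'd', 'a']
print(two_parts)  # ['', 'a2b3c4d5a']
print(non_empty)  # ['a', 'b', 'c', 'd', 'a']
```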

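Using sub:

# sub replaces matches and returns a new string
# count limits the number of replacements; if omitted, every match is replaced
# count is passed by keyword because passing it positionally is deprecated since Python 3.13
r = re.sub(r"\d", "+", "1a2d3c", count=2)
print(r, type(r))
#+a+d3c <class 'str'>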

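Using subn:

# subn replaces matches and returns a tuple: (new string, number of replacements)
# count works as in sub; if omitted, every match is replaced
r = re.subn(r"\d", "*", "1a2v3r4t5y", count=3)
print(r, type(r))
#('*a*v*r4t5y', 3) <class 'tuple'>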
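As a small illustration of the default behaviour, the sketch below omits count, so every digit is replaced, and uses the number returned by subn to tell whether anything changed at all. The inputs are made up.

```python
import re

text = "1a2d3c"

masked = re.sub(r"\d", "+", text)              # no count: replace every digit
masked_n, n = re.subn(r"\d", "+", text)        # same result plus the count
unchanged, zero = re.subn(r"\d", "+", "abc")   # nothing to replace

print(masked)           # +a+d+c
print(masked_n, n)      # +a+d+c 3
print(unchanged, zero)  # abc 0
```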

Repetition: *

# * means zero or more occurrences
# any number of a's (including none) followed by one b
r = re.findall(r"a*b", "aaaabcabcb")
print(r, type(r))
#['aaaab', 'ab', 'b'] <class 'list'>

+ : at least one

# a+ needs at least one a
r = re.findall(r"a+", "aaaabcabcb")
print(r, type(r))
#['aaaa', 'a'] <class 'list'>

? : zero or one

# a? matches zero or one a, so every position without an a yields an empty match
r = re.findall(r"a?", "aaaabcabcb")
print(r, type(r))
#['a', 'a', 'a', 'a', '', '', 'a', '', '', '', ''] <class 'list'>
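The ? symbol plays two roles, which a pair of short examples may make clearer: on its own it marks the preceding item as optional, and after another quantifier it switches that quantifier to non-greedy. The sample strings are invented for this sketch.

```python
import re

# ? on its own: the u is optional (zero or one occurrence)
optional = re.findall(r"colou?r", "color colour colouur")
print(optional)  # ['color', 'colour']

# ? after a quantifier: a+ grabs as many a's as it can, a+? as few as it can
greedy = re.findall(r"a+", "aaa")
lazy = re.findall(r"a+?", "aaa")
print(greedy)  # ['aaa']
print(lazy)    # ['a', 'a', 'a']
```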

Greedy and non-greedy modes

# .* is greedy and takes as many characters as possible
# .*? is non-greedy and takes as few as possible
r = re.findall(r".*", "aaaabcabcb")
print(r, type(r))
r = re.findall(r".*?", "aaaabcabcb")
print(r, type(r))
#['aaaabcabcb', ''] <class 'list'>
#['', 'a', '', 'a', '', 'a', '', 'a', '', 'b', '', 'c', '', 'a', '', 'b', '', 'c', '', 'b', ''] <class 'list'>
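The difference between the two modes tends to be easier to see when the pattern has fixed delimiters on both sides. The following sketch uses a made-up tag-like string: the greedy version runs to the last closing bracket, the non-greedy version stops at the first one.

```python
import re

html = "<b>bold</b><i>it</i>"   # hypothetical sample input

greedy = re.findall(r"<.*>", html)   # .* extends to the last ">"
lazy = re.findall(r"<.*?>", html)    # .*? stops at the nearest ">"

print(greedy)  # ['<b>bold</b><i>it</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```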

{n} matches exactly n times; {m,n} matches between m and n times

# {m,n} is greedy: it takes up to n when that many are available
r = re.findall(r"\d{3}", "123456789")
print(r, type(r))
r = re.findall(r"\d{2,3}", "12345678")
print(r, type(r))
r = re.findall(r"\d{2,4}", "12345678910")
print(r, type(r))
#['123', '456', '789'] <class 'list'>
#['123', '456', '78'] <class 'list'>
#['1234', '5678', '910'] <class 'list'>
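Counted repetition is often combined with fullmatch to check that an input has a fixed shape. The sketch below assumes an invented format (three digits, a hyphen, then two to four digits); the helper name looks_like_code is hypothetical.

```python
import re

def looks_like_code(s):
    """Hypothetical check: three digits, a hyphen, then two to four digits."""
    return re.fullmatch(r"\d{3}-\d{2,4}", s) is not None

print(looks_like_code("123-45"))     # True
print(looks_like_code("123-45678"))  # False: five digits after the hyphen
print(looks_like_code("12-345"))     # False: only two digits before the hyphen
```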

Boundaries:

# boundaries: ^ start, $ end, \b word boundary, \B non-word boundary
# starts with 3, ends with d, greedy in between; the text contains a newline, so re.M (multi-line mode) is used
r = re.findall(r"^3.*d$", "3bnoi89bn6d\n3999uigbd", re.M)
print(r, type(r))
r = re.findall(r".*?\b", "hello python hello hi ")
print(r, type(r))
r = re.findall(r".*?\B", "hello python hello hi ")
print(r, type(r))

#['3bnoi89bn6d', '3999uigbd'] <class 'list'>
#['', 'hello', '', ' ', '', 'python', '', ' ', '', 'hello', '', ' ', '', 'hi', ''] <class 'list'>
#['h', '', 'e', '', 'l', '', 'l', '', 'o p', '', 'y', '', 't', '', 'h', '', 'o', '', 'n h', '', 'e', '', 'l', '', 'l', '', 'o h', '', 'i ', ''] <class 'list'>
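A more typical use of \b and \B is to control whether a word is matched on its own or inside a longer word. A minimal sketch with a made-up sentence:

```python
import re

text = "hi this is high, hi"   # hypothetical sample input

whole = re.findall(r"\bhi\b", text)    # "hi" as a standalone word
anywhere = re.findall(r"hi", text)     # every "hi", including inside words
inside = re.findall(r"\Bhi\B", text)   # "hi" with word characters on both sides

print(whole)          # ['hi', 'hi']
print(len(anywhere))  # 4 (hi, t-hi-s, hi-gh, hi)
print(inside)         # ['hi'] (the one inside "this")
```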

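Groups:

# () defines a group; (\d)\1 matches two consecutive identical digits
# \1 refers back to the text matched by group 1
# ( | ) is a group with alternatives
r = re.findall(r"(\d)\1", r"112233")
print(r)  #['1', '2', '3']
# \1+ allows more repeats of the same content; findall still returns the group content once per match
r = re.findall(r"(\d)a\1+", r"22a221a13a33")
print(r)
# ( | ): the group can match either side of the |, and \1 must repeat exactly what the group matched
r = re.findall(r"(\d|a)b\1+", r"3b3ab1")
print(r)  # ['3']
r = re.findall(r"(\d|a)b\1+", r"3b3aba")
print(r)  # ['3', 'a']

#['1', '2', '3']
#['2', '1', '3']
#['3']
#['3', 'a']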
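When a pattern contains a group, findall returns what the group captured, not the whole match, which is why the results above are single characters. If the full matched text is needed, one option is to use finditer and read group(0). A short sketch:

```python
import re

text = "112233"

# findall returns only what group 1 captured
groups_only = re.findall(r"(\d)\1", text)
# group(0) is the whole match, group(1) is the first group
full = [m.group(0) for m in re.finditer(r"(\d)\1", text)]
first = [m.group(1) for m in re.finditer(r"(\d)\1", text)]

print(groups_only)  # ['1', '2', '3']
print(full)         # ['11', '22', '33']
print(first)        # ['1', '2', '3']
```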