Python Notes: Regular Expressions


薛定谔的壳


Column: Python Study Notes. Tags: python, regular expressions

Article link: https://blog.csdn.net/qq_45020818/article/details/121192019


Flags / Modes

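Flag	Description
re.I	Case-insensitive matching (re.IGNORECASE)
re.L	Locale-dependent matching; in Python 3 it can only be used with bytes patterns
re.M	Multiline mode: `^` and `$` also match at the start and end of every line
re.S	Makes `.` match any character, including a newline (re.DOTALL)
re.U	Interpret `\w`, `\W`, `\b`, `\B` according to Unicode; this is already the default for str patterns in Python 3
re.X	Verbose mode: whitespace and `#` comments inside the pattern are ignored, so long patterns can be laid out readably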
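A short sketch of how each flag changes the result; the sample strings are made up for illustration. Flags can be combined with `|`, e.g. `re.I | re.M`.

```python
import re

text = "Hello\nhello"

# re.I: case-insensitive, so both words match
print(re.findall(r"hello", text, re.I))        # ['Hello', 'hello']

# re.M: ^ and $ match at the start/end of every line, not just the whole string
print(re.findall(r"^\w+$", text, re.M))        # ['Hello', 'hello']

# re.S: '.' also matches '\n', so one match spans both lines
print(re.findall(r"H.*o", text, re.S))         # ['Hello\nhello']

# re.X: whitespace and # comments inside the pattern are ignored
date = re.compile(r"""
    (\d{4})   # year
    -(\d{2})  # month
""", re.X)
print(date.search("2021-11-07").groups())      # ('2021', '11')
```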

Special characters

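Character	Description
$	Matches the end of the string (also just before a trailing newline); with re.M, the end of each line
^	Matches the start of the string; with re.M, the start of each line
+	The preceding expression `one` or `more` times
?	The preceding expression `zero` or `one` time
*	The preceding expression `zero` or `more` times
.	Any character except a newline
|	Alternation between two items: `a|b` matches `a` or `b`
\	Escape character: to match a literal `$`, write `\$`
()	Subexpression (group)
{}	Repetition count (quantifier)

\b	Word boundary
\B	Not a word boundary
\w	Word character: letters, digits and underscore (Unicode-aware for str patterns)
\W	Non-word character
\d	Digit
\D	Non-digit
\s	Whitespace: `[ \t\n\r\f\v]` (plus other Unicode whitespace for str patterns)
\S	Non-whitespace

\A	Start of the string only
\Z	End of the string only (commonly used). In Python, unlike Perl/PCRE, it does not match before a trailing newline
\z	Perl/PCRE syntax; older Python versions reject it, and Python 3.14 accepts it as a synonym for `\Z`
\G	Position where the previous match ended; not supported by Python's `re` (the third-party `regex` module has it)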
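A quick check of several of these escapes, including the difference between `$` and `\Z` when the string ends with a newline. The sample string is made up for illustration.

```python
import re

s = "cat catalog bobcat 42\n"

# \b: word boundary, so only the standalone word "cat" matches
print(re.findall(r"\bcat\b", s))     # ['cat']

# \d+ digits, \s whitespace, \w+ word characters
print(re.findall(r"\d+", s))         # ['42']
print(len(re.findall(r"\s", s)))     # 4 (three spaces plus the newline)
print(re.findall(r"\w+", s))         # ['cat', 'catalog', 'bobcat', '42']

# $ also matches just before a trailing newline; \Z only at the very end
print(bool(re.search(r"42$", s)))    # True
print(bool(re.search(r"42\Z", s)))   # False
print(bool(re.search(r"\Acat", s)))  # True
```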

Special constructs

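Expression	Description	Example
(?#…)	Comment; ignored by the matcher
[…]	Character class	`[a-zA-Z]`: any ASCII letter; `[hdjr]`: `h`, `d`, `j` or `r`; `[\u4e00-\u9fa5]`: a (common) Chinese character
[^…]	Any character not in the class	`[^abc]`: any character except `a`, `b`, `c`
{n[,m]}	Exactly n times (or n to m times)	`6{6,8}`: the character `'6'` 6 to 8 times
a | b	Alternation	`a|b`: `a` or `b`
(re)	Matches the expression in parentheses and captures it as a group	`(http)://`: matches `http://` and captures `http`
(?P<name>…)	Named group
(?P=name)	Backreference to the text matched by a named group
\number	Backreference to a group by number, e.g. `\1`
(?:re)	Like `(re)`, but non-capturing (no group is created)
(?imx:re)	Turn on flags `i`, `m`, `x` (also `s`) only inside the parentheses
(?-imx:re)	Turn off flags `i`, `m`, `x` (also `s`) only inside the parentheses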
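A sketch exercising several of these constructs on made-up strings: named and numbered backreferences, non-capturing groups, scoped flags, and inline comments.

```python
import re

# Named group plus (?P=name): find a word that is immediately repeated
m = re.search(r"\b(?P<word>\w+) (?P=word)\b", "this is is a test")
print(m.group("word"))                                       # is

# \1 refers back to group 1 by number
print(re.sub(r"\b(\w+) \1\b", r"\1", "hello hello world"))  # hello world

# (?:...) groups without capturing, so findall returns whole matches
print(re.findall(r"(?:ab)+", "ababxab"))   # ['abab', 'ab']
print(re.findall(r"(ab)+", "ababxab"))     # ['ab', 'ab'] (last repetition of the group)

# (?i:...) turns on case-insensitivity only inside the group
print(re.findall(r"(?i:py)thon", "Python PYthon PYTHON"))   # ['Python', 'PYthon']

# (?#...) is a comment and is ignored
print(re.findall(r"\d+(?#one or more digits)", "a1b22"))    # ['1', '22']
```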


re module functions

compile()

Compiles a regular expression into a Pattern object that can be reused
Returns: a Pattern object

Syntax:

pt = re.compile(
	pattern,
	flags=0
	)

Parameters:

pattern: the regular expression string
flags: flags such as re.I or re.M; combine several with `|`

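import re
pt = re.compile(r'[aeiou]+$')      # one or more vowels at the very end of the string
print(pt.findall('hello world'))   # [] because 'hello world' ends with 'd'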
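A compiled Pattern exposes the same functions as the module (findall, search, sub, …) as methods, with the flags fixed at compile time. A small sketch; the strings are made up for illustration.

```python
import re

# Compile once and reuse; re.I is baked into the Pattern object
vowels = re.compile(r"[aeiou]+", re.I)

print(vowels.pattern)                  # [aeiou]+
print(vowels.findall("hEllo wOrld"))   # ['E', 'o', 'O']
print(vowels.search("xyz"))            # None
print(vowels.sub("_", "Regular"))      # R_g_l_r
```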

search()

Returns: the first successful match (a Match object), or None

Syntax:

re.search(
	pattern, 
	string,
	flags=0
)

Parameters:

pattern: the pattern
string: the string to search
flags: flags (case sensitivity, multiline mode, etc.)

>>> re.search(r'hello', 'hello world')
<re.Match object; span=(0, 5), match='hello'>

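Match object:

.group() returns the matched text
.groups() returns a tuple of all captured groups
.groupdict() returns a dict of the named groups
.span() returns the match position as (start, end)
.start() returns the start index of the match
.end() returns the end index of the match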
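Because search() returns None when nothing matches, calling .group() directly on the result can raise AttributeError. A minimal sketch of the usual check; `first_number` is a hypothetical helper written for this example, not part of `re`.

```python
import re

def first_number(text):
    """Hypothetical helper: return the first run of digits as an int, or None."""
    m = re.search(r"\d+", text)
    if m is None:              # search() returns None when nothing matches
        return None
    return int(m.group())      # group() is the whole matched text

print(first_number("order 66, item 7"))   # 66
print(first_number("no digits here"))     # None
```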

match()

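Matches only at the beginning of the string: the pattern must match starting from the first character

Returns: a Match object or None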

re.match(
	pattern, 
	string,
	flags=0
)

Parameters:

pattern: the pattern
string: the string to match
flags: flags (case sensitivity, multiline mode, etc.)

Match object: the same methods as for search() above (group(), groups(), groupdict(), span(), start(), end()).
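To make the difference between match(), search() and re.fullmatch() (Python 3.4+) concrete, a small comparison on a made-up string:

```python
import re

s = "hello world"

# match() only tries position 0; search() scans the whole string
print(re.match(r"world", s))                       # None
print(re.search(r"world", s).span())               # (6, 11)

# match() does not require the pattern to reach the end of the string...
print(re.match(r"hello", s).group())               # hello
# ...re.fullmatch() does
print(re.fullmatch(r"hello", s))                   # None
print(re.fullmatch(r"hello \w+", s) is not None)   # True
```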

findall()

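Returns: a list of all matches. If the pattern contains groups, the list holds the groups' text instead of the whole match (one tuple per match when there are several groups)

Syntax: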

re.findall(
	pattern,
	string,
	flags=0
	)

Parameters:

pattern: the pattern
string: the string to search
flags: flags (case sensitivity, multiline mode, etc.)

>>> re.findall(r'hello\d?', 'e hello2 world hello1')
['hello2', 'hello1']
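How groups change what findall() returns, shown on a made-up string:

```python
import re

s = "Tom:23, Ann:31"

print(re.findall(r"\w+:\d+", s))      # ['Tom:23', 'Ann:31']  no groups: whole matches
print(re.findall(r"(\w+):\d+", s))    # ['Tom', 'Ann']        one group: that group's text
print(re.findall(r"(\w+):(\d+)", s))  # [('Tom', '23'), ('Ann', '31')]  several groups: tuples
```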

split()

Splits a string wherever the pattern matches
Returns: the list of pieces

Syntax:

re.split(
	pattern,
	string,
	maxsplit=0,
	flags=0
	)

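Parameters:

pattern: the pattern
string: the string to split
maxsplit: maximum number of splits (0, the default, means no limit)
flags: flags (case sensitivity, multiline mode, etc.)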

import re

print(re.split(
	r'\d{2,4}', 
	'hello2future666HHHHHH897LLL12beloved')
	)
# Output
'''
['hello2future', 'HHHHHH', 'LLL', 'beloved']
'''
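Two details worth knowing: if the pattern contains a capturing group, the separators are kept in the result, and maxsplit limits the number of splits (passed as a keyword here). A sketch on a made-up string:

```python
import re

s = "a1b22c333d"

print(re.split(r"\d+", s))              # ['a', 'b', 'c', 'd']
print(re.split(r"(\d+)", s))            # ['a', '1', 'b', '22', 'c', '333', 'd']  separators kept
print(re.split(r"\d+", s, maxsplit=1))  # ['a', 'b22c333d']
```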

sub() 和 subn()

Search and replace
Returns: the new string | subn(): a tuple (new string, number of replacements)

Syntax (re.subn() takes the same arguments):

re.sub(
	pattern,
	repl,
	string,
	count=0,
	flags=0
	)

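Parameters:

pattern: the pattern to match
repl: the replacement; either a string (which may contain backreferences such as `\1`) or a function that receives each Match object and returns the replacement string
string: the original string
count: maximum number of replacements (0, the default, replaces every match)
flags: flags (case sensitivity, multiline mode, etc.)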

import re

print(re.sub('666', '999', 'wish you 666, wish him 666'))
print(re.subn('666', '999', 'wish you 666, wish him 666'))
# Output
'''
wish you 999, wish him 999
('wish you 999, wish him 999', 2)
'''
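A replacement string can refer back to captured groups with `\1`, `\2`, … or `\g<name>`, and `count` caps the number of replacements. A sketch on a made-up date string:

```python
import re

date = "2021-11-07"

# Reorder the captured parts by number or by name
print(re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date))    # 07/11/2021
print(re.sub(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})",
             r"\g<d>.\g<m>.\g<y>", date))                        # 07.11.2021

# count limits the number of replacements
print(re.sub(r"6", "9", "666", count=2))    # 996
print(re.subn(r"6", "9", "666", count=2))   # ('996', 2)
```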

Advanced sub usage: a function as repl

import re

def ch(value):
    # value is the Match object; replace each run of h/H with its length
    return str(len(value.group()))

print(re.sub(
	'[hH]+', # pattern
	ch, 	 # repl
	'hhhhhhhh, IamGreat, HHHHH, ni666, hhhhhhh'))
# Output
'''
8, IamGreat, 5, ni666, 7
'''

finditer()

Similar to findall(), but yields Match objects instead of strings

Returns: an iterator whose elements are Match objects

Syntax:

re.finditer(pattern,string,flags=0)
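Since each item is a Match object, finditer() gives access to positions as well as text. A sketch on a made-up string:

```python
import re

s = "cat bat rat"

# Each item is a Match object, so span() is available
for m in re.finditer(r"\w+at", s):
    print(m.group(), m.span())
# cat (0, 3)
# bat (4, 7)
# rat (8, 11)

spans = [m.span() for m in re.finditer(r"\w+at", s)]
```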

Greedy matching

Quantifiers are greedy by default: they match as much as possible.

import re

st = 'Python ython thon hon on'
pt = re.compile(r'P.+n')
print(pt.findall(st))
# Output
'''
['Python ython thon hon on']
'''
# The match does not stop at 'Python': .+ consumes as much as it can and backtracks only to the last 'n'

Non-greedy
Append a question mark `?` to the quantifier (`+`, `*`, `?`, `{m,n}`) so that it matches as little as possible.

import re

st = 'Python ython thon hon on'
pt = re.compile(r'P.+?n')
print(pt.findall(st))
# Output
'''
['Python']
'''
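The same contrast on a made-up tag-like string, plus a lazy `{m,n}?` quantifier, which takes the minimum count:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

print(re.findall(r"<.+>", html))          # ['<b>bold</b> and <i>italic</i>']  greedy: one long match
print(re.findall(r"<.+?>", html))         # ['<b>', '</b>', '<i>', '</i>']     lazy: each tag
print(re.findall(r"\d{2,4}?", "123456"))  # ['12', '34', '56']  lazy {2,4}? takes 2 digits
```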

Match objects in practice

Methods:

group()
groups()
groupdict()
span()
start()
end()

group()

group() (the same as group(0)) returns the entire match; group(n) returns the text of group n
span(): the position of the match

import re

string = 'hello world, hello python'
pt = re.compile(r'hello')
result = pt.search(string)
print(result,		 # <re.Match object; span=(0, 5), match='hello'>
      result.span(), # (0, 5)
      result.start(),# 0
      result.end(),  # 5
      result.group(),# hello
      sep='\n'
      )
# Output
'''
<re.Match object; span=(0, 5), match='hello'>
(0, 5)
0
5
hello
'''

groups()

With several groups, use match.groups() to see what each group matched

import re

string = 'hello world, hello python'
result = re.search(
    r'(ello).+(thon)',
    string
)
print(result,
      result.span(),   # (1, 25)
      result.start(),  # 1
      result.end(),    # 25
      result.groups(), # ('ello', 'thon')
      sep='\n'
      )
# Output
'''
<re.Match object; span=(1, 25), match='ello world, hello python'>
(1, 25)
1
25
('ello', 'thon')
'''

groupdict()

When the groups are named, groupdict() returns a dict mapping each name to its matched text
groups() still returns a tuple

import re

string = 'hello world, hello python'
result = re.search(
    r'(?P<halou>ello).+(?P<Py>thon)',
    string
)
print(result,
      result.span(),   # (1, 25)
      result.start(),  # 1
      result.end(),    # 25
      result.groups(), # ('ello', 'thon')
      result.groupdict(),# {'halou': 'ello', 'Py': 'thon'}
      sep='\n'
      )
# Output
'''
<re.Match object; span=(1, 25), match='ello world, hello python'>
(1, 25)
1
25
('ello', 'thon')
{'halou': 'ello', 'Py': 'thon'}
'''
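Named and numbered groups can also be read individually, and start()/end() accept a group name or number. A sketch reusing the pattern above; `m["halou"]` relies on Match indexing, available since Python 3.6.

```python
import re

m = re.search(r"(?P<halou>ello).+(?P<Py>thon)", "hello world, hello python")

print(m.group(1), m.group("Py"))    # ello thon
print(m.group(1, 2))                # ('ello', 'thon')
print(m["halou"])                   # ello
print(m.start("Py"), m.end("Py"))   # 21 25
```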
