Python Regular Expressions Review

Latest recommended article published 2021-12-11 13:04:20

杰益

Category: Python. Tags: regular expressions, python, perl

Article link: https://blog.csdn.net/qq_43109064/article/details/120352719

Regular expressions

Purpose: locating strings in text
1. Search
2. Match

Search:
Pattern: starts with a, ends with 9
xabd9xda34afg9da
Search results:
abd9
a34afg9
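
A minimal sketch of the search example above. The source describes the pattern only in words, so writing it as the non-greedy regex `a.*?9` is an assumption:

```python
import re

# Assumed regex for "starts with a, ends with 9"; the lazy .*? stops at the first 9
pattern = r'a.*?9'
text = 'xabd9xda34afg9da'

results = re.findall(pattern, text)
print(results)  # ['abd9', 'a34afg9']
```

With the greedy form `a.*9` the same text would yield a single, longer result, because `.*` would run on to the last 9.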

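Match:

Pattern: starts with x, ends with 8
xab8: match
xxx5: no match
bxcd8: no match (containing the pattern is not enough; a match must begin at the start of the string)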
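
The three cases above can be checked with a short sketch. The regex `x.*8$` is an assumed spelling of "starts with x, ends with 8"; the source gives the pattern only in words:

```python
import re

# Assumed regex for "starts with x, ends with 8"; $ pins the 8 to the end
pattern = r'x.*8$'

# re.match only tries at position 0 of each string
results = {s: re.match(pattern, s) is not None for s in ['xab8', 'xxx5', 'bxcd8']}
print(results)  # {'xab8': True, 'xxx5': False, 'bxcd8': False}

# re.search scans the whole string, so it does find a hit inside 'bxcd8'
found = re.search(pattern, 'bxcd8')
print(found.group())  # xcd8
```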

import re
from pprint import pprint

pattern = 'D.*?s'
text = "Does this text match the pattern? "
match = re.search(pattern, text)
print(type(match))  # <class 're.Match'>
# A Match object records where the match starts and ends
start = match.start()
end = match.end()
print(f'pattern:{match.re.pattern}\nmatch.string:{match.string} \nstart,end: {start},{end}\n{text[start:end]} ')

<class 're.Match'>
pattern:D.*?s
match.string:Does this text match the pattern?  
start,end: 0,4
Does
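
Besides `start()` and `end()`, a Match object offers a few shortcuts. The sketch below reuses the pattern and text from the example above; the second pattern, `xyz`, is made up to show the no-match case:

```python
import re

match = re.search(r'D.*?s', 'Does this text match the pattern? ')

print(match.group())   # the matched substring: Does
print(match.span())    # (start, end) as a tuple: (0, 4)

# search returns None when nothing matches, so guard before using the result
missing = re.search(r'xyz', 'Does this text match the pattern? ')
if missing is None:
    print('no match')
```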

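Compiling regular expressions

Why compile a regular expression:
A regular expression that is used frequently is more efficient once compiled, because the pattern is parsed only once and the resulting pattern object is reused. In practice the re module also caches recently compiled patterns, so the saving is mainly noticeable when a pattern is applied many times.
The difference resembles that between interpreted and compiled programming languages:
An interpreted language performs lexical and syntax analysis every time the program runs.
A compiled language performs lexical and syntax analysis only once; later runs execute the resulting executable (an .exe file, for example) directly.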


regexes = [re.compile(p) for p in ['this', 'that']]  # compile a list of patterns
print(regexes)
text = 'Does this text match the pattern?'
text1 = "What's that?"
text2 = "What's this?"
texts = [text, text1, text2]
# The compiled patterns are used in the innermost loop
pprint([regex.search(t) for t in texts for regex in regexes])

[re.compile('this'), re.compile('that')]
[<re.Match object; span=(5, 9), match='this'>,
 None,
 None,
 <re.Match object; span=(7, 11), match='that'>,
 <re.Match object; span=(7, 11), match='this'>,
 None]
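
A compiled pattern can also carry flags, which are then applied on every use. The following is a small sketch; the `lines` data and the choice of `re.IGNORECASE` are illustrative assumptions, not part of the original example:

```python
import re

# Compile once, with a flag, then reuse the pattern object for every line
word = re.compile(r'this', re.IGNORECASE)

lines = ['Does this text match?', "What's that?", 'THIS one is upper case']
hits = [line for line in lines if word.search(line)]
print(hits)
# ['Does this text match?', 'THIS one is upper case']

print(word.pattern)  # this
```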

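Finding all text that matches a pattern

**re.findall:** returns a list; each element is one search result (a string)
**re.finditer:** returns an iterator that yields a Match object for each search result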

import re

pattern = 'a.*?a'
text = 'abbaaabbbbaaaa a ga '
# findall
print(type(re.findall(pattern, text)))
for match_text in re.findall(pattern, text):
    print(f'{match_text}')
# finditer
print(type(re.finditer(pattern, text)))
for match in re.finditer(pattern, text):
    start = match.start()
    end = match.end()
    print('{}<---->start,end: {},{}'.format(text[start:end], start, end))

<class 'list'>
abba
aa
aa
aa
a ga
<class 'callable_iterator'>
abba<---->start,end: 0,4
aa<---->start,end: 4,6
aa<---->start,end: 10,12
aa<---->start,end: 12,14
a ga<---->start,end: 15,19
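
One detail worth knowing: `re.findall` returns plain strings only when the pattern has no capture groups. With groups it returns the captured groups instead. A short sketch, using a made-up `key=value` text for illustration:

```python
import re

text = 'a=1, b=22'

# No groups: each element is the whole match
whole = re.findall(r'\w+=\d+', text)
print(whole)   # ['a=1', 'b=22']

# Two groups: each element is a tuple of the captured groups
pairs = re.findall(r'(\w+)=(\d+)', text)
print(pairs)   # [('a', '1'), ('b', '22')]

# finditer still gives Match objects, so both views are available
for m in re.finditer(r'(\w+)=(\d+)', text):
    print(m.group(0), m.groups())
```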

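Three kinds of matching

Exact matching
Repetition matching
Optional matching
Greedy and non-greedy matching
Matching is greedy by default; append ? to a quantifier to make it non-greedy.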

pattern = '[ab]'     # either a or b
pattern = 'a[ab]+'   # longest match (greedy)
pattern = 'a[ab]+?'  # non-greedy match
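
The kinds of matching listed above can be compared side by side. This is a minimal sketch; the sample strings are chosen only for illustration:

```python
import re

sample = 'ab abb a'

# Exact match: literal characters only
exact = re.findall(r'ab', sample)
print(exact)      # ['ab', 'ab']

# Repetition: b+ means one or more b
repeated = re.findall(r'ab+', sample)
print(repeated)   # ['ab', 'abb']

# Optional: b? means zero or one b
optional = re.findall(r'ab?', sample)
print(optional)   # ['ab', 'ab', 'a']

# Greedy vs non-greedy on the same text
text = 'abbaabc'
greedy = re.search(r'a[ab]+', text).group()
lazy = re.search(r'a[ab]+?', text).group()
print(greedy)     # abbaab
print(lazy)       # ab
```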

Escape codes in regular expressions

\d,\D,\s,\S,\w,\W
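
In these codes, `\d` matches a digit, `\s` a whitespace character, and `\w` a word character (letter, digit, or underscore); the upper-case forms `\D`, `\S`, `\W` match the opposite sets. A short sketch on a made-up sample string:

```python
import re

text = 'Room 42, floor 3'

digits = re.findall(r'\d+', text)      # runs of digits
words = re.findall(r'\w+', text)       # runs of letters, digits, underscore
pieces = re.split(r'\s+', text)        # split on runs of whitespace
non_digits = re.findall(r'\D+', text)  # runs of anything that is not a digit

print(digits)      # ['42', '3']
print(words)       # ['Room', '42', 'floor', '3']
print(pieces)      # ['Room', '42,', 'floor', '3']
print(non_digits)  # ['Room ', ', floor ']
```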

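When to use search and when to use match

In general, search suits extracting text, while match suits checking whether text has a given format.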
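
The two scenarios might look like this in practice. The date format `YYYY-MM-DD`, the sample log line, and the helper `is_date` are all hypothetical, and `re.fullmatch` is used for the format check as one reasonable choice:

```python
import re

# Hypothetical date format YYYY-MM-DD, used only for illustration
date = re.compile(r'\d{4}-\d{2}-\d{2}')

# Extraction: search looks anywhere in the text
log_line = 'job finished on 2021-12-11 without errors'
found = date.search(log_line)
print(found.group())   # 2021-12-11

# Format check: the whole input has to fit the pattern
def is_date(s):
    return date.fullmatch(s) is not None

print(is_date('2021-12-11'))      # True
print(is_date('on 2021-12-11'))   # False
print(is_date('2021-12-11abc'))   # False
```

This checks only the shape of the input; it does not verify that the month or day is a valid calendar value.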

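Function	Purpose	Parameters
re.search()	Find one match anywhere in the string (plain search)	pattern, string
re.match()	Find one match anchored at the start of the string (plain match)	pattern, string
re.fullmatch()	Find one match that spans the entire string (full match)	pattern, string
re.compile()	Compile a pattern into a regular expression object	pattern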
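
To see how the functions in the table differ, the sketch below applies them to one made-up string, `abc123`:

```python
import re

s = 'abc123'

found = re.search(r'\d+', s)             # scans the string, finds '123'
at_start = re.match(r'\d+', s)           # None: s does not start with a digit
prefix = re.match(r'[a-z]+', s)          # matches 'abc' at the start
partial = re.fullmatch(r'[a-z]+', s)     # None: 'abc' is not the whole string
whole = re.fullmatch(r'[a-z]+\d+', s)    # matches all of 'abc123'

print(found.group())    # 123
print(at_start)         # None
print(prefix.group())   # abc
print(partial)          # None
print(whole.group())    # abc123

# re.compile returns a pattern object with the same three methods
digits = re.compile(r'\d+')
print(digits.search(s).group())  # 123
```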