Python Regular Expressions: A Detailed Guide to the re Module

礼礼礼礼礼礼

Last modified 2024-09-25 23:13:28

Tags: python, web scraping

First published 2024-09-24 21:36:06

Article link: https://blog.csdn.net/2301_81117102/article/details/142500759

1. Importing the re module

First, import the re module in your Python script.

import re

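2. Compiling a regular expression

You can use a regular expression directly through the module-level functions without compiling it first. Compiling it with re.compile() produces a reusable pattern object, which can save some overhead when the same expression is used many times.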

import re
pattern=re.compile(r'\d+')
print(pattern)
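
As a small sketch of the reuse described above, the following compares a compiled pattern with the equivalent module-level call. The sample strings and variable names are made up for illustration. Note that re also caches recently used patterns internally, so the gain from compiling is mostly about readability and avoiding repeated lookups.

```python
import re

# Hypothetical sample data, used only for illustration.
lines = ["order 12", "no digits here", "items 3 and 45"]

# Compile once, then reuse the same pattern object for every line.
digits = re.compile(r"\d+")
compiled_results = [digits.findall(line) for line in lines]

# The module-level function gives the same result without an explicit compile step.
direct_results = [re.findall(r"\d+", line) for line in lines]

print(compiled_results)                    # [['12'], [], ['3', '45']]
print(compiled_results == direct_results)  # True
```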

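3. Matching strings

3.1 Using the match() method

match() tries to match the regular expression at the start of the string. On success it returns a match object; otherwise it returns None.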

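# Import the re module
import re
# Define the pattern; this one matches one or more digits
pattern=re.compile(r'\d+')
# match() only tries at the start of the string; 'o' is not a digit, so the result is None
m1=pattern.match('one123two456three789')
print(m1)
# Argument 2 (pos): index where matching starts, inclusive
# Argument 3 (endpos): index where matching stops, exclusive
m1=pattern.match('one123two456three789',3,7)
# group() returns the matched text, here '123'
print(m1.group())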
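
Because match() returns None when nothing matches, calling group() on the result without checking raises AttributeError. A minimal sketch of guarding against that, using a hypothetical helper named leading_number:

```python
import re

pattern = re.compile(r"\d+")

def leading_number(text):
    """Return the digits at the start of text, or None if there are none."""
    m = pattern.match(text)
    # Check for None before calling group() on the match object.
    return m.group() if m is not None else None

print(leading_number("123abc"))  # 123
print(leading_number("abc123"))  # None
```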

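3.2 Using the search() method

search() scans the whole string for the first substring that matches the regular expression. If one is found it returns a match object; otherwise it returns None.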

import re
pattern=re.compile(r'\d+')
# search() looks for a match at any position in the string and returns None if nothing matches
m1=pattern.search('one123two456')
print(m1)
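
Printing the match object shows its repr. To get at the details, the match object's own methods can be used, as in this short sketch on the same input:

```python
import re

pattern = re.compile(r"\d+")
m = pattern.search("one123two456")

if m is not None:
    matched_text = m.group()       # the first run of digits found
    position = m.span()            # (start, end) indices, end exclusive
    print(matched_text)            # 123
    print(m.start(), m.end())      # 3 6
    print(position)                # (3, 6)
```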

3.3 Using the findall() method

findall() finds every non-overlapping match of the regular expression in the string and returns them as a list.

import re
pattern=re.compile(r'\D+')
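# findall() searches the whole string and returns all matches;
# \D+ matches runs of non-digit characters, so the result is ['hello ', ' word ']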
result= pattern.findall('hello 123 word 456')
print(result)
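
The shape of the list returned by findall() depends on whether the pattern contains capturing groups. A brief sketch on the same input string (the second pattern is an illustrative assumption, not from the original article):

```python
import re

text = "hello 123 word 456"

# No groups: each item is the whole matched substring.
numbers = re.findall(r"\d+", text)

# Two groups: each item is a tuple with one entry per group.
pairs = re.findall(r"([a-z]+) (\d+)", text)

print(numbers)  # ['123', '456']
print(pairs)    # [('hello', '123'), ('word', '456')]
```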

4. Replacing strings

The sub() method replaces every part of a string that matches the regular expression.

import re
string='<h1 class="text">HelloWord</h1>'
pattern=re.compile(r'\d+')
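# Argument 1: the replacement text
# Argument 2: the source string
print(pattern.sub('2',string))
# Argument 3 (count): the maximum number of replacements
print(pattern.sub('2',string,3))

# Groups: (?P<name>...) gives a group a name, so it can be referenced by that name
# \1 is a backreference to group 1, so the closing tag must match the opening tag
pattern=re.compile(r'<(.\d)\sclass="(?P<abcd>.*?)">.*?</\1>')
print(pattern.search(string).group('abcd'))
def func(m):
    return "after sub"+m.group('abcd')
# sub() also accepts a function as the replacement
# Argument 1: the function, called with each match object
# Argument 2: the target string
print(pattern.sub(func,string))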
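
Two related features may be useful here, shown as a sketch on the same sample string: a replacement string can refer to a named group with \g<name>, and subn() works like sub() but also reports how many replacements were made. The "-title" suffix is only an illustrative choice.

```python
import re

string = '<h1 class="text">HelloWord</h1>'

# \g<abcd> in the replacement inserts the text captured by the named group.
class_pattern = re.compile(r'class="(?P<abcd>.*?)"')
renamed = class_pattern.sub(r'class="\g<abcd>-title"', string)
print(renamed)  # <h1 class="text-title">HelloWord</h1>

# subn() returns a tuple: (new string, number of replacements made).
new_string, count = re.subn(r"\d+", "2", string)
print(new_string)  # <h2 class="text">HelloWord</h2>
print(count)       # 2
```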

5. Splitting strings

The split() method splits a string using a regular expression as the delimiter.

import re
text='a,b,c'
print(text.split(','))
text1='a,b;;c d'
# Split on commas, semicolons, and whitespace
pattern=re.compile(r'[\s,;]+')
print(pattern.split(text1))
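
Two variations on regex splitting, sketched on the same input: maxsplit limits the number of splits, and wrapping the delimiter pattern in a capturing group keeps the delimiters in the result.

```python
import re

text = "a,b;;c d"

# Stop after the first split; the remainder is returned unsplit.
first_only = re.split(r"[\s,;]+", text, maxsplit=1)

# A capturing group around the delimiter keeps the delimiters in the list.
with_delims = re.split(r"([\s,;]+)", text)

print(first_only)   # ['a', 'b;;c d']
print(with_delims)  # ['a', ',', 'b', ';;', 'c', ' ', 'd']
```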

6. Greedy matching

import re
string='<h1 class="text">HelloWord</h1>'
# A ? after a quantifier turns off greedy matching: .*? stops at the nearest '>'
pattern=re.compile(r'<.\d\sclass=.*?>')
print(pattern.search(string).group())
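
To make the difference concrete, here is a side-by-side sketch of a greedy and a non-greedy quantifier on the same string. The simplified patterns are chosen for illustration only.

```python
import re

string = '<h1 class="text">HelloWord</h1>'

# Greedy: .* consumes as much as possible, up to the last '>'.
greedy = re.search(r"<.*>", string).group()

# Non-greedy: .*? consumes as little as possible, up to the first '>'.
lazy = re.search(r"<.*?>", string).group()

print(greedy)  # <h1 class="text">HelloWord</h1>
print(lazy)    # <h1 class="text">
```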
