Using Python regular expressions with the re module

Latest recommended article published on 2021-12-13 13:15:31

柏林墙

Category: Python. Tags: regular expressions, python, strings

Article link: https://blog.csdn.net/weixin_44122191/article/details/106952638

1. match and search

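re.match(pattern, string, flags) matches only at the start of the string. On success it returns a Match object, whose .span() gives (start, end); otherwise it returns None.
re.search(pattern, string, flags) scans the whole string for the first match. On success it returns a Match object, whose .span() gives (start, end); otherwise it returns None.
Because None has no span() method, check the result before calling .span() on it.
Example: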

>>> import re
>>> re.match('com', 'www.baidu.com')  # no match at the start: returns None, so nothing is printed
>>> re.match('www', 'www.baidu.com').span()
(0, 3)

>>> re.search('w{3}', 'wWw.baidu.com', re.I|re.M).span()
(0, 3)
>>> re.search('com', 'wWw.baidu.com', re.I|re.M).span()
(10, 13)
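
Both functions return None when nothing matches, so chaining .span() onto the call can raise AttributeError. Below is a minimal sketch of a guard. The helper name span_or_none is hypothetical and exists only for this illustration.

```python
import re

def span_or_none(func, pattern, text, flags=0):
    """Return the (start, end) span of a match, or None if there is none.

    span_or_none is a hypothetical helper for this illustration;
    func is expected to be re.match or re.search.
    """
    m = func(pattern, text, flags)
    return m.span() if m is not None else None

print(span_or_none(re.match, 'com', 'www.baidu.com'))   # None: 'com' is not at the start
print(span_or_none(re.search, 'com', 'www.baidu.com'))  # (10, 13)
```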

2. Flag options

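Flag	Description
re.I	Makes matching case-insensitive
re.L	Makes matching locale-aware (in Python 3 this is only allowed with bytes patterns)
re.M	Multi-line mode: ^ and $ also match at the start and end of each line
re.S	Makes . match every character, including the newline
re.U	Interprets characters according to the Unicode character set; affects \w, \W, \b, \B (in Python 3 this is already the default for str patterns)
re.X	Verbose mode: whitespace and # comments inside the pattern are ignored, so a regular expression can be laid out in a more readable way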
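
The effect of the most commonly used flags can be sketched as follows. The sample text and the variable names are made up for this illustration.

```python
import re

text = 'Hello\nworld'

# re.I: case-insensitive matching
print(re.search('hello', text, re.I).group())       # Hello

# re.M: ^ and $ also match at the start and end of each line
print(re.findall(r'^\w+$', text))                   # []
print(re.findall(r'^\w+$', text, re.M))             # ['Hello', 'world']

# re.S: . also matches the newline character
print(re.search('Hello.world', text))               # None
print(re.search('Hello.world', text, re.S).span())  # (0, 11)

# re.X: whitespace and comments inside the pattern are ignored
number = re.compile(r"""
    \d+      # integer part
    \.       # decimal point
    \d+      # fractional part
""", re.X)
print(number.search('pi is about 3.14').group())    # 3.14
```

Flags can be combined with |, for example re.I | re.M.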

3. sub

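re.sub(pattern, repl, string, count=0, flags=0): count defaults to 0, which replaces every match, and can be omitted
The original string is not modified; a new string is returned
Example: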

>>> phone = '400-8888-888 # welecom to BJ'
>>> num = re.sub('#.*', '', phone)  # remove the comment
>>> num
'400-8888-888 '
>>>
>>> phone
'400-8888-888 # welecom to BJ'
>>>
>>> num = re.sub(r'\D', '', phone)  # remove every non-digit character
>>> num
'4008888888'
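
The count parameter is easiest to see side by side with the default. A small sketch, using a shortened phone string chosen for this illustration:

```python
import re

phone = '400-8888-888'

# count=0 (the default) replaces every match
all_replaced = re.sub('-', '', phone)
# count=1 replaces only the first match
first_only = re.sub('-', '', phone, count=1)

print(all_replaced)  # 4008888888
print(first_only)    # 4008888-888
print(phone)         # 400-8888-888 (the original string is unchanged)
```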

repl can also be a function. It is called with the Match object of each match and must return the string to use as the replacement

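import re

def handle(match):
    # repl receives a Match object, not a str; return the replacement text
    return match.group() + ' hallo'

string = 'Hello World'  # sample input, chosen for illustration
re.sub('[A-Z][a-z]{2,5}', handle, string)  # 'Hello hallo World hallo'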
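
A replacement function is handy when the new text has to be computed from the matched text. As a sketch, the function below doubles every number in a string. The name double and the sample input are assumptions made for this illustration.

```python
import re

def double(match):
    # match is a Match object; group() returns the matched text
    return str(int(match.group()) * 2)

result = re.sub(r'\d+', double, 'A23G4HFD567')
print(result)  # A46G8HFD1134
```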

4. compile

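pattern = re.compile(regex): compiles a regular expression into a pattern object that provides match, search and the other methods
pattern.match(string, pos, endpos): calls the match method of the pattern object

Note: this match method is not quite the same as re.match, because it accepts optional start (pos) and end (endpos) positions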

import re
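pa = re.compile(r'(\d+) ([a-z]+)')  # each pair of parentheses ( ) defines one group
result1 = pa.match('12 b23cc4', 0, 8)  # pos and endpos are optional; by default the whole string is used

# For every method below except groups(), the group argument defaults to 0, meaning the whole match
print(result1.group(0))  # whole match: 12 b
print(result1.group(1))  # first group: 12
print(result1.group(2))  # second group: b
print(result1.groups())  # all groups as a tuple, same as (group(1), group(2)): ('12', 'b')
print(result1.start(2))  # start of the second group: 3
print(result1.end(1))    # end of the first group: 2
print(result1.span(2))   # span of the second group: (3, 4)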
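
The difference between re.match and the match method of a compiled pattern can be sketched like this. The sample text is made up for the illustration.

```python
import re

pa = re.compile(r'\d+')
text = 'ab123cd'

print(re.match(r'\d+', text))        # None: the string starts with 'ab'
print(pa.match(text))                # None, same as above
print(pa.match(text, 2).group())     # 123: matching starts at index 2
print(pa.match(text, 2, 4).group())  # 12: endpos=4 stops the match before index 4
print(pa.match(text, 2).span())      # (2, 5): positions refer to the full string
```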

5. findall

pattern.findall(string, pos, endpos) finds all non-overlapping substrings that match and returns them as a list

import re
pa = re.compile(r'\d+')  # matches every run of digits

result = pa.findall('a12b34c5')
print(result)  # ['12', '34', '5']
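
What findall returns depends on the number of groups in the pattern, which the example above does not show. A short sketch with the same input string; the variable names are chosen for this illustration:

```python
import re

text = 'a12b34c5'

# No group: each item is the whole match
whole = re.findall(r'[a-z]\d+', text)
# One group: each item is the text of that group
one_group = re.findall(r'[a-z](\d+)', text)
# Several groups: each item is a tuple with one entry per group
two_groups = re.findall(r'([a-z])(\d+)', text)

print(whole)       # ['a12', 'b34', 'c5']
print(one_group)   # ['12', '34', '5']
print(two_groups)  # [('a', '12'), ('b', '34'), ('c', '5')]
```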

6. split

pattern.split(string, maxsplit) splits the string at every match by default (maxsplit=0) and returns the pieces as a list

import re
pa = re.compile(r'\.')  # matches a literal dot

res = pa.split('a.b.cc', 1)
print(res)      # ['a', 'b.cc']
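
For comparison, here is a sketch of the same split with different maxsplit values, plus a split on several kinds of separator at once. The second input string is made up for the illustration.

```python
import re

pa = re.compile(r'\.')

split_all = pa.split('a.b.cc')               # maxsplit defaults to 0: split everywhere
split_one = pa.split('a.b.cc', maxsplit=1)   # at most one split
mixed = re.split(r'[,;\s]+', 'x, y;z  w')    # commas, semicolons and whitespace

print(split_all)  # ['a', 'b', 'cc']
print(split_one)  # ['a', 'b.cc']
print(mixed)      # ['x', 'y', 'z', 'w']
```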
