Pattern Matching with Regular Expressions (regex)

不想秃头12138

Category: Python. Tags: python, regular expressions, strings

Article link: https://blog.csdn.net/ljn123fdg/article/details/117458513

Workflow:

Import the module that provides regex support
Create a regex object
Search
Print the result

import re  # re is the regular expression module
phoneNumberRegex=re.compile(r'\d\d\d-\d\d\d\d')   # phoneNumberRegex is the compiled regex object
mo=phoneNumberRegex.search('My number is 415-9683.')   # mo holds the match result
print('phone number found:'+mo.group())   # group() returns the matched text

phone number found:415-9683

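\d matches a digit. In an ordinary string literal it has to be typed as '\\d' so that the backslash is not read as the start of an escape sequence (a backslash is escaped with another backslash).
Putting r before the opening quote marks the string as a raw string, in which backslashes are not processed as escapes.
r'\d\d\d' is equivalent to '\\d\\d\\d'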
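The sketch below ties the four steps to the raw-string rule. The helper name find_phone is hypothetical and only serves the illustration. It also guards against search() returning None when nothing matches, since calling .group() on None raises AttributeError.

```python
import re

# A raw string and its manually escaped form are the same pattern text.
assert r'\d\d\d' == '\\d\\d\\d'

phone_regex = re.compile(r'\d\d\d-\d\d\d\d')

def find_phone(text):
    """Return the first phone-like match in text, or None (hypothetical helper)."""
    mo = phone_regex.search(text)
    return mo.group() if mo is not None else None

print(find_phone('My number is 415-9683.'))
print(find_phone('no digits here'))
```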

Features:

Creating groups with parentheses

phoneNumberRegex=re.compile(r'(\d\d\d\d-)(\d\d\d\d\d\d\d)')
mo=phoneNumberRegex.search('my number is 0551-6306943')

mo.group()   # returns the entire match

'0551-6306943'

mo.group(0)  # also returns the entire match

'0551-6306943'

mo.group(1)  # group numbering starts at 1, not 0

'0551-'

mo.group(2)

'6306943'

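Once the pattern has groups, mo.groups() returns a tuple of all the group values, which can be unpacked by multiple assignment

example1,example2=mo.groups()
# note the plural: .groups(), not .group()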

example1

'0551-'

example2

'6306943'

How do you write a regex that finds (415) 5555-444, i.e. how are literal parentheses handled? Escape them with a backslash: \( and \).

phoneNumberRegex=re.compile(r'(\(\d\d\d\)) (\d\d\d\d-\d\d\d)')
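As a quick check of the pattern above, the following sketch runs it against a sample sentence (the sentence is made up for illustration) and unpacks the two groups. The escaped \( and \) match literal parentheses, while the unescaped ones still create groups.

```python
import re

phone_regex = re.compile(r'(\(\d\d\d\)) (\d\d\d\d-\d\d\d)')

# Sample text invented for this illustration.
mo = phone_regex.search('call (415) 5555-444 today')
area, number = mo.groups()

print(area)    # the area code, including its literal parentheses
print(number)  # the rest of the number
```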

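Matching one of several alternatives with the pipe

'|' is the pipe (or). It matches any one of several expressions; when more than one alternative occurs in the searched text, the first occurrence is the one returned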

heroregex=re.compile(r'Batman|Tina Fey')   # no spaces around |, since spaces in a pattern are matched literally
mo=heroregex.search('Batman and Tina Fey')
mo.group()

'Batman'

Combining the pipe with parentheses to match a shared prefix

batregex=re.compile(r'Bat(man|mobile|copter|woman)')
mo=batregex.search('Batmobile lost a wheel.')

mo.group()

'Batmobile'

mo.group(1)

'mobile'

Optional matching with ?

? marks the preceding group as optional: it occurs zero times or once

batmanregex=re.compile(r'Bat(wo)?man')
mo1=batmanregex.search('the adventure of Batman')
mo1.group()

'Batman'

mo2=batmanregex.search('the adventures of Batwoman.')
mo2.group()

'Batwoman'

Matching zero or more times with *

batregex=re.compile(r'bat(wo)*man')

mo1=batregex.search('the adventures of batman')
mo1.group()

'batman'

mo2=batregex.search('the adventures of batwowowowowoman')
mo2.group()

'batwowowowowoman'

Matching one or more times with +

batregex=re.compile(r'bat(wo)+man')
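The original gives only the pattern for +, so here is a small sketch of how it behaves, using sample strings in the style of the * example. Unlike *, the group must appear at least once, so plain 'batman' does not match.

```python
import re

bat_plus = re.compile(r'bat(wo)+man')

mo1 = bat_plus.search('the adventures of batwoman')
mo2 = bat_plus.search('the adventures of batwowowoman')
mo3 = bat_plus.search('the adventures of batman')  # zero repetitions of (wo)

print(mo1.group())
print(mo2.group())
print(mo3 is None)
```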

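Matching a specific number of repetitions with braces {}

(ha){3} matches hahaha

(ha){,3} matches zero to three repetitions of ha

(ha){5,} matches five or more repetitions of ha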
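The three brace forms can be tried out as follows. This is a minimal sketch; the variable names and test strings are chosen here for illustration only.

```python
import re

exactly_three = re.compile(r'(ha){3}')
up_to_three = re.compile(r'(ha){,3}')
five_or_more = re.compile(r'(ha){5,}')

r1 = exactly_three.search('hahaha').group()
r2 = exactly_three.search('haha')                 # only two repetitions
r3 = up_to_three.search('hahahahaha').group()     # stops after three
r4 = up_to_three.search('xyz').group()            # zero repetitions: empty match
r5 = five_or_more.search('hahahaha')              # only four repetitions
r6 = five_or_more.search('hahahahahaha').group()  # six repetitions

print(r1, r2, repr(r3), repr(r4), r5, r6)
```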

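Greedy and non-greedy matching

Greedy matching: takes the longest string that can match
Non-greedy matching: takes the shortest string that can match
Python regular expressions are greedy by default: (a){3,5} matches five repetitions when five are available
To make a quantifier non-greedy, append ?: (a){3,5}? stops at the shortest match, three repetitions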
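A short sketch of the difference, using a string of six a characters so that both the three-repetition and five-repetition matches are possible:

```python
import re

greedy = re.compile(r'(a){3,5}')
lazy = re.compile(r'(a){3,5}?')

text = 'aaaaaa'  # six characters, more than either pattern needs

greedy_match = greedy.search(text).group()
lazy_match = lazy.search(text).group()

print(greedy_match)  # as many repetitions as the upper bound allows
print(lazy_match)    # as few repetitions as the lower bound allows
```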

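Finding all matches with the .findall() method

.findall() returns a list of strings when the pattern has no groups (or exactly one group); with more than one group it returns a list of tuples, one string per group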
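The return shape of findall() depends on whether the pattern contains groups. The sketch below compares both cases on the same sample text (the text is invented for illustration).

```python
import re

text = 'Cell: 415-9683, work: 212-0000'

no_groups = re.compile(r'\d\d\d-\d\d\d\d')
with_groups = re.compile(r'(\d\d\d)-(\d\d\d\d)')

plain = no_groups.findall(text)      # list of matched strings
grouped = with_groups.findall(text)  # list of tuples, one item per group

print(plain)
print(grouped)
```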

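Character classes

\d is equivalent to the regex (0|1|2|3|4|5|6|7|8|9), which in turn is equivalent to [0-9] (for ASCII text; in Python 3 str patterns \d also matches other Unicode decimal digits unless re.ASCII is set)

Shorthand character class	Represents
\d	Any digit from 0 to 9
\D	Any character that is not a digit from 0 to 9
\w	Any letter, digit, or underscore (think of it as matching "word" characters)
\W	Any character that is not a letter, digit, or underscore
\s	A space, tab, or newline (think of it as matching "whitespace" characters)
\S	Any character that is not a space, tab, or newline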
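To see the shorthand classes working together, the following sketch pulls "number, whitespace, word" pairs out of a sample sentence, then uses the negated classes \D and \S on two short strings. All sample strings are made up for this illustration.

```python
import re

text = '12 drummers, 11 pipers, 10 lords'

# One or more digits, one whitespace character, one or more word characters.
pairs = re.compile(r'\d+\s\w+').findall(text)

non_digit_runs = re.findall(r'\D+', 'ab12cd')    # runs of non-digits
non_space_runs = re.findall(r'\S+', 'a b\tc\n')  # runs of non-whitespace

print(pairs)
print(non_digit_runs)
print(non_space_runs)
```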

Defining your own character class with []

vowelregex=re.compile(r'[aeiouAEIOU]')
vowelregex.findall('roboccop eats baby food.BABY FOOD.')

['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o', 'A', 'O', 'O']

Negating a character class with ^ (placed right after the opening bracket)

vowelregex=re.compile(r'[^aeiouAEIOU]')
vowelregex.findall('roboccop eats baby food.BABY FOOD.')

['r','b','c','c','p',' ','t', 's', ' ', 'b', 'b','y',' ','f','d','.','B','B','Y',' ','F','D','.']

Using ^ to require a match at the start of the string

hello=re.compile(r'^hello')
hello.search('he said hello.') is None

True

Using $ to require a match at the end of the string

end=re.compile(r'end$')
end.search('ending is a new beginning') is None

True

e1=end.search('this never dsadend')
e1.group()

'end'
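Using ^ and $ together requires the whole string to match the pattern. A minimal sketch, with a pattern and inputs chosen here for illustration:

```python
import re

# The entire string must consist of one or more digits.
whole_number = re.compile(r'^\d+$')

all_digits = whole_number.search('1234567890')
letters_inside = whole_number.search('12345xyz67890')
space_inside = whole_number.search('12 34')

print(all_digits is not None)
print(letters_inside is None)
print(space_inside is None)
```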

Using . to match any single character except a newline
Using .* to match any string

nameregex=re.compile(r'First Name: (.*) Last Name: (.*) ')
mo=nameregex.search('First Name: AI Last Name: Sweidau ')
mo.group()

'First Name: AI Last Name: Sweidau '

This is greedy matching; for the non-greedy version use (.*?)
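The difference between .* and .*? shows up when the closing delimiter appears more than once. The sketch below uses a sample string with two closing angle brackets (the string is an assumption chosen for illustration).

```python
import re

text = '<To serve man> for dinner.>'

greedy = re.compile(r'<.*>')
lazy = re.compile(r'<.*?>')

greedy_match = greedy.search(text).group()  # runs to the last '>'
lazy_match = lazy.search(text).group()      # stops at the first '>'

print(greedy_match)
print(lazy_match)
```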

Making . match newlines too (re.DOTALL)

import re
noNewlineRegex=re.compile('.*')
noNewlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()

'Serve the public trust.'

newlineRegex=re.compile('.*',re.DOTALL)
newlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()

'Serve the public trust.\nProtect the innocent.\nUphold the law.'

Case-insensitive matching

Pass re.IGNORECASE or re.I as the second argument to re.compile()

import re
robocop=re.compile(r'robocop',re.I)
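A brief sketch of the effect, comparing the case-insensitive pattern with the same pattern compiled without the flag. The sample sentences are made up for illustration.

```python
import re

robocop = re.compile(r'robocop', re.I)
strict = re.compile(r'robocop')  # same pattern, case-sensitive

m1 = robocop.search('RoboCop is part man, part machine.').group()
m2 = robocop.search('ROBOCOP protects the innocent.').group()
m3 = strict.search('ROBOCOP protects the innocent.')

print(m1)
print(m2)
print(m3 is None)
```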

Replacing strings with the sub() method

nameregex=re.compile(r'Agent \w+')
nameregex.sub('CENSORED','Agent Alice gave the secret documents to Agent BOb')

'CENSORED gave the secret documents to CENSORED'

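Pass re.VERBOSE as the second argument to ignore whitespace and line breaks inside the regex (and to allow # comments)

Use it with triple quotes ''' to create a multi-line string. The opening triple quote is followed directly by an opening parenthesis, which wraps the whole pattern in one group

phone_number_regex1 = re.compile(r'''(
    (\d{4}|\(\d{4}\))?    # area code, optionally in parentheses
    (\s|-)?               # separator
    (\d{7})               # 7 digits
)''', re.VERBOSE)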

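Joining elements into a new string with the .join() method

The join() method concatenates the elements of a sequence into a new string, using the string it is called on as the separator

matches = []   # collects the normalized numbers
# text is the string to search; it must be defined beforehand
for groups in phone_number_regex1.findall(text):
    phone_number1 = '-'.join([groups[1],groups[3]])   # area code and 7-digit number
    matches.append(phone_number1)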
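Putting the last two sections together, here is a self-contained sketch of the verbose phone regex feeding join(). The sample text and the step that strips parentheses from the area code are assumptions added for illustration; they are not part of the original snippet.

```python
import re

phone_regex = re.compile(r'''(
    (\d{4}|\(\d{4}\))?    # optional area code, with or without parentheses
    (\s|-)?               # optional separator
    (\d{7})               # 7-digit number
)''', re.VERBOSE)

# Sample input invented for this illustration.
text = 'Office: 0551-6306943, home: (0551) 6306944.'

matches = []
for groups in phone_regex.findall(text):
    # groups[0] is the whole match, groups[1] the area code,
    # groups[2] the separator, groups[3] the 7-digit number.
    area_code = groups[1].strip('()')  # assumption: normalize away parentheses
    matches.append('-'.join([area_code, groups[3]]))

print(matches)
```

Because the pattern has several groups, findall() yields tuples, which is why the loop indexes into groups rather than using each item as a string.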
