Python Regular Expression Matching Patterns: Pattern Matching with Regular Expressions in Python


weixin_34984088


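Summary of regular expression symbols:

`?` matches zero or one of the preceding group;

`*` matches zero or more of the preceding group;

`+` matches one or more of the preceding group;

`{n}` matches exactly n of the preceding group;

`{n,}` matches n or more of the preceding group;

`{,m}` matches zero to m of the preceding group;

`{n,m}` matches at least n and at most m of the preceding group;

`^spam` means the string must begin with spam;

`spam$` means the string must end with spam;

`.` matches any character except the newline;

`\d`, `\w`, `\s` match a digit, word, and space character, respectively;

`\D`, `\W`, `\S` match any character except a digit, word, or space character, respectively;

`[abc]` matches any character between the brackets (here a, b, or c);

`[^abc]` matches any character that is not between the brackets;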

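I. Steps for using regular expressions in Python:

1. Import the regex module with `import re`;

2. Create a Regex object with the `re.compile()` function (remember to use a raw string);

3. Pass the string you want to search to the Regex object's `search()` method. It returns a Match object, or `None` if the pattern is not found;

4. Call the Match object's `group()` method to return a string of the actual matched text.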

Web-based regular expression tester: http://regexpal.com/

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phoneNumRegex.search('My number is 415-555-4242')
print('Phone number found: ' + mo.group())
```

Output:

```
Phone number found: 415-555-4242
```
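Because `search()` returns `None` when the pattern does not occur, calling `group()` directly on the result raises `AttributeError` for text without a phone number. A minimal sketch of checking for that case first; the helper name `find_phone` is hypothetical and only for illustration:

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def find_phone(text):
    """Hypothetical helper: return the first phone number in text, or None."""
    mo = phoneNumRegex.search(text)
    if mo is None:        # search() found no match
        return None
    return mo.group()     # the entire matched text

print(find_phone('My number is 415-555-4242'))  # 415-555-4242
print(find_phone('No number here'))             # None
```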

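II. Matching more patterns with regular expressions

1. Grouping with parentheses: in the regex string, the first pair of parentheses is group 1, the second pair is group 2, and so on. Pass an integer (1, 2, ...) to the Match object's `group()` method to get a different part of the matched text; pass 0 or nothing to get the entire matched text.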

```python
import re

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242')
print('Phone number found: ' + mo.group(1))
print(mo.group(2))
print(mo.group(3))
```

Output:

```
Phone number found: 415
555
4242
```
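Two related details of grouping, shown as a small sketch: the Match object's `groups()` method returns all groups at once as a tuple, and literal parentheses in the searched text must be escaped as `\(` and `\)` so they are not read as a group.

```python
import re

# groups() returns every group in one tuple
phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242')
areaCode, mainNumber = mo.groups()
print(areaCode)    # 415
print(mainNumber)  # 555-4242

# Escaped parentheses match literal "(" and ")" characters
parenRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo2 = parenRegex.search('My phone number is (415) 555-4242.')
print(mo2.group(1))  # (415)
print(mo2.group(2))  # 555-4242
```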

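2. Matching one of several expressions with the pipe `|`: use it when you want to match any one of several expressions. If more than one of them occurs in the searched string, the first occurrence of matching text is returned as the Match object.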

```python
import re

heroRegex = re.compile(r'Batman|Tina Fey')
mo1 = heroRegex.search('Batman and Tina Fey.')
mo2 = heroRegex.search('Tina Fey and Batman.')
print(mo1.group())
print(mo2.group())
```

Output:

```
Batman
Tina Fey
```
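Since `search()` stops at the first alternative it finds, a sketch using `findall()` (covered in item 8) collects every occurrence instead. It also shows that a literal pipe character has to be escaped as `\|`; the pattern `a\|b` is just an illustrative example.

```python
import re

heroRegex = re.compile(r'Batman|Tina Fey')
# findall() returns every match, not only the first one
print(heroRegex.findall('Batman and Tina Fey.'))  # ['Batman', 'Tina Fey']

# An escaped pipe matches the "|" character itself
pipeRegex = re.compile(r'a\|b')
print(pipeRegex.search('x a|b y').group())  # a|b
```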

To match any one of 'Batman', 'Batmobile', 'Batcopter', and 'Batbat', write the shared prefix once and put the alternatives inside parentheses.

```python
import re

heroRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo3 = heroRegex.search('Batmobile lost a wheel.')
print(mo3.group(1))
```

Output:

```
mobile
```

3. Optional matching with the question mark `?`: `?` marks the group before it as an optional part of the pattern.

```python
import re

batRegex = re.compile(r'Bat(wo)?man')
mo1 = batRegex.search('The Adventures of Batman')
mo2 = batRegex.search('The Adventures of Batwoman')
print(mo1.group())
print(mo2.group())
```

Output:

```
Batman
Batwoman
```
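The same idea applied to phone numbers, as a minimal sketch: the area-code group `(\d\d\d-)?` may appear once or not at all. A literal question mark, by contrast, has to be escaped as `\?`.

```python
import re

# The area code (\d\d\d-) is optional
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
mo1 = phoneRegex.search('My number is 415-555-4242')
mo2 = phoneRegex.search('My number is 555-4242')
print(mo1.group())  # 415-555-4242
print(mo2.group())  # 555-4242

# An escaped question mark matches the "?" character itself
print(re.search(r'Really\?', 'Really? Yes.').group())  # Really?
```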

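4. Matching zero or more with the star `*`: the group before `*` can occur any number of times in the text. It can be absent altogether or repeated over and over.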

```python
import re

batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')
mo2 = batRegex.search('The Adventures of Batwoman')
mo3 = batRegex.search('The Adventures of Batwowowowowoman')
print(mo1.group())
print(mo2.group())
print(mo3.group())
```

Output:

```
Batman
Batwoman
Batwowowowowoman
```

5. Matching one or more with the plus `+`: `+` requires the group before it to appear at least once.

```python
import re

batRegex = re.compile(r'Bat(wo)+man')
mo1 = batRegex.search('The Adventures of Batman')
mo2 = batRegex.search('The Adventures of Batwoman')
mo3 = batRegex.search('The Adventures of Batwowowowowoman')
print(mo1 is None)   # "Batman" has no "wo", so there is no match
print(mo2.group())
print(mo3.group())
```

Output:

```
True
Batwoman
Batwowowowowoman
```

6. Matching specific repetitions with braces `{}`: braces match the preceding group a specific number of times, or a range of times.

```python
import re

haRegex = re.compile(r'(Ha){3}')     # exactly three repetitions
baRegex = re.compile(r'(Ba){3,5}')   # three to five repetitions
mo1 = haRegex.search('HaHaHa')
mo2 = haRegex.search('ha')
mo3 = baRegex.search('sdfiBaBaBaBa3fdf')
print(mo1.group())
print(mo2 is None)
print(mo3.group())
```

Output:

```
HaHaHa
True
BaBaBaBa
```
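The open-ended forms from the summary at the top can be checked the same way. A short sketch: `{3,}` means three or more, and `{,2}` means zero to two (an omitted lower bound is treated as zero).

```python
import re

atLeast3 = re.compile(r'(Ha){3,}')   # three or more
atMost2 = re.compile(r'(Ha){,2}')    # zero to two
print(atLeast3.search('HaHaHaHaHa').group())  # HaHaHaHaHa
print(atLeast3.search('HaHa'))                # None
print(atMost2.search('HaHaHa').group())       # HaHa
```

Because `{,2}` also allows zero repetitions, `atMost2` can produce an empty match on text that contains no "Ha" at all.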

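7. Greedy and non-greedy matching: Python's regular expressions are greedy by default, which means that in ambiguous situations they match the longest string possible. The non-greedy version of the braces, which matches the shortest string possible, is written with a question mark after the closing brace.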

```python
import re

greedyHaRegex = re.compile(r'(Ha){3,5}')
mo1 = greedyHaRegex.search('HaHaHaHaHa')
nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
print(mo1.group())
print(mo2.group())
```

Output:

```
HaHaHaHaHa
HaHaHa
```
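The trailing `?` makes the other quantifiers non-greedy too. A sketch with `*` and the dot wildcard (item 12): the greedy `<.*>` runs to the last `>`, while `<.*?>` stops at the first one.

```python
import re

text = '<To serve man> for dinner.>'
greedy = re.compile(r'<.*>')      # matches as much as possible
nongreedy = re.compile(r'<.*?>')  # matches as little as possible
print(greedy.search(text).group())     # <To serve man> for dinner.>
print(nongreedy.search(text).group())  # <To serve man>
```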

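8. The `search()` and `findall()` methods: `search()` returns a Match object for the first matched text in the searched string, whereas `findall()` returns every match in the searched string. If the regex has no groups, `findall()` returns a list of matched strings; if the regex has groups, it returns a list of tuples of strings, one string per group.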

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')        # no groups
phones = phoneNumRegex.findall('Cell:415-983-8876 Work:425-098-7723')
phoneNum1Regex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')  # three groups
phones1 = phoneNum1Regex.findall('Cell:415-983-8876 Work:425-098-7723')
print(phones)
print(phones1)
```

Output:

```
['415-983-8876', '425-098-7723']
[('415', '983', '8876'), ('425', '098', '7723')]
```
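Two edge cases of `findall()`, plus its Match-object counterpart, in a short sketch: with exactly one group it returns that group's text as plain strings rather than 1-tuples; with no match it returns an empty list; and `finditer()` yields Match objects, so methods like `group()` and `start()` stay available.

```python
import re

text = 'Cell:415-983-8876 Work:425-098-7723'
# Exactly one group: a list of that group's strings
areaRegex = re.compile(r'(\d\d\d)-\d\d\d-\d\d\d\d')
print(areaRegex.findall(text))         # ['415', '425']

# No match: an empty list rather than None
print(areaRegex.findall('no digits'))  # []

# finditer() yields one Match object per match
for mo in areaRegex.finditer(text):
    print(mo.group(), mo.start())
```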

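9. Character classes:

(1) `\d`: any digit from 0 to 9;

(2) `\D`: any character that is not a digit from 0 to 9;

(3) `\w`: any letter, digit, or underscore character (think of it as matching "word" characters);

(4) `\W`: any character that is not a letter, digit, or underscore;

(5) `\s`: any space, tab, or newline character (think of it as matching "space" characters);

(6) `\S`: any character that is not a space, tab, or newline;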

```python
import re

xmasRegex = re.compile(r'\d+\s\w+')   # digits, one whitespace character, then a word
gifts = xmasRegex.findall('12 drummers,11 pipers,10 lords,9 ladies,8 maids,7 swans,6 geese,5 rings, 4 birds,3 hens,2 doves,1 partridge')
print(gifts)
```

Output:

```
['12 drummers', '11 pipers', '10 lords', '9 ladies', '8 maids', '7 swans', '6 geese', '5 rings', '4 birds', '3 hens', '2 doves', '1 partridge']
```
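One caveat worth knowing: in Python 3, for `str` patterns `\d` and `\w` are Unicode-aware, so they match more than the ASCII ranges listed above; `\w` matches Chinese characters, and `\d` matches digits from other scripts. A sketch, using the Arabic-Indic digit ٣ as an illustrative example; the `re.ASCII` flag restricts the classes to ASCII.

```python
import re

print(re.findall(r'\w+', '你好 world'))   # ['你好', 'world']
print(re.findall(r'\d', '12٣'))            # ['1', '2', '٣']
print(re.findall(r'\d', '12٣', re.ASCII))  # ['1', '2']
```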

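10. Making your own character classes: square brackets `[]` define your own character class, and a hyphen `-` denotes a range of letters or digits (for example `[a-z]` or `[0-9]`). Ordinary regex symbols such as `.`, `*`, `?` and `()` are not interpreted inside the brackets, so they need no backslash. Placing `^` right after the opening bracket makes a negative character class, which matches every character that is not in the class.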

```python
import re

vowelRegex = re.compile(r'[aeiouAEIOU]')       # any vowel
vowels = vowelRegex.findall('RoboCop eats baby food.BABY FOOD.')
consonantRegex = re.compile(r'[^aeiouAEIOU]')  # any character that is not a vowel
others = consonantRegex.findall('RoboCop eats baby food.BABY FOOD.')
print(vowels)
print(others)
```

Output:

```
['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o', 'A', 'O', 'O']
['R', 'b', 'C', 'p', ' ', 't', 's', ' ', 'b', 'b', 'y', ' ', 'f', 'd', '.', 'B', 'B', 'Y', ' ', 'F', 'D', '.']
```
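A sketch of the two bracket rules described in item 10: a hyphen between two characters denotes a range, and symbols like `.` and `?` are literal inside brackets.

```python
import re

# [0-5] means any of 0, 1, 2, 3, 4, 5
print(re.findall(r'[0-5]', '0123456789'))  # ['0', '1', '2', '3', '4', '5']

# Inside brackets, . and ? match themselves
print(re.findall(r'[.?]', 'Hi. Why?'))     # ['.', '?']

# Several ranges combined: runs of letters and digits
print(re.findall(r'[a-zA-Z0-9]+', 'R2-D2 & C-3PO'))  # ['R2', 'D2', 'C', '3PO']
```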

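11. The caret `^` and dollar sign `$`: a `^` at the start of a regex means the match must occur at the beginning of the searched text; a `$` at the end means the string must end with this regex pattern. Using both means the entire string must match the pattern.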
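Item 11 has no example of its own, so here is a minimal sketch of the three cases: anchored at the start, anchored at the end, and anchored at both ends.

```python
import re

beginsWithHello = re.compile(r'^Hello')
endsWithNumber = re.compile(r'\d$')
wholeStringIsNum = re.compile(r'^\d+$')

print(beginsWithHello.search('Hello world!') is not None)      # True
print(beginsWithHello.search('He said hello.') is None)        # True
print(endsWithNumber.search('Your number is 42') is not None)  # True
print(wholeStringIsNum.search('1234567890') is not None)       # True
print(wholeStringIsNum.search('12345xyz67890') is None)        # True
```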

12. The wildcard character: the dot `.` matches any single character except a newline.

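```python
import re

atRegex = re.compile(r'.at')   # any one character followed by "at"
at = atRegex.findall('The cat in the hat sat on the flat mat.')
print(at)
```

Output (`lat` comes from "flat", because `.` matches exactly one character):

```
['cat', 'hat', 'sat', 'lat', 'mat']
```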
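Two small follow-ups, as a sketch: `\w*at` lets the match take in all the letters before "at", so "flat" stays whole, and a literal period must be escaped as `\.`.

```python
import re

text = 'The cat in the hat sat on the flat mat.'
print(re.findall(r'\w*at', text))  # ['cat', 'hat', 'sat', 'flat', 'mat']

# \. matches only an actual period
print(re.findall(r'\w+\.', text))  # ['mat.']
```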

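13. Matching everything with dot-star `.*`: the dot matches any single character except a newline, and the star matches zero or more of whatever precedes it, so `.*` matches any run of characters. Like the other quantifiers it is greedy; `.*?` is its non-greedy form.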

```python
import re

nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
mo = nameRegex.search('First Name: Al Last Name: Sweigart')
print(mo.group(1))
print(mo.group(2))
```

Output:

```
Al
Sweigart
```
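Because the dot stops at newlines, `.*` on multi-line text only covers the first line. Passing `re.DOTALL` as the second argument to `re.compile()` makes the dot match newlines as well, as this sketch shows:

```python
import re

text = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'
noNewline = re.compile(r'.*')
withNewline = re.compile(r'.*', re.DOTALL)
print(noNewline.search(text).group())            # Serve the public trust.
print(withNewline.search(text).group() == text)  # True
```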

14. Case-insensitive matching: pass `re.I` as the second argument to `re.compile()`.

```python
import re

robocop = re.compile(r'robocop', re.I)
r = robocop.search('RoboCop is part man,part machine,all cop.')
print(r.group())
```

Output:

```
RoboCop
```
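`re.I` is the short name of `re.IGNORECASE`; the same behavior can also be switched on inside the pattern with an inline `(?i)` at its start. A brief sketch of both spellings:

```python
import re

text = 'RoboCop is part man, part machine, all cop.'
print(re.compile(r'robocop', re.IGNORECASE).search(text).group())  # RoboCop
print(re.search(r'(?i)ROBOCOP', text).group())                      # RoboCop
```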

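15. Replacing strings with the `sub()` method: the `sub()` method of a Regex object takes two arguments. The first is the string that replaces each match; the second is the string in which to search for matches. It returns the new string with every match replaced.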

```python
import re

namesRegex = re.compile(r'Agent \w+')
names = namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
print(names)
```

Output:

```
CENSORED gave the secret documents to CENSORED.
```
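The replacement string can also refer back to groups in the match: `\1` inserts group 1, `\2` group 2, and so on. A sketch that keeps only each agent's initial:

```python
import re

# (\w) captures the first letter of the name; \1 puts it back into the replacement
agentNamesRegex = re.compile(r'Agent (\w)\w*')
result = agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')
print(result)  # A**** told C**** that E**** knew B**** was a double agent.
```

The lowercase "agent" at the end is left alone because the pattern is case-sensitive.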

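16. Managing complex regexes: pass `re.VERBOSE` as the second argument to `re.compile()`. Verbose mode ignores whitespace and `#` comments inside the pattern string, so a long regex can be spread over several lines and annotated.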

```python
import re

phoneRegex = re.compile(r'''
    (\d{3}|\(\d{3}\))?            # area code
    (\s|-|\.)?                    # separator
    \d{3}                         # first 3 digits
    (\s|-|\.)                     # separator
    \d{4}                         # last 4 digits
    (\s(ext|x|ext\.)\s\d{2,5})?   # extension
    ''', re.VERBOSE)
```
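Flags can be combined with the bitwise OR operator `|`. The sketch below restates the phone pattern (area code as three digits or three digits in parentheses, literal dots escaped as `\.`) and compiles it with `re.VERBOSE | re.IGNORECASE`, so "EXT." is accepted as well as "ext."; the sample strings are made up for illustration.

```python
import re

phoneRegex = re.compile(r'''
    (\d{3}|\(\d{3}\))?            # area code
    (\s|-|\.)?                    # separator
    \d{3}                         # first 3 digits
    (\s|-|\.)                     # separator
    \d{4}                         # last 4 digits
    (\s(ext|x|ext\.)\s\d{2,5})?   # extension
    ''', re.VERBOSE | re.IGNORECASE)

found = []
for sample in ['Call (415) 555-4242 EXT. 99', 'Or 415.555.1234 now']:
    mo = phoneRegex.search(sample)
    found.append(mo.group())      # whole matched phone number
print(found)  # ['(415) 555-4242 EXT. 99', '415.555.1234']
```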
