import re
# Three digits, a hyphen, three digits, a hyphen, four digits
phoneNumRegex = re.compile(r'\d{3}-\d{3}-\d{4}')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Phone number found: ' + mo.group())
# Output: Phone number found: 415-555-4242
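3. Grouping with parentheses
The first pair of parentheses in a regex string is group 1, and the second pair is group 2.
Passing the integer 1 or 2 to the match object's group() method returns the corresponding part of the matched text.
Passing 0, or no argument at all, to group() returns the entire matched text.
To retrieve all the groups at once, use the groups() method.
To match literal parentheses, escape them as \( and \).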
phoneNumRegex = re.compile(r'(\d{3})-(\d{3}-\d{4})')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print('Whole match: {}, group 1: {}, group 2: {}'.format(mo.group(), mo.group(1), mo.group(2)))
# Output: Whole match: 415-555-4242, group 1: 415, group 2: 555-4242
print('All groups:', mo.groups())
# Output: All groups: ('415', '555-4242')
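The note on escaped parentheses can be illustrated with a short sketch. The phone format `(415) 555-4242` and the names `phone_regex`, `area_code` and `main_number` are assumptions chosen for this example.

```python
import re

# Escaped parentheses \( and \) match literal characters;
# the unescaped pairs still define groups 1 and 2.
phone_regex = re.compile(r'(\(\d{3}\)) (\d{3}-\d{4})')
mo = phone_regex.search('My phone number is (415) 555-4242.')

# groups() returns a tuple, so it can be unpacked directly.
area_code, main_number = mo.groups()
print(area_code)    # (415)
print(main_number)  # 555-4242
```

Unpacking the tuple from groups() avoids repeated group(n) calls when every group is needed.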
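4. Matching alternatives with |
The | character is called a pipe. Use it when you want to match any one of several expressions.
When more than one alternative occurs in the searched text, the match object holds the occurrence that comes first.
To match a literal pipe character, escape it as \|.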
heroRegex = re.compile(r'Batman|Tina Fey')
mo1 = heroRegex.search('Batman and Tina Fey.')
print(mo1.group())
# Output: Batman
mo2 = heroRegex.search('Tina Fey and Batman.')
print(mo2.group())
# Output: Tina Fey

# Parentheses let several alternatives share the prefix 'Bat'
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')
print(mo.group())
# Output: Batmobile
print(mo.group(1))
# Output: mobile
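Two details of the pipe are easy to miss: the escaped form `\|`, and the fact that alternatives are tried from left to right. The following is a minimal sketch of both; the sample strings and variable names are made up for illustration.

```python
import re

# \| matches a literal pipe character instead of acting as alternation.
literal_pipe = re.compile(r'\d+\|\d+')
pipe_match = literal_pipe.search('ratio 3|4 today').group()
print(pipe_match)  # 3|4

# Alternatives are tried left to right at each position, so the first
# alternative that matches wins, even if a later one would match more text.
order_regex = re.compile(r'Bat|Batman')
order_match = order_regex.search('Batman returns').group()
print(order_match)  # Bat
```

Putting the longer alternative first (`Batman|Bat`) would be one way to prefer the longer match.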
5. Optional matching with ?
The ? matches the group that precedes it zero times or one time.
To match a literal question mark, escape it as \?.
batRegex = re.compile(r'Bat(wo)?man')
mo1 = batRegex.search('The Adventures of Batman')
print(mo1.group())
# Output: Batman
mo2 = batRegex.search('The Adventures of Batwoman')
print(mo2.group())
# Output: Batwoman

# The area code, together with its hyphen, is optional
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
mo1 = phoneRegex.search('My number is 415-555-4242')
print(mo1.group())
# Output: 415-555-4242
mo2 = phoneRegex.search('My number is 555-4242')
print(mo2.group())
# Output: 555-4242
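When an optional group does not take part in a match, group() returns None for it. The sketch below assumes a variant of the phone pattern with a second group added around the main number, purely for illustration.

```python
import re

phone_regex = re.compile(r'(\d{3}-)?(\d{3}-\d{4})')

with_area = phone_regex.search('My number is 415-555-4242')
without_area = phone_regex.search('My number is 555-4242')

print(with_area.group(1))     # 415-
print(without_area.group(1))  # None
print(without_area.groups())  # (None, '555-4242')
```

Code that reads an optional group should therefore check for None before using the value as a string.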
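6. Matching zero or more times with *
The * matches the group that precedes it zero or more times.
To match a literal asterisk, escape it as \*.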
batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')
print(mo1.group())
# Output: Batman
mo2 = batRegex.search('The Adventures of Batwoman')
print(mo2.group())
# Output: Batwoman
mo3 = batRegex.search('The Adventures of Batwowowowoman')
print(mo3.group())
# Output: Batwowowowoman
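7. Matching one or more times with +
The + matches the group that precedes it one or more times, so the group must appear at least once.
To match a literal plus sign, escape it as \+.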
batRegex = re.compile(r'Bat(wo)+man')
mo1 = batRegex.search('The Adventures of Batman')
print(mo1)
# Output: None
mo2 = batRegex.search('The Adventures of Batwoman')
print(mo2.group())
# Output: Batwoman
mo3 = batRegex.search('The Adventures of Batwowowowoman')
print(mo3.group())
# Output: Batwowowowoman
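The three quantifiers ?, * and + differ only in how many repetitions they accept. The following sketch puts them side by side on the same inputs; the `patterns` and `results` dictionaries are helper names invented for this comparison.

```python
import re

patterns = {
    '?': re.compile(r'Bat(wo)?man'),
    '*': re.compile(r'Bat(wo)*man'),
    '+': re.compile(r'Bat(wo)+man'),
}
words = ['Batman', 'Batwoman', 'Batwowowoman']

# results[quantifier][word] is True when the pattern matches the whole word.
results = {
    q: {w: regex.fullmatch(w) is not None for w in words}
    for q, regex in patterns.items()
}

for q, row in results.items():
    print(q, row)
# ? {'Batman': True, 'Batwoman': True, 'Batwowowoman': False}
# * {'Batman': True, 'Batwoman': True, 'Batwowowoman': True}
# + {'Batman': False, 'Batwoman': True, 'Batwowowoman': True}
```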
# r'^Hello' matches strings that begin with 'Hello'
beginsWithHello = re.compile(r'^Hello')
print(beginsWithHello.search('Hello world!'))
# Output: <re.Match object; span=(0, 5), match='Hello'>
print(beginsWithHello.search('He said hello Hello'))
# Output: None

# r'\d$' matches strings that end with a digit from 0 to 9
endsWithNumber = re.compile(r'\d$')
print(endsWithNumber.search('Your number is 42'))
# Output: <re.Match object; span=(16, 17), match='2'>
print(endsWithNumber.search('Your number is forty two'))
# Output: None

# r'^\d+$' matches strings that consist of digits from beginning to end
wholeStringIsNum = re.compile(r'^\d+$')
print(wholeStringIsNum.search('1234567890'))
# Output: <re.Match object; span=(0, 10), match='1234567890'>
print(wholeStringIsNum.search('12345xy67890'))
# Output: None
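Anchoring a pattern at both ends is a common way to validate a whole string. As an alternative sketch, re.fullmatch() does the same job without writing the anchors. The helper `is_all_digits` is a hypothetical name, not part of the re module.

```python
import re

def is_all_digits(text):
    """Return True only if text consists entirely of one or more digits."""
    return re.fullmatch(r'\d+', text) is not None

print(is_all_digits('1234567890'))    # True
print(is_all_digits('12345xy67890'))  # False

# $ also matches just before a single trailing newline; fullmatch() does not.
anchored = re.search(r'^\d+$', '42\n') is not None
print(anchored)               # True
print(is_all_digits('42\n'))  # False
```

The last two lines show one practical difference: input that ends with a newline passes the anchored search but fails fullmatch().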
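14. The . wildcard character
The . matches any single character except a newline.
atRegex = re.compile(r'.at')
print(atRegex.findall('The cat in the hat sat on the flat mat.'))
# Output: ['cat', 'hat', 'sat', 'lat', 'mat']
# 'flat' yields 'lat' because . matches exactly one character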
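Because the dot is a wildcard, a literal period has to be escaped as `\.`, just like the other special characters in these notes. A small sketch with made-up sample text:

```python
import re

text = 'version 3.11 or 3x11'

# Unescaped dot: any character between the digits is accepted.
loose = re.findall(r'\d.\d\d', text)
print(loose)   # ['3.11', '3x11']

# Escaped dot: only a literal '.' is accepted.
strict = re.findall(r'\d\.\d\d', text)
print(strict)  # ['3.11']
```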
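15. Matching everything with .*
.* matches any run of characters except newlines. It is greedy: it matches as much text as possible.
.*? is the non-greedy version: it matches as little text as possible.
To make the dot match newlines as well, pass re.DOTALL as the second argument to re.compile().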
greedyRegex = re.compile(r'<.*>')
print(greedyRegex.search('<To serve man> for dinner.>').group())
# Output: <To serve man> for dinner.>
nongreedyRegex = re.compile(r'<.*?>')
print(nongreedyRegex.search('<To serve man> for dinner.>').group())
# Output: <To serve man>

# With re.DOTALL the dot also matches the newline character
newlineRegex = re.compile('.*', re.DOTALL)
print(newlineRegex.search('Serve the public trust.\nProtect the innocent.').group())
# Output:
# Serve the public trust.
# Protect the innocent.
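The effect of re.DOTALL is easiest to see by running the same pattern with and without the flag. This sketch uses repr() so that the newline is visible in the output; the variable names are illustrative.

```python
import re

text = 'Serve the public trust.\nProtect the innocent.'

no_newline = re.compile(r'.*')
with_newline = re.compile(r'.*', re.DOTALL)

first = no_newline.search(text).group()
everything = with_newline.search(text).group()

print(repr(first))       # 'Serve the public trust.'
print(repr(everything))  # 'Serve the public trust.\nProtect the innocent.'
```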
17. Case-insensitive matching with re.I
Passing re.I as the second argument to re.compile() makes the match case-insensitive.
robocop = re.compile(r'robocop', re.I)
print(robocop.search('Robocop is part man, part machine, all cop.').group())
# Output: Robocop
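re.I is the short name for re.IGNORECASE, and flags can be combined with the | operator. The sketch below combines it with re.DOTALL; the pattern and the sample text are invented for illustration.

```python
import re

# re.IGNORECASE is the long name for re.I; flags combine with |.
regex = re.compile(r'robocop.*law', re.IGNORECASE | re.DOTALL)
text = 'RoboCop protects the innocent\nand upholds the LAW.'

match = regex.search(text)
print(repr(match.group()))
# 'RoboCop protects the innocent\nand upholds the LAW'
```

Without re.DOTALL the `.*` could not cross the newline, and without re.IGNORECASE neither 'RoboCop' nor 'LAW' would match.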
# sub() replaces every match with the string passed as its first argument
namesRegex = re.compile(r'Agent \w+')
print(namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.'))
# Output: CENSORED gave the secret documents to CENSORED.

# In the replacement string, \1, \2, \3 and so on insert the text of group 1, 2, 3, ...
agentNamesRegex = re.compile(r'Agent (\w)\w*')
print(agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve'))
# Output: A**** told C**** that E****
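Several backreferences can appear in one replacement string, in any order. As a minimal sketch, the following reorders the parts of a date; the date format and the sample text are assumptions for this example.

```python
import re

date_regex = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
text = 'Released 2015-04-14, updated 2019-11-12.'

reordered = date_regex.sub(r'\3/\2/\1', text)
print(reordered)
# Released 14/04/2015, updated 12/11/2019.

# \g<1> is an unambiguous way to write \1 when a digit follows it.
padded = date_regex.sub(r'\g<1>0', '2015-04-14')
print(padded)  # 20150
```

Writing the replacement as a raw string keeps Python from treating `\1` as an escape sequence.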