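Python Study Notes 5: 7.12-7.15 Finding Text with Regular Expressions
These are notes I took while learning Python programming, shared for study and discussion only.
7.12 Substituting Strings with the sub() Method
The first argument to sub() is the replacement string that takes the place of each match; the second argument is the string to search. sub() returns a new string with the substitutions applied. Inside the first argument, \1, \2, and so on stand for the text matched by group 1, group 2, and so on.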
>>> import re
>>> namesRegex=re.compile(r'Agent \w+')
>>> namesRegex.sub('CENSORED','Agent Alice gave the secret documents to Agent Bob.')
'CENSORED gave the secret documents to CENSORED.'
>>> agentNamesRegex=re.compile(r'Agent (\w)\w*')
>>> agentNamesRegex.sub(r'\1****','Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')
'A**** told C**** that E**** knew B**** was a double agent.'
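As a further sketch of group references in the replacement string, the example below keeps only the last four digits of each phone number. The pattern, the function name mask_phone_numbers, and the sample text are assumptions made up for illustration; they are not from the book.

```python
import re

# Hypothetical pattern: three groups for a number written as 415-555-1234.
phone_regex = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

def mask_phone_numbers(text):
    # \3 re-inserts the text matched by the third group (the last four digits);
    # the first two groups are replaced by literal asterisks.
    return phone_regex.sub(r'***-***-\3', text)

masked = mask_phone_numbers('Call 415-555-1234 or 212-555-0000.')
print(masked)
```

Text that contains no match comes back from sub() unchanged.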
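7.13 Managing Complex Regexes
Passing re.VERBOSE as the second argument to re.compile() makes it ignore whitespace and comments inside the pattern string, so a long pattern can be spread over several commented lines. The three definitions below compile the same phone number pattern: on one line, split across lines, and split across lines with comments.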
>>> phoneRegex=re.compile(r'((\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}(\s|-|\.)\d{4}(\s*(ext|x|ext.)\s*\d{2,5})?)')
>>> phoneRegex=re.compile(r'''(
(\d{3}|\(\d{3}\))?
(\s|-|\.)?
\d{3}
(\s|-|\.)
\d{4}
(\s*(ext|x|ext.)\s*\d{2,5})?
)''',re.VERBOSE)
>>> phoneRegex=re.compile(r'''(
(\d{3}|\(\d{3}\))? #area code
(\s|-|\.)? #separator
\d{3} #first three digits
(\s|-|\.) #separator
\d{4} #last four digits
(\s*(ext|x|ext.)\s*\d{2,5})? #extension
)''',re.VERBOSE)
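A minimal sketch to check that the verbose layout does not change what is matched, using a shortened version of the pattern above (no extension part). The sample string and the variable names are assumptions for illustration. It also shows one consequence of verbose mode: a plain space in the pattern is ignored, so a literal space has to be escaped or written as \s.

```python
import re

# Same pattern written compactly and in verbose form (extension part omitted).
compact = re.compile(r'(\d{3}|\(\d{3}\))?(\s|-|\.)?\d{3}(\s|-|\.)\d{4}')
verbose = re.compile(r'''
    (\d{3}|\(\d{3}\))?   # area code
    (\s|-|\.)?           # separator
    \d{3}                # first three digits
    (\s|-|\.)            # separator
    \d{4}                # last four digits
''', re.VERBOSE)

sample = 'Office: (415) 555-1234, home: 555.9876'
compact_hits = [m.group() for m in compact.finditer(sample)]
verbose_hits = [m.group() for m in verbose.finditer(sample)]
print(compact_hits == verbose_hits)

# In verbose mode an unescaped space is dropped from the pattern.
spaced = re.compile(r'Agent Alice', re.VERBOSE)    # behaves like 'AgentAlice'
escaped = re.compile(r'Agent\ Alice', re.VERBOSE)  # escaped space stays literal
print(spaced.search('Agent Alice'))
print(escaped.search('Agent Alice').group())
```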
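7.14 Combining re.IGNORECASE, re.DOTALL, and re.VERBOSE
To use several of these options at once, join them with the pipe character |, which here acts as the bitwise OR operator.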
>>> someRegexValue=re.compile('foo',re.IGNORECASE|re.DOTALL)
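A small sketch with all three options combined, so that the effect of each is visible. The pattern and the sample text are made up for illustration.

```python
import re

pattern = re.compile(r'''
    first    # literal word; letter case is ignored because of re.IGNORECASE
    .*       # any characters; newlines are included because of re.DOTALL
    last     # literal word
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)

text = 'FIRST line\nsecond line\nLast line'
matched_text = pattern.search(text).group()
print(repr(matched_text))

# Without re.DOTALL the dot stops at the first newline, so nothing matches.
no_dotall = re.search(r'first.*last', text, re.IGNORECASE)
print(no_dotall)
```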
7.15 Project: Phone Number and Email Address Extractor
#! python3
# phoneAndEmail.py - Find phone numbers and email addresses on the clipboard.
import pyperclip,re
phoneRegex=re.compile(r'''(
(\d{3}|\(\d{3}\))? #area code
(\s|-|\.)? #separator
(\d{3}) #first three digits
(\s|-|\.) #separator
(\d{4}) #last four digits
(\s*(ext|x|ext.)\s*(\d{2,5}))? #extension
)''',re.VERBOSE)
# create Email regex
emailRegex=re.compile(r'''(
[a-zA-Z0-9._%+-]+ #username
@ #@symbol
[a-zA-Z0-9.-]+ #domain name
(\.[a-zA-Z]{2,4}) #dot-something
)''',re.VERBOSE)
# find match in clipboard text
text=str(pyperclip.paste())
matches=[]
for groups in phoneRegex.findall(text):
    phoneNum='-'.join([groups[1],groups[3],groups[5]])
    if groups[8]!='':
        phoneNum+=' x'+groups[8]
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    matches.append(groups[0])
# copy results to the clipboard.
if len(matches)>0:
    pyperclip.copy('\n'.join(matches))
    print('copied to clipboard:')
    print('\n'.join(matches))
else:
    print('no phone number or email address found.')
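To try the matching logic without the clipboard (pyperclip is a third-party module), the same two patterns can be wrapped in a function that takes a string. This is only a sketch: the function name extract_contacts and the sample text are assumptions, not part of the book's program.

```python
import re

# Same patterns as in the script above. With findall(), index 0 of each
# tuple is the outer group, so the inner groups start at index 1.
phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # index 1: area code
    (\s|-|\.)?                      # index 2: separator
    (\d{3})                         # index 3: first three digits
    (\s|-|\.)                       # index 4: separator
    (\d{4})                         # index 5: last four digits
    (\s*(ext|x|ext.)\s*(\d{2,5}))?  # index 8: extension digits
    )''', re.VERBOSE)

email_regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)

def extract_contacts(text):
    """Return phone numbers and email addresses found in text."""
    matches = []
    for groups in phone_regex.findall(text):
        phone_num = '-'.join([groups[1], groups[3], groups[5]])
        if groups[8] != '':
            phone_num += ' x' + groups[8]
        matches.append(phone_num)
    for groups in email_regex.findall(text):
        matches.append(groups[0])
    return matches

sample = 'Call 415-555-1234 ext 22 or write to alice@example.com.'
found = extract_contacts(sample)
print(found)
```

Returning a list instead of printing makes it easy to check the result with a few sample strings before wiring the function to pyperclip.paste() and pyperclip.copy().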
Source
[1] Al Sweigart (USA). Python编程快速上手——让繁琐工作自动化 [M]. Translated by 王海鹏. Beijing: 人民邮电出版社, 2016.7. pp. 129-134.