Reading Notes on "Automate the Boring Stuff with Python": Chapter 7, Pattern Matching with Regular Expressions (Part 2)

Latest recommended article published 2021-12-14 18:16:13

此生小会

Category: Python. Tags: python, regex, regular expressions, pattern matching

Article link: https://blog.csdn.net/cckavin/article/details/79453064

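1. The Wildcard Character

In a regular expression, the . (dot) character is called the "wildcard". It matches any single character except a newline. Example: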

>>> import re
>>> atRegex = re.compile(r'.at')
>>> atRegex.findall('The cat in the hat sat on the flat mat.')
['cat', 'hat', 'sat', 'lat', 'mat']

The dot matches exactly one character, which is why the match found in flat is only lat.
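The behavior described above can be checked with a small sketch. The sample strings and variable names here are illustrative assumptions, not part of the original notes; the sketch also shows that a literal period has to be written as \. in the pattern.

```python
import re

text = 'The cat in the hat sat on the flat mat.'

# '.' stands for exactly one character, so 'flat' contributes only 'lat'.
one_char = re.findall(r'.at', text)

# Escaping the dot makes it match a literal period instead of any character.
literal_dot = re.findall(r'\.', text)

# By default the dot does not match a newline, so nothing is found here.
across_newline = re.findall(r'.at', '\nat')

print(one_char)
print(literal_dot)
print(across_newline)
```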

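1) Matching Everything with Dot-Star

Dot-star (.*) stands for "any text". Dot-star is greedy: it always matches as much text as possible. To match any text in non-greedy mode, follow it with a question mark (.*?). Example: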

>>> nongreedyRegex = re.compile(r'<.*?>')
>>> mo = nongreedyRegex.search('<To serve man> for dinner.>')
>>> mo.group()
'<To serve man>'
>>> greedyRegex = re.compile(r'<.*>')
>>> mo = greedyRegex.search('<To serve man> for dinner.>')
>>> mo.group()
'<To serve man> for dinner.>'
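To make the greedy versus non-greedy difference more visible, here is a minimal sketch that runs both patterns over a string with several bracketed tags. The sample string is a made-up assumption for illustration.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: '.*' runs to the last '>' in the string, giving one long match.
greedy = re.findall(r'<.*>', html)

# Non-greedy: '.*?' stops at the first '>' it can, giving one match per tag.
lazy = re.findall(r'<.*?>', html)

print(greedy)
print(lazy)
```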

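2) Matching Newlines with the Dot Character

Dot-star matches everything except a newline. Passing re.DOTALL as the second argument to re.compile() makes the dot match every character, including the newline. Example: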

>>> noNewlineRegex = re.compile('.*')
>>> noNewlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()
'Serve the public trust.'
>>> newlineRegex = re.compile('.*', re.DOTALL)
>>> newlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()
'Serve the public trust.\nProtect the innocent.\nUphold the law.'
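One place where re.DOTALL tends to matter is pulling out a block that spans several lines. The sketch below assumes hypothetical BEGIN and END markers, which do not come from the original notes.

```python
import re

text = 'BEGIN\nline one\nline two\nEND'

# Without re.DOTALL the dot cannot cross the newline after BEGIN, so no match.
without_flag = re.search(r'BEGIN(.*?)END', text)

# With re.DOTALL the dot also matches '\n', so the whole block is captured.
with_flag = re.search(r'BEGIN(.*?)END', text, re.DOTALL)

print(without_flag)
print(repr(with_flag.group(1)))
```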

2. Case-Insensitive Matching

To make a regular expression case-insensitive, pass re.IGNORECASE or re.I as the second argument to re.compile(). Example:

>>> robocop = re.compile(r'robocop', re.I)
>>> robocop.search('RoboCop is part man, part machine, all cop.').group()
'RoboCop'
>>> robocop.search('ROBOCOP protects the innocent.').group()
'ROBOCOP'
>>> robocop.search('Al, why does your programming book talk about robocop so much?').group()
'robocop'
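As a quick check that the flag affects only how the pattern is compared and not the text that comes back, here is a small sketch. The sample sentence is an assumption made for illustration.

```python
import re

text = 'Python, PYTHON and python'

# Case-sensitive by default: only the exact lowercase spelling is found.
strict = re.findall(r'python', text)

# With re.I every casing matches, and each match keeps its original casing.
relaxed = re.findall(r'python', text, re.I)

# re.I is simply the short name for re.IGNORECASE.
same_flag = (re.I == re.IGNORECASE)

print(strict, relaxed, same_flag)
```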

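3. Substituting Strings with the sub() Method

The sub() method of a Regex object takes two arguments. The first is the string that replaces each match found. The second is the string to search, that is, the text the regular expression is matched against. sub() returns the string with the substitutions applied. Example: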

>>> namesRegex = re.compile(r'Agent \w+')
>>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
'CENSORED gave the secret documents to CENSORED.'
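By default sub() replaces every match. The standard library's sub() also accepts an optional count argument that limits the number of replacements; it is not covered in the notes above, so treat this as a supplementary sketch.

```python
import re

names_regex = re.compile(r'Agent \w+')
text = 'Agent Alice gave the secret documents to Agent Bob.'

# Default behavior: every match is replaced.
all_replaced = names_regex.sub('CENSORED', text)

# count=1 replaces only the first match and leaves the rest untouched.
first_only = names_regex.sub('CENSORED', text, count=1)

print(all_replaced)
print(first_only)
```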

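Sometimes you need the matched text itself as part of the substitution. In the first argument to sub(), you can write \1, \2, \3, and so on, meaning "insert the text of group 1, 2, 3, and so on, into the substitution". For example, suppose you want to censor the agents' names and show only the first letter of each. To do this, use the regular expression Agent (\w)\w* and pass r'\1****' as the first argument to sub(). The \1 in that string is replaced by whatever text group 1, the (\w) group of the regex, matched. Example: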

>>> agentNamesRegex = re.compile(r'Agent (\w)\w*')
>>> agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')
'A**** told C**** that E**** knew B**** was a double agent.'
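Group references can also reorder the pieces of a match, not just keep one of them. The following sketch rearranges a date; the date pattern and sample text are assumptions chosen for illustration.

```python
import re

# Three groups: year, month, day.
date_regex = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
text = 'Published on 2021-12-14.'

# \3, \2 and \1 insert the day, month and year groups in a new order.
reordered = date_regex.sub(r'\3/\2/\1', text)

print(reordered)
```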

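4. Managing Complex Regexes

In "verbose mode" the regex string can be spread over several lines, and whitespace and comments inside it are ignored. To enable verbose mode, pass re.VERBOSE as the second argument to re.compile(). Example: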

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?             # area code
    (\s|-|\.)?                     # separator
    \d{3}                          # first 3 digits
    (\s|-|\.)                      # separator
    \d{4}                          # last 4 digits
    (\s*(ext|x|ext\.)\s*\d{2,5})?  # extension
    )''', re.VERBOSE)

When a regular expression is long, adding comments to it improves readability.
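To see a verbose pattern in use, the sketch below applies a phone-number regex of the same shape to a sample sentence. The sentence and the phone numbers in it are made up, and the dot in the extension alternative is escaped here so that it matches only a literal period.

```python
import re

phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?             # area code, with or without parentheses
    (\s|-|\.)?                     # separator
    \d{3}                          # first 3 digits
    (\s|-|\.)                      # separator
    \d{4}                          # last 4 digits
    (\s*(ext|x|ext\.)\s*\d{2,5})?  # optional extension
    )''', re.VERBOSE)

text = 'Call 415-555-1011 or (415) 555-9999 ext. 12.'

# group(0) is the whole match; finditer avoids the tuples findall would
# return for a pattern that contains several groups.
numbers = [m.group(0) for m in phone_regex.finditer(text)]

print(numbers)
```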

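5. Combining re.IGNORECASE, re.DOTALL, and re.VERBOSE

re.compile() accepts only a single value as its second argument. To use several flags together, combine them with the pipe character (|), which in this context is the bitwise OR operator. Example: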

>>> someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE)
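Here is a minimal sketch of a pattern that actually relies on all three flags at once. The pattern and test string are illustrative assumptions.

```python
import re

pattern = re.compile(r'''
    hello   # opening word, any casing because of re.IGNORECASE
    .*      # anything in between, newlines included because of re.DOTALL
    world   # closing word
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)

match = pattern.search('HELLO there,\nWORLD')

# The combined value is a bit mask, so a single flag can be tested with &.
has_dotall = bool(pattern.flags & re.DOTALL)

print(repr(match.group()))
print(has_dotall)
```

Dropping any one of the three flags makes this search fail: without re.VERBOSE the spaces and comments become part of the pattern, without re.DOTALL the dot stops at the newline, and without re.IGNORECASE the uppercase text no longer matches.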