Automate the Boring Stuff with Python, Chapter 7: Pattern Matching with Regular Expressions (re)

忘记他

Category: Python  Tags: regular expressions, python

Article link: https://blog.csdn.net/u013965752/article/details/104951781

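Creating Regex Objects
Import the re module, then pass a raw string to re.compile() to create a Regex object:

>>> import re
>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')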

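Matching Regex Objects
A Regex object's search() method searches the string it is passed for a match to the regex. It returns None if the pattern is not found in the string. If the pattern is found, search() returns a Match object for the first match. A Match object has a group() method that returns the actual matched text from the searched string (groups are explained below).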

>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> print('Phone number found: ' + mo.group())
Phone number found: 415-555-4242

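The variable name mo is just a generic name for Match objects.
We pass the desired pattern to re.compile() and store the resulting Regex object in phoneNumRegex. We then call search() on phoneNumRegex, passing it the string to search, and store the result in mo.
In this example we know the pattern occurs in the string, so we know search() returns a Match object rather than None, and it is safe to call group() on mo to get the match. Putting mo.group() inside the print call displays the whole match, 415-555-4242.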
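Review of Regular Expression Matching
1. Import the regex module with import re.
2. Create a Regex object with the re.compile() function (remember to use a raw string).
3. Pass the string you want to search to the Regex object's search() method. It returns a Match object, or None if there is no match.
4. Call the Match object's group() method to get a string of the actual matched text.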
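Because search() returns None when nothing matches, calling group() directly on the result can raise an AttributeError. The four steps above could be sketched with that check added as follows; the helper name findPhoneNumber and the sample strings are made up for illustration and are not from the book.

```python
import re  # Step 1: import the regex module

# Step 2: compile the pattern once, as a raw string.
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def findPhoneNumber(text):
    """Return the first phone number in text, or None if there is none.

    findPhoneNumber is a hypothetical helper name used only for illustration.
    """
    mo = phoneNumRegex.search(text)  # Step 3: a Match object, or None
    if mo is None:
        return None
    return mo.group()                # Step 4: the matched text

print(findPhoneNumber('My number is 415-555-4242.'))  # 415-555-4242
print(findPhoneNumber('No number here.'))             # None
```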
Grouping with Parentheses

>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> mo.group(1)
'415'
>>> mo.group(2)
'555-4242'
>>> mo.group(0)
'415-555-4242'
>>> mo.group()
'415-555-4242'

To retrieve all the groups at once, use the groups() method:

>>> mo.groups()
('415', '555-4242')
>>> areaCode, mainNumber = mo.groups()
>>> print(areaCode)
415
>>> print(mainNumber)
555-4242

In the raw string passed to re.compile(), the escaped characters \( and \) match actual parenthesis characters.
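To match literal parentheses, escape them with a backslash; unescaped parentheses still create groups. A short sketch, with a sample sentence chosen only for illustration:

```python
import re

# \( and \) match literal parentheses; the unescaped ( ) still create groups.
phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My phone number is (415) 555-4242.')
print(mo.group(1))  # (415)
print(mo.group(2))  # 555-4242
```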
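Matching Multiple Groups with the Pipe
The | character is called a pipe. Use it wherever you want to match one of several expressions. When both Batman and Tina Fey occur in the searched string, the first occurrence of matching text is returned as the Match object.

>>> heroRegex = re.compile(r'Batman|Tina Fey')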
>>> mo1 = heroRegex.search('Batman and Tina Fey.')
>>> mo1.group()
'Batman'
>>> mo2 = heroRegex.search('Tina Fey and Batman.')
>>> mo2.group()
'Tina Fey'

Note: the findall() method finds every matching occurrence.
You can also use the pipe inside part of a regex to match one of several alternatives.

>>> batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
>>> mo = batRegex.search('Batmobile lost a wheel')
>>> mo.group()
'Batmobile'
>>> mo.group(1)
'mobile'

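The call mo.group() returns the full matched text 'Batmobile', while mo.group(1) returns only the part matched inside the first parenthesis group, 'mobile'. By combining the pipe character with grouping parentheses, you can specify several alternative patterns for the regex to match.
To match an actual pipe character, escape it with a backslash: \|.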
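As a small illustration of an escaped pipe, the sketch below matches two words separated by a literal | character. The pattern name pipeRegex and the sample text are assumptions for this example only.

```python
import re

# \| matches a literal pipe character instead of acting as alternation.
pipeRegex = re.compile(r'\w+\|\w+')
mo = pipeRegex.search('Columns: name|age')
print(mo.group())  # name|age
```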
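Optional Matching with the Question Mark
The ? character marks the group that precedes it as optional: the group may appear zero times or one time.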

>>> batRegex = re.compile(r'Bat(wo)?man')
>>> mo1 = batRegex.search('The Adventures of Batman')
>>> mo1.group()
'Batman'
>>> mo2 = batRegex.search('The Adventures of Batwoman')
>>> mo2.group()
'Batwoman'
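
The same idea applies to the earlier phone number pattern. A sketch, assuming the area code and its hyphen should be optional:

```python
import re

# (\d\d\d-)? makes the area code, together with its hyphen, optional.
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
mo1 = phoneRegex.search('My number is 415-555-4242')
print(mo1.group())  # 415-555-4242
mo2 = phoneRegex.search('My number is 555-4242')
print(mo2.group())  # 555-4242
```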

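Matching Zero or More with the Star
The * (star, or asterisk) means "match zero or more": the group that precedes the star can occur any number of times in the text, including not at all.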
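A brief sketch reusing the Batman pattern from the optional-matching example, with the ? swapped for a star:

```python
import re

batRegex = re.compile(r'Bat(wo)*man')
mo1 = batRegex.search('The Adventures of Batman')          # zero "wo"
mo2 = batRegex.search('The Adventures of Batwoman')        # one "wo"
mo3 = batRegex.search('The Adventures of Batwowowowoman')  # four "wo"
print(mo1.group())  # Batman
print(mo2.group())  # Batwoman
print(mo3.group())  # Batwowowowoman
```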
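Matching One or More with the Plus
The + (plus) means "match one or more". Unlike with the star, the group preceding a plus must appear at least once; it is not optional.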
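The difference from the star shows up when the group is absent. A sketch with the same Batman pattern:

```python
import re

batRegex = re.compile(r'Bat(wo)+man')
mo1 = batRegex.search('The Adventures of Batwoman')
print(mo1.group())  # Batwoman
mo2 = batRegex.search('The Adventures of Batwowowowoman')
print(mo2.group())  # Batwowowowoman
# "Batman" has no "wo", so the plus cannot be satisfied.
mo3 = batRegex.search('The Adventures of Batman')
print(mo3 is None)  # True
```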
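Matching Specific Repetitions with Curly Brackets
A number or range in curly brackets after a group sets how many times the group repeats. The following two patterns accept the same strings, three to five repetitions of Ha:

(Ha){3,5}
((Ha)(Ha)(Ha))|((Ha)(Ha)(Ha)(Ha))|((Ha)(Ha)(Ha)(Ha)(Ha))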
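
A minimal sketch with an exact count instead of a range:

```python
import re

# {3} requires exactly three repetitions of the group.
haRegex = re.compile(r'(Ha){3}')
mo1 = haRegex.search('HaHaHa')
print(mo1.group())  # HaHaHa
mo2 = haRegex.search('Ha')
print(mo2 is None)  # True
```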

Greedy and Nongreedy Matching

>>> greedyHaRegex = re.compile(r'(Ha){3,5}')
>>> mo1 = greedyHaRegex.search('HaHaHaHaHa')
>>> mo1.group()
'HaHaHaHaHa'
>>> nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
>>> mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
>>> mo2.group()
'HaHaHa'

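By default a quantifier is greedy and matches the longest text it can; a question mark placed right after the quantifier makes it nongreedy, so it matches the shortest text it can. The question mark therefore has two meanings in regular expressions: declaring a nongreedy match or marking an optional group.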
The findall() Method
search() returns a Match object for the first match in the searched string, whereas findall() returns a list containing every match.
To summarize what findall() returns:
1. When called on a regex with no groups, such as \d\d\d-\d\d\d-\d\d\d\d, findall() returns a list of matched strings, such as ['415-555-9999', '212-555-0000'].
2. When called on a regex that has groups, such as (\d\d\d)-(\d\d\d)-(\d\d\d\d), findall() returns a list of tuples of strings (one string per group), such as [('415', '555', '1122'), ('212', '555', '0000')].
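Both cases can be seen side by side in the sketch below. The sample text and the names noGroups and withGroups are chosen for illustration only.

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

# No groups: findall() returns a list of matched strings.
noGroups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
print(noGroups.findall(text))    # ['415-555-9999', '212-555-0000']

# With groups: findall() returns a list of tuples, one string per group.
withGroups = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
print(withGroups.findall(text))  # [('415', '555', '9999'), ('212', '555', '0000')]
```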
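Shorthand Codes for Common Character Classes
\d  Any digit from 0 to 9. \D matches any character that is not a digit.
\w  Any letter, digit, or underscore character (think of it as matching "word" characters). \W matches any character that is not one of these.
\s  A space, tab, or newline character (think of it as matching "whitespace" characters). \S matches any character that is not whitespace.
For example, \d+\s\w+ matches one or more digits, then one whitespace character, then one or more word characters.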
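A short sketch of the \d+\s\w+ pattern with findall(); the sample text is only an illustration:

```python
import re

# One or more digits, one whitespace character, one or more word characters.
xmasRegex = re.compile(r'\d+\s\w+')
found = xmasRegex.findall('12 drummers, 11 pipers, 10 lords, 9 ladies, 8 maids')
print(found)  # ['12 drummers', '11 pipers', '10 lords', '9 ladies', '8 maids']
```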
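Making Your Own Character Classes
Square brackets define a custom character class:

[aeiou] matches any one lowercase vowel.
[0-5] matches any one digit from 0 to 5.
[0-5.] matches a digit from 0 to 5 or a period; inside brackets the period needs no backslash.
[^0-5] is a negated class: the caret right after the opening bracket makes it match any character that is not a digit from 0 to 5.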
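
These classes can be tried out with findall(). The sketch below uses made-up sample strings and variable names:

```python
import re

# A custom class listing both lowercase and uppercase vowels.
vowelRegex = re.compile(r'[aeiouAEIOU]')
vowels = vowelRegex.findall('RoboCop eats baby food.')
print(vowels)  # ['o', 'o', 'o', 'e', 'a', 'a', 'o', 'o']

# A negated class: every character that is not a digit from 0 to 5.
notLowDigits = re.compile(r'[^0-5]').findall('0123456789')
print(notLowDigits)  # ['6', '7', '8', '9']

# The period is literal inside square brackets.
digitsOrDot = re.compile(r'[0-5.]').findall('3.14 and 9')
print(digitsOrDot)  # ['3', '.', '1', '4']
```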

The Caret and Dollar Sign Characters
r'^Hello' matches strings that begin with 'Hello'.
r'\d$' matches strings that end with a digit from 0 to 9.
r'^\d+$' matches strings that consist only of digits from beginning to end.
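The three anchored patterns can be checked as in the sketch below; the variable names and test strings are assumptions for illustration.

```python
import re

beginsWithHello = re.compile(r'^Hello')
print(beginsWithHello.search('Hello world!').group())      # Hello
print(beginsWithHello.search('He said hello.') is None)    # True

endsWithNumber = re.compile(r'\d$')
print(endsWithNumber.search('Your number is 42').group())  # 2
print(endsWithNumber.search('Your number is forty two.') is None)  # True

wholeStringIsNum = re.compile(r'^\d+$')
print(wholeStringIsNum.search('1234567890').group())       # 1234567890
print(wholeStringIsNum.search('12 34567890') is None)      # True
```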
The Wildcard Character
The . (dot) character is called a wildcard. It matches any single character except a newline.
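Because the dot stands for exactly one character, a word like flat contributes only its last three letters in this sketch:

```python
import re

# The dot matches exactly one character before "at".
atRegex = re.compile(r'.at')
found = atRegex.findall('The cat in the hat sat on the flat mat.')
print(found)  # ['cat', 'hat', 'sat', 'lat', 'mat']
```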
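Matching Everything with Dot-Star
Sometimes you want to match any text at all. The pattern .* does this: the dot matches any single character except a newline, and the star repeats it zero or more times.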
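A typical use is capturing whatever follows a fixed label. A sketch, with the labels and the name chosen for illustration:

```python
import re

# Each (.*) captures whatever text follows its label.
nameRegex = re.compile(r'First Name: (.*) Last Name: (.*)')
mo = nameRegex.search('First Name: Al Last Name: Sweigart')
print(mo.group(1))  # Al
print(mo.group(2))  # Sweigart
```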
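Greedy and Nongreedy
Dot-star is greedy and matches as much text as possible; .*? is the nongreedy form and matches as little as possible: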

>>> nongreedyRegex = re.compile(r'<.*?>')
>>> mo = nongreedyRegex.search('<To serve man> for dinner.>')
>>> mo.group()
'<To serve man>'
>>> greedyRegex = re.compile(r'<.*>')
>>> mo = greedyRegex.search('<To serve man> for dinner.>')
>>> mo.group()
'<To serve man> for dinner.>'

The sub() Method
The first argument is the string that replaces each match. The second argument is the string to search. sub() returns the string with the substitutions applied.

>>> namesRegex = re.compile(r'Agent \w+')
>>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
'CENSORED gave the secret documents to CENSORED.'

In the replacement string, \1, \2, and so on stand for the text matched by group 1, group 2, and so on:

>>> agentNamesRegex = re.compile(r'Agent (\w)\w*')
>>> agentNamesRegex.sub(r'\1****', 'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')
'A**** told C**** that E**** knew B**** was a double agent.'

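Passing re.DOTALL as the second argument to re.compile() makes the dot match every character, including newlines.
Passing re.IGNORECASE (or its short form re.I) as the second argument makes the match case-insensitive.
Passing re.VERBOSE as the second argument makes re.compile() ignore whitespace and comments inside the regex string.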
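The effect of each flag can be seen in a small sketch. The sample strings, variable names, and the commented phone pattern are assumptions made for illustration.

```python
import re

text = 'Serve the public trust.\nProtect the innocent.\nUphold the law.'

# Without re.DOTALL the dot stops at the first newline.
noNewline = re.compile('.*').search(text).group()
print(noNewline)            # Serve the public trust.
# With re.DOTALL the dot also matches newlines.
withNewline = re.compile('.*', re.DOTALL).search(text).group()
print(withNewline == text)  # True

# re.I makes the match case-insensitive.
robocop = re.compile(r'robocop', re.I)
print(robocop.search('ROBOCOP protects the innocent.').group())  # ROBOCOP

# re.VERBOSE lets the pattern span lines and carry comments.
phoneRegex = re.compile(r'''
    (\d{3}-)?   # optional area code with its hyphen
    \d{3}       # first three digits
    -           # separator
    \d{4}       # last four digits
    ''', re.VERBOSE)
print(phoneRegex.search('Call 415-555-4242 now').group())  # 415-555-4242
```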

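Combining re.IGNORECASE, re.DOTALL, and re.VERBOSE
re.compile() takes only one value as its second argument, so combine the flags with the bitwise OR operator |:

>>> someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL)
>>> # With all three options in the second argument, it looks like this:
>>> someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE)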

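Strong Password Detection

import re

# A strong password has at least eight characters and contains
# a lowercase letter, an uppercase letter, and a digit.
eightRegex = re.compile(r'.{8,}')  # any 8+ characters, symbols included
lowerRegex = re.compile(r'[a-z]')
capitalsRegex = re.compile(r'[A-Z]')
numRegex = re.compile(r'\d')
while True:
    passwd = input('Please input passwd: ')
    mo1 = eightRegex.search(passwd)
    mo2 = lowerRegex.search(passwd)
    mo3 = capitalsRegex.search(passwd)
    mo4 = numRegex.search(passwd)
    # Every check must find a match; search() returns None on failure.
    if mo1 and mo2 and mo3 and mo4:
        print('Successful!')
        break
    else:
        print('Input again.')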
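
For testing without keyboard input, the same checks could be wrapped in a function. This is only a sketch: the name isStrongPassword is hypothetical, and the criteria (eight or more characters, a lowercase letter, an uppercase letter, a digit) are assumed from the patterns in the script above.

```python
import re

def isStrongPassword(passwd):
    """Return True if passwd passes all four checks.

    isStrongPassword is a hypothetical helper name used only for illustration.
    """
    checks = [r'.{8,}', r'[a-z]', r'[A-Z]', r'\d']
    # re.search() returns None for a failed check, which counts as False.
    return all(re.search(pattern, passwd) for pattern in checks)

print(isStrongPassword('Passw0rd'))  # True
print(isStrongPassword('password'))  # False
print(isStrongPassword('Ab1!'))      # False
```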

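Regex Version of strip()

import re

char1 = '    ssss    '
char2 = 'aaaassssaaaa'

def sameStrip(text, chars=None):
    # Like str.strip(): with no chars, strip whitespace from both ends;
    # otherwise strip any of the characters in chars.
    if chars == '':
        return text
    # re.escape() keeps characters such as ^ or ] literal inside the brackets.
    charClass = r'\s' if chars is None else '[%s]' % re.escape(chars)
    stripRegex = re.compile(r'^%s+|%s+$' % (charClass, charClass))
    return stripRegex.sub('', text)

print('---')
print(sameStrip(char1))       # ssss
print(sameStrip(char2, 'a'))  # ssss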