Python Regular Expressions (Part 2)

Latest recommended article published on 2021-11-10 12:53:57

此处无声胜有声



Category: python. Tags: python, functions, regular expressions, extensions

Article link: https://blog.csdn.net/qq_24074149/article/details/78766207


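Python's regular expressions support a large number of extension notations.
With the (?iLmsux) family of inline flags, you can set one or more flags inside the regular expression itself instead of passing them to compile() or another re module function. The first examples below use re.I/IGNORECASE; the last one combines it with re.M/MULTILINE.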

>>> import re
>>> re.findall(r'(?i)yes', 'yes? Yes. YES!')
['yes', 'Yes', 'YES']
>>> re.findall(r'(?i)th\w+', 'The quickest way is through this tunnel.')
['The', 'through', 'this']
>>> re.findall(r'(?im)(^th[\w ]+)', '''
... This line is the first,
... another line,
... that line, it's the best
... ''')
['This line is the first', 'that line']

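The first two examples are plainly case-insensitive. In the last one, the multiline flag makes ^ match at the start of every line of the target string, rather than only at the start of the string as a whole.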
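
An inline flag should behave the same as passing the corresponding constant to the re function. The following is a minimal sketch of that equivalence; the variable names and the sample strings are illustrative assumptions, not part of the original examples.

```python
import re

text = 'yes? Yes. YES!'

# Flag written inside the pattern.
inline = re.findall(r'(?i)yes', text)

# The same flag passed as an argument instead.
by_arg = re.findall(r'yes', text, re.IGNORECASE)

# As arguments, flags are combined with |, just as (?im) combines them inline.
lines = 'This line\nanother line\nthat line'
combined = re.findall(r'^th\w+', lines, re.I | re.M)

print(inline == by_arg)  # True
print(combined)          # ['This', 'that']
```

The inline form is handy when the pattern is the only thing you can control, for example when it is read from a configuration file.
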
The next set demonstrates re.S/DOTALL. This flag lets the dot (.) match \n as well; without it, the dot matches every character except \n:

>>> re.findall(r'th.+', '''
... The first line
... the second line
... the third line
... ''')
['the second line', 'the third line']
>>> re.findall(r'(?s)th.+', '''
... The first line
... the second line
... the third line
... ''')
['the second line\nthe third line\n']
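
Because a greedy .+ under DOTALL runs to the end of the string, it is often paired with a non-greedy quantifier. The sketch below illustrates this on a made-up string (an assumption for illustration) and also shows re.S passed as an argument.

```python
import re

text = 'start\nthe second line\nthe third line\nend'

# Without DOTALL, '.' stops at each newline, so each line matches separately.
per_line = re.findall(r'th.+', text)

# re.S as an argument is equivalent to the inline (?s).
whole = re.findall(r'th.+', text, re.S)

# A non-greedy quantifier stops the DOTALL match as early as possible.
bounded = re.search(r'(?s)the.+?third', text).group()

print(per_line)  # ['the second line', 'the third line']
print(whole)     # ['the second line\nthe third line\nend']
print(bounded)   # the second line, a newline, then "the third"
```
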

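The re.X/VERBOSE flag lets you write more readable regular expressions by ignoring whitespace inside the pattern (except in a character class or when escaped with a backslash). In addition, # starts a comment that runs to the end of the line, as long as it is not in a character class or escaped with a backslash.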

>>> re.search(r'''(?x)
...       \((\d{3})\)        # area code
...       [ ]                # space
...       (\d{3})            # prefix
...       -                  # dash
...       (\d{4})            # line number
... ''', '(800) 555-1212').groups()
('800', '555', '1212')
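
The two exceptions mentioned above (backslash escapes and character classes) can be sketched as follows. The names phone and hashtag and the sample strings are hypothetical; the pattern is compiled with re.VERBOSE rather than the inline (?x), which should be equivalent.

```python
import re

phone = re.compile(r'''
    \( (\d{3}) \)   # area code in parentheses
    \               # a backslash-escaped space stays literal
    (\d{3})         # prefix
    -               # dash
    (\d{4})         # line number
''', re.VERBOSE)

# Inside a character class, # is an ordinary character, not a comment marker.
hashtag = re.compile(r'[#] (\w+)   # a literal hash followed by a word', re.VERBOSE)

print(phone.search('(800) 555-1212').groups())  # ('800', '555', '1212')
print(hashtag.findall('#regex and #python'))    # ['regex', 'python']
```
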

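The (?:…) notation is probably the most widely used of these. It groups part of a regular expression without saving that group for later retrieval or use. This is useful when you do not want to keep extra matches that you will never use.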

>>> re.findall(r'http://(?:\w+\.)*(\w+\.com)',
... 'http://google.com http://www.baidu.com http://code.google.com')
['google.com', 'baidu.com', 'google.com']
>>> 
>>> re.search(r'\((?P<areacode>\d{3})\) (?P<prefix>\d{3})-(?:\d{4})',
... '(800) 555-1212').groupdict()
{'areacode': '800', 'prefix': '555'}
>>>
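
To make the effect of (?:…) visible, the sketch below runs the same pattern with a capturing and a non-capturing group. The titles and names are invented sample data, used here only for illustration.

```python
import re

text = 'Dr. Strange met Ms. Marvel'

# A capturing group for the title: findall returns one tuple per match.
capturing = re.findall(r'(Mr|Ms|Dr)\. (\w+)', text)

# With (?:...) the title is still matched but not saved.
non_capturing = re.findall(r'(?:Mr|Ms|Dr)\. (\w+)', text)

print(capturing)      # [('Dr', 'Strange'), ('Ms', 'Marvel')]
print(non_capturing)  # ['Strange', 'Marvel']

# The compiled pattern reports how many groups it saves.
print(re.compile(r'(?:Mr|Ms|Dr)\. (\w+)').groups)  # 1
```
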

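The (?P<name>) and (?P=name) notations can be used together. The former saves a match under a name instead of under an incrementing number from 1 to N; numbered matches are retrieved with \1, \2, ..., \N, while named ones are retrieved in a replacement string with \g<name>. The latter, (?P=name), matches the same text that the named group matched earlier in the same pattern.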

>>> re.sub(r'\((?P<areacode>\d{3})\) (?P<prefix>\d{3})-(?:\d{4})',
...    r'(\g<areacode>) \g<prefix>-xxxx', '(800) 555-1212')
'(800) 555-xxxx'
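
Named groups keep their positional numbers, so both styles of access should work on the same match. The following sketch assumes a variant of the pattern above with a third named group, number, added for illustration.

```python
import re

pattern = r'\((?P<areacode>\d{3})\) (?P<prefix>\d{3})-(?P<number>\d{4})'
phone = '(800) 555-1212'

m = re.search(pattern, phone)

# A named group can be read by name or by its position.
print(m.group('areacode'), m.group(1))  # 800 800

# In a replacement string, \g<name> and \N refer to the same saved text.
named = re.sub(pattern, r'\g<areacode>.\g<prefix>.\g<number>', phone)
numbered = re.sub(pattern, r'\1.\2.\3', phone)

print(named)              # 800.555.1212
print(named == numbered)  # True
```

The replacement strings are raw strings so that \g and \1 reach the re module unchanged.
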

With (?P=name) you can validate that a phone number is normalized consistently across several formats:

>>> bool(re.match(r'\((?P<areacode>\d{3})\) (?P<prefix>\d{3})-(?P<number>\d{4}) (?P=areacode)-(?P=prefix)-(?P=number) 1(?P=areacode)(?P=prefix)(?P=number)',
... '(800) 555-1212 800-555-1212 18005551212'))
True
>>>
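
Another common use of (?P=name) is finding accidentally doubled words. This is a minimal sketch under that assumption; the name doubled and the sample sentence are made up for illustration.

```python
import re

# (?P=word) must match exactly the text that (?P<word>...) just matched.
doubled = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

text = 'this is is a test of the the pattern'
found = [m.group('word') for m in doubled.finditer(text)]

print(found)  # ['is', 'the']

# Replacing the whole match with the saved word removes the repeat.
print(doubled.sub(r'\g<word>', text))  # this is a test of the pattern
```
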

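The (?=…) and (?!…) notations perform a lookahead match without consuming the characters they check. The former is a positive lookahead assertion, the latter a negative lookahead assertion.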

>>> re.findall(r'\w+(?= van Rossum)',
... '''
...     Guido van Rossum
...     Tim Peters
...     Alex Martelli
...    Just van Rossum
... ''') 
['Guido', 'Just']
>>> 
>>> re.findall(r'(?m)^\s+(?!noreply|postmaster)(\w+)',
... '''
...    sales@phptr.com
...    postmaster@phptr.com
...    eng@phptr.com
...    noreply@phptr.com
...    admin@phptr.com
... ''')
['sales', 'eng', 'admin']
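
Because a lookahead does not consume text, it can test what follows a match without making it part of the match. The sketch below shows one positive and one negative case; the helper add_commas and the file names are hypothetical examples, not from the original article.

```python
import re

def add_commas(digits):
    # Match a digit only when it is followed by one or more complete
    # groups of three digits running to the end of the string.
    return re.sub(r'\d(?=(?:\d{3})+$)', r'\g<0>,', digits)

print(add_commas('1234567'))  # 1,234,567
print(add_commas('100'))      # 100

# Negative lookahead: keep file names whose extension is not "bak".
files = 'report.py backup.bak notes.txt'
kept = re.findall(r'\b\w+\.(?!bak\b)\w+', files)
print(kept)  # ['report.py', 'notes.txt']
```
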
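
The last example shows conditional matching. (?(1)y|x) tests whether group 1 took part in the match: if it did, y must follow, otherwise x must. Group 1 therefore has to be optional, as in (?:(x)|y), so that a leading y leaves it unset:

>>> bool(re.search(r'(?:(x)|y)(?(1)y|x)', 'xy'))
True
>>> bool(re.search(r'(?:(x)|y)(?(1)y|x)', 'xx'))
False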
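
A somewhat more practical use of a conditional is requiring a closing delimiter only when the opening one is present. The pattern below is adapted from the example in the Python re module documentation; the variable names and sample strings are illustrative.

```python
import re

# Group 1 is an optional '<'. If it matched, a closing '>' is required;
# if it did not, the address must run to the end of the string.
address = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

samples = ['<user@host.com>', 'user@host.com', '<user@host.com', 'user@host.com>']
results = {s: bool(address.match(s)) for s in samples}

for s, ok in results.items():
    print(s, ok)
# <user@host.com> True
# user@host.com True
# <user@host.com False
# user@host.com> False
```
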
