Python Regular Expression Functions (Part 2)

Xiaofei@IDO

Article link: https://blog.csdn.net/nixiang_888/article/details/105083798

Background

A regular expression object (Pattern) compiled with re.compile exposes a number of methods and attributes.

Methods and attributes of regular expression objects

1. Pattern.search(string[, pos[, endpos]])

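Scans through string and returns a match object for the first location where the pattern produces a match
Returns None if no position in the string matches
pos is optional and gives the index at which the search starts; it defaults to 0
endpos is optional and limits how far the string is searched; only the characters from pos to endpos - 1 are examined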

In [24]: import re

In [25]: pattern = re.compile("d")

In [26]: pattern.search("dog") # Match at index 0
Out[26]: <re.Match object; span=(0, 1), match='d'>

In [27]: pattern.search("dog", 1) # No match; search doesn't include the "d"
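The session above exercises pos but not endpos. As a minimal sketch (the input string "head" is only an illustrative choice), the following shows how endpos cuts the search short:

```python
import re

pattern = re.compile("d")

# endpos=3 limits the search to indices 0-2, i.e. "hea", so "d" is never seen.
no_match = pattern.search("head", 0, 3)

# With the default endpos the whole string is scanned and "d" is found at index 3.
found = pattern.search("head")

print(no_match)      # None
print(found.span())  # (3, 4)
```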

2. Pattern.match(string[, pos[, endpos]])

The match must start exactly at position pos (default 0)
Returns a match object on success; otherwise returns None

In [3]: pattern = re.compile("o")

In [5]: m = pattern.match("dog")
In [6]: m is None  # No match as "o" is not at the start of "dog".
Out[6]: True

In [7]: m = pattern.match("dog", 1)
In [8]: m is None  # Match as "o" is the 2nd character of "dog".
Out[8]: False
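Passing pos is not quite the same as slicing the string first. A small sketch of the difference, assuming a pattern that starts with the "^" anchor (the names with_pos and with_slice are illustrative):

```python
import re

anchored = re.compile("^o")

# pos moves the starting index, but "^" still refers to the real start of the string.
with_pos = anchored.match("dog", 1)

# Slicing creates a new string, so "^" matches at its first character.
with_slice = anchored.match("dog"[1:])

print(with_pos)            # None
print(with_slice.group())  # o
```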

3. Pattern.fullmatch(string[, pos[, endpos]])

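The entire region between pos (default 0) and endpos (default: end of the string) must match the pattern
Returns a match object if the whole region matches; otherwise returns None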

In [9]: pattern = re.compile("o[gh]")

In [10]: m = pattern.fullmatch("dog")
In [11]: m is None # No match as "o" is not at the start of "dog".
Out[11]: True

In [12]: m = pattern.fullmatch("ogre")
In [13]: m is None # No match as not the full string matches.
Out[13]: True

In [14]: m = pattern.fullmatch("doggie", 1, 3)
In [15]: m is None # Matches within given limits.
Out[15]: False

In [16]: m
Out[16]: <re.Match object; span=(1, 3), match='og'>
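To make the contrast with match() concrete, here is a short sketch that runs the same pattern through both methods (the variable names are illustrative):

```python
import re

pattern = re.compile("o[gh]")

# match() only requires the pattern to match at the start; trailing text is allowed.
prefix_only = pattern.match("ogre")

# fullmatch() requires the pattern to consume the whole string.
whole_string = pattern.fullmatch("ogre")
exact = pattern.fullmatch("og")

print(prefix_only.group())  # og
print(whole_string)         # None
print(exact.span())         # (0, 2)
```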

3. Other methods that mirror the re module functions

3.1 Pattern.split(string, maxsplit=0)

3.2 Pattern.findall(string[, pos[, endpos]])

3.3 Pattern.finditer(string[, pos[, endpos]])

3.4 Pattern.sub(repl, string, count=0)

3.5 Pattern.subn(repl, string, count=0)
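These five methods are only listed above, so here is a brief sketch of each on one compiled pattern. The pattern and the sample text are illustrative choices, not taken from the original article:

```python
import re

pattern = re.compile(r"\d+")
text = "a1b22c333"

parts = pattern.split(text)                          # split on every run of digits
numbers = pattern.findall(text)                      # all matches as strings
spans = [m.span() for m in pattern.finditer(text)]   # match objects, one per match
replaced = pattern.sub("#", text)                    # replace every match
replaced_n = pattern.subn("#", text, count=2)        # replace at most 2, also return the count

print(parts)       # ['a', 'b', 'c', '']
print(numbers)     # ['1', '22', '333']
print(spans)       # [(1, 2), (3, 5), (6, 9)]
print(replaced)    # a#b#c#
print(replaced_n)  # ('a#b#c333', 2)
```

Note that split() returns a trailing empty string here because the text ends with a match.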

4. Pattern.groups

Returns the number of capturing groups in the compiled pattern
Note: only capturing groups are counted; a pattern with no capturing groups returns 0

5. Pattern.groupindex

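Maps the group names defined with (?P<name>…) to their group numbers; the mapping is empty if the pattern has no named groups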

6. Pattern.pattern

Returns the pattern string from which the regular expression object was compiled

In [22]: pattern = re.compile(r"(?P<first>\w{3}\s+(\w{2,4}))")

In [23]: pattern.groups
Out[23]: 2

In [24]: pattern.groupindex
Out[24]: mappingproxy({'first': 1})

In [25]: pattern.pattern
Out[25]: '(?P<first>\\w{3}\\s+(\\w{2,4}))'
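For comparison, a short sketch of the same attributes on patterns without named or capturing groups (both patterns are illustrative):

```python
import re

plain = re.compile(r"\w+")                    # no groups at all
non_capturing = re.compile(r"(?:\w+)-(\d+)")  # (?:...) is not counted

print(plain.groups)                     # 0
print(dict(plain.groupindex))           # {}
print(non_capturing.groups)             # 1
print(dict(non_capturing.groupindex))   # {}
```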

Methods of match objects

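A successful regular expression search or match returns a match object
A match object always has a boolean value of True, while a failed match returns None, so the result can be tested directly in a conditional statement
Beyond that, a match object has its own methods for inspecting the match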

match = re.search(pattern, string)
if match:
    process(match)
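The snippet above leaves pattern, string and process undefined. A runnable sketch of the same idiom might look like this, where first_number is a hypothetical helper introduced only for illustration:

```python
import re

def first_number(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    if match:  # a match object is truthy; a failed search returns None
        return int(match.group())
    return None

print(first_number("order 66 shipped"))  # 66
print(first_number("no digits here"))    # None
```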

1. Match.expand(template)

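template is a string. expand() performs backslash substitution on it, as sub() does: escapes such as \n are converted to the corresponding characters, and numeric backreferences (\1, \2) and named backreferences (\g<1>, which is the same as \1, or \g<name>) are replaced by the contents of the corresponding group
A group that did not participate in the match is replaced with an empty string (Python 3.5 and later)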

In [26]: m = re.search(r"(t)(\w{3})?", "Return the string") 

In [27]: m.expand(r'substring is \2')
Out[27]: 'substring is urn'

In [29]: m.expand(r'substring is \1')
Out[29]: 'substring is t'
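The session above uses only numeric references. The following sketch covers named references and a group that did not participate in the match; the pattern and its group names are illustrative, and the last result assumes Python 3.5 or later:

```python
import re

m = re.search(r"(?P<first>\w+) (?P<last>\w+)(?P<suffix> Jr\.)?", "Malcolm Reynolds")

swapped = m.expand(r"\g<last>, \g<first>")  # named references
numbered = m.expand(r"\2, \g<1>")           # numeric references, two spellings
unmatched = m.expand(r"[\g<suffix>]")       # suffix did not participate in the match

print(swapped)    # Reynolds, Malcolm
print(numbered)   # Reynolds, Malcolm
print(unmatched)  # []
```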

2. Match.group([group1, …])

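With no argument, or with the argument 0, returns the whole match (a string)
With a single non-zero argument, returns the text matched by that capturing group (a string)
With more than one argument, returns a tuple of strings, one per requested group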

In [2]: m = re.match(r"(\w+) (\w+)", "Returns one or more subgroups of the match")                                                                      

In [3]: m.group()
Out[3]: 'Returns one'

In [4]: m.group(0)
Out[4]: 'Returns one'

In [5]: m.group(1)
Out[5]: 'Returns'

In [6]: m.group(1,2)
Out[6]: ('Returns', 'one')

If the regular expression uses named groups ((?P<name>…)), a group argument can be either the name or the group number

In [7]: m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)","Malcolm Reynolds") 
In [8]: m.group("first_name")                   
Out[8]: 'Malcolm'

In [9]: m.group("last_name")
Out[9]: 'Reynolds'

In [11]: m.group(2)              
Out[11]: 'Reynolds'

In [12]: m.group("last_name", "first_name")
Out[12]: ('Reynolds', 'Malcolm')

If a group matches several times, only its last match is accessible

In [13]: m = re.match(r"(..)+", "a1b2c3") 

In [14]: m.group(1)          
Out[14]: 'c3'

In [17]: m.groups() 
Out[17]: ('c3',)
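When every repetition is needed rather than just the last one, one option is to match the repeated unit on its own with findall(). A minimal sketch using the same input:

```python
import re

text = "a1b2c3"

# A repeated group keeps only its final repetition.
last_only = re.match(r"(..)+", text).group(1)

# Matching the unit itself returns every repetition.
all_pairs = re.findall(r"..", text)

print(last_only)  # c3
print(all_pairs)  # ['a1', 'b2', 'c3']
```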

3. Match.__getitem__(g)

Allows individual groups to be accessed with square brackets ([]); m[g] is equivalent to m.group(g)

In [18]: m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")

In [19]: m[0]
Out[19]: 'Isaac Newton'

In [20]: m[1]
Out[20]: 'Isaac'
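Square brackets also accept group names, and an unknown group raises IndexError just as group() does. A short sketch, assuming Python 3.6 or later (where this syntax is available) and a named-group variant of the pattern above:

```python
import re

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Isaac Newton, physicist")

by_name = m["first_name"]  # same as m.group("first_name")
by_index = m[2]            # same as m.group(2)

# There are only two groups, so index 3 is out of range.
try:
    m[3]
    missing = False
except IndexError:
    missing = True

print(by_name)   # Isaac
print(by_index)  # Newton
print(missing)   # True
```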

4. Match.groups(default=None)

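Returns a tuple containing the text of every capturing group; groups that did not participate in the match are given the value default (None unless specified)
Note: only capturing groups are included; if the pattern has no capturing groups, an empty tuple is returned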

In [22]: m = re.match(r"(\d+)\.(\d+)", "24.1632")
In [23]: m.groups()
Out[23]: ('24', '1632')

In [24]: m = re.match(r"(\d+)\.?(\d+)?", "24")

In [25]: m.groups()
Out[25]: ('24', None)

In [26]: m.groups(0) # Now, the second group defaults to the integer 0.
Out[26]: ('24', 0)

5. Match.groupdict(default=None)

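Returns a dictionary of the named groups, keyed by group name; groups that did not participate in the match are given the value default
Only named groups ((?P<name>…)) are included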

In [28]: m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")

In [29]: m.groupdict()
Out[29]: {'first_name': 'Malcolm', 'last_name': 'Reynolds'}
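The default argument is not shown above. A small sketch with an optional named group (the group names whole and frac are illustrative):

```python
import re

m = re.match(r"(?P<whole>\d+)(?:\.(?P<frac>\d+))?", "24")

without_default = m.groupdict()
with_default = m.groupdict("0")

print(without_default)  # {'whole': '24', 'frac': None}
print(with_default)     # {'whole': '24', 'frac': '0'}
```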

6. Match.start([group]), Match.end([group])

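With no argument or with 0, returns the start or end index of the whole match; with a group number, returns the start or end index of the substring matched by that group
If the group exists and matched an empty string (a null match), its start and end are equal
If the group exists but did not contribute to the match, returns -1
A group number beyond the number of capturing groups raises IndexError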

In [30]: m = re.search('b(c?)', 'cba') # ①
# c is optional, but the group itself always participates
In [31]: m.groups()
Out[31]: ('',)

In [32]: m.group(1)
Out[32]: ''

In [33]: m.start(0)
Out[33]: 1

In [34]: m.end(0)
Out[34]: 2

In [35]: m.start(1) # group 1 matched, but as a null match - see 'Out[31]'
Out[35]: 2

In [37]: m.end(1)  # null match
Out[37]: 2

In [59]: m = re.match(r"(\d+)\.?(\d+)?", "24") 
In [60]: m.start(2) # Return -1 if group exists but did not contribute to the match
Out[60]: -1

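In [103]: m = re.search('b(c)?', 'cba') # note the difference from ①
# the group must contain c, but the group as a whole is optional
In [111]: m.groups()
Out[111]: (None,) # differs from 'Out[31]'

In [104]: m.start(1) # group 1 did not participate in the match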
Out[104]: -1

In [105]: m.start(0)
Out[105]: 1

With no argument or with 0, start() and end() return the start and end positions of the whole match

In [38]: email = "tony@tiremove_thisger.net"  
In [39]: m = re.search("remove_this", email)

In [40]: m.start()
Out[40]: 7

In [41]: m.start(0) # start of the whole match; m.start(1) would raise IndexError
Out[41]: 7
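A typical use of start() and end() is cutting the matched span out of the original string. A minimal sketch that reuses the email value from the session above and also shows the IndexError case:

```python
import re

email = "tony@tiremove_thisger.net"
m = re.search("remove_this", email)

# Keep everything before the match and everything after it.
cleaned = email[:m.start()] + email[m.end():]
print(cleaned)  # tony@tiger.net

# The pattern has no capturing groups, so asking for group 1 raises IndexError.
try:
    m.start(1)
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```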

7. Match.span([group])

Same as match.start() / match.end(), but returns the start and end positions together as a tuple
If the group exists but did not contribute to the match, returns (-1, -1)

In [112]: email = "tony@tiremove_thisger.net"
In [113]: m = re.search("remove_this", email)

In [114]: m.span()
Out[114]: (7, 18)

In [115]: m.span(0)
Out[115]: (7, 18)

In [116]: m = re.search('b(c)?', 'cba')

In [117]: m.span()
Out[117]: (1, 2) # span of the whole match 'b'; group 1 did not participate

In [118]: m.span(1)
Out[118]: (-1, -1)

8. Other

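Match.pos returns the value of pos passed to search() or match(), i.e. the index where the search started
Match.endpos returns the value of endpos passed to search() or match(), i.e. the index beyond which the search did not go
Match.lastindex returns the integer index of the last matched capturing group, or None if no group was matched
Match.lastgroup returns the name of the last matched capturing group, or None if that group has no name or no group was matched
Match.re returns the regular expression object that produced the match
Match.string returns the string that was searched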
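These attributes can be inspected together on one match. The sketch below uses an illustrative pattern with one named group and one unnamed optional group, so that lastindex and lastgroup give different results depending on which groups matched:

```python
import re

pattern = re.compile(r"(?P<key>\w+)=(\d+)?")
text = "  size=42;"

m = pattern.search(text, 2, 9)   # search only text[2:9], i.e. "size=42"
print(m.pos, m.endpos)           # 2 9
print(m.lastindex)               # 2
print(m.lastgroup)               # None
print(m.re is pattern)           # True
print(m.string == text)          # True

m2 = pattern.search("size=")     # the unnamed group does not participate
print(m2.lastindex)              # 1
print(m2.lastgroup)              # key
```

In the first match the last matched group is the unnamed group 2, so lastgroup is None; in the second only the named group matches.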
