Python 3 Regular Expressions (Part 2): The re Module

坚强的狗蛋

Category: python3 regular expressions. Tags: python, regular expressions, re module

Article link: https://blog.csdn.net/m0_37852369/article/details/78838498

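Python 3 Regular Expressions (Part 1): Basic Syntax Rules covered the basic rules of regular expressions. This article shows how to use regular expressions to match strings in Python, that is, how to use the functions provided by the re module.
Import the module before using it: import re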

I. Commonly Used Functions in the re Module

1.compile()

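Source description:

def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    # Compile a regular expression pattern and return a Pattern object
    return _compile(pattern, flags)

Parameters:

pattern: the regular expression
flags: modifies how the pattern matches. These are the six modes (iLmsux) described in the basic syntax rules; the default, 0, means no flags are set.

Example code:

import re

pattern = re.compile(r"\d")
result = pattern.match("123")
print(result.group())
# Prints 1: the pattern contains a single \d, so only one digit is matched

pattern = re.compile(r"abc d", re.I|re.X)
result = pattern.match("AbcD")
print(result.group())
# Prints AbcD: several flags can be combined with | (re.I ignores case, re.X ignores the space in the pattern)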
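
As a small sketch of why compiling is convenient, the example below compiles one pattern once and reuses it on several strings. The name DATE, the date format, and the sample strings are made up for illustration and are not from the original article.

```python
import re

# Hypothetical pattern for illustration: a date such as 2024-05-02.
# With re.X the pattern may span several lines and carry comments.
DATE = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    """,
    re.X,
)

samples = ["2024-05-02", "2017-12-18 released", "no date here"]
# match() only looks at the start of each string
results = [bool(DATE.match(s)) for s in samples]
print(results)  # [True, True, False]
```

The compiled object can be passed around and reused, so the pattern and its flags are written in one place only.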

2.match()

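Source description:

1. def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning a match object, or None if no match was found."""
    # Try to match pattern at the start of the string; return a Match object, or None if nothing matches.
    return _compile(pattern, flags).match(string)

2. def match(self, string, pos=0, endpos=-1):
    """Matches zero or more characters at the beginning of the string."""
    pass
    # The start and end positions of the region to match can be specified

Parameters:

# pattern and flags have the same meaning as in compile()
string: the string to check
pos: start position, default 0
endpos: end position (exclusive). The signature above shows -1, but when endpos is omitted, matching runs to the end of the string

Example code:

result = re.match(r"a+\d", "aA123", re.I)
print(result.group())
# Prints aA1: the match succeeds as soon as the whole pattern has been consumed, and the matched substring is returned

pattern = re.compile(r"abc d", re.I|re.X)
result = pattern.match("0AbcD5", 1, 5)
print(result.group())
# Prints AbcD: the match covers index 1 up to, but not including, index 5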
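
Two details of match() are easy to trip over: it returns None on failure, and pos is not quite the same as slicing the string. The sketch below illustrates both; the helper first_number and the sample strings are hypothetical.

```python
import re

def first_number(text):
    """Return the leading run of digits in text, or None if there is none."""
    m = re.match(r"\d+", text)
    # match() returns None on failure, so check before calling group()
    return m.group() if m else None

print(first_number("123abc"))  # 123
print(first_number("abc123"))  # None

# pos moves the starting index, but ^ still refers to the real start of the string
plain = re.compile(r"a")
anchored = re.compile(r"^a")
print(plain.match("ba", 1) is not None)       # True
print(anchored.match("ba", 1) is not None)    # False
print(anchored.match("ba"[1:]) is not None)   # True: slicing creates a new string
```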

3.search()

Source description:

1. def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning a match object, or None if no match was found."""
    # Similar to match. The difference is that search looks for a match at any position in the string, while match only tries once, at one specific position (pos)
    return _compile(pattern, flags).search(string)

2. def search(self, string, pos=0, endpos=-1):
    """Scan through string looking for a match, and return a corresponding match instance. Return None if no position in the string matches."""
    pass
    # A sub-range of the string can be specified for the search

Parameters:

# Same as for match

Example code:

pattern = re.compile(r"abc d", re.I|re.X)
result = pattern.search("0A2aBcd7")
print(result.group())
# Prints aBcd: a result is returned as soon as a match is found anywhere in the string

pattern = re.compile(r"abc d", re.I|re.X)
matchResult = pattern.match("0AbcD5")
searchResult = pattern.search("0AbcD5")
# matchResult is None
# searchResult.group() is AbcD
# The pattern starts with a, but the string starts with 0, so match fails here
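
The difference between match and search can be seen side by side in a short sketch; fullmatch, which appears in the summary at the end, is included for comparison. The pattern and the text are illustrative only.

```python
import re

pattern = re.compile(r"\d+")
text = "abc123def"

at_start = pattern.match(text)       # anchored at the start of the string
anywhere = pattern.search(text)      # scans forward for the first match
whole = pattern.fullmatch(text)      # the entire string must match

print(at_start)           # None: the string does not start with a digit
print(anywhere.group())   # 123
print(whole)              # None: the string is not digits only
```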

4.group(), groups() and groupdict()

Source description:

1.def group(self, *args):
   """Return one or more subgroups of the match."""
   # Return one or more subgroups of a successful match
   pass

2.def groups(self, default=None):
   """Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern."""
   # Return the text matched by every group, as a tuple
   pass

3.def groupdict(self, default=None):
   """Return a dictionary containing all the named subgroups of the match,keyed by the subgroup name."""
   # Return the text matched by every named group, as a dictionary
   pass

Parameters:

*args in group: with one argument, a single substring is returned; with several arguments, a tuple of the corresponding substrings is returned. With no argument, the result is the same as passing 0: the entire matched substring.
default in groups: the value used for groups that did not take part in the match; it defaults to None
default in groupdict: the value used for groups that did not take part in the match; it defaults to None

Example code:

pattern = re.compile(r"([\w]+) ([\w]+)")
m = pattern.match("Hello World Hi Python")
print(m.group())
# Prints Hello World: group 1 matches Hello, group 2 matches World, and the pattern is then exhausted
print(m.group(1))
# Prints Hello: the text matched by group 1
print(m.group(2))
# Prints World: the text matched by group 2

pattern = re.compile(r"([\w]+)\.?([\w]+)?")
m = pattern.match("Hello")
print(m.groups())
# Prints ('Hello', None): the first element is the Hello matched by group 1; group 2 matched nothing, so it is None
print(m.groups("Python"))
# Prints ('Hello', 'Python'): group 2 matched nothing, so the default passed to groups is used

pattern = re.compile(r"(?P<first_str>\w+) (?P<last_str>\w+)")
m = pattern.match("Hello Python")
print(m.groupdict())
# Prints {'first_str': 'Hello', 'last_str': 'Python'}; the default argument works the same way as in groups
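
The following sketch pulls the three methods together on one match: several arguments to group(), lookup by group name, and the default argument of groupdict(). The pattern and the group names user, host and tld are invented for illustration.

```python
import re

# Hypothetical pattern: user@host with an optional ".tld" part
m = re.match(r"(?P<user>\w+)@(?P<host>\w+)(?:\.(?P<tld>\w+))?", "alice@localhost")

pair = m.group(1, 2)          # several arguments give a tuple
user = m.group("user")        # named groups can be looked up by name
named = m.groupdict()         # unmatched groups are None by default
filled = m.groupdict("n/a")   # or the default value that was passed in

print(pair)    # ('alice', 'localhost')
print(user)    # alice
print(named)   # {'user': 'alice', 'host': 'localhost', 'tld': None}
print(filled)  # {'user': 'alice', 'host': 'localhost', 'tld': 'n/a'}
```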

5.findall()

Source description:

def findall(self, string, pos=0, endpos=-1):
   """Return a list of all non-overlapping matches of pattern in string."""
   # Return a list of all matched substrings in the string
   # Important: the return value is a list, which has no group method
   pass

Parameters:

# Same as for the match method

Example code:

pattern = re.compile(r'\d+')
m = pattern.findall('a1b2c33d4')
print(m)
# Prints ['1', '2', '33', '4']: every number in the string is found

m = pattern.findall('a1b2c33d4', 1, 6)
print(m)
# Prints ['1', '2', '3']: the search is limited to "1b2c3"
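
One point the example above does not show is that the shape of the findall result depends on the capturing groups in the pattern. A minimal sketch, using a made-up input string:

```python
import re

text = "x=1, y=22, z=333"

no_group = re.findall(r"\w=\d+", text)       # no group: whole matches
one_group = re.findall(r"\w=(\d+)", text)    # one group: that group's text
two_groups = re.findall(r"(\w)=(\d+)", text) # several groups: tuples

print(no_group)    # ['x=1', 'y=22', 'z=333']
print(one_group)   # ['1', '22', '333']
print(two_groups)  # [('x', '1'), ('y', '22'), ('z', '333')]
```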

6.finditer()

Source description:

def finditer(self, string, pos=0, endpos=-1):
   """Return an iterator over all non-overlapping matches for the pattern in string. For each match, the iterator returns a match object."""
   # Return an iterator over all matches in the string; each item is a Match object
   pass

Parameters:

# Same as for the match method

Example code:

pattern = re.compile(r'\d+')
m = pattern.finditer('a1b2c33d4')
print(m)
# Prints an iterator such as <callable_iterator object at 0x0000017A8C0F8240>

print(next(m).group())
# Prints 1; each further call to next(m) yields the next match in order
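
In practice the iterator is usually consumed with a loop or a comprehension rather than with repeated next() calls. Because each item is a Match object, positions are available as well as text. A short sketch on the same input:

```python
import re

pattern = re.compile(r"\d+")

# Collect the matched text together with its start and end index
found = [(m.group(), m.start(), m.end()) for m in pattern.finditer("a1b2c33d4")]
print(found)
# [('1', 1, 2), ('2', 3, 4), ('33', 5, 7), ('4', 8, 9)]
```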

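7.split()

Source description:

def split(self, string, maxsplit=0):
   """Split string by the occurrences of pattern."""
   # Split the string at each match of the pattern and return the pieces as a list
   pass

Parameters:

maxsplit: the maximum number of splits; if omitted, the string is split at every match.

Example code:

pattern = re.compile(r'\d+')
m = pattern.split('a1b2c3d4e')
print(m)
# Prints ['a', 'b', 'c', 'd', 'e']: split at every number

m = pattern.split('a1b2c3d4e', 3)
print(m)
# Prints ['a', 'b', 'c', 'd4e']: only three splits are made; the rest is left untouched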
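
Two further behaviours of split are worth a quick sketch: a capturing group in the pattern keeps the separators in the result, and a match at the very end of the string produces a trailing empty string. The inputs are illustrative.

```python
import re

kept = re.split(r"(\d+)", "a1b2c")    # capturing group: separators are kept
trailing = re.split(r"\d+", "a1b2c3") # the string ends with a match

print(kept)      # ['a', '1', 'b', '2', 'c']
print(trailing)  # ['a', 'b', 'c', '']
```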

8.sub()

Source description:

def sub(self, repl, string, count=0):
   """Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl."""
   # Replace the matched substrings with repl and return the resulting new string
   pass

Parameters:

repl: the replacement
string: the original string
count: the maximum number of replacements; by default every match is replaced

Example code:

pattern = re.compile(r'\s+')
text = "Process finished with exit code 0"
m = pattern.sub('-', text, 3)
print(m)
# Prints Process-finished-with-exit code 0: the first three spaces are replaced with '-'

9.subn()

Source description:

def subn(self, repl, string, count=0):
   """Return the tuple (new_string, number_of_subs_made) found by replacing the leftmost non-overlapping occurrences of pattern with the replacement repl."""
   # Return a tuple made up of the new string and the number of replacements made
   pass

Parameters:

Same meaning as the parameters of sub()

Example code:

pattern = re.compile(r'\s+')
text = "Process finished with exit code 0"
m = pattern.subn('-', text)
print(m)
# Prints ('Process-finished-with-exit-code-0', 5)
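
The repl argument of sub() and subn() is not limited to a fixed string. As a sketch with made-up inputs, it can refer back to groups with \1, \2, or be a function that receives each Match object:

```python
import re

# Backreferences: turn "key=value" into "value=key"
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "a=1 b=2")
print(swapped)  # 1=a 2=b

# A function as repl: called once per match, must return a string
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)  # 6 apples, 20 pears

# subn also reports how many replacements were made
masked = re.subn(r"\d+", "#", "3 apples, 10 pears")
print(masked)  # ('# apples, # pears', 2)
```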

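II. Summary

The section above covers only the more commonly used methods of the re module. The source code also contains a few others that are simple or less frequently used: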

fullmatch(string, pos=0, endpos=-1)
start(group=0)
end(group=0)
escape(string)
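
The four names listed above can be tried out in a few lines. This is only a sketch; the strings used are invented for illustration.

```python
import re

pattern = re.compile(r"\d+")

# fullmatch: the whole string must match the pattern
is_number = pattern.fullmatch("2024") is not None
not_number = pattern.fullmatch("2024a") is not None
print(is_number, not_number)  # True False

# start / end: index range of the match inside the string
m = pattern.search("id=42;")
print(m.start(), m.end())  # 3 5

# escape: make a literal string safe to use as a pattern
literal = "1+1=2"
hit = re.search(re.escape(literal), "so 1+1=2 holds")
print(hit.group())  # 1+1=2
```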

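Once you are comfortable with the methods covered above, these four are easy to understand.
That is all for the re module. If you spot any mistakes, please point them out; I am still learning myself. Thank you.

A handy tool for testing regular expressions: Regular Expression Test Tool
A follow-up article is planned: Python 3 Regular Expressions (Part 3): Greedy and Non-Greedy Modes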
