Python Regular Expression Handbook


Article link: https://blog.csdn.net/maiya_yaya/article/details/131358412

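Regular expressions are undeniably powerful for string processing. That power also makes them hard to pick up.

Below I summarize some common matching techniques, each with a Python example, in the hope that they help when you work with regular expressions.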

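Character matching

Character matching: match a specific character or a set of characters, for example a single character from a class such as [a-z], or a digit with \d.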

import re

# Match a single character from a set
pattern = r"b[aeiou]t"
string = "bat, bet, bit, bot, but"
result = re.findall(pattern, string)
print(result) # ['bat', 'bet', 'bit', 'bot', 'but']

# Match digit characters
pattern = r"\d+"
string = "123 456 789"
result = re.findall(pattern, string)
print(result) # ['123', '456', '789']
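
Beyond [aeiou] and \d, character classes also support shorthand such as \w and \s, ranges, and negation with [^...]. The following is a minimal sketch; the sample sentence is made up for illustration and is not from the original article.

```python
import re

# Hypothetical sample text, chosen only to exercise several classes
text = "User_42 paid $19.99 on 2024-01-05!"

# \w matches letters, digits and underscore; + groups consecutive ones
words = re.findall(r"\w+", text)

# [^...] negates a class: anything that is neither a word character nor whitespace
symbols = re.findall(r"[^\w\s]", text)

# A range inside a class: uppercase ASCII letters only
capitals = re.findall(r"[A-Z]", text)

print(words)     # ['User_42', 'paid', '19', '99', 'on', '2024', '01', '05']
print(symbols)   # ['$', '.', '-', '-', '!']
print(capitals)  # ['U']
```

Note that \w+ splits "19.99" into two matches because the dot is not a word character.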

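Position matching

Position matching: match a position in the string rather than a character, for example the start of a line, the end of a line, or a word boundary.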

import re

# Match the start of each line (re.MULTILINE makes ^ match after every newline)
pattern = r"^The"
string = "The quick brown fox\nThe lazy dog"
result = re.findall(pattern, string, re.MULTILINE)
print(result) # ['The', 'The']

# Match a word boundary
pattern = r"\bfox\b"
string = "The quick brown fox\njumps over the lazy dog"
result = re.findall(pattern, string)
print(result) # ['fox']
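
The end-of-line anchor $ is mentioned above but not shown. The sketch below, using made-up strings, illustrates how re.MULTILINE changes what $ matches and how \b keeps a pattern from matching inside longer words.

```python
import re

# Hypothetical two-line sample
text = "alpha beta\ngamma delta"

# Without re.MULTILINE, $ matches only at the end of the whole string
# (or just before a trailing newline)
end_default = re.findall(r"\w+$", text)

# With re.MULTILINE, $ also matches just before each newline
end_multiline = re.findall(r"\w+$", text, re.MULTILINE)

# \b on both sides restricts the match to the standalone word
inside_words = re.findall(r"cat", "cat concat category")
whole_words = re.findall(r"\bcat\b", "cat concat category")

print(end_default)    # ['delta']
print(end_multiline)  # ['beta', 'delta']
print(inside_words)   # ['cat', 'cat', 'cat']
print(whole_words)    # ['cat']
```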

Repetition matching

Repetition matching: match a character or character set that repeats, either an exact number of times or within a range.

import re

# Match an exact repetition count
pattern = r"a{3}"
string = "aaa abc aa a"
result = re.findall(pattern, string)
print(result) # ['aaa']

# Match a repetition range
pattern = r"\d{2,3}"
string = "12 123 1234 12345"
result = re.findall(pattern, string)
print(result) # ['12', '123', '123', '123', '45']
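
The \d{2,3} result can be surprising: re.findall scans left to right and resumes right after each match, so a long run of digits is cut into chunks of up to three. The sketch below contrasts that with a \b-anchored pattern and also shows the ? quantifier; the "colour" strings are illustrative only.

```python
import re

text = "12 123 1234 12345"

# Unanchored: "1234" yields "123" (the leftover "4" is too short),
# and "12345" yields "123" followed by "45"
chunks = re.findall(r"\d{2,3}", text)

# Anchored with \b: only digit runs that are exactly 2 or 3 digits long
whole = re.findall(r"\b\d{2,3}\b", text)

# ? means zero or one occurrence of the preceding item
spellings = re.findall(r"colou?r", "color colour colouur")

print(chunks)     # ['12', '123', '123', '123', '45']
print(whole)      # ['12', '123']
print(spellings)  # ['color', 'colour']
```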

Alternation matching

Alternation matching: match any one of several alternatives, for example option 1 or option 2.

import re

# Match one of several alternatives
pattern = r"cat|dog"
string = "The quick brown fox jumps over the lazy dog"
result = re.findall(pattern, string)
print(result) # ['dog']

# Match one of several alternatives (ignoring case)
pattern = r"cat|dog"
string = "The quick brown Fox jumps over the lazy Dog"
result = re.findall(pattern, string, re.IGNORECASE)
print(result) # ['Dog']
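
Two details of | are easy to miss: it has the lowest precedence, so it splits the entire pattern unless it is wrapped in a group, and alternatives are tried from left to right. A minimal sketch with a made-up word list:

```python
import re

# Hypothetical sample words
text = "cat catfish dog dogs"

# Without a group this reads as: "\bcat" OR "dog\b"
loose = re.findall(r"\bcat|dog\b", text)

# (?:...) limits the alternation without creating a capture group
strict = re.findall(r"\b(?:cat|dog)\b", text)

# The first alternative that succeeds wins, so order matters
short_first = re.findall(r"cat|catfish", text)
long_first = re.findall(r"catfish|cat", text)

print(loose)        # ['cat', 'cat', 'dog']
print(strict)       # ['cat', 'dog']
print(short_first)  # ['cat', 'cat']
print(long_first)   # ['cat', 'catfish']
```

Putting the longer alternative first is a common way to avoid matching only a prefix.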

Group matching

Group matching: wrap part of the pattern in parentheses to mark it as a subexpression, for example to extract substrings.

import re

# Extract substrings
pattern = r"(\d{4})-(\d{2})-(\d{2})"
string = "2022-03-02 is a good day"
result = re.findall(pattern, string)
print(result) # [('2022', '03', '02')]
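
When a pattern has several groups, numbered tuples become hard to read. One option is named groups with (?P<name>...). The sketch below reuses the date string from the example above; the group names year, month and day are my own choice.

```python
import re

# Same date pattern, with illustrative group names
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
match = re.search(pattern, "2022-03-02 is a good day")

if match:
    whole = match.group(0)      # the entire matched text
    year = match.group("year")  # one group, looked up by name
    parts = match.groupdict()   # all named groups as a dict

    print(whole)  # 2022-03-02
    print(year)   # 2022
    print(parts)  # {'year': '2022', 'month': '03', 'day': '02'}
```

re.search returns None when nothing matches, hence the check before calling group().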

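Backreference matching

Backreference matching: match again the text that an earlier group has already matched, for example to find repeated words.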

import re

# Find repeated words: \1 must match the same text that group 1 captured
pattern = r"\b(\w+) \1\b"
string = "this is is a test test"
result = re.findall(pattern, string)
print(result) # ['is', 'test']
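
Backreferences also work in the replacement string of re.sub. As a sketch of the repeated-word idea mentioned above, the following collapses each doubled word into a single one; the sample sentence is made up.

```python
import re

# Hypothetical sentence with accidental word repetitions
text = "this is is a test test"

# (\w+) captures a word; \1 requires the same word again after whitespace
pattern = r"\b(\w+)\s+\1\b"

# In the replacement, \1 inserts the captured word once
deduped = re.sub(pattern, r"\1", text)

print(deduped)  # this is a test
```

The \b anchors keep "this is" from being treated as a repetition of "is".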

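Greedy and lazy matching

Greedy and lazy (non-greedy) matching: a greedy quantifier matches as many characters as possible, while a lazy quantifier matches as few as possible.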

import re

# Greedy matching
pattern = r"<.*>"
string = "<a>hello</a><b>world</b>"
result = re.findall(pattern, string)
print(result) # ['<a>hello</a><b>world</b>']

# Lazy (non-greedy) matching
pattern = r"<.*?>"
string = "<a>hello</a><b>world</b>"
result = re.findall(pattern, string)
print(result) # ['<a>', '</a>', '<b>', '</b>']
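
A lazy quantifier is not the only way to stop at the first ">". A negated character class says the same thing explicitly, and a lazy group can pull out the text between a matching pair of tags. This is only a sketch on the same toy string; it is not a general HTML parser.

```python
import re

html = "<a>hello</a><b>world</b>"

# A tag is "<", then any characters except ">", then ">"
tags = re.findall(r"<[^>]*>", html)

# (.*?) lazily captures the content; \1 requires the closing tag
# to repeat the name captured by the first group
contents = re.findall(r"<(\w+)>(.*?)</\1>", html)

print(tags)      # ['<a>', '</a>', '<b>', '</b>']
print(contents)  # [('a', 'hello'), ('b', 'world')]
```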

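Zero-width assertion matching

Zero-width assertion matching: a zero-width assertion matches a position rather than a character.

Zero-width assertions require that the text before or after the match position satisfy some condition, which makes matching more precise.

Regular expressions have four common zero-width assertions:

Positive lookahead (?=...): matches a position that is immediately followed by text matching the sub-pattern, without including that text in the match.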

import re

# Positive lookahead: match "hello" only where it is followed by "world"
pattern = r"hello(?=world)"
string = "hellopythonhelloworld"
result = re.findall(pattern, string)
print(result) # ['hello']

Negative lookahead (?!...): matches a position that is not immediately followed by text matching the sub-pattern.

import re

# Negative lookahead: match "hello" only where it is not followed by "world"
pattern = r"hello(?!world)"
string = "hellopythonhelloworld"
result = re.findall(pattern, string)
print(result) # ['hello']

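Positive lookbehind (?<=...): matches a position that is immediately preceded by text matching the sub-pattern, without including that text in the match.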

import re

# Positive lookbehind: match "hello" only where it is preceded by "python"
pattern = r"(?<=python)hello"
string = "pythonhellopythonworld"
result = re.findall(pattern, string)
print(result) # ['hello']

Negative lookbehind (?<!...): matches a position that is not immediately preceded by text matching the sub-pattern.

import re

# Negative lookbehind: match "hello" only where it is not preceded by "python"
pattern = r"(?<!python)hello"
string = "pythonhellopythonworld"
result = re.findall(pattern, string)
print(result) # []
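
To show the four assertions in a more practical setting, here is a hedged sketch that extracts numbers depending on what precedes them and combines two lookaheads into a simple validity check. The sample text, the helper name has_letter_and_digit and the eight-character minimum are assumptions for illustration only.

```python
import re

# Hypothetical price list
text = "apple $30, pear 25, plum $7"

# Positive lookbehind: digits preceded by "$" ("$" is not part of the match)
prices = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: digit runs not preceded by "$" or by another digit
others = re.findall(r"(?<![\$\d])\d+", text)

def has_letter_and_digit(s):
    """Hypothetical check: at least 8 characters, one digit and one letter."""
    # Each lookahead tests the whole string from the start without consuming it
    return re.fullmatch(r"(?=.*\d)(?=.*[A-Za-z]).{8,}", s) is not None

print(prices)                            # ['30', '7']
print(others)                            # ['25']
print(has_letter_and_digit("abc12345"))  # True
print(has_letter_and_digit("abcdefgh"))  # False
```

In Python's re module a lookbehind pattern must have a fixed width, so something like (?<=\$+) raises an error, while (?<=\$) is fine.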