Python Regular Expressions: Key Concepts and Examples

XGorgeous

Article link: https://blog.csdn.net/weixin_50606361/article/details/132796786

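This article introduces the basic concepts of regular expressions and shows how to use Python's re module to search, match, replace, and split text. It covers the basic syntax, common functions such as match, search, findall, and sub, and more advanced techniques such as capture groups, non-greedy matching, case-insensitive matching, and multiline matching.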

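What Is a Regular Expression?

A regular expression, also called a regex pattern, is an expression that describes a set of strings. It is built from ordinary characters and special characters (metacharacters). Regular expressions can be used to search, match, replace, and validate strings.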

The re Module in Python

In Python, regular expressions are handled by the re module, which is part of the standard library. Import it before use:

import re

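Basic Regular Expression Syntax

1. Matching a single character

.: Matches any single character except a newline (unless the re.DOTALL flag is set).
[ ]: Matches any one of the characters inside the brackets. For example, [aeiou] matches any single lowercase vowel.
[^ ]: Matches any one character that is not inside the brackets. For example, [^0-9] matches any non-digit character.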
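
The three constructs above can be tried out with re.findall(). This is a small sketch; the sample strings are made up for illustration.

```python
import re

# "." matches any single character except a newline,
# so "c\nt" is skipped while "cat", "cot" and "cut" match.
dot_matches = re.findall(r"c.t", "cat cot c\nt cut")

# [aeiou] matches exactly one lowercase vowel.
vowels = re.findall(r"[aeiou]", "regex")

# [^0-9] matches one character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b22c")

print(dot_matches)  # ['cat', 'cot', 'cut']
print(vowels)       # ['e', 'e']
print(non_digits)   # ['a', 'b', 'c']
```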

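2. Matching a variable number of characters

*: Matches the preceding element zero or more times. For example, ab*c matches ac, abc, abbc, and so on.
+: Matches the preceding element one or more times. For example, ab+c matches abc, abbc, and so on, but not ac.
?: Matches the preceding element zero or one time. For example, ab?c matches ac or abc, but not abbc.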
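
To check these three quantifiers against the examples given, the sketch below uses re.fullmatch(), which only succeeds when the whole string fits the pattern. The helper name accepted is hypothetical and exists only for this illustration.

```python
import re

candidates = ["ac", "abc", "abbc"]

def accepted(pattern):
    """Return the candidate strings that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = accepted(r"ab*c")      # zero or more "b"
plus = accepted(r"ab+c")      # one or more "b"
optional = accepted(r"ab?c")  # zero or one "b"

print(star)      # ['ac', 'abc', 'abbc']
print(plus)      # ['abc', 'abbc']
print(optional)  # ['ac', 'abc']
```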

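3. Matching a specific number of repetitions

{n}: Matches the preceding element exactly n times. For example, a{3} matches aaa.
{n,}: Matches the preceding element at least n times. For example, a{2,} matches aa, aaa, and so on.
{n,m}: Matches the preceding element at least n times and at most m times. For example, a{2,4} matches aa, aaa, or aaaa.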
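
The counted forms can be verified the same way. In this sketch the helper is_full_match is a hypothetical name; the last line shows that a quantifier applies to a whole group when the preceding element is parenthesized.

```python
import re

def is_full_match(pattern, s):
    """True when the entire string s fits the pattern."""
    return re.fullmatch(pattern, s) is not None

exactly_three = [s for s in ("aa", "aaa", "aaaa") if is_full_match(r"a{3}", s)]
at_least_two = [s for s in ("a", "aa", "aaaaa") if is_full_match(r"a{2,}", s)]
two_to_four = [s for s in ("a", "aa", "aaa", "aaaa", "aaaaa")
               if is_full_match(r"a{2,4}", s)]

# The quantifier repeats the whole group "ab", not just "b".
grouped = is_full_match(r"(ab){2}", "abab")

print(exactly_three)  # ['aaa']
print(at_least_two)   # ['aa', 'aaaaa']
print(two_to_four)    # ['aa', 'aaa', 'aaaa']
print(grouped)        # True
```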

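4. Anchors and boundaries

^: Matches the start of the string.
$: Matches the end of the string (or the position just before a trailing newline).
\b: Matches a word boundary. For example, \bword\b matches word as a standalone word, but not the word inside myword.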
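
Anchors match positions instead of characters. The following sketch, using made-up sample strings, shows each of them with re.search() and re.findall().

```python
import re

text = "Hello, World!"

starts_with_hello = re.search(r"^Hello", text) is not None
starts_with_world = re.search(r"^World", text) is not None
ends_with_world = re.search(r"World!$", text) is not None

# \b requires a boundary between a word character and a non-word character,
# so the "word" inside "myword" is not counted.
whole_words = re.findall(r"\bword\b", "word myword word.")

print(starts_with_hello)  # True
print(starts_with_world)  # False
print(ends_with_world)    # True
print(whole_words)        # ['word', 'word']
```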

Common Regular Expression Examples

1. Matching an email address

pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}\b'

2. Matching a URL

# Note: $-_ inside the class is a character range, so it also covers /, :, ?, = and similar characters.
pattern = r'https?://(?:[a-zA-Z0-9]|[$-_@.&+]|[!*(),]|%[0-9a-fA-F]{2})+'

3. Matching a mobile phone number (11 digits, mainland China format)

pattern = r'^1[3-9]\d{9}$'
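
As a quick sanity check, the three patterns can be run against a few sample inputs. The addresses, URLs, and numbers below are invented for illustration. The email pattern writes the final class as [A-Za-z], because a | inside a character class is a literal character and not an alternation. None of these patterns is a full validator; they are simple filters.

```python
import re

email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,7}\b'
url_pattern = r'https?://(?:[a-zA-Z0-9]|[$-_@.&+]|[!*(),]|%[0-9a-fA-F]{2})+'
phone_pattern = r'^1[3-9]\d{9}$'

emails = re.findall(
    email_pattern,
    "Contact alice@example.com or bob.smith@mail.example.org today",
)

urls = re.findall(
    url_pattern,
    "See https://example.com/path?q=1 and http://test.org for details",
)

# Only the first number has 11 digits, starts with 1,
# and has a second digit between 3 and 9.
phone_candidates = ["13812345678", "12812345678", "1381234567"]
valid_phones = [p for p in phone_candidates if re.match(phone_pattern, p)]

print(emails)        # ['alice@example.com', 'bob.smith@mail.example.org']
print(urls)          # ['https://example.com/path?q=1', 'http://test.org']
print(valid_phones)  # ['13812345678']
```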

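Searching and Matching with Regular Expressions

Once you have a pattern, you can use the functions of the re module to search and match:

1. `re.match()`

re.match() tries to match the pattern at the beginning of the string only. If the pattern matches there, it returns a match object; otherwise it returns None.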

import re

text = "Hello, World!"
pattern = r"Hello"
result = re.match(pattern, text)

if result:
    print("Match found:", result.group())
else:
    print("Match not found")

2. `re.search()`

re.search() scans the whole string for the pattern. It returns a match object for the first match, or None if there is no match.

import re

text = "Hello, World!"
pattern = r"World"
result = re.search(pattern, text)

if result:
    print("Match found:", result.group())
else:
    print("Match not found")
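
The difference between the two functions is easiest to see when the same pattern is passed to both. The sketch below also includes re.fullmatch(), which is not covered above and only succeeds when the entire string fits the pattern.

```python
import re

text = "Hello, World!"

# "World" is not at the beginning, so match() finds nothing.
at_start = re.match(r"World", text)

# search() scans forward and finds it further along the string.
anywhere = re.search(r"World", text)
position = anywhere.span()  # (start, end) indexes of the match

# fullmatch() fails because "Hello" covers only part of the string.
entire = re.fullmatch(r"Hello", text)

print(at_start)          # None
print(anywhere.group())  # World
print(position)          # (7, 12)
print(entire)            # None
```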

3. `re.findall()`

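re.findall() returns a list of all non-overlapping matches as strings. If the pattern contains capture groups, the list holds the group contents instead of the full matches.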

import re

text = "apple banana cherry banana"
pattern = r"banana"
result = re.findall(pattern, text)

print("Matches:", result)
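
A few more cases help to show what re.findall() returns: an empty list when nothing matches, and non-overlapping results when matches could overlap. The sample strings here are made up.

```python
import re

# Every run of digits in the text, in order of appearance.
numbers = re.findall(r"\d+", "Order 66 shipped 3 boxes in 2024")

# No match gives an empty list, not None.
none_found = re.findall(r"kiwi", "apple banana cherry banana")

# Matches do not overlap: "aaaa" contains two separate "aa" matches, not three.
pairs = re.findall(r"aa", "aaaa")

print(numbers)     # ['66', '3', '2024']
print(none_found)  # []
print(pairs)       # ['aa', 'aa']
```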

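Replacing and Splitting with Regular Expressions

1. `re.sub()`

re.sub() replaces every part of the string that matches the pattern with the replacement.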

import re

text = "Hello, World!"
pattern = r"Hello"
replacement = "Hi"
result = re.sub(pattern, replacement, text)

print("Result:", result)
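
re.sub() also accepts a count argument that limits the number of replacements, and the replacement can be a function that receives each match object. A minimal sketch with an invented sample string:

```python
import re

text = "apple banana cherry banana"

# Replace every occurrence (the default, count=0).
all_replaced = re.sub(r"banana", "kiwi", text)

# Replace only the first occurrence.
first_only = re.sub(r"banana", "kiwi", text, count=1)

# A callable replacement computes the new text from each match.
shouted = re.sub(r"banana", lambda m: m.group().upper(), text)

print(all_replaced)  # apple kiwi cherry kiwi
print(first_only)    # apple kiwi cherry banana
print(shouted)       # apple BANANA cherry BANANA
```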

2. `re.split()`

re.split() splits the string into substrings wherever the pattern matches.

import re

text = "apple, banana, cherry"
pattern = r", "
result = re.split(pattern, text)

print("Result:", result)
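
The advantage over str.split() is that the delimiter can be a pattern. The sketch below splits on commas or semicolons with optional surrounding whitespace, and uses maxsplit to limit the number of splits; the input string is made up.

```python
import re

text = "apple,banana ;  cherry, date"
delimiter = r"\s*[,;]\s*"  # a comma or semicolon with optional whitespace around it

parts = re.split(delimiter, text)

# maxsplit=1 stops after the first split; the rest of the string stays intact.
limited = re.split(delimiter, text, maxsplit=1)

print(parts)    # ['apple', 'banana', 'cherry', 'date']
print(limited)  # ['apple', 'banana ;  cherry, date']
```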

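Advanced Regular Expression Techniques

1. Capture groups

Capture groups let you extract specific parts of the matched text. Wrap part of the pattern in parentheses () to create a group. Groups are numbered from 1, and \1, \2, and so on refer to their contents inside the pattern or in a replacement string. With re.findall(), a pattern that has several groups yields one tuple per match.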

import re

text = "Name: John, Age: 30, Name: Alice, Age: 25"
pattern = r"Name: (\w+), Age: (\d+)"
result = re.findall(pattern, text)

for match in result:
    name, age = match
    print(f"Name: {name}, Age: {age}")
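
The \1 and \2 references mentioned above can be used in two places: inside the pattern itself and inside the replacement string of re.sub(). On a match object, the same groups are read with group(1), group(2), or groups(). A short sketch with made-up inputs:

```python
import re

m = re.search(r"Name: (\w+), Age: (\d+)", "Name: John, Age: 30")
name = m.group(1)      # contents of the first group
age = int(m.group(2))  # contents of the second group, converted to int
both = m.groups()      # all groups as a tuple of strings

# \1 inside the pattern: the same word repeated right after itself.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")

# \2 \1 in the replacement: swap the two captured words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

print(name, age)  # John 30
print(both)       # ('John', '30')
print(doubled)    # ['is', 'test']
print(swapped)    # world hello
```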

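2. Non-greedy matching

By default, quantifiers are greedy: they match as many characters as possible. Appending ? to a quantifier (*?, +?, ??) makes it non-greedy, so it matches as few characters as possible.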

import re

text = "<p>First paragraph</p><p>Second paragraph</p>"
pattern = r"<p>.*?</p>"
result = re.findall(pattern, text)

for match in result:
    print("Match:", match)
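
Running the greedy and non-greedy versions side by side on the same text makes the difference visible. The last two lines are small illustrative cases for +? and ??.

```python
import re

text = "<p>First paragraph</p><p>Second paragraph</p>"

# Greedy: .* runs to the last </p>, so the whole text is one match.
greedy = re.findall(r"<p>.*</p>", text)

# Non-greedy: .*? stops at the first </p> it can reach.
lazy = re.findall(r"<p>.*?</p>", text)

# +? takes a single digit; ?? prefers to match nothing.
one_digit = re.search(r"\d+?", "12345").group()
shortest = re.match(r"ab??", "ab").group()

print(len(greedy))  # 1
print(lazy)         # ['<p>First paragraph</p>', '<p>Second paragraph</p>']
print(one_digit)    # 1
print(shortest)     # a
```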

3. Ignoring case

Passing the re.IGNORECASE flag (short form re.I) as the flags argument makes matching case-insensitive.

import re

text = "Hello, World!"
pattern = r"world"
result = re.search(pattern, text, re.IGNORECASE)

if result:
    print("Match found:", result.group())
else:
    print("Match not found")
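
There are a few equivalent ways to turn on case-insensitive matching. The sketch below compares the flag argument, the inline (?i) form at the start of the pattern, and a pattern compiled once with re.compile().

```python
import re

text = "Hello, World!"

case_sensitive = re.search(r"world", text)        # no flag: no match
by_flag = re.search(r"world", text, re.IGNORECASE)
inline = re.search(r"(?i)world", text)            # flag written inside the pattern

# A compiled pattern keeps its flags for every later call.
hello = re.compile(r"hello", re.IGNORECASE)
hits = hello.findall("hello HELLO Hello")

print(case_sensitive)   # None
print(by_flag.group())  # World
print(inline.group())   # World
print(hits)             # ['hello', 'HELLO', 'Hello']
```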

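4. Multiline matching

Passing the re.MULTILINE flag makes ^ and $ match at the start and end of every line, not only at the start and end of the whole string.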

import re

text = "Line 1\nLine 2\nLine 3"
pattern = r"^Line \d+"
result = re.findall(pattern, text, re.MULTILINE)

print("Matches:", result)
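
To see what the flag changes, the same patterns can be run with and without it on the same text:

```python
import re

text = "Line 1\nLine 2\nLine 3"

# Without the flag, ^ only matches at the very start of the string.
without_flag = re.findall(r"^Line \d+", text)

# With the flag, ^ also matches right after each newline.
with_flag = re.findall(r"^Line \d+", text, re.MULTILINE)

# The same applies to $: end of the string only vs. end of each line.
last_number = re.findall(r"\d+$", text)
line_numbers = re.findall(r"\d+$", text, re.MULTILINE)

print(without_flag)  # ['Line 1']
print(with_flag)     # ['Line 1', 'Line 2', 'Line 3']
print(last_number)   # ['3']
print(line_numbers)  # ['1', '2', '3']
```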