[Python code included] A quick introduction to string matching, replacing, and splitting with regular expressions

JQW_CSU

Last modified 2024-07-17 17:38:35


First published 2024-07-16 15:46:15

Article link: https://blog.csdn.net/JQW_CSU/article/details/140467925


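1. Concepts

A regular expression (regex)

is a pattern that describes a combination of characters to match in a string.

It is used to find, replace, split, and validate specific patterns in strings.

Common uses: text processing, data cleaning, parsing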

2. Uses

2.1 Matching strings

1️⃣ re.match(): check whether a string conforms to a pattern

[Example] Check whether an email address is well-formed

import re

email = "example@example.com"
pattern = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$'

if re.match(pattern, email):
    print("Valid email address")
else:
    print("Invalid email address")

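`^`	Matches the start of the string
`[a-zA-Z0-9_.+-]+`	Matches one or more letters (upper or lower case), digits, underscores, dots, plus signs, or hyphens
`\.`	Matches a literal "." (the dot is a special character in regular expressions, so it must be escaped with a backslash)
`$`	Matches the end of the string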
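To try the pattern on more than one input, the check can be wrapped in a small function. The following is a minimal sketch: the helper name `is_valid_email` and the sample addresses are made up for illustration, and the pattern is equivalent to the one in the example above. It is a simplified check, not a complete validator for every legal email address.

```python
import re

# Equivalent to the pattern in the example above; a simplification,
# not a complete email-address validator.
EMAIL_PATTERN = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$'


def is_valid_email(address):
    """Hypothetical helper: True if the address fits EMAIL_PATTERN."""
    return re.match(EMAIL_PATTERN, address) is not None


# Illustrative inputs (assumed, not from the original article).
candidates = [
    "example@example.com",
    "user.name+tag@mail.example.org",
    "no-at-sign.com",
    "user@localhost",
]

results = {address: is_valid_email(address) for address in candidates}
for address, ok in results.items():
    print(address, ok)
```

The last two addresses are rejected because one has no "@" and the other has no "." after the "@".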

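Note ⚠️:

re.match only matches at the start of the string; if the pattern does not match there, it returns None. It does not require the match to reach the end of the string.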

import re

# Match directly with the pattern string
result = re.match(r'\d+', '123abc')
if result:
    print(result.group())  # Output: 123

result = re.match(r'\d+', 'abc123')
if result is None:
    print('No match')  # Output: No match
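For comparison, the `re` module also provides `re.search` (match anywhere in the string) and `re.fullmatch` (the whole string must match). The sketch below contrasts the three on the same kind of input; the variable names are only illustrative.

```python
import re

# re.match: anchored at the start only, so trailing text is allowed.
prefix = re.match(r'\d+', '123abc')
print(prefix.group())  # 123

# re.fullmatch: the entire string must match the pattern.
whole = re.fullmatch(r'\d+', '123abc')
print(whole)  # None

# re.search: scans the string and returns the first match anywhere.
anywhere = re.search(r'\d+', 'abc123')
print(anywhere.group())  # 123

# `$` also matches just before a trailing newline, so re.match with `$`
# accepts '123\n', while re.fullmatch does not.
with_newline = re.match(r'\d+$', '123\n')
strict = re.fullmatch(r'\d+', '123\n')
print(with_newline is not None, strict)  # True None
```

When the goal is to validate a whole string, `re.fullmatch` is often the more direct choice.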

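2️⃣ re.compile(): compile a regular expression

This function returns a regular expression object (regex object).
The object can be reused for many matching operations, which makes the code easier to read and avoids looking up the pattern again on every call. Because the re module caches recently used patterns, the speed gain is usually modest.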

import re

# Compile the regular expression pattern
pattern = re.compile(r'\d+')

# Match using the compiled pattern
result = pattern.match('123abc')
if result:
    print(result.group())  # Output: 123

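3️⃣ Comparing the two

What is the difference ❓

Function	re.compile	re.match
Purpose	Compile first, then match	Match directly
Return value	A regular expression object	A match object, or None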

import re

# Compile the regular expression pattern
pattern = re.compile(r'\d+')

# Match using the compiled pattern
compiled_result = pattern.match('123abc')
if compiled_result:
    print('Compiled match:', compiled_result.group())  # Output: Compiled match: 123

# Match directly with re.match
direct_result = re.match(r'\d+', '123abc')
if direct_result:
    print('Direct match:', direct_result.group())  # Output: Direct match: 123

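[Summary 🌈]

A pattern compiled with re.compile gives the same result as calling re.match directly.
The compiled pattern can be reused across many matching operations, which keeps the code tidy and saves the repeated pattern lookup.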
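A minimal sketch of that reuse: one compiled pattern applied to several strings with different methods. The sample lines and the name `number_pattern` are assumptions for illustration.

```python
import re

# Compile once, then reuse the same pattern object.
number_pattern = re.compile(r'\d+')

# Illustrative input lines (assumed, not from the original article).
lines = ["order 42 shipped", "no digits here", "7 items, 19 left"]

# First number in each line, or None when the line has no digits.
first_numbers = []
for line in lines:
    m = number_pattern.search(line)
    first_numbers.append(m.group() if m else None)
print(first_numbers)  # ['42', None, '7']

# The same object also offers findall, sub, split, and so on.
all_numbers = [number_pattern.findall(line) for line in lines]
print(all_numbers)  # [['42'], [], ['7', '19']]
```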

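2.2 Finding and replacing strings

Find the substrings that match a pattern and replace them with another string.

[Example] Find the numbers and replace them with "#"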

import re

# Find all numbers
text = "The price is 100 dollars and the discount is 20%"
numbers = re.findall(r'\d+', text)

print(numbers)  # Output: ['100', '20']

# Replace every number with #
text = re.sub(r'\d+', '#', text)

print(text)  # Output: The price is # dollars and the discount is #%

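`\d`	Matches a single digit (equivalent to `[0-9]` for ASCII text; on Python 3 strings it also matches other Unicode decimal digits unless re.ASCII is set)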
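`re.sub` also accepts a `count` argument and a function as the replacement, and `re.subn` additionally reports how many replacements were made. The sketch below reuses the sentence from the example; the helper name `double` is made up for illustration.

```python
import re

text = "The price is 100 dollars and the discount is 20%"

# count=1 limits the substitution to the first match.
first_only = re.sub(r'\d+', '#', text, count=1)
print(first_only)  # The price is # dollars and the discount is 20%


def double(match):
    """Hypothetical helper: compute the replacement from the match object."""
    return str(int(match.group()) * 2)


# A function can be passed instead of a replacement string.
doubled = re.sub(r'\d+', double, text)
print(doubled)  # The price is 200 dollars and the discount is 40%

# re.subn returns the new string together with the number of replacements.
replaced, n = re.subn(r'\d+', '#', text)
print(replaced, n)  # The price is # dollars and the discount is #% 2
```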

2.3 Splitting strings

Split a string wherever a pattern matches.

[Example] Separate the fruit names and print them

import re

text = "apple,banana;orange:grape"
pattern = r'[,;:]'

fruits = re.split(pattern, text)
print(fruits)  # Output: ['apple', 'banana', 'orange', 'grape']

[ ]	Square brackets define a character class, which matches any one of the characters inside.

[,;:] matches any single comma, semicolon, or colon
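A few variations on `re.split` that often come up in practice, shown as a sketch with made-up input strings: tolerating whitespace around the delimiters, limiting the number of splits with `maxsplit`, and keeping the delimiters by putting the character class in a capturing group.

```python
import re

# \s* on both sides absorbs optional whitespace around each delimiter.
messy = "apple, banana ;orange:grape"
clean_parts = re.split(r'\s*[,;:]\s*', messy)
print(clean_parts)  # ['apple', 'banana', 'orange', 'grape']

text = "apple,banana;orange:grape"

# maxsplit=1 stops after the first split.
head_tail = re.split(r'[,;:]', text, maxsplit=1)
print(head_tail)  # ['apple', 'banana;orange:grape']

# A capturing group keeps the delimiters in the result list.
with_delims = re.split(r'([,;:])', text)
print(with_delims)  # ['apple', ',', 'banana', ';', 'orange', ':', 'grape']
```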

2.4 Extracting substrings

Extract the parts of a string that match a pattern.

[Example] Extract the domain name from a URL

import re

url = "https://www.example.com/path/to/page"
pattern = r'https?://(www\.)?([a-zA-Z0-9.-]+)'

match = re.search(pattern, url)
if match:
    domain = match.group(2)
    print(domain)  # Output: example.com
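To see what each group holds, the sketch below inspects the match object for the same pattern and URL, and then for a second URL without the "www." prefix. The second URL is an assumed input for illustration.

```python
import re

url = "https://www.example.com/path/to/page"
pattern = r'https?://(www\.)?([a-zA-Z0-9.-]+)'

match = re.search(pattern, url)
print(match.group(0))  # https://www.example.com  (the whole match)
print(match.group(1))  # www.                     (first capturing group)
print(match.group(2))  # example.com              (second capturing group)
print(match.groups())  # ('www.', 'example.com')

# When the optional group does not take part in the match, it is None.
no_www = re.search(pattern, "http://example.org/index.html")
print(no_www.group(1))  # None
print(no_www.group(2))  # example.org
```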

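?	`?` is a quantifier that matches the preceding expression zero or one time. It is greedy by default; it only makes matching non-greedy when placed after another quantifier (for example `+?`)
+	Greedy quantifier; matches the preceding character class one or more times
()	Creates a capturing group; the matched substring is captured so it can be referenced later
[a-zA-Z0-9.-]	Matches a single letter, digit, dot, or hyphen; followed by `+` inside the parentheses, it matches and captures a run of these characters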

In this example: