Regular Expressions Explained

Latest recommended article published on 2024-10-11 17:22:32

BlogLaurie

Tags: regular expressions, leetcode

Article link: https://blog.csdn.net/weixin_52534930/article/details/139762201

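While working on LeetCode 2288. Apply Discount to Prices, I ended up needing regular expressions.

Below are some commonly used functions, plus notes on how to write the expressions themselves: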

1. `re.match()`

re.match tries to match the pattern only at the start of the string. If the pattern does not match there, it returns None.

import re

pattern = r'^\$\d+(\.\d{1,2})?$'
text = "$20.50"

match = re.match(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match")

2. `re.search()`

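re.search scans the whole string and returns a match object for the first place the pattern matches, or None if it matches nowhere.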

import re

pattern = r'\$\d+(\.\d{1,2})?'
text = "The total cost is $20.50 and the discount is $5."

match = re.search(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match")
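
To make the difference between the two functions concrete, here is a small sketch that runs both on the same text. The pattern and sample sentence are just illustrative choices.

```python
import re

# A price pattern without ^ and $ anchors: "$", digits, optional decimals.
pattern = r'\$\d+(\.\d{1,2})?'
text = "The total cost is $20.50 and the discount is $5."

at_start = re.match(pattern, text)    # only tries position 0
anywhere = re.search(pattern, text)   # tries each position, left to right

print(at_start)            # None, because the text starts with "The"
print(anywhere.group())    # $20.50
print(anywhere.span())     # (18, 24)
```

So re.match is only useful when the thing you are looking for must sit at the very beginning; otherwise re.search is usually what you want.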

3. `re.findall()`

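re.findall returns a list of all non-overlapping matches as strings, not match objects. If the pattern contains a capturing group, the list holds the group's text instead of the whole match, so the optional decimal part below is written as a non-capturing group (?:...).

import re

pattern = r'\$\d+(?:\.\d{1,2})?'
text = "The total cost is $20.50 and the discount is $5."

matches = re.findall(pattern, text)
print("All matches:", matches)  # All matches: ['$20.50', '$5']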
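
One thing that is easy to trip over with re.findall is how capturing groups change the result. The sketch below compares a capturing group, a non-capturing group, and re.finditer on the same sample text; the variable names are only for illustration.

```python
import re

text = "The total cost is $20.50 and the discount is $5."

# With a capturing group, findall returns the group's text, not the whole match.
with_group = re.findall(r'\$\d+(\.\d{1,2})?', text)

# With a non-capturing group (?:...), findall returns the whole match.
without_group = re.findall(r'\$\d+(?:\.\d{1,2})?', text)

# finditer yields match objects, so group() always gives the full match.
via_finditer = [m.group() for m in re.finditer(r'\$\d+(\.\d{1,2})?', text)]

print(with_group)      # ['.50', '']
print(without_group)   # ['$20.50', '$5']
print(via_finditer)    # ['$20.50', '$5']
```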

4. `re.sub()`

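re.sub replaces every non-overlapping match in the string and returns the resulting string. The replacement can be a string or, as here, a function that receives the match object and returns the replacement text.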

import re

pattern = r'\$\d+(\.\d{1,2})?'
text = "The total cost is $20.50 and the discount is $5."

def apply_discount(match):
    price = float(match.group()[1:]) * 0.8  # e.g. apply a 20% discount
    return f"${price:.2f}"

new_text = re.sub(pattern, apply_discount, text)
print("After discount:", new_text)
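
Besides a function, re.sub also accepts a plain replacement string, and it has a few options that are handy to know. The following is a rough sketch of three of them (\g<0>, count, and re.subn); the replacement texts are made up for the example.

```python
import re

pattern = r'\$\d+(?:\.\d{1,2})?'
text = "The total cost is $20.50 and the discount is $5."

# A string replacement can refer to the whole match with \g<0>.
marked = re.sub(pattern, r'[\g<0>]', text)

# count=1 limits the substitution to the first match.
first_only = re.sub(pattern, 'N/A', text, count=1)

# subn also reports how many substitutions were made.
masked, n = re.subn(pattern, '$?', text)

print(marked)       # The total cost is [$20.50] and the discount is [$5].
print(first_only)   # The total cost is N/A and the discount is $5.
print(masked, n)    # The total cost is $? and the discount is $?. 2
```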

Complete example

Putting all of this together into one feature: find the prices in a string and apply a discount to them.

import re

class Solution:
    def discountPrices(self, sentence: str, discount: int) -> str:
        pattern = r'\$\d+(\.\d{1,2})?'

        def apply_discount(match):
            price = float(match.group()[1:]) * (1 - discount / 100)
            return f"${price:.2f}"

        new_sentence = re.sub(pattern, apply_discount, sentence)
        return new_sentence

# Example
solution = Solution()
sentence = "The total cost is $100 and the discount is $50.50."
discount = 20
result = solution.discountPrices(sentence, discount)
print(result)

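Explanation:

Define the regular expression pattern: pattern = r'\$\d+(\.\d{1,2})?' matches a price that starts with $.
Define the discount function: apply_discount takes the match object, computes the discounted price, and formats it to two decimal places.
Replace with re.sub: every price matched in the text is replaced by its discounted price.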
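
One caveat for the actual LeetCode problem: as far as I understand the statement, a price there is a whole space-separated word made of $ followed only by digits, so things like "$5." or "$1e5" should be left alone. Under that assumption, a stricter variant could look like the sketch below. The function name discount_prices and the sample sentences are my own choices for illustration.

```python
import re

def discount_prices(sentence: str, discount: int) -> str:
    # (?<!\S) and (?!\S) require whitespace or a string edge on both sides,
    # so "$1e5", "$5." and "8$" are not treated as prices.
    pattern = r'(?<!\S)\$(\d+)(?!\S)'

    def apply_discount(match):
        # group(1) is the digit part without the leading "$".
        price = int(match.group(1)) * (100 - discount) / 100
        return f"${price:.2f}"

    return re.sub(pattern, apply_discount, sentence)

print(discount_prices("there are $1 $2 and 5$ candies in the shop", 50))
# there are $0.50 $1.00 and 5$ candies in the shop
print(discount_prices("cost $5. or $1e5", 20))
# cost $5. or $1e5
```

The lookarounds do not consume any characters, so the spaces around each price stay exactly where they were.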

Writing regular expressions (regex) requires understanding ordinary characters, metacharacters, and quantifiers. Here are the basic concepts and some common patterns:

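Basic characters

Ordinary characters: for example a, b, 1, 2. Each one matches itself.
Special characters (metacharacters): . ^ $ * + ? | ( ) [ ] { } and the backslash \ itself. These have special meanings; to match one of them literally, escape it with a backslash \.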

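Metacharacters and their meanings

.: matches any character except a newline.
^: matches the start of the string.
$: matches the end of the string (in Python, also just before a trailing newline).
*: matches the preceding element zero or more times.
+: matches the preceding element one or more times.
?: matches the preceding element zero or one time.
|: alternation ("or") between the patterns on either side.
[]: defines a character class; for example, [a-z] matches any one lowercase letter.
(): groups part of a pattern, either to capture a submatch or to apply a quantifier to the whole group.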
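
To see each metacharacter in action, here is a small sketch with one line per symbol. The sample strings are arbitrary and chosen only to make the behavior visible.

```python
import re

# . matches any single character except a newline
dot = re.findall(r'c.t', 'cat cot c\nt')            # ['cat', 'cot']

# ^ and $ anchor the pattern to the start and end of the string
whole = re.search(r'^abc$', 'abc') is not None       # True
partial = re.search(r'^abc$', 'xabc') is not None    # False

# *, + and ? repeat the element just before them
star = re.findall(r'ab*', 'a ab abb')   # ['a', 'ab', 'abb']
plus = re.findall(r'ab+', 'a ab abb')   # ['ab', 'abb']
opt = re.findall(r'ab?', 'a ab abb')    # ['a', 'ab', 'ab']

# | picks either alternative
either = re.findall(r'cat|dog', 'cat, dog, cow')     # ['cat', 'dog']

# [] matches one character out of a set or range
lower = re.findall(r'[a-z]', 'aB3c')    # ['a', 'c']

# () captures a submatch that can be read back from the match object
m = re.search(r'(\d+)-(\d+)', 'pages 10-20')
first, last = m.group(1), m.group(2)    # '10', '20'

# A backslash makes a metacharacter literal; re.escape does it for a whole string
dots = re.findall(r'\.', 'a.b.c')       # ['.', '.']
escaped = re.escape('$5.00')            # the string \$5\.00
```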

Quantifiers

{n}: matches exactly n times.
{n,}: matches at least n times.
{n,m}: matches between n and m times, inclusive.
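
A quick sketch of the three brace quantifiers, using \d as the repeated element. re.fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# {n}: exactly n repetitions
four = re.fullmatch(r'\d{4}', '2024') is not None    # True
three = re.fullmatch(r'\d{4}', '202') is not None    # False

# {n,}: n or more repetitions
at_least_two = [s for s in ['7', '77', '7777'] if re.fullmatch(r'\d{2,}', s)]

# {n,m}: between n and m repetitions; greedy, so it takes as many as it can
runs = re.findall(r'\d{2,3}', '1 22 333 4444')

print(at_least_two)   # ['77', '7777']
print(runs)           # ['22', '333', '444']
```

Note the last result: "4444" yields "444" because the quantifier stops at three digits, and the single leftover "4" is too short to match again.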

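Special character classes

\d: matches any digit; equivalent to [0-9] for ASCII text.
\D: matches any non-digit character.
\w: matches any word character (letter, digit, or underscore); equivalent to [a-zA-Z0-9_] for ASCII text.
\W: matches any character that is not a word character.
\s: matches any whitespace character (space, tab, newline, and so on).
\S: matches any non-whitespace character.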
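
Here is a short sketch of the shorthand classes applied to one sample string (the text itself is made up). The last few lines show a Python 3 detail: on str patterns, \d also matches non-ASCII digits unless the re.ASCII flag is passed.

```python
import re

text = "Order #42: 3 apples,\t2 pears"

digits = re.findall(r'\d+', text)   # ['42', '3', '2']
words = re.findall(r'\w+', text)    # ['Order', '42', '3', 'apples', '2', 'pears']
spaces = re.findall(r'\s', text)    # [' ', ' ', ' ', '\t', ' ']
chunks = re.findall(r'\S+', text)   # ['Order', '#42:', '3', 'apples,', '2', 'pears']

# The negated classes are handy for stripping unwanted characters.
only_digits = re.sub(r'\D', '', '+1 (555) 010-9999')   # '15550109999'
slug = re.sub(r'\W+', '_', 'hello, world!')            # 'hello_world_'

# U+0663 is ARABIC-INDIC DIGIT THREE, a digit outside the ASCII range.
arabic_three = '\u0663'
unicode_digit = re.fullmatch(r'\d', arabic_three) is not None           # True
ascii_digit = re.fullmatch(r'\d', arabic_three, re.ASCII) is not None   # False
```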

Common regular expression examples

Match a string of digits:
```
\d+
```
Matches one or more digits.
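Match a price in a specific format (for example $20 or $20.50):
```
\$\d+(\.\d{1,2})?
```
- \$: matches the dollar sign $.
- \d+: matches one or more digits.
- (\.\d{1,2})?: an optional decimal point followed by one or two digits.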
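Match an email address:
```
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
```
- [a-zA-Z0-9._%+-]+: matches the username part, made up of letters, digits, and a few specific symbols.
- @: matches the @ sign.
- [a-zA-Z0-9.-]+: matches the domain name.
- \.[a-zA-Z]{2,}: matches the domain suffix (a dot followed by at least two letters).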
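
To try the three patterns out, the sketch below compiles them and checks a few sample strings with fullmatch, which requires the entire string to fit. The helper name is_full and the test inputs are my own, and the email pattern is a deliberate simplification rather than a full validator.

```python
import re

DIGITS = re.compile(r'\d+')
PRICE = re.compile(r'\$\d+(\.\d{1,2})?')
EMAIL = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def is_full(pattern: re.Pattern, s: str) -> bool:
    """Return True if the whole string s matches the compiled pattern."""
    return pattern.fullmatch(s) is not None

print(is_full(DIGITS, '12345'))                      # True
print(is_full(DIGITS, '12a'))                        # False
print(is_full(PRICE, '$20'))                         # True
print(is_full(PRICE, '$20.50'))                      # True
print(is_full(PRICE, '$20.505'))                     # False, three decimals
print(is_full(EMAIL, 'user.name+tag@example.com'))   # True
print(is_full(EMAIL, 'user@localhost'))              # False, no dotted suffix
```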