Regular Expression Applications (Part 1): Removing Redundant Symbols with Python Regular Expressions - CSDN Blog

Article link: https://blog.csdn.net/weixin_42127042/article/details/146771203

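Day-to-day work is full of dirty work like processing large amounts of text. Learn to let a script do it for you: a bit of effort now, years of comfort later!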

Here is a Python script that removes every `*` and `#` from a text, whether the symbols appear in runs or on their own:

```python
import re

def remove_symbols(text):
    """Remove every * and # from text."""
    return re.sub(r'[*#]', '', text)

# Example usage
input_text = "This is a **test** text, #containing## redundant *and# symbols***!"
cleaned_text = remove_symbols(input_text)
print(cleaned_text)
```

Output:
```
This is a test text, containing redundant and symbols!
```
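If you clean many strings in a row, you can compile the pattern once and reuse it. This is a minimal sketch; `SYMBOLS_RE` and `remove_symbols_batch` are hypothetical names, not part of the original script. Python's `re` module already caches recently used patterns, so the gain is mostly readability rather than speed.

```python
import re

# Hypothetical helper: compile the [*#] class once and reuse it for every string
SYMBOLS_RE = re.compile(r'[*#]')

def remove_symbols_batch(texts):
    """Remove every * and # from each string in texts, returning a new list."""
    return [SYMBOLS_RE.sub('', t) for t in texts]

print(remove_symbols_batch(["**bold**", "# Title", "a*b#c"]))
# ['bold', ' Title', 'abc']
```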

---

Optional extensions (pick what you need):
1. **Collapse only consecutive repeated symbols** (e.g., replace `***` with `*`)

```python
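import re

def remove_redundant_symbols(text):
    """Collapse a run of the same * or # symbol into a single one."""
    # ([*#]) captures one symbol; \1+ matches one or more further copies of it
    return re.sub(r'([*#])\1+', r'\1', text)

# Example input: "***Hello###World**"
# Output: "*Hello#World*"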
```
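Note that the backreference `\1` only matches a repeat of the *same* character, so a mixed run such as `#*#*` is left untouched. If you would rather shrink any mixed run of `*`/`#` to its first symbol, one option (a sketch of one possible rule, not the only one) is to drop the backreference. `collapse_same` and `collapse_mixed` are hypothetical names used only for this comparison:

```python
import re

def collapse_same(text):
    # ([*#])\1+ : a symbol followed by one or more copies of the same symbol
    return re.sub(r'([*#])\1+', r'\1', text)

def collapse_mixed(text):
    # ([*#])[*#]+ : a symbol followed by any further * or #; keep only the first
    return re.sub(r'([*#])[*#]+', r'\1', text)

sample = "***Hello#*#*World**"
print(collapse_same(sample))   # *Hello#*#*World*
print(collapse_mixed(sample))  # *Hello#World*
```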

2. **Read from a file and save the result**

```python
import re

def process_file(input_path, output_path):
    """Remove every * and # from a UTF-8 text file and write the result to another file."""
    with open(input_path, 'r', encoding='utf-8') as f:
        text = f.read()
    cleaned_text = re.sub(r'[*#]', '', text)
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(cleaned_text)

# Usage example
process_file("input.txt", "output.txt")
```
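`f.read()` loads the whole file into memory. For very large files, a variant that handles one line at a time keeps memory use flat. This is a sketch; `process_file_streaming` is a hypothetical name, and it assumes that nothing you want to delete spans a line break (true for single characters such as `*` and `#`).

```python
import re

SYMBOLS_RE = re.compile(r'[*#]')

def process_file_streaming(input_path, output_path):
    """Hypothetical variant of process_file: clean the file line by line instead of reading it whole."""
    with open(input_path, 'r', encoding='utf-8') as src, \
         open(output_path, 'w', encoding='utf-8') as dst:
        for line in src:  # iterating over a text file yields one line at a time
            dst.write(SYMBOLS_RE.sub('', line))
```

Call it like the original: `process_file_streaming("input.txt", "output.txt")`.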

3. **Keep symbols in specific places** (e.g., keep `*` inside email addresses)

```python
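import re

def selective_remove(text):
    """Remove * and # everywhere except * inside the local part of an email address."""
    # Alternative 1 captures an email local part plus '@' (simplified pattern) and keeps it;
    # alternative 2 matches any other * or #, which is deleted.
    pattern = re.compile(r'([\w.+*-]+@)|[*#]')
    return pattern.sub(lambda m: m.group(1) or '', text)

# Example input: "Contact* me: user*name@domain#.com"
# Output: "Contact me: user*name@domain.com"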
```
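The idea of protecting some symbols, deleting the rest, and then restoring the protected ones also works, as long as the protection step hides the symbol itself; wrapping it in markers is not enough, because the deletion step would still remove it. A sketch of that three-step version, assuming the NUL character `\x00` never occurs in the input and using the same simplified email pattern (both are assumptions; `selective_remove_3step` is a hypothetical name):

```python
import re

PLACEHOLDER = '\x00'  # assumption: NUL never appears in the input text
LOCAL_PART_RE = re.compile(r'[\w.+*-]+@')  # simplified: email local part plus '@'

def selective_remove_3step(text):
    # Step 1 (protect): inside each local part, swap * for the placeholder
    protected = LOCAL_PART_RE.sub(lambda m: m.group().replace('*', PLACEHOLDER), text)
    # Step 2 (delete): remove every remaining * and #
    cleaned = re.sub(r'[*#]', '', protected)
    # Step 3 (restore): turn the placeholders back into *
    return cleaned.replace(PLACEHOLDER, '*')

print(selective_remove_3step("Contact* me: user*name@domain#.com"))
# Contact me: user*name@domain.com
```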

---

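Usage notes:
1. Copy the code into a `.py` file (e.g., `clean_text.py`).
2. Call the function you need:
   - Remove all symbols: `remove_symbols(text)`
   - Process a file: `process_file("input_file.txt", "output_file.txt")`
3. To keep specific symbols, adjust the regular expression to your own rules (as in the email example above).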

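The character class `[*#]` matches each `*` and each `#` individually, so the script deletes every occurrence of both, which covers basic symbol removal.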
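The character-class approach generalizes to any set of symbols. A hypothetical helper `remove_chars` can build the class from a string with `re.escape`, which backslash-escapes regex metacharacters such as `-`, `]`, and `^` that would otherwise change the meaning of the class:

```python
import re

def remove_chars(text, symbols):
    """Hypothetical helper: remove every character that appears in `symbols` from `text`."""
    if not symbols:  # an empty class "[]" would be an invalid pattern
        return text
    pattern = '[' + re.escape(symbols) + ']'
    return re.sub(pattern, '', text)

print(remove_chars("a-b*c#d]e", "*#-]"))  # abcde
```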

Improved script and explanation
```python
import re

def remove_excess_symbols(text):
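    """
    Remove all asterisks (*) and hash signs (#) from the text.
    
    Args:
    text (str): The input text.
    
    Returns:
    str: The processed text.
    """
    # [\*\#]+ matches a run of one or more * or # characters; each run is deleted
    cleaned_text = re.sub(r'[\*\#]+', '', text)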
    return cleaned_text

def main():
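    # Sample text
    input_text = """
    This is a text with extra * and # symbols.
    For example: ***Hello, World!### There are many * and # symbols here.
    There are even * and # symbols in the middle, such as *#*#*#*#*#.
    """

    # Remove the asterisks and hash signs
    cleaned_text = remove_excess_symbols(input_text)

    # Print the text before and after processing
    print("Text before processing:")
    print(input_text)
    print("\nText after processing:")
    print(cleaned_text)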

if __name__ == "__main__":
    main()
```
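Script explanation
1. Import the regular expression module: the `re` module is used to find the symbols in the text.
2. Define the `remove_excess_symbols` function: it takes a string argument `text` and calls `re.sub` to replace every run of asterisks (`*`) and hash signs (`#`) with an empty string, which deletes all of them.
3. The `main` function:
   - defines a sample text `input_text` that contains asterisks and hash signs;
   - calls `remove_excess_symbols` to process it;
   - prints the text before and after processing for comparison.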
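Because every match is replaced with an empty string, the `+` in `[\*\#]+` does not change the output; it only lets a single match swallow a whole run. `re.subn`, which returns the new string together with the number of substitutions made, makes the difference visible:

```python
import re

text = "***a##b"
print(re.subn(r'[*#]', '', text))   # ('ab', 5)  one substitution per symbol
print(re.subn(r'[*#]+', '', text))  # ('ab', 2)  one substitution per run
```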

Running the script
Save the code above as a `.py` file (for example `remove_symbols.py`) and run it from the command line:
```sh
python remove_symbols.py
```
Or run the code directly in a Python session. Either way, you will see the text before and after processing.

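If you have a specific text file to process, you can modify the script slightly to read the file, clean its contents, and write the result to a file. Here is an extended version that reads from and writes to files: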
```python
import re

def remove_excess_symbols(text):
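    """
    Remove all asterisks (*) and hash signs (#) from the text.
    
    Args:
    text (str): The input text.
    
    Returns:
    str: The processed text.
    """
    # [\*\#]+ matches a run of one or more * or # characters; each run is deleted
    cleaned_text = re.sub(r'[\*\#]+', '', text)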
    return cleaned_text

def process_file(input_file, output_file):
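    """
    Read the input file, remove the asterisks and hash signs, and write the result to the output file.
    
    Args:
    input_file (str): Path of the input file.
    output_file (str): Path of the output file.
    """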
    with open(input_file, 'r', encoding='utf-8') as file:
        input_text = file.read()

    cleaned_text = remove_excess_symbols(input_text)

    with open(output_file, 'w', encoding='utf-8') as file:
        file.write(cleaned_text)

    print(f"Done. The result has been saved to {output_file}")

def main():
    # Example file paths
    input_file = 'input.txt'
    output_file = 'output.txt'

    # Process the file
    process_file(input_file, output_file)

if __name__ == "__main__":
    main()
```
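In this version, the `process_file` function reads the given input file, cleans it, and writes the result to the output file. Change the `input_file` and `output_file` paths in `main` as needed.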
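Instead of editing hard-coded paths, you could pass them on the command line. This is a sketch using the standard `argparse` module; the two positional argument names are assumptions, and the cleaning function is repeated so the block runs on its own:

```python
import argparse
import re

def remove_excess_symbols(text):
    """Remove every run of * and # from text."""
    return re.sub(r'[*#]+', '', text)

def main(argv=None):
    # Assumed interface: two positional paths, input first, output second
    parser = argparse.ArgumentParser(description="Remove * and # from a UTF-8 text file.")
    parser.add_argument("input_file", help="path of the file to clean")
    parser.add_argument("output_file", help="path where the cleaned text is written")
    args = parser.parse_args(argv)  # argv=None means: read sys.argv[1:]

    with open(args.input_file, 'r', encoding='utf-8') as f:
        text = f.read()
    with open(args.output_file, 'w', encoding='utf-8') as f:
        f.write(remove_excess_symbols(text))
```

To use it as a script, append the usual `if __name__ == "__main__": main()` guard and run, for example, `python remove_symbols.py input.txt output.txt`.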