Matching Specific Strings on the Clipboard with Regular Expressions

feiyy404

Article link: https://blog.csdn.net/Enjolras_fuu/article/details/100574287

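Task description

Suppose you have the tedious task of finding every phone number and email address in a long web page or document. Scrolling through it by hand could take a long time. With a program that searches the text on the clipboard for phone numbers and email addresses, you only need to press Ctrl-A to select all the text, press Ctrl-C to copy it to the clipboard, and then run the program. It replaces the text on the clipboard with just the phone numbers and email addresses it finds.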

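Planning the framework

Whenever you take on a new project, it is tempting to dive straight into writing code. More often than not, though, it is better to step back and consider the bigger picture.

I recommend first drafting a high-level plan of what the program needs to do. Don't think about the actual code yet; that comes later. For now, stick to the broad strokes.

With the plan in mind, you can start thinking about how to do the work in code. The code needs to do the following: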

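(1) Use the pyperclip module to copy and paste strings.
(2) Create two regular expressions, one that matches phone numbers and one that matches email addresses.
(3) Find all matches of both regular expressions, not just the first match.
(4) Neatly format the matched strings into a single string to paste.
(5) Display some kind of message if no matches were found in the text.

This list works as a road map for the project. While writing the code, you can focus on each step separately. Each step is manageable, and each is phrased in terms of things you already know how to do in Python.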

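In short, the task comes down to three things:

(1) Get the text from the clipboard.
(2) Find all phone numbers and email addresses in the text.
(3) Paste them onto the clipboard.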
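
As a rough sketch of that three-step flow, the snippet below wires the steps together without touching the real clipboard. The name `run_pipeline`, the `fake_clipboard` dictionary, and the two toy patterns are assumptions made for illustration only; in the actual program the read and write callables would be `pyperclip.paste` and `pyperclip.copy`, and the patterns would be the fuller ones developed in the rest of the article.

```python
import re


def run_pipeline(read_clipboard, write_clipboard, patterns):
    """Read text, collect every match of every pattern, write the matches back."""
    text = read_clipboard()                      # step 1: get the text
    found = []
    for pattern in patterns:                     # step 2: find all matches
        found.extend(m.group(0) for m in pattern.finditer(text))
    if found:                                    # step 3: put the results back
        write_clipboard('\n'.join(found))
    return found


# A dictionary standing in for the system clipboard (hypothetical test double).
fake_clipboard = {'text': 'Call 415-555-1234 or write to info@nostarch.com.'}

# Deliberately simplified patterns, not the ones used later in the article.
toy_patterns = [
    re.compile(r'\d{3}-\d{3}-\d{4}'),
    re.compile(r'\S+@\S+\.\w+'),
]

result = run_pipeline(
    lambda: fake_clipboard['text'],
    lambda s: fake_clipboard.update(text=s),
    toy_patterns,
)
print(result)
```

Passing the clipboard functions in as arguments is one possible design choice; it keeps the matching logic testable on machines where no clipboard is available.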

Start with the smallest step

# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.
import pyperclip, re

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?             # area code
    (\s|-|\.)?                     # separator
    (\d{3})                        # first 3 digits
    (\s|-|\.)                      # separator
    (\d{4})                        # last 4 digits
    (\s*(ext\.?|x)\s*(\d{2,5}))?   # extension
    )''', re.VERBOSE)

# TODO: Create email regex.

# TODO: Find matches in clipboard text.

# TODO: Copy results to the clipboard.

The TODO comments are only a skeleton for the program; they will be replaced as we write the real code.
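
To see what the phone pattern actually captures, it can help to run it on a made-up string and print the tuples that `findall` returns. The snippet below is a small sketch: `phone_regex` is a copy of the pattern above with the dot in `ext.` escaped, and the `sample` text is invented for the demonstration.

```python
import re

phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # area code
    (\s|-|\.)?                      # separator
    (\d{3})                         # first 3 digits
    (\s|-|\.)                       # separator
    (\d{4})                         # last 4 digits
    (\s*(ext\.?|x)\s*(\d{2,5}))?    # extension
    )''', re.VERBOSE)

# Invented sample text with two differently formatted numbers.
sample = 'Office: 415-555-1234 ext. 99, front desk: (800) 555.0000'

# Because the pattern contains groups, findall returns one tuple per match.
rows = phone_regex.findall(sample)
for row in rows:
    print(row)
```

Each tuple has nine entries. Index 0 is the whole match, index 1 the area code, index 3 the first three digits, index 5 the last four digits, and index 8 the extension digits; groups that did not take part in a match show up as empty strings.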

Next, we write the regular expression that matches email addresses:

# Create email regex.
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)
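
The short check below shows how the email pattern behaves on invented input. Two points are worth noting. First, `re.VERBOSE` only ignores ASCII whitespace, so the pattern must be indented with ordinary spaces; a full-width space (U+3000) would be treated as a literal character to match. Second, `{2,4}` caps the last label at four letters, so longer endings are cut short. `email_regex_long_tld` is a hypothetical variant added here for comparison and is not part of the original program.

```python
import re

# The article's pattern, indented with plain ASCII spaces.
email_regex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)

# Hypothetical variant: no upper bound on the length of the last label.
email_regex_long_tld = re.compile(
    r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

# Invented sample text; example.museum is only a placeholder domain.
sample = 'Write to info@nostarch.com or curator@example.museum.'

capped = [groups[0] for groups in email_regex.findall(sample)]
uncapped = email_regex_long_tld.findall(sample)

print(capped)     # the second address is truncated after four letters
print(uncapped)   # the second address is matched in full
```

Whether the cap matters depends on the text being searched; for addresses ending in `.com` or `.org` both patterns give the same result.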

Then find all matches in the clipboard text:

# Find matches in clipboard text.
text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    matches.append(groups[0])
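
The numeric indexes `groups[1]`, `groups[3]`, `groups[5]` and `groups[8]` depend on counting parentheses in the pattern. One alternative, sketched below under the assumption that readability matters more than staying close to the original, is to use named groups with `finditer`. The group names and the function `normalize_phone_numbers` are made up for this example. The sketch also skips empty parts, because joining an empty area code with `'-'` would otherwise produce a number that starts with a dash.

```python
import re

# Variant of the phone pattern with named groups (illustrative only).
phone_regex = re.compile(r'''
    (?P<area>\d{3}|\(\d{3}\))?                  # area code
    (?:\s|-|\.)?                                # separator
    (?P<first>\d{3})                            # first 3 digits
    (?:\s|-|\.)                                 # separator
    (?P<last>\d{4})                             # last 4 digits
    (?:\s*(?:ext\.?|x)\s*(?P<ext>\d{2,5}))?     # extension
    ''', re.VERBOSE)


def normalize_phone_numbers(text):
    """Return every phone number in text in the form 415-555-1234 x99."""
    numbers = []
    for m in phone_regex.finditer(text):
        # Unmatched optional groups are None here, so filter them out.
        parts = [p for p in (m.group('area'), m.group('first'), m.group('last')) if p]
        number = '-'.join(parts)
        if m.group('ext'):
            number += ' x' + m.group('ext')
        numbers.append(number)
    return numbers


normalized = normalize_phone_numbers('415-555-1234 ext. 99 and 555-0000')
print(normalized)
```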

Finally, join all the matches into a single string and copy it to the clipboard:

# Copy results to the clipboard.
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')
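
The formatting step can also be written as a small pure function, which makes the "no matches" branch easy to check without a clipboard. This is only a sketch: `build_clipboard_text` is a hypothetical helper, and removing repeated matches is an extra that the original program does not do.

```python
def build_clipboard_text(matches):
    """Return (text_to_copy, message) for a list of matched strings."""
    # dict.fromkeys keeps the first occurrence of each item, in order.
    unique = list(dict.fromkeys(matches))
    if not unique:
        return None, 'No phone numbers or email addresses found.'
    joined = '\n'.join(unique)
    return joined, 'Copied to clipboard:\n' + joined


text_to_copy, message = build_clipboard_text(
    ['415-555-1234', 'info@nostarch.com', '415-555-1234'])
print(message)
```

In the real script, `text_to_copy` would be passed to `pyperclip.copy` whenever it is not `None`.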

Complete code:

# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.
import pyperclip, re

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?             # area code
    (\s|-|\.)?                     # separator
    (\d{3})                        # first 3 digits
    (\s|-|\.)                      # separator
    (\d{4})                        # last 4 digits
    (\s*(ext\.?|x)\s*(\d{2,5}))?   # extension
    )''', re.VERBOSE)

# Create email regex.
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+      # username
    @                      # @ symbol
    [a-zA-Z0-9.-]+         # domain name
    (\.[a-zA-Z]{2,4})      # dot-something
    )''', re.VERBOSE)

# Find matches in clipboard text.
text = str(pyperclip.paste())

matches = []
for groups in phoneRegex.findall(text):
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)

for groups in emailRegex.findall(text):
    matches.append(groups[0])

# Copy results to the clipboard.
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')

"""
info@nostarch.com
media@nostarch.com
academic@nostarch.com
help@nostarch.com
ruiyang0715@gmail.com
"""