The re module: compile

Latest recommended article published on 2024-09-30 14:16:32

霆曜

Tags: python, web scraping

Article link: https://blog.csdn.net/c11454345/article/details/142492422

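In Python's re module, the compile function compiles a regular expression pattern into a regular expression object. The benefit is that when you need the same regular expression many times, you compile it once and reuse the resulting object, which is more efficient and keeps the pattern in one place.

Here is an example using re.compile: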

import re  # import the re module

pattern = re.compile(r'\d+')  # compile a regular expression object
print(pattern)  # prints: re.compile('\\d+')
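To illustrate the "compile once, reuse many times" idea, here is a minimal sketch. The variable names and the sample strings are made up for illustration; only re.compile, the pattern object's findall method, and its pattern attribute come from the standard library.

```python
import re

# Compile once, then reuse the same pattern object for several inputs.
digits = re.compile(r'\d+')

# Hypothetical sample inputs, chosen only for this illustration.
samples = ['order 66', 'no digits here', 'room 101, floor 3']
results = [digits.findall(s) for s in samples]

print(digits.pattern)  # the original pattern string: \d+
print(results)         # [['66'], [], ['101', '3']]
```

The same object can be passed around to other functions, so the pattern text only has to be written and checked in one place.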

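Using match

In Python's re module, match is a method of the compiled regular expression object that tries to match the pattern starting at the beginning of the string. If the pattern matches the start of the string, match returns a match object; if not, it returns None.

Here is an example using match: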

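import re

# Compile a regular expression object
pattern = re.compile(r'\d+')

# Use match to test the pattern against the start of the string
text = '123abc456'
match_obj = pattern.match(text)

if match_obj:
    print('Matched number:', match_obj.group())
else:
    print('No match found')

The output is "Matched number: 123". Because text begins with digits, the match succeeds and the matched number is printed. If \d in the pattern is changed to \D, or if text does not begin with a digit, the output is "No match found".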
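To make the start-of-string behaviour concrete, the following sketch calls match on two strings with the same pattern. The variable names are illustrative assumptions; group() and span() are standard match-object methods.

```python
import re

pattern = re.compile(r'\d+')

# match only succeeds when the pattern fits at position 0.
starts_with_digits = pattern.match('123abc456')
starts_with_letters = pattern.match('abc123def456')

print(starts_with_digits.group())  # 123
print(starts_with_digits.span())   # (0, 3)
print(starts_with_letters)         # None
```

Since a failed match returns None, calling group() on the result without checking it first would raise an AttributeError, which is why the `if match_obj:` test matters.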

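Using search

In Python's re module, the search method looks through the whole string for the first substring that matches the regular expression. Unlike match, search does not require the match to begin at the start of the string; it scans the string until it finds the first matching position.

Here is an example using the search method: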
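A minimal sketch of such a call, assuming the pattern r'\d+' and the sample text 'abc123def456' that the results quoted in this section are based on:

```python
import re

# Compile a regular expression object
pattern = re.compile(r'\d+')

# search scans the whole string and stops at the first match
text = 'abc123def456'
search_obj = pattern.search(text)

if search_obj:
    print('Found a number:', search_obj.group())
else:
    print('No match found')
```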

The output is "Found a number: 123". search finds only the first match: it returns as soon as that match is found and does not look at the rest of the string, so 456 is never reported.

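Using findall

Compiled pattern objects in Python's re module have no find method; the method for collecting matches is findall. Like search, it looks for matches of the regular expression anywhere in the string. The difference is that findall returns a list of all non-overlapping matches rather than only the first one. If nothing matches, it returns an empty list.

Here is an example using findall: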

import re

# Compile a regular expression object
pattern = re.compile(r'\d+')

# Use findall to collect every number in the string
text = 'abc123def456'
matches = pattern.findall(text)

# Print each number that was found
for match in matches:
    print('Found a number:', match)

The output is two lines, "Found a number: 123" and "Found a number: 456". Unlike search, findall reports every match.
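findall returns plain strings, so the positions of the matches are lost. If the positions are needed as well, one option is finditer, which yields match objects. The sketch below is only an illustration with made-up variable names; it also shows the empty-list result when nothing matches.

```python
import re

pattern = re.compile(r'\d+')
text = 'abc123def456'

# findall: a list of matched strings
all_numbers = pattern.findall(text)

# finditer: match objects, so each match also carries its position
positions = [(m.group(), m.span()) for m in pattern.finditer(text)]

# No digits at all: findall returns an empty list, not None
no_numbers = pattern.findall('abcdef')

print(all_numbers)  # ['123', '456']
print(positions)    # [('123', (3, 6)), ('456', (9, 12))]
print(no_numbers)   # []
```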

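Using split

In Python, split is a method of string (str) objects. It breaks a string into substrings and returns them as a list. It is typically used to cut a string into smaller pieces when you know which separator divides its parts. Note that this is str.split, which takes a literal separator; the re module has its own split for separators described by a regular expression.

Here is an example using the split method: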

text = "Hello, World! This is a test string."

# Default separator (any run of whitespace)
words = text.split()  # split the string with the split method
print(words)  # Output: ['Hello,', 'World!', 'This', 'is', 'a', 'test', 'string.']

# Explicit separator (comma)
# Note: the string contains one comma, so it is split into two parts
parts = text.split(',')
print(parts)  # Output: ['Hello', ' World! This is a test string.']

# Limit the number of splits
# Note: str.split accepts an optional maxsplit argument (the second parameter);
# here at most 2 splits are made, so the rest of the string stays in one piece
limited_split = text.split(' ', 2)
print(limited_split)  # Output: ['Hello,', 'World!', 'This is a test string.']

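In this example, the first call to split uses the default whitespace separator, the second call specifies a comma as the separator, and the third call splits on a single space and limits the number of splits to 2.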
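Since the rest of this article is about the re module, it may help to see the regular-expression counterpart of str.split. The sketch below is an illustration under assumptions: the separator pattern and the variable names are chosen for this example and are not from the original text.

```python
import re

text = "Hello, World! This is a test string."

# Treat any run of commas, exclamation marks, periods, or whitespace as one separator.
separator = re.compile(r'[,!.\s]+')

# The trailing "." leaves an empty string at the end, so filter empties out.
words = [w for w in separator.split(text) if w]

# maxsplit works as in str.split: at most 2 splits are performed.
limited = separator.split(text, maxsplit=2)

print(words)    # ['Hello', 'World', 'This', 'is', 'a', 'test', 'string']
print(limited)  # ['Hello', 'World', 'This is a test string.']
```

Compared with str.split, the punctuation is removed together with the spaces, because it is part of the separator pattern.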

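Using sub

In Python's re module, the sub method replaces substrings of a string. It scans the string, finds every part that matches the regular expression, and replaces it with the given replacement. The optional count argument limits how many matches are replaced, and the replacement can also be a function that receives each match object.

Here is an example using the sub method: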

import re

# Original string
text = "Hello, World! This is a test string."

# Replace every digit with an underscore
# (text contains no digits, so the result is unchanged)
new_text = re.sub(r'\d', '_', text)
print(new_text)  # Output: Hello, World! This is a test string.

# Replace each run of whitespace with a comma
new_text = re.sub(r'\s+', ',', text)
print(new_text)  # Output: Hello,,World!,This,is,a,test,string.

# Replace only the first two matches
new_text = re.sub(r'\w', '-', text, count=2)
print(new_text)  # Output: --llo, World! This is a test string.

# Use a function to compute each replacement
# match.start() is the character offset of the match, not a word counter
def replace_with_index(match):
    return f"Word{match.start() + 1}"

new_text = re.sub(r'\w+', replace_with_index, text)
print(new_text)  # Output: Word1, Word8! Word15 Word20 Word23 Word25 Word30.
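If the goal is to number the words sequentially (Word1, Word2, and so on) rather than by their character offset in the string, one possible approach is to draw the number from a counter inside the replacement function. This is only a sketch: replace_with_sequence is a hypothetical helper name, and itertools.count is one of several ways to keep the running number. The last lines use re.subn, which returns the new string together with the number of replacements made.

```python
import itertools
import re

text = "Hello, World! This is a test string."

# Running word number, starting at 1.
counter = itertools.count(1)

def replace_with_sequence(match):
    # Hypothetical helper: ignores the matched text and returns the next label.
    return f"Word{next(counter)}"

numbered = re.sub(r'\w+', replace_with_sequence, text)
print(numbered)  # Word1, Word2! Word3 Word4 Word5 Word6 Word7.

# subn also reports how many replacements were made.
replaced, n = re.subn(r'\s+', ',', text)
print(replaced, n)  # Hello,,World!,This,is,a,test,string. 6
```

Punctuation and spaces are kept because \w+ only matches the words themselves.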