Regular Expressions

学python的新人

Last modified 2023-12-07 03:37:12

First published 2023-12-07 03:18:23

Article link: https://blog.csdn.net/2301_79224207/article/details/134844711

1 Examples

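import re  # regular expression library

# Sample data, kept in Chinese because the pattern matches these characters.
# Each line reads "bought N jin of <item>, spent <price>".
content = """
买了5斤白菜花了2块
买了2斤葡萄花了20元
买了3斤苹果花了15元
买了33斤茄子花了12.3元
"""

# Remove leading and trailing whitespace from the string
content = content.strip()

# How the pattern works:
#   (\d+)            one or more digits: the weight, followed by 斤 (jin)
#   (.*?)            non-greedy match of any characters except a newline: the item name
#   花了             the literal text "spent"
#   (\d+(?:\.\d+)?)  one or more digits, optionally followed by a decimal point and more digits: the price
# A ? after a group makes it optional (zero or one occurrence);
# a ? after * or + makes that quantifier non-greedy.
# The currency unit is not part of the pattern, so lines ending in 元 or 块 both match.
pattern = r"(\d+)斤(.*?)花了(\d+(?:\.\d+)?)"

for line in content.split('\n'):
    match = re.search(pattern, line)
    if match:
        print(f'{match.group(1)}\t{match.group(2)}\t{match.group(3)}')
    else:
        print('no')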
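The same lines can also be turned into structured records. The sketch below is one possible variation, not part of the original example: the helper `parse_line` and the group names `weight`, `item` and `price` are hypothetical names chosen for illustration. It uses named groups so that each captured value can be read by name instead of by number.

```python
import re

# Hypothetical pattern with named groups: (?P<name>...) labels a capturing group.
LINE_RE = re.compile(r"(?P<weight>\d+)斤(?P<item>.*?)花了(?P<price>\d+(?:\.\d+)?)")


def parse_line(line):
    """Return a dict with weight, item and price, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "weight": int(m.group("weight")),
        "item": m.group("item"),
        "price": float(m.group("price")),
    }


print(parse_line("买了33斤茄子花了12.3元"))  # {'weight': 33, 'item': '茄子', 'price': 12.3}
print(parse_line("没有数字"))                # None
```

Returning None for a line that does not match keeps the caller in charge of deciding what to do with bad input.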

import re
text = "JGood is a handsome boy, he is cool, clever, and so on..."

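regex = re.compile(r'\w*oo\w*')  # matches any word containing 'oo'
print(regex.findall(text))  # list every word containing 'oo'
print(regex.sub(lambda m: '[' + m.group(0) + ']', text))  # wrap each of those words in []

# The same search without compiling the pattern first
pattern = r'\w*oo\w*'
print(re.findall(pattern, text))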

2 Syntax

2.1 Searching strings: match(), search(), compile()

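re.match(pattern, string, flags=0)   # matches only at the start of the string
re.search(pattern, string, flags=0)  # scans the whole string and returns the first match
re.compile(pattern, flags=0)         # compiles the pattern into an object that provides the methods above

# pattern: the regular expression pattern
# string: the string to match against
# flags=0: flags. See "Regular expression modifiers - flags"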
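As a small illustration of the flags parameter, the sketch below uses re.IGNORECASE, one of the standard flags of the re module. The sample string is an assumption made up for this illustration.

```python
import re

text = 'WWW.baidu.com'  # assumed sample input with an upper-case prefix

# Default flags=0: matching is case-sensitive, so there is no match
print(re.match('www', text))  # None

# re.IGNORECASE (short form: re.I) makes the match case-insensitive
print(re.match('www', text, flags=re.IGNORECASE).group())  # WWW

# A compiled pattern takes its flags once, at compile time
www = re.compile('www', re.IGNORECASE)
print(www.search(text).group())  # WWW
```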

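Example (a failed match returns None, and None has no .span(), so calling .span() on it raises an error; m2 is therefore printed without .span()):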

import re

# .span(): position of the match   .group(): matched text
m1 = re.match('www', 'www.baidu.com').span()
m2 = re.match('com', 'www.baidu.com')  # no match at the start, so this is None

s1 = re.search('www', 'www.baidu.com').span()
s2 = re.search('com', 'www.baidu.com').span()

pattern_c = re.compile(r'\d+')  # one or more digits
c1 = pattern_c.match('阿梦520', 2, 10)  # pos=2, endpos=10 (an endpos past the end is treated as the string length)

print(m1)  # (0, 3)
print(m2)  # None
print(s1)  # (0, 3)
print(s2)  # (10, 13)
print(c1)  # <re.Match object; span=(2, 5), match='520'>
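Because a failed match is None, it is usually safer to check the result before calling .span() or .group(). The sketch below shows one way to do that. The helper name `span_or_none` is hypothetical and not part of the re module.

```python
import re


def span_or_none(pattern, text):
    """Hypothetical helper: span of a match at the start of text, or None if there is none."""
    m = re.match(pattern, text)
    return m.span() if m else None


print(span_or_none('www', 'www.baidu.com'))  # (0, 3)
print(span_or_none('com', 'www.baidu.com'))  # None

# Without the check, the failed match raises AttributeError
try:
    re.match('com', 'www.baidu.com').span()
except AttributeError as exc:
    print('AttributeError:', exc)
```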

2.2 Searching strings: findall(), finditer()

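# Note: match and search return at most one match; findall returns all of them
re.findall(pattern, string, flags=0)      # returns a list
re.finditer(pattern, string, flags=0)     # same as findall, but returns an iterator of match objects
Pattern.findall(string[, pos[, endpos]])  # method of a compiled pattern; Pattern.finditer takes the same arguments

# pattern: the regular expression pattern
# string: the original string
# pos: start position, default 0 (compiled-pattern methods only)
# endpos: end position, default is the length of the string (compiled-pattern methods only)

Example:

import re

# pos is accepted only by the methods of a compiled pattern; the third
# argument of re.findall() and re.finditer() is flags, not a start position.
pattern = re.compile(r'\d+')
f1 = pattern.findall('阿梦520 - 1314', 2)   # start searching at index 2
f2 = pattern.finditer('阿梦520 - 1314', 2)

print(f1)

for match in f2:
    print(match.group())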

Output:

['520', '1314']
520
1314
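finditer is useful when the position of each match matters, because it yields match objects rather than plain strings. The short sketch below reuses the sample string from this section and also shows how findall behaves when the pattern has more than one capturing group. The two-group pattern is an assumption chosen only for illustration.

```python
import re

text = '阿梦520 - 1314'

# finditer yields match objects, so both the text and the position are available
for m in re.finditer(r'\d+', text):
    print(m.group(), m.span())
# 520 (2, 5)
# 1314 (8, 12)

# With more than one capturing group, findall returns a list of tuples
print(re.findall(r'(\d)(\d+)', text))  # [('5', '20'), ('1', '314')]
```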

2.3 Replacing strings: sub()

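re.sub(pattern, repl, string, count=0, flags=0)

# pattern: the regular expression pattern
# repl: the replacement string (or a function that returns it)
# string: the original string
# count=0: maximum number of replacements; 0 replaces every match
# flags=0: flags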

import re

text = '阿梦：151-2345-6789'

# Replace every run of non-digit characters with the empty string
s1 = re.sub(r'\D+', '', text)
print(s1)  # 15123456789
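The count and repl parameters allow more than plain deletion. The sketch below illustrates three variations on the same kind of input: limiting the number of replacements, referring to captured groups in the replacement string, and passing a function as repl. The masking format and the helper name `double` are assumptions made up for this illustration.

```python
import re

phone = '阿梦：151-2345-6789'

# count=1 replaces only the first match
print(re.sub(r'-', '', phone, count=1))  # 阿梦：1512345-6789

# \1 and \2 in repl refer to the captured groups: keep both ends, mask the middle
print(re.sub(r'(\d{3})-\d{4}-(\d{4})', r'\1-****-\2', phone))  # 阿梦：151-****-6789


# repl can also be a function that receives the match object
def double(m):
    """Hypothetical callback: replace a number with twice its value."""
    return str(int(m.group()) * 2)


print(re.sub(r'\d+', double, 'a1 b20'))  # a2 b40
```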

2.4 Splitting strings: split()

re.split(pattern, string, maxsplit=0, flags=0)

# pattern: the regular expression pattern
# string: the original string
# maxsplit: maximum number of splits; the default 0 means no limit
# flags: flags

Example:

import re

# Split on runs of non-word characters (anything other than letters, digits and underscore)
s1 = re.split(r'\W+', '阿梦,520,阿优')

print(s1)  # ['阿梦', '520', '阿优']
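Two further behaviours of re.split are worth a quick look: maxsplit limits how many splits are made, and a capturing group in the pattern keeps the separators in the result. The sketch below reuses the sample string from the example above.

```python
import re

text = '阿梦,520,阿优'

# maxsplit=1 splits only once; the rest of the string stays together
print(re.split(r'\W+', text, maxsplit=1))  # ['阿梦', '520,阿优']

# A capturing group in the pattern keeps the separators in the result
print(re.split(r'(\W+)', text))  # ['阿梦', ',', '520', ',', '阿优']
```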