Regular Expressions: The re Module

qq_42052864

Category: Regular Expressions

Article link: https://blog.csdn.net/qq_42052864/article/details/104102126

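Regular expression matching rules

Steps for using the re module:

Use compile() to compile the regular expression string into a Pattern object.
Call one of the Pattern object's methods to search the text; a successful match returns a Match object.
Use the attributes and methods of the Match object to read the result, then carry on with whatever processing is needed.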

import re

pattern=re.compile(r'([a-z]+) ([a-z]+)',re.I) # re.I: ignore case
m=pattern.match('Hello World Wide Web')

m.group(0)
#'Hello World'
m.span(0)
#(0,11)

m.group(1)
#'Hello'
m.group(2)
#'World'
m.groups()
#('Hello','World')

pattern=re.compile(r'\d+') # raw string, so \d is not treated as an invalid string escape
m=pattern.search('one12twothree34four')
m.group()
#'12'
m=pattern.search('one12twothree34four',10,20)
m.group()
#'34'
m.span()
#(13,15)
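
The calls above differ in where they look: match() only tries the pattern at the start of the string (or at the given pos), while search() scans forward to the first place it matches. A minimal sketch of the difference follows. The name `digits` is made up for illustration, and fullmatch() is added for comparison only; it is not used elsewhere in this article.

```python
import re

digits = re.compile(r'\d+')

# match() succeeds only if the pattern matches at position 0.
at_start = digits.match('one12')
print(at_start)                      # None

# search() scans forward for the first position where the pattern matches.
first = digits.search('one12').group()
print(first)                         # 12

# fullmatch() requires the entire string to match the pattern.
whole_ok = digits.fullmatch('12') is not None
whole_bad = digits.fullmatch('12a') is not None
print(whole_ok, whole_bad)           # True False
```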

result1 = pattern.findall('hello 123456 789')
result2 = pattern.findall('one1two2three3four4', 0, 10)
print (result1)
print (result2)

#['123456', '789']
#['1', '2']
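
findall() returns only the matched strings. If the positions are needed as well, one option is finditer(), which yields a Match object for each hit. This is a small sketch; finditer() is not covered in the text above, and the variable names are illustrative.

```python
import re

digits = re.compile(r'\d+')

# Each item produced by finditer() is a Match object, so both the text
# and its span are available.
hits = [(m.group(), m.span()) for m in digits.finditer('hello 123456 789')]
print(hits)  # [('123456', (6, 12)), ('789', (13, 16))]
```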

p = re.compile(r'[\s\,\;]+')
print (p.split('a,b;; c   d'))

#['a', 'b', 'c', 'd']
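
split() also accepts a maxsplit argument that limits the number of splits; the remainder of the string is returned unsplit as the last element. A short sketch using the same separator pattern (the variable names are illustrative):

```python
import re

sep = re.compile(r'[\s,;]+')

# Split at most twice; everything after the second separator is kept as-is.
parts = sep.split('a,b;; c   d', maxsplit=2)
print(parts)  # ['a', 'b', 'c   d']
```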



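p = re.compile(r'(\w+) (\w+)') # for ASCII text \w = [A-Za-z0-9_]; on str patterns it also matches Unicode word characters
s = 'hello 123, hello 456'

print (p.sub(r'hello world', s))  # replace 'hello 123' and 'hello 456' with 'hello world'
print (p.sub(r'\2 \1', s))        # reference the captured groups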

#hello world, hello world
#123 hello, 456 hello


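def func(m):
    print(m)  # debug: prints each Match object (these lines are omitted from the output shown below)
    return 'hi' + ' ' + m.group(2) # group(0) is the whole match, group(1) is 'hello', group(2) is the number that follows

print (p.sub(func, s))  # func is called once per match with the Match object; its return value is the replacement
print (p.sub(func, s, 1))         # replace at most once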

#hi 123, hi 456
#hi 123, hello 456

# Chinese characters mostly fall in the Unicode range [\u4e00-\u9fa5]

title = '你好，hello，世界'
pattern = re.compile(r'[\u4e00-\u9fa5]+')
result = pattern.findall(title)

print (result)

#['你好', '世界']

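Note: greedy vs. non-greedy matching

Greedy: provided the overall expression still matches, a quantifier such as * matches as many characters as possible.
Non-greedy: provided the overall expression still matches, a quantifier followed by ? (for example *?) matches as few characters as possible.
Quantifiers in Python are greedy by default.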
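
A small sketch of the difference, using a made-up HTML fragment as input: the only change between the two patterns is the ? after the quantifier.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .+ runs to the end of the string and backtracks to the last '>'.
greedy = re.findall(r'<.+>', html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .+? stops at the first '>' that lets the match succeed.
lazy = re.findall(r'<.+?>', html)
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```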

Additional examples

import re

# 1. Extract URLs
s ='<a href="https://geekori.com">极客起源</a> <a href="https://www.microsoft.com">微软</a>'
result = re.findall('<a[^>]*href="([^>]*)">',s,re.I)
print(result) # ['https://geekori.com', 'https://www.microsoft.com']


# 2. Extract a mobile phone number
# Sample text (Chinese): "My phone number is: 18612345678"
m = re.search(r'1\d{10}','我的手机号是：18612345678')
if m is not None:
    print(m.group()) # 18612345678
    print(m.start()) # 7
    print(m.end()) # 18
    
    
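# 3. Extract floating-point numbers (including negatives)
r'''
1. Regular expression for a float:  -?\d+(\.\d+)?
2. Format the float: format
3. Replace the original floats: sub / subn
'''
def fun(matched):
    # Reformat each matched number to two decimal places
    return format(float(matched.group()),'0.2f')
result = re.subn(r'-?\d+(\.\d+)?',fun,'PI is 3.141592654, e is 2.71828183.  -0.2 + 1.3 = 1.1')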
print(result) # ('PI is 3.14, e is 2.72.  -0.20 + 1.30 = 1.10', 5)
print(result[0]) # PI is 3.14, e is 2.72.  -0.20 + 1.30 = 1.10
print(result[1]) #5


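# 4. Extract email addresses
# Sample text (Chinese): "My email address is abcd@163.com, what is yours? Is it xyz@122.net? Or is it ccc@125.org"
s = '我的Email地址是abcd@163.com，你的Email是多少呢？是xyz@122.net吗？ 或者是ccc@125.org'
prefix = r'[0-9a-zA-Z]+@[0-9a-zA-Z]+\.'
result = re.findall(prefix + 'com|' + prefix + 'net',s,re.I)  # only .com and .net addresses are matched
print(result) # ['abcd@163.com', 'xyz@122.net']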
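
The alternation in the email example can also be written with a group instead of repeating the prefix. This is a sketch under the same assumption that only .com and .net addresses are wanted; the sample string and variable names are illustrative. Note that findall() returns the group contents when the pattern has a capturing group, so a non-capturing group (?:...) is used to get the full address back.

```python
import re

s = 'abcd@163.com, xyz@122.net, ccc@125.org'

# (?:com|net) groups the alternatives without creating a capture group,
# so findall() still returns the whole match.
emails = re.findall(r'[0-9a-zA-Z]+@[0-9a-zA-Z]+\.(?:com|net)\b', s, re.I)
print(emails)  # ['abcd@163.com', 'xyz@122.net']

# With a capturing group, findall() returns only what the group matched.
suffixes = re.findall(r'[0-9a-zA-Z]+@[0-9a-zA-Z]+\.(com|net)\b', s, re.I)
print(suffixes)  # ['com', 'net']
```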


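# 5. Extract a date
s = 'Today is 2013-12-01.'
# search() returns only the date; match() with '.*' on both sides would return the whole line
m = re.search(r'\d{4}-\d{2}-\d{2}',s)

if m is not None:
    print(m.group()) # 2013-12-01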
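
To pull out the date and its individual parts rather than the whole line, one option is search() with named groups. A minimal sketch; the group names year, month and day are chosen here for illustration.

```python
import re

s = 'Today is 2013-12-01.'

# (?P<name>...) defines a named group that can be read back by name.
date_match = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', s)
if date_match is not None:
    print(date_match.group())        # 2013-12-01
    print(date_match.group('year'))  # 2013
    print(date_match.groupdict())    # {'year': '2013', 'month': '12', 'day': '01'}
```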

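PS: sub and subn both replace every substring that matches the pattern. sub returns only the resulting string, while subn returns a tuple whose first element is the resulting string and whose second element is the number of replacements made.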
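
A short side-by-side sketch of the two return values, using a made-up input string:

```python
import re

text = 'a1b22c333'

# sub() returns just the new string.
replaced = re.sub(r'\d+', '#', text)
print(replaced)         # a#b#c#

# subn() returns (new_string, number_of_replacements).
replaced_n = re.subn(r'\d+', '#', text)
print(replaced_n)       # ('a#b#c#', 3)
```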
