Regular Expressions in Python

Xavier丶Zeng

Article link: https://blog.csdn.net/qq583083658/article/details/87968456

Common regular expression functions in Python

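No.	Function	Meaning
1	re.match('regex', content)	Matches the text content against the regular expression regex, starting from the beginning of the string
2	result.group(number)	Returns the text matched by a subexpression (group); number is the group's position in the regular expression, and 0 or no argument returns the whole match
3	result.span()	Returns the (start, end) index range of the match within the searched text
4	Flags such as re.S and re.I	re.S makes . match every character, including newlines; re.I makes matching case-insensitive
5	re.search('regex', content, re.S)	Does not anchor at the start; scans the whole string and returns the first successful match
6	re.compile('regex')	Returns a compiled regular expression object that can be reused later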
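
To make the effect of the flags in row 4 concrete, here is a minimal sketch. The sample string and variable names are made up for illustration; flags are combined with the | operator.

```python
import re

text = 'Hello\nWORLD'

# Without flags: '.' stops at the newline and case must agree, so there is no match
no_flags = re.match(r'hello.world', text)

# re.I ignores case and re.S lets '.' match the newline; combine flags with |
with_flags = re.match(r'hello.world', text, re.I | re.S)

print(no_flags)
print(repr(with_flags.group()))
```

The first call returns None, while the second matches the whole two-line string.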

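1. re.match('regex', content)

Matches the text content against the regular expression regex, starting at the beginning of the string, and returns a match object on success (None otherwise).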

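import re
content = 'Hello 123 4567 World_This is a Regex Demo'
# Raw string so \s and \d reach the regex engine unchanged; re.I lets "world" match "World"
result = re.match(r'^Hello\s(\d{3}\s\d{4})\sworld', content, re.I)
print(result)           # the match object
print(result.group())   # the whole match
print(result.group(1))  # the first parenthesized group
print(result.span())    # (start, end) indices of the match in content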

Output (Python 3.7 and later; earlier versions print _sre.SRE_Match instead of re.Match):

<re.Match object; span=(0, 20), match='Hello 123 4567 World'>
Hello 123 4567 World
123 4567
(0, 20)
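
Because re.match returns None when the pattern does not match, calling group() on the result without checking raises AttributeError. The following sketch reuses the sample string but drops re.I to force a failed match; the variable message is only illustrative.

```python
import re

content = 'Hello 123 4567 World_This is a Regex Demo'

# No re.I here, so lowercase "world" does not match "World" and match() returns None
result = re.match(r'^Hello\s(\d{3}\s\d{4})\sworld', content)

if result is None:
    message = 'no match'
else:
    message = result.group(1)

print(message)
```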

2. re.search('regex', content, re.S)

Unlike re.match, it does not have to match at the start: it scans the whole string and returns the first successful match.

import re
content = '''http://www.forta.com/
https://mail.forta.com/
ftp://ftp.forta.com/'''
# (?=:) is a lookahead: the match must be followed by ':' but does not include it
result = re.search(r'.+(?=:)', content)
print(result)
print(result.group())
print(result.span())

Output (Python 3.7 and later):

<re.Match object; span=(0, 4), match='http'>
http
(0, 4)
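
The difference between the two functions is easiest to see when the same pattern is given to both. This is a small sketch on the sample string from section 1; the names at_start and anywhere are illustrative.

```python
import re

content = 'Hello 123 4567 World_This is a Regex Demo'

# match() only tries position 0, where "H" is not a digit
at_start = re.match(r'\d{3}\s\d{4}', content)

# search() scans forward and finds the digits starting at index 6
anywhere = re.search(r'\d{3}\s\d{4}', content)

print(at_start)
print(anywhere.group(), anywhere.span())
```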

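3. re.findall('regex', content, re.S)

Scans the whole string and returns a list of all successful matches. When the pattern contains several groups, each list element is a tuple of the group texts.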


import re
content = '''
<BODY>
<H1>Welcome to my Homepage</H1>
Content is divided into two sections:<BR>
<H2>ColdFusion</H2>
Information about Macromedia ColdFusion.
<H2>Wireless</H2>
Information about Bluetooth, 802.11, and more.
<H2>This is not valid HTML</H3>
</BODY>
'''
# Three groups: opening tag name, heading text, closing tag name
result = re.findall(r'<([Hh][1-6])>(.*?)</([Hh][1-6])>', content)
if result:
    print(result)

for element in result:
    print(element)

Output:

[('H1', 'Welcome to my Homepage', 'H1'), ('H2', 'ColdFusion', 'H2'), ('H2', 'Wireless', 'H2'), ('H2', 'This is not valid HTML', 'H3')]
('H1', 'Welcome to my Homepage', 'H1')
('H2', 'ColdFusion', 'H2')
('H2', 'Wireless', 'H2')
('H2', 'This is not valid HTML', 'H3')

Note that the last tuple pairs an opening H2 with a closing H3: the pattern does not require the two tag names to agree.
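
The pattern above accepts the mismatched pair of an opening H2 and a closing H3. One way to reject it, sketched here on a shortened sample of my own, is a backreference inside the pattern: \1 must repeat exactly what the first group captured.

```python
import re

content = '''
<H1>Welcome</H1>
<H2>ColdFusion</H2>
<H2>This is not valid HTML</H3>
'''

# \1 must repeat the tag name captured by group 1, so <H2>...</H3> is rejected
pattern = r'<([Hh][1-6])>(.*?)</\1>'
headings = re.findall(pattern, content)
print(headings)
```

Only the two well-formed headings are returned, each as a (tag, text) tuple.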

4. re.sub('regex', 'replace-string', content)

Replaces the text that matches the pattern.

import re
content = '''Hello, ben@forta.com is my  email address'''

# Python's re.sub treats $1 in the replacement as literal text
result = re.sub(r'(\w+[\w\.]*@[\w\.]+\.\w+)', '<A HREF="MAILTO:$1">$1</A>', content)
print(result)

Output:

Hello, <A HREF="MAILTO:$1">$1</A> is my  email address

The $1 syntax comes from other regex flavors, and Python's re.sub does not recognize it. In Python, a backreference in the replacement string is written \1 (or \g<1>), preferably inside a raw string:

import re
content = '''Hello, ben@forta.com is my  email address'''
# \1 in the replacement stands for the text captured by group 1
subResult = re.sub(r'(\w+[\w\.]*@[\w\.]+\.\w+)', r'<A HREF="mailto:\1">\1</A>', content)
print(subResult)

Output:

Hello, <A HREF="mailto:ben@forta.com">ben@forta.com</A> is my  email address
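
When the replacement depends on each individual match, re.sub also accepts a function in place of the replacement string. It is called once per match and receives the match object. The sketch below is illustrative only: the helper to_link and the second address are made up for the example.

```python
import re

content = 'Contact ben@forta.com or SUPPORT@Forta.com'

def to_link(match):
    # match.group(1) is the address captured by the parentheses
    address = match.group(1)
    return '<A HREF="mailto:' + address.lower() + '">' + address + '</A>'

# Every address is replaced using its own matched text
linked = re.sub(r'(\w+[\w.]*@[\w.]+\.\w+)', to_link, content)
print(linked)
```

Because the function runs for every match, each address gets its own link, which a single pre-computed replacement string could not do.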

5. re.compile('regex')

Returns a compiled regular expression object that can be reused later.


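import re
content1 = '''2016-12-15 12:00'''
content2 = '''2016-12-17 12:55'''
content3 = '''2016-12-22 13:21'''
# Compile once, then reuse the same pattern object to strip the HH:MM time from each string
pattern = re.compile(r'\d{2}:\d{2}')
result1 = re.sub(pattern, '', content1)
result2 = re.sub(pattern, '', content2)
result3 = re.sub(pattern, '', content3)
# Each result keeps the date followed by the space that preceded the time
print(result1, result2, result3)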
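
A compiled pattern object also has its own sub, search and findall methods, so it does not need to be passed back to the module-level functions. A minimal sketch with the same three timestamps follows; the names time_pattern, dates and times are illustrative.

```python
import re

# Compiled once; the pattern object carries its own sub/search/findall methods
time_pattern = re.compile(r'\d{2}:\d{2}')

lines = ['2016-12-15 12:00', '2016-12-17 12:55', '2016-12-22 13:21']

# Remove the time, then strip the space left behind
dates = [time_pattern.sub('', line).strip() for line in lines]
# Extract only the time part
times = [time_pattern.search(line).group() for line in lines]

print(dates)
print(times)
```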