Python Regular Expressions 1.3: Regular Expressions and Python (Part 1)

中文过六级再取名

Column: Python Core Programming. Tags: python, regular expressions

Article link: https://blog.csdn.net/w666667/article/details/103751177

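Regular expressions and Python

In Python, regular expressions are supported through the re module. The re module supports the more powerful and more general Perl-style regular expressions, allows multiple threads to share the same compiled regular expression object, and supports named subgroups.

Compiling regular expressions

Before pattern matching can take place, the regular expression pattern must be compiled into a regular expression object. Because a pattern is usually compared against many strings during execution, it is best to precompile it once and reuse the result.
The rough flow is:
regular expression pattern -> (precompile) -> regular expression object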

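The compile() function

Compilation is performed with the compile() function.
Its signature and purpose:

compile(pattern, flags=0)
# Compiles the regular expression pattern with any optional flags and returns a regular expression object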
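As a minimal sketch of the precompile-then-reuse idea described above: the pattern, the sample strings, and the variable names below are illustrative assumptions, not taken from the original post.

```python
import re

# Compile once; the pattern object can then be reused for many matches.
word_pattern = re.compile(r'\w+')  # hypothetical example pattern

samples = ['hello world', '!!!', 'regex_101']
results = []
for text in samples:
    m = word_pattern.match(text)  # same result as re.match(r'\w+', text)
    results.append(m.group() if m is not None else None)
print(results)

# The optional flags argument changes how the pattern is interpreted.
ci_pattern = re.compile(r'hello', flags=re.IGNORECASE)
ci_match = ci_pattern.match('HELLO world')
print(ci_match.group())
```

The module-level functions such as re.match() also accept a pattern string directly, so calling compile() explicitly is mostly a convenience when the same pattern is used repeatedly.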

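Match objects and the group() and groups() methods

Match object: as the name suggests, this is the object returned by a successful call to match() or search().
A match object has two main methods:
group(): returns the entire match, or a specific subgroup when given its number
groups(): returns a tuple containing all matched subgroups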
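The difference between the two methods can be sketched as follows; the pattern and the input string are made up for illustration.

```python
import re

# Two subgroups: a word and a number, separated by a space.
m = re.match(r'(\w+) (\d+)', 'item 42 left')

whole = m.group()        # the entire match
first = m.group(1)       # the first subgroup
second = m.group(2)      # the second subgroup
all_groups = m.groups()  # a tuple of all subgroups
print(whole, first, second, all_groups)

# A pattern without parentheses has no subgroups, so groups() is empty.
no_groups = re.match(r'\w+', 'item').groups()
print(no_groups)
```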

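Matching strings with match()

The match() function tries to match the pattern at the start of the string. On success it returns a match object; on failure it returns None. The match object's group() method can then be used to display the successful match.

Example: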

import re

def re_match():
    content = 'Hello 123 4567 World_This is a Regex Demo'
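    # Raw string, so that \s, \d and \w reach the regex engine unchanged.
    result = re.match(r'^Hello\s\d\d\d\s\d{4}\s\w{10}.*Demo$', content)
    # ^ anchors the match at the start of the string, followed by the literal Hello
    # \s matches any whitespace character; \d matches any digit
    # 4567 comes next; writing \d four times is tedious, so use \d{4}
    # \w matches a letter, a digit, or an underscore
    # . matches any single character except a newline
    # * matches zero or more repetitions of the preceding expression
    # so .* matches any run of characters other than newlines
    # Demo$ requires the string to end with Demo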
    print(len(content))
    print(result)
    print(result.group())
    print(result.span())

if __name__ == '__main__':
    re_match()

Output:

41
<re.Match object; span=(0, 41), match='Hello 123 4567 World_This is a Regex Demo'>
Hello 123 4567 World_This is a Regex Demo
(0, 41)

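Finding a pattern within a string with search() (searching vs. matching)

search() works exactly like match(), except that it scans its string argument and looks for the first occurrence of the given pattern at any position. If the search finds a match, it returns a match object; otherwise it returns None.

Example: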

import re

def re_search():
    content = 'The food is tasted like foot !'
    result = re.search('foo', content)
    print(len(content))
    print(result)
    print(result.group())
    print(result.span())

if __name__ == '__main__':
    re_search()

Output:

30
<re.Match object; span=(4, 7), match='foo'>
foo
(4, 7)

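As this shows, search() scans the string strictly from left to right and reports only the first place the pattern occurs. Observant readers will notice that I intended to find both occurrences of 'foo', yet only the position of the first one is shown. Searching for more than one string is the subject of the topics that follow.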
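For the specific case above, where every occurrence of 'foo' is wanted, one option is the re module's findall() and finditer() functions. This post has not introduced them, so the sketch below is an aside rather than part of the original walkthrough.

```python
import re

content = 'The food is tasted like foot !'

# findall() returns every non-overlapping match as a list of strings.
all_matches = re.findall('foo', content)
print(all_matches)

# finditer() yields a match object per occurrence, so span() is available.
spans = [m.span() for m in re.finditer('foo', content)]
print(spans)
```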

Matching more than one string

Example:

>>> import re
>>> bt = 'bat|bet|bit'  # pattern: bat, bet, or bit
>>> m = re.match(bt, 'bat')  # 'bat' matches
>>> if m is not None: m.group()
...
'bat'
>>> m = re.match(bt, 'blt')  # no match for 'blt'
>>> if m is not None: m.group()
...
>>> m = re.match(bt, 'He bit me!')  # match() fails: 'bit' is not at the start
>>> if m is not None: m.group()
...
>>> m = re.search(bt, 'He bit me!')  # search() finds 'bit'
>>> if m is not None: m.group()
...
'bit'

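Matching any single character

Next, I use code to show that the dot (.) cannot match a newline \n, nor can it match "no character", that is, the empty string.

Example 1: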

import re

anyend = '.end'

def re_match_1():
    m = re.match(anyend,'bend')
    print('-----1------')
    if m is not None:
        print(m.group())
    
def re_match_2():
    m = re.match(anyend,'end')
    print('-----2------')
    if m is not None:
        print(m.group())
    
def re_match_3():
    m = re.match(anyend,'\nend')
    print('-----3------')
    if m is not None:
        print(m.group())
        
def re_search_1():
    m = re.search(anyend,'The end.')
    print('-----4------')
    if m is not None:
        print(m.group())
        
if __name__ == '__main__':
    re_match_1()
    re_match_2()
    re_match_3()
    re_search_1()

Output:

-----1------
bend
-----2------
-----3------
-----4------
 end

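From this we can see:
The first match() succeeds: the dot matches b;
The second match() fails because there is no character before end for the dot to match;
The third match() fails because the character before end is a newline \n, which the dot does not match;
The final search() succeeds: the dot matches the space before end.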
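The newline case is a default rather than a hard limit. As a small aside that the original post does not cover, the re.DOTALL flag makes the dot match a newline as well:

```python
import re

anyend = '.end'

# By default the dot does not match '\n', so this returns None.
default_result = re.match(anyend, '\nend')
print(default_result)

# With re.DOTALL the dot matches any character, including '\n'.
dotall_result = re.match(anyend, '\nend', re.DOTALL)
print(repr(dotall_result.group()))
```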

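Example 2:
The following example searches for a real period (decimal point) in a regular expression; we escape the dot with a backslash to turn off its special meaning: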

import re

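patt314 = '3.14'    # unescaped dot: matches any character in that position
pi_patt = r'3\.14'  # escaped dot (raw string): matches only a literal period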


def re_match_1():
    m = re.match(pi_patt,'3.14')
    print('-----1------')
    if m is not None:
        print(m.group())
    
def re_match_2():
    m = re.match(patt314,'3014')
    print('-----2------')
    if m is not None:
        print(m.group())
    
def re_match_3():
    m = re.match(patt314,'3.14')
    print('-----3------')
    if m is not None:
        print(m.group())

if __name__ == '__main__':
    re_match_1()
    re_match_2()
    re_match_3()

Output:

-----1------
3.14
-----2------
3014
-----3------
3.14

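From this we can see:
The first match() is an exact match: the escaped dot matches only the literal period;
In the second match() the unescaped dot matches 0;
In the third match() the unescaped dot matches '.', since it matches any character, a literal period included.

Creating character classes ([ ])

The following example shows that alternation with | is more restrictive than a character class written with [].
Example: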

import re

def re_match_1():
    m = re.match('[cr][23][dp][o2]','c3po')
    print('-----1------')
    if m is not None:
        print(m.group())
    
def re_match_2():
    m = re.match('[cr][23][dp][o2]','c2do')
    print('-----2------')
    if m is not None:
        print(m.group())
        
def re_match_3():
    m = re.match('r2d2|c2po','c2do')
    print('-----3------')
    if m is not None:
        print(m.group())

def re_match_4():
    m = re.match('r2d2|c2po','r2d2')
    print('-----4------')
    if m is not None:
        print(m.group())
        
if __name__ == '__main__':
    re_match_1()
    re_match_2()
    re_match_3()
    re_match_4()

Output:

-----1------
c3po
-----2------
c2do
-----3------
-----4------
r2d2

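Repetition, special characters, and grouping

The most common aspects of regular expressions are the use of special characters, the repetition of patterns, and the use of parentheses to group and extract parts of the match.

First, here is a simple regular expression for an email address:

\w+@\w+\.com

Suppose we want to match more email addresses than this regular expression allows, for example by supporting an optional hostname in front of the domain (such as www.xxx.com) instead of allowing only xxx.com as the whole domain.
Example code: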

import re
def re_match():
    patt=r'\w+@(\w+\.)?\w+\.com'    # (\w+\.)? makes the hostname part optional
    m=re.match(patt,'nobody@xxx.com')
    print(m.group())

if __name__ == '__main__':
    re_match()

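However, \w alone cannot match every address, because a domain name may contain a hyphen, as in xxx-yyy.com, and \w does not match a hyphen. The simplified example below shows a literal hyphen being matched between three alphanumeric characters and three digits: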

import re
# A literal hyphen sits between three \w characters and three digits.
m = re.match(r'\w\w\w-\d\d\d', 'abc-123')
if m is not None:
    print(m.group())

Result: abc-123
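Carrying the hyphen idea back to the email pattern, one possible adjustment is to replace \w with the character class [\w-]. This is only a sketch under that assumption; the pattern names and addresses below are hypothetical, and real email validation is considerably more involved.

```python
import re

# The pattern from the earlier email example: \w does not match '-'.
plain_patt = r'\w+@(\w+\.)?\w+\.com'
# Assumed variant: [\w-] also accepts a hyphen in each part.
hyphen_patt = r'[\w-]+@([\w-]+\.)?[\w-]+\.com'

addr = 'nobody@xxx-yyy.com'
plain_result = re.match(plain_patt, addr)    # fails at the hyphen
hyphen_result = re.match(hyphen_patt, addr)  # succeeds
print(plain_result)
print(hyphen_result.group())

# The optional hostname part still works with the variant.
host_result = re.match(hyphen_patt, 'nobody@www.xxx-yyy.com')
print(host_result.group())
```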

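To let this regular expression extract the alphanumeric string and the digits separately, we can use grouping. The example below shows how to use group() to access each individual subgroup and groups() to get a tuple containing all matched subgroups.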

import re


def re_match():
    patt=r'(\w\w\w)-(\d\d\d)'
    m=re.match(patt,'abc-123')
    for i in range(0,3):    # group(0) is the whole match; group(1) and group(2) are the subgroups
        print(m.group(i))
    print(m.groups())    # finally, print the tuple of all subgroups

if __name__ == '__main__':
    re_match()

Output:

abc-123
abc
123
('abc', '123')
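The re module also supports named subgroups. As a short sketch, the same pattern can be written with (?P<name>...) so that each subgroup is read by name; the group names letters and digits are chosen here purely for illustration.

```python
import re

# Same pattern as above, with each subgroup given a (hypothetical) name.
patt = r'(?P<letters>\w\w\w)-(?P<digits>\d\d\d)'
m = re.match(patt, 'abc-123')

letters = m.group('letters')  # access a subgroup by name
digits = m.group('digits')
by_number = m.group(1)        # numeric access still works
named = m.groupdict()         # all named subgroups as a dict
print(letters, digits, by_number, named)
```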

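Matching the start and end of a string and word boundaries

The following example highlights the regular expression operators that denote position. These operators are used more often for searching than for matching, because match() always starts at the beginning of the string.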

import re

def re_search1():
    demo1=('The end.','end. The')
    demo2=('bite the dog','bitethe dog')
    for i in range(0,2):
        m=re.search('^The', demo1[i])
        if m is not None:
            print(m.group(),'|')
    for i in range(0,2):
        m=re.search(r'\bthe',demo2[i])
        if m is not None:
            print(m.group(),'|')
    for i in range(0,2):
        m=re.search(r'\Bthe',demo2[i])
        if m is not None:
            print(m.group(),'|')
if __name__ == '__main__':
    re_search1()

Output:

The |
the |
the |
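Because the output above does not show which input produced each line, the sketch below pairs every pattern with its input and records whether search() found a match. The `The$` case is an added assumption to illustrate the end-of-string anchor, which the listing above does not exercise.

```python
import re

checks = [
    (r'^The', 'The end.'),      # The at the start of the string
    (r'^The', 'end. The'),      # The is not at the start
    (r'The$', 'end. The'),      # The at the end of the string (added case)
    (r'\bthe', 'bite the dog'),  # the begins at a word boundary
    (r'\bthe', 'bitethe dog'),   # the is inside a word, no boundary before it
    (r'\Bthe', 'bitethe dog'),   # \B requires that there is no boundary
]

outcomes = [(p, s, re.search(p, s) is not None) for p, s in checks]
for pattern, text, found in outcomes:
    print(f'{pattern!r:10} {text!r:16} matched={found}')
```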