Python: Regular Expression Basics

Latest recommended article published 2020-11-20 15:36:17

伪装的TA

Category: Python

Article link: https://blog.csdn.net/qq_42543301/article/details/81141420

1. The re.match function

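# Import the re module
import re
r'''
re.match function
Prototype: match(pattern, string, flags=0)
pattern: the regular expression to match
string: the string to match against
flags: flags that control how the regular expression is matched
re.I  ignore case
re.L  locale-dependent matching (in Python 3 this is only valid with bytes patterns)
re.M  multi-line mode; affects ^ and $
re.S  make . match every character, including newline
re.U  interpret characters according to the Unicode character set; affects \w \W \b \B (already the default for str patterns in Python 3)
re.X  verbose mode: whitespace and # comments inside the pattern are ignored, so the pattern can be laid out more readably
Purpose: try to match the pattern at the start of the string; if the pattern does not match at the start position, match returns None, even if it would match later in the string
'''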
#www.baidu.com
print(re.match("www","www.baidu.com"))
print(re.match("www","ww.baidu.com"))
print(re.match("www","baidu.wwwcom"))
print(re.match("www","wwW.baidu.com"))
print(re.match("www","wwW.baidu.com",flags=re.I))
# match does not scan the whole string; it only succeeds when the pattern matches at the start position
print('-----------------------------------------')
print(re.match("www","www.baidu.com").span())
print(re.match("www","wwW.baidu.com",flags=re.I))

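Output (from an older Python 3 release; Python 3.7 and later print <re.Match object; ...> instead of <_sre.SRE_Match object; ...>):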

<_sre.SRE_Match object; span=(0, 3), match='www'>
None
None
None
<_sre.SRE_Match object; span=(0, 3), match='wwW'>
-----------------------------------------
(0, 3)
<_sre.SRE_Match object; span=(0, 3), match='wwW'>
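The difference between matching at the start and matching anywhere, and the effect of a flag such as re.S, can be sketched as follows. This is a minimal illustration rather than part of the original listing; the variable names (url, two_lines, and so on) are made up for the example, and re.fullmatch is brought in only for comparison.

```python
import re

url = "www.baidu.com"

# match only succeeds at position 0; search finds the first match anywhere.
at_start = re.match("baidu", url)    # "baidu" is not at the start
anywhere = re.search("baidu", url)   # found further into the string

# fullmatch requires the whole string to match the pattern.
whole = re.fullmatch(r"www\..+", url)

# A failed match returns None, so check before calling methods on the result.
span = anywhere.span() if anywhere is not None else None

# Without re.S, "." stops at a newline; with re.S it matches the newline too.
two_lines = "first\nsecond"
without_s = re.match(".*", two_lines).group()
with_s = re.match(".*", two_lines, flags=re.S).group()

print(at_start, span, whole.group(), repr(without_s), repr(with_s))
```

Checking for None before calling span() or group() avoids an AttributeError when the pattern does not match.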

2. The re.search function

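'''
re.search function
Prototype: search(pattern, string, flags=0)
Parameters:
pattern: the regular expression to match
string: the string to match against
flags: flags that control how the regular expression is matched
Purpose: scan through the string and return the first successful match (None if there is no match)

'''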
print(re.search("tom","good man is tom!tom is nice "))

Output:

<_sre.SRE_Match object; span=(12, 15), match='tom'>
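Since re.search stops at the first hit, the second "tom" in the string above is never reported. A small sketch of reading the match object and, as an assumption about what a reader might want next, collecting every position with re.finditer (which the original post does not cover):

```python
import re

text = "good man is tom!tom is nice "

# search returns a match object for the first occurrence only.
first = re.search("tom", text)
if first is not None:
    first_info = (first.group(), first.start(), first.end())
else:
    first_info = None

# finditer yields one match object per non-overlapping occurrence.
all_spans = [m.span() for m in re.finditer("tom", text)]

print(first_info)
print(all_spans)
```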

3. The re.findall() function

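'''
re.findall() function
Prototype: findall(pattern, string, flags=0)
Parameters:
pattern: the regular expression to match
string: the string to match against
flags: flags that control how the regular expression is matched
Purpose: scan through the whole string and return all non-overlapping matches as a list

'''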
print(re.findall("tom","good man is tom!Tom is nice ",flags=re.I))

Output:

['tom', 'Tom']
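One detail worth knowing about findall is that its return value changes when the pattern contains capturing groups. The sketch below reuses the string from the example above; the patterns with groups are illustrative additions, not from the original post.

```python
import re

text = "good man is tom!Tom is nice "

# No groups: each list item is the whole match.
no_group = re.findall(r"tom", text, flags=re.I)

# One capturing group: each item is the text of that group only.
one_group = re.findall(r"(t)om", text, flags=re.I)

# Several groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"((t)om)", text, flags=re.I)

# A non-capturing group (?:...) groups without changing the result shape.
non_capturing = re.findall(r"(?:t)om", text, flags=re.I)

print(no_group, one_group, two_groups, non_capturing)
```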

4. Matching single characters and digits

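print('------------Matching single characters and digits-------------')
r'''
.               matches any character except a newline
[0123456789]    [] is a character set; it matches any one of the characters inside the brackets
[tom]           matches any one of 't', 'o', 'm'
[a-z]           matches any lowercase letter
[A-Z]           matches any uppercase letter
[0-9]           matches any digit, same as [0123456789]
[0-9a-zA-Z]     matches any digit or letter
[0-9a-zA-Z_]    matches any digit, letter, or underscore
[^tom]          matches any character other than 't', 'o', 'm'
                a ^ at the start of a bracket set is called a caret and negates the set
[^0-9]          matches any non-digit character
\d              matches a digit; same as [0-9] for ASCII text (for str patterns it also matches other Unicode decimal digits unless re.ASCII is set)
\D              matches a non-digit character, the complement of \d
\w              matches a digit, letter, or underscore; same as [0-9a-zA-Z_] for ASCII text
\W              matches any character that is not a digit, letter, or underscore; same as [^0-9a-zA-Z_] for ASCII text
\s              matches any whitespace character (space, newline, carriage return, form feed, tab, vertical tab); same as [ \f\n\r\t\v] for ASCII text
\S              matches any non-whitespace character; same as [^ \f\n\r\t\v] for ASCII text

'''
print(re.search("[0123456789]","tom is a good man 8"))
print(re.findall("[^0-9]","tom is a good man 8"))
print(re.findall(r"\d","tom is a good man 8"))
print(re.findall(r"\D","tom is a good man 8"))
print(re.findall(r"\w","_tom is a good man 8"))
print(re.findall(r"\W","_tom is a good man 8 /"))
print(re.findall(r"\d","_tom is 6a go8od man 8"))

Output:

------------Matching single characters and digits-------------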
<_sre.SRE_Match object; span=(18, 19), match='8'>
['t', 'o', 'm', ' ', 'i', 's', ' ', 'a', ' ', 'g', 'o', 'o', 'd', ' ', 'm', 'a', 'n', ' ']
['8']
['t', 'o', 'm', ' ', 'i', 's', ' ', 'a', ' ', 'g', 'o', 'o', 'd', ' ', 'm', 'a', 'n', ' ']
['_', 't', 'o', 'm', 'i', 's', 'a', 'g', 'o', 'o', 'd', 'm', 'a', 'n', '8']
[' ', ' ', ' ', ' ', ' ', ' ', '/']
['6', '8', '8']
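The shorthand classes \d and \w are only equivalent to the explicit ASCII sets when the text is ASCII or the re.ASCII flag is used. A minimal sketch of the difference, using a made-up sample string that contains a full-width digit (U+FF18) and an accented letter (U+00E9):

```python
import re

# "a_1", a full-width digit eight, and "e" with an acute accent.
text = "a_1 \uff18 \u00e9"

# Default (Unicode) behaviour for str patterns.
unicode_digits = re.findall(r"\d", text)
unicode_word = re.findall(r"\w", text)

# With re.ASCII the classes shrink to their ASCII-only meaning.
ascii_digits = re.findall(r"\d", text, flags=re.ASCII)
ascii_word = re.findall(r"\w", text, flags=re.ASCII)

# The explicit set gives the same result as \w under re.ASCII.
explicit_word = re.findall(r"[0-9a-zA-Z_]", text)

print(unicode_digits, ascii_digits)
print(unicode_word, ascii_word, explicit_word)
```

Writing the patterns as raw strings (r"\d") also keeps Python from treating the backslash as a string escape.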

5. Anchors

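print('---------------Anchors (boundary characters)--------------------')

r'''
^           matches at the start of the string (and, with re.M, at the start of every line); not the same meaning as ^ inside []
$           matches at the end of the string (and, with re.M, at the end of every line)
\A          matches only at the start of the string; unlike ^, it does not match at the start of other lines even in re.M mode
\Z          matches only at the end of the string; unlike $, it does not match at the end of other lines even in re.M mode
\b          matches a word boundary, i.e. the position between a \w character and a \W character or the start/end of the string
            r'er\b' matches in never but not in nerve
\B          matches a position that is not a word boundary
'''
print(re.search("^tom","tom is a good man"))
print(re.search("man$","tom is a good man"))

print(re.findall(r"\Atom","tom is a good man\ntom is a nice man",re.M))
print(re.findall("^tom","tom is a good man\ntom is a nice man",re.M))

print(re.findall("man$","tom is a good man\ntom is a nice man",re.M))
print(re.findall(r"man\Z","tom is a good man\ntom is a nice man",re.M))

print(re.search(r"er\b","never"))
print(re.search(r"er\b","nerve"))
print(re.search(r"er\B","never"))
print(re.search(r"er\B","nerve"))

Output:

---------------Anchors (boundary characters)--------------------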
<_sre.SRE_Match object; span=(0, 3), match='tom'>
<_sre.SRE_Match object; span=(14, 17), match='man'>
['tom']
['tom', 'tom']
['man', 'man']
['man']
<_sre.SRE_Match object; span=(3, 5), match='er'>
None
None
<_sre.SRE_Match object; span=(1, 3), match='er'>
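Two points about anchors are easy to miss: \b must be written in a raw string, and $ also matches just before a trailing newline while \Z does not. The sketch below illustrates both with made-up sample strings.

```python
import re

# In a normal string "\b" is a single backspace character;
# the raw string keeps the backslash so the regex engine sees \b.
plain_len = len("\b")
raw_len = len(r"\b")

# \b is a boundary between a word character and any non-word character,
# so punctuation counts too, not only spaces.
whole_words = re.findall(r"\bman\b", "man, woman; man!")

# $ matches before a trailing newline; \Z only at the very end.
dollar = re.search(r"man$", "good man\n")
end_z = re.search(r"man\Z", "good man\n")

print(plain_len, raw_len, whole_words, dollar is not None, end_z)
```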

6. Matching multiple characters

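print('----------------------Matching multiple characters--------------')
'''
Note: below, x, y, and z stand for ordinary characters and n, m are non-negative integers; none of them are regex metacharacters
(xyz)    matches xyz inside the parentheses as a single unit (a group)
x?       matches zero or one x
x*       matches zero or more x (.* matches zero or more of any character except newline)
x+       matches one or more x
x{n}     matches exactly n x (n is a non-negative integer)
x{n,}    matches at least n x
x{n,m}   matches at least n and at most m x; note that n <= m
x|y      | means "or": matches x or y
'''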
print(re.findall(r"(tom)","tomgood is a good man,tom is a nice man"))
print(re.findall(r"o?","tomgood is a good man,tom is a nice man"))
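print(re.findall(r"a*","aaa"))# greedy mode: a* takes as many a's as possible, then matches the empty string at the end
print(re.findall(r"a*","aaabaa"))# greedy mode (match as much as possible)
print(re.findall(r"a+","aaa"))
print(re.findall(r"a+","aaabaa"))# greedy mode (match as much as possible)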
print("&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&")
print(re.findall(r"a{3}","aaa"))
print(re.findall(r"a{3}","aa"))
print(re.findall(r"a{3}","aaaa"))
print(re.findall(r"a{3}","aaaaabaaa"))
print('@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@')
print(re.findall(r"a{3,}","aaaaa"))
print(re.findall(r"a{3,}","aaaabaaaaaa"))# greedy mode (match as much as possible)
print('#################################################')
print(re.findall(r"a{3,6}","aaaaaa"))
print(re.findall(r"a{3,6}","aaaaaabaaa"))
print(re.findall(r"((t|T)unck)","tom--Tom"))# no match: "unck" does not occur in the string, so the result is []
# Goal: extract tom......man
text = "tom is a good man!tom is a nice man!tom is a very good man"
print(re.findall(r"^tom.*man$",text))
print(re.findall(r"tom.*?man",text))
print(re.findall(r"^tom.*?man",text))

Output:

['tom', 'tom']
['', 'o', '', '', 'o', 'o', '', '', '', '', '', '', '', '', 'o', 'o', '', '', '', '', '', '', '', 'o', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
['aaa', '']
['aaa', '', 'aa', '']
['aaa']
['aaa', 'aa']
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
['aaa']
[]
['aaa']
['aaa', 'aaa']
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
['aaaaa']
['aaaa', 'aaaaaa']
#################################################
['aaaaaa']
['aaaaaa', 'aaa']
[]
['tom is a good man!tom is a nice man!tom is a very good man']
['tom is a good man', 'tom is a nice man', 'tom is a very good man']
['tom is a good man']
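The last three results show the difference between greedy and non-greedy quantifiers: appending ? to a quantifier (*?, +?, {n,m}?) makes it match as little as possible. A short sketch on a different, made-up input to show the same effect:

```python
import re

markup = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">" in the string.
greedy = re.findall(r"<.+>", markup)

# Non-greedy: .+? stops at the first ">" it can.
lazy = re.findall(r"<.+?>", markup)

# The same applies to counted repetition.
greedy_count = re.findall(r"a{3,6}", "aaaaaaa")
lazy_count = re.findall(r"a{3,6}?", "aaaaaaa")

print(greedy)
print(lazy)
print(greedy_count, lazy_count)
```

With seven a's, the greedy form consumes six and leaves one unmatched, while the non-greedy form takes three at a time.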
