Simple examples of Python regular expressions
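I wrote this post to give fellow regex learners like me a simple example of every regular expression pattern, and to have something to review later myself. So I won't explain the patterns in depth; I just give one simple example for each. Feedback is welcome.
For full coverage of regular expressions, see the Runoob (菜鸟教程) tutorial on Python 3 regular expressions.
To save effort, I simply wrote a piece of test code that runs one pattern through all the main re functions at once. Feel free to critique it: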
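import re

class CPracticeRe(object):
    def __init__(self, sPattern, sStr, iFlag=0, sReplace=''):
        self.m_Pattern = sPattern    # the regular expression to test
        self.m_Str = sStr            # the string to search in
        self.m_Flag = iFlag          # flags such as re.I or re.M
        self.m_Replace = sReplace    # replacement text used by re.sub

    def PractiseAll(self):
        # Run the same pattern through each of the main re functions
        print("re.match:", re.match(self.m_Pattern, self.m_Str, self.m_Flag))
        print("re.search:", re.search(self.m_Pattern, self.m_Str, self.m_Flag))
        # Flags are not passed to re.sub here (its 4th positional argument is count)
        print("re.sub:", re.sub(self.m_Pattern, self.m_Replace, self.m_Str))
        print("re.findall:", re.findall(self.m_Pattern, self.m_Str, self.m_Flag))
        # Prints the iterator object itself; wrap it in list(...) to see the matches
        print("re.finditer:", re.finditer(self.m_Pattern, self.m_Str, self.m_Flag))
        # Pitfall: the 3rd positional argument of re.split is maxsplit, not flags,
        # so a nonzero iFlag limits the number of splits instead of setting flags
        print("re.split:", re.split(self.m_Pattern, self.m_Str, self.m_Flag))

def PracticeRe(sPattern, sStr, iFlag=0):
    oPracticeRe = CPracticeRe(sPattern, sStr, iFlag)
    oPracticeRe.PractiseAll()

PracticeRe("\0l", "dw\nwe a2\t>d")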
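The helper passes iFlag positionally. That works for re.match, re.search, re.findall and re.finditer, whose third parameter is flags, but re.split's third parameter is maxsplit and re.sub's fourth is count. Here is a small sketch with a made-up test string showing the difference, plus one way to see what finditer actually found:

```python
import re

s = "1a2b3c4D5"

# Third positional argument of re.split is maxsplit, so re.I (== 2) means "split at most twice"
# and no case-insensitivity: only 'a' and 'b' end up as separators
wrong = re.split("[a-z]", s, re.I)
print(wrong)    # ['1', '2', '3c4D5']

# Passing flags by keyword really turns on IGNORECASE, so 'D' is also a separator
right = re.split("[a-z]", s, flags=re.I)
print(right)    # ['1', '2', '3', '4', '5']

# re.sub has the same trap (4th positional argument is count), so use flags= there too
removed = re.sub("[a-z]", "", s, flags=re.I)
print(removed)  # 12345

# finditer yields Match objects; collect the matched text to see the results
found = [m.group() for m in re.finditer("[a-z]", s, re.I)]
print(found)    # ['a', 'b', 'c', 'D']
```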
Below are examples of the various regular expression patterns:
1. ^ matches the start of the string
PracticeRe("^d,","d,w\n4.we a2\t>d")
# Result:
re.match: <re.Match object; span=(0, 2), match='d,'>
re.search: <re.Match object; span=(0, 2), match='d,'>
re.sub: w
4.we a2 >d
re.findall: ['d,']
re.finditer: <callable_iterator object at 0x000002A740D2AB50>
re.split: ['', 'w\n4.we a2\t>d']
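A quick sketch (made-up string) of how ^ changes with the re.M (MULTILINE) flag, which the helper can pass as iFlag:

```python
import re

text = "ab\ncd"
# Without re.M, ^ only matches at the very start of the string
plain = re.findall(r"^\w", text)
print(plain)  # ['a']
# With re.M, ^ also matches right after every newline
multi = re.findall(r"^\w", text, re.M)
print(multi)  # ['a', 'c']
```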
2. $ matches the end of the string
PracticeRe("\t>d$","d,w\n4.we a2\t>d")
# Result:
re.match: None
re.search: <re.Match object; span=(11, 14), match='\t>d'>
re.sub: d,w
4.we a2
re.findall: ['\t>d']
re.finditer: <callable_iterator object at 0x000002B14548AB50>
re.split: ['d,w\n4.we a2', '']
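One detail worth knowing about $ in Python: it also matches just before a newline at the very end of the string, and with re.M it matches before every newline. A minimal sketch with a made-up string:

```python
import re

text = "ab\ncd\n"
# Without re.M, $ matches at the end and also just before the final newline
end_only = re.findall(r"\w$", text)
print(end_only)  # ['d']
# With re.M, $ also matches before every newline
per_line = re.findall(r"\w$", text, re.M)
print(per_line)  # ['b', 'd']
```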
3. . (dot) matches any character except a newline
PracticeRe(".w","d,w\n4.we a2\t>d")
# Result:
re.match: None
re.search: <re.Match object; span=(1, 3), match=',w'>
re.sub: d
4e a2 >d
re.findall: [',w', '.w']
re.finditer: <callable_iterator object at 0x000001B86E49AB50>
re.split: ['d', '\n4', 'e a2\t>d']
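If you do want the dot to cross newlines, the re.S (DOTALL) flag does that. A small sketch with a made-up string:

```python
import re

text = "a\nb axb"
# By default . does not match a newline, so only "axb" is found
default = re.findall("a.b", text)
print(default)  # ['axb']
# With re.S (DOTALL), . matches a newline too
dotall = re.findall("a.b", text, re.S)
print(dotall)   # ['a\nb', 'axb']
```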
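4. [...] matches any one of the characters inside the brackets, e.g. [amk] matches 'a', 'm' or 'k'
Easy to understand, so I won't dwell on it.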
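Still, a two-line sketch (made-up strings) helps: each [...] consumes exactly one character, and a hyphen inside it gives a range:

```python
import re

# [amk] matches one character that is a, m or k
members = re.findall("[amk]", "make a mark")
print(members)  # ['m', 'a', 'k', 'a', 'm', 'a', 'k']
# 0-9 and a-f are ranges; + repeats the whole class
hexes = re.findall("[0-9a-f]+", "x1f z42")
print(hexes)    # ['1f', '42']
```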
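5. [^...] matches any character that is NOT inside the brackets, e.g. [^abc] matches any character other than a, b or c
Easy to understand, so I won't dwell on it.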
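A small sketch (made-up strings); note that ^ only means "not" when it is the first character inside the brackets:

```python
import re

# [^abc] matches any single character except a, b and c
negated = re.findall("[^abc]", "abcxbay")
print(negated)        # ['x', 'y']
# ^ anywhere else inside [] is just a literal ^
literal_caret = re.findall("[a^b]", "x^a")
print(literal_caret)  # ['^', 'a']
```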
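6. re* matches 0 or more repetitions of the preceding expression re (re stands for any expression; here it is the single character w)
PracticeRe("ww*","d,awww\n4.awe a2\t>d")
# Result: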
re.match: None
re.search: <re.Match object; span=(3, 6), match='www'>
re.sub: d,a
4.ae a2 >d
re.findall: ['www', 'w']
re.finditer: <callable_iterator object at 0x000001C1A38DAB50>
re.split: ['d,a', '\n4.a', 'e a2\t>d']
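7. re+ matches 1 or more repetitions of the preceding expression re
It behaves like re*, except that at least one repetition is required, so I'll skip the demo.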
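The one difference from re* shows up when there are zero repetitions. A minimal sketch with a made-up string:

```python
import re

text = "a aw aww"
star = re.findall("aw*", text)  # zero w's are fine, so the lone "a" matches too
plus = re.findall("aw+", text)  # needs at least one w
print(star)  # ['a', 'aw', 'aww']
print(plus)  # ['aw', 'aww']
```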
8. re? matches 0 or 1 occurrence of the fragment defined by the preceding expression: one if it is there, zero if not
PracticeRe("aw?","d,awww\n4.awe a2\t>d")
# Result:
re.match: None
re.search: <re.Match object; span=(2, 4), match='aw'>
re.sub: d,ww
4.e 2 >d
re.findall: ['aw', 'aw', 'a']
re.finditer: <callable_iterator object at 0x0000020FAAD8AB50>
re.split: ['d,', 'ww\n4.', 'e ', '2\t>d']
9. re{n} matches exactly n repetitions of re (here {2} applies only to the w right before it, not to aw)
PracticeRe("aw{2}","d,awww\n4.awe a2\t>d")
# Result:
re.match: None
re.search: <re.Match object; span=(2, 5), match='aww'>
re.sub: d,w
4.awe a2 >d
re.findall: ['aww']
re.finditer: <callable_iterator object at 0x000002833F47AB50>
re.split: ['d,', 'w\n4.awe a2\t>d']
10. re{n,} matches n or more repetitions of re
PracticeRe("aw{2,}","d,awww\n4.awe a2\t>d")
# Result:
re.match: None
re.search: <re.Match object; span=(2, 6), match='awww'>
re.sub: d,
4.awe a2 >d
re.findall: ['awww']
re.finditer: <callable_iterator object at 0x00000243787BAB50>
re.split: ['d,', '\n4.awe a2\t>d']
11. re{n,m} matches between n and m repetitions of re
Not shown here; try it yourself based on the previous two.
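For reference, a small sketch with made-up strings. The quantifier is greedy, and it only applies to the single item right before it, so a group is needed to repeat "aw" as a whole:

```python
import re

text = "a aw aww awww"
# w{1,2}: one or two w's, greedy, so "awww" only yields "aww"
ranged = re.findall("aw{1,2}", text)
print(ranged)    # ['aw', 'aww', 'aww']
# (?:aw){2} repeats the whole "aw" twice
repeated = re.findall("(?:aw){2}", "awaw aw")
print(repeated)  # ['awaw']
```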
12. a|b matches a or b (here: aw or 2)
PracticeRe("aw|2","d,awww\n4.awe a2\t>d")
# Result:
re.match: None
re.search: <re.Match object; span=(2, 4), match='aw'>
re.sub: d,ww
4.e a >d
re.findall: ['aw', 'aw', '2']
re.finditer: <callable_iterator object at 0x000002A57E17AB50>
re.split: ['d,', 'ww\n4.', 'e a', '\t>d']
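13. (re) groups the regular expression and remembers the matched text
The parentheses create a capturing group, and the text it matched can be retrieved later, for example with group(1). That is also why re.split behaves differently here: when the pattern contains capturing groups, the text captured by each group is included in the result list as well.
PracticeRe("(aw|2)","d,awww\n4.awe a2\t>d")
# Result: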
re.match: None
re.search: <re.Match object; span=(2, 4), match='aw'>
re.sub: d,ww
4.e a >d
re.findall: ['aw', 'aw', '2']
re.finditer: <callable_iterator object at 0x000001AD1619AB50>
re.split: ['d,', 'aw', 'ww\n4.', 'aw', 'e a', '2', '\t>d']
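A small sketch (made-up strings) of what "remembers the matched text" means in practice, and how groups change findall and split:

```python
import re

m = re.search(r"(\d+)-(\d+)", "tel 010-1234")
print(m.group(0))  # 010-1234  (the whole match)
print(m.group(1))  # 010       (text remembered by the first group)
print(m.groups())  # ('010', '1234')

# With groups in the pattern, findall returns the group contents
pairs = re.findall(r"(\d+)-(\d+)", "010-1234, 020-5678")
print(pairs)       # [('010', '1234'), ('020', '5678')]

# split keeps the separators only when a group captures them
plain_split = re.split("-", "a-b-c")
kept_split = re.split("(-)", "a-b-c")
print(plain_split) # ['a', 'b', 'c']
print(kept_split)  # ['a', '-', 'b', '-', 'c']
```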
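14. (?imx), (?-imx)
In Python, (?imx) turns the i, m and x flags on for the whole pattern, and it has to sit at the very beginning of the pattern (Python 3.11+ raises an error otherwise). A standalone (?-imx) is not accepted by Python's re at all; flags can only be switched off for part of a pattern with the scoped form (?-imx:re) from item 17.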
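A minimal sketch with made-up strings: a leading (?i) has the same effect as passing re.I, while a standalone (?-i) fails to compile in Python's re:

```python
import re

# (?i) at the start makes the whole pattern case-insensitive, like passing re.I
inline = re.findall("(?i)ab", "AB ab aB")
by_flag = re.findall("ab", "AB ab aB", re.I)
print(inline)   # ['AB', 'ab', 'aB']
print(by_flag)  # ['AB', 'ab', 'aB']

# A standalone (?-i) is rejected when the pattern is compiled
try:
    re.compile("(?-i)ab")
    standalone_ok = True
except re.error as exc:
    standalone_ok = False
    print("re.error:", exc)
```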
15. (?:re) is like (...) but does not create a capturing group, so re.split no longer includes the separators
PracticeRe("(?:aw|2)","d,awww\n4.awe a2\t>d")
# Result:
re.match: None
re.search: <re.Match object; span=(2, 4), match='aw'>
re.sub: d,ww
4.e a >d
re.findall: ['aw', 'aw', '2']
re.finditer: <callable_iterator object at 0x000001F71EABAB50>
re.split: ['d,', 'ww\n4.', 'e a', '\t>d']
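16. (?imx:re) applies the i, m and x flags only to the part inside the parentheses. In this pattern only i actually matters (there is no ^, $ or space for m and x to act on), so Aw also matches aw. The inline flags take effect no matter what is passed as the flag argument. In the second run below, which passes re.I, only the re.split line differs, and that comes from the helper: it passes iFlag as re.split's third positional argument, which is maxsplit, so re.I (value 2) limits the result to two splits.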
PracticeRe("(?imx:Aw|2)","""d,awww\n4.awe
a2\t>d""")
# Result:
re.match: None
re.search: <re.Match object; span=(2, 4), match='aw'>
re.sub: d,ww
4.e
a >d
re.findall: ['aw', 'aw', '2']
re.finditer: <callable_iterator object at 0x000001E1C682AB50>
re.split: ['d,', 'ww\n4.', 'e\na', '\t>d']
PracticeRe("(?imx:Aw|2)","""d,awww\n4.awe
a2\t>d""",re.I)
# Result:
re.match: None
re.search: <re.Match object; span=(2, 4), match='aw'>
re.sub: d,ww
4.e
a >d
re.findall: ['aw', 'aw', '2']
re.finditer: <callable_iterator object at 0x0000016788EF44F0>
re.split: ['d,', 'ww\n4.', 'e\na2\t>d']
17. (?-imx:re) turns the i, m and x flags off inside the parentheses (exactly what it says, don't overthink it)
Try this one yourself.
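One possible try, with a made-up string: re.I makes the whole pattern case-insensitive, and (?-i:b) switches that off again just for the b:

```python
import re

text = "ab AB Ab aB"
# a may be either case, but b inside (?-i:...) must be lowercase
mixed = re.findall("a(?-i:b)", text, re.I)
print(mixed)  # ['ab', 'Ab']
```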
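18. (?=re) positive lookahead. For example, \d(?=def) matches a digit only if it is followed by def, and the def itself is not consumed
PracticeRe(r"\d(?=\t)","d,awww\n4.awea2\t>d")
# Result: no need to ask, it's 2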
re.match: None
re.search: <re.Match object; span=(13, 14), match='2'>
re.sub: d,awww
4.awea >d
re.findall: ['2']
re.finditer: <callable_iterator object at 0x000001A33F40AB50>
re.split: ['d,awww\n4.awea', '\t>d']
19. (?!re) negative lookahead: it succeeds when re does NOT match at the current position
Try it yourself.
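A sketch that reuses the test string from item 18, putting the positive and negative lookahead side by side:

```python
import re

s = "d,awww\n4.awea2\t>d"   # same test string as in item 18
lookahead = re.findall(r"\d(?=\t)", s)  # digit followed by a tab
negative = re.findall(r"\d(?!\t)", s)   # digit NOT followed by a tab
print(lookahead)  # ['2']
print(negative)   # ['4']
```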
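20. (?>re) matches an independent (atomic) group: once the group has matched, the engine will not backtrack into it to try shorter alternatives
PracticeRe("(?>aw{0,4}w)","d,awww\n4.awea2\t>d")  # raised an error for me (Python's re only supports atomic groups from 3.11 on), so its result is not shown
PracticeRe("(aw{0,4}w)","d,awww\n4.awea2\t>d")
# Result: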
re.match: None
re.search: <re.Match object; span=(2, 6), match='awww'>
re.sub: d,
4.ea2 >d
re.findall: ['awww', 'aw']
re.finditer: <callable_iterator object at 0x000002490BABAB50>
re.split: ['d,', 'awww', '\n4.', 'aw', 'ea2\t>d']
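To actually see the difference, the group needs to swallow something the rest of the pattern needs. Here is a minimal sketch with a made-up string. The atomic part only runs on Python 3.11+ (assumption: older versions reject (?>...), as above), so it is guarded by a version check:

```python
import re
import sys

s = "awww"
# Ordinary group: aw* first takes all three w's, then gives one back so the final w can match
normal = re.search(r"(aw*)w", s)
print(normal.group())  # awww

if sys.version_info >= (3, 11):
    # Atomic group: aw* keeps all its w's and never gives one back,
    # so the trailing w has nothing left to match and the search fails
    atomic = re.search(r"(?>aw*)w", s)
    print(atomic)  # None
```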
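21. \w matches letters, digits and underscore (for str patterns this also includes Unicode letters such as Chinese characters, unless re.ASCII is given)
PracticeRe(r"a\w","d,awwwa\n4.awea2\t>d")
# Result: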
re.match: None
re.search: <re.Match object; span=(2, 4), match='aw'>
re.sub: d,wwa
4.e >d
re.findall: ['aw', 'aw', 'a2']
re.finditer: <callable_iterator object at 0x000001E8A6AEAB50>
re.split: ['d,', 'wwa\n4.', 'e', '\t>d']
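A small sketch (made-up string) of the Unicode point: for str patterns \w counts Chinese characters as word characters, and re.A (ASCII) restricts it to [a-zA-Z0-9_]:

```python
import re

text = "abc_1 中文"
uni = re.findall(r"\w+", text)
ascii_only = re.findall(r"\w+", text, re.A)
print(uni)         # ['abc_1', '中文']
print(ascii_only)  # ['abc_1']
```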
22. \W matches any character that is not a letter, digit or underscore
PracticeRe(r"a\W","d,awwwa\n4.awea2\t>d")
# Result:
re.match: None
re.search: <re.Match object; span=(6, 8), match='a\n'>
re.sub: d,awww4.awea2 >d
re.findall: ['a\n']
re.finditer: <callable_iterator object at 0x00000209A7EEAB50>
re.split: ['d,awww', '4.awea2\t>d']
23. \s matches any whitespace character (space, \t, \n, \r, \f, \v)
PracticeRe(r"a\s","d,awwwa\n4.awea2\t>d")
# Result:
re.match: None
re.search: <re.Match object; span=(6, 8), match='a\n'>
re.sub: d,awww4.awea2 >d
re.findall: ['a\n']
re.finditer: <callable_iterator object at 0x00000235CBA9AB50>
re.split: ['d,awww', '4.awea2\t>d']
24. \S matches any non-whitespace character
PracticeRe(r"a\S","d,awwwa\n4.awea2\t>d")
# Result:
re.match: None
re.search: <re.Match object; span=(2, 4), match='aw'>
re.sub: d,wwa
4.e >d
re.findall: ['aw', 'aw', 'a2']
re.finditer: <callable_iterator object at 0x000002952BDDAB50>
re.split: ['d,', 'wwa\n4.', 'e', '\t>d']
25. \d matches any digit
Try it yourself.
26. \D matches any non-digit
Try it yourself.
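For checking your own attempts, here are both classes applied to the test string used throughout this post:

```python
import re

s = "d,awww\n4.awea2\t>d"
digits = re.findall(r"\d", s)       # every single digit
non_digits = re.findall(r"\D+", s)  # runs of non-digit characters
print(digits)      # ['4', '2']
print(non_digits)  # ['d,awww\n', '.awea', '\t>d']
```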
27. \A matches the start of the string
PracticeRe(r"\Ada","dawwwa\n4.aweda2\t>d")
# Result:
re.match: <re.Match object; span=(0, 2), match='da'>
re.search: <re.Match object; span=(0, 2), match='da'>
re.sub: wwwa
4.aweda2 >d
re.findall: ['da']
re.finditer: <callable_iterator object at 0x000002DEB878AB50>
re.split: ['', 'wwwa\n4.aweda2\t>d']
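In this test \A looks just like ^. The difference appears with re.M, where ^ matches at every line start but \A still matches only at the start of the whole string. A small sketch with a made-up string:

```python
import re

text = "ab\ncd"
caret_m = re.findall(r"^\w", text, re.M)
anchor_a = re.findall(r"\A\w", text, re.M)
print(caret_m)   # ['a', 'c']
print(anchor_a)  # ['a']
```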
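28. \Z matches the end of the string
Some tutorials say that if the string ends with a newline, \Z matches just before it. That is how \Z behaves in Perl and some other engines, but in Python \Z matches only at the very end of the string (the "just before a final newline" behavior belongs to $). That is why my test always matched only the last characters, newline or not.
PracticeRe(r"da\Z","""dawwwada\n4.aweda2\tda
sdda""")
# Result: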
re.match: None
re.search: <re.Match object; span=(23, 25), match='da'>
re.sub: dawwwada
4.aweda2 da
sd
re.findall: ['da']
re.finditer: <callable_iterator object at 0x00000212D216AB50>
re.split: ['dawwwada\n4.aweda2\tda\nsd', '']
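A minimal sketch with a made-up string ending in a newline. $ still matches before that newline, but \Z does not:

```python
import re

text = "da\n"   # a string that ends with a newline
with_dollar = re.search(r"da$", text)      # $ may match just before a final newline
with_Z = re.search(r"da\Z", text)          # \Z only matches at the absolute end
with_newline = re.search(r"da\n\Z", text)  # consume the newline explicitly to reach the end
print(with_dollar)
print(with_Z)  # None
print(with_newline)
```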
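29. \z matches the end of the string
Python's re has no \z before version 3.14: the call below raises re.error (bad escape \z). Use \Z instead, which in Python already means the absolute end of the string (3.14 adds \z as an alias for it).
PracticeRe(r"da\z","""dawwwada\n4.aweda2\tda
sdda""")  # raises re.error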
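30. \G matches the position where the previous match ended
Python's re does not support \G: both calls below raise re.error (bad escape \G). \G comes from Perl/PCRE; the third-party regex module does support it.
PracticeRe(r"da\G","""dawwwada\n4.aweda2\tda
sdda""")  # raises re.error
PracticeRe(r"\Gda","""dawwwada\n4.aweda2\tda
sdda""")  # raises re.error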
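If you need the "continue exactly where the last match ended" behavior with the standard re module, one rough emulation is to call Pattern.match(string, pos) in a loop. Pattern.match only tries the given position. contiguous_matches below is a hypothetical helper written for illustration, not part of re:

```python
import re

def contiguous_matches(pattern, s):
    # Hypothetical helper: collect matches that start exactly where the previous
    # one ended, beginning at position 0 (roughly what a leading \G does in Perl/PCRE)
    rx = re.compile(pattern)
    pos = 0
    result = []
    while True:
        m = rx.match(s, pos)  # Pattern.match only tries position pos
        if m is None or m.end() == pos:  # stop on failure or on an empty match
            break
        result.append(m.group())
        pos = m.end()
    return result

s = "dadaxda"
all_hits = re.findall("da", s)
chained = contiguous_matches("da", s)
print(all_hits)  # ['da', 'da', 'da']
print(chained)   # ['da', 'da']  (stops at the x)
```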
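31. \b matches a word boundary, i.e. the position between a word character (\w) and a non-word character, or at the start or end of the string.
For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'.
The r prefix in front of the string is essential: without it, Python turns \b into a backspace character before the regex engine ever sees it.
PracticeRe(r"\bnic","It's a nice day today.")
# Result: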
re.match: None
re.search: <re.Match object; span=(7, 10), match='nic'>
re.sub: It's a e day today.
re.findall: ['nic']
re.finditer: <callable_iterator object at 0x0000021C09A8AB50>
re.split: ["It's a ", 'e day today.']
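A small sketch of why the r prefix matters, using the 'never'/'verb' example from above:

```python
import re

text = "never verb"
raw_hits = re.findall(r"er\b", text)  # the er at the end of "never"
# Without r, "\b" is the backspace character (\x08), which this text does not contain
plain_hits = re.findall("er\b", text)
print(raw_hits)           # ['er']
print(plain_hits)         # []
print("\b" == "\x08")     # True
```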
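I ran into a few pitfalls while learning this that I still haven't figured out. If you know the answers, please help me out. Thanks!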