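re module
Purpose: provides Python's support for regular expressions, so that the content you want can be extracted by matching it against a pattern
Use cases: web-scraping scripts, validating user input (for example, checking the format of a QQ number), and so on
Common methods
findall
Purpose: finds every non-overlapping part of the string that matches the regular expression and extracts it
Return value: a list containing every matched item (an empty list if nothing matches)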
import re
content = '13621abc15323def19323'
regex = r'1[3-9]\d{3}'
ret = re.findall(regex,content)
print(ret)
# Returns: ['13621', '15323', '19323']
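One detail worth a quick sketch: when the pattern contains capturing groups, findall returns the group contents rather than the whole match. The variable names below are illustrative only.

```python
import re

content = '13621abc15323def19323'

# No capturing group: each item is the whole match.
whole = re.findall(r'1[3-9]\d{3}', content)

# One capturing group: each item is only that group's text.
tails = re.findall(r'1[3-9](\d{3})', content)

# Two capturing groups: each item is a tuple with one entry per group.
pairs = re.findall(r'(1[3-9])(\d{3})', content)

print(whole)  # ['13621', '15323', '19323']
print(tails)  # ['621', '323', '323']
print(pairs)  # [('13', '621'), ('15', '323'), ('19', '323')]
```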
search
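Purpose: scans the string from the start and returns the first place where the pattern matches, wherever that is (only one match is returned)
Return value: a match object (re.Match), or None if nothing matches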
import re
content = '13621abc15323def19323'
regex = r'(1[3-9]\d{3})(\D{3})'
ret = re.search(regex,content)
print(ret)
# Output: <re.Match object; span=(0, 8), match='13621abc'>
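Because search returns None when nothing matches, calling group() on the result directly can raise AttributeError. A minimal sketch of guarding against that, using a hypothetical helper named first_number:

```python
import re

def first_number(text):
    """Return the first 5-digit run of the form 1[3-9]ddd, or None."""
    m = re.search(r'1[3-9]\d{3}', text)
    # Test the result before calling group(); it is None on failure.
    return m.group() if m else None

print(first_number('abc15323def'))     # 15323
print(first_number('no digits here'))  # None
```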
search: reading the contents of groups with group()
By index
import re
content = '13621abc15323'
regex = r'(1[3-9]\d{3})(\D{3})(1[3-9]\d{3})'
ret = re.search(regex,content)
print(ret.group()) # 13621abc15323
print(ret.group(0)) # 13621abc15323
print(ret.group(1)) # 13621
print(ret.group(2)) # abc
print(ret.group(3)) # 15323
By group name
import re
content = '136abc15323'
regex = r'(?P<num1>\d)(?P<num2>\d)(?P<num3>\d)'
ret = re.search(regex,content)
print(ret.group()) # 136
print(ret.group('num1')) # 1
print(ret.group('num2')) # 3
print(ret.group('num3')) # 6
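As a complement to reading one group at a time, the match object can also hand back all groups at once. A short sketch on the same input:

```python
import re

m = re.search(r'(?P<num1>\d)(?P<num2>\d)(?P<num3>\d)', '136abc15323')

all_groups = m.groups()           # every group, in order, as a tuple
by_name = m.groupdict()           # named groups as a dict
picked = m.group('num1', 'num3')  # several groups in one call

print(all_groups)  # ('1', '3', '6')
print(by_name)     # {'num1': '1', 'num2': '3', 'num3': '6'}
print(picked)      # ('1', '6')
```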
Backreferences to groups
Note: (?P=num1) refers back to the num1 group, so it must match exactly the same text that num1 matched
import re
content = '136abc15332'
regex = r'(?P<num1>\d)(?P=num1)(?P<num3>\d)'
ret = re.search(regex,content)
print(ret.group())
# 332
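The same backreference can be written by group number instead of by name. A sketch using \1 on the same input, which should find the same '332':

```python
import re

content = '136abc15332'
# \1 must repeat whatever the first group captured.
m = re.search(r'(\d)\1(\d)', content)

print(m.group())   # 332
print(m.group(1))  # 3
print(m.group(2))  # 2
print(m.span())    # (8, 11)
```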
match
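Purpose:
Matches only at the start of the string: it succeeds if the beginning of the string matches the pattern and fails otherwise
Typically used to check whether user input is valid (match anchors only the start, so end the pattern with $ or use re.fullmatch when the whole input must conform)
Return value:
On a match: a match object (re.Match)
No match: None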
import re
content = 'supervisorctl'
ret1 = re.match('super',content)
ret2 = re.match('visor',content)
ret3 = re.search('visor',content)
print(ret1) # <re.Match object; span=(0, 5), match='super'>
print(ret2) # None
print(ret3) # <re.Match object; span=(5, 10), match='visor'>
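For input validation, a sketch of the QQ-number check mentioned earlier. The rule used here (5 to 11 digits, first digit not 0) is an assumption for illustration, and QQ_RULE and is_valid_qq are hypothetical names.

```python
import re

# Assumed rule, for illustration only: 5 to 11 digits, first digit not 0.
QQ_RULE = r'[1-9]\d{4,10}'

def is_valid_qq(text):
    # fullmatch requires the whole string to match, not just a prefix.
    return re.fullmatch(QQ_RULE, text) is not None

# match alone accepts trailing junk because it only anchors the start.
prefix_only = re.match(QQ_RULE, '12345abc') is not None

print(prefix_only)              # True
print(is_valid_qq('12345abc'))  # False
print(is_valid_qq('12345'))     # True
print(is_valid_qq('012345'))    # False
```

Using fullmatch (or a trailing $) is what makes the check cover the entire input.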
Advanced methods
compile
Purpose: precompiles a pattern into a pattern object so that it can be reused many times in the code
import re
regex = r'\d{6}'
content = 'admin:123456;user:654321'
rule = re.compile(regex)
ret = rule.findall(content)
print(ret)
# Result: ['123456', '654321']
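A small sketch of the reuse that compile is meant for: one compiled object applied to several inputs and through more than one method. The sample lines are made up for illustration.

```python
import re

rule = re.compile(r'\d{6}')
lines = ['admin:123456', 'user:abc', 'root:987654']

# Reuse the same compiled pattern for every line.
hits = []
for line in lines:
    m = rule.search(line)
    if m:
        hits.append(m.group())

# The compiled object exposes sub, findall, finditer, and so on as methods.
masked = rule.sub('******', 'admin:123456')

print(hits)    # ['123456', '987654']
print(masked)  # admin:******
```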
finditer
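Purpose: returns an iterator that yields match objects one at a time; read each value with group(). Matches are produced lazily, so this uses less memory than building a full list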
import re
content = 'admin:123456;user:654321;root:987654'
ret = re.finditer(r'\d{6}',content)
print(ret) # <callable_iterator object at 0x00BE9290>
for i in ret:
    print(i.group()) # prints 123456, 654321, 987654 on separate lines
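Since each item from finditer is a full match object, groups and positions are available too. A sketch that collects the name, the code, and the start offset of each record; the group names are chosen for illustration.

```python
import re

content = 'admin:123456;user:654321;root:987654'

records = []
for m in re.finditer(r'(?P<name>\w+):(?P<code>\d{6})', content):
    # start() gives the index where this match begins in content.
    records.append((m.group('name'), m.group('code'), m.start()))

print(records)
# [('admin', '123456', 0), ('user', '654321', 13), ('root', '987654', 25)]
```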
Good to know
split
Purpose: splits a string wherever the regular expression matches
import re
regex1 = r'\d'
regex2 = r'\d{3}'
content = 'abc123def456ghi'
ret1 = re.split(regex1,content)
ret2 = re.split(regex2,content)
print(ret1) # ['abc', '', '', 'def', '', '', 'ghi']
print(ret2) # ['abc', 'def', 'ghi']
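Two further behaviours of split, sketched on the same input: a capturing group keeps the delimiters in the result, and maxsplit limits the number of cuts.

```python
import re

content = 'abc123def456ghi'

plain = re.split(r'\d+', content)               # delimiters dropped
kept = re.split(r'(\d+)', content)              # capturing group keeps them
limited = re.split(r'\d+', content, maxsplit=1) # cut at most once

print(plain)    # ['abc', 'def', 'ghi']
print(kept)     # ['abc', '123', 'def', '456', 'ghi']
print(limited)  # ['abc', 'def456ghi']
```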
sub
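Purpose: replaces the parts that match the regular expression with the given content (the number of replacements can be limited)
Usage: re.sub(pattern, replacement, string, count=N)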
import re
regex1 = r'\d'
regex2 = r'\d{3}'
content = 'abc123def456ghi'
# count is passed by keyword; passing it positionally is deprecated since Python 3.13
ret1 = re.sub(regex1,'M',content,count=1)
ret2 = re.sub(regex2,'H',content,count=2)
print(ret1) # abcM23def456ghi
print(ret2) # abcHdefHghi
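The replacement does not have to be a fixed string. A sketch of two other forms: a template that refers back to a captured group, and a function that computes the replacement from each match.

```python
import re

content = 'abc123def456ghi'

# \g<1> inserts the text captured by group 1.
wrapped = re.sub(r'(\d{3})', r'[\g<1>]', content)

# A callable receives each match object and returns the replacement text.
doubled = re.sub(r'\d{3}', lambda m: str(int(m.group()) * 2), content)

print(wrapped)  # abc[123]def[456]ghi
print(doubled)  # abc246def912ghi
```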
subn
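Purpose: performs the same replacement as sub (all matches by default) and also reports how many replacements were made
Usage: re.subn(pattern, replacement, string)
Return value: a tuple (element 1: the string after replacement; element 2: the number of replacements)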
import re
regex1 = r'\d'
regex2 = r'\d{3}'
content = 'abc123def456ghi'
ret1 = re.subn(regex1,'M',content)
ret2 = re.subn(regex2,'H',content)
print(ret1) # ('abcMMMdefMMMghi', 6)
print(ret2) # ('abcHdefHghi', 2)
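subn also accepts count, and the returned number is handy for checking whether anything changed. A minimal sketch with illustrative variable names:

```python
import re

content = 'abc123def456ghi'

# Unpack the tuple directly; count caps the number of replacements.
result, n = re.subn(r'\d', 'M', content, count=4)

# When nothing matches, the string comes back unchanged with a count of 0.
unchanged, zero = re.subn(r'X', 'Y', content)

print(result, n)        # abcMMMdefM56ghi 4
print(unchanged, zero)  # abc123def456ghi 0
```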