Regular Expressions in Python

m悟空

Category: Python Learning. Tags: python, regular expressions

Article link: https://blog.csdn.net/qq_41890797/article/details/114678836

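The re module
   match    matches only at the start of the string; on the result, span() gives the position and group() gives the matched text
   search   scans the string from the start and returns only the first match
   findall  scans the string from the start and returns all matches
   sub      replaces matches
   split    splits the string at matches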
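
To make the differences between these five functions concrete, here is a minimal sketch that runs all of them on one string. The sample string `text` is made up for illustration.

```python
import re

text = "python:89, java:76"

# match only succeeds if the pattern matches at the very start of the string
print(re.match(r"java", text))           # None, because text starts with "python"
print(re.match(r"python", text).span())  # (0, 6)

# search scans forward and returns the first match only
print(re.search(r"\d+", text).group())   # 89

# findall returns every non-overlapping match as a list of strings
print(re.findall(r"\d+", text))          # ['89', '76']

# sub replaces every match; split cuts the string at every match
print(re.sub(r"\d+", "0", text))         # python:0, java:0
print(re.split(r",\s*", text))           # ['python:89', 'java:76']
```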

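Quantifiers
use?             the character before ? is optional (zero or one occurrence)
us*e             the character before * may appear any number of times, including zero
us+e             the character before + must appear at least once
ab{6}c           the character before {6} must appear exactly 6 times
ab{2,6}c         b appears between 2 and 6 times
ab{2,}c          b appears at least 2 times
a(bc)?           the group bc is optional; the other quantifiers apply to groups in the same way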
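
The quantifier rules can be checked with a small sketch like the following. The helper `matching` and the candidate strings are hypothetical, chosen only for illustration; `re.fullmatch` is used so that each candidate has to match the pattern in its entirety.

```python
import re

seven_b = "a" + "b" * 7 + "c"  # 'a', seven 'b's, then 'c'

cases = {
    r"use?":     ["us", "use", "usee"],
    r"us*e":     ["ue", "use", "usssse"],
    r"us+e":     ["ue", "use", "usssse"],
    r"ab{2,6}c": ["abc", "abbc", seven_b],
    r"ab{2,}c":  ["abc", seven_b],
    r"a(bc)?":   ["a", "abc", "abcbc"],
}

def matching(pattern, candidates):
    """Return the candidates that match the pattern as a whole."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

for pattern, candidates in cases.items():
    print(pattern, "->", matching(pattern, candidates))
```

For instance, `us+e` rejects "ue" because at least one s is required, while `us*e` accepts it.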

Alternation
a (cat|dog)      matches "a cat" or "a dog" (do not put spaces around |, because they would be matched literally)
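
A short sketch of alternation in practice, using a made-up sample sentence. It also shows two details that are easy to trip over: `findall` returns the contents of a capturing group rather than the whole match, and spaces inside the parentheses are part of the pattern. The non-capturing form `(?:...)` is not covered in this article and is used here only to get the whole match back.

```python
import re

sentence = "a cat, a dog, a bird"

# With a capturing group, findall returns what the group matched
print(re.findall(r"a (cat|dog)", sentence))    # ['cat', 'dog']

# With a non-capturing group, findall returns the whole match
print(re.findall(r"a (?:cat|dog)", sentence))  # ['a cat', 'a dog']

# Spaces around | are literal, so this pattern no longer matches "a cat"
print(re.search(r"a (cat | dog)", "a cat"))    # None
```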

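Character classes   [characters]+
[abc]+    matches a run consisting only of the characters a, b and c
[a-z]+    lowercase letters
[A-Z]+    uppercase letters
[0-9]+    digits
[a-zA-Z0-9]+   letters and digits (a space inside the brackets would itself be matched)
[^0-9]+   any characters other than digits, including newlines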
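
As a quick illustration of the character classes, the sketch below applies each of them to one sample string (the string is an assumption made for the example).

```python
import re

text = "Hello World 2024\n"

print(re.findall(r"[a-z]+", text))        # ['ello', 'orld']
print(re.findall(r"[A-Z]+", text))        # ['H', 'W']
print(re.findall(r"[0-9]+", text))        # ['2024']
print(re.findall(r"[a-zA-Z0-9]+", text))  # ['Hello', 'World', '2024']

# A negated class also matches spaces and the newline
print(re.findall(r"[^0-9]+", text))       # ['Hello World ', '\n']
```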

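Metacharacters
\b    word boundary
\d    digit
\w    word character (letter, digit or underscore)
\s    whitespace, including tab and newline
\D    non-digit
\W    non-word character
\S    non-whitespace
.     any character except a newline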
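
The following sketch tries the metacharacters on made-up sample strings. The `re.DOTALL` flag in the last line is not mentioned in this article; it is included only to show how `.` can be made to match a newline as well.

```python
import re

text = "id_7 costs $42\tnow\nend."

print(re.findall(r"\d+", text))  # ['7', '42']
print(re.findall(r"\w+", text))  # ['id_7', 'costs', '42', 'now', 'end']
print(re.findall(r"\s", text))   # [' ', ' ', '\t', '\n']

# \b keeps "end" from matching inside "weekend"
print(re.findall(r"\bend\b", "end of the weekend"))  # ['end']

# . does not match a newline unless re.DOTALL is passed
print(re.search(r"a.b", "a\nb"))                        # None
print(re.search(r"a.b", "a\nb", re.DOTALL) is not None)  # True
```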

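^ matches the start of a line, $ matches the end of a line (by default these are the start and end of the whole string; with re.MULTILINE they apply to every line)
^a  a at the start of the line
a$  a at the end of the line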
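
A minimal sketch of how the anchors behave on a multi-line string, with and without the `re.MULTILINE` flag. The sample text is made up for illustration.

```python
import re

text = "apple\nbanana\navocado"

# By default ^ and $ anchor to the start and end of the whole string
print(re.findall(r"^a\w*", text))                # ['apple']
print(re.findall(r"\w*a$", text))                # []

# With re.MULTILINE they anchor to the start and end of every line
print(re.findall(r"^a\w*", text, re.MULTILINE))  # ['apple', 'avocado']
print(re.findall(r"\w*a$", text, re.MULTILINE))  # ['banana']
```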

Quantifiers (summary)
*  0 or more
+  1 or more
?  0 or 1

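import re

s = '娜扎佟丽娅迪丽热巴'
result = re.match('佟丽娅', s)    # tries once, at the start of the string only; returns None on failure (as it does here)
#print(result.span())    span() returns the (start, end) position of the match; call it only when result is not None
#print(result.group())   group() returns the matched text

result = re.search('佟丽娅', s)   # scans the whole string; returns None on failure, otherwise only the first match
result = re.findall('佟丽娅', s)  # returns a list of all matches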


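result = re.sub(r'\d+', '90', 'python:89')  # sub replaces every match
#print(result)  python:90
def func(temp):
    # temp is the Match object; the return value is used as the replacement string
    num = temp.group()
    new_num = int(num) + 1
    return str(new_num)
result = re.sub(r'\d+', func, 'python:89')  # sub also accepts a function as the replacement
#print(result)  python:90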

result = re.split(r'[,:]', 'python:80,java:89')  # split at every comma or colon
#print(result)     ['python', '80', 'java', '89']



msg = 'a7asadsa88asdasdas7878s'
result = re.findall('[a-z][0-9]+[a-z]',msg)  #['a7a', 'a88a', 's7878s']

qq = '765513215'
result = re.findall('^[1-9][0-9]{4,10}$', qq)  # validate a QQ number: 5 to 11 digits, not starting with 0


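# The username may contain only letters and digits, must not start with a digit, and must be at least 6 characters long
username = 'a1asasd'
result = re.match('^[a-zA-Z][0-9a-zA-Z]{5,}$', username)  # to validate the whole username, put ^ at the start and $ at the end

# The username may contain only letters, digits and underscores, must not start with a digit, and must be at least 6 characters long
# (\w also matches non-ASCII letters unless the re.ASCII flag is passed)
result = re.match(r'^[a-zA-Z]\w{5,}$', username)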


msg = 'asd.txt asw.py sas.doc dspysw.txt od.py'
result = re.findall(r'\w*\.py\b',msg)


phone = '15237290263'
result = re.match(r'1[35789]\d{9}$', phone)  # validate a mobile phone number

#print(result)    <re.Match object; span=(0, 11), match='15237290263'>
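
The validation examples above rely on `re.match` plus a trailing `$`. One detail worth knowing is that `$` also matches just before a newline at the very end of the string, so an input with a trailing newline still passes. A possible alternative, sketched here under the assumption that the whole input must be the number and nothing else, is `re.fullmatch`. The helper name `is_valid_phone` is hypothetical.

```python
import re

def is_valid_phone(phone):
    """Hypothetical helper: True if phone is exactly 11 digits, starting with 1
    followed by one of 3, 5, 7, 8, 9."""
    return re.fullmatch(r"1[35789]\d{9}", phone) is not None

print(is_valid_phone("15237290263"))    # True
print(is_valid_phone("15237290263\n"))  # False

# For comparison, match with $ still accepts the trailing newline
print(re.match(r"1[35789]\d{9}$", "15237290263\n") is not None)  # True
```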

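# Grouping
# Match an integer from 0 to 100

n = '1'
# at least one digit is required, so the empty string does not match
result = re.match(r'(?:[1-9]?\d|100)$', n)  #<re.Match object; span=(0, 1), match='1'>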
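
A range pattern like this is easy to get subtly wrong, so it can help to check it against every value near the boundaries. The sketch below does that for one possible way of writing a 0 to 100 pattern, which requires at least one digit so that the empty string is rejected. The test values are chosen only for illustration.

```python
import re

# One possible pattern for the integers 0..100 (no leading zeros)
pattern = re.compile(r"(?:[1-9]?\d|100)$")

# Every integer from -5 to 199 that the pattern accepts
accepted = [n for n in range(-5, 200) if pattern.match(str(n))]
print(accepted == list(range(0, 101)))  # True

# Inputs that should all be rejected
rejected_inputs = ["", "007", "101", "1000", "-1"]
print([s for s in rejected_inputs if pattern.match(s)])  # []
```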

email = '765513215@qq.com'
result = re.match(r'\w{5,20}@(163|qq|126)\.(com|cn)$', email)  #<re.Match object; span=(0, 16), match='765513215@qq.com'>


# Extract the area code and the number separately
phone = '010-15237290263'
result = re.match(r'(\d{3}|\d{4})-(\d{11})$', phone)
#print(result)  <re.Match object; span=(0, 15), match='010-15237290263'>
#print(result.group())  010-15237290263
#print(result.group(1))  010   contents of the first group
#print(result.group(2))  15237290263  contents of the second group

msg1 = '<html>asd</html>'
msg2 = '<h1>内容</h1>'
msg3 = '<html>asd</h1>'
result = re.match(r'<[0-9a-zA-Z]+>(.+)</[0-9a-zA-Z]+>$',msg1)
#print(result.group(1))  asd

result = re.match(r'<[0-9a-zA-Z]+>(.+)</[0-9a-zA-Z]+>$',msg3)
#print(result.group(1)) asd

result = re.match(r'<([0-9a-zA-Z]+)>(.+)</\1>$', msg3)   # backreference: \1 must match the same text as the first () group
#print(result) None

result = re.match(r'<([0-9a-zA-Z]+)>(.+)</\1>$',msg1)
#print(result.group(2)) asd


# Naming groups: (?P<name>pattern) defines a named group, (?P=name) refers back to it
msg = '<html><h1>asd</h1></html>'
result = re.match(r'<(?P<name1>\w+)><(?P<name2>\w+)>(.+)</(?P=name2)></(?P=name1)>$',msg)
#print(result)  <re.Match object; span=(0, 25), match='<html><h1>asd</h1></html>'>
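
Once groups have names, their contents can be read by name instead of by position. A minimal sketch, reusing the same nested-tag string; the group names `outer` and `inner` are chosen here for illustration.

```python
import re

msg = '<html><h1>asd</h1></html>'
m = re.match(r'<(?P<outer>\w+)><(?P<inner>\w+)>(.+)</(?P=inner)></(?P=outer)>$', msg)

print(m.group('outer'))  # html
print(m.group('inner'))  # h1
print(m.group(3))        # asd (unnamed groups are still numbered in order)
print(m.groupdict())     # {'outer': 'html', 'inner': 'h1'}
```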
