date:2019/10/8
description: regular expressions
Regular expression patterns
import re
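# String matching: re.match(pattern, string, flags=0), re.search(pattern, string, flags=0)
print(re.match('www', 'www.baidu.com').span())  # (0, 3); match() only tries the start of the string and returns None if the start does not match
print(re.match('com', 'www.baidu.com'))  # None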
line = 'cats are smarter than dogs'
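matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)  # re.M: multiline mode (^ and $ also match at every line); re.I: case-insensitive
print(matchObj.group())   # cats are smarter than dogs
print(matchObj.group(1))  # cats
print(matchObj.group(2))  # smarter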
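The pattern above mixes a greedy `(.*)` with a non-greedy `(.*?)`. A minimal sketch, reusing the same `line`, that shows how the second group changes when the `?` is dropped:

```python
import re

line = 'cats are smarter than dogs'

# Greedy: the second (.*) grabs as much as it can, so it runs up to the LAST space.
greedy = re.match(r'(.*) are (.*) .*', line)
# Non-greedy: (.*?) stops at the FIRST space that still lets the rest of the pattern match.
lazy = re.match(r'(.*) are (.*?) .*', line)

print(greedy.group(2))  # smarter than
print(lazy.group(2))    # smarter
```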
print(re.search('www', 'www.baidu.com').span())  # (0, 3); search() scans the whole string and returns the first match
print(re.search('com', 'www.baidu.com').span())  # (10, 13)
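Since both `match()` and `search()` return `None` when nothing is found, calling `.span()` directly (as the lines above do) only works when a match is guaranteed. A small sketch of the usual guard, using the same URL string:

```python
import re

url = 'www.baidu.com'

# match() is anchored at position 0; search() looks for the first match anywhere.
print(re.match('com', url))          # None
print(re.search('com', url).span())  # (10, 13)

# Check the result before calling .span() or .group();
# calling them on None would raise AttributeError.
m = re.match('com', url)
if m:
    print(m.span())
else:
    print('no match at the start')
```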
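# Search and replace: re.sub(pattern, repl, string, count=0, flags=0)
phone = "2004-959-559 # This is a foreign phone number"
print(re.sub(r'#.*$', "", phone))  # remove the comment: '2004-959-559 '
print(re.sub(r'\D', "", phone))  # remove every non-digit (dashes, spaces and the comment): 2004959559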
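The `count` parameter in the `sub()` signature is not exercised above. A short sketch on the same phone string showing `count`, plus a variant of the comment-stripping pattern that also drops the trailing space `'#.*$'` leaves behind:

```python
import re

phone = "2004-959-559 # This is a foreign phone number"

# count limits how many matches are replaced (the default 0 means "all").
first_only = re.sub(r'-', '', phone, count=1)
print(first_only)  # 2004959-559 # This is a foreign phone number

# \s* before '#' also consumes the space in front of the comment.
trimmed = re.sub(r'\s*#.*$', '', phone)
print(repr(trimmed))  # '2004-959-559'
```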
# Compile a regular expression object: re.compile(pattern, flags=0)
pattern = re.compile(r'\d+')
print(pattern.match('one12twothree34four'))  # None: there are no digits at the start of the string
print(pattern.match('one12twothree34four',3,10).group())#12
print(pattern.match('one12twothree34four',3,10).span())#(3, 5)
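The `pos`/`endpos` arguments behave much like slicing the string, with one catch that the Python docs point out: `^` still refers to the real start of the string, not to `pos`. A minimal sketch on the same text:

```python
import re

text = 'one12twothree34four'
pattern = re.compile(r'\d+')

# Restricting the search window with pos/endpos...
print(pattern.match(text, 3, 10).group())  # 12
# ...gives the same result as matching a slice here.
print(pattern.match(text[3:10]).group())   # 12

# But '^' anchors to the real beginning of the string, not to pos.
anchored = re.compile(r'^\d+')
print(anchored.match(text, 3))            # None
print(anchored.match(text[3:]).group())   # 12
```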
pattern1 = re.compile('([a-z]+) ([a-z]+)', re.I)  # re.I: ignore case
print(pattern1.match('Hello World Wide Web'))
print(pattern1.match('Hello World Wide Web').span())#(0, 11)
print(pattern1.match('Hello World Wide Web').group())#Hello World
print(pattern1.match('Hello World Wide Web').groups())#('Hello', 'World')
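To see what `re.I` contributes in the example above, here is the same pattern compiled with and without the flag:

```python
import re

text = 'Hello World Wide Web'

# Without re.I, [a-z] cannot match the capital 'H', so matching at position 0 fails.
strict = re.compile(r'([a-z]+) ([a-z]+)')
print(strict.match(text))  # None

# With re.I, case is ignored and the first two words are captured.
relaxed = re.compile(r'([a-z]+) ([a-z]+)', re.I)
print(relaxed.match(text).groups())  # ('Hello', 'World')
```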
# Pattern.findall(string, pos, endpos): return all non-overlapping matches as a list
pattern2 = re.compile(r'\d+')
print(pattern2.findall('runoob 123 google 456'))#['123', '456']
print(pattern2.findall('run88b 123 google 456',2,10))#['88', '123']
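One detail of `findall()` that the examples above do not hit: once the pattern contains capturing groups, it returns the group contents instead of the whole match. A sketch on the same sample string (the two grouped patterns are illustrative):

```python
import re

text = 'runoob 123 google 456'

# No groups: the whole matched text is returned.
whole = re.compile(r'\d+').findall(text)
print(whole)  # ['123', '456']

# One group: only that group's text is returned for each match.
one_group = re.compile(r'([a-z]+) \d+').findall(text)
print(one_group)  # ['runoob', 'google']

# Several groups: each match becomes a tuple of the groups.
two_groups = re.compile(r'([a-z]+) (\d+)').findall(text)
print(two_groups)  # [('runoob', '123'), ('google', '456')]
```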
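# re.split(pattern, string, maxsplit=0, flags=0): split the string at every match
print(re.split(r'\W+', 'runoob, runoob, runoob.'))  # ['runoob', 'runoob', 'runoob', '']
print(re.split(r'(\W+)', 'runoob, runoob, runoob.'))  # a capturing group keeps the separators: ['runoob', ', ', 'runoob', ', ', 'runoob', '.', '']
print(re.split(r'\W+', ' runoob, runoob, runoob.'))  # ['', 'runoob', 'runoob', 'runoob', '']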
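The `maxsplit` parameter from the `split()` signature, and one common way to drop the empty strings produced by separators at the ends, sketched on the same input:

```python
import re

text = ' runoob, runoob, runoob.'

# maxsplit caps the number of splits; the remainder is returned unsplit.
limited = re.split(r'\W+', text, maxsplit=1)
print(limited)  # ['', 'runoob, runoob, runoob.']

# Separators at the start/end yield empty strings; filter them out if unwanted.
cleaned = [part for part in re.split(r'\W+', text) if part]
print(cleaned)  # ['runoob', 'runoob', 'runoob']
```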
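Some common regex patterns:
(1) (.*?): matches any sequence of characters (except newlines) of any length, as few as possible (non-greedy)
(2) (\w): matches any letter, digit, or underscore
(3) (\d): matches any digit
(4) (\d+): the + means the preceding item must match at least once
(5) runo*ob: the * means the preceding character may appear zero or more times (matches runob, runoob, runoooooob)
(6) colou?r: the ? means the preceding character may appear at most once (matches color or colour)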
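The patterns in the list above can be checked quickly against a few made-up sample strings; this sketch uses `re.fullmatch()` so that a word only counts if the whole word fits the pattern:

```python
import re

# (.*?) versus (.*): lazy stops at the first '>', greedy at the last one.
lazy = re.match(r'<(.*?)>', '<a><b>').group(1)
greedy = re.match(r'<(.*)>', '<a><b>').group(1)
print(lazy, greedy)  # a a><b

# \w: letters, digits and underscore (not '-' or '!').
word_chars = re.findall(r'\w', 'a_1-!')
print(word_chars)  # ['a', '_', '1']

# \d matches single digits; \d+ matches runs of digits.
single = re.findall(r'\d', 'x12y3')
runs = re.findall(r'\d+', 'x12y3')
print(single, runs)  # ['1', '2', '3'] ['12', '3']

# runo*ob: zero or more 'o' between 'run' and 'ob'.
star = {w: bool(re.fullmatch(r'runo*ob', w)) for w in ['runob', 'runoob', 'runoooooob', 'runb']}
print(star)

# colou?r: the 'u' is optional but may appear at most once.
optional = {w: bool(re.fullmatch(r'colou?r', w)) for w in ['color', 'colour', 'colouur']}
print(optional)
```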