Core Python Programming, Chapter 1: Regular Expressions

Latest recommended article published on 2020-12-15 22:54:40

孤鸿子_

Category: programing-language. Tags: python, regular expressions

Article link: https://blog.csdn.net/dylan_frank/article/details/76219039

Basic usage: match and search

import re

m = re.match('foo|bar','foo')

print(m.group())

foo

type(m.group())

str

m.group() returns the matched string

pattern = 'foo|bar'
m = re.match(pattern,'foobarfooooooook')
print(m.group())

foo

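re.match() only matches at the beginning of the string; it returns None if there is no match there, otherwise a match object whose group() gives the matched text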

m = re.match(pattern,'ofoo')
m is None

True

m = re.search(pattern,'ofoo')
if m is not None:
    print(m.group())

foo

search scans the whole string and returns the first match it finds

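Special characters

\s matches any whitespace character
\d matches any digit
\w matches any word character: letters, digits, and the underscore, i.e. [0-9A-Za-z_] for ASCII text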
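
As a quick illustration of these three classes, here is a minimal sketch. The sample strings and variable names are made up for this example and are not from the book.

```python
import re

sample = "room 42\tis open"  # made-up sample text

digits = re.findall(r"\d+", sample)         # runs of digits
words = re.findall(r"\w+", "snake_case 9")  # \w also accepts the underscore
pieces = re.split(r"\s+", sample)           # split on spaces and tabs alike

print(digits)
print(words)
print(pieces)
```

Note that the tab between 42 and is counts as whitespace, so \s+ splits there as well.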

anyend = '.end'
m = re.match(anyend,'okend')
if m is not None:
    print(m.group())
else :
    print('None')
    tmp = re.match(anyend,'oend')
    print(tmp.group())

None
oend

'.' matches any single character except '\n'

m = re.match(anyend,'\nend')
if m is not None:
    print(m.group())
else:
    print('can\'t match "\\n"')  # the doubled backslash prints a literal \n

can't match "\n"

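Matching character sets with []

[] creates a character set and acts like a logical OR over its members. For example, [abcd][defg] matches a first character taken from 'abcd' followed by a second character taken from 'defg'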

pat = '[abcd][defg]'
m = re.match(pat,'adefg')
if m is not None:
    print(m.group())
else:
    print('can\'t match ')

ad

# Wrap the None check in a helper so it is not repeated for every match
def print_match(m):
    if m is not None:
        print(m.group())
    else:
        print('can\'t match')

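Repetition, special characters, and grouping

Ranges and negation

A character set [] can also express a range of characters
eg:
* [a-z] matches any lowercase letter
* [0-9] matches any digit
* [^0-9] matches any character that is not a digit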
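
The three sets above can be tried out directly. This is only a small sketch on a made-up string:

```python
import re

text = "abc123XYZ"  # made-up sample

lower = re.findall(r"[a-z]+", text)       # runs of lowercase letters
digits = re.findall(r"[0-9]+", text)      # runs of digits
non_digits = re.findall(r"[^0-9]+", text) # runs of anything except digits

print(lower, digits, non_digits)
```

The ^ only negates when it is the first character inside the brackets; elsewhere in a set it is an ordinary character.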

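Repetition with closure operators

{n} matches the preceding expression exactly n times
{n,m} matches it n to m times
* matches it 0 or more times
+ matches it 1 or more times
? matches it 0 or 1 times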

eg:
* [0-9]{9} matches exactly nine digits in a row
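
To see how the repetition operators differ, the sketch below checks a few strings with re.fullmatch. The helper name full and the test strings are made up for this example.

```python
import re

def full(pattern, s):
    """Return True if the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

nine_ok = full(r"[0-9]{9}", "123456789")    # exactly nine digits
nine_short = full(r"[0-9]{9}", "12345678")  # only eight digits
range_ok = full(r"a{2,3}", "aa")            # two to three a's
range_long = full(r"a{2,3}", "aaaa")        # four is too many
star_zero = full(r"ab*", "a")               # * allows zero b's
plus_zero = full(r"ab+", "a")               # + needs at least one b
opt_none = full(r"colou?r", "color")        # ? allows zero u's
opt_two = full(r"colou?r", "colouur")       # ? allows at most one u

print(nine_ok, nine_short, range_ok, range_long)
print(star_zero, plus_zero, opt_none, opt_two)
```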

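Special characters that denote character sets

\w matches any alphanumeric character or underscore, equivalent to [A-Za-z0-9_]
\d matches any digit, equivalent to [0-9]

The uppercase form of a special character set negates it

eg:
* \D matches any non-digit, equivalent to [^0-9]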
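
The negated classes are handy for stripping out everything you do not want. A short sketch with made-up inputs:

```python
import re

phone = "(800) 555-1212"

digits_only = re.sub(r"\D", "", phone)       # drop every non-digit
word_chars = re.sub(r"\W", "", "a_b-c!")     # drop every non-word character
tokens = re.findall(r"\S+", "a b\tc")        # runs of non-whitespace

print(digits_only)
print(word_chars)
print(tokens)
```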

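Grouping with parentheses ()

Parentheses work like grouping in arithmetic: whatever they enclose is treated as a single unit

eg:
* \d+(\.\d+)? a decimal number (the dot must be escaped to mean a literal '.')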
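
The decimal-number pattern is a good place to see why the dot has to be escaped. In this sketch the names decimal and loose are made up; loose is the same pattern with an unescaped dot, shown only for comparison.

```python
import re

decimal = re.compile(r"\d+(\.\d+)?")  # escaped dot: a literal '.'
loose = re.compile(r"\d+(.\d+)?")     # unescaped dot: any character

whole = decimal.fullmatch("42")       # the optional group is skipped
frac = decimal.fullmatch("3.14")      # the optional group matches '.14'
strict_bad = decimal.fullmatch("3x14")
loose_bad = loose.fullmatch("3x14")   # accepted, because '.' matches 'x'

print(whole.group(1))
print(frac.group(1))
print(strict_bad is None, loose_bad is not None)
```

When an optional group takes no part in the match, group(1) returns None rather than an empty string.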

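Once a pattern is grouped, m.group(1) returns the first subgroup, m.group(2) the second, and so on; m.groups() returns a tuple of all the subgroups

pat = r'\w\w\w-\d{3}'  # raw string, so the backslashes reach the regex engine untouched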
m = re.match(pat,'abc-123')
print_match(m)

abc-123

pat = r'(\w{3})-(\d{3})'  # split into two subgroups
m = re.match(pat,'abx-123')
print_match(m)
print('m.group1',m.group(1))
print('m.group2',m.group(2))
print('m.groups',m.groups())

abx-123
m.group1 abx
m.group2 123
m.groups ('abx', '123')

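Matching string boundaries

^ matches at the start of the string
$ matches at the end of the string
\b matches at a word boundary (the start or end of a word)
\B matches anywhere that is not a word boundary, e.g. in the middle of a word

eg:

^from matches a string that starts with from
/bin$ matches a string that ends with /bin
\bthe matches any word that starts with the; usually used with search
\Bthe matches a the that does not start a word, i.e. one preceded by another word character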

pat = r'\bthe'  # raw string avoids escaping; in a normal string '\b' is the backspace character
m = re.search(pat,'othe')
print_match(m)

can't match

pat = r'\Bthe'
m = re.search(pat , 'othe')
print_match(m)

the
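
The four anchors can be compared side by side. This is a minimal sketch; the sentence and variable names are made up for illustration.

```python
import re

starts = re.search(r"^from", "from here") is not None  # 'from' at the start
not_start = re.search(r"^from", "away from") is None   # 'from' is not first
ends = re.search(r"/bin$", "/usr/bin") is not None     # '/bin' at the end

sentence = "the other theory bathe"
word_starts = re.findall(r"\bthe\w*", sentence)  # words beginning with 'the'
inside = re.findall(r"\Bthe", sentence)          # 'the' preceded by a word char

print(starts, not_start, ends)
print(word_starts)
print(inside)
```

Here \bthe picks up the and theory, while \Bthe finds the occurrences buried inside other and bathe.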

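Other useful functions

findall and finditer

findall returns a list of all matching strings (a list of tuples when the pattern contains several groups)
finditer is similar to findall but saves memory: it returns an iterator of match objects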

pat = r'(th\w+) and (th\w+)'
s = 'this and that'
print('findall')
print(re.findall(pat,s,re.I))  # the re.I flag makes the match case-insensitive
print('finditer')
[x for g in re.finditer(pat,s,re.I) for x in g.groups()]

findall
[('this', 'that')]
finditer
['this', 'that']
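
Because finditer yields match objects rather than strings, it also gives access to positions. The sketch below uses the same pattern on a longer made-up sentence; the names pairs, spans, and plain are only for this example.

```python
import re

text = "this and that, those and these"
pat = re.compile(r"(th\w+) and (th\w+)", re.I)

pairs = pat.findall(text)  # one tuple of groups per match
spans = [(m.start(), m.end(), m.group(0)) for m in pat.finditer(text)]
plain = re.findall(r"th\w+", text)  # no groups: a flat list of strings

print(pairs)
print(spans)
print(plain)
```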

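sub and subn

sub(pattern,repl,string,count=0)

Replaces each occurrence of pattern in string with repl; count=0 means replace all occurrences

subn does the same but returns a tuple of the new string and the number of substitutions made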

pat = 'X'
s = 'Mr zou'
print(re.sub(pat,s,'att:X\n\nDear X\n\n'))
print(re.subn(pat,s,'att:X\n\nDear X\n\n'))

att:Mr zou

Dear Mr zou


('att:Mr zou\n\nDear Mr zou\n\n', 2)
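
The count argument limits how many occurrences are replaced. A small sketch reusing the same template string (the variable names are made up):

```python
import re

template = "att:X\n\nDear X\n\n"

first_only = re.sub("X", "Mr zou", template, count=1)  # replace one occurrence
new_text, n = re.subn("X", "Mr zou", template)         # replace all and count

print(repr(first_only))
print(n)
```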

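# Besides m.group(N), a group can be referenced in the replacement string as \N, where N is the group number
# Reformat a date by swapping the first two fields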
re.sub(r'(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})',r'\2/\1/\3','2/20/91')

'20/2/91'

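Extension notation

(?...) flags

eg:
* (?i) makes the rest of the pattern case-insensitive; note that it must be placed at the very start of the pattern
* (?s) makes the dot match every character, including '\n'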

re.findall(r'(?i)yes','OYESok,oyesOK,YES')

['YES', 'yes', 'YES']
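
The (?s) flag is easiest to understand on text that spans two lines. In this sketch the sample text is made up; re.S is the function-argument equivalent of the inline flag.

```python
import re

text = "first line\nsecond line"

no_flag = re.search(r"first.*second", text)        # '.' stops at the newline
inline = re.search(r"(?s)first.*second", text)     # inline flag at the start
by_arg = re.search(r"first.*second", text, re.S)   # same effect via re.S

print(no_flag is None)
print(inline.group() == by_arg.group())
```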

(?:…)

Groups part of a regular expression without saving the result for later retrieval or use

re.findall(r'http://(?:\w+\.)*(\w+\.com)','http://google.code.com-1234')  # only the capturing group is returned

['code.com']

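(?P<name>…)

Refers to a group by name instead of by number

search returns a match object; m.group('name') retrieves a named group, and m.groupdict() returns a dictionary with each name as a key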

pat = r'\((?P<areacode>\d{3})\) (?P<prefix>\d{3})-(?:\d{4})'
re.search(pat,'(800) 555-1212')

<_sre.SRE_Match object; span=(0, 14), match='(800) 555-1212'>
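
To actually read the named groups out of that match object, something like the following sketch works. It reuses the pattern above; the variable names are made up.

```python
import re

pat = r'\((?P<areacode>\d{3})\) (?P<prefix>\d{3})-(?:\d{4})'
m = re.search(pat, '(800) 555-1212')

area = m.group('areacode')   # look up a group by name
named = m.groupdict()        # all named groups as a dict
numbered = m.groups()        # the same groups, by position

print(area)
print(named)
print(numbered)
```

The last four digits do not show up anywhere because (?:…) does not capture.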

In a replacement string, a named group is referenced with a backslash g:
\g<name>

re.sub(pat,r'(\g<areacode>) \g<prefix>-x','(800) 555-1212')

'(800) 555-x'

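Lookahead and lookbehind assertions

(?=…) positive lookahead: matches only if … follows the current position, without consuming it or saving it as a group; e.g. (?=\.com) matches a position that is followed by .com
(?!…) negative lookahead: matches only if … does not follow the current position; nothing is consumed or saved
(?<=…) positive lookbehind: matches only if … immediately precedes the current position; nothing is consumed or saved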
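(?<!…) negative lookbehind: matches only if … does not immediately precede the current position; nothing is consumed or saved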

pat = r'^\s+(?!noreply|postmaster)(\w+)'  # (\w+) captures the name only when it is not noreply or postmaster
re.findall(pat,'''
    sales@31256phter.com
    postmaster@phptr.com
    12345678@phptr.com
    ''',re.M)

['sales', '12345678']
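
For comparison, here is a minimal sketch that exercises lookahead and lookbehind in both their positive and negative forms, including the negative lookbehind (?<!…). The sample strings are made up for illustration.

```python
import re

# Positive lookahead: a name that is followed by '.com'
before_com = re.findall(r"\w+(?=\.com)", "www.google.com")

prices = "cost $30 or 45 euros"
# Positive lookbehind: digits preceded by a dollar sign
dollars = re.findall(r"(?<=\$)\d+", prices)
# Negative lookbehind: whole numbers NOT preceded by a dollar sign
others = re.findall(r"(?<!\$)\b\d+", prices)

# Negative lookahead: mailbox names other than noreply/postmaster
users = re.findall(r"\b(?!noreply|postmaster)\w+(?=@)",
                   "sales@x.com postmaster@x.com")

print(before_com, dollars, others, users)
```

None of the asserted text ends up in the results: the assertions only check the surroundings of the match.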

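Conditional matching

(?(id/name)Y|N) if the group with the given id (or name) matched, match Y, otherwise match N; the |N part is optional

# Matches an xy or yx pair: if group 1 (x) matched, the next character must be y, otherwise x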
bool(re.search(r'(?:(x)|y)(?(1)y|x)','xyxyxy'))

True
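
Since search only needs to find one such pair somewhere in the string, it can be clearer to test two-character strings with fullmatch. A small sketch using the same pattern (the name pair and the test strings are made up):

```python
import re

pair = re.compile(r'(?:(x)|y)(?(1)y|x)')

xy = pair.fullmatch('xy') is not None  # group 1 matched, so 'y' must follow
yx = pair.fullmatch('yx') is not None  # group 1 did not match, so 'x' must follow
xx = pair.fullmatch('xx') is not None  # same letter twice is rejected
yy = pair.fullmatch('yy') is not None

found = pair.search('xxy').group()     # search skips ahead to the first valid pair

print(xy, yx, xx, yy)
print(found)
```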