Functions in Python's re module: [Python] core functions and methods of the re module

Commonly used functions and methods of the re module

Function/Method: Description

re module function only

compile(pattern, flags=0): Compile the RE pattern with any optional flags and return a regex object.

re module functions and regex object methods

match(pattern, string, flags=0): Attempt to match the RE pattern at the start of string, with optional flags; return a match object on success, None on failure.

search(pattern, string, flags=0): Search for the first occurrence of the RE pattern within string, with optional flags; return a match object on success, None on failure.

findall(pattern, string, flags=0): Look for all non-overlapping occurrences of pattern in string; return a list of matches.

finditer(pattern, string, flags=0): Same as findall(), except that it returns an iterator instead of a list; the iterator yields a match object for each match.

split(pattern, string, maxsplit=0, flags=0): Split string into a list using the RE pattern as the delimiter and return the list of pieces, splitting at most maxsplit times (the default, 0, splits at every occurrence).

sub(pattern, repl, string, count=0, flags=0): Replace occurrences of the RE pattern in string with repl, substituting all occurrences unless count is given (see also subn(), which additionally returns the number of substitutions made).

Match object methods

group(num=0): Return the entire match (or the specific subgroup num).

groups(): Return all matching subgroups in a tuple (empty if there are none).

1. Compiling regular expressions with compile()

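Most re module functions are also available as methods of regex objects. Precompiling a pattern is recommended, especially when it is used repeatedly.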
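
As a minimal sketch of precompiling (the pattern, the helper name find_codes, and the sample strings are illustrative assumptions, not part of the original notes), a compiled regex object offers the same operations as methods, without the pattern argument:

```python
import re

# Hypothetical pattern for illustration: three word characters,
# a hyphen, then three digits.
CODE_RE = re.compile(r'\w\w\w-\d\d\d')

def find_codes(lines):
    """Return the leading code of every line that starts with one."""
    found = []
    for line in lines:
        m = CODE_RE.match(line)  # equivalent to re.match(pattern, line)
        if m is not None:
            found.append(m.group())
    return found

codes = find_codes(['abc-123 ok', 'abc-xyz', 'xyz-789'])
print(codes)
```

The pattern is compiled once, outside the loop, and reused for every line.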

2. Match objects and the group() and groups() methods

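A successful call to match() or search() returns a match object. Match objects have two main methods: group() and groups(). group() returns either the entire match or, on request, a specific subgroup. groups() is simpler: it returns a tuple containing all the subgroups, whether there is one or several. If the regular expression has no subgroups, groups() returns an empty tuple, while group() still returns the entire match.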

3. Matching strings with match()

match() tries to match the pattern starting from the beginning of the string. If the match succeeds, it returns a match object; if it fails, it returns None.

>>> m = re.match('foo','foo')

>>> if m is not None: m.group()

'foo'

>>> m

<re.Match object; span=(0, 3), match='foo'>

>>> m = re.match('foo','bar')

>>> if m is not None: m.group()

>>> m = re.match('foo','food on the table')

>>> m.group()

'foo'

>>> re.match('foo','food on the table').group()

'foo'

4. Finding a pattern within a string with search() (searching vs. matching)

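search() works like match(), except that search() looks for a match of the given regular expression at any position in the string argument and reports the first one found. If the match succeeds, it returns a match object; otherwise it returns None.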

>>> m = re.match('foo','seafood')

>>> if m is not None: m.group()

>>> m = re.search('foo','seafood')

>>> if m is not None: m.group()

'foo'

5. Matching more than one string (|)

>>> bt = 'bat|bet|bit'

>>> m = re.match(bt,'bat')

>>> if m is not None: m.group()

'bat'

>>> m = re.match(bt,'blt')

>>> if m is not None: m.group()

>>> m = re.match(bt,'He bit me!')

>>> if m is not None: m.group()

>>> m = re.search(bt,'He bit me!')

>>> if m is not None: m.group()

'bit'

6. Matching any single character (.)

By default, the dot cannot match a newline or a non-character (that is, the empty string).

>>> anyend = '.end'

>>> m = re.match(anyend,'bend')

>>> if m is not None: m.group()

'bend'

>>> m = re.match(anyend,'end')

>>> if m is not None: m.group()

>>> m = re.match(anyend,'\nend')

>>> if m is not None: m.group()

>>> m = re.search('.end','The end.')

>>> if m is not None: m.group()

' end'

>>> patt314 = '3.14'

>>> pi_patt = r'3\.14'

>>> m = re.match(pi_patt,'3.14')

>>> if m is not None: m.group()

'3.14'

>>> m = re.match(patt314,'3014')

>>> if m is not None: m.group()

'3014'

>>> m = re.match(patt314,'3.14')

>>> if m is not None: m.group()

'3.14'
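
The newline restriction on the dot is only the default. A short sketch, reusing the '.end' pattern from the session above, shows the same match attempted with and without the re.DOTALL flag (the variable names are illustrative):

```python
import re

anyend = '.end'

# By default the dot does not match a newline, so this attempt fails.
default_match = re.match(anyend, '\nend')

# With re.DOTALL (alias re.S) the dot also matches '\n'.
dotall_match = re.match(anyend, '\nend', re.DOTALL)

print(default_match)
print(repr(dotall_match.group()))
```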

7. Creating character classes ([])

>>> m = re.match('[cr][23][dp][o2]','c3po')

>>> if m is not None: m.group()

'c3po'

>>> m = re.match('[cr][23][dp][o2]','c2do')

>>> if m is not None: m.group()

'c2do'

>>> m = re.match('r2d2|c3po','c2do')

>>> if m is not None: m.group()

>>> m = re.match('r2d2|c3po','r2d2')

>>> if m is not None: m.group()

'r2d2'

8. Repetition, special characters, and grouping

>>> patt = r'\w+@(\w+\.)?\w+\.com'

>>> re.match(patt,'nobody@xxx.com').group()

'nobody@xxx.com'

>>> re.match(patt,'nobody@www.xxx.com').group()

'nobody@www.xxx.com'

>>> patt = r'\w+@(\w+\.)*\w+\.com'

>>> re.match(patt,'nobody@www.xxx.yyy.zzz.com').group()

'nobody@www.xxx.yyy.zzz.com'

>>> m = re.match(r'\w\w\w-\d\d\d','abc-123')

>>> if m is not None: m.group()

'abc-123'

>>> m = re.match(r'\w\w\w-\d\d\d','abc-xyz')

>>> if m is not None: m.group()

>>> m = re.match(r'(\w\w\w)-(\d\d\d)','abc-123')

>>> m.group()

'abc-123'

>>> m.group(1)

'abc'

>>> m.group(2)

'123'

>>> m.groups()

('abc', '123')

>>> m = re.match('ab','ab') # no subgroups

>>> m.group()

'ab'

>>> m.groups()

()

>>> m = re.match('(ab)','ab')

>>> m.group()

'ab'

>>> m.group(1)

'ab'

>>> m.groups()

('ab',)

>>> m = re.match('(a)(b)','ab')

>>> m.group()

'ab'

>>> m.group(1)

'a'

>>> m.group(2)

'b'

>>> m.groups()

('a', 'b')

>>> m = re.match('(a(b))','ab')

>>> m.group()

'ab'

>>> m.group(1)

'ab'

>>> m.group(2)

'b'

>>> m.groups()

('ab', 'b')
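
One case the sessions above do not cover is an optional subgroup that takes no part in the match. The sketch below is an assumption-laden variation on the e-mail pattern from this section (extra parentheses were added around the login and domain so they become subgroups); it shows that groups() reports such a subgroup as None unless a default value is passed:

```python
import re

# Variation on the e-mail pattern above, with the login and the
# domain captured as subgroups 1 and 3 (illustrative choice).
patt = r'(\w+)@(\w+\.)?(\w+)\.com'

m = re.match(patt, 'nobody@xxx.com')
without_default = m.groups()    # the optional subgroup 2 did not match
with_default = m.groups('')     # replace missing subgroups with ''
login_and_domain = m.group(1, 3)  # several subgroups at once, as a tuple

m2 = re.match(patt, 'nobody@www.xxx.com')
all_present = m2.groups()

print(without_default)
print(with_default)
print(login_and_domain)
print(all_present)
```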

9. Matching from the start or end of a string and on word boundaries

>>> m = re.search('^The','The end.')

>>> if m is not None: m.group()

'The'

>>> m = re.search('^The','end. The')

>>> if m is not None: m.group()

>>> m = re.search(r'\bthe','bitethe dog')

>>> if m is not None: m.group()

>>> m = re.search(r'\bthe','bite the dog')

>>> if m is not None: m.group()

'the'

>>> m = re.search(r'\Bthe','bitethe dog')

>>> if m is not None: m.group()

'the'
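
The sessions above exercise ^, \b and \B. As a complementary sketch (sample strings and variable names are illustrative), the $ anchor ties a pattern to the end of the string, and \b on both sides requires a whole word:

```python
import re

# '$' anchors the pattern at the end of the string.
m_end = re.search(r'end\.$', 'The end.')

# 'The' is not at the end of the string, so this search fails.
m_not_end = re.search(r'The$', 'The end.')

# '\b' on both sides requires 'the' to stand alone as a word.
m_word = re.search(r'\bthe\b', 'bite the dog')

# In 'them' there is no word boundary after 'the', so this fails.
m_part = re.search(r'\bthe\b', 'bite them')

print(m_end.group())
print(m_not_end, m_part)
print(m_word.group())
```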

10. Finding every occurrence with findall()

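findall() searches a string for non-overlapping occurrences of a regular expression pattern. It resembles search() in that both perform a string search, but unlike match() and search(), findall() always returns a list. If no match is found, it returns an empty list; otherwise it returns a list of all the matches.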

>>> re.findall('car','car')

['car']

>>> re.findall('car','scary')

['car']

>>> re.findall('car','carry the barcardi to the car')

['car', 'car', 'car']
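
finditer() appears in the summary at the top but has no session of its own. A small sketch on the same sentence shows what the match objects add, namely positions, and how findall() behaves when the pattern contains subgroups (the second pattern and its input are illustrative assumptions):

```python
import re

text = 'carry the barcardi to the car'

# finditer() yields match objects, so start offsets are available too.
spans = [(m.group(), m.start()) for m in re.finditer('car', text)]

# When the pattern has subgroups, findall() returns the subgroups
# (as tuples) instead of the whole matches.
pairs = re.findall(r'(\w)(\d)', 'a1 b2')

print(spans)
print(pairs)
```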

11. Searching and replacing with sub() and subn()

>>> re.sub('X','Mr.Smith','attn: X\n\nDear X,\n')

'attn: Mr.Smith\n\nDear Mr.Smith,\n'

>>> re.subn('X','Mr.Smith','attn: X\n\nDear X,\n')

('attn: Mr.Smith\n\nDear Mr.Smith,\n', 2)

>>> print(re.sub('X','Mr.Smith','attn: X\n\nDear X,\n'))

attn: Mr.Smith

Dear Mr.Smith,

>>> re.sub('[ae]','X','abcdef')

'XbcdXf'

>>> re.subn('[ae]','X','abcdef')

('XbcdXf', 2)
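
All of the calls above replace every occurrence. A brief sketch of limiting the number of substitutions with the count argument, reusing the same strings (variable names are illustrative):

```python
import re

letter = 'attn: X\n\nDear X,\n'

# count=1 replaces only the first occurrence of the pattern.
first_only = re.sub('X', 'Mr.Smith', letter, count=1)

# subn() honors count as well and reports how many replacements it made.
result, n = re.subn('[ae]', 'X', 'abcdef', count=1)

print(repr(first_only))
print(result, n)
```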

12. Splitting with split() (on a delimiter pattern)

>>> re.split(':','str1:str2:str3')

['str1', 'str2', 'str3']
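
With a fixed delimiter such as ':' the call above behaves like str.split(). The sketch below, on illustrative inputs, shows the two things re.split() adds: a limit on the number of splits via maxsplit, and a delimiter described by a pattern:

```python
import re

# maxsplit=1 splits at the first delimiter only.
limited = re.split(':', 'str1:str2:str3', maxsplit=1)

# The delimiter can be a pattern: a comma or semicolon with
# optional whitespace on either side.
fields = re.split(r'\s*[,;]\s*', 'a, b;c ,d')

print(limited)
print(fields)
```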

Data generation code for regular expression exercises

from random import randint, choice
from string import ascii_lowercase
from time import ctime

doms = ('com', 'deu', 'net', 'org', 'gov')

# sys.maxint no longer exists in Python 3, and sys.maxsize is too large
# for ctime() on 64-bit builds, so timestamps are capped at 2**31 - 1.
MAX_TS = 2**31 - 1

for i in range(randint(5, 10)):
    dtint = randint(0, MAX_TS - 1)      # random date as a timestamp
    dtstr = ctime(dtint)                # date string
    shorter = randint(4, 7)             # login is shorter
    em = ''
    for j in range(shorter):            # generate login
        em += choice(ascii_lowercase)
    longer = randint(shorter, 12)       # domain is longer
    dn = ''
    for j in range(longer):             # generate domain
        dn += choice(ascii_lowercase)
    print('%s::%s@%s.%s::%d-%d-%d' % (dtstr, em, dn, choice(doms), dtint, shorter, longer))

Sample output:

Wed Sep 10 01:27:06 2025::ivepup@lyduwbnwec.deu::1757438826-6-10

Thu Mar 24 10:37:16 2011::hvvyogn@wtplvnkuocfh.net::1300934236-7-12

Sun Sep 14 18:00:06 2036::uebmhs@vmcjmxjpqiul.org::2104999206-6-12

Thu Mar 28 10:46:48 1985::phsdtd@srgwdpovndy.deu::480826008-6-11

Wed Dec 12 23:31:36 2012::xhfd@qgtrtgkfja.com::1355326296-4-10

Mon Mar 14 10:20:36 2011::uynyvfm@xiimpgwkmw.gov::1300069236-7-10

Mon Oct 11 01:42:40 1976::ehqt@ntxfyu.gov::213817360-4-6

Wed Feb 27 12:07:46 1985::zwcqrlu@zyifcxsleb.com::478325266-7-10

Sun Sep 13 11:27:21 1970::mtfn@umbsfsrmrue.deu::22044441-4-11

Sun Sep 04 16:13:58 2005::qhwz@mvbgvpe.net::1125821638-4-7
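
As one possible exercise on this data, the sketch below pulls the fields out of a single generated line with subgroups. The pattern and the variable names are assumptions made for illustration; the input line is copied from the sample output above.

```python
import re

line = 'Thu Mar 24 10:37:16 2011::hvvyogn@wtplvnkuocfh.net::1300934236-7-12'

# Hypothetical pattern: date string, login, domain, top-level domain,
# timestamp, login length, domain length.
LINE_RE = re.compile(r'^(.+?)::(\w+)@(\w+)\.(\w+)::(\d+)-(\d+)-(\d+)$')

m = LINE_RE.match(line)
fields = m.groups()
date_str, login, domain, tld, ts, login_len, domain_len = fields

print(fields)
# The two trailing numbers are the lengths of the login and the domain.
print(len(login) == int(login_len), len(domain) == int(domain_len))
```

The lazy (.+?) stops at the first '::', so the single colons inside the time of day stay part of the date string.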

REF:Core Python Programming
