Some notes on a few important parts of Python's re module.
My usual way of using the re module:
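1. Compile the pattern
pattern = re.compile(r'hello')
2. Call one of re's search/match functions and keep the result
match = pattern.search("hello world")
3. Read the match result
if match:
    print match.group()   # the whole match; groups() would be () here, since this pattern has no capture groups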
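The three steps above can be put together as a small runnable sketch. The two-group pattern and the sample string are assumptions chosen for illustration, and print is called as a function so the snippet also runs on Python 3.

```python
import re

# Step 1: compile the pattern once (two capture groups, chosen for illustration).
pattern = re.compile(r'(\w+) (\w+)')

# Step 2: search returns a match object, or None when nothing matches.
match = pattern.search("hello world")

# Step 3: read the result only after checking that a match exists.
whole = None
parts = None
if match:
    whole = match.group()    # the entire matched text
    parts = match.groups()   # one entry per capture group, as a tuple
    print(whole)
    print(parts)
```

Checking `if match:` first matters because calling group() or groups() on None raises AttributeError.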
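The re functions I use most are match, search, findall, finditer, and split, just these five.
match returns a match object (or None if there is no match); its groups() method returns a tuple
search returns a match object (or None if there is no match); its groups() method returns a tuple
findall returns a list
finditer returns an iterator of match objects
split returns a list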
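A minimal sketch of the five return types listed above. The `key=value` pattern, the comma splitter, and the sample text are hypothetical and only serve to make every function return something non-empty.

```python
import re

pattern = re.compile(r'(\w+)=(\d+)')   # hypothetical pattern with two groups
splitter = re.compile(r',\s*')         # hypothetical separator pattern
text = "a=1, b=22"                     # hypothetical sample input

m = pattern.match(text)            # match object, anchored at the start of the string
s = pattern.search(text)           # match object for the first match anywhere
pairs = pattern.findall(text)      # list; one tuple per match when there are several groups
matched = [mo.group() for mo in pattern.finditer(text)]  # finditer yields match objects
pieces = splitter.split(text)      # list of the text between separators

print(type(m).__name__, type(m.groups()).__name__)
print(type(s).__name__, type(s.groups()).__name__)
print(pairs)
print(matched)
print(pieces)
```

The tuple only appears after calling groups() on the match object; match and search themselves never return a tuple.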
A concrete test script (Python 2) is shown below:
[xluren@test re_compile]$ cat demo.py
import re
str1='218.205.750.157 46 TCP_MISS [16/Oct/2014:19:29:38 +0800] "GET /i.jpg HTTP/1.1" 200 4576 "-" "-" "GT-droid" "2297768042"'
str2='www.baidu.cn 220.162.917.199 9 TCP_HIT [16/Oct/2014:21:01:39 +0800] "GET /r.gif HTTP/0.0" 200 13815 "-" "-" "vroid" ""'
pattern=re.compile(r'([\w\d.]{0,})\s([0-9.]+)\s(\d+|-)\s(\w+)\s\[([^\[\]]+)\s\+\d+\]\s"((?:[^"]|\")+)"\s(\d{3})\s(\d+|-)\s"((?:[^"]|\")+|-)"\s"(.+|-)"\s"((?:[^"]|\")+)"\s"(.{0,}|-)"$')
print "="*10
print "match test"
match=pattern.match(str1)
if match:
    print match.groups()
match=pattern.match(str2)
if match:
    print match.groups()
print "return type is :",type(match.groups()).__name__
print "="*10
print "search"
search=pattern.search(str1)
if search:
    print search.groups()
search=pattern.search(str2)
if search:
    print search.groups()
print "return type is ",type(search.groups()).__name__
print "="*10
print "split"
split=pattern.split(str1)
if split:
    print split
print 'return type is ',type(split).__name__
print "="*10
print "finditer"
finditer=pattern.finditer(str1)
if finditer:
    for i in finditer:
        print i
finditer=pattern.finditer(str2)
if finditer:
    for i in finditer:
        print i.group()
print "return type is ",type(finditer).__name__
print "="*10
print "findall"
findall=pattern.findall(str1)
if findall:
    print findall
findall=pattern.findall(str2)
if findall:
    print findall
print "return type is ",type(findall).__name__
print "="*10
p = re.compile(r'(\w+) (\w+)')
s = 'i say, hello world!'
print p.sub(r'\2 \1', s)
[xluren@test re_compile]$
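The last three lines of the script use sub with the backreferences \1 and \2 to swap each pair of words. The sketch below repeats that call and adds two variations that are not in the original script: named groups and the count argument. It is written so that it also runs on Python 3.

```python
import re

p = re.compile(r'(\w+) (\w+)')
s = 'i say, hello world!'

# \1 and \2 refer to the first and second capture group of each match.
swapped = p.sub(r'\2 \1', s)
print(swapped)

# Variation (not in the original script): named groups read more clearly.
named = re.compile(r'(?P<first>\w+) (?P<second>\w+)')
swapped_named = named.sub(r'\g<second> \g<first>', s)
print(swapped_named)

# Variation (not in the original script): count=1 replaces only the first match.
swapped_once = p.sub(r'\2 \1', s, count=1)
print(swapped_once)
```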
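Test output (only str2 produces results, because str1 lacks the leading hostname field that the pattern expects):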
[xluren@test re_compile]$ python demo.py
==========
match test
('www.baidu.cn', '220.162.917.199', '9', 'TCP_HIT', '16/Oct/2014:21:01:39', 'GET /r.gif HTTP/0.0', '200', '13815', '-', '-', 'vroid', '')
return type is : tuple
==========
search
('www.baidu.cn', '220.162.917.199', '9', 'TCP_HIT', '16/Oct/2014:21:01:39', 'GET /r.gif HTTP/0.0', '200', '13815', '-', '-', 'vroid', '')
return type is tuple
==========
split
['218.205.750.157 46 TCP_MISS [16/Oct/2014:19:29:38 +0800] "GET /i.jpg HTTP/1.1" 200 4576 "-" "-" "GT-droid" "2297768042"']
return type is list
==========
finditer
www.baidu.cn 220.162.917.199 9 TCP_HIT [16/Oct/2014:21:01:39 +0800] "GET /r.gif HTTP/0.0" 200 13815 "-" "-" "vroid" ""
return type is callable-iterator
==========
findall
[('www.baidu.cn', '220.162.917.199', '9', 'TCP_HIT', '16/Oct/2014:21:01:39', 'GET /r.gif HTTP/0.0', '200', '13815', '-', '-', 'vroid', '')]
return type is list
==========
say i, world hello!
[xluren@test re_compile]$
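When a pattern does not match, as happens with str1 in the run above, each function reports it differently, which is why split still prints a one-element list while the other sections print nothing for str1. A minimal sketch, using a hypothetical pattern and input that are guaranteed not to match:

```python
import re

pattern = re.compile(r'(\d+)-(\d+)')   # hypothetical pattern
text = "no digits here"                # hypothetical input with no match

no_match = pattern.match(text)             # None
no_search = pattern.search(text)           # None
no_findall = pattern.findall(text)         # empty list
no_iter = list(pattern.finditer(text))     # the iterator yields nothing
no_split = pattern.split(text)             # the whole string, unsplit, in a list

print(no_match)
print(no_search)
print(no_findall)
print(no_iter)
print(no_split)
```

Note that the iterator object returned by finditer is itself always truthy, so a check like `if finditer:` does not tell whether anything matched; only iterating over it does.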