Notes on the Python re module

xluren

Categories: log processing, Python miscellany. Tags: python, re

Article link: https://blog.csdn.net/xluren/article/details/40182273

A quick summary of the most important parts of Python's re module.

My usual way of using the re module:

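1. Compile the pattern

import re
pattern=re.compile(r'hello')

2. Call one of the pattern's search/match methods and keep the result

match=pattern.search("hello world")

3. Check that something matched (search returns None otherwise), then read the result

if match:
    print(match.group())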
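The three steps above as a minimal sketch, assuming Python 3 (the post itself uses Python 2 print statements). Because r'hello' has no capture groups, groups() returns an empty tuple here, which is why group() is the more useful call for reading the matched text.

```python
import re

# Step 1: compile once so the pattern can be reused
pattern = re.compile(r'hello')

# Step 2: search scans the string and returns a Match object, or None
match = pattern.search("hello world")

# Step 3: check for None before reading anything from the result
if match:
    print(match.group())   # whole matched text: hello
    print(match.groups())  # no capture groups in the pattern: ()
    print(match.span())    # start and end offsets: (0, 5)
```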

The re matching functions I use most are these five: match, search, findall, finditer and split.

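match returns a Match object, or None if the string does not match at its start; calling groups() on that object returns a tuple

search returns a Match object, or None if there is no match anywhere in the string; its groups() also returns a tuple

findall returns a list

finditer returns an iterator that yields Match objects

split returns a list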
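To see these return types side by side, here is a small sketch assuming Python 3.7 or later; the pattern (\w+)@(\w+) and the sample text are made up for illustration. It also shows that findall returns tuples when the pattern has more than one group, and that split keeps captured groups in its result.

```python
import re

# Hypothetical two-group pattern and sample text, for illustration only
pattern = re.compile(r'(\w+)@(\w+)')
text = "alice@home bob@work"

m = pattern.match(text)             # Match object (or None); only tries position 0
s = pattern.search("to: " + text)   # Match object (or None); scans the whole string
groups = m.groups()                 # tuple of the captured groups
found = pattern.findall(text)       # list; one tuple per match because there are 2 groups
it = pattern.finditer(text)         # iterator that yields Match objects lazily
parts = pattern.split(text)         # list; captured groups are kept in the result

print(type(m).__name__, groups)     # Match ('alice', 'home')
print(s.group())                    # alice@home
print(found)                        # [('alice', 'home'), ('bob', 'work')]
print([x.span() for x in it])       # [(0, 10), (11, 19)]
print(parts)                        # ['', 'alice', 'home', ' ', 'bob', 'work', '']
```

On Python 3, type(pattern.finditer(text)).__name__ is callable_iterator, the counterpart of the callable-iterator name printed by the Python 2 demo.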

Here is a concrete test script:

[xluren@test re_compile]$ cat demo.py
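# Python 2 script: print is a statement here
import re

# str1: a cache log line that starts directly with the client IP (no hostname field)
str1='218.205.750.157 46 TCP_MISS [16/Oct/2014:19:29:38 +0800] "GET /i.jpg HTTP/1.1" 200 4576 "-" "-" "GT-droid" "2297768042"'

# str2: the same format with a leading hostname field
str2='www.baidu.cn 220.162.917.199 9 TCP_HIT [16/Oct/2014:21:01:39 +0800] "GET /r.gif HTTP/0.0" 200 13815 "-" "-" "vroid" ""'

# 12 groups: host, IP, a numeric field, cache status, timestamp (timezone dropped), request line,
# status code, response size, then four quoted fields. In this raw string \" is just a literal
# quote, so the alternation (?:[^"]|\") matches any character, quotes included.
pattern=re.compile(r'([\w\d.]{0,})\s([0-9.]+)\s(\d+|-)\s(\w+)\s\[([^\[\]]+)\s\+\d+\]\s"((?:[^"]|\")+)"\s(\d{3})\s(\d+|-)\s"((?:[^"]|\")+|-)"\s"(.+|-)"\s"((?:[^"]|\")+)"\s"(.{0,}|-)"$')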

print "="*10
print "match test"
# match only tries position 0; str1 lacks the hostname field, so it does not match
match=pattern.match(str1)
if match:
    print match.groups()

match=pattern.match(str2)
if match:
    print match.groups()
# match now holds str2's Match object; this line would raise AttributeError if it were None
print "return type is :",type(match.groups()).__name__

print "="*10
print "search"
search=pattern.search(str1)
if search:
    print search.groups()

search=pattern.search(str2)
if search:
    print search.groups()
print "return type is ",type(search.groups()).__name__


print "="*10
print "split"
# the pattern never matches str1, so split returns [str1] unchanged
split=pattern.split(str1)
if split:
    print split
print 'return type is ',type(split).__name__ 

print "="*10
print "finditer"
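# finditer returns an iterator, which is always truthy, so "if finditer:" never skips;
# str1 yields no matches, str2 yields one Match object
finditer=pattern.finditer(str1)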
if finditer:
    for i in finditer:
        print i
finditer=pattern.finditer(str2)
if finditer:
    for i in finditer:
        print i.group()
print "return type is ",type(finditer).__name__

print "="*10
print "findall"
findall=pattern.findall(str1)
if findall:
    print findall
findall=pattern.findall(str2)
if findall:
    print findall
print "return type is ",type(findall).__name__

print "="*10
p = re.compile(r'(\w+) (\w+)')
s = 'i say, hello world!'
# sub: \2 \1 swaps the two captured words in every non-overlapping match
print p.sub(r'\2 \1', s)
[xluren@test re_compile]$

Test output:

[xluren@test re_compile]$ python demo.py 
==========
match test
('www.baidu.cn', '220.162.917.199', '9', 'TCP_HIT', '16/Oct/2014:21:01:39', 'GET /r.gif HTTP/0.0', '200', '13815', '-', '-', 'vroid', '')
return type is : tuple
==========
search
('www.baidu.cn', '220.162.917.199', '9', 'TCP_HIT', '16/Oct/2014:21:01:39', 'GET /r.gif HTTP/0.0', '200', '13815', '-', '-', 'vroid', '')
return type is  tuple
==========
split
['218.205.750.157 46 TCP_MISS [16/Oct/2014:19:29:38 +0800] "GET /i.jpg HTTP/1.1" 200 4576 "-" "-" "GT-droid" "2297768042"']
return type is  list
==========
finditer
www.baidu.cn 220.162.917.199 9 TCP_HIT [16/Oct/2014:21:01:39 +0800] "GET /r.gif HTTP/0.0" 200 13815 "-" "-" "vroid" ""
return type is  callable-iterator
==========
findall
[('www.baidu.cn', '220.162.917.199', '9', 'TCP_HIT', '16/Oct/2014:21:01:39', 'GET /r.gif HTTP/0.0', '200', '13815', '-', '-', 'vroid', '')]
return type is  list
==========
say i, world hello!
[xluren@test re_compile]$
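In the output above every function found only str2: str1 has no leading hostname field, so the 12-group pattern cannot line up with it anywhere, split hands back the original string in a one-element list, and finditer yields nothing. The Python 3 sketch below isolates those behaviours on made-up short strings: match only tries position 0 while search scans forward, split with no match returns [string], and sub reorders captured groups through the \2 \1 backreferences used at the end of demo.py.

```python
import re

word_pair = re.compile(r'(\w+) (\w+)')

# match only tries index 0; search scans forward until it finds a match
print(re.match(r'world', 'hello world'))            # None
print(re.search(r'world', 'hello world').group())   # world

# split with a pattern that never matches returns the whole string in a list
print(re.compile(r'\d+').split('no digits here'))   # ['no digits here']

# sub: \1 and \2 refer to the captured groups, so each word pair is swapped
s = 'i say, hello world!'
print(word_pair.sub(r'\2 \1', s))                   # say i, world hello!

# subn returns the new string together with the number of replacements made
print(word_pair.subn(r'\2 \1', s))                  # ('say i, world hello!', 2)
```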