Part 3
To begin, a quick introduction to a Python standard-library module: pathlib. It is similar to the os module, but more powerful.
from pathlib import Path
# Recursively walk the filesystem from the root and print every .py file
for filename in Path('/').rglob('*.py'):
    print(filename)
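Note that rglob() itself returns a lazy iterator (generator-style) rather than a list, so the loop prints each matching path as it is found.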
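To make the laziness concrete, here is a small sketch. It builds a throwaway directory with tempfile (the file names a.py, b.txt and sub/c.py are made up for the illustration) and checks that rglob() hands back an iterator that is consumed on demand, not a list.

```python
import tempfile
from pathlib import Path

def make_demo_tree(root):
    """Create a tiny directory tree (hypothetical file names) under root."""
    root = Path(root)
    (root / "sub").mkdir()
    for rel in ("a.py", "b.txt", "sub/c.py"):
        (root / rel).write_text("# demo\n")
    return root

with tempfile.TemporaryDirectory() as tmp:
    root = make_demo_tree(tmp)
    matches = root.rglob("*.py")
    # An iterator returns itself from iter(); a list would not.
    is_lazy_iterator = iter(matches) is matches and not isinstance(matches, list)
    # Consuming the iterator is what actually produces the paths.
    found = sorted(p.relative_to(root).as_posix() for p in matches)

print(is_lazy_iterator)
print(found)
```

Because the result is consumed on demand, it can be passed straight into further generator stages without first collecting every path in memory.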
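Requirement:
You have many server request logs scattered across different directories, and some of them are compressed. You need to find the log entries you care about across all of these files: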
import gzip, bz2
import re
from pathlib import Path

def gen_find(filepat, top):
    # Yield every path under top whose name matches filepat
    yield from Path(top).rglob(filepat)

def gen_open(paths):
    # Open each path in text mode, choosing the opener by file suffix
    for path in paths:
        if path.suffix == '.gz':
            yield gzip.open(path, 'rt')
        elif path.suffix == '.bz2':
            yield bz2.open(path, 'rt')
        else:
            yield open(path, 'rt')

def gen_cat(sources):
    # Concatenate the lines of all open files into one stream
    for src in sources:
        yield from src

def gen_grep(pat, lines):
    # Keep only the lines that match the regular expression pat
    patc = re.compile(pat)
    return (line for line in lines if patc.search(line))

if __name__ == '__main__':
    pat = r'ply-.*\.gz'
    logdir = 'www'
    # Start looking for files; at this point filenames is a generator
    filenames = gen_find("access-log*", logdir)
    # logfiles is also a generator at this point
    logfiles = gen_open(filenames)
    loglines = gen_cat(logfiles)
    patlines = gen_grep(pat, loglines)
    # The last whitespace-separated field of each line is the byte count
    bytecol = (line.rsplit(None, 1)[1] for line in patlines)
    bytes_sent = (int(x) for x in bytecol if x != '-')
    print("Total", sum(bytes_sent))
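To try the pipeline without real server logs, the sketch below writes two tiny log files into a temporary directory, one plain and one gzip-compressed, and runs the same stages over them. The log lines, file names and byte counts are invented for illustration. The gen_* functions repeat the ones above so the block runs on its own, with one change that is my assumption rather than part of the original: gen_open uses a with block so each file is closed once its lines have been consumed.

```python
import bz2
import gzip
import re
import tempfile
from pathlib import Path

def gen_find(filepat, top):
    yield from Path(top).rglob(filepat)

def gen_open(paths):
    # Variant (assumption): close each file when the consumer moves on.
    openers = {'.gz': gzip.open, '.bz2': bz2.open}
    for path in paths:
        opener = openers.get(path.suffix, open)
        with opener(path, 'rt') as f:
            yield f

def gen_cat(sources):
    for src in sources:
        yield from src

def gen_grep(pat, lines):
    patc = re.compile(pat)
    return (line for line in lines if patc.search(line))

def total_bytes(pat, logdir):
    """Sum the last column of every matching line under logdir."""
    filenames = gen_find("access-log*", logdir)
    logfiles = gen_open(filenames)
    loglines = gen_cat(logfiles)
    patlines = gen_grep(pat, loglines)
    bytecol = (line.rsplit(None, 1)[1] for line in patlines)
    return sum(int(x) for x in bytecol if x != '-')

# Made-up sample data in common log format.
PLAIN_LOG = (
    '1.2.3.4 - - [24/Feb/2008:00:08:59 -0600] "GET /ply/ply-2.3.tar.gz HTTP/1.1" 200 97238\n'
    '1.2.3.4 - - [24/Feb/2008:00:09:01 -0600] "GET /ply/ HTTP/1.1" 200 512\n'
    '5.6.7.8 - - [24/Feb/2008:00:09:05 -0600] "GET /ply/ply-3.0.tar.gz HTTP/1.1" 304 -\n'
)
GZ_LOG = (
    '9.9.9.9 - - [25/Feb/2008:10:00:00 -0600] "GET /ply/ply-2.3.tar.gz HTTP/1.1" 200 1000\n'
)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2008").mkdir()
    (root / "access-log").write_text(PLAIN_LOG)
    with gzip.open(root / "2008" / "access-log-0208.gz", "wt") as f:
        f.write(GZ_LOG)
    total = total_bytes(r'ply-.*\.gz', root)

print("Total", total)
```

Only two of the four lines contribute: the /ply/ request does not match the pattern, and the 304 response has '-' in the byte column, so it is filtered out before int() is called.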
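Key point: the difference between yield and yield from
Put simply:
yield hands back a single value, while yield from delegates to another iterable (such as a generator) and yields each of its values in turn.
Example code: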
def up():
    yield from [1,2,3,4,5,6]

for i in up():
    print(i)
###output
1
2
3
4
5
6
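A side-by-side comparison may make the difference easier to see. In this minimal sketch (the function names are made up for illustration), the same list is handed to yield and to yield from, next to the explicit loop that yield from roughly replaces for a plain iterable.

```python
def with_yield():
    # yield hands back the list itself as one single value
    yield [1, 2, 3]

def with_yield_from():
    # yield from iterates over the list and yields each element
    yield from [1, 2, 3]

def with_loop():
    # roughly what yield from does for a plain iterable
    for item in [1, 2, 3]:
        yield item

print(list(with_yield()))       # [[1, 2, 3]]
print(list(with_yield_from()))  # [1, 2, 3]
print(list(with_loop()))        # [1, 2, 3]
```

This is why gen_cat uses yield from src: it flattens each open file into its individual lines instead of passing the file object along as one value.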