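Slicing a File with an Iterator
In day-to-day work you run into log files as large as 4 or 5 GB. Reading the whole file into memory just to slice out a few lines wastes resources: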
In [4]: f = open('access.log')
In [5]: lines = f.readlines()
In [6]: lines[1:19]
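This does slice the text, but readlines() loads every line into a list first, so for a large file it wastes a lot of memory.
Instead, you can slice the file through an iterator, using the islice function from the itertools module: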
In [7]: from itertools import islice
In [8]: islice?
Type: type
String form: <type 'itertools.islice'>
Docstring:
islice(iterable, [start,] stop [, step]) --> islice object
Return an iterator whose next() method returns selected values from an
iterable. If start is specified, will skip all preceding elements;
otherwise, start defaults to zero. Step defaults to one. If
specified as another value, step determines how many values are
skipped between successive calls. Works like a slice() on a list
but returns an iterator.
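To make the start, stop, and step arguments concrete, here is a small sketch on an in-memory sequence instead of a file. The sample list `numbers` is made up for illustration, and the code is written so that it also runs on Python 3.

```python
from itertools import islice

numbers = list(range(10))  # hypothetical sample data: 0..9

# stop only: the first 3 items
first_three = list(islice(numbers, 3))

# start and stop: items at index 2 up to, but not including, index 6
middle = list(islice(numbers, 2, 6))

# start, stop, step: every second item inside that window
stepped = list(islice(numbers, 2, 8, 2))

# stop=None means "keep going until the iterable is exhausted"
tail = list(islice(numbers, 7, None))
```

Unlike list slicing, islice does not accept negative values for start, stop, or step, because it cannot know the length of an arbitrary iterator in advance.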
The function takes an iterable, an optional start value, a stop value, and an optional step. Let's try it:
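In [3]: from itertools import islice
In [4]: f = open('access.log')
In [5]: islice(f, 100, 300)
Out[5]: <itertools.islice at 0x7f50152ddfc8>
# The result is a lazy iterator. Pass the file object itself, not the list from readlines(), so lines are read only as they are needed
In [6]: for x in islice(f, 100, 300):
   ...:     print x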
The result is exactly what we wanted: the text is sliced by iterating over it.
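Putting the pieces together, the following is a minimal sketch of a helper that returns a range of lines without loading the whole file. The name `read_lines` and its parameters are hypothetical and not part of itertools, and the code targets Python 3. Note that islice still has to read and discard the lines before `start`, so it saves memory rather than the I/O for the skipped part.

```python
from itertools import islice


def read_lines(path, start, stop):
    """Return the lines with 0-based index start..stop-1 (hypothetical helper)."""
    with open(path) as f:
        # A file object yields one line at a time, so islice skips the
        # first `start` lines and stops reading once index stop-1 is reached.
        return [line.rstrip('\n') for line in islice(f, start, stop)]
```

Under these assumptions, `read_lines('access.log', 100, 300)` would return the same 200 lines that the loop above prints, with the trailing newlines removed.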