1. Reading in chunks with a generator
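def read_in_chunks(file_obj, chunk_size=2048):
    """
    Read a file piece by piece.
    Default chunk size: 2048 (characters in text mode, bytes in binary mode).
    """
    while True:
        data = file_obj.read(chunk_size)  # read at most chunk_size units per call
        if not data:  # an empty result means end of file
            break
        yield data

with open('filename', 'r', encoding='utf-8') as f:
    for chunk in read_in_chunks(f):
        do_something(chunk)  # placeholder for your own processing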
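As a rough illustration of the chunked generator in use, the sketch below totals the length of a file without ever holding all of it in memory. The helper name count_chars is hypothetical, and io.StringIO is only a stand-in for a real file opened in text mode.

```python
import io


def read_in_chunks(file_obj, chunk_size=2048):
    """Yield successive pieces of file_obj until it is exhausted."""
    while True:
        data = file_obj.read(chunk_size)
        if not data:  # empty str (or bytes) signals end of file
            break
        yield data


def count_chars(file_obj, chunk_size=2048):
    """Hypothetical helper: total length of the file, summed chunk by chunk."""
    return sum(len(chunk) for chunk in read_in_chunks(file_obj, chunk_size))


# io.StringIO stands in for a real text file (an assumption for this example).
sample = io.StringIO("a" * 5000)
print(count_chars(sample))
```

Only one chunk is alive at a time, so peak memory is bounded by chunk_size rather than by the file size.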
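2. Reading with an iterator:
Use the itertools module. islice returns a lazy iterator rather than a list; wrap it in list() if you need the lines materialized.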
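from itertools import islice

def read_itertools(path):
    with open(path, 'r', encoding='utf-8') as fin:
        # start and stop are 0-based line indices and stop is exclusive,
        # so this yields the first 5 lines
        list_gen = islice(fin, 0, 5)
        for line in list_gen:
            print(line, end='')  # each line already ends with a newline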
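Calling islice repeatedly on the same file object continues from where the previous call stopped, which makes it possible to process a file in fixed-size batches of lines. The following is a minimal sketch of that idea. The helper name read_line_batches and the batch size are assumptions for illustration, and io.StringIO stands in for a real file.

```python
import io
from itertools import islice


def read_line_batches(file_obj, batch_size=5):
    """Hypothetical helper: yield lists of up to batch_size lines each."""
    while True:
        # islice consumes at most batch_size lines from the shared iterator
        batch = list(islice(file_obj, batch_size))
        if not batch:  # no lines left
            break
        yield batch


# io.StringIO stands in for a real text file (an assumption for this example).
sample = io.StringIO("".join(f"line {i}\n" for i in range(12)))
for batch in read_line_batches(sample, batch_size=5):
    print(len(batch), batch[0].strip())
```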
Note: readlines() (like read() with no size argument) loads the entire file into memory, so on large files it can raise MemoryError (out of memory). readline() reads one line at a time.
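To make the contrast concrete, here is a small sketch of two line-at-a-time approaches that avoid loading the whole file: iterating over the file object directly, and looping on readline(). The function names are hypothetical, and io.StringIO is used in place of a real file.

```python
import io


def count_lines_iter(file_obj):
    """Count lines by iterating the file object, which reads lazily."""
    total = 0
    for _ in file_obj:
        total += 1
    return total


def count_lines_readline(file_obj):
    """Count lines with an explicit readline() loop."""
    total = 0
    while True:
        line = file_obj.readline()
        if not line:  # '' means end of file; a blank line is '\n' and is truthy
            break
        total += 1
    return total


# io.StringIO stands in for a real text file (an assumption for this example).
text = "first\n\nthird\n"
print(count_lines_iter(io.StringIO(text)))
print(count_lines_readline(io.StringIO(text)))
```

Iterating the file object directly is usually the more idiomatic of the two; the readline() loop is shown only because the note above mentions it.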