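Problem description
Implement the Linux tail command: print the last n lines of a file.
Solution
Use collections.deque with a maximum length.
Code
Continuously write data to test.txt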
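A minimal sketch of why a bounded deque works here: once the deque is full, each appended item pushes the oldest one out, so after consuming the whole iterable only the last n items remain. The helper name last_n is an assumption made for illustration; it is not part of the original notes.

```python
from collections import deque


def last_n(iterable, n=10):
    """Return the last n items of any iterable as a list (hypothetical helper)."""
    # deque(iterable, maxlen=n) keeps only the n most recently appended items
    return list(deque(iterable, maxlen=n))


lines = [f"line {i}" for i in range(100)]
print(last_n(lines, 3))  # ['line 97', 'line 98', 'line 99']
```

An open file object is an iterable of lines, so the same call works on a file; the whole file is still read once from the beginning.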
i = 0
while True:
    with open('test.txt', 'a') as f:
        print(f.write(f'{i}\n'))  # write() returns the number of characters written
    i += 1
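Read at intervals
import time
from collections import deque

while True:
    # reopen the file on every pass so newly appended lines are seen
    with open('test.txt') as f:
        print(''.join(deque(f, 10)))  # keep only the last 10 lines
    print()
    time.sleep(2)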
Command line: the first argument is the file name, the second (optional, default 10) is the number of lines to return
python -c "import sys; from collections import deque; print(''.join(deque(open(sys.argv[1]), int(sys.argv[2]) if len(sys.argv) > 2 else 10)))" test.txt 5
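Drawback: not suitable for large files, because the whole file is read line by line from the start.
Handling very large files
On a file larger than 1 GB, the deque approach above takes about 10 s, whereas the approach below takes almost no time.
Continuously write log entries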
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
logger = logging.getLogger()
logger.setLevel(logging.INFO)
fileHandler = logging.FileHandler('test.log', mode='a')
formatter = logging.Formatter('%(asctime)s - %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
fileHandler.setFormatter(formatter)
logger.addHandler(fileHandler)
i = 0
while True:
    logger.info(str(i))
    i += 1
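tail for very large files, reading from the end backwards
import time

def tail(f, size=10):
    '''Similar to the Linux tail command; f must be opened in binary mode.'''
    if size <= 0:
        return []
    BUFSIZE = 1024  # bytes read per step
    f.seek(0, 2)  # move the stream position to the end
    remaining_bytes = f.tell()  # bytes that have not been read yet
    needed = size  # newlines still to find; size itself stays unchanged
    data = []
    while needed >= 0 and remaining_bytes > 0:
        step = min(BUFSIZE, remaining_bytes)  # small file: read only what is left
        remaining_bytes -= step
        f.seek(remaining_bytes)  # step back one block
        bunch = f.read(step)
        data.insert(0, bunch)
        needed -= bunch.count(b'\n')  # one fewer for every line found
    # split first, then decode: only complete lines are decoded, so a
    # multi-byte character cut at a block boundary cannot raise an error
    return [line.decode('utf-8') for line in b''.join(data).splitlines()[-size:]]

beg = time.time()
with open('test.log', 'rb') as f:
    print(tail(f))
print(time.time() - beg)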
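As an alternative to seeking block by block, the same backwards search can be sketched with the standard mmap module, which lets rfind locate newlines from the end without reading the whole file. This swaps in a different technique from the one above; the helper name tail_mmap is an assumption made for illustration, and the sketch assumes a UTF-8 file with \n line endings.

```python
import mmap


def tail_mmap(path, n=10):
    """Return the last n lines of path as a list of str (hypothetical helper)."""
    if n <= 0:
        return []
    with open(path, 'rb') as f:
        f.seek(0, 2)
        if f.tell() == 0:  # mmap cannot map an empty file
            return []
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = len(mm)
            # skip one trailing newline so it is not counted as an empty last line
            pos = end - 1 if mm[end - 1:end] == b'\n' else end
            for _ in range(n):
                pos = mm.rfind(b'\n', 0, pos)  # previous newline before pos
                if pos == -1:  # fewer than n lines: start from the beginning
                    break
            start = pos + 1
            return [line.decode('utf-8') for line in mm[start:end].splitlines()]
```

Calling tail_mmap('test.log', 10) should return the same lines as the block-based version; only the pages near the end of the file are touched.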
Recommended reading:
Streaming reads
This method reads the file sequentially, one block at a time.
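from functools import partial
from itertools import islice

def read_from_file(filename, block_size=1024 * 8):
    with open(filename, "r") as fp:
        # iter() keeps calling fp.read(block_size) until it returns ""
        for chunk in iter(partial(fp.read, block_size), ""):
            yield chunk

# Create the generator once. Calling read_from_file() inside the loop
# would reopen the file and return the first block every time.
for chunk in islice(read_from_file('test.log'), 10):
    print(chunk)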
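One use of the same block-by-block pattern, sketched under the assumption that lines end with \n: counting the lines of a large file without loading it into memory. The function name count_lines is hypothetical and not part of the original notes; the file is opened in binary mode so the sentinel passed to iter() is b"".

```python
from functools import partial


def count_lines(filename, block_size=1024 * 8):
    """Count newline bytes by streaming the file in fixed-size blocks."""
    total = 0
    with open(filename, "rb") as fp:
        # read block_size bytes at a time until read() returns b""
        for chunk in iter(partial(fp.read, block_size), b""):
            total += chunk.count(b"\n")
    return total
```

Memory use stays bounded by block_size regardless of file size, but the whole file is still read once, so this does not replace the backwards read for tail.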