Implementing Linux tail in One Line of Python

Latest recommended article published on 2021-10-29 15:47:16

XerCis

Category: Python

Article link: https://blog.csdn.net/lly1122334/article/details/106906704

Table of Contents

Problem Description
Solution
Code
Handling Very Large Files
Streaming Reads
References

Problem Description

Implement the Linux tail command: print the last n lines of a file.

Solution

Use collections.deque with a maximum length, so that only the last n lines are kept while the file is iterated.
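A minimal sketch of why deque fits here, assuming an in-memory list of lines instead of a real file: a deque created with a maxlen discards its oldest items as new ones arrive, so after consuming the whole iterable only the last n items remain. The helper name last_n and the sample lines are hypothetical, used only for illustration.

```python
from collections import deque


def last_n(iterable, n=10):
    """Return the last n items of any iterable (hypothetical helper)."""
    # deque(iterable, maxlen) drops items from the left once it is full
    return list(deque(iterable, maxlen=n))


# made-up sample data standing in for the lines of a file
lines = [f'line {i}\n' for i in range(100)]
print(''.join(last_n(lines, 3)))
```

Because an open text file is itself an iterable of lines, passing the file object in place of the list gives the tail behaviour directly.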

Code

Continuously append data to test.txt

i = 0
while True:
    with open('test.txt', 'a') as f:
        print(f.write(f'{i}\n'))
    i += 1

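Read the last 10 lines every 2 seconds

import time
from collections import deque

while True:
    # deque with maxlen 10 keeps only the last 10 lines while iterating the file
    with open('test.txt') as f:
        print(''.join(deque(f, 10)))
    print()
    time.sleep(2)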

Command line: the first argument is the file name, the second (optional, default 10) is the number of lines to return

python -c "import sys; from collections import deque; print(''.join(deque(open(sys.argv[1]), int(sys.argv[2]) if len(sys.argv) > 2 else 10)))" test.txt 5

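Drawback: not suitable for large files, because the whole file is still read line by line from the beginning.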

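Handling Very Large Files

On a file larger than 1 GB, the approach above takes about 10 s, whereas the following approach takes almost no time because it reads only the end of the file.

Continuously write log entries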

import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
logger = logging.getLogger()
logger.setLevel(logging.INFO)
fileHandler = logging.FileHandler('test.log', mode='a')
formatter = logging.Formatter('%(asctime)s - %(levelname)s: %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
fileHandler.setFormatter(formatter)
logger.addHandler(fileHandler)

i = 0
while True:
    logger.info(str(i))
    i += 1

Tail for very large files: read from the end backwards

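import time


def tail(f, size=10):
    '''Similar to the Linux tail command; f must be opened in binary mode'''
    if size <= 0:
        return []

    BUFSIZE = 1024  # number of bytes read per step
    f.seek(0, 2)  # move the stream position to the end
    remaining_bytes = f.tell()  # bytes not yet read (initially the file size)
    lines_to_go = size  # separate counter so size stays intact for the final slice
    block = -1
    data = []

    while lines_to_go >= 0 and remaining_bytes > 0:
        if remaining_bytes - BUFSIZE > 0:
            f.seek(block * BUFSIZE, 2)  # seek backwards from the end
            bunch = f.read(BUFSIZE)
        else:
            f.seek(0, 0)  # less than one block left: start from the beginning
            bunch = f.read(remaining_bytes)  # read only what has not been read yet

        data.insert(0, bunch)
        lines_to_go -= bunch.count(b'\n')  # one fewer line needed per newline
        remaining_bytes -= BUFSIZE
        block -= 1  # keep reading backwards

    # split on bytes and decode only the wanted lines, so a chunk boundary
    # inside a multi-byte character cannot break decoding
    return [line.decode('utf-8') for line in b''.join(data).splitlines()[-size:]]


beg = time.time()
with open('test.log', 'rb') as f:
    print(tail(f))
print(time.time() - beg)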
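One way to gain confidence in a backward-reading tail is to compare its result with the deque approach on a small temporary file. The sketch below is an illustration under stated assumptions, not the author's code: tail_backward is a hypothetical compact variant that reads fixed-size blocks from the end until it has buffered more than n newlines, tail_deque is a hypothetical reference implementation, and the sample file contents are made up.

```python
import os
import tempfile
from collections import deque


def tail_backward(path, n=10, bufsize=1024):
    """Hypothetical compact variant: last n lines, read from the end in blocks."""
    if n <= 0:
        return []
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()  # start offset of the region buffered so far
        data = b''
        # stop once more than n newlines are buffered or the file start is reached
        while pos > 0 and data.count(b'\n') <= n:
            step = min(bufsize, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # decode only the complete trailing lines
    return [line.decode('utf-8') for line in data.splitlines()[-n:]]


def tail_deque(path, n=10):
    """Reference result: scan the whole file and keep the last n lines."""
    with open(path, encoding='utf-8') as f:
        return [line.rstrip('\n') for line in deque(f, n)]


# made-up sample data, including multi-byte characters
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.writelines(f'{i} 日志 entry\n' for i in range(5000))
    sample_path = tmp.name

same = tail_backward(sample_path, 10) == tail_deque(sample_path, 10)
print(same)
os.remove(sample_path)
```

Both functions should return the same list of lines; only the amount of data read from disk differs.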

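Recommended reading:

io: Core tools for working with streams

Streaming Reads

This method reads the file sequentially from the beginning, one fixed-size chunk at a time.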

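from functools import partial
from itertools import islice


def read_from_file(filename, block_size=1024 * 8):
    with open(filename, "r") as fp:
        # iter(callable, sentinel) calls fp.read(block_size) until it returns ""
        for chunk in iter(partial(fp.read, block_size), ""):
            yield chunk


# create the generator once; calling read_from_file() again would restart at the first chunk
for chunk in islice(read_from_file('test.log'), 10):
    print(chunk)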
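The two-argument form iter(callable, sentinel) used in read_from_file keeps calling the callable until it returns the sentinel, which for a text file is the empty string at end of file. A small sketch, with an in-memory io.StringIO standing in for a real file (the sample text and the block size of 4 are assumptions for illustration):

```python
import io
from functools import partial

fp = io.StringIO('abcdefghij')  # stand-in for an open text file
# fp.read(4) is called repeatedly; iteration stops when it returns ''
chunks = list(iter(partial(fp.read, 4), ''))
print(chunks)  # ['abcd', 'efgh', 'ij']
```

The last chunk is shorter than the block size, and the empty string that signals end of file is never yielded.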