Python: using iterators or generators to process large files with little memory

First, prepare a log file, app.log, containing roughly 60,000 lines. The script below reads it and logs the line count:

from loguru import logger
import os

def sample1():
    log_path = get_log_path()
    with open(log_path, "r") as f:
        list_logs = f.readlines()

    logger.info("length of app.logs: {}".format(len(list_logs)))


# get project path
def get_project_path():
    return os.path.dirname(os.path.dirname(os.path.dirname(__file__)))

# get log path (os is already imported at module level)
def get_log_path():
    return os.path.join(get_project_path(), "logs", "app.log")


if __name__ == "__main__":
    sample1()

Output:

(.venv) [gateman@manjaro-x13 python_common_import]$ /home/gateman/Projects/python/python_common_import/.venv/bin/python /home/gateman/Projects/python/python_common_import/src/generator/gen_sample6.py
2024-05-13 01:16:19.932 | INFO     | __main__:sample1:9 - length of app.logs: 62285

Copying app.log to output.log the ordinary way

Let's modify the file and add a function, sample2(), to do this:

from loguru import logger
import os
from src.decorator.sum_info import sum_info

@sum_info
def sample2():
    log_path = get_log_path()
    with open(log_path, "r") as f:
        list_logs = f.readlines()
    
    output_path = get_output_path()
    with open(output_path, "w") as f:
        for i in list_logs:
            f.write(i)

    logger.info("moved logs to output.log")

# get project path
def get_project_path():
    return os.path.dirname(os.path.dirname(os.path.dirname(__file__)))

# get log path (os is already imported at module level)
def get_log_path():
    return os.path.join(get_project_path(), "logs", "app.log")

# get output path
def get_output_path():
    return os.path.join(get_project_path(), "logs", "output.log")


if __name__ == "__main__":
    sample2()

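This function uses f.readlines() to read the entire file into a list in one go, then loops over that list and writes each line to the other file.

Let's look at the memory usage: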

(.venv) [gateman@manjaro-x13 python_common_import]$ /home/gateman/Projects/python/python_common_import/.venv/bin/python /home/gateman/Projects/python/python_common_import/src/generator/gen_sample5.py
2024-05-13 01:43:55.288 | INFO     | src.decorator.print_time:wrapper:10 - Start time of sample2 is 2024-05-13 01:43:55
2024-05-13 01:43:55.343 | INFO     | __main__:sample2:16 - moved logs to output.log
2024-05-13 01:43:55.351 | INFO     | src.decorator.print_mem:wrapper:14 - Current memory usage is 0.000866MB; Peak was 9.868371MB
2024-05-13 01:43:55.352 | INFO     | src.decorator.print_time:wrapper:13 - End time of sample2 is 2024-05-13 01:43:55
2024-05-13 01:43:55.352 | INFO     | src.decorator.print_time:wrapper:14 - Time used of sample2 is 0.06403422355651855 seconds

Peak memory is a little over 9 MB (about 9.87 MB), because the entire content of the file has to be read into memory.
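The sum_info decorator used above is not listed in this post. As a rough sketch only, assuming figures of this kind come from the standard tracemalloc module, a decorator like the hypothetical report_peak below could print similar current/peak numbers. The name report_peak, the helper build_list and the output format are assumptions for illustration, not the actual src.decorator code.

```python
import functools
import tracemalloc


def report_peak(func):
    """Hypothetical stand-in for sum_info: print current and peak traced memory."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            return func(*args, **kwargs)
        finally:
            # Read the counters after the call has finished, then stop tracing.
            current, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            print("Current memory usage is {:.6f}MB; Peak was {:.6f}MB".format(
                current / 10**6, peak / 10**6))
    return wrapper


@report_peak
def build_list(n):
    # Build a throwaway list so the allocation shows up in the peak figure.
    data = ["line {}\n".format(i) for i in range(n)]
    return len(data)


if __name__ == "__main__":
    build_list(100_000)
```

Because the list is released when build_list returns, the "current" figure is small while the "peak" figure reflects the full list, which is the same pattern seen in the log above.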

Using an iterator

Let's modify the file again and add a function, sample3(), to do this:

from loguru import logger
import os
from src.decorator.sum_info import sum_info

@sum_info
def sample3():
    log_path = get_log_path()
    output_path = get_output_path()
    count = 0
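    # Open both files in one with statement; "w" matches sample2 and
    # avoids appending duplicate copies to output.log on repeated runs.
    with open(log_path, "r") as f, open(output_path, "w") as f2:
        for i in f:
            f2.write(i)
            count += 1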
    
    logger.info("moved {} logs to output.log".format(count))

# get project path
def get_project_path():
    return os.path.dirname(os.path.dirname(os.path.dirname(__file__)))

# get log path (os is already imported at module level)
def get_log_path():
    return os.path.join(get_project_path(), "logs", "app.log")

# get output path
def get_output_path():
    return os.path.join(get_project_path(), "logs", "output.log")


if __name__ == "__main__":
    sample3()

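Since f is actually a TextIOWrapper, which is an iterable, we can loop over it with for ... in. Each iteration yields a single line, so the whole file never has to sit in memory at once.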
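To make the iterator behaviour concrete, here is a small self-contained sketch. It writes a three-line temporary file (the file content and variable names are made up for illustration) and shows that the object returned by open() is a TextIOWrapper, that it is its own iterator, and that next() pulls exactly one line.

```python
import io
import os
import tempfile

# Write a tiny throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    tmp_path = tmp.name

with open(tmp_path, "r") as f:
    is_text_wrapper = isinstance(f, io.TextIOWrapper)
    is_own_iterator = iter(f) is f      # a file object is its own iterator
    first_line = next(f)                # pulls exactly one line
    remaining = [line for line in f]    # the loop continues where next() stopped

os.remove(tmp_path)

print(is_text_wrapper, is_own_iterator, repr(first_line), remaining)
```

The for loop in sample3() does the same thing implicitly: it calls next() on the file object until the file is exhausted.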

Memory usage of this approach:

(.venv) [gateman@manjaro-x13 python_common_import]$ /home/gateman/Projects/python/python_common_import/.venv/bin/python /home/gateman/Projects/python/python_common_import/src/generator/gen_sample7.py
2024-05-13 01:50:33.133 | INFO     | src.decorator.print_time:wrapper:10 - Start time of sample3 is 2024-05-13 01:50:33
2024-05-13 01:50:33.229 | INFO     | __main__:sample3:16 - moved 62320 logs to output.log
2024-05-13 01:50:33.229 | INFO     | src.decorator.print_mem:wrapper:14 - Current memory usage is 0.00086MB; Peak was 0.041176MB
2024-05-13 01:50:33.230 | INFO     | src.decorator.print_time:wrapper:13 - End time of sample3 is 2024-05-13 01:50:33
2024-05-13 01:50:33.230 | INFO     | src.decorator.print_time:wrapper:14 - Time used of sample3 is 0.09714841842651367 seconds

Peak memory is only about 0.04 MB, compared with about 9.87 MB for the readlines() version.
A huge memory saving!
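The title also mentions generators, which the samples above do not use. As a minimal sketch under stated assumptions (the helper names read_chunks and copy_file and the 64 KiB default chunk size are illustrative, not part of the original code), a generator can yield a file in fixed-size pieces. This keeps memory bounded even when a file has very long lines or no line breaks at all.

```python
import os
import tempfile


def read_chunks(path, chunk_size=64 * 1024):
    """Yield the file's text in pieces of at most chunk_size characters."""
    with open(path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:      # an empty string means end of file
                break
            yield chunk


def copy_file(src, dst, chunk_size=64 * 1024):
    """Copy src to dst chunk by chunk; return the number of characters written."""
    total = 0
    with open(dst, "w") as out:
        for chunk in read_chunks(src, chunk_size):
            out.write(chunk)
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Demo on a temporary file rather than the real logs/app.log.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "app.log")
    dst = os.path.join(workdir, "output.log")
    with open(src, "w") as f:
        f.writelines("log line {}\n".format(i) for i in range(1000))
    print(copy_file(src, dst, chunk_size=256))
```

Only one chunk is alive at a time, so peak memory is governed by chunk_size rather than by the size of the file.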
