[Python] Reading large files line by line with a generator

Article link: https://blog.csdn.net/Spade_/article/details/108210412

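In everyday work we rarely read files of one or several gigabytes. But suppose you need to read a 500 GB file: loading it into memory with a single f.read() is not an option, because the process would run out of memory. A file larger than the available memory has to be read from disk piece by piece, and this is where generators become especially useful. The code is short, so let's go straight to it.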
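The "piece by piece" idea can be sketched with fixed-size blocks before moving on to lines. This is only a minimal illustration: the names read_in_chunks and total_bytes and the 1 MiB default block size are assumptions made for this example, not part of the original post.

```python
def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield successive blocks of at most chunk_size bytes from a file."""
    with open(path, mode='rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:       # b'' is returned only at end of file
                return
            yield chunk


def total_bytes(path, chunk_size=1024 * 1024):
    """Count the bytes in a file while holding only one block in memory."""
    return sum(len(chunk) for chunk in read_in_chunks(path, chunk_size))
```

Only one block is held in memory at a time, so peak memory use depends on chunk_size rather than on the size of the file.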

Line-by-line generator:

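def read_file(file):
    with open(file, mode='r', encoding='utf8') as f:
        while True:
            one_line = f.readline()
            # readline() returns '' only at end of file; a blank line is '\n'.
            # Test before strip(), otherwise the first blank line ends the read.
            if not one_line:
                return
            yield one_line.strip()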
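A text file object in Python is itself a lazy iterator over lines, so the same generator can also be written with a plain for loop. The sketch below is an alternative, not the post's original code. The name read_lines is hypothetical, and removing only the trailing newline (instead of calling strip()) is a choice made here so that leading and trailing spaces survive.

```python
def read_lines(path, encoding='utf8'):
    """Yield each line without its trailing newline, blank lines included."""
    with open(path, mode='r', encoding=encoding) as f:
        for line in f:              # the file object yields one line at a time
            yield line.rstrip('\n')
```

Because iteration ends only at end of file, a blank line in the middle of the file comes through as an empty string and does not stop the loop.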

Reading a CSV file line by line

Part of a student_info.csv file looks like this:

Name,Date,English,Math,Chinese,Money,Other_1,Other_2,Other_3
XiaoMing,2020-07-20T01:07:00Z,42,93,0.45,3077,5739,0.54,1
XiaoHu,2020-07-20T01:07:31Z,320,852,0.38,18874,37143,0.51,1
XiaoWang,2020-07-20T01:07:48Z,38,118,0.32,3581,34875,0.1,1
XiaoYe,2020-07-20T01:12:48Z,312,477,0.65,3210,4935,0.65,1
XiaoAn,2020-07-20T01:14:04Z,163,263,0.62,2152,4117,0.52,1
XiaoChen,2020-07-20T01:17:30Z,8,10,0.8,423,777,0.54,0.98
XiaoPeng,2020-07-20T01:17:44Z,5,9,0.56,1053,1398,0.75,1
XiaoHong,2020-07-20T01:19:33Z,392,797,0.49,8969,15366,0.58,1
XiaoNing,2020-07-20T01:20:59Z,24,41,0.59,1387,2677,0.52,1
XiaoJing,2020-07-20T01:23:22Z,16,53,0.3,696,1378,0.51,1
XiaoChong,2020-07-20T01:23:45Z,76,111,0.68,2127,2713,0.78,1
XiaoMing,2020-07-20T01:24:01Z,52,135,0.39,3251,6695,0.49,0.99

Now use the generator to read the file line by line and print the first 6 columns of the first 5 rows:

import csv
from collections import namedtuple

def read_file(file):
    with open(file, mode='r', encoding='utf8') as f:
        while True:
            one_line = f.readline()
            if not one_line:    # '' means end of file; a blank line is '\n'
                return
            yield one_line.strip()


lines = read_file("student_info.csv")   # lines is a generator
csv_reader = csv.reader(lines)          # csv.reader accepts any iterable of strings

header = next(csv_reader)[:6]   # use only the first 6 columns
print(header)

# Student = namedtuple("Student", header)
Student = namedtuple("Student", "Name Date English Math Chinese Money")

for index, row in enumerate(csv_reader):
    student = Student._make(row[:6])    # map the first 6 columns to named fields
    print(row[:4])      # print the first 4 columns
    print(student.Name, student.Date, student.English, student.Math, student.Chinese, student.Money)
    if index == 4:      # index starts at 0, so this stops after 5 data rows
        break
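As a variation on the loop above, the row limit can be expressed with itertools.islice, and csv.reader can consume the file object directly, since both are lazy. This is a sketch under assumptions: first_students and its limit parameter are names invented for this example, and every data row is assumed to have at least 6 columns, as in the sample file.

```python
import csv
from collections import namedtuple
from itertools import islice

Student = namedtuple("Student", "Name Date English Math Chinese Money")


def first_students(path, limit=5):
    """Yield the first `limit` data rows as Student tuples (first 6 columns)."""
    # newline='' is what the csv module documentation recommends for file objects
    with open(path, mode='r', encoding='utf8', newline='') as f:
        reader = csv.reader(f)
        next(reader, None)                  # skip the header row
        for row in islice(reader, limit):   # stops after `limit` rows
            yield Student._make(row[:6])
```

Calling first_students("student_info.csv") reads only the header and the first five data rows from disk-backed buffers, so the rest of a large file is never loaded.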