Read Large Files in Python

I have a large file (~4 GB) to process in Python, and I wondered whether it is OK to "read" a file of that size. So I tried the following approaches:

The file I actually need to process is not "./CentOS-6.5-i386.iso"; I just use that file as an example here.

1: Normal Method. (try/except/finally omitted for brevity)

def main():    
    f = open(r"./CentOS-6.5-i386.iso", "rb")
    for line in f:
        print(line, end="")
    f.close()

if __name__ == "__main__":
    main()

2: "With" Method.

def main():    
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:
        for line in f:
            print(line, end="")

if __name__ == "__main__":
    main()

3: "readlines" Method. [Bad Idea]

# Don't do this: readlines() builds a list of every line in memory,
# which raised MemoryError on a file this large.
def main():
    for line in open(r"./CentOS-6.5-i386.iso", "rb").readlines():
        print(line, end="")
    
if __name__ == "__main__":
    main()

4: "fileinput" Method.

import fileinput

def main():    
    for line in fileinput.input(files=r"./CentOS-6.5-i386.iso", mode="rb"):
        print(line, end="")

if __name__ == "__main__":
    main()

5: "Generator" Method.

def readFile():
    with open(r"./CentOS-6.5-i386.iso", "rb") as f:    
        for line in f:
            yield line

def main():
    for line in readFile():
        print(line, end="")

if __name__ == "__main__":
    main()

All of the methods above work well for small files, but the readlines method does not work for large ones: readlines() loads the entire file into memory as a list of lines before the loop even starts.
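To make the difference concrete, here is a minimal sketch that compares peak memory for readlines() and plain iteration using the standard tracemalloc module. It runs on a small temporary file (about 2 MB) rather than the 4 GB image, and the helper names (make_sample_file, peak_memory, and so on) are my own, chosen for illustration. The exact numbers will vary by platform and Python version.

```python
import os
import tempfile
import tracemalloc


def make_sample_file(num_lines=20000):
    """Create a temporary file with num_lines lines of 100 bytes each."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for _ in range(num_lines):
            f.write(b"x" * 99 + b"\n")
    return path


def peak_memory(func, path):
    """Return the peak bytes allocated by Python while func(path) runs."""
    tracemalloc.start()
    func(path)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def count_with_readlines(path):
    # Builds a list holding every line of the file at once.
    with open(path, "rb") as f:
        return len(f.readlines())


def count_with_iteration(path):
    # Holds only one line (plus the read buffer) at a time.
    count = 0
    with open(path, "rb") as f:
        for _ in f:
            count += 1
    return count


sample_path = make_sample_file()
try:
    peak_readlines = peak_memory(count_with_readlines, sample_path)
    peak_iteration = peak_memory(count_with_iteration, sample_path)
finally:
    os.remove(sample_path)

print("readlines peak bytes:", peak_readlines)
print("iteration peak bytes:", peak_iteration)
```

On this sample the readlines peak should be at least the size of the file, while the iteration peak stays roughly constant no matter how many lines the file has.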

When I ran the readlines method, CPU and memory usage rose rapidly, and once memory usage passed 50%, Python raised a "MemoryError".

[Figure: CPU and memory usage while running the readlines method]

The other methods (Normal Method, With Method, fileinput Method, Generator Method) work well for large files. With these methods, CPU and memory load show no noticeable rise.

[Figure: CPU and memory usage while running the other methods]

By the way, I recommend the generator method, because it shows clearly that you have taken the file size into account.
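One caveat for binary files such as the ISO used here: iterating "by line" splits on newline bytes, so a single "line" can be arbitrarily long. A variation on the generator method is to read fixed-size chunks instead. This is only a sketch; read_in_chunks and its chunk_size default are hypothetical names and values, not something from the reference below.

```python
import os
import tempfile


def read_in_chunks(path, chunk_size=1024 * 1024):
    """Yield successive chunks of at most chunk_size bytes from a binary file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            yield chunk


def demo():
    """Write 2500 bytes to a temporary file and read it back in 1024-byte chunks."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * 2500)
        return [len(chunk) for chunk in read_in_chunks(path, chunk_size=1024)]
    finally:
        os.remove(path)


chunk_sizes = demo()
print(chunk_sizes)  # [1024, 1024, 452]
```

Memory use is bounded by chunk_size regardless of the file's content, which line iteration cannot guarantee for binary data.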

 

Reference:

How to read large file, line by line in python

