Processing large text files in Python: handling a very large (> 20GB) text file line by line

It is more idiomatic to write the code like this:

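    def ProcessLargeTextFile():
        # Both files are closed automatically when the with block exits.
        with open("filepath", "r") as r, open("outfilepath", "w") as w:
            for line in r:
                # Split once and keep only the first three fields.
                x, y, z = line.split(' ')[:3]
                # Drop the last three characters of each field.
                # Note: str.replace changes every occurrence in the line.
                w.write(line.replace(x,x[:-3]).replace(y,y[:-3]).replace(z,z[:-3]))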

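The main saving here is that the split is done only once. If the CPU is not the bottleneck, though, this is likely to make very little difference.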

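It may help to save up a few thousand lines at a time and write them out in one go, to reduce thrashing of your hard drive. A million lines is only about 54MB of RAM!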

    def ProcessLargeTextFile():
        bunchsize = 1000000 # Experiment with different sizes
        bunch = []
        with open("filepath", "r") as r, open("outfilepath", "w") as w:
            for line in r:
                x, y, z = line.split(' ')[:3]
                bunch.append(line.replace(x,x[:-3]).replace(y,y[:-3]).replace(z,z[:-3]))
                # Write the accumulated lines in one call, then start a new bunch.
                if len(bunch) == bunchsize:
                    w.writelines(bunch)
                    bunch = []
            # Write whatever is left over after the last full bunch.
            w.writelines(bunch)
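The 54MB figure depends on how long the lines are and on the Python build, so it is worth checking against your own data. The following is only a rough sketch: `estimate_batch_bytes` and its parameters are hypothetical names, not part of the original answer, and the result ignores allocator overhead. It averages `sys.getsizeof` over a sample of lines and adds one pointer per list slot.

```python
import struct
import sys

def estimate_batch_bytes(sample_lines, batch_size=1_000_000):
    """Roughly estimate the RAM used by a list of batch_size lines.

    sample_lines is a small, representative list of lines from the file
    (hypothetical helper; an approximation, not an exact measurement).
    """
    if not sample_lines:
        return 0
    # Average size of one str object, including Python's per-object overhead.
    avg_line = sum(sys.getsizeof(s) for s in sample_lines) / len(sample_lines)
    # Each list slot stores one pointer to a string.
    pointer_size = struct.calcsize("P")
    return int(batch_size * (avg_line + pointer_size))
```

Reading the first few thousand lines of the input into `sample_lines` and dividing the result by `1024 * 1024` gives an estimate in MB that can guide the choice of `bunchsize`.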

As suggested by @Janne, here is an alternative way to generate the lines:

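    def ProcessLargeTextFile():
        bunchsize = 1000000 # Experiment with different sizes
        bunch = []
        with open("filepath", "r") as r, open("outfilepath", "w") as w:
            for line in r:
                # Split off the first three fields; rest keeps the remainder,
                # including the trailing newline.
                x, y, z, rest = line.split(' ', 3)
                # Rebuild the line from the trimmed fields instead of using replace.
                bunch.append(' '.join((x[:-3], y[:-3], z[:-3], rest)))
                if len(bunch) == bunchsize:
                    w.writelines(bunch)
                    bunch = []
            # Write whatever is left over after the last full bunch.
            w.writelines(bunch)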
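To try the batched approach on a small sample before running it against the full file, it can help to pass the paths and the batch size in as arguments. This is a minimal sketch under stated assumptions: `process_large_text_file`, `in_path`, `out_path` and `bunch_size` are hypothetical names, and passing lines with fewer than four space-separated fields through unchanged is a choice made here, not something the original answer specifies.

```python
def process_large_text_file(in_path, out_path, bunch_size=1_000_000):
    """Trim the last three characters of the first three fields of each line."""
    bunch = []
    with open(in_path, "r") as r, open(out_path, "w") as w:
        for line in r:
            parts = line.split(' ', 3)
            if len(parts) < 4:
                # Assumption: lines with too few fields are copied unchanged.
                bunch.append(line)
            else:
                x, y, z, rest = parts
                bunch.append(' '.join((x[:-3], y[:-3], z[:-3], rest)))
            # Flush the batch with a single writelines call.
            if len(bunch) >= bunch_size:
                w.writelines(bunch)
                bunch = []
        # Flush the final, possibly partial, batch.
        w.writelines(bunch)
```

Running it with a small `bunch_size` on a file of a few lines exercises both the flush inside the loop and the final flush, which are the easiest places to lose lines.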
