Splitting a txt file in Python: using Python to split a large text file into smaller text files

I have a text file, say really_big_file.txt, that contains:

line 1

line 2

line 3

line 4

...

line 99999

line 100000

I would like to write a Python script that divides really_big_file.txt into smaller files with 300 lines each. For example, small_file_300.txt would hold lines 1-300, small_file_600.txt would hold lines 301-600, and so on until there are enough small files to contain all the lines from the big file.

I would appreciate any suggestions on the easiest way to accomplish this using Python.

Solution

from itertools import zip_longest  # Python 3 name; in Python 2 this is izip_longest

def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    # n references to the same iterator, so each output tuple takes n consecutive items
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

n = 300

with open('really_big_file.txt') as f:
    # start counting at 1 so that i * n is the number of the group's last line
    for i, g in enumerate(grouper(n, f, fillvalue=''), 1):
        with open('small_file_{0}'.format(i * n), 'w') as fout:
            fout.writelines(g)

The advantage of this method, as opposed to storing every line in a list, is that it works with iterables, line by line, so it never has to hold the whole big file in memory. Only one group of n lines is in memory at a time.
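To see what grouper actually yields, here is a minimal sketch that runs it on seven made-up lines with a group size of 3 instead of a real file. The sample data and the group size are assumptions chosen only to keep the output short.

```python
from itertools import zip_longest  # Python 3 name; Python 2 calls it izip_longest


def grouper(n, iterable, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one."""
    args = [iter(iterable)] * n  # n references to the same iterator
    return zip_longest(*args, fillvalue=fillvalue)


# Seven sample lines stand in for the lines of a file (assumed data).
lines = ['line {0}\n'.format(k) for k in range(1, 8)]
groups = list(grouper(3, lines, fillvalue=''))

for i, g in enumerate(groups, 1):
    # i * 3 is the label the file would get; g is the tuple written to it
    print(i * 3, g)
```

The last tuple is padded with empty strings, which is why writelines can write it unchanged: the padding adds nothing to the file.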

Note that the last file in this case will be small_file_100200, but it will only go up to line 100000. This happens because fillvalue='' means nothing is written once the lines run out, which is the case whenever the total line count is not an exact multiple of the group size. You can fix this by writing to a temp file and renaming it afterwards, instead of naming it first as I have. Here's how that can be done.

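import os, tempfile

with open('really_big_file.txt') as f:
    for i, g in enumerate(grouper(n, f, fillvalue=None)):
        # dir='.' keeps the temp file on the same filesystem as the target,
        # since os.rename can fail across filesystems
        with tempfile.NamedTemporaryFile('w', delete=False, dir='.') as fout:
            for j, line in enumerate(g, 1):  # count number of lines in group
                if line is None:
                    j -= 1  # don't count this line
                    break
                fout.write(line)
        # the temp file is closed here; name it after the last line it holds
        os.rename(fout.name, 'small_file_{0}.txt'.format(i * n + j))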

This time fillvalue=None, and I go through each line checking for None. When it occurs, I know the input is exhausted, so I subtract 1 from j to avoid counting the filler, and then the temp file is renamed after the last line it actually contains.
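As an alternative to the approach above (not part of the original answer), a sketch based on itertools.islice avoids fill values entirely: each chunk holds only real lines, so the file can be named up front after its true last line. The helper name split_file, its out_dir parameter, and the 7-line demo file are all hypothetical and exist only for illustration.

```python
import os
import tempfile
from itertools import islice


def split_file(path, n, out_dir='.'):
    """Split `path` into files of at most n lines; return the paths written.

    Hypothetical helper: each output file is named after the number of
    the last line it actually contains.
    """
    written = []
    last_line = 0
    with open(path) as f:
        while True:
            chunk = list(islice(f, n))  # next n lines, fewer at end of file
            if not chunk:
                break
            last_line += len(chunk)
            name = os.path.join(out_dir, 'small_file_{0}.txt'.format(last_line))
            with open(name, 'w') as fout:
                fout.writelines(chunk)
            written.append(name)
    return written


# Usage on a throwaway 7-line file, 3 lines per output file.
demo_dir = tempfile.mkdtemp()
big = os.path.join(demo_dir, 'really_big_file.txt')
with open(big, 'w') as f:
    f.writelines('line {0}\n'.format(k) for k in range(1, 8))

names = [os.path.basename(p) for p in split_file(big, 3, demo_dir)]
print(names)
```

As with grouper, only one chunk of n lines is in memory at a time. The trade-off is that this version builds an explicit list per chunk rather than streaming through a single zip_longest iterator.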
