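Okay, I tried to stay within your guidelines. This iterates over the large file only once, and it does not bother parsing the lines with the csv module, since you were only rejoining them during the write.

id=("a","b")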
start_L=(1,15)
stop_L=(16,40)
i=0
table=open('largefile.tsv',"r")
out1=(("file%s.txt")%(id[i]))
temp_out=open(out1,"w")
# start iterating through the file
for line in table:
    stop=int(stop_L[i])
    # Split the line into a position piece, and a
    # throw away variable based upon the 1st tab char
    position,the_rest= line.split("\t",1)
    # I'm ignoring start as you mentioned it was sorted in the file
    if int(position) >= stop :
        # Close the current file
        temp_out.close()
        # Increment index so file name is pulled from id properly
        # If the index is past the length of the id list then
        # break otherwise open the new file for writing
        i += 1
        if (i < len(id)):
            out1=(("file%s.txt")%(id[i]))
            temp_out=open(out1,"w")
        else:
            break
    # Write the unmodified line to whichever output file is current
    temp_out.write(line)
# Make sure both files are closed when the loop ends;
# calling close() on an already closed file is harmless
temp_out.close()
table.close()
My test file lines looked like this:
^{pr2}$
This could be simplified quite a bit depending on your actual data, but I hope it at least gives you a start.
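For what it's worth, here is a rough sketch of one way the same single-pass idea could be wrapped in a function so the files get closed even if a line fails to parse. The function name `split_by_position` and the `out_dir` parameter are my own inventions for illustration, and it assumes, like the code above, that the first tab-separated field of every line is an integer position and that the file is sorted by it.

```python
import os


def split_by_position(table_path, ids, stops, out_dir="."):
    """Copy lines into file<id>.txt files in one pass over table_path.

    Hypothetical helper: lines go to the current output file until a
    position reaches the stop value for that id, then the next id's
    file is opened. Reading stops once the last stop value is reached.
    Returns the list of output paths that were created.
    """
    written = []
    i = 0
    out_path = os.path.join(out_dir, "file%s.txt" % ids[i])
    out = open(out_path, "w")
    written.append(out_path)
    try:
        with open(table_path, "r") as table:
            for line in table:
                # Only the text before the first tab is inspected;
                # the rest of the line is written back untouched.
                position = line.partition("\t")[0]
                if int(position) >= stops[i]:
                    out.close()
                    i += 1
                    if i >= len(ids):
                        break
                    out_path = os.path.join(out_dir, "file%s.txt" % ids[i])
                    out = open(out_path, "w")
                    written.append(out_path)
                out.write(line)
    finally:
        # Closing an already closed file is a no-op, so this is safe
        # whether the loop ended normally, via break, or via an error.
        out.close()
    return written
```

With the values from the answer, the call would be something like `split_by_position("largefile.tsv", ("a", "b"), (16, 40))`. As in the original loop, the start values are ignored, and a line whose position reaches the current stop value becomes the first line of the next file.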