This does exactly what you asked for, but without Pandas. It reads intest.csv line by line (rather than loading the whole file into RAM). It does most of its processing on the filesystem, using a series of chunk files that are eventually aggregated into the outtest.csv file. If you change maxLines, you can tune the trade-off between the number of chunk files produced and the amount of RAM consumed (a higher number consumes more RAM but produces fewer chunk files). If you want to keep the CSV header on the first line, set keepHeader to True; if it is set to False, the entire file is reversed, including the first line.
For kicks, I ran it on an old Raspberry Pi using a 128GB flash drive and a 6MB CSV test file, and I thought something had gone wrong because it returned almost immediately, so it is fast even on slower hardware. It imports only a single standard Python library function (remove), so it is very portable. One advantage of this code is that it never repositions any file pointers. One limitation is that it does not work for CSV files that have embedded newlines in the data. For that use case, pandas reading the file in chunks would be the best solution.
from os import remove

def writechunk(fileCounter, reverseString):
    # Write one reversed chunk of lines to its own temporary file
    outFile = 'tmpfile' + str(fileCounter) + '.csv'
    with open(outFile, 'w') as outfp:
        outfp.write(reverseString)
    return

def main():
    inFile = 'intest.csv'
    outFile = 'outtest.csv'
    # This is our chunk size expressed in lines
    maxLines = 10
    # Is there a header line we want to keep at the top of the output file?
    keepHeader = True

    fileCounter = 0
    lineCounter = 0
    headerLine = ''  # initialized so an empty input file cannot cause a NameError
    with open(inFile) as infp:
        reverseString = ''
        line = infp.readline()
        if (line and keepHeader):
            headerLine = line
            line = infp.readline()
        while (line):
            lineCounter += 1
            reverseString = line + reverseString
            if (lineCounter == maxLines):
                fileCounter += 1
                lineCounter = 0
                writechunk(fileCounter, reverseString)
                reverseString = ''
            line = infp.readline()
        # Write any leftovers to a chunk file
        if (lineCounter != 0):
            fileCounter += 1
            writechunk(fileCounter, reverseString)
    # Read the chunk files backwards and append each to the outFile
    with open(outFile, 'w') as outfp:
        if (keepHeader):
            outfp.write(headerLine)
        while (fileCounter > 0):
            chunkFile = 'tmpfile' + str(fileCounter) + '.csv'
            with open(chunkFile, 'r') as infp:
                outfp.write(infp.read())
            remove(chunkFile)
            fileCounter -= 1

if __name__ == '__main__':
    main()
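For the embedded-newline limitation mentioned above, here is a minimal sketch of an alternative using the standard library's csv module instead of pandas: csv.reader correctly parses quoted fields that contain newlines, at the cost of holding all rows in RAM (so it gives up the streaming property of the code above). The filenames mirror the ones in the answer, and the sample data written at the top is made up purely for illustration.

```python
import csv

# Create a small sample input whose second data row contains an
# embedded newline inside a quoted field -- the case the line-based
# approach above cannot handle. (Hypothetical data for illustration.)
with open('intest.csv', 'w', newline='') as fp:
    csv.writer(fp).writerows([
        ['id', 'note'],
        ['1', 'first'],
        ['2', 'line one\nline two'],  # embedded newline in a field
        ['3', 'last'],
    ])

# csv.reader handles the quoted newline as part of one logical row
with open('intest.csv', newline='') as infp:
    rows = list(csv.reader(infp))

# Keep the header row on top and reverse only the data rows
header, body = rows[0], rows[1:]
with open('outtest.csv', 'w', newline='') as outfp:
    writer = csv.writer(outfp)
    writer.writerow(header)
    writer.writerows(reversed(body))
```

Because the whole file is read into a list, this variant is only suitable when the data fits in memory; for very large files with embedded newlines, pandas with chunksize would still be the better tool.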