There is a large dataset containing strings.
I just want to read it with read_fwf using widths, like this:
widths = [3, 7, ..., 9, 7]
tp = pandas.read_fwf(file, widths=widths, header=None)
It would help me to mark the data, but the system crashes (it works with nrows=20000). So I decided to process the file in chunks (e.g. 20000 rows), like this:
cs = 20000
for chunk in pd.read_fwf(file, widths=widths, header=None, chunksize=cs):
    ...
My question is: what should I do in the loop to merge (concatenate?) the chunks back into a .csv file after some processing of each chunk (marking rows, dropping or modifying columns)? Or is there another way?
Solution
I'm going to assume that since reading the entire file
tp = pandas.read_fwf(file, widths=widths, header=None)
fails but reading in chunks works, the file is too big to be read at once and you encountered a MemoryError.
In that case, if you can process the data in chunks, then to concatenate the results in a CSV, you could use chunk.to_csv to write the CSV in chunks:
filename = ...
for chunk in pd.read_fwf(file, widths=widths, header=None, chunksize=cs):
    # process the chunk (mark rows, drop or modify columns)
    # header=False avoids writing the column names once per chunk
    chunk.to_csv(filename, mode='a', header=False)
Note that mode='a' opens the file in append mode, so that the output of each
chunk.to_csv call is appended to the same file.
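One thing to watch with plain append mode: to_csv writes the column names by default, so each chunk would add its own header row, and if the output file already exists from a previous run, new rows pile up after the old ones. A minimal sketch of one way to handle both, with hypothetical file names and placeholder widths standing in for your own:

import pandas as pd

file = "data.fwf"        # assumed input path
out = "output.csv"       # assumed output path
widths = [3, 7, 9, 7]    # placeholder widths; use your real ones
cs = 20000

first = True
for chunk in pd.read_fwf(file, widths=widths, header=None, chunksize=cs):
    # process the chunk here, e.g. mark rows or drop a column:
    # chunk = chunk.drop(columns=[2])
    # first chunk: mode='w' truncates any existing file and writes the header;
    # later chunks: mode='a' appends rows without repeating the header
    chunk.to_csv(out, mode='w' if first else 'a', header=first, index=False)
    first = False

Writing index=False keeps pandas from adding a row-number column to the CSV; drop it if you want the index preserved.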