Chunking, Processing, and Merging Datasets in Pandas / Python


There is a large dataset containing strings.

I just want to open it via read_fwf using widths, like this:

widths = [3, 7, ..., 9, 7]

tp = pandas.read_fwf(file, widths=widths, header=None)

This would help me mark the data.

But the system crashes (it works with nrows=20000). So I decided to process it in chunks (e.g. 20000 rows), like this:

cs = 20000

for chunk in pd.read_fwf(file, widths=widths, header=None, chunksize=cs):
    ...

My question is: what should I do in the loop to merge (concatenate?) the chunks back into a .csv file after some processing of each chunk (marking rows, dropping or modifying columns)? Or is there another way?

Solution

I'm going to assume that, since reading the entire file

tp = pandas.read_fwf(file, widths=widths, header=None)

fails but reading in chunks works, the file is too big to be read at once and you encountered a MemoryError.

In that case, if you can process the data in chunks, then to combine the results into a CSV, you could use chunk.to_csv to write the CSV in chunks:

filename = ...

for chunk in pd.read_fwf(file, widths=widths, header=None, chunksize=cs):
    # process the chunk
    chunk.to_csv(filename, mode='a')

Note that mode='a' opens the file in append mode, so the output of each chunk.to_csv call is appended to the same file.
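
Putting it together, here is a minimal sketch of the whole loop, assuming made-up widths, file names (data.fwf, processed.csv) and a trivial processing step; the useful details are writing the CSV header only for the first chunk, so it is not repeated on every append, and passing index=False so the row index is not written as an extra column:

import pandas as pd

widths = [3, 7, 9, 7]         # placeholder widths
infile = 'data.fwf'           # placeholder input file
outfile = 'processed.csv'     # placeholder output file
cs = 20000

first = True
for chunk in pd.read_fwf(infile, widths=widths, header=None, chunksize=cs):
    # example processing: mark the rows and drop an unwanted column
    chunk['mark'] = 'processed'
    chunk = chunk.drop(columns=[1])

    # overwrite on the first chunk, append afterwards; write the header only once
    chunk.to_csv(outfile, mode='w' if first else 'a', header=first, index=False)
    first = False

If the processed chunks are small enough to hold in memory, an alternative is to collect them in a list and call pd.concat on that list once, then write the combined DataFrame with a single to_csv; the append approach above keeps memory usage bounded regardless of file size.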
