Reading large CSV files in Python with a generator


I need to write a Python generator that yields tuples (X, Y) coming from two different CSV files.

It should receive a batch size on init, read the two CSVs line by line, and yield a tuple (X, Y) for each line, where X and Y are arrays (the columns of the CSV files).

I've looked at examples of lazy reading, but I'm finding it difficult to adapt them to CSVs.

Also, unfortunately Pandas Dataframes are not an option in this case.

Any snippet I can start from?

Thanks

Solution

You can write a generator that reads rows from two csv readers in lockstep and yields each pair of rows as a pair of arrays. The code for that is:

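import csv

import numpy as np


def getData(filename1, filename2):
    # On Python 3 the csv module expects text-mode files opened with newline=""
    with open(filename1, newline="") as csv1, open(filename2, newline="") as csv2:
        reader1 = csv.reader(csv1)
        reader2 = csv.reader(csv2)
        # zip advances both readers together and stops at the shorter file
        for row1, row2 in zip(reader1, reader2):
            # This will give arrays of floats, for other types change dtype
            # (the np.float alias was removed from NumPy; the builtin float is equivalent)
            yield (np.array(row1, dtype=float),
                   np.array(row2, dtype=float))


for tup in getData("file1", "file2"):
    print(tup)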
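
The question also asks for a batch size, which the generator above does not handle since it yields one row pair at a time. A minimal sketch of a batched variant follows. The name `get_batches` and its `batch_size` parameter are hypothetical, and the sketch assumes that neither file has a header row and that every row holds the same number of numeric fields. The last batch may be shorter than `batch_size`.

```python
import csv
from itertools import islice

import numpy as np


def get_batches(filename1, filename2, batch_size):
    """Yield (X, Y) pairs of 2-D float arrays with up to batch_size rows each.

    Hypothetical helper: assumes no header rows and purely numeric fields.
    """
    with open(filename1, newline="") as csv1, open(filename2, newline="") as csv2:
        # Pair up the rows of both files lazily; nothing is read yet.
        rows = zip(csv.reader(csv1), csv.reader(csv2))
        while True:
            # Pull at most batch_size row pairs; only this chunk is in memory.
            chunk = list(islice(rows, batch_size))
            if not chunk:
                break
            # Split the list of (row1, row2) pairs into two tuples of rows.
            x_rows, y_rows = zip(*chunk)
            yield np.array(x_rows, dtype=float), np.array(y_rows, dtype=float)
```

Each yielded X and Y is a two-dimensional array with one row per CSV line, so a caller can loop with `for X, Y in get_batches("file1", "file2", 32):` and handle a final batch that is shorter than the rest.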
