Reading specific columns from a CSV file in Python

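This post shows how to use Python's `csv` module to read only selected columns (e.g. 'AAA', 'DDD', 'FFF', 'GGG') from a large CSV file, skip the header row, and turn the data into a list of tuples. The solution uses `namedtuple` to build a custom row type and maps column names to their indices to pick out the wanted columns. Example code and its output are included.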

I have a CSV file that looks like this:

```
+-----+-----+-----+-----+-----+-----+-----+-----+
| AAA | bbb | ccc | DDD | eee | FFF | GGG | hhh |
+-----+-----+-----+-----+-----+-----+-----+-----+
|   1 |   2 |   3 |   4 |  50 |   3 |  20 |   4 |
|   2 |   1 |   3 |   5 |  24 |   2 |  23 |   5 |
|   4 |   1 |   3 |   6 |  34 |   1 |  22 |   5 |
|   2 |   1 |   3 |   5 |  24 |   2 |  23 |   5 |
|   2 |   1 |   3 |   5 |  24 |   2 |  23 |   5 |
+-----+-----+-----+-----+-----+-----+-----+-----+
```

...

How can I read only the columns AAA, DDD, FFF and GGG in Python and skip the header row?

The output I want is a list of tuples that looks like this:

`[(1,4,3,20),(2,5,2,23),(4,6,1,22)]`. I'm planning to write this data to an SQL database later.
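A list of tuples maps directly onto a parameterised `INSERT`, which is one reason it is a convenient target shape. The sketch below assumes SQLite through the standard-library `sqlite3` module. It uses an in-memory database, and the table name `data` is a hypothetical choice for illustration.

```python
import sqlite3

# The list of tuples from the question, in column order (AAA, DDD, FFF, GGG)
rows = [(1, 4, 3, 20), (2, 5, 2, 23), (4, 6, 1, 22)]

# In-memory database for illustration; the table name "data" is hypothetical
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (AAA INTEGER, DDD INTEGER, FFF INTEGER, GGG INTEGER)"
)

# executemany runs the same parameterised INSERT once per tuple
conn.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Read the rows back in insertion order to check what was stored
stored = conn.execute(
    "SELECT AAA, DDD, FFF, GGG FROM data ORDER BY rowid"
).fetchall()
print(stored)  # [(1, 4, 3, 20), (2, 5, 2, 23), (4, 6, 1, 22)]
conn.close()
```

Each `?` placeholder is filled from the tuple by the driver, so values never need to be formatted into the SQL string by hand.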

I referred to this post: Read specific columns from a csv file with csv module?

But I don't think it helps in my case. My .csv file is pretty big, with a whole bunch of columns, so I'd like to just tell Python the names of the columns I want and have it read those columns row by row for me.
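Selecting columns by name is also possible with the standard `csv.DictReader`. It consumes the header row itself and keys each row by column name. This is an alternative to the answer below, not the answer's own method. The function name `read_columns` is hypothetical, and the sketch assumes every selected value is an integer.

```python
import csv
import io


def read_columns(file, columns):
    """Yield one tuple of ints per data row, holding only the named columns.

    csv.DictReader reads the header line first and maps each later row
    to {column name: value}, so the header is skipped automatically.
    """
    for record in csv.DictReader(file):
        yield tuple(int(record[name]) for name in columns)


# A small in-memory stand-in for the large file in the question
sample = io.StringIO(
    "AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh\n"
    "1,2,3,4,50,3,20,4\n"
    "2,1,3,5,24,2,23,5\n"
    "4,1,3,6,34,1,22,5\n"
)
result = list(read_columns(sample, ["AAA", "DDD", "FFF", "GGG"]))
print(result)  # [(1, 4, 3, 20), (2, 5, 2, 23), (4, 6, 1, 22)]
```

For a real file, open it with `open(path, newline="")`, as the `csv` documentation recommends, and pass the file object in. The generator still works row by row. `DictReader` does build a dict of every column for each row, so the index-mapping approach below does slightly less work per row on very wide files.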

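Solution:

The generator below builds a `namedtuple` type from the requested column names and reads the header once to map each name to its index. It then yields only those fields for every remaining row.

```python
import csv
from collections import namedtuple


def read_csv(file, columns, type_name="Row"):
    """Yield only the requested columns from each data row of a CSV file."""
    try:
        # One named field per requested column, e.g. Row(AAA=..., DDD=...)
        row_type = namedtuple(type_name, columns)
    except ValueError:
        # Names that are not valid identifiers: fall back to plain tuples
        def row_type(*values):
            return values
    rows = csv.reader(file)
    header = next(rows)  # consume (skip) the header row
    # Position of each requested column in the header
    mapping = [header.index(name) for name in columns]
    for row in rows:
        yield row_type(*[row[i] for i in mapping])
```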
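The `except ValueError` branch above exists because `namedtuple` rejects field names that are not valid Python identifiers. Another option is `namedtuple(..., rename=True)`, which replaces each invalid name with `_<index>` and keeps attribute access for the valid ones. The sketch below is an alternative, not part of the original answer, and the header names in it are hypothetical.

```python
from collections import namedtuple

# Hypothetical headers: one valid, one containing a space, one Python keyword
columns = ["AAA", "first name", "class"]

try:
    namedtuple("Row", columns)
except ValueError as exc:
    # This is the case that makes read_csv fall back to plain tuples
    print("rejected:", exc)

# rename=True swaps each invalid name for an underscore plus its position
Row = namedtuple("Row", columns, rename=True)
print(Row._fields)  # ('AAA', '_1', '_2')
```

Inside `read_csv`, this would mean calling `namedtuple(type_name, columns, rename=True)` and dropping the fallback. Fields with invalid names are then reached by position or by their `_<index>` name.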

Example:

```python
>>> import csv
>>> from collections import namedtuple
>>> from io import StringIO
>>> def read_csv(file, columns, type_name="Row"):
...     try:
...         row_type = namedtuple(type_name, columns)
...     except ValueError:
...         def row_type(*values):
...             return values
...     rows = csv.reader(file)
...     header = next(rows)
...     mapping = [header.index(x) for x in columns]
...     for row in rows:
...         yield row_type(*[row[i] for i in mapping])
...
>>> testdata = """\
... AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh
... 1,2,3,4,50,3,20,4
... 2,1,3,5,24,2,23,5
... 4,1,3,6,34,1,22,5
... 2,1,3,5,24,2,23,5
... 2,1,3,5,24,2,23,5
... """
>>> testfile = StringIO(testdata)
>>> for row in read_csv(testfile, "AAA GGG DDD".split()):
...     print(row)
...
Row(AAA='1', GGG='20', DDD='4')
Row(AAA='2', GGG='23', DDD='5')
Row(AAA='4', GGG='22', DDD='6')
Row(AAA='2', GGG='23', DDD='5')
Row(AAA='2', GGG='23', DDD='5')
```
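The example yields strings, one row per data line, while the question asks for integers and lists only three tuples. One possible reading is that repeated rows should be dropped. That is an assumption, not something the question states. The sketch below converts the selected values to `int` and removes duplicates while keeping the first occurrence, which reproduces the exact list from the question for this test data.

```python
import csv
import io

# Same test data as in the example above
testdata = (
    "AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh\n"
    "1,2,3,4,50,3,20,4\n"
    "2,1,3,5,24,2,23,5\n"
    "4,1,3,6,34,1,22,5\n"
    "2,1,3,5,24,2,23,5\n"
    "2,1,3,5,24,2,23,5\n"
)
wanted = ["AAA", "DDD", "FFF", "GGG"]

reader = csv.reader(io.StringIO(testdata))
header = next(reader)  # skip the header row
mapping = [header.index(name) for name in wanted]
as_ints = [tuple(int(row[i]) for i in mapping) for row in reader]

# Assumption: duplicates are unwanted. dict.fromkeys keeps first occurrences in order.
unique = list(dict.fromkeys(as_ints))
print(unique)  # [(1, 4, 3, 20), (2, 5, 2, 23), (4, 6, 1, 22)]
```

If every row should be kept, stop at `as_ints`, which holds all five tuples.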
