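Python: counting distinct values in a CSV column_How to combine two CSV files on common column values when the two files have different numbers of rows...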

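One of the most efficient ways in Python to merge several files (even more than two) on one or more common columns is to use "brewery". You can even specify which fields to consider for the merge and which fields to keep.

import brewery
from brewery import ds
import sys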

sources = [
    {"file": "grants_2008.csv",
     "fields": ["receiver", "amount", "date"]},
    {"file": "grants_2009.csv",
     "fields": ["id", "receiver", "amount", "contract_number", "date"]},
    {"file": "grants_2010.csv",
     "fields": ["receiver", "subject", "requested_amount", "amount", "date"]}
]

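Create a list of all fields, and add a "file" field to record which file each data record came from. Go through the source definitions and collect the fields:

# "file" comes first; it stores the name of the source file for each row
all_fields = ["file"]

for source in sources:
    for field in source["fields"]:
        if field not in all_fields:
            all_fields.append(field)

# Prepare the output stream merged.csv with the fields collected above
out = ds.CSVDataTarget("merged.csv")
out.fields = brewery.FieldList(all_fields)
out.initialize()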

for source in sources:
    path = source["file"]

    # Initialize data source: skip reading of headers
    # use XLSDataSource for XLS files
    # We ignore the fields in the header, because we have set up fields
    # previously. We need to skip the header row.
    src = ds.CSVDataSource(path, read_header=False, skip_rows=1)
    src.fields = ds.FieldList(source["fields"])
    src.initialize()

    for record in src.records():
        # Add file reference into output - to know where the row comes from
        record["file"] = path
        out.append(record)

    # Close the source stream
    src.finalize()

# Close the output stream so that merged.csv is completely written
out.finalize()
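If brewery is not available, the same stacking merge can be approximated with only the standard library `csv` module. This is a minimal sketch and not brewery's implementation. The function name `merge_csv_files` is hypothetical, and it assumes that every source file starts with one header row and that the "fields" lists give the columns in file order.

```python
import csv


def merge_csv_files(sources, out_path):
    """Stack rows from several CSV files into one file holding the union of all columns.

    `sources` uses the same shape as the list above:
    [{"file": <path>, "fields": [<column names in file order>]}, ...]
    """
    # "file" comes first so every output row records where it came from.
    all_fields = ["file"]
    for source in sources:
        for field in source["fields"]:
            if field not in all_fields:
                all_fields.append(field)

    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        # restval="" leaves columns a source does not have empty;
        # extrasaction="ignore" drops cells beyond the declared fields.
        writer = csv.DictWriter(out_file, fieldnames=all_fields,
                                restval="", extrasaction="ignore")
        writer.writeheader()
        for source in sources:
            with open(source["file"], newline="", encoding="utf-8") as in_file:
                # Use the declared field names instead of the file's header.
                reader = csv.DictReader(in_file, fieldnames=source["fields"])
                next(reader, None)  # skip the header row
                for record in reader:
                    record["file"] = source["file"]
                    writer.writerow(record)
    return all_fields
```

Calling `merge_csv_files(sources, "merged.csv")` with the `sources` list above writes one row per input row. Because the files are stacked rather than matched row by row, it does not matter that they have different numbers of rows.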

To view the merged file from the shell:

cat merged.csv | brewery pipe pretty_printer
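The merge above stacks rows from all files. If the goal is instead to match rows of two files on the value of a shared column, as the title describes, a left join is one option. The following is a minimal sketch using only the standard library `csv` module and is not part of brewery. The function name `join_on_key` and its parameters are hypothetical, and it assumes both files have a header row that contains the key column.

```python
import csv


def join_on_key(left_path, right_path, key, out_path):
    """Left-join two CSV files on `key`.

    Every row of the left file is kept. Left rows without a match in the
    right file get empty values in the right file's columns, so the two
    files may have different numbers of rows.
    """
    with open(right_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        right_fields = [name for name in (reader.fieldnames or []) if name != key]
        # Index the right file by key; if a key repeats, the last row wins.
        right_rows = {row[key]: row for row in reader}

    with open(left_path, newline="", encoding="utf-8") as f_in, \
            open(out_path, "w", newline="", encoding="utf-8") as f_out:
        reader = csv.DictReader(f_in)
        left_fields = list(reader.fieldnames or [])
        # Only add right-hand columns that the left file does not already have.
        extra = [name for name in right_fields if name not in left_fields]
        writer = csv.DictWriter(f_out, fieldnames=left_fields + extra,
                                restval="", extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            match = right_rows.get(row[key], {})
            for name in extra:
                row[name] = match.get(name, "")
            writer.writerow(row)
```

Building a dictionary from the right file keeps the join to a single pass over each file, at the cost of holding the right file in memory.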
