python将两个excel匹配_希望通过ID将两个Excel文件合并为一个使用Python 2.7的Excel文件...

I am new to the Python family and have been trying to solve merge two Excel files for days. I have researched merging endlessly and tried to adapt my code to fit my needs, but it hasn't been working. I was wondering if I could get any help of why my code isn't working. I feel that this could be a common problem to others using Python, so hopefully this will help out others as well. I appreciate any comments!

I have two excel files, 'Chinese Scores3.csv' and 'Chinese Scores4.csv' which I am trying to merge by an ID, which is unique to each company. Other than the company ID, there are no matching columns for each excel file. Also, not all companies are listed on both files. Some are listed both, but others are listed on either one or the other. I would like to attach all the information for a company ID together in one row on an excel sheet. i.e. the first excel file columns are ID, JanSales, FebSales, etc. and the second excel file columns are ID, CreditScore, EMMAScore, etc. The excel file I would like to create has columns: ID, JanSales, FebSales, CreditScore, EMMAScore

all according to company ID.

Is this making sense? It's like using VLOOKUP in excel, but I would like to do this using Python. Anyway, here is my coding, which isn't working. I try manipulating it, but it isn't working. I hope to get feedback!

import sys

import csv

def main(arg):

headers= []

for arg in 'Chinese Scores3.csv':

with open(arg) as f:

curr = 'Chinese Scores3.csv'.reader(f).next()

headers.append(curr)

try:

keys=list( set(keys) & set (curr))

except NameError:

keys = curr

header = list(keys)

for h in headers:

header += [ k for k in h if k not in keys ]

data = {}

for arg in 'Chinese Scores4.csv':

with open(arg) as f:

reader = 'Chinese Scores4.csv'.DictReader(f)

for line in reader:

data_key = tuple([ line[k] for k in keys ])

if not data_key in data: data[data_key] = {}

for k in header:

try:

data[data_key][k] = line[k]

except KeyError:

pass

for key in data.keys():

for col in header:

if key in data and not col in data[key]:

del( data[key] )

print ','.join(header)

for key in sorted(data):

row = [ data[key][col] for col in header ]

print ','.join(row)

if __name__ == '__main__':

sys.exit( main( sys.argv[1:]) )

解决方案

While we could fix your code, I'd strongly recommend looking into the pandas library if you're going to be doing this sort of work instead. It makes life a lot easier, and often borderline trivial.

For example, if we had two csv files (although we could have started straight from Excel files if we wanted):

>>> !cat scores3.csv

ID,JanSales,FebSales

1,100,200

2,200,500

3,300,400

>>> !cat scores4.csv

ID,CreditScore,EMMAScore

2,good,Watson

3,okay,Thompson

4,not-so-good,NA

We could read these into objects called DataFrames (think of them sort of like Excel sheets):

>>> import pandas as pd

>>> s3 = pd.read_csv("scores3.csv")

>>> s4 = pd.read_csv("scores4.csv")

>>> s3

ID JanSales FebSales

0 1 100 200

1 2 200 500

2 3 300 400

>>> s4

ID CreditScore EMMAScore

0 2 good Watson

1 3 okay Thompson

2 4 not-so-good NaN

And then we can merge them on the ID column:

>>> merged = s3.merge(s4, on="ID", how="outer")

>>> merged

ID JanSales FebSales CreditScore EMMAScore

0 1 100 200 NaN NaN

1 2 200 500 good Watson

2 3 300 400 okay Thompson

3 4 NaN NaN not-so-good NaN

After which we could save it to a csv file or to an Excel file:

>>> merged.to_csv("merged.csv")

>>> merged.to_excel("merged.xlsx")

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值