Python defaultdict: aggregating by key with defaultdict

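This post shows how to process a dataset of team and player names with Python's collections module, removing duplicate player names and grouping them by team and year. The goal is a list of player names for each team-and-year combination, which can then be used to compare the same team's rosters across different years.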

I have a text file with many lines of player names and teams in this format:

Team (year)|Surname1, Name1

e.g.

Yankees (1993)|Abbot, Jim

Yankees (1994)|Abbot, Jim

Yankees (1993)|Assenmacher, Paul

Yankees (2000)|Buddies, Mike

Yankees (2000)|Canseco, Jose

and so on for several years and several teams.

I would like to aggregate the names of players by team (year) combination, deleting any duplicated names (the original database may contain some redundant information). For the example above, my output should be:

Yankees (1993)|Abbot, Jim|Assenmacher, Paul

Yankees (1994)|Abbot, Jim

Yankees (2000)|Buddies, Mike|Canseco, Jose

I've written this code so far:

from collections import defaultdict

file_in = open('filein.txt')
file_out = open('fileout.txt', 'w+')

teams = defaultdict(set)

for line in file_in:
    items = [entry.strip() for entry in line.split('|') if entry]
    team = items[0]
    name = items[1]
    teams[team].add(name)

I end up with a big dictionary whose keys are the team name and year and whose values are sets of player names. But I don't know exactly how to go on from there to produce the aggregated output.

I would also like to be able to compare my final sets of values (e.g. how many players do the 1993 and 1994 Yankees teams have in common?). How can I do this?

Any help is appreciated.

Solution

You can use a tuple as the key here, for example ('Yankees', '1994'):

from collections import defaultdict

dic = defaultdict(list)

with open('abc') as f:
    for line in f:
        key, val = line.split('|')
        # 'Yankees (1993)' -> ('Yankees', '1993'); assumes a one-word team name
        keys = tuple(x.strip('()') for x in key.split())
        # 'Abbot, Jim' -> ['Abbot', 'Jim']
        vals = [x.strip() for x in val.split(', ')]
        # Skip names already recorded for this team and year
        if vals not in dic[keys]:
            dic[keys].append(vals)

print(dic)

for k, v in dic.items():
    print("{} ({})|{}".format(k[0], k[1], "|".join([", ".join(x) for x in v])))

Output (Python 3; the dictionary is wrapped here for readability):

defaultdict(<class 'list'>,
{('Yankees', '1993'): [['Abbot', 'Jim'], ['Assenmacher', 'Paul']],
('Yankees', '1994'): [['Abbot', 'Jim']],
('Yankees', '2000'): [['Buddies', 'Mike'], ['Canseco', 'Jose']]})

Yankees (1993)|Abbot, Jim|Assenmacher, Paul
Yankees (1994)|Abbot, Jim
Yankees (2000)|Buddies, Mike|Canseco, Jose
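
The question also asks about removing duplicates and comparing rosters between years. One possible way to cover both is to keep the set-based dictionary from the question and key it by a (team, year) tuple. The following is only a sketch under some assumptions: the sample lines (including a deliberately repeated one), the regular expression, and the helper names aggregate and format_rosters are illustrative and not part of the original post. The regular expression assumes a four-digit year in parentheses at the end of the key, which also allows team names that contain spaces.

```python
import re
from collections import defaultdict

# Hypothetical sample input; in practice these lines would be read from the file.
LINES = [
    "Yankees (1993)|Abbot, Jim",
    "Yankees (1994)|Abbot, Jim",
    "Yankees (1993)|Assenmacher, Paul",
    "Yankees (1993)|Abbot, Jim",  # repeated on purpose to show de-duplication
    "Yankees (2000)|Buddies, Mike",
    "Yankees (2000)|Canseco, Jose",
]

# "Team name (year)" -> team and year; the team name may contain spaces.
KEY_RE = re.compile(r"^(?P<team>.+?)\s*\((?P<year>\d{4})\)$")


def aggregate(lines):
    """Map (team, year) -> set of player names; the set drops duplicates."""
    rosters = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, _, name = line.partition("|")
        match = KEY_RE.match(key.strip())
        if match is None or not name.strip():
            continue  # skip malformed lines rather than guess at their meaning
        rosters[(match["team"], match["year"])].add(name.strip())
    return rosters


def format_rosters(rosters):
    """Build one 'Team (year)|name|name' line per key, sorted for stable output."""
    return [
        "{} ({})|{}".format(team, year, "|".join(sorted(names)))
        for (team, year), names in sorted(rosters.items())
    ]


rosters = aggregate(LINES)
for row in format_rosters(rosters):
    print(row)

# Set intersection gives the players that two rosters have in common.
common = rosters[("Yankees", "1993")] & rosters[("Yankees", "1994")]
print(len(common), sorted(common))
```

Because the values are sets, the other set operators apply as well: a - b gives the players who appear in roster a but not in b, and a | b gives everyone who appears in either. Note that indexing a defaultdict with a key that does not exist creates an empty entry, so rosters.get(key, set()) may be preferable when the key might be missing.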
