python excel select multiple columns by column name_Python - group by multiple columns and get the max or sum

I am interested in getting the max value of the product Price.

Here is the input data: Download Sales Data

I want to group by State and Country.

How do I group by these two columns and get the max value of Price?

```python
import csv
import locale
from itertools import groupby

locale.setlocale( locale.LC_ALL, 'en_US.UTF-8' )
total_price = 0
max_price = 0
reader = csv.DictReader(open('/Users/myuser/Downloads/SalesData.csv', 'rU'), dialect='excel')
groups = groupby(reader, lambda d: d['State'])
result = [max(g, key=lambda d: d['State']) for k, g in groups]
for row in reader:
    print row["State"], row["Country"], locale.atoi(row["Price"])
    max_price = max(row.iteritems(), key=operator.itemgetter(1))
    total_price += locale.atoi(row["Price"])
```

With pandas I can do it as follows. Can I get the same result without using pandas?

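```python
import pandas as pd
from pandas import DataFrame
import locale

locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
# thousands=',' makes pandas parse values such as "1,200" as numbers
df = pd.read_csv('/Users/myuser/Downloads/SalesData.csv', index_col=False, header=0, thousands=',')
# select the Price column before aggregating, then take the max per (Country, State)
print(df.groupby(["Country", "State"])["Price"].max())
```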

Solution

itertools.groupby only works as expected on data that is already sorted by the same key function you pass to groupby, as the documentation states:

itertools.groupby(iterable[, key])

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function.
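To see the "consecutive keys" behaviour the documentation describes, here is a minimal sketch on a hypothetical list of State codes (not the asker's data): the same key comes back more than once unless the input is sorted first.

```python
from itertools import groupby

# Hypothetical sample of State values, deliberately not sorted.
states = ['CA', 'CA', 'NY', 'CA', 'NY']

# groupby starts a new group every time the key changes, so 'CA' appears twice.
unsorted_keys = [k for k, _ in groupby(states)]
print(unsorted_keys)  # ['CA', 'NY', 'CA', 'NY']

# Sorting first makes equal keys adjacent, so each key forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(states))]
print(sorted_keys)  # ['CA', 'NY']
```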

So to achieve what you want with itertools.groupby, you would first need to sort the data on both 'Country' and 'State', and then call groupby on the sorted data with the same key.
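A sketch of that sort-then-group step with a composite (Country, State) key, run on a few hypothetical rows shaped like csv.DictReader output (every value is a string). Two assumptions to flag: prices use ',' as the thousands separator (the pandas call in the question passes thousands=','), and they are converted with int(...replace(',', '')) instead of locale.atoi so the sketch does not depend on the en_US.UTF-8 locale being installed. Because the title also mentions a sum, each group is collected into a list once so both max and sum can be taken from it.

```python
from itertools import groupby

# Hypothetical rows shaped like csv.DictReader output: every value is a string.
rows = [
    {'Country': 'USA', 'State': 'NY', 'Price': '1,200'},
    {'Country': 'USA', 'State': 'CA', 'Price': '900'},
    {'Country': 'USA', 'State': 'NY', 'Price': '300'},
    {'Country': 'Canada', 'State': 'ON', 'Price': '450'},
    {'Country': 'USA', 'State': 'CA', 'Price': '1,500'},
]

def price(row):
    # Assumes ',' is the thousands separator.
    return int(row['Price'].replace(',', ''))

def group_key(row):
    # The same composite key must be used for sorting and for grouping.
    return (row['Country'], row['State'])

summary = {}
for k, g in groupby(sorted(rows, key=group_key), key=group_key):
    prices = [price(row) for row in g]  # g can only be iterated once
    summary[k] = (max(prices), sum(prices))

print(summary)
```

Each group iterator g is exhausted after one pass, which is why the prices are gathered into a list before calling both max() and sum().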

Also, when calling max() you should use 'Price' as the key, not 'State'. Example:

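```python
import csv
import locale
from itertools import groupby

locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

# 'rU' mode is Python 2 only (removed in Python 3.11); newline='' is what the csv module expects
with open('/Users/myuser/Downloads/SalesData.csv', newline='') as f:
    reader = csv.DictReader(f, dialect='excel')
    # groupby only merges consecutive rows, so sort on the same key first
    sortedreader = sorted(reader, key=lambda d: (d['Country'], d['State']))

groups = groupby(sortedreader, key=lambda d: (d['Country'], d['State']))
# compare prices as numbers: as strings, '9' > '10' and '1,200' < '900'
result = [(k, max(g, key=lambda d: locale.atoi(d['Price']))) for k, g in groups]
```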

I added the key to the result to identify which Country/State each max corresponds to. After this, you can iterate over result and print each entry, if that is what you need.
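If sorting is not wanted at all, a different technique (not the one this answer uses) is to accumulate the per-(Country, State) max and total in plain dicts during a single pass over the rows. The sketch below assumes a small hypothetical CSV held in a string and read through io.StringIO in place of SalesData.csv, with the same column names as the question and ',' as the thousands separator.

```python
import csv
import io

# Hypothetical CSV text standing in for SalesData.csv; the column names match
# the question, the values are made up.
CSV_TEXT = """State,Country,Price
NY,USA,"1,200"
CA,USA,900
NY,USA,300
ON,Canada,450
CA,USA,"1,500"
"""

max_price = {}
total_price = {}

# One pass, no sorting: the dict keys play the role of groupby's groups.
for row in csv.DictReader(io.StringIO(CSV_TEXT)):
    k = (row['Country'], row['State'])
    p = int(row['Price'].replace(',', ''))  # assumes ',' thousands separator
    max_price[k] = max(max_price.get(k, p), p)
    total_price[k] = total_price.get(k, 0) + p

# Print one line per group: country, state, max price, total price.
for (country, state), top in sorted(max_price.items()):
    print(country, state, top, total_price[(country, state)])
```

Running it prints `Canada ON 450 450`, `USA CA 1500 2400` and `USA NY 1200 1500`. Compared with sorting, this keeps only one running max and total per key in memory instead of a sorted copy of every row.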
