How to aggregate multiple columns with an aggregation function in a pandas groupby

qq^^614136809

Published 2024-06-19 16:15:51

Tags: pandas

Article link: https://blog.csdn.net/D0126_/article/details/139806432

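When aggregating data with a pandas groupby, we sometimes need to take into account values in columns other than the one being aggregated. For example, we may want to aggregate one column according to where another column reaches its maximum.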

Here is an example:

import datetime
import numpy as np
import pandas as pd

# Dates from the past week; each seller has seven rows, some falling on the same day
start = datetime.date.today() - datetime.timedelta(7)
DateList = np.array([start + datetime.timedelta(days=x) for x in [1, 2, 2, 3, 4, 4, 5]] +
                    [start + datetime.timedelta(days=x) for x in [1, 1, 2, 3, 4, 5, 5]])
Names = np.array(['Joe'] * 7 + ['John'] * 7)
Product = np.array(['Product1', 'Product1', 'Product2', 'Product2', 'Product2', 'Product3', 'Product3',
                    'Product1', 'Product2', 'Product2', 'Product2', 'Product2', 'Product2', 'Product3'])
Volume = np.array([100, 0, 150, 175, 15, 120, 150, 75, 0, 115, 130, 135, 10, 120])
Prices = {'Product1': 25.99, 'Product2': 13.99, 'Product3': 8.99}
SalesDF = pd.DataFrame({'Date': DateList, 'Seller': Names, 'Product': Product, 'Volume': Volume})
# DataFrame.sort() was removed from pandas; sort_values() is the current method
SalesDF.sort_values(['Date', 'Seller'], inplace=True)
SalesDF['Prices'] = SalesDF.Product.map(Prices)

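In this example we have a data frame of sales data. We want to group the data by date and seller, and for each group report the product and price of the row with the largest volume.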

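2. Solution

One approach is to use a custom aggregation function. Because it receives the full data frame as an argument, it can look up values in columns other than the one being aggregated.

Here is an example of such a function: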

def AggFunc(x, df, col1):
    # Index labels of the rows in the current group (x is that group's slice of one column)
    IndexVals = list(x.index)

    # Values of col1 on those same rows, looked up by label in the underlying data frame
    ColList = list(df.loc[IndexVals, col1])

    # Largest value of col1 within the group
    MaxVal = np.max(ColList)

    # Position of that largest value within the group (the first one if there are ties)
    MaxValIndex = ColList.index(MaxVal)

    # Return the element of x at the same position
    return list(x)[MaxValIndex]

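The function takes three arguments:

x: the column being aggregated (one group's values at a time)
df: the DataFrame that holds the data
col1: the name of the column whose maximum decides which value of x is returned

The function first collects the index labels of the values in x. It uses those labels to pull the matching values of col1 from the data frame. It then finds the maximum of those col1 values and the position of that maximum in the list. Finally, it returns the element of x at that same position.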
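The steps above can be traced on a single group. The following is a minimal sketch, assuming a small hypothetical data frame (the values and the index labels 10 to 12 are made up for illustration); it uses `idxmax()` to find the label directly instead of building lists, which gives the same result when the index labels are unique.

```python
import pandas as pd

# Hypothetical toy data standing in for one (Date, Seller) group.
df = pd.DataFrame(
    {
        "Product": ["Product1", "Product2", "Product3"],
        "Volume": [15, 120, 40],
    },
    index=[10, 11, 12],  # non-default labels, to show the lookup is by label
)

x = df["Product"]                    # the column being aggregated
volumes = df.loc[x.index, "Volume"]  # the other column, on the same rows
best_label = volumes.idxmax()        # label of the row with the largest Volume
result = x.loc[best_label]           # value of x on that row

print(best_label, result)
```

Because the lookup goes through index labels, this only works reliably when the data frame's index has no duplicate labels.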

We can use this custom aggregation function to aggregate the data:

FunctionDict = {'Product': lambda x : AggFunc(x, SalesDF, 'Volume'), 'Volume' : 'max',\
'Prices': lambda x : AggFunc(x, SalesDF, 'Volume')}

SalesDF.groupby(['Date', "Seller"], as_index = False).agg(FunctionDict)

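This groups the data by date and seller and, for each group, returns the maximum volume together with the product and price from the row where that maximum occurs.

Another approach is to use pandas' idxmax() method, which returns the index label of the maximum value.

Here is an example that uses idxmax():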

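grouped = SalesDF.groupby(["Date", "Seller"])["Volume"]
# Index label of the largest Volume in each (Date, Seller) group
max_idx = grouped.idxmax()
# Select the full rows at those labels, so Product and Prices come along
SalesDF.loc[max_idx]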

This returns, for each date and seller, the complete row with the largest volume, including its product and price.
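To check the idxmax() approach in isolation, here is a self-contained sketch. The data frame `sales`, its fixed dates, and the variable `top_rows` are assumptions made for this illustration and are not part of the original example; only the prices dictionary is taken from it.

```python
import pandas as pd

# Hypothetical sample: fixed dates so the result is reproducible.
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2024-06-01", "2024-06-01", "2024-06-01", "2024-06-02"]),
    "Seller": ["Joe", "Joe", "John", "Joe"],
    "Product": ["Product1", "Product2", "Product1", "Product3"],
    "Volume": [100, 150, 75, 120],
})
prices = {"Product1": 25.99, "Product2": 13.99, "Product3": 8.99}
sales["Prices"] = sales["Product"].map(prices)

# One index label per (Date, Seller) group: the row with the largest Volume.
max_idx = sales.groupby(["Date", "Seller"])["Volume"].idxmax()

# Pull the complete rows at those labels.
top_rows = sales.loc[max_idx].reset_index(drop=True)
print(top_rows)
```

When several rows in a group share the maximum volume, idxmax() returns the label of the first one, so exactly one row per group is selected.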
