Computing a weighted average with a Pandas DataFrame

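This post shows how to compute a weighted average in a Pandas DataFrame, grouped by a given column. First, the weights are obtained by dividing the 'dist' column by the sum of 'dist' within each group. These weights are then multiplied by the values of each column, and the weighted values are summed per group to give each group's weighted average. The solution adds a weight column to the original DataFrame, then computes the weighted sums of the 'cas' and 'diff' columns grouped by 'ind'.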
I have the following pandas DataFrame:

import pandas as pd

data_df = pd.DataFrame({'ind':['la','p','la','la','p','g','g','la'],
                        'dist':[10.,5.,7.,8.,7.,2.,5.,3.],
                        'diff':[0.54,3.2,8.6,7.2,2.1,1.,3.5,4.5],
                        'cas':[1.,2.,3.,4.,5.,6.,7.,8.]})

that is

   cas  diff  dist ind
0    1  0.54    10  la
1    2  3.20     5   p
2    3  8.60     7  la
3    4  7.20     8  la
4    5  2.10     7   p
5    6  1.00     2   g
6    7  3.50     5   g
7    8  4.50     3  la

I need to compute the weighted average of all the columns where the weights are in the 'dist' column and group the values by 'ind'.

For example for 'ind'='la' and the 'diff' column:

((10*0.54)+(8.60*7)+(7.20*8)+(4.50*3))/(10+7+8+3) = 4.882143
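The arithmetic above can be checked directly with NumPy, whose `np.average` accepts a `weights` argument. This is a minimal sketch with the four 'la' rows copied by hand from the table; the variable names are made up for illustration.

```python
import numpy as np

# 'diff' and 'dist' values of the four rows where ind == 'la'
la_diff = np.array([0.54, 8.60, 7.20, 4.50])
la_dist = np.array([10.0, 7.0, 8.0, 3.0])

# Weighted average written out: sum(diff * dist) / sum(dist)
manual = (la_diff * la_dist).sum() / la_dist.sum()

# The same quantity computed by NumPy
via_numpy = np.average(la_diff, weights=la_dist)

print(round(float(manual), 6), round(float(via_numpy), 6))
```

Both expressions should agree with the 4.882143 computed by hand.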

The result I want to obtain is the following

          cas      diff
ind
g    6.714286  2.785714
la   3.107143  4.882143
p    3.750000  2.558333

which is obtained by multiplying each value in each column by the corresponding value in the 'dist' column, summing the products that share the same 'ind', and then dividing by the sum of all the 'dist' values for that 'ind'.

I thought this would be an easy task for the DataFrame 'groupby' method, but it turns out to be kind of tricky.

Can someone please help me?

Solution

You can obtain weights normalized within each group by using transform (df below is the question's data_df):

>>> df['weight'] = df['dist'] / df.groupby('ind')['dist'].transform('sum')
>>> df['weight']
0    0.357143
1    0.416667
2    0.250000
3    0.285714
4    0.583333
5    0.285714
6    0.714286
7    0.107143
Name: weight, dtype: float64
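A quick sanity check on this step: within each group the normalized weights should add up to 1. This is a minimal sketch that rebuilds only the 'ind' and 'dist' columns of the question's frame under the name `df` so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'ind': ['la', 'p', 'la', 'la', 'p', 'g', 'g', 'la'],
                   'dist': [10., 5., 7., 8., 7., 2., 5., 3.]})

# Each row's share of its group's total 'dist'
df['weight'] = df['dist'] / df.groupby('ind')['dist'].transform('sum')

# Summing the shares per group should give 1.0 (up to rounding error)
weight_totals = df.groupby('ind')['weight'].sum()
print(weight_totals)
```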

Then you just need to multiply these weights by the values and take the sum:

>>> df['wcas'], df['wdiff'] = (df[n] * df['weight'] for n in ('cas', 'diff'))
>>> df.groupby('ind')[['wcas', 'wdiff']].sum()
         wcas     wdiff
ind
g    6.714286  2.785714
la   3.107143  4.882143
p    3.750000  2.558333
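The two steps can also be wrapped in a small helper that leaves the input frame untouched. This is only a sketch: the function name `weighted_group_mean` and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd


def weighted_group_mean(frame, value_cols, weight_col, group_col):
    """Weighted mean of value_cols per group (hypothetical helper)."""
    # Normalize the weights so they sum to 1 inside each group
    group_totals = frame.groupby(group_col)[weight_col].transform('sum')
    weights = frame[weight_col] / group_totals
    # Scale every value column row by row, then add up within each group
    weighted = frame[value_cols].mul(weights, axis=0)
    return weighted.groupby(frame[group_col]).sum()


data_df = pd.DataFrame({'ind': ['la', 'p', 'la', 'la', 'p', 'g', 'g', 'la'],
                        'dist': [10., 5., 7., 8., 7., 2., 5., 3.],
                        'diff': [0.54, 3.2, 8.6, 7.2, 2.1, 1., 3.5, 4.5],
                        'cas': [1., 2., 3., 4., 5., 6., 7., 8.]})

result = weighted_group_mean(data_df, ['cas', 'diff'], 'dist', 'ind')
print(result.round(6))
```

Because the weights live in a local Series, no 'weight', 'wcas' or 'wdiff' columns are added to `data_df`, and the result keeps the original column names.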

Edit: with in-place mutation:

>>> backup = df.copy()  # keep a copy of the original, since df is overwritten below
>>> cols = ['cas', 'diff']  # name the columns explicitly; df.columns[:2] depends on column order
>>> df[cols] = df['weight'].values[:, None] * df[cols]
>>> df.groupby('ind')[cols].sum()
          cas      diff
ind
g    6.714286  2.785714
la   3.107143  4.882143
p    3.750000  2.558333
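If overwriting the frame is not desirable, another option is to let NumPy do the weighting one group at a time. The sketch below is an alternative to the transform approach, not a replacement for it; the helper name `group_average` is made up for illustration, and calling a Python function per group tends to be slower on large frames.

```python
import numpy as np
import pandas as pd

data_df = pd.DataFrame({'ind': ['la', 'p', 'la', 'la', 'p', 'g', 'g', 'la'],
                        'dist': [10., 5., 7., 8., 7., 2., 5., 3.],
                        'diff': [0.54, 3.2, 8.6, 7.2, 2.1, 1., 3.5, 4.5],
                        'cas': [1., 2., 3., 4., 5., 6., 7., 8.]})

value_cols = ['cas', 'diff']


def group_average(g):
    # np.average computes sum(values * weights) / sum(weights) for each column
    averages = np.average(g[value_cols].to_numpy(), axis=0,
                          weights=g['dist'].to_numpy())
    return pd.Series(averages, index=value_cols)


# Select only the columns the helper needs before applying it per group
result = data_df.groupby('ind')[value_cols + ['dist']].apply(group_average)
print(result.round(6))
```

Here `data_df` is never modified, so no backup copy is needed.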
