You can aggregate with groupby and the aggregate function sum, reshape with unstack, and finally replace the NaN for the missing category a with fillna:
df = df.groupby(['name','condition'], sort=False)['data1'].sum().unstack()
df['total'] = df['a'].fillna(df['b'])
print (df)
condition     a     b  total
name
one         7.0   3.0    7.0
two         NaN  48.0   48.0
three      39.0  13.0   39.0
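The input frame is not shown here, so the following is only a self-contained sketch of the same steps. The rows in `raw` are hypothetical, chosen so that the per-group sums match the output above. The names `raw` and `wide` are also just illustrative.

```python
import pandas as pd

# Hypothetical sample data (an assumption; the original input is not shown).
raw = pd.DataFrame({
    'name': ['one', 'one', 'one', 'two', 'two', 'three', 'three'],
    'condition': ['a', 'a', 'b', 'b', 'b', 'a', 'b'],
    'data1': [3, 4, 3, 20, 28, 39, 13],
})

# Sum data1 per (name, condition), then move condition into the columns.
wide = raw.groupby(['name', 'condition'], sort=False)['data1'].sum().unstack()

# Prefer the 'a' sum; fall back to the 'b' sum where 'a' is missing (NaN).
wide['total'] = wide['a'].fillna(wide['b'])
print(wide)
```

Keeping the result in a separate variable (`wide`) instead of overwriting `df` leaves the original long-format frame available for the other approaches.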
To get a new DataFrame:
# axis must be passed by keyword in recent pandas versions
df1 = df.reset_index().rename_axis(None, axis=1)[['name','total']]
print (df1)
    name  total
0    one    7.0
1    two   48.0
2  three   39.0
Another solution uses apply:
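# Note: df here is the original long-format DataFrame
# (columns name, condition, data1), not the unstacked result above.
def f(x):
    # If the group has any 'a' rows, sum only those.
    if (x['condition'] == 'a').any():
        return x.loc[x['condition'] == 'a', 'data1'].sum()
    # Otherwise fall back to the sum of the 'b' rows.
    else:
        return x.loc[x['condition'] == 'b', 'data1'].sum()

df1 = df.groupby('name', sort=False).apply(f).reset_index(name='total')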
print (df1)
    name  total
0    one      7
1    two     48
2  three     39
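A better approach is to build one Series per condition by aggregating the filtered DataFrame, and then combine them with combine_first. Note that this solution drops every name group that has neither an a nor a b condition: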
a = df.loc[df['condition'] == 'a'].groupby('name', sort=False)['data1'].sum()
b = df.loc[df['condition'] == 'b'].groupby('name', sort=False)['data1'].sum()
df = a.combine_first(b).reset_index(name='total')
print (df)
    name  total
0    one    7.0
1  three   39.0
2    two   48.0
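To make the caveat concrete, here is a small sketch with hypothetical data in which the group four only has condition c. The frame `raw` and the reindex step are my own additions and assumptions, not part of the solution above. Reindexing on the unique names is one possible way to keep such groups (as NaN) and to restore the original order.

```python
import pandas as pd

# Hypothetical data: 'four' has neither an 'a' nor a 'b' row.
raw = pd.DataFrame({
    'name': ['one', 'one', 'two', 'three', 'three', 'four'],
    'condition': ['a', 'b', 'b', 'a', 'b', 'c'],
    'data1': [7, 3, 48, 39, 13, 5],
})

# One Series of sums per condition, indexed by name.
a = raw.loc[raw['condition'] == 'a'].groupby('name', sort=False)['data1'].sum()
b = raw.loc[raw['condition'] == 'b'].groupby('name', sort=False)['data1'].sum()

# Take the 'a' sum where present, otherwise the 'b' sum.
total = a.combine_first(b)
print('four' in total.index)  # the 'four' group is missing from the result

# Optional: reindex on all names to keep every group, in order of appearance.
names = raw['name'].unique()
out = total.reindex(names).rename_axis('name').reset_index(name='total')
print(out)
```

Whether a missing group should appear as NaN or be dropped depends on what the total is used for, so treat the reindex step as optional.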