pandas apply vs agg vs transform

fanfanyuzhui

Categories: python, data analysis. Tags: pandas

Article link: https://blog.csdn.net/fanfanyuzhui/article/details/78503608

1. Data Preparation

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 1, 2, 2],'B': [1, 2, 3, 4],'C': np.random.randn(4)})

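2. Start with some ordinary aggregations

1. Sum: sum. Count: size or len both work for a plain (non-distinct) count, and pd.Series.nunique gives a distinct count. Maximum and minimum: max, min.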

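# The nested-dict renaming form agg({'B': {'B_S': sum, 'B_C': len}}) was deprecated
# in pandas 0.20 and removed in 1.0, so pass a list of functions per column instead.
res = df.groupby('A', as_index=False).agg({'B': ['sum', 'size']})

res = df.groupby('A', as_index=False).agg({'B': ['sum', len]})

res = df.groupby('A', as_index=False).agg({'B': ['sum', pd.Series.nunique]})


----------
   A   B
     sum nunique
0  1   3       2
1  2   7       2
The result has multi-level columns, so you need to rename them yourself:

res.columns = ['A', 'B_S', 'B_C']
   A  B_S  B_C
0  1    3    2
1  2    7    2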
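
As an alternative to renaming multi-level columns by hand, named aggregation (available since pandas 0.25) produces flat, already-named columns. The following is a minimal sketch, not the article's original method; the column name B_U is an assumption introduced for illustration, and column C is left out to keep the result deterministic.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})

# Named aggregation: output_name=(input_column, function)
flat = df.groupby('A', as_index=False).agg(
    B_S=('B', 'sum'),      # sum of B per group
    B_C=('B', 'size'),     # row count per group (non-distinct)
    B_U=('B', 'nunique'),  # distinct count per group (B_U is a hypothetical name)
)
print(flat)
```

Each keyword becomes an output column name, so no MultiIndex is created and no rename step is needed.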

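3. What is the difference between apply and agg? With agg you name the columns to aggregate and each function sees one column at a time, whereas apply passes each group's entire DataFrame to the function by default.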

df.groupby('A', as_index=False).agg({'B': 'sum'})
gives the same result as
df.groupby('A').apply(lambda x: x['B'].sum()).reset_index(name='B')
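
To make the difference concrete, here is a small sketch that computes the same group sum both ways and then uses apply for a calculation that needs two columns at once. The fixed values for C are an assumption made for reproducibility (the article uses random numbers), and the names via_agg, via_apply and weighted are illustrative only.

```python
import pandas as pd

# Fixed C values are an assumption; the article generates C randomly
df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 3, 4],
                   'C': [10.0, 20.0, 30.0, 40.0]})

grouped = df.groupby('A')

# agg: the column is named explicitly and the function sees one Series per group
via_agg = grouped['B'].agg('sum')

# apply: the function sees each group's rows as a DataFrame
via_apply = grouped[['B', 'C']].apply(lambda g: g['B'].sum())

# Because apply sees every selected column, it can combine them in one step
weighted = grouped[['B', 'C']].apply(lambda g: (g['B'] * g['C']).sum())

print(via_agg.tolist(), via_apply.tolist(), weighted.tolist())
```

Selecting the columns with `[['B', 'C']]` before apply keeps the grouping column out of the function's input, which behaves the same way across recent pandas versions.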

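Function 1
def add(group):
    # Non-aggregating: return a subset of each group's columns, one row per input row
    return group[['C', 'B']]

# group_keys=False keeps the original row index (required from pandas 2.0 onward)
df.groupby('A', as_index=False, group_keys=False).apply(add)

          C  B
0  0.826834  1
1  1.121229  2
2  0.428046  3
3 -1.669947  4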

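Function 2
def add(group):
    # Aggregating: join each column's values into one comma-separated string,
    # then return a new frame with one row per column of the group
    a = [','.join(map(str, group[col])) for col in group.columns]
    return pd.DataFrame({'A': a})

df.groupby('A', as_index=False).apply(add)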

0                           1,1
1                           1,2
2  0.826834174612,1.12122858358
0                           2,2
1                           3,4
2  0.42804565743,-1.66994731973

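An apply that does not aggregate returns rows laid out like the original DataFrame (from pandas 2.0 onward this requires group_keys=False; otherwise the group is prepended as an extra index level). When the applied function aggregates and builds a new frame per group, the index shown is the auto-incrementing row index within each group.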
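
The two result shapes can be checked with a small sketch. It uses the default as_index=True, so the outer index level is the group key rather than the group position; the helper join_b and the column name joined are hypothetical names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})

# Non-aggregating apply: one output row per input row.
# group_keys=False keeps the original index instead of prepending the group key.
same_shape = df.groupby('A', group_keys=False)[['B']].apply(lambda g: g * 2)

# Aggregating apply: each group is collapsed into a new one-row frame.
def join_b(g):
    # join_b is a hypothetical helper: concatenate the group's B values
    return pd.DataFrame({'joined': [','.join(map(str, g['B']))]})

per_group = df.groupby('A')[['B']].apply(join_b)

print(same_shape)
print(per_group)
```

In per_group the index has two levels: the group key, then the row number inside that group's returned frame, which restarts at 0 for every group.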

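4. transform works at the element level: the result has the same shape as the input, with each group's aggregate broadcast back to every row of that group

# as_index has no effect on transform; the string 'sum' avoids passing the Python builtin
df.groupby('A').transform('sum')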

df
     B         C
0  3.0 -2.186667
1  3.0 -2.186667
2  7.0 -0.348783
3  7.0 -0.348783
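
Because transform returns a result aligned with the original rows, it can be combined directly with the source column. A minimal sketch, assuming we want each row's share of its group total; the column name B_share is an illustrative assumption, and C is omitted to keep the numbers deterministic.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})

# Group sum broadcast back to every row: same length and index as df
group_sum = df.groupby('A')['B'].transform('sum')

# Row-level arithmetic against the broadcast aggregate
df['B_share'] = df['B'] / group_sum  # B_share is a hypothetical column name

print(df)
```

The same computation with agg would need a merge back onto the original frame, since agg returns one row per group rather than one row per input row.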
