Pandas DataFrame GroupBy.Apply

Claroja

Article link: https://blog.csdn.net/claroja/article/details/106660803

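This article looks at the apply method of the pandas GroupBy object and explains how to use it to process data group by group, including sorting and rank computation. Concrete examples show how to sort age data within each gender group and how to fill in missing values by interpolation inside apply.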

https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.GroupBy.apply.html?highlight=apply#pandas.core.groupby.GroupBy.apply

GroupBy.apply(self, func, *args, **kwargs)
Applies func to each group and combines the per-group results into a single object, typically a DataFrame.

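Parameter	Description
func	callable; its first argument is the DataFrame for one group
args, kwargs	tuple and dict; additional positional and keyword arguments passed on to func
return	Series or DataFrame; the results of func for each group, combined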
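
To illustrate how the extra arguments reach func, here is a minimal sketch. The helper top_n_mean and its parameters n and column are hypothetical names introduced only for this example. The age column is selected before calling apply so that func receives just that column.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [15, 16, 14, 13, 17, 16],
    "gender": ["man", "woman", "man", "man", "woman", "man"],
})

def top_n_mean(group, n, column="age"):
    # group is the sub-DataFrame of one gender;
    # n arrives via *args and column via **kwargs of apply
    return group[column].nlargest(n).mean()

# 2 is forwarded as n, column="age" is forwarded as a keyword argument
result = df.groupby("gender")[["age"]].apply(top_n_mean, 2, column="age")
print(result)
```

Because func returns a scalar for each group, the combined result here is a Series indexed by gender.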

Group, then sort

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'age':[15,16,14,13,17,16],
    'gender':["man","woman","man","man","woman","man"]
})


# sort ages within each gender group, then discard the index produced by apply
df.groupby('gender').apply(lambda x: x.sort_values('age')).reset_index(drop=True)
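
For this particular task the same row order can be obtained without apply, by sorting on both columns at once. This is offered as an alternative sketch rather than as the approach the article uses. It may be convenient because, depending on the pandas version, apply can warn about or leave out the grouping column. The name sorted_df is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [15, 16, 14, 13, 17, 16],
    "gender": ["man", "woman", "man", "man", "woman", "man"],
})

# sort by gender first, then by age within each gender
sorted_df = df.sort_values(["gender", "age"]).reset_index(drop=True)
print(sorted_df)
```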


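def rank(x):
    x = x.copy()  # work on a copy so the group passed in by apply is not mutated
    # rank ages within the group, oldest first; ties are broken by order of appearance
    x['rank'] = x['age'].rank(method='first', ascending=False)
    x = x.sort_values('age')
    # index by age (keeping the column) so that missing ages can be added as empty rows
    x = x.set_index('age', drop=False)
    x = x.reindex([13, 14, 15, 16, 17])
    # fill the age column of the newly added rows; the other columns stay NaN
    x['age'] = x['age'].interpolate(method='linear', limit_direction='both')
    return x

result = df.groupby('gender').apply(rank)
result.loc["man"]  # select the "man" group from the result, which is indexed by gender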
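
The ranking step on its own can also be written with the per-group rank method, without a custom function. This sketch covers only the ranking, not the reindexing and interpolation that follow it in the function above. The name ranked is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [15, 16, 14, 13, 17, 16],
    "gender": ["man", "woman", "man", "man", "woman", "man"],
})

# rank ages within each gender, oldest first; the original row order is preserved
ranked = df.assign(
    rank=df.groupby("gender")["age"].rank(method="first", ascending=False)
)
print(ranked)
```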


se = pd.Series([5, np.nan, np.nan, 2, 3, 4])

# linear interpolation fills the two gaps between 5 and 2 with 4.0 and 3.0
se.interpolate(limit_direction='both', method='linear')
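
The effect of limit_direction is easiest to see when the missing values sit at the edges of the series, as they do after the reindex step in the rank function. The following is a small sketch with made-up values; the names s, forward and both are introduced here for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# default direction is forward: the leading NaN has no earlier value and stays NaN
forward = s.interpolate(method="linear")

# 'both' also fills backwards, so the leading NaN takes the first valid value
both = s.interpolate(method="linear", limit_direction="both")

print(forward.tolist())
print(both.tolist())
```

Linear interpolation does not extrapolate, so values at the edges are filled with the nearest valid value rather than continuing the trend.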