[python] Several output formats of groupby

高数辅导讲义

Last modified on 2022-11-01 10:13:29

Category: python data processing; Tags: python, data mining, pandas

First published on 2022-10-26 18:29:57

Article link: https://blog.csdn.net/qq_29662001/article/details/127536573

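While using pandas groupby in Python, I found that it can produce three different output formats, so I am writing this post to record them.
For demonstration, the following sample data is used, with groupby('code')['data'].rolling(2).min() as the operation: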

import pandas as pd

df = pd.DataFrame({"date": ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02", "2021-01-03", "2021-01-03"],
                   "code": ["001.sh", "002.sh", "001.sh", "002.sh", "001.sh", "002.sh"],
                   "report_date": ["2020q1", "2020q1", "2020q2", "2020q2", "2020q3", "2020q3"],
                   "data": [1, 2, 3, 4, 5, 6],
                   "data1": [1, 1, 1, 1, 1, 1]})
# Parse the date strings; astype('datetime64') without a unit raises a TypeError in pandas 2.x
df["date"] = pd.to_datetime(df["date"])

Type 1: the output is a Series

df.groupby('code')['data'].rolling(2).min()

Output: a Series indexed by a MultiIndex of (code, original row label).
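A minimal sketch reproducing Type 1 on the sample data (trimmed to the columns used). The first row of each code is NaN because a window of 2 needs two values. The last line, which drops the `code` level so the result aligns back onto the original rows, is my own addition, and the column name `data_min2` is hypothetical.

```python
import pandas as pd

# Sample data from above, trimmed to the columns used here
df = pd.DataFrame({
    "code": ["001.sh", "002.sh", "001.sh", "002.sh", "001.sh", "002.sh"],
    "report_date": ["2020q1", "2020q1", "2020q2", "2020q2", "2020q3", "2020q3"],
    "data": [1, 2, 3, 4, 5, 6],
})

# Rolling minimum over a window of 2 rows, computed separately for each code
s = df.groupby("code")["data"].rolling(2).min()

# Index level 0 is the group key, level 1 the original row label:
#   ("001.sh", 0) NaN, ("001.sh", 2) 1.0, ("001.sh", 4) 3.0
#   ("002.sh", 1) NaN, ("002.sh", 3) 2.0, ("002.sh", 5) 4.0
print(type(s).__name__, list(s.index.names))

# Hypothetical usage: drop the group level so the values align with df's rows
df["data_min2"] = s.droplevel("code")
print(df)
```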
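Type 2: the output is a DataFrame
If I want the output to carry an index (here report_date), I write it as follows.
(This layout seems to appear only when every group returns the same index after processing. In the result below, each code has the same report_date index, so pandas stacks the groups as rows of a DataFrame with code as the index and report_date as the columns.)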

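def test2(x):
    # Minimum of one rolling window (same values as .rolling(2).min())
    return x.min()

def test1(x, factor):
    # Index each group by report_date, so every code gets the same index labels
    data = x.set_index("report_date")
    # Selecting a single column means the function returns a Series
    res = data[factor].rolling(2).apply(test2)
    return res

# Select only the needed columns so the grouping column is not passed to test1
df.groupby('code')[['report_date', 'data']].apply(test1, 'data')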

Result: a DataFrame with code as the index and report_date as the columns.
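Type 3: written another way
The output looks like a Series carrying a (code, report_date) index; strictly speaking it is a one-column DataFrame, because test1 now returns a DataFrame rather than a Series.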

def test2(x):
    # Minimum of one rolling window (same values as .rolling(2).min())
    return x.min()

def test1(x, factor):
    # Double brackets keep a DataFrame, indexed here by report_date
    data = x[['report_date', factor]].set_index("report_date")
    # Rolling over a DataFrame means the function returns a DataFrame
    res = data.rolling(2).apply(test2)
    return res

# Select only the needed columns so the grouping column is not passed to test1
df.groupby('code')[['report_date', 'data']].apply(test1, 'data')
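Because `test1` returns a DataFrame in this variant, `apply` concatenates the pieces and prepends `code` as an outer index level. The sketch below uses a hypothetical `rolling_min_frame` as a stand-in for `test1` to check that, and shows one way to move between the layouts: selecting the column gives a Series like Type 1 (indexed by report_date instead of row label), and `unstack` gives the wide Type 2 table.

```python
import pandas as pd

df = pd.DataFrame({
    "code": ["001.sh", "002.sh", "001.sh", "002.sh", "001.sh", "002.sh"],
    "report_date": ["2020q1", "2020q1", "2020q2", "2020q2", "2020q3", "2020q3"],
    "data": [1, 2, 3, 4, 5, 6],
})

def rolling_min_frame(x, factor):
    # Return a DataFrame (not a Series), so apply concatenates the pieces vertically
    data = x[["report_date", factor]].set_index("report_date")
    return data.rolling(2).min()

# One-column DataFrame indexed by (code, report_date)
tall = df.groupby("code")[["report_date", "data"]].apply(rolling_min_frame, "data")

# Selecting the single column gives a Series with the same MultiIndex
as_series = tall["data"]

# Moving report_date into the columns gives the Type 2 layout (code x report_date)
as_wide = as_series.unstack("report_date")

print(tall)
print(as_wide)
```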