pandas高阶--透视表介绍

 

 

 

In [36]:

 

# 创建dataframe
d = {
    'Name':['Alisa','Bobby','Cathrine','Alisa','Bobby','Cathrine',
            'Alisa','Bobby','Cathrine','Alisa','Bobby','Cathrine'],
    'Semester':['Semester 1','Semester 1','Semester 1','Semester 1','Semester 1','Semester 1',
            'Semester 2','Semester 2','Semester 2','Semester 2','Semester 2','Semester 2'],
 
    'Subject':['Mathematics','Mathematics','Mathematics','Science','Science','Science',
              'Mathematics','Mathematics','Mathematics','Science','Science','Science'],
    'Score':[62,47,55,74,31,77,85,63,42,67,89,81]}
df = pd.DataFrame(d)
df

Out[36]:

 NameSemesterSubjectScore
0AlisaSemester 1Mathematics62
1BobbySemester 1Mathematics47
2CathrineSemester 1Mathematics55
3AlisaSemester 1Science74
4BobbySemester 1Science31
5CathrineSemester 1Science77
6AlisaSemester 2Mathematics85
7BobbySemester 2Mathematics63
8CathrineSemester 2Mathematics42
9AlisaSemester 2Science67
10BobbySemester 2Science89
11CathrineSemester 2Science81

层级分组

In [37]:

 

 
df.groupby(by=['Semester','Subject'])['Score'].mean().to_frame()

Out[37]:

  Score
SemesterSubject 
Semester 1Mathematics54.666667
Science60.666667
Semester 2Mathematics63.333333
Science79.000000

pivot_table

In [38]:

 

 
df.pivot_table(values='Score',index='Semester',columns='Subject') # 数据透视表

Out[38]:

SubjectMathematicsScience
Semester  
Semester 154.66666760.666667
Semester 263.33333379.000000

margin参数

In [39]:

 

 
df.pivot_table(values='Score',index='Semester',columns='Subject',margins=True) # 求和

Out[39]:

SubjectMathematicsScienceAll
Semester   
Semester 154.66666760.66666757.666667
Semester 263.33333379.00000071.166667
All59.00000069.83333364.416667

多个聚合操作

In [40]:

 

 
df.pivot_table(values='Score',index='Semester',columns='Subject',aggfunc=['mean','max','min'])

Out[40]:

 meanmaxmin
SubjectMathematicsScienceMathematicsScienceMathematicsScience
Semester      
Semester 154.66666760.66666762774731
Semester 263.33333379.00000085894267

多个聚合操作

In [42]:

 

 
df.pivot_table(values='Score',index='Semester',columns='Subject',aggfunc=['mean','max','min'])

Out[42]:

 meanmaxmin
SubjectMathematicsScienceMathematicsScienceMathematicsScience
Semester      
Semester 154.66666760.66666762774731
Semester 263.33333379.00000085894267

分析车辆数据集

In [44]:

 

 
# 加载数据
cars_df = pd.read_csv(r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第二节-数据分析工具pandas高阶\3_lesson_06\lesson_06\examples\datasets\cars.csv')
cars_df.head()

Out[44]:

 YEARMakeModelSize(kW)Unnamed: 5TYPECITY (kWh/100 km)HWY (kWh/100 km)COMB (kWh/100 km)CITY (Le/100 km)HWY (Le/100 km)COMB (Le/100 km)(g/km)RATING(km)TIME (h)
02012MITSUBISHIi-MiEVSUBCOMPACT49A1B16.921.418.71.92.42.10NaN1007
12012NISSANLEAFMID-SIZE80A1B19.323.021.12.22.62.40NaN1177
22013FORDFOCUS ELECTRICCOMPACT107A1B19.021.120.02.12.42.20NaN1224
32013MITSUBISHIi-MiEVSUBCOMPACT49A1B16.921.418.71.92.42.10NaN1007
42013NISSANLEAFMID-SIZE80A1B19.323.021.12.22.62.40NaN1177

In [46]:

 

# 比较不同年份的不同厂商的车,在电池方面的不同
cars_df.pivot_table(values='(kW)',index='YEAR',columns='Make')

Out[46]:

MakeBMWCHEVROLETFORDKIAMITSUBISHINISSANSMARTTESLA
YEAR        
2012NaNNaNNaNNaN49.080.0NaNNaN
2013NaNNaN107.0NaN49.080.035.0280.000000
2014NaN104.0107.0NaN49.080.035.0268.333333
2015125.0104.0107.081.049.080.035.0320.666667
2016125.0104.0107.081.049.080.035.0409.700000

In [ ]:

 

 

 

 

 

 

 

 

 

 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值