pandas——数据透视表

数据透视表

import pandas as pd
example = pd.DataFrame({'Amount': [74., 235., 175., 100., 115., 245., 180., 90., 88., 129., 273., 300.],
              'Category': ['Transportation', 'Grocery', 'Household', 'Entertainment', 
                           'Transportation', 'Grocery', 'Household', 'Entertainment', 
                           'Transportation', 'Grocery', 'Household', 'Entertainment'],
              'Month': ['January', 'January', 'January', 'January', 
                        'February', 'February', 'February', 'February', 
                        'Marth', 'Marth', 'Marth', 'Marth']})
example
AmountCategoryMonth
074.0TransportationJanuary
1235.0GroceryJanuary
2175.0HouseholdJanuary
3100.0EntertainmentJanuary
4115.0TransportationFebruary
5245.0GroceryFebruary
6180.0HouseholdFebruary
790.0EntertainmentFebruary
888.0TransportationMarth
9129.0GroceryMarth
10273.0HouseholdMarth
11300.0EntertainmentMarth
上面这个数据表感觉非常的杂乱,数据透视的意思就是按照我们的意思重新组织这张数据表
new_example = example.pivot(index='Category', columns='Month', values='Amount')    
index 表示以原先数据的哪个属性的值为行坐标进行分类 columns 表示以原先数据的哪个属性的值为新的属性
values 表示以原先数据的哪个属性的值填充前面两个 index 和 columns 定义出的新表格
new_example
MonthFebruaryJanuaryMarth
Category
Entertainment90.0100.0300.0
Grocery245.0235.0129.0
Household180.0175.0273.0
Transportation115.074.088.0
new_example.sum(axis=1)
Category
Entertainment     490.0
Grocery           609.0
Household         628.0
Transportation    277.0
dtype: float64
new_example.sum(axis=0)
Month
February    630.0
January     584.0
Marth       790.0
dtype: float64
经过重新组织后,数据表的呈现方式更有意义
df = pd.read_csv('../../datasets/titanic/test.csv')
df.head(5)
PassengerIdPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
08923Kelly, Mr. Jamesmale34.5003309117.8292NaNQ
18933Wilkes, Mrs. James (Ellen Needs)female47.0103632727.0000NaNS
28942Myles, Mr. Thomas Francismale62.0002402769.6875NaNQ
38953Wirz, Mr. Albertmale27.0003151548.6625NaNS
48963Hirvonen, Mrs. Alexander (Helga E Lindqvist)female22.011310129812.2875NaNS
需求: 统计男女在不同船舱等级的票价
df.pivot_table(index='Sex', columns='Pclass', values='Fare')   # 默认就是求平均值
Pclass123
Sex
female115.59116826.43875013.735129
male75.58655120.18465411.826350
df.pivot_table(index='Sex', columns='Pclass', values='Fare', aggfunc='min')  # 求最小值
Pclass123
Sex
female25.710.50006.9500
male0.09.68753.1708
需求: 统计不同船舱等级里男女的平均年龄
df.pivot_table(index='Pclass', columns='Sex', values='Age')
Sexfemalemale
Pclass
141.33333340.520000
224.37655230.940678
323.07340024.525104
  • 0
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值