pandas——数据透视表

最新推荐文章于 2024-05-08 10:08:59 发布

One_T.

最新推荐文章于 2024-05-08 10:08:59 发布

阅读量1.5k

点赞数

分类专栏：数据分析文章标签： python 数据分析深度学习

本文链接：https://blog.csdn.net/D_Low/article/details/123763284

版权

数据分析专栏收录该内容

15 篇文章 0 订阅

订阅专栏

数据透视表

import pandas as pd

example = pd.DataFrame({'Amount': [74., 235., 175., 100., 115., 245., 180., 90., 88., 129., 273., 300.],
              'Category': ['Transportation', 'Grocery', 'Household', 'Entertainment', 
                           'Transportation', 'Grocery', 'Household', 'Entertainment', 
                           'Transportation', 'Grocery', 'Household', 'Entertainment'],
              'Month': ['January', 'January', 'January', 'January', 
                        'February', 'February', 'February', 'February', 
                        'Marth', 'Marth', 'Marth', 'Marth']})
example

	Amount	Category	Month
0	74.0	Transportation	January
1	235.0	Grocery	January
2	175.0	Household	January
3	100.0	Entertainment	January
4	115.0	Transportation	February
5	245.0	Grocery	February
6	180.0	Household	February
7	90.0	Entertainment	February
8	88.0	Transportation	Marth
9	129.0	Grocery	Marth
10	273.0	Household	Marth
11	300.0	Entertainment	Marth

上面这个数据表感觉非常的杂乱，数据透视的意思就是按照我们的意思重新组织这张数据表

new_example = example.pivot(index='Category', columns='Month', values='Amount')

index 表示以原先数据的哪个属性的值为行坐标进行分类 columns 表示以原先数据的哪个属性的值为新的属性

values 表示以原先数据的哪个属性的值填充前面两个 index 和 columns 定义出的新表格

new_example

Month	February	January	Marth
Category
Entertainment	90.0	100.0	300.0
Grocery	245.0	235.0	129.0
Household	180.0	175.0	273.0
Transportation	115.0	74.0	88.0

new_example.sum(axis=1)

Category
Entertainment     490.0
Grocery           609.0
Household         628.0
Transportation    277.0
dtype: float64

new_example.sum(axis=0)

Month
February    630.0
January     584.0
Marth       790.0
dtype: float64

经过重新组织后，数据表的呈现方式更有意义

df = pd.read_csv('../../datasets/titanic/test.csv')
df.head(5)

	PassengerId	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	892	3	Kelly, Mr. James	male	34.5	0	0	330911	7.8292	NaN	Q
1	893	3	Wilkes, Mrs. James (Ellen Needs)	female	47.0	1	0	363272	7.0000	NaN	S
2	894	2	Myles, Mr. Thomas Francis	male	62.0	0	0	240276	9.6875	NaN	Q
3	895	3	Wirz, Mr. Albert	male	27.0	0	0	315154	8.6625	NaN	S
4	896	3	Hirvonen, Mrs. Alexander (Helga E Lindqvist)	female	22.0	1	1	3101298	12.2875	NaN	S

需求：统计男女在不同船舱等级的票价

df.pivot_table(index='Sex', columns='Pclass', values='Fare')   # 默认就是求平均值

Pclass	1	2	3
Sex
female	115.591168	26.438750	13.735129
male	75.586551	20.184654	11.826350

df.pivot_table(index='Sex', columns='Pclass', values='Fare', aggfunc='min')  # 求最小值

Pclass	1	2	3
Sex
female	25.7	10.5000	6.9500
male	0.0	9.6875	3.1708

需求：统计不同船舱等级里男女的平均年龄

df.pivot_table(index='Pclass', columns='Sex', values='Age')

Sex	female	male
Pclass
1	41.333333	40.520000
2	24.376552	30.940678
3	23.073400	24.525104

One_T.

关注

0
点赞
踩
4

收藏

觉得还不错? 一键收藏
0
评论
pandas——数据透视表

数据透视表import pandas as pdexample = pd.DataFrame({'Amount': [74., 235., 175., 100., 115., 245., 180., 90., 88., 129., 273., 300.], 'Category': ['Transportation', 'Grocery', 'Household', 'Entertainment', 'Transport
复制链接

扫一扫