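Advanced Pandas, Section 1: Hierarchical Indexing, Introduction to Grouping and Aggregation, GroupBy Objects and Common Aggregations, Custom Grouping and Aggregation
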
Lab:

Lesson 6: Data Analysis Tools

Section 1: Hierarchical Indexing

In [30]:

 

import pandas as pd
import numpy as np

In [2]:

 

# File path
filepath = r'C:\Users\ML Learning\Projects\第四章-数据分析预习内容\第四章-数据分析预习内容\第二节-数据分析工具pandas高阶\3_lesson_06\lesson_06\examples\datasets\2016_happiness.csv'

Read the file

In [3]:

 

 
data = pd.read_csv(filepath,usecols=['Country','Region','Happiness Rank','Happiness Score'])
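The effect of usecols can be checked without the original file. The following is a minimal sketch that assumes a small in-memory CSV in place of 2016_happiness.csv; the 'Extra' column and its values are invented purely to show that unlisted columns are dropped.

```python
import io
import pandas as pd

# Hypothetical stand-in for 2016_happiness.csv; the 'Extra' column is
# made up so there is something for usecols to filter out.
csv_text = """Country,Region,Happiness Rank,Happiness Score,Extra
Denmark,Western Europe,1,7.526,a
Switzerland,Western Europe,2,7.509,b
Iceland,Western Europe,3,7.501,c
"""

# Only the listed columns are parsed and kept
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['Country', 'Region', 'Happiness Rank', 'Happiness Score'],
)
print(sample.columns.tolist())
print(sample.shape)
```

The resulting columns follow the order in the file, not the order of the usecols list.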

In [7]:

 

# Preview the data
data.head()

Out[7]:

                            Happiness Rank  Happiness Score
Region         Country
Western Europe Denmark                   1            7.526
               Switzerland               2            7.509
               Iceland                   3            7.501
               Norway                    4            7.498
               Finland                   5            7.413

Set multiple index columns

In [5]:

 

 
data.set_index(['Region','Country'],inplace=True)

In [9]:

 

data

Out[9]:

                                             Happiness Rank  Happiness Score
Region                          Country
Western Europe                  Denmark                   1            7.526
                                Switzerland               2            7.509
                                Iceland                   3            7.501
                                Norway                    4            7.498
                                Finland                   5            7.413
...                             ...                     ...              ...
Sub-Saharan Africa              Benin                   153            3.484
Southern Asia                   Afghanistan             154            3.360
Sub-Saharan Africa              Togo                    155            3.303
Middle East and Northern Africa Syria                   156            3.069
Sub-Saharan Africa              Burundi                 157            2.905

157 rows × 2 columns
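As a small illustration of what set_index with a list of columns produces, here is a sketch on a three-row sample built from the values in the output above. It omits inplace=True so that the original frame stays untouched, and uses reset_index to undo the operation; the variable names are illustrative only.

```python
import pandas as pd

# Small sample built from values shown in the output above
sample = pd.DataFrame({
    'Country': ['Denmark', 'Switzerland', 'Iceland'],
    'Region': ['Western Europe'] * 3,
    'Happiness Rank': [1, 2, 3],
    'Happiness Score': [7.526, 7.509, 7.501],
})

# Without inplace=True, set_index returns a new DataFrame
# and leaves `sample` unchanged
indexed = sample.set_index(['Region', 'Country'])
print(indexed.index.nlevels)        # number of index levels
print(list(indexed.index.names))    # level names, outer to inner

# reset_index moves the index levels back into ordinary columns
restored = indexed.reset_index()
print(restored.columns.tolist())
```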

Select a subset

In [11]:

 

 
# Select one row by outer level (Region) and inner level (Country)
data.loc['Australia and New Zealand','New Zealand']

Out[11]:

Happiness Rank     8.000
Happiness Score    7.334
Name: (Australia and New Zealand, New Zealand), dtype: float64
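Selection on a hierarchical index can target the outer level, both levels, or only the inner level. The sketch below shows the three cases on a four-row sample built from values in this section; passing the key as a tuple is one way to make it explicit that both labels refer to index levels rather than to a row and a column.

```python
import pandas as pd

# Four rows taken from the outputs in this section
sample = pd.DataFrame({
    'Region': ['Australia and New Zealand', 'Australia and New Zealand',
               'Western Europe', 'Western Europe'],
    'Country': ['Australia', 'New Zealand', 'Denmark', 'Switzerland'],
    'Happiness Rank': [9, 8, 1, 2],
    'Happiness Score': [7.313, 7.334, 7.526, 7.509],
}).set_index(['Region', 'Country'])

# Outer level only: all rows of the region, indexed by Country
anz = sample.loc['Australia and New Zealand']
print(anz)

# Outer and inner level as a tuple: a single row, returned as a Series
nz = sample.loc[('Australia and New Zealand', 'New Zealand')]
print(nz)

# Inner level only: xs selects on a named level
dk = sample.xs('Denmark', level='Country')
print(dk)
```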

Swap index levels

In [12]:

 

 
data.swaplevel()

Out[12]:

                                             Happiness Rank  Happiness Score
Country     Region
Denmark     Western Europe                                1            7.526
Switzerland Western Europe                                2            7.509
Iceland     Western Europe                                3            7.501
Norway      Western Europe                                4            7.498
Finland     Western Europe                                5            7.413
...         ...                                         ...              ...
Benin       Sub-Saharan Africa                          153            3.484
Afghanistan Southern Asia                               154            3.360
Togo        Sub-Saharan Africa                          155            3.303
Syria       Middle East and Northern Africa             156            3.069
Burundi     Sub-Saharan Africa                          157            2.905

157 rows × 2 columns
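swaplevel returns a new object with the two levels exchanged; the row data and the original frame are left as they were. A minimal sketch on a three-row sample (values from the output above):

```python
import pandas as pd

sample = pd.DataFrame({
    'Region': ['Western Europe', 'Western Europe', 'Southern Asia'],
    'Country': ['Denmark', 'Switzerland', 'Afghanistan'],
    'Happiness Rank': [1, 2, 154],
    'Happiness Score': [7.526, 7.509, 3.360],
}).set_index(['Region', 'Country'])

# With two levels, the default swaps the only pair available
swapped = sample.swaplevel()
print(list(swapped.index.names))

# The levels can also be named explicitly
swapped_named = sample.swaplevel('Region', 'Country')

# The original keeps its level order
print(list(sample.index.names))
```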

Sort the hierarchical index

In [14]:

 

data.sort_index()

Out[14]:

                                           Happiness Rank  Happiness Score
Region                     Country
Australia and New Zealand  Australia                    9            7.313
                           New Zealand                  8            7.334
Central and Eastern Europe Albania                    109            4.655
                           Armenia                    121            4.360
                           Azerbaijan                  81            5.291
...                        ...                        ...              ...
Western Europe             Portugal                    94            5.123
                           Spain                       37            6.361
                           Sweden                      10            7.291
                           Switzerland                  2            7.509
                           United Kingdom              23            6.725

157 rows × 2 columns
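By default sort_index orders by the outer level first and then by the inner level. The sketch below, on a four-row sample from this section, also shows sorting on a chosen level and in descending order; these variations are an illustration and are not part of the original notebook.

```python
import pandas as pd

# Rows in rank order, so the index starts out unsorted
sample = pd.DataFrame({
    'Region': ['Western Europe', 'Western Europe',
               'Australia and New Zealand', 'Australia and New Zealand'],
    'Country': ['Denmark', 'Switzerland', 'New Zealand', 'Australia'],
    'Happiness Rank': [1, 2, 8, 9],
    'Happiness Score': [7.526, 7.509, 7.334, 7.313],
}).set_index(['Region', 'Country'])

# Default: sort by Region, then by Country within each region
sorted_all = sample.sort_index()
print(sorted_all)

# Sort primarily by the inner level
by_country = sample.sort_index(level='Country')
print(by_country.index.get_level_values('Country').tolist())

# Descending order on all levels
descending = sample.sort_index(ascending=False)
print(descending.index[0])

# A quick check of whether an index is already sorted
print(sample.index.is_monotonic_increasing, sorted_all.index.is_monotonic_increasing)
```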

groupby()

In [15]:

 

 
# Group by a single key (here the Region index level)
obj1 = data.groupby('Region')
print(type(obj1))
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>

In [16]:

 

 
# Group by multiple keys
obj2 = data.groupby(['Region','Country'])
print(type(obj2))
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>
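A GroupBy object does not compute anything by itself; it only records how the rows are split. The following sketch, on a four-row sample from this section, shows a few ways to look inside such an object. As in the notebook, 'Region' is an index level here, which groupby accepts by name.

```python
import pandas as pd

sample = pd.DataFrame({
    'Region': ['Western Europe', 'Western Europe',
               'Australia and New Zealand', 'Australia and New Zealand'],
    'Country': ['Denmark', 'Switzerland', 'New Zealand', 'Australia'],
    'Happiness Rank': [1, 2, 8, 9],
    'Happiness Score': [7.526, 7.509, 7.334, 7.313],
}).set_index(['Region', 'Country'])

grouped = sample.groupby('Region')

# Number of distinct groups
print(grouped.ngroups)

# Iterating yields (key, sub-DataFrame) pairs
group_sizes = {region: len(frame) for region, frame in grouped}
print(group_sizes)

# Pull out a single group by its key
western = grouped.get_group('Western Europe')
print(western)
```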

Common aggregation operations

In [17]:

 

 
obj1.mean()

Out[17]:

                                 Happiness Rank  Happiness Score
Region
Australia and New Zealand              8.500000         7.323500
Central and Eastern Europe            78.448276         5.370690
Eastern Asia                          67.166667         5.624167
Latin America and Caribbean           48.333333         6.101750
Middle East and Northern Africa       78.105263         5.386053
North America                          9.500000         7.254000
Southeastern Asia                     80.000000         5.338889
Southern Asia                        111.714286         4.563286
Sub-Saharan Africa                   129.657895         4.136421
Western Europe                        29.190476         6.685667

In [18]:

 

 
type(obj1.mean())

Out[18]:

pandas.core.frame.DataFrame

In [19]:

 

 
obj1.max()

Out[19]:

                                 Happiness Rank  Happiness Score
Region
Australia and New Zealand                     9            7.334
Central and Eastern Europe                  129            6.596
Eastern Asia                                101            6.379
Latin America and Caribbean                 136            7.087
Middle East and Northern Africa             156            7.267
North America                                13            7.404
Southeastern Asia                           140            6.739
Southern Asia                               154            5.196
Sub-Saharan Africa                          157            5.648
Western Europe                               99            7.526

In [20]:

 

 
obj1.size()

Out[20]:

Region
Australia and New Zealand           2
Central and Eastern Europe         29
Eastern Asia                        6
Latin America and Caribbean        24
Middle East and Northern Africa    19
North America                       2
Southeastern Asia                   9
Southern Asia                       7
Sub-Saharan Africa                 38
Western Europe                     21
dtype: int64

In [23]:

 

 
obj1.count()  # Count the non-null values in each column

Out[23]:

                                 Happiness Rank  Happiness Score
Region
Australia and New Zealand                     2                2
Central and Eastern Europe                   29               29
Eastern Asia                                  6                6
Latin America and Caribbean                  24               24
Middle East and Northern Africa              19               19
North America                                 2                2
Southeastern Asia                             9                9
Southern Asia                                 7                7
Sub-Saharan Africa                           38               38
Western Europe                               21               21
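In this dataset size() and count() report the same numbers because no values are missing. The difference only shows up with NaN: size() counts rows per group, while count() counts non-null values per column. A minimal sketch with made-up data (the region labels 'A' and 'B' and the scores are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score in group 'A'
sample = pd.DataFrame({
    'Region': ['A', 'A', 'B'],
    'Happiness Score': [7.5, np.nan, 5.0],
})
grouped = sample.groupby('Region')

sizes = grouped.size()     # rows per group, NaN included
counts = grouped.count()   # non-null values per column

print(sizes)
print(counts)
```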

In [24]:

 

obj2.size()

Out[24]:

Region                      Country       
Australia and New Zealand   Australia         1
                            New Zealand       1
Central and Eastern Europe  Albania           1
                            Armenia           1
                            Azerbaijan        1
                                             ..
Western Europe              Portugal          1
                            Spain             1
                            Sweden            1
                            Switzerland       1
                            United Kingdom    1
Length: 157, dtype: int64

Custom grouping

In [25]:

 

 
# Custom grouping function
def get_score_group(score):
    if score <= 4:
        score_group = 'low'
    elif score <= 6:
        score_group = 'middle'
    else:
        score_group = 'high'
    return score_group

In [26]:

 

 
# Method 1: pass a custom function to groupby; it is applied to each index value
data2 = data.set_index('Happiness Score')
data2.groupby(get_score_group).size()

Out[26]:

high      47
low       21
middle    89
dtype: int64
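Method 1 needs the score to be the index because a function passed to groupby is called on index values. A possible variation, sketched below and not part of the original notebook, maps the function over the column and passes the resulting Series as the grouping key, so the index does not have to change. The four scores are taken from outputs in this section.

```python
import pandas as pd

def get_score_group(score):
    if score <= 4:
        return 'low'
    elif score <= 6:
        return 'middle'
    return 'high'

sample = pd.DataFrame({
    'Country': ['Denmark', 'Azerbaijan', 'Afghanistan', 'Burundi'],
    'Happiness Score': [7.526, 5.291, 3.360, 2.905],
})

# A Series aligned with the rows can serve directly as the grouping key
keys = sample['Happiness Score'].map(get_score_group)
counts = sample.groupby(keys).size()
print(counts)
```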

In [27]:

 

# Method 2: construct a grouping column manually
data['score group'] = data['Happiness Score'].apply(get_score_group)
data.head()

Out[27]:

                            Happiness Rank  Happiness Score  score group
Region         Country
Western Europe Denmark                   1            7.526         high
               Switzerland               2            7.509         high
               Iceland                   3            7.501         high
               Norway                    4            7.498         high
               Finland                   5            7.413         high
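For threshold-based labels like these, pd.cut is a possible alternative to apply with a Python function. The sketch below assumes bin edges chosen to reproduce get_score_group (right-closed intervals, so 4 falls in 'low' and 6 in 'middle'); the boundary scores 4.0 and 6.0 are added only to test the edges and do not come from the dataset.

```python
import numpy as np
import pandas as pd

def get_score_group(score):
    if score <= 4:
        return 'low'
    elif score <= 6:
        return 'middle'
    return 'high'

# 4.0 and 6.0 are hypothetical boundary values; the rest appear above
scores = pd.Series([3.360, 4.0, 5.291, 6.0, 7.526], name='Happiness Score')

# Intervals are (-inf, 4], (4, 6], (6, inf) because right=True is the default
cut_labels = pd.cut(scores, bins=[-np.inf, 4, 6, np.inf],
                    labels=['low', 'middle', 'high'])
applied = scores.apply(get_score_group)

print(cut_labels.astype(str).tolist())
print(applied.tolist())
```

pd.cut returns a categorical column, whereas apply returns plain strings; which is preferable depends on how the labels are used later.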

In [28]:

 

 
data.groupby('Region').max()

Out[28]:

                                 Happiness Rank  Happiness Score  score group
Region
Australia and New Zealand                     9            7.334         high
Central and Eastern Europe                  129            6.596       middle
Eastern Asia                                101            6.379       middle
Latin America and Caribbean                 136            7.087       middle
Middle East and Northern Africa             156            7.267       middle
North America                                13            7.404         high
Southeastern Asia                           140            6.739       middle
Southern Asia                               154            5.196       middle
Sub-Saharan Africa                          157            5.648       middle
Western Europe                               99            7.526       middle
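Note that max() is computed per column, and for the string column 'score group' the comparison is alphabetical, so 'middle' ranks above 'high'. That is why Western Europe shows 'middle' even though its best score is 7.526. A small sketch of the effect, using the highest and lowest Western Europe scores from this section; deriving the label from the aggregated score is one possible way around it, offered here only as an illustration.

```python
import pandas as pd

def get_score_group(score):
    if score <= 4:
        return 'low'
    elif score <= 6:
        return 'middle'
    return 'high'

sample = pd.DataFrame({
    'Region': ['Western Europe', 'Western Europe'],
    'Happiness Score': [7.526, 5.033],
})
sample['score group'] = sample['Happiness Score'].apply(get_score_group)

grouped = sample.groupby('Region')

# String max is alphabetical: 'middle' > 'high'
label_max = grouped['score group'].max()
print(label_max)

# Label derived from the numeric maximum instead
score_based = grouped['Happiness Score'].max().map(get_score_group)
print(score_based)
```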

In [31]:

 

data.groupby('Region').agg(np.max)

Out[31]:

                                 Happiness Rank  Happiness Score  score group
Region
Australia and New Zealand                     9            7.334         high
Central and Eastern Europe                  129            6.596       middle
Eastern Asia                                101            6.379       middle
Latin America and Caribbean                 136            7.087       middle
Middle East and Northern Africa             156            7.267       middle
North America                                13            7.404         high
Southeastern Asia                           140            6.739       middle
Southern Asia                               154            5.196       middle
Sub-Saharan Africa                          157            5.648       middle
Western Europe                               99            7.526       middle

In [33]:

 

 
# Pass a list of several functions
data.groupby('Region')['Happiness Score'].agg([np.max,np.min,np.mean])

Out[33]:

                                  amax   amin      mean
Region
Australia and New Zealand        7.334  7.313  7.323500
Central and Eastern Europe       6.596  4.217  5.370690
Eastern Asia                     6.379  4.907  5.624167
Latin America and Caribbean      7.087  4.028  6.101750
Middle East and Northern Africa  7.267  3.069  5.386053
North America                    7.404  7.104  7.254000
Southeastern Asia                6.739  3.907  5.338889
Southern Asia                    5.196  3.360  4.563286
Sub-Saharan Africa               5.648  2.905  4.136421
Western Europe                   7.526  5.033  6.685667
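The column labels amax and amin above come from the names of the NumPy functions. agg also accepts the aggregation names as strings, which gives plainer labels; newer pandas releases may additionally warn when NumPy functions such as np.max are passed. A minimal sketch on a four-row sample from this section:

```python
import pandas as pd

sample = pd.DataFrame({
    'Region': ['Western Europe', 'Western Europe',
               'Australia and New Zealand', 'Australia and New Zealand'],
    'Happiness Score': [7.526, 7.509, 7.334, 7.313],
})

# String names select the built-in GroupBy aggregations
summary = sample.groupby('Region')['Happiness Score'].agg(['max', 'min', 'mean'])
print(summary)
```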

In [34]:

 

 
# Use a dict to specify a different aggregation for each column
data.groupby('Region').agg({'Happiness Score':np.mean,'Happiness Rank': np.max})

Out[34]:

                                 Happiness Score  Happiness Rank
Region
Australia and New Zealand               7.323500               9
Central and Eastern Europe              5.370690             129
Eastern Asia                            5.624167             101
Latin America and Caribbean             6.101750             136
Middle East and Northern Africa         5.386053             156
North America                           7.254000              13
Southeastern Asia                       5.338889             140
Southern Asia                           4.563286             154
Sub-Saharan Africa                      4.136421             157
Western Europe                          6.685667              99
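With the dict form, the output columns keep the source column names. Named aggregation is a related form that lets each output column be given its own name. The sketch below uses a four-row sample from this section; the output names mean_score and worst_rank are invented for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    'Region': ['Western Europe', 'Western Europe',
               'Australia and New Zealand', 'Australia and New Zealand'],
    'Happiness Rank': [1, 2, 8, 9],
    'Happiness Score': [7.526, 7.509, 7.334, 7.313],
})

# keyword = (source column, aggregation); the keyword becomes the column name
summary = sample.groupby('Region').agg(
    mean_score=('Happiness Score', 'mean'),
    worst_rank=('Happiness Rank', 'max'),
)
print(summary)
```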

In [35]:

 

 
# Pass a custom function
def max_min_diff(x):
    return x.max() - x.min()
data.groupby('Region')['Happiness Rank'].agg(max_min_diff)

Out[35]:

Region
Australia and New Zealand            1
Central and Eastern Europe         102
Eastern Asia                        67
Latin America and Caribbean        122
Middle East and Northern Africa    145
North America                        7
Southeastern Asia                  118
Southern Asia                       70
Sub-Saharan Africa                  91
Western Europe                      98
Name: Happiness Rank, dtype: int64
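A custom function passed to agg receives each group's values as a Series and should return a single value. As a sanity check, the sketch below compares max_min_diff with the same quantity computed from the built-in max and min, on a small sample from this section.

```python
import pandas as pd

def max_min_diff(x):
    # x is the Series of values for one group
    return x.max() - x.min()

sample = pd.DataFrame({
    'Region': ['Western Europe'] * 5 + ['Australia and New Zealand'] * 2,
    'Happiness Rank': [1, 2, 3, 4, 5, 8, 9],
})
grouped = sample.groupby('Region')['Happiness Rank']

spread = grouped.agg(max_min_diff)
print(spread)

# Same result from the built-in aggregations
expected = grouped.max() - grouped.min()
print(spread.equals(expected))
```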
