33【数据的合并和分组聚合】03数据分组聚合

例题

现在我们有一组关于全球星巴克店铺的统计数据,如果我想知道美国的星巴克数量和中国的哪个多,或者我想知道中国每个省份星巴克的数量的情况,那么应该怎么办?

数据来源:

https://www.kaggle.com/starbucks/store-locations/data

在这里插入图片描述

df=df=pd.read_csv("../data/directory.csv",encoding="gbk")  # 读取文件信息
df.info()  # 文件的概要信息
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25600 entries, 0 to 25599
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Brand           25600 non-null  object 品牌 
 1   Store Number    25600 non-null  object 店铺编号
 2   Store Name      25600 non-null  object 店铺名字
 3   Ownership Type  25600 non-null  object 所有权类型
 4   Street Address  25598 non-null  object 街道地址 缺失
 5   City            25585 non-null  object 城市 缺失
 6   State/Province  25600 non-null  object/7   Country         25600 non-null  object 国家
 8   Postcode        24078 non-null  object 邮政编码 缺失
 9   Phone Number    18739 non-null  object 电话号码 缺失
 10  Timezone        25600 non-null  object 时区
 11  Longitude       25599 non-null  float64 经度 缺失
 12  Latitude        25599 non-null  float64 纬度 缺失
dtypes: float64(2), object(11)
memory usage: 2.5+ MB

在这里插入图片描述
在这里插入图片描述

分组和聚合

df.groupby(by=列名)  # 按照指定的列进行分组

在这里插入图片描述
1.可以进行遍历

# 遍历
for i in df.groupby(by="Country"):
    print(i)
    print("*"*100)
('AD',        Brand  Store Number     Store Name Ownership Type     Street Address  \
0  Starbucks  47370-257954  Meritxell, 96       Licensed  Av. Meritxell, 96   

               City State/Province Country Postcode Phone Number  \
0  Andorra la Vella              7      AD    AD500    376818720   

                  Timezone  Longitude  Latitude  
0  GMT+1:00 Europe/Andorra       1.53     42.51  )
****************************************************************************************************
('AE',          Brand  Store Number                      Store Name Ownership Type  \
1    Starbucks  22331-212325                Ajman Drive Thru       Licensed   
2    Starbucks  47089-256771                       Dana Mall       Licensed   
3    Starbucks  22126-218024                      Twofour 54       Licensed   
4    Starbucks  17127-178586                    Al Ain Tower       Licensed   
5    Starbucks  17688-182164        Dalma Mall, Ground Floor       Licensed   
..         ...           ...                             ...            ...   
140  Starbucks   34253-62541                  Bukhatir Tower       Licensed   
141  Starbucks   1359-138434                  Qanat Al Qasba       Licensed   
142  Starbucks   34259-54260                   Sahara Center       Licensed   
143  Starbucks   34217-27108  American University of Sharjah       Licensed   
144  Starbucks  22697-223524                  UAQ Drive Thru       Licensed   

                         Street Address           City State/Province Country  \
1                  1 Street 69, Al Jarf          Ajman             AJ      AE   
2          Sheikh Khalifa Bin Zayed St.          Ajman             AJ      AE   
3                       Al Salam Street      Abu Dhabi             AZ      AE   
4       Khaldiya Area, Abu Dhabi Island      Abu Dhabi             AZ      AE   
5                  Dalma Mall, Mussafah      Abu Dhabi             AZ      AE   
..                                  ...            ...            ...     ...   
140            Sharjah Buheira Corniche        Sharjah             SH      AE   
141                      Al Taawun Road        Sharjah             SH      AE   
142                             Alnahda        Sharjah             SH      AE   
143  Airport Road, Universities Complex        Sharjah             SH      AE   
144                 King Faisal Highway  Umm Al Quwain             UQ      AE   

    Postcode Phone Number              Timezone  Longitude  Latitude  
1        NaN          NaN  GMT+04:00 Asia/Dubai      55.47     25.42  
2        NaN          NaN  GMT+04:00 Asia/Dubai      55.47     25.39  
3        NaN          NaN  GMT+04:00 Asia/Dubai      54.38     24.48  
4        NaN          NaN  GMT+04:00 Asia/Dubai      54.54     24.51  
5        NaN          NaN  GMT+04:00 Asia/Dubai      54.49     24.40  
..       ...          ...                   ...        ...       ...  
140      NaN   06-5560572  GMT+04:00 Asia/Dubai      55.38     25.33  
141      NaN   06-5560318  GMT+04:00 Asia/Dubai      55.38     25.32  
142      NaN   06-5315823  GMT+04:00 Asia/Dubai      55.37     25.30  
143      NaN   06-5585722  GMT+04:00 Asia/Dubai      55.48     25.30  
144      NaN          NaN  GMT+04:00 Asia/Dubai      55.54     25.53  

[144 rows x 13 columns])
****************************************************************************************************
('AR',          Brand  Store Number        Store Name Ownership Type  \
145  Starbucks   1278-139526   Alto Avelleneda       Licensed   
146  Starbucks  16436-159304            Soleil       Licensed   
147  Starbucks  47294-254164           Arevalo       Licensed   
148  Starbucks  25823-198067  Terrazas de Mayo       Licensed   
149  Starbucks  15008-157047        San Isidro       Licensed   
..         ...           ...               ...            ...   
248  Starbucks  47289-254786           Estrada       Licensed   
249  Starbucks  16161-168806            Canada       Licensed   
250  Starbucks  22131-168807   Dinosaurio Mall       Licensed   
251  Starbucks  16290-163049       Nuevocentro       Licensed   
252  Starbucks  16302-163050  Cordoba Shopping       Licensed   

                        Street Address          City State/Province Country  \
145           Avenida Gral. Guemes 897    Avellaneda              B      AR   
146                      Yrigiyen 2647      Boulogne              B      AR   
147               Nicaragua 6045, CABA  Buenos Aires              B      AR   
148     Cruce Ruta 8 y 202, San Miguel  Buenos Aires              B      AR   
149           Belgrano, 263 San Isidro  Buenos Aires              B      AR   
..                                 ...           ...            ...     ...   
248              Estrada 159 , Cordoba       Cordoba              X      AR   
249                      Av. Colon 608       Cordoba              X      AR   
250  Av. Fuerza Aerea Argentina 1700 B       Cordoba              X      AR   
251                 Duarte Quiroz 1400       Cordoba              X      AR   
252              Jose de Goyechea 2851       Cordoba              X      AR   

    Postcode Phone Number                        Timezone  Longitude  Latitude  
145    B1870    4204-3785  GMT-03:00 America/Argentina/Bu     -58.37    -34.68  
146     1609          NaN  GMT-03:00 America/Argentina/Bu     -57.87    -34.88  
147      NaN          NaN  GMT-03:00 America/Argentina/Bu     -58.44    -34.58  
148    C1663          NaN  GMT-03:00 America/Argentina/Bu     -58.70    -34.53  
149    B1642          NaN  GMT-03:00 America/Argentina/Bu     -58.51    -34.47  
..       ...          ...                             ...        ...       ...  
248     5000          NaN  GMT-03:00 America/Argentina/Bu     -64.20    -31.44  
249     5000          NaN  GMT-03:00 America/Argentina/Bu     -64.19    -31.41  
250    C5000          NaN  GMT-03:00 America/Argentina/Bu     -64.24    -31.36  
251     5000          NaN  GMT-03:00 America/Argentina/Bu     -64.20    -31.41  
252     5000          NaN  GMT-03:00 America/Argentina/Bu     -64.20    -31.44  

[108 rows x 13 columns])
****************************************************************************************************
# 当我们对这个DataFrameGroupBy对象进行遍历的时候,返回一个元组
# (A,B)
# A:分组的依据
# B:满足这个分组依据的dataframe
# 遍历返回的元组
for i,j in df.groupby(by="Country"):
    print(i)
    print("="*50)
    print(j)
    print("*"*100)
AD
==================================================
       Brand  Store Number     Store Name Ownership Type     Street Address  \
0  Starbucks  47370-257954  Meritxell, 96       Licensed  Av. Meritxell, 96   

               City State/Province Country Postcode Phone Number  \
0  Andorra la Vella              7      AD    AD500    376818720   

                  Timezone  Longitude  Latitude  
0  GMT+1:00 Europe/Andorra       1.53     42.51  
****************************************************************************************************
AE
==================================================
         Brand  Store Number                      Store Name Ownership Type  \
1    Starbucks  22331-212325                Ajman Drive Thru       Licensed   
2    Starbucks  47089-256771                       Dana Mall       Licensed   
3    Starbucks  22126-218024                      Twofour 54       Licensed   
4    Starbucks  17127-178586                    Al Ain Tower       Licensed   
5    Starbucks  17688-182164        Dalma Mall, Ground Floor       Licensed   
..         ...           ...                             ...            ...   
140  Starbucks   34253-62541                  Bukhatir Tower       Licensed   
141  Starbucks   1359-138434                  Qanat Al Qasba       Licensed   
142  Starbucks   34259-54260                   Sahara Center       Licensed   
143  Starbucks   34217-27108  American University of Sharjah       Licensed   
144  Starbucks  22697-223524                  UAQ Drive Thru       Licensed   

                         Street Address           City State/Province Country  \
1                  1 Street 69, Al Jarf          Ajman             AJ      AE   
2          Sheikh Khalifa Bin Zayed St.          Ajman             AJ      AE   
3                       Al Salam Street      Abu Dhabi             AZ      AE   
4       Khaldiya Area, Abu Dhabi Island      Abu Dhabi             AZ      AE   
5                  Dalma Mall, Mussafah      Abu Dhabi             AZ      AE   
..                                  ...            ...            ...     ...   
140            Sharjah Buheira Corniche        Sharjah             SH      AE   
141                      Al Taawun Road        Sharjah             SH      AE   
142                             Alnahda        Sharjah             SH      AE   
143  Airport Road, Universities Complex        Sharjah             SH      AE   
144                 King Faisal Highway  Umm Al Quwain             UQ      AE   

    Postcode Phone Number              Timezone  Longitude  Latitude  
1        NaN          NaN  GMT+04:00 Asia/Dubai      55.47     25.42  
2        NaN          NaN  GMT+04:00 Asia/Dubai      55.47     25.39  
3        NaN          NaN  GMT+04:00 Asia/Dubai      54.38     24.48  
4        NaN          NaN  GMT+04:00 Asia/Dubai      54.54     24.51  
5        NaN          NaN  GMT+04:00 Asia/Dubai      54.49     24.40  
..       ...          ...                   ...        ...       ...  
140      NaN   06-5560572  GMT+04:00 Asia/Dubai      55.38     25.33  
141      NaN   06-5560318  GMT+04:00 Asia/Dubai      55.38     25.32  
142      NaN   06-5315823  GMT+04:00 Asia/Dubai      55.37     25.30  
143      NaN   06-5585722  GMT+04:00 Asia/Dubai      55.48     25.30  
144      NaN          NaN  GMT+04:00 Asia/Dubai      55.54     25.53  

[144 rows x 13 columns]
****************************************************************************************************
AR
==================================================
         Brand  Store Number        Store Name Ownership Type  \
145  Starbucks   1278-139526   Alto Avelleneda       Licensed   
146  Starbucks  16436-159304            Soleil       Licensed   
147  Starbucks  47294-254164           Arevalo       Licensed   
148  Starbucks  25823-198067  Terrazas de Mayo       Licensed   
149  Starbucks  15008-157047        San Isidro       Licensed   
..         ...           ...               ...            ...   
248  Starbucks  47289-254786           Estrada       Licensed   
249  Starbucks  16161-168806            Canada       Licensed   
250  Starbucks  22131-168807   Dinosaurio Mall       Licensed   
251  Starbucks  16290-163049       Nuevocentro       Licensed   
252  Starbucks  16302-163050  Cordoba Shopping       Licensed   

                        Street Address          City State/Province Country  \
145           Avenida Gral. Guemes 897    Avellaneda              B      AR   
146                      Yrigiyen 2647      Boulogne              B      AR   
147               Nicaragua 6045, CABA  Buenos Aires              B      AR   
148     Cruce Ruta 8 y 202, San Miguel  Buenos Aires              B      AR   
149           Belgrano, 263 San Isidro  Buenos Aires              B      AR   
..                                 ...           ...            ...     ...   
248              Estrada 159 , Cordoba       Cordoba              X      AR   
249                      Av. Colon 608       Cordoba              X      AR   
250  Av. Fuerza Aerea Argentina 1700 B       Cordoba              X      AR   
251                 Duarte Quiroz 1400       Cordoba              X      AR   
252              Jose de Goyechea 2851       Cordoba              X      AR   

    Postcode Phone Number                        Timezone  Longitude  Latitude  
145    B1870    4204-3785  GMT-03:00 America/Argentina/Bu     -58.37    -34.68  
146     1609          NaN  GMT-03:00 America/Argentina/Bu     -57.87    -34.88  
147      NaN          NaN  GMT-03:00 America/Argentina/Bu     -58.44    -34.58  
148    C1663          NaN  GMT-03:00 America/Argentina/Bu     -58.70    -34.53  
149    B1642          NaN  GMT-03:00 America/Argentina/Bu     -58.51    -34.47  
..       ...          ...                             ...        ...       ...  
248     5000          NaN  GMT-03:00 America/Argentina/Bu     -64.20    -31.44  
249     5000          NaN  GMT-03:00 America/Argentina/Bu     -64.19    -31.41  
250    C5000          NaN  GMT-03:00 America/Argentina/Bu     -64.24    -31.36  
251     5000          NaN  GMT-03:00 America/Argentina/Bu     -64.20    -31.41  
252     5000          NaN  GMT-03:00 America/Argentina/Bu     -64.20    -31.44  

[108 rows x 13 columns]
****************************************************************************************************
# 选择美国的数据
# 方法一
for i,j in df.groupby(by="Country"):
    if i=="US":
        print(j)
           Brand  Store Number                        Store Name  \
11964  Starbucks   3513-125945           Safeway-Anchorage #1809   
11965  Starbucks   74352-84449           Safeway-Anchorage #2628   
11966  Starbucks  12449-152385         Safeway - Anchorage #1813   
11967  Starbucks  24936-233524          100th & C St - Anchorage   
11968  Starbucks    8973-85630              Old Seward & Diamond   
...          ...           ...                               ...   
25567  Starbucks   74385-87621             Safeway-Laramie #2466   
25568  Starbucks   73320-24375          Ridley's - Laramie #1131   
25569  Starbucks  22425-219024            Laramie - Grand & 30th   
25570  Starbucks  10849-103163      I-80 & Dewar Dr-Rock Springs   
25571  Starbucks  10769-102454  Coffeen & Brundage Lane-Sheridan   

      Ownership Type                                     Street Address  \
11964       Licensed                               5600 Debarr Rd Ste 9   
11965       Licensed                                     1725 Abbott Rd   
11966       Licensed                                    1501 Huffman Rd   
11967  Company Owned  320 W. 100th Ave, 100, Southgate Shopping Ctr ...   
11968  Company Owned                                 1005 E Dimond Blvd   
...              ...                                                ...   
25567       Licensed                                       554 N 3rd St   
25568       Licensed                                      3112 E. Grand   
25569  Company Owned                                     3021 Grand Ave   
25570  Company Owned                                   118 Westland Way   
25571  Company Owned                                   2208 Coffeen Ave   

               City State/Province Country   Postcode    Phone Number  \
11964     Anchorage             AK      US  995042300    907-339-0900   
11965     Anchorage             AK      US  995073444    907-339-2800   
11966     Anchorage             AK      US  995153596    907-339-1300   
11967     Anchorage             AK      US      99515  (907) 227-9631   
11968     Anchorage             AK      US  995152050    907-344-4160   
...             ...            ...     ...        ...             ...   
25567       Laramie             WY      US  820723012    307-721-5107   
25568       Laramie             WY      US  820705141    307-742-8146   
25569       Laramie             WY      US      82070    307-742-3262   
25570  Rock Springs             WY      US  829015751    307-362-7145   
25571     Sheridian             WY      US  828016213    307-672-5129   

                          Timezone  Longitude  Latitude  
11964  GMT-09:00 America/Anchorage    -149.78     61.21  
11965  GMT-09:00 America/Anchorage    -149.84     61.14  
11966  GMT-09:00 America/Anchorage    -149.85     61.11  
11967  GMT-09:00 America/Anchorage    -149.89     61.13  
11968  GMT-09:00 America/Anchorage    -149.86     61.14  
...                            ...        ...       ...  
25567     GMT-07:00 America/Denver    -105.59     41.32  
25568     GMT-07:00 America/Denver    -105.56     41.31  
25569     GMT-07:00 America/Denver    -105.56     41.31  
25570     GMT-07:00 America/Denver    -109.25     41.58  
25571     GMT-07:00 America/Denver    -106.94     44.77  

[13608 rows x 13 columns]
# 选择美国的数据
# 方法二
df[df["Country"]=="US"].head()

在这里插入图片描述

2.调用聚合方法

# 调用聚合方法
grouped=df.groupby(by="Country")
# 统计个数
grouped.count().head()
# 返回一个dataframe
# 把每一列的数据都统计了
# 如果数据没有缺失的话,同一行的数据应该是一样的
# 但是由于存在数据缺失,所以在有些行中,即使是同一行,数据也不一样

在这里插入图片描述

# 注意:由于存在数据缺失,在有些行中,即使是同一行,数据也不一样
# 所以我们在统计个数的时候,不应该是统计一整个dataframe
# 而是统计这个dataframe中没有缺失值的那一列
# 这样得到的结果才是准确的
grouped["Brand"].count()

在这里插入图片描述

解一

代码

# -*- coding: utf-8 -*-

'''
@Time    : 2020/12/18 12:19
@Author  : yuhui
@Email   : 3476237164@qq.com
@FileName: pandas_5.py
@Software: PyCharm
'''

"""
现在我们有一组关于全球星巴克店铺的统计数据,
如果我想知道美国的星巴克数量和中国的哪个多,
或者我想知道中国每个省份星巴克的数量的情况,那么应该怎么办?
"""

"""33【数据的合并和分组聚合】03数据分组聚合
如果我想知道美国的星巴克数量和中国的哪个多,"""

import pandas as pd

file_path="../data/directory.csv"

df=pd.read_csv(file_path,encoding="gbk")
# print(df.head())
# print(df.info())

# 按照国家进行分组
grouped=df.groupby(by="Country")

# 统计个数
country_count=grouped["Brand"].count()
# print(country_count)

# 选择中国和美国的数据
# 中国CN  美国US
CN_count=country_count["CN"]
US_count=country_count["US"]
print(CN_count,US_count)

运行结果

在这里插入图片描述

解二

代码

# -*- coding: utf-8 -*-

'''
@Time    : 2020/12/21 17:28
@Author  : yuhui
@Email   : 3476237164@qq.com
@FileName: pandas_5_2.py
@Software: PyCharm
'''

"""33【数据的合并和分组聚合】03数据分组聚合
我想知道中国每个省份星巴克的数量的情况"""

import pandas as pd

file_path="../data/directory.csv"

df=pd.read_csv(file_path,encoding="gbk")

# 选择中国的数据
df=df[df["Country"]=="CN"]

# 按照省份进行分组
grouped=df.groupby(by="State/Province")

# 统计数量(取其中某一列(不含缺失值))
grouped_count=grouped["Brand"].count()
print(grouped_count)

运行结果

D:\Python\Installation\python.exe D:/Python/数据分析/数据分析2/code/pandas_5_2.py
State/Province
11    236
12     58
13     24
14      8
15      8
21     57
22     13
23     16
31    551
32    354
33    315
34     26
35     75
36     13
37     75
41     21
42     76
43     35
44    333
45     21
46     16
50     41
51    104
52      9
53     24
61     42
62      3
63      3
64      2
91    162
92     13
Name: Brand, dtype: int64

Process finished with exit code 0

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值