例题
现在我们有一组关于全球星巴克店铺的统计数据,如果我想知道美国的星巴克数量和中国的哪个多,或者我想知道中国每个省份星巴克的数量的情况,那么应该怎么办?
数据来源:
https://www.kaggle.com/starbucks/store-locations/data
df=df=pd.read_csv("../data/directory.csv",encoding="gbk") # 读取文件信息
df.info() # 文件的概要信息
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25600 entries, 0 to 25599
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Brand 25600 non-null object 品牌
1 Store Number 25600 non-null object 店铺编号
2 Store Name 25600 non-null object 店铺名字
3 Ownership Type 25600 non-null object 所有权类型
4 Street Address 25598 non-null object 街道地址 缺失
5 City 25585 non-null object 城市 缺失
6 State/Province 25600 non-null object 州/省
7 Country 25600 non-null object 国家
8 Postcode 24078 non-null object 邮政编码 缺失
9 Phone Number 18739 non-null object 电话号码 缺失
10 Timezone 25600 non-null object 时区
11 Longitude 25599 non-null float64 经度 缺失
12 Latitude 25599 non-null float64 纬度 缺失
dtypes: float64(2), object(11)
memory usage: 2.5+ MB
分组和聚合
df.groupby(by=列名) # 按照指定的列进行分组
1.可以进行遍历
# 遍历
for i in df.groupby(by="Country"):
print(i)
print("*"*100)
('AD', Brand Store Number Store Name Ownership Type Street Address \
0 Starbucks 47370-257954 Meritxell, 96 Licensed Av. Meritxell, 96
City State/Province Country Postcode Phone Number \
0 Andorra la Vella 7 AD AD500 376818720
Timezone Longitude Latitude
0 GMT+1:00 Europe/Andorra 1.53 42.51 )
****************************************************************************************************
('AE', Brand Store Number Store Name Ownership Type \
1 Starbucks 22331-212325 Ajman Drive Thru Licensed
2 Starbucks 47089-256771 Dana Mall Licensed
3 Starbucks 22126-218024 Twofour 54 Licensed
4 Starbucks 17127-178586 Al Ain Tower Licensed
5 Starbucks 17688-182164 Dalma Mall, Ground Floor Licensed
.. ... ... ... ...
140 Starbucks 34253-62541 Bukhatir Tower Licensed
141 Starbucks 1359-138434 Qanat Al Qasba Licensed
142 Starbucks 34259-54260 Sahara Center Licensed
143 Starbucks 34217-27108 American University of Sharjah Licensed
144 Starbucks 22697-223524 UAQ Drive Thru Licensed
Street Address City State/Province Country \
1 1 Street 69, Al Jarf Ajman AJ AE
2 Sheikh Khalifa Bin Zayed St. Ajman AJ AE
3 Al Salam Street Abu Dhabi AZ AE
4 Khaldiya Area, Abu Dhabi Island Abu Dhabi AZ AE
5 Dalma Mall, Mussafah Abu Dhabi AZ AE
.. ... ... ... ...
140 Sharjah Buheira Corniche Sharjah SH AE
141 Al Taawun Road Sharjah SH AE
142 Alnahda Sharjah SH AE
143 Airport Road, Universities Complex Sharjah SH AE
144 King Faisal Highway Umm Al Quwain UQ AE
Postcode Phone Number Timezone Longitude Latitude
1 NaN NaN GMT+04:00 Asia/Dubai 55.47 25.42
2 NaN NaN GMT+04:00 Asia/Dubai 55.47 25.39
3 NaN NaN GMT+04:00 Asia/Dubai 54.38 24.48
4 NaN NaN GMT+04:00 Asia/Dubai 54.54 24.51
5 NaN NaN GMT+04:00 Asia/Dubai 54.49 24.40
.. ... ... ... ... ...
140 NaN 06-5560572 GMT+04:00 Asia/Dubai 55.38 25.33
141 NaN 06-5560318 GMT+04:00 Asia/Dubai 55.38 25.32
142 NaN 06-5315823 GMT+04:00 Asia/Dubai 55.37 25.30
143 NaN 06-5585722 GMT+04:00 Asia/Dubai 55.48 25.30
144 NaN NaN GMT+04:00 Asia/Dubai 55.54 25.53
[144 rows x 13 columns])
****************************************************************************************************
('AR', Brand Store Number Store Name Ownership Type \
145 Starbucks 1278-139526 Alto Avelleneda Licensed
146 Starbucks 16436-159304 Soleil Licensed
147 Starbucks 47294-254164 Arevalo Licensed
148 Starbucks 25823-198067 Terrazas de Mayo Licensed
149 Starbucks 15008-157047 San Isidro Licensed
.. ... ... ... ...
248 Starbucks 47289-254786 Estrada Licensed
249 Starbucks 16161-168806 Canada Licensed
250 Starbucks 22131-168807 Dinosaurio Mall Licensed
251 Starbucks 16290-163049 Nuevocentro Licensed
252 Starbucks 16302-163050 Cordoba Shopping Licensed
Street Address City State/Province Country \
145 Avenida Gral. Guemes 897 Avellaneda B AR
146 Yrigiyen 2647 Boulogne B AR
147 Nicaragua 6045, CABA Buenos Aires B AR
148 Cruce Ruta 8 y 202, San Miguel Buenos Aires B AR
149 Belgrano, 263 San Isidro Buenos Aires B AR
.. ... ... ... ...
248 Estrada 159 , Cordoba Cordoba X AR
249 Av. Colon 608 Cordoba X AR
250 Av. Fuerza Aerea Argentina 1700 B Cordoba X AR
251 Duarte Quiroz 1400 Cordoba X AR
252 Jose de Goyechea 2851 Cordoba X AR
Postcode Phone Number Timezone Longitude Latitude
145 B1870 4204-3785 GMT-03:00 America/Argentina/Bu -58.37 -34.68
146 1609 NaN GMT-03:00 America/Argentina/Bu -57.87 -34.88
147 NaN NaN GMT-03:00 America/Argentina/Bu -58.44 -34.58
148 C1663 NaN GMT-03:00 America/Argentina/Bu -58.70 -34.53
149 B1642 NaN GMT-03:00 America/Argentina/Bu -58.51 -34.47
.. ... ... ... ... ...
248 5000 NaN GMT-03:00 America/Argentina/Bu -64.20 -31.44
249 5000 NaN GMT-03:00 America/Argentina/Bu -64.19 -31.41
250 C5000 NaN GMT-03:00 America/Argentina/Bu -64.24 -31.36
251 5000 NaN GMT-03:00 America/Argentina/Bu -64.20 -31.41
252 5000 NaN GMT-03:00 America/Argentina/Bu -64.20 -31.44
[108 rows x 13 columns])
****************************************************************************************************
# 当我们对这个DataFrameGroupBy对象进行遍历的时候,返回一个元组
# (A,B)
# A:分组的依据
# B:满足这个分组依据的dataframe
# 遍历返回的元组
for i,j in df.groupby(by="Country"):
print(i)
print("="*50)
print(j)
print("*"*100)
AD
==================================================
Brand Store Number Store Name Ownership Type Street Address \
0 Starbucks 47370-257954 Meritxell, 96 Licensed Av. Meritxell, 96
City State/Province Country Postcode Phone Number \
0 Andorra la Vella 7 AD AD500 376818720
Timezone Longitude Latitude
0 GMT+1:00 Europe/Andorra 1.53 42.51
****************************************************************************************************
AE
==================================================
Brand Store Number Store Name Ownership Type \
1 Starbucks 22331-212325 Ajman Drive Thru Licensed
2 Starbucks 47089-256771 Dana Mall Licensed
3 Starbucks 22126-218024 Twofour 54 Licensed
4 Starbucks 17127-178586 Al Ain Tower Licensed
5 Starbucks 17688-182164 Dalma Mall, Ground Floor Licensed
.. ... ... ... ...
140 Starbucks 34253-62541 Bukhatir Tower Licensed
141 Starbucks 1359-138434 Qanat Al Qasba Licensed
142 Starbucks 34259-54260 Sahara Center Licensed
143 Starbucks 34217-27108 American University of Sharjah Licensed
144 Starbucks 22697-223524 UAQ Drive Thru Licensed
Street Address City State/Province Country \
1 1 Street 69, Al Jarf Ajman AJ AE
2 Sheikh Khalifa Bin Zayed St. Ajman AJ AE
3 Al Salam Street Abu Dhabi AZ AE
4 Khaldiya Area, Abu Dhabi Island Abu Dhabi AZ AE
5 Dalma Mall, Mussafah Abu Dhabi AZ AE
.. ... ... ... ...
140 Sharjah Buheira Corniche Sharjah SH AE
141 Al Taawun Road Sharjah SH AE
142 Alnahda Sharjah SH AE
143 Airport Road, Universities Complex Sharjah SH AE
144 King Faisal Highway Umm Al Quwain UQ AE
Postcode Phone Number Timezone Longitude Latitude
1 NaN NaN GMT+04:00 Asia/Dubai 55.47 25.42
2 NaN NaN GMT+04:00 Asia/Dubai 55.47 25.39
3 NaN NaN GMT+04:00 Asia/Dubai 54.38 24.48
4 NaN NaN GMT+04:00 Asia/Dubai 54.54 24.51
5 NaN NaN GMT+04:00 Asia/Dubai 54.49 24.40
.. ... ... ... ... ...
140 NaN 06-5560572 GMT+04:00 Asia/Dubai 55.38 25.33
141 NaN 06-5560318 GMT+04:00 Asia/Dubai 55.38 25.32
142 NaN 06-5315823 GMT+04:00 Asia/Dubai 55.37 25.30
143 NaN 06-5585722 GMT+04:00 Asia/Dubai 55.48 25.30
144 NaN NaN GMT+04:00 Asia/Dubai 55.54 25.53
[144 rows x 13 columns]
****************************************************************************************************
AR
==================================================
Brand Store Number Store Name Ownership Type \
145 Starbucks 1278-139526 Alto Avelleneda Licensed
146 Starbucks 16436-159304 Soleil Licensed
147 Starbucks 47294-254164 Arevalo Licensed
148 Starbucks 25823-198067 Terrazas de Mayo Licensed
149 Starbucks 15008-157047 San Isidro Licensed
.. ... ... ... ...
248 Starbucks 47289-254786 Estrada Licensed
249 Starbucks 16161-168806 Canada Licensed
250 Starbucks 22131-168807 Dinosaurio Mall Licensed
251 Starbucks 16290-163049 Nuevocentro Licensed
252 Starbucks 16302-163050 Cordoba Shopping Licensed
Street Address City State/Province Country \
145 Avenida Gral. Guemes 897 Avellaneda B AR
146 Yrigiyen 2647 Boulogne B AR
147 Nicaragua 6045, CABA Buenos Aires B AR
148 Cruce Ruta 8 y 202, San Miguel Buenos Aires B AR
149 Belgrano, 263 San Isidro Buenos Aires B AR
.. ... ... ... ...
248 Estrada 159 , Cordoba Cordoba X AR
249 Av. Colon 608 Cordoba X AR
250 Av. Fuerza Aerea Argentina 1700 B Cordoba X AR
251 Duarte Quiroz 1400 Cordoba X AR
252 Jose de Goyechea 2851 Cordoba X AR
Postcode Phone Number Timezone Longitude Latitude
145 B1870 4204-3785 GMT-03:00 America/Argentina/Bu -58.37 -34.68
146 1609 NaN GMT-03:00 America/Argentina/Bu -57.87 -34.88
147 NaN NaN GMT-03:00 America/Argentina/Bu -58.44 -34.58
148 C1663 NaN GMT-03:00 America/Argentina/Bu -58.70 -34.53
149 B1642 NaN GMT-03:00 America/Argentina/Bu -58.51 -34.47
.. ... ... ... ... ...
248 5000 NaN GMT-03:00 America/Argentina/Bu -64.20 -31.44
249 5000 NaN GMT-03:00 America/Argentina/Bu -64.19 -31.41
250 C5000 NaN GMT-03:00 America/Argentina/Bu -64.24 -31.36
251 5000 NaN GMT-03:00 America/Argentina/Bu -64.20 -31.41
252 5000 NaN GMT-03:00 America/Argentina/Bu -64.20 -31.44
[108 rows x 13 columns]
****************************************************************************************************
# 选择美国的数据
# 方法一
for i,j in df.groupby(by="Country"):
if i=="US":
print(j)
Brand Store Number Store Name \
11964 Starbucks 3513-125945 Safeway-Anchorage #1809
11965 Starbucks 74352-84449 Safeway-Anchorage #2628
11966 Starbucks 12449-152385 Safeway - Anchorage #1813
11967 Starbucks 24936-233524 100th & C St - Anchorage
11968 Starbucks 8973-85630 Old Seward & Diamond
... ... ... ...
25567 Starbucks 74385-87621 Safeway-Laramie #2466
25568 Starbucks 73320-24375 Ridley's - Laramie #1131
25569 Starbucks 22425-219024 Laramie - Grand & 30th
25570 Starbucks 10849-103163 I-80 & Dewar Dr-Rock Springs
25571 Starbucks 10769-102454 Coffeen & Brundage Lane-Sheridan
Ownership Type Street Address \
11964 Licensed 5600 Debarr Rd Ste 9
11965 Licensed 1725 Abbott Rd
11966 Licensed 1501 Huffman Rd
11967 Company Owned 320 W. 100th Ave, 100, Southgate Shopping Ctr ...
11968 Company Owned 1005 E Dimond Blvd
... ... ...
25567 Licensed 554 N 3rd St
25568 Licensed 3112 E. Grand
25569 Company Owned 3021 Grand Ave
25570 Company Owned 118 Westland Way
25571 Company Owned 2208 Coffeen Ave
City State/Province Country Postcode Phone Number \
11964 Anchorage AK US 995042300 907-339-0900
11965 Anchorage AK US 995073444 907-339-2800
11966 Anchorage AK US 995153596 907-339-1300
11967 Anchorage AK US 99515 (907) 227-9631
11968 Anchorage AK US 995152050 907-344-4160
... ... ... ... ... ...
25567 Laramie WY US 820723012 307-721-5107
25568 Laramie WY US 820705141 307-742-8146
25569 Laramie WY US 82070 307-742-3262
25570 Rock Springs WY US 829015751 307-362-7145
25571 Sheridian WY US 828016213 307-672-5129
Timezone Longitude Latitude
11964 GMT-09:00 America/Anchorage -149.78 61.21
11965 GMT-09:00 America/Anchorage -149.84 61.14
11966 GMT-09:00 America/Anchorage -149.85 61.11
11967 GMT-09:00 America/Anchorage -149.89 61.13
11968 GMT-09:00 America/Anchorage -149.86 61.14
... ... ... ...
25567 GMT-07:00 America/Denver -105.59 41.32
25568 GMT-07:00 America/Denver -105.56 41.31
25569 GMT-07:00 America/Denver -105.56 41.31
25570 GMT-07:00 America/Denver -109.25 41.58
25571 GMT-07:00 America/Denver -106.94 44.77
[13608 rows x 13 columns]
# 选择美国的数据
# 方法二
df[df["Country"]=="US"].head()
2.调用聚合方法
# 调用聚合方法
grouped=df.groupby(by="Country")
# 统计个数
grouped.count().head()
# 返回一个dataframe
# 把每一列的数据都统计了
# 如果数据没有缺失的话,同一行的数据应该是一样的
# 但是由于存在数据缺失,所以在有些行中,即使是同一行,数据也不一样
# 注意:由于存在数据缺失,在有些行中,即使是同一行,数据也不一样
# 所以我们在统计个数的时候,不应该是统计一整个dataframe
# 而是统计这个dataframe中没有缺失值的那一列
# 这样得到的结果才是准确的
grouped["Brand"].count()
解一
代码
# -*- coding: utf-8 -*-
'''
@Time : 2020/12/18 12:19
@Author : yuhui
@Email : 3476237164@qq.com
@FileName: pandas_5.py
@Software: PyCharm
'''
"""
现在我们有一组关于全球星巴克店铺的统计数据,
如果我想知道美国的星巴克数量和中国的哪个多,
或者我想知道中国每个省份星巴克的数量的情况,那么应该怎么办?
"""
"""33【数据的合并和分组聚合】03数据分组聚合
如果我想知道美国的星巴克数量和中国的哪个多,"""
import pandas as pd
file_path="../data/directory.csv"
df=pd.read_csv(file_path,encoding="gbk")
# print(df.head())
# print(df.info())
# 按照国家进行分组
grouped=df.groupby(by="Country")
# 统计个数
country_count=grouped["Brand"].count()
# print(country_count)
# 选择中国和美国的数据
# 中国CN 美国US
CN_count=country_count["CN"]
US_count=country_count["US"]
print(CN_count,US_count)
运行结果
解二
代码
# -*- coding: utf-8 -*-
'''
@Time : 2020/12/21 17:28
@Author : yuhui
@Email : 3476237164@qq.com
@FileName: pandas_5_2.py
@Software: PyCharm
'''
"""33【数据的合并和分组聚合】03数据分组聚合
我想知道中国每个省份星巴克的数量的情况"""
import pandas as pd
file_path="../data/directory.csv"
df=pd.read_csv(file_path,encoding="gbk")
# 选择中国的数据
df=df[df["Country"]=="CN"]
# 按照省份进行分组
grouped=df.groupby(by="State/Province")
# 统计数量(取其中某一列(不含缺失值))
grouped_count=grouped["Brand"].count()
print(grouped_count)
运行结果
D:\Python\Installation\python.exe D:/Python/数据分析/数据分析2/code/pandas_5_2.py
State/Province
11 236
12 58
13 24
14 8
15 8
21 57
22 13
23 16
31 551
32 354
33 315
34 26
35 75
36 13
37 75
41 21
42 76
43 35
44 333
45 21
46 16
50 41
51 104
52 9
53 24
61 42
62 3
63 3
64 2
91 162
92 13
Name: Brand, dtype: int64
Process finished with exit code 0