Python pandas 03: Starbucks store statistics

Latest recommended article published 2023-06-08 20:44:12

小白和雨薇


Article link: https://blog.csdn.net/weixin_46109199/article/details/104198254


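Given a dataset of Starbucks stores worldwide, we compare the number of stores in the United States and China, and then count the stores in each Chinese province.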
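Load the data and group it by country with groupby(). groupby() only splits the data into groups: it returns a DataFrameGroupBy object, and printing that object shows its type and memory address rather than the grouped rows.
Iterating over grouped yields one tuple per country: the first element is the country code and the second is a DataFrame holding all rows for that country.
Note: info() shows that the "Brand" column has no missing values, so after grouping, counting the values in Brand gives the number of stores per country. Counting the Country column works just as well.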
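To illustrate why the note above checks for missing values first, here is a minimal sketch on a made-up three-row table (the `toy` DataFrame and its values are hypothetical, not taken from the CSV). It shows that count() skips missing values, while size() counts rows regardless.

```python
import pandas as pd

# Hypothetical toy data; the real CSV has 13 columns and 25600 rows.
toy = pd.DataFrame({
    "Country": ["US", "US", "CN"],
    "Brand": ["Starbucks", "Starbucks", "Starbucks"],
    "City": ["Seattle", None, "Shanghai"],  # one missing city on purpose
})

by_country = toy.groupby("Country")

# Brand has no missing values, so count() equals the number of rows.
brand_counts = by_country["Brand"].count()
# City has a missing value for one US row, so count() undercounts there.
city_counts = by_country["City"].count()
# size() counts rows per group and ignores missing values entirely.
row_counts = by_country.size()

print(brand_counts["US"], city_counts["US"], row_counts["US"])
```

If a dataset offered no column that is guaranteed to be complete, size() would be the safer choice for counting rows per group.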

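import pandas as pd

file_path = "./starbucks_store_worldwide.csv"
df = pd.read_csv(file_path)
# Group the rows by country code
grouped = df.groupby(by="Country")
# Prints only the DataFrameGroupBy object, not the grouped rows
print(grouped)
# Each item is a (country, DataFrame) tuple
for i in grouped:
    print(i)
    print("*" * 30)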

# Output for one country (excerpt)
[25 rows x 13 columns])
******************************
('ZA',            Brand  Store Number  ... Longitude Latitude
25597  Starbucks  47608-253804  ...     28.04   -26.15
25598  Starbucks  47640-253809  ...     28.28   -25.79
25599  Starbucks  47609-253286  ...     28.11   -26.02
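The tuple structure seen in the output above can also be unpacked directly in the loop. The following is a small sketch on made-up data (the `toy` DataFrame and the `groups` dictionary are hypothetical names used only for illustration), assuming grouping by a single column name as in the article.

```python
import pandas as pd

# Hypothetical toy data standing in for the real CSV.
toy = pd.DataFrame({
    "Brand": ["Starbucks", "Starbucks", "Starbucks", "Starbucks"],
    "Country": ["US", "CN", "US", "CN"],
    "City": ["Seattle", "Shanghai", "Portland", "Beijing"],
})

toy_grouped = toy.groupby(by="Country")

# Unpack each (key, sub-DataFrame) tuple while iterating.
groups = {}
for country, group in toy_grouped:
    groups[country] = group
    print(country, type(group).__name__, len(group))
```

Unpacking into two names avoids indexing the tuple by position and makes it clear which part is the key and which is the data.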

Code

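import pandas as pd

file_path = "./starbucks_store_worldwide.csv"
df = pd.read_csv(file_path)
# info() prints its summary itself and returns None,
# so wrapping it in print() also prints "None" afterwards
print(df.info())
grouped = df.groupby(by="Country")
# Brand has no missing values, so count() gives the number of stores per country
country_count = grouped["Brand"].count()
print(country_count["US"])
print(country_count["CN"])
# Keep only the Chinese stores, then count them per province
china_data = df[df["Country"] == "CN"]
grouped1 = china_data.groupby(by="State/Province")["Brand"].count()
print(grouped1)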

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25600 entries, 0 to 25599
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Brand           25600 non-null  object 
 1   Store Number    25600 non-null  object 
 2   Store Name      25600 non-null  object 
 3   Ownership Type  25600 non-null  object 
 4   Street Address  25598 non-null  object 
 5   City            25585 non-null  object 
 6   State/Province  25600 non-null  object 
 7   Country         25600 non-null  object 
 8   Postcode        24078 non-null  object 
 9   Phone Number    18739 non-null  object 
 10  Timezone        25600 non-null  object 
 11  Longitude       25599 non-null  float64
 12  Latitude        25599 non-null  float64
dtypes: float64(2), object(11)
memory usage: 2.5+ MB
None
13608
2734
State/Province
11    236
12     58
13     24
14      8
15      8
21     57
22     13
23     16
31    551
32    354
33    315
34     26
35     75
36     13
37     75
41     21
42     76
43     35
44    333
45     21
46     16
50     41
51    104
52      9
53     24
61     42
62      3
63      3
64      2
91    162
92     13
Name: Brand, dtype: int64
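As a side note, the same kind of per-country and per-province counts can be obtained without groupby() by using Series.value_counts(), which also sorts the result from most to fewest. This is only a sketch on made-up data (the `toy` DataFrame is hypothetical); it is not the approach used in the article.

```python
import pandas as pd

# Hypothetical toy data; province codes are strings, as in the CSV's object column.
toy = pd.DataFrame({
    "Country": ["US", "CN", "US", "CN", "CN"],
    "State/Province": ["WA", "31", "CA", "31", "11"],
})

# Number of rows per country, sorted in descending order.
country_counts = toy["Country"].value_counts()

# Restrict to China first, then count rows per province code.
is_china = toy["Country"] == "CN"
china_by_province = toy.loc[is_china, "State/Province"].value_counts()

print(country_counts)
print(china_by_province)
```

Like size(), value_counts() counts rows of the selected column, so it should be applied to a column without missing values.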

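# Another way to write it
# Group by several keys at once
grouped1 = df["Brand"].groupby(by=[df["Country"], df["State/Province"]]).count()["CN"]
grouped2 = df["Brand"].groupby(by=df["Country"]).count()["CN"]
print(grouped1)  # stores per Chinese province
print(grouped2)  # total number of stores in China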

State/Province
11    236
12     58
13     24
14      8
15      8
21     57
22     13
23     16
31    551
32    354
33    315
34     26
35     75
36     13
37     75
41     21
42     76
43     35
44    333
45     21
46     16
50     41
51    104
52      9
53     24
61     42
62      3
63      3
64      2
91    162
92     13
Name: Brand, dtype: int64
2734
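Grouping by two keys produces a Series with a two-level index, which is why indexing with ["CN"] above returns the per-province counts. The sketch below, on made-up data (the `toy` DataFrame is hypothetical), shows the same selection spelled out with .loc and with xs().

```python
import pandas as pd

# Hypothetical toy data standing in for the real CSV.
toy = pd.DataFrame({
    "Brand": ["Starbucks"] * 5,
    "Country": ["US", "CN", "US", "CN", "CN"],
    "State/Province": ["WA", "31", "CA", "31", "11"],
})

# Two grouping keys give a Series indexed by (Country, State/Province).
counts = toy.groupby(["Country", "State/Province"])["Brand"].count()

# Select the outer level: the result is indexed by State/Province only.
cn = counts.loc["CN"]

# The same selection, naming the index level explicitly.
cn_xs = counts.xs("CN", level="Country")

print(counts.index.nlevels)
print(cn)
```

Naming the level in xs() can make the intent clearer when the index has more than two levels.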
