The DataFrame groupby method makes it easy to split data into groups.
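As a minimal sketch of what grouping means, the snippet below builds a tiny made-up DataFrame (the rows are illustrative and not taken from the real dataset; only the column names mirror it) and iterates over the groups. Each iteration yields a (group key, sub-DataFrame) pair, and selecting a column before calling count() gives one number per group.

```python
import pandas as pd

# Toy data, made up for illustration; column names mirror the real dataset.
toy = pd.DataFrame({
    'Brand': ['Starbucks'] * 5,
    'Country': ['US', 'CN', 'US', 'AD', 'CN'],
    'City': ['Seattle', 'Shanghai', 'Portland', 'Andorra la Vella', 'Beijing'],
})

# Iterating over a GroupBy object yields (group key, sub-DataFrame) tuples.
group_sizes = {}
for country, rows in toy.groupby(by='Country'):
    group_sizes[country] = len(rows)
    print(country, len(rows))

# Selecting a column and aggregating returns a Series indexed by group key.
country_count = toy.groupby(by='Country')['Brand'].count()
print(country_count['US'], country_count['CN'])
```

The aggregated Series can be indexed by group key, which is how the US and CN totals are compared in the script further down.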
Data format:
Brand Store Number Store Name Ownership Type Street Address \
0 Starbucks 47370-257954 Meritxell, 96 Licensed Av. Meritxell, 96
City State/Province Country Postcode Phone Number \
0 Andorra la Vella 7 AD AD500 376818720
Timezone Longitude Latitude
0 GMT+1:00 Europe/Andorra 1.53 42.51
import pandas as pd
import numpy as np
file_path = '数据'  # placeholder: replace with the path to the CSV file
pd.set_option('display.max_columns', 20)
df = pd.read_csv(file_path)
print(df.head(1))
# print(df.info())
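# groupby returns a GroupBy object; iterating over it yields tuples
# whose first element is the group name and whose second is that group's data
grouped = df.groupby(by='Country')
# print(grouped)
# for i, j in grouped:
#     print(i)
#     print('*'*100)
#     print(j)
#     print('-'*100)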
# Compare the number of Starbucks stores in the US and China
# country_count = grouped['Brand'].count()
# print(country_count['US'])
# print(country_count['CN'])
# Count the Starbucks stores in each province of China
china_data = df[df['Country'] == 'CN']
# Group by 'State/Province' and count the non-null 'Brand' values in each group
grouped = china_data.groupby(by='State/Province')['Brand'].count()
# Print the result, sorted in ascending order
print(grouped.sort_values())
Result (the index holds each province's code):
State/Province
64 2
63 3
62 3
14 8
15 8
52 9
92 13
22 13
36 13
23 16
46 16
45 21
41 21
13 24
53 24
34 26
43 35
50 41
61 42
21 57
12 58
35 75
37 75
42 76
51 104
91 162
11 236
33 315
44 333
32 354
31 551
Name: Brand, dtype: int64
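One detail worth keeping in mind: count() tallies only non-null values in the selected column, so a row with a missing 'Brand' would not be counted. The sketch below, on made-up data (the province codes and the missing value are assumptions for illustration), contrasts it with size(), which counts every row, and with value_counts(), which produces the same per-province tallies in a single call.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration; one 'Brand' value is deliberately missing.
toy = pd.DataFrame({
    'Brand': ['Starbucks', np.nan, 'Starbucks', 'Starbucks'],
    'Country': ['CN', 'CN', 'CN', 'US'],
    'State/Province': ['31', '31', '11', 'WA'],
})

china = toy[toy['Country'] == 'CN']
by_province = china.groupby(by='State/Province')

non_null_counts = by_province['Brand'].count()    # skips NaN in 'Brand'
row_counts = by_province.size()                   # counts every row in the group
tallies = china['State/Province'].value_counts()  # same row tallies, no groupby

print(non_null_counts.sort_values())
print(row_counts.sort_values())
print(tallies)
```

When the counted column has no missing values, all three approaches give the same numbers; size() or value_counts() is the safer choice if the goal is simply the number of rows per group.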