Exercise 3 - Grouping Data
Exploring alcohol consumption data
Step 1 Import the necessary libraries
Run the following code
import pandas as pd
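Step 2 Import the data from the following path
Run the following code
path3 = '…/input/pandas_exercise/pandas_exercise/exercise_data/drinks.csv'  # 'drinks.csv'
Step 3 Load the data into a DataFrame named drinks
Run the following code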
drinks = pd.read_csv(path3)
drinks.head()
       country  beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol  continent
0  Afghanistan              0                0              0                           0.0         AS
1      Albania             89              132             54                           4.9         EU
2      Algeria             25                0             14                           0.7         AF
3      Andorra            245              138            312                          12.4         EU
4       Angola            217               57             45                           5.9         AF
Step 4 Which continent consumes the most beer on average?
Run the following code
drinks.groupby('continent').beer_servings.mean()
continent
AF 61.471698
AS 37.045455
EU 193.777778
OC 89.687500
SA 175.083333
Name: beer_servings, dtype: float64
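The result above lists only five continent codes, with no group for North America. One likely cause, assuming the CSV encodes North America as the string NA, is that pd.read_csv treats "NA" as a missing value by default, and groupby then drops rows whose key is missing. The sketch below reproduces this on a small made-up CSV (the rows and country names are hypothetical, not taken from drinks.csv) and shows keep_default_na=False as one possible workaround.

```python
import io
import pandas as pd

# Hypothetical miniature of drinks.csv; these rows are made up for illustration.
csv_text = """country,beer_servings,continent
Alpha,100,EU
Beta,200,EU
Gamma,50,NA
Delta,150,NA
"""

# Default parsing: the string "NA" is read as a missing value (NaN),
# and groupby silently drops rows whose group key is NaN.
default_df = pd.read_csv(io.StringIO(csv_text))
default_means = default_df.groupby("continent").beer_servings.mean()

# keep_default_na=False keeps "NA" as an ordinary string.
kept_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
kept_means = kept_df.groupby("continent").beer_servings.mean()

print(default_means)  # only the EU group appears
print(kept_means)     # both EU and NA appear
```

Note that keep_default_na=False applies to every column, so genuinely empty cells would no longer be read as NaN; whether that trade-off is acceptable depends on the data.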
Step 5 Print descriptive statistics of wine consumption (wine_servings) for each continent
Run the following code
drinks.groupby('continent').wine_servings.describe()
           count        mean        std  min   25%    50%     75%    max
continent
AF          53.0   16.264151  38.846419  0.0   1.0    2.0   13.00  233.0
AS          44.0    9.068182  21.667034  0.0   0.0    1.0    8.00  123.0
EU          45.0  142.222222  97.421738  0.0  59.0  128.0  195.00  370.0
OC          16.0   35.625000  64.555790  0.0   1.0    8.5   23.25  212.0
SA          12.0   62.416667  88.620189  1.0   3.0   12.0   98.50  221.0
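The describe() result is itself a DataFrame indexed by continent, so individual statistics can be picked out by column label. The following is a minimal sketch that uses only the five rows shown by drinks.head() rather than the full file, so its numbers differ from the table above.

```python
import pandas as pd

# The five rows printed by drinks.head(), re-entered by hand for a
# self-contained example (not the full dataset).
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "wine_servings": [0, 54, 14, 312, 45],
    "continent": ["AS", "EU", "AF", "EU", "AF"],
})

# One row per continent, one column per statistic.
stats = sample.groupby("continent").wine_servings.describe()

# Select a subset of statistics; "50%" is the median.
print(stats[["count", "mean", "50%"]])

# A group with a single row has no sample standard deviation (NaN).
print(stats.loc["AS", "std"])
```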
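Step 6 Print the mean consumption of each type of alcohol for each continent
Run the following code
# numeric_only=True skips the text column country, which pandas 2.x would otherwise reject
drinks.groupby('continent').mean(numeric_only=True)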
           beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol
continent
AF             61.471698        16.339623      16.264151                      3.007547
AS             37.045455        60.840909       9.068182                      2.170455
EU            193.777778       132.555556     142.222222                      8.617778
OC             89.687500        58.437500      35.625000                      3.381250
SA            175.083333       114.750000      62.416667                      6.308333
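In recent pandas releases, averaging a whole grouped frame may raise an error when a text column such as country is present. One way to avoid this, sketched below, is to select the numeric columns explicitly before aggregating. The example uses only the five rows from drinks.head(), so the values are not those of the full table, and the list name servings is introduced here for illustration.

```python
import pandas as pd

# The five rows printed by drinks.head(), re-entered by hand.
head = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "beer_servings": [0, 89, 25, 245, 217],
    "spirit_servings": [0, 132, 0, 138, 57],
    "wine_servings": [0, 54, 14, 312, 45],
    "total_litres_of_pure_alcohol": [0.0, 4.9, 0.7, 12.4, 5.9],
    "continent": ["AS", "EU", "AF", "EU", "AF"],
})

# Hypothetical helper list: the columns we want averaged.
servings = ["beer_servings", "spirit_servings", "wine_servings"]

# Selecting columns first keeps the text column out of the aggregation.
means = head.groupby("continent")[servings].mean()
print(means)
```

Selecting columns also makes the intent explicit, which can be easier to read than relying on numeric_only=True.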
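Step 7 Print the median consumption of each type of alcohol for each continent
Run the following code
# numeric_only=True skips the text column country, which pandas 2.x would otherwise reject
drinks.groupby('continent').median(numeric_only=True)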
           beer_servings  spirit_servings  wine_servings  total_litres_of_pure_alcohol
continent
AF                  32.0              3.0            2.0                          2.30
AS                  17.5             16.0            1.0                          1.20
EU                 219.0            122.0          128.0                         10.00
OC                  52.5             37.0            8.5                          1.75
SA                 162.5            108.5           12.0                          6.85
Step 8 Print the mean, minimum, and maximum spirit consumption for each continent
Run the following code
drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max'])
                 mean  min  max
continent
AF          16.339623    0  152
AS          60.840909    0  326
EU         132.555556    0  373
OC          58.437500    0  254
SA         114.750000   25  302
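As an alternative to passing a list of function names, pandas also supports named aggregation, which lets each output column carry a descriptive label. The sketch below applies it to the five rows from drinks.head() only; the output names spirit_mean, spirit_min, and spirit_max are chosen here for illustration.

```python
import pandas as pd

# The five rows printed by drinks.head(), re-entered by hand.
head = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "spirit_servings": [0, 132, 0, 138, 57],
    "continent": ["AS", "EU", "AF", "EU", "AF"],
})

# Named aggregation: output_name=(input_column, function).
summary = head.groupby("continent").agg(
    spirit_mean=("spirit_servings", "mean"),
    spirit_min=("spirit_servings", "min"),
    spirit_max=("spirit_servings", "max"),
)
print(summary)
```

Labeled columns are helpful when statistics from several input columns end up in the same result.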