Grouping and Aggregation with pandas

BRUIN.

Category: Python data processing and analysis. Tags: python, data analysis, csv, visualization, big data

Article link: https://blog.csdn.net/I_I___LO_VE___YA/article/details/106005257

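Given a dataset of Starbucks stores around the world, we want to find the 10 countries with the most Starbucks stores.

The data is loaded as follows: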

import pandas as pd

# Show all columns when printing a DataFrame
pd.set_option("display.max_columns", None)
# Read the CSV file
df = pd.read_csv("starbucks_store_worldwide.csv")
df.head(5)

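Take a look at the overall shape of the data: there are more than 20,000 rows in total, but some values are missing

# Summarize the table: row count, column types, non-null counts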
df.info()
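
df.info() reports non-null counts, so the missing values have to be read off by subtraction. As a small sketch, the number of missing values per column can also be counted directly. The toy table below is made up for illustration and is not the real Starbucks data; `toy` and `missing_per_column` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Starbucks table; the values are invented for illustration
toy = pd.DataFrame({
    "Brand": ["Starbucks", "Starbucks", "Starbucks"],
    "City": ["Shanghai", np.nan, "Seattle"],
    "Postcode": [np.nan, np.nan, "98101"],
})

# isna() marks each missing cell as True; sum() then counts them per column
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

On the real data the same call would be df.isna().sum().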

Group and aggregate by Country

# Group by country and count the stores in each one
countries_num = df.groupby(by="Country")["Brand"].count()
# Sort in descending order and keep the top 10
countries_num = countries_num.sort_values(ascending=False)[:10]
countries_num
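
One detail worth keeping in mind: count() only counts non-missing values in the selected column, so a store whose Brand is missing would not be counted. The sketch below, on a made-up toy table (not the real data), compares count() with size() and value_counts(), which count rows regardless of missing values. All variable names here are illustrative.

```python
import numpy as np
import pandas as pd

# Toy data: one of the two US rows has a missing Brand
toy = pd.DataFrame({
    "Country": ["US", "US", "CN", "CN", "CN", "JP"],
    "Brand": ["Starbucks", np.nan, "Starbucks", "Starbucks", "Starbucks", "Starbucks"],
})

# count() skips rows where Brand is missing
by_count = toy.groupby("Country")["Brand"].count()
# size() counts every row in each group
by_size = toy.groupby("Country").size()
# value_counts() counts rows per country and sorts in descending order
top = toy["Country"].value_counts().head(2)

print(by_count)
print(by_size)
print(top)
```

If the column being counted has no missing values, the three approaches give the same numbers.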

With the result in hand, we can plot it. As a brand that originated in the United States, Starbucks has the most stores in the US, with China in second place.

from matplotlib import pyplot as plt
import matplotlib

# Use a font that can render Chinese characters
font = {
    'family': 'simhei',
    'weight': 'bold',
    'size': 16
}

matplotlib.rc("font", **font)

x = range(len(countries_num))
x_label = countries_num.index
y = countries_num.values

# Set the figure size
plt.figure(figsize=(12, 8), dpi=60)
plt.bar(x, y, color="g")

plt.xticks(x, x_label, fontsize=18)

# Write each count just above its bar
for i, j in zip(x, y):
    plt.text(i - 0.2, j + 100, j, color="grey")

plt.title("Top 10 countries by number of Starbucks stores", fontsize=20)
plt.show()
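
The plt.text loop above uses hand-tuned offsets (0.2 and 100), which have to be adjusted whenever the scale of the data changes. As an alternative sketch, assuming matplotlib 3.4 or newer, Axes.bar_label can place the labels automatically. The counts below are made up, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib import pyplot as plt
import pandas as pd

# Made-up counts standing in for countries_num
counts = pd.Series([30, 20, 10], index=["AA", "BB", "CC"])

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(counts.index, counts.values, color="g")
# Annotate each bar with its height, 3 points above the top of the bar
labels = ax.bar_label(bars, padding=3, color="grey")
ax.set_title("Toy example")
fig.canvas.draw()  # in an interactive session, call plt.show() instead
plt.close(fig)
```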

Now for the second question: how do we find the number of Starbucks stores in each province of China?

First, select the rows where Country is CN

cn_df = df[df["Country"] == "CN"]
cn_df.head()
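
The boolean-mask selection above is only read from in this post. If the subset were going to be modified later, taking an explicit copy is a common way to make sure the original table is untouched and to avoid pandas' chained-assignment warnings. A minimal sketch on made-up rows, with hypothetical names `toy` and `cn_toy`:

```python
import pandas as pd

# Toy rows; the codes are invented for illustration
toy = pd.DataFrame({
    "Country": ["CN", "US", "CN"],
    "State/Province": ["31", "WA", "11"],
})

# Boolean mask keeps only the CN rows; copy() makes the subset independent
cn_toy = toy[toy["Country"] == "CN"].copy()
# Changing the subset does not affect the original table
cn_toy["State/Province"] = cn_toy["State/Province"].astype(int)

print(cn_toy)
```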

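Group and aggregate by the province code

# Count the stores under each province code, largest first
city_shop_num = cn_df.groupby(by="State/Province")["Brand"].count().sort_values(ascending=False)
# Rebuild as a DataFrame indexed by the integer province code
city_shop_num = pd.DataFrame(city_shop_num.values, index=city_shop_num.index.astype("int"), columns=["num"])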
city_shop_num

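I then looked up online which province each code corresponds to; for two of the codes, though, I could not find a matching province on Baidu

country_codes = """
北京市（京）:11
天津市（津）:12
上海市（沪）:31
重庆市（渝）:50
河北省（冀）:13
河南省（豫）:41
云南省（云）:53
辽宁省（辽）:21
黑龙江省（黑）:23
湖南省（湘）:43
安徽省（皖）:34
山东省（鲁）:37
新疆维吾尔（新）:65
江苏省（苏）:32
浙江省（浙）:33
江西省（赣）:36
湖北省（鄂）:42
广西壮族（桂）:45
甘肃省（甘）:62
山西省（晋）:14
内蒙古（蒙）:15
陕西省（陕）:61
吉林省（吉）:22
福建省（闽）:35
贵州省（贵）:52
广东省（粤）:44
青海省（青）:63
西藏（藏）:54
四川省（川）:51
宁夏回族（宁）:64
海南省（琼）:46
"""
# Split into lines, dropping blank lines and surrounding whitespace
country_codes = [line.strip() for line in country_codes.splitlines() if line.strip()]
# Parse each "name:code" line into a {name: code} dict
country_codes = {c.split(":")[0]: int(c.split(":")[1]) for c in country_codes}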
country_codes

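Convert the code-to-province mapping into a DataFrame

counties = list(country_codes.keys())
codes = list(country_codes.values())
# One row per province name, indexed by its code
country_codes = pd.DataFrame(counties, index=codes)
# Keep only the one-character abbreviation inside the parentheses, e.g. 京
country_codes = pd.DataFrame(country_codes[0].str.slice(-2, -1))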
country_codes
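
The positional slice above relies on every name ending with a closing parenthesis. A slightly more defensive sketch, assuming the same full-width parentheses as in the mapping, pulls the abbreviation out with a regular expression instead. The two names are copied from the mapping; `names` and `abbr` are illustrative variable names.

```python
import pandas as pd

# Two entries copied from the mapping, indexed by province code
names = pd.Series(["北京市（京）", "西藏（藏）"], index=[11, 54])

# Capture the single character between the full-width parentheses
abbr = names.str.extract(r"（(.)）", expand=False)
print(abbr)
```

A name without parentheses would yield NaN here rather than a wrong character.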

Then join the two tables on their index, that is, on the province code

city_shop_num = city_shop_num.join(country_codes)
city_shop_num

Drop the rows that contain NaN, i.e. the codes with no matching province

city_shop_num.dropna(inplace=True)
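
To make the join and dropna steps concrete, here is a small sketch on made-up numbers. DataFrame.join performs a left join on the index by default, so a code that is missing from the lookup table gets NaN and is then removed by dropna. The code 99 and the column name "abbr" are invented for illustration.

```python
import pandas as pd

# Store counts indexed by province code; 99 is an invented code with no match
shop_num = pd.DataFrame({"num": [5, 3, 1]}, index=[31, 11, 99])
# Lookup table: province code -> abbreviation
abbr = pd.DataFrame({"abbr": ["沪", "京"]}, index=[31, 11])

# Left join on the index: every row of shop_num is kept, unmatched codes get NaN
joined = shop_num.join(abbr)
# Remove the rows whose abbreviation is missing
cleaned = joined.dropna()

print(joined)
print(cleaned)
```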

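Plot the result. We can see that most Starbucks stores are located in relatively affluent regions such as Shanghai, Jiangsu, Zhejiang, Beijing, and Guangdong. Heh, without some money you really can't afford it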

x = range(len(city_shop_num))
x_label = city_shop_num[0]
y = city_shop_num["num"]

plt.figure(figsize=(12,8),dpi=90)
plt.bar(x, y, color="g")

plt.xticks(x, x_label)

for i,j in zip(x,y):
    plt.text(i-0.4,j+4,j, fontsize=12)
    

plt.title("Number of Starbucks stores by province in China")
plt.show()
