Exercise (Adding Text to a Plot)

weixin_44457930

Article link: https://blog.csdn.net/weixin_44457930/article/details/114921213

1 Movie data

(1) Given a set of movie data (IMDB-Movie-Data.csv), how should we present the data if we want to see the distribution of rating and runtime?

Code

import pandas as pd
import math
from matplotlib import pyplot as plt

file_path = "./IMDB-Movie-Data.csv"
df = pd.read_csv(file_path)

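rating_data = df["Rating"].values
max_rating = rating_data.max()
min_rating = rating_data.min()

# Compute the number of bins for a target bin width of 0.5
print(max_rating - min_rating)
# Use true division here: with // the result is already floored and ceil has no effect
num_bin = math.ceil((max_rating - min_rating) / 0.5)

plt.figure(figsize=(16, 12), dpi=80)
plt.hist(rating_data, num_bin)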
plt.show()

Output

7.1

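[Figure: histogram of Rating with the computed number of bins]
As the figure shows, because the number of bins was rounded to an integer, the bin edges do not fall on round values and the bins are hard to read. In this case it is best to set the bin edges by hand with a list.
The figure also shows that very few movies are rated below 3, so the first bin can be 0-3, followed by one bin every 0.5.
The highest rating is 9, so the last bin edge is 9.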
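The effect of rounding the bin count can be checked numerically. The sketch below is only an illustration: `rounded_bins` is a hypothetical helper, and the range 1.9 to 9.0 is the minimum and maximum rating printed later in this post. It assumes the bin count is rounded up with `math.ceil`.

```python
import math
import numpy as np

def rounded_bins(lo, hi, target_width):
    """Return (bin count, actual bin width) after rounding the count up."""
    n = math.ceil((hi - lo) / target_width)
    return n, (hi - lo) / n

# Rating range 1.9 to 9.0, target bin width 0.5
n_bins, width = rounded_bins(1.9, 9.0, 0.5)

# Equal-width edges over the data range for that number of bins
edges = np.linspace(1.9, 9.0, n_bins + 1)

print(n_bins)           # number of bins after rounding
print(round(width, 3))  # actual width, no longer exactly 0.5
print(edges[:4])        # first few edges, not multiples of 0.5
```

Because the range is not a whole multiple of 0.5, the actual width drifts away from 0.5, which is why a hand-written list of edges is easier to read.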

import pandas as pd
from matplotlib import pyplot as plt

file_path = "./IMDB-Movie-Data.csv"
df = pd.read_csv(file_path)

rating_data = df["Rating"].values
# Bin edges: one wide bin 0-3, then every 0.5 up to 9
seg_point = [0] + [3 + 0.5*i for i in range(13)]
print(seg_point)
plt.hist(rating_data, seg_point)
x = [0 + i*0.5 for i in range(20)]
plt.xticks(x)    # ticks to display
plt.show()

Output
[Figure: histogram of Rating with hand-set bin edges]
Unequal bin widths can also be used:

import pandas as pd
from matplotlib import pyplot as plt

file_path = "./IMDB-Movie-Data.csv"
df = pd.read_csv(file_path)
rating_data= df["Rating"].values
max_rating = rating_data.max()
min_rating = rating_data.min()
print(min_rating, max_rating)

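# Use unequal bin widths. hist treats each bin as left-closed, right-open,
# e.g. [1.9, 3.5); only the last bin also includes its right edge
num_bin_list = [1.9, 3.5]
i = 3.5
while i <= max_rating:    # with <= the edge 9.5 is also appended, so 9.0 gets its own bin [9.0, 9.5];
                          # with < the list would end at 9.0, the closed right edge of the last bin
    i += 0.5
    num_bin_list.append(i)
print(num_bin_list)

# Set the figure size
plt.figure(figsize=(12, 8), dpi=80)
plt.hist(rating_data, num_bin_list)

# Put the x ticks on the bin edges
plt.xticks(num_bin_list)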

plt.show()

Output

1.9 9.0
[1.9, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5]

[Figure: histogram of Rating with unequal bin widths]
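How the right-hand edge is handled can be checked directly with `numpy.histogram`, which follows the same bin convention as `plt.hist`: every bin is half-open except the last, which is closed on both sides. The ratings below are hypothetical values chosen to sit on or near the edges, not values read from the CSV.

```python
import numpy as np

# Hypothetical ratings chosen to sit on or near the bin edges
ratings = np.array([1.9, 3.4, 3.5, 8.9, 9.0])

# Edges ending at the maximum: 9.0 is counted in the closed last bin [3.5, 9.0]
counts_closed, _ = np.histogram(ratings, bins=[1.9, 3.5, 9.0])

# One extra edge: 9.0 now falls in its own bin [9.0, 9.5]
counts_extra, _ = np.histogram(ratings, bins=[1.9, 3.5, 9.0, 9.5])

print(counts_closed)
print(counts_extra)
```

Either choice keeps the maximum rating in the plot. The extra edge only decides whether 9.0 is shown together with 8.5-9.0 or as a separate bar.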

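2 Starbucks data

(1) Use matplotlib to show the top 10 countries by total number of stores
(2) Use matplotlib to show the top 15 cities in China by number of stores

(1) Draw a bar chart of the top 10 countries by total number of stores

Code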

import pandas as pd
from matplotlib import pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']  # use a font that can render the Chinese labels

file_path = "starbucks_store_worldwide.csv"
data = pd.read_csv(file_path)
# print(data.info())

# Number of stores per country
num_country = data.groupby(by="Country").count()["Brand"]
# print(num_country)
# print(type(num_country))

sort_num = num_country.sort_values(ascending=False)
top_10 = sort_num.head(10)
# print(top_10)
# print(top_10.values)

_xticks = top_10.index
x = range(len(_xticks))

plt.figure(figsize=(12, 8), dpi=80)

plt.xticks(x, _xticks)
plt.xlabel("国家简称")    # "Country code"
plt.ylabel("门店数量")    # "Number of stores"
plt.title("星巴克门店数量前十名的国家")    # "Top 10 countries by number of Starbucks stores"

bars = plt.bar(x, top_10, color="orange")
# plt.bar returns a container with one rectangle per bar

# Label each bar with its value
for bar in bars:    # each iteration yields one bar
    height = bar.get_height()
    # height of the bar
    plt.text(bar.get_x() + bar.get_width() / 2, height, str(height),
             fontsize=15, va="bottom", ha="center", color='blue')
    # plt.text adds a text label; its first two arguments (x, y) give the position
    # bar.get_x() is the left edge of the bar, bar.get_width() is its width
    # height is the y position of the text
    # str(height) is the text to draw, obtained by converting height to a string;
    # it could also be taken from top_10.values, but that needs a loop counter
    # because values is an array
    # fontsize=15, va="bottom", ha="center", color='blue' set the text style
    # ha is the horizontal alignment and va is the vertical alignment
    # ha='center' centres the text horizontally on x
    # va='bottom' puts the bottom of the text at y

plt.show()

Output
[Figure: bar chart of the top 10 countries by number of Starbucks stores, with the count above each bar]
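The labelling loop above can be wrapped in a small reusable function. This is a minimal sketch, not part of the original exercise: `label_bars` is a hypothetical helper name, and the country codes and counts are made-up values rather than data from the Starbucks file. It uses the same `get_x`, `get_width`, and `get_height` calls, with the `Agg` backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib import pyplot as plt

def label_bars(ax, bars, fontsize=12):
    """Write each bar's height just above its top, centred horizontally."""
    texts = []
    for bar in bars:
        x_center = bar.get_x() + bar.get_width() / 2
        height = bar.get_height()
        texts.append(ax.text(x_center, height, f"{float(height):g}",
                             ha="center", va="bottom", fontsize=fontsize))
    return texts

# Hypothetical store counts, not taken from the Starbucks file
codes = ["AA", "BB", "CC"]
counts = [30, 12, 7]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(range(len(codes)), counts, color="orange")
ax.set_xticks(range(len(codes)))
ax.set_xticklabels(codes)
labels = label_bars(ax, bars)
# Call plt.show() or fig.savefig(...) to view the result
```

Returning the `Text` objects makes it easy to restyle or inspect the labels afterwards.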

(2) Use matplotlib to show the top 15 cities in China by number of stores

Code

import pandas as pd
from matplotlib import pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']  # use a font that can render the Chinese labels

data = pd.read_csv("starbucks_store_worldwide.csv")
# print(data.info())

# Count stores per (Country, City), then keep the rows for China
num_country = data.groupby(by=["Country", "City"]).count()
num_city_CN = num_country.loc["CN", "Brand"]
# print(num_city_CN)

top_15 = num_city_CN.sort_values(ascending=False).head(15)
# print(top_15)

plt.figure(figsize=(12, 8), dpi=80)
plt.xticks(range(15), top_15.index)
plt.xlabel("城市")    # "City"
plt.ylabel("数量")    # "Count"
plt.title("店铺数量前十五名中国城市")    # "Top 15 Chinese cities by number of stores"

bars = plt.bar(range(15), top_15.values, color='orange')
# Label each bar with its value
for bar in bars:
    plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
             str(bar.get_height()), fontsize=15, ha="center")

plt.show()

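Output
[Figure: bar chart of the top 15 Chinese cities by number of Starbucks stores, with the count above each bar]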
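The two-key grouping followed by `.loc["CN"]` is easier to follow on a tiny table. The sketch below uses hypothetical rows, and only the column names `Brand`, `Country`, and `City` follow the Starbucks file. It also swaps in `size()` for `count()["Brand"]`: `size()` counts rows per group, while `count()` counts non-null values in each column, so the two agree only when `Brand` has no missing values.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the Starbucks file
stores = pd.DataFrame({
    "Brand": ["Starbucks"] * 8,
    "Country": ["CN", "CN", "CN", "CN", "CN", "CN", "US", "US"],
    "City": ["Shanghai", "Shanghai", "Shanghai", "Beijing", "Beijing",
             "Hangzhou", "Seattle", "Seattle"],
})

# size() counts rows per (Country, City) pair and returns a Series
# with a two-level index
per_city = stores.groupby(["Country", "City"]).size()

# Selecting the outer level "CN" leaves a Series indexed by City
cn_cities = per_city.loc["CN"]
top_2 = cn_cities.sort_values(ascending=False).head(2)
print(top_2)
```

The same pattern with `head(15)` yields the top 15 cities used for the bar chart above.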
