scatter_matrix&df.plot&sns.boxplot

Dataset

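The dataset used in this post, hollywood_movies.csv, contains information on Hollywood movies released from 2007 to 2011. The goal of the visualizations is to better understand the underlying economics of Hollywood and to explore the outlier nature of movie success.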

  • Some of the important columns in the CSV file are listed below

Year: the year the movie was released.
Critic Rating: average rating by the critics.
Audience Rating: average rating by the audience.
Genre: the genre the movie belongs to.
Budget: the movie’s budget, in millions of dollars.
Domestic Gross: domestic (U.S.) revenue, in millions of dollars.
Worldwide Gross: total revenue worldwide, in millions of dollars.
Profitability: ratio of Worldwide Gross to Budget (the code later in this post treats values above 1.0 as profitable).

  • The exclude column contains only null values, so it is removed with the drop function
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

hollywood_movies = pd.read_csv("hollywood_movies.csv")
hollywood_movies = hollywood_movies.drop("exclude", axis=1)
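Before dropping a column it can be worth confirming that it really is empty. The following is a minimal sketch under stated assumptions: the small DataFrame is a hypothetical stand-in for hollywood_movies.csv with made-up values, and the Film column name is only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for hollywood_movies.csv; all values are made up.
movies = pd.DataFrame({
    "Film": ["A", "B", "C"],
    "Profitability": [1.5, 0.8, 3.2],
    "exclude": [np.nan, np.nan, np.nan],
})

# Collect the columns in which every value is missing.
empty_cols = [col for col in movies.columns if movies[col].isnull().all()]
print(empty_cols)

# Drop only those columns; movies.dropna(axis=1, how="all") has the same effect.
cleaned = movies.drop(columns=empty_cols)
print(cleaned.columns.tolist())
```

Checking first avoids silently discarding a column that turns out to hold a few real values.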

Scatter Plots - Profitability And Audience Ratings

  • Examine the relationship between a movie's profitability and its audience rating
fig = plt.figure(figsize=(6,10))
ax1 = fig.add_subplot(2,1,1)
ax2 = fig.add_subplot(2,1,2)
ax1.scatter(hollywood_movies["Profitability"], hollywood_movies["Audience Rating"])
ax1.set_ylabel("Audience Rating")
ax1.set_xlabel("Profitability")
ax1.set_title("Hollywood Movies, 2007 - 2011")
ax2.scatter(hollywood_movies["Audience Rating"], hollywood_movies["Profitability"])
ax2.set_ylabel("Profitability")
ax2.set_xlabel("Audience Rating")
ax2.set_title("Hollywood Movies, 2007 - 2011")
plt.show()
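A scatter plot shows the relationship visually; a correlation coefficient is one possible numeric companion to it. This is a sketch on hypothetical, made-up values rather than the real dataset, using the Pearson correlation that pandas computes by default.

```python
import pandas as pd

# Hypothetical values, made up for illustration; not taken from hollywood_movies.csv.
movies = pd.DataFrame({
    "Profitability": [1.0, 2.0, 3.0, 4.0],
    "Audience Rating": [40, 50, 60, 70],
})

# Pearson correlation between the two columns (the default method in pandas).
r = movies["Profitability"].corr(movies["Audience Rating"])
print(round(r, 3))
```

Pearson correlation is sensitive to extreme values, so on real data it is usually more informative once outliers have been dealt with.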


Scatter Matrix - Profitability And Audience Ratings

Both plots above contain a single outlier, so that movie needs to be filtered out before visualizing the rest of the data.

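  • The scatter_matrix function is somewhat similar to Seaborn's pairplot function: it plots every pairing of the given columns. With only two columns here, it produces four plots in total: a scatter plot for each ordering of the two axes off the diagonal, and a histogram of each column on the diagonal.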
from pandas.plotting import scatter_matrix  # pandas.tools.plotting was removed in newer pandas releases
normal_movies = hollywood_movies[hollywood_movies["Profitability"] < 10000]
scatter_matrix(normal_movies[["Profitability", "Audience Rating"]], figsize=(6,6))
plt.show()
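To see which row is responsible for an outlier before filtering it, one option is to look at the largest values directly. The sketch below uses a hypothetical DataFrame with made-up values (including the Film column and the 50000.0 entry); only the 10000 cutoff is taken from the filter above.

```python
import pandas as pd

# Hypothetical values, made up for illustration; not taken from hollywood_movies.csv.
movies = pd.DataFrame({
    "Film": ["A", "B", "C", "D"],
    "Profitability": [1.5, 0.8, 50000.0, 3.2],
    "Audience Rating": [70, 45, 56, 88],
})

# Rows with the largest Profitability values; an outlier shows up at the top.
top = movies.nlargest(2, "Profitability")
print(top)

# Keep everything below the cutoff (10000 mirrors the filter used above).
normal = movies[movies["Profitability"] < 10000]
print(len(normal))
```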


Box Plot - Audience And Critic Ratings

  • This calls the DataFrame's own plotting method, which is also built on matplotlib.
normal_movies[["Critic Rating", "Audience Rating"]].plot(kind="box")
plt.show()
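A box plot is a drawing of a few summary statistics: the box spans the first to third quartile, the line inside marks the median, and by default matplotlib extends the whiskers up to 1.5 times the interquartile range. The sketch below computes those quartiles on hypothetical, made-up ratings so the numbers behind the picture can be inspected.

```python
import pandas as pd

# Hypothetical ratings, made up for illustration.
ratings = pd.DataFrame({
    "Critic Rating": [20, 40, 60, 80, 100],
    "Audience Rating": [50, 55, 60, 65, 70],
})

# First quartile, median and third quartile of each column.
summary = ratings.quantile([0.25, 0.5, 0.75])
print(summary)

# Interquartile range: the height of the box in the plot.
iqr = summary.loc[0.75] - summary.loc[0.25]
print(iqr)
```

A taller box (larger interquartile range) means the middle half of the ratings is more spread out.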


Box Plot - Critic Vs Audience Ratings Per Year

  • Use Seaborn to draw box plots of the audience and critic ratings for each year
fig = plt.figure(figsize=(8,4))
ax1 = fig.add_subplot(1,2,1)
ax2 = fig.add_subplot(1,2,2)
normal_movies = normal_movies.sort_values("Year")  # DataFrame.sort was removed; sort_values replaces it
sns.boxplot(data=normal_movies[pd.notnull(normal_movies["Genre"])], x="Year", y="Critic Rating", ax=ax1)
sns.boxplot(data=normal_movies[pd.notnull(normal_movies["Genre"])], x="Year", y="Audience Rating", ax=ax2)
plt.show()
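The per-year medians that the box plots display can also be tabulated with groupby. This is a minimal sketch on hypothetical, made-up ratings, not the actual dataset.

```python
import pandas as pd

# Hypothetical values, made up for illustration.
movies = pd.DataFrame({
    "Year": [2007, 2007, 2008, 2008, 2008],
    "Critic Rating": [40, 60, 30, 50, 70],
    "Audience Rating": [55, 65, 45, 60, 75],
})

# Median rating per year: the value drawn as the line inside each box.
medians = movies.groupby("Year")[["Critic Rating", "Audience Rating"]].median()
print(medians)
```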


Box Plots - Profitable Vs Unprofitable Movies

  • Compare profitable and unprofitable movies by creating a new column, Profitable, that indicates whether a movie made a profit
def is_profitable(row):
    if row["Profitability"] <= 1.0:
        return False
    return True
normal_movies["Profitable"] = normal_movies.apply(is_profitable, axis=1)
fig = plt.figure(figsize=(12,6))
ax1 = fig.add_subplot(1,2,1)
ax2 = fig.add_subplot(1,2,2)
sns.boxplot(x="Profitable", y="Audience Rating", data=normal_movies, ax=ax1)
sns.boxplot(x="Profitable", y="Critic Rating", data=normal_movies, ax=ax2)
plt.show()
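The row-wise apply used to build the Profitable column can also be written as a single vectorized comparison, which is usually shorter and faster on large frames. The sketch below checks on hypothetical, made-up values that both approaches give the same result; the 1.0 threshold is the one used above.

```python
import pandas as pd

# Hypothetical values, made up for illustration.
movies = pd.DataFrame({"Profitability": [0.5, 1.0, 1.2, 3.0]})

def is_profitable(row):
    # Same rule as above: values of 1.0 or less count as unprofitable.
    return row["Profitability"] > 1.0

# Row-by-row version, as in the post.
by_apply = movies.apply(is_profitable, axis=1)

# Vectorized version: compare the whole column at once.
by_compare = movies["Profitability"] > 1.0

print(by_apply.tolist() == by_compare.tolist())
```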

