Box Office Data Visualization Report: Box Office Revenue Analysis and Visualization

weixin_26746401

Original article: https://towardsdatascience.com/box-office-revenue-analysis-and-visualization-ce5b81a636d7

Welcome back to my 100 Days of Data Science Challenge journey. On days 4 and 5, I work on the TMDB Box Office Prediction dataset available on Kaggle.

I’ll start by importing the libraries needed for this task.

import pandas as pd

# for visualizations
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter/IPython magic: render plots inline in the notebook
%matplotlib inline
plt.style.use('dark_background')  # dark theme for all plots

Data Loading and Exploration

Once you download the data from Kaggle, you will have three files. Because this is a prediction competition, they are a training file, a test file, and a sample submission file (train.csv, test.csv, and sample_submission.csv). In this project my goal is only data analysis and visualization, so I am going to ignore test.csv and sample_submission.csv.

Let’s load train.csv into a DataFrame using pandas.

%time train = pd.read_csv('./data/tmdb-box-office-prediction/train.csv')

# Output
CPU times: user 258 ms, sys: 132 ms, total: 389 ms
Wall time: 403 ms

About the dataset:

id: Unique integer ID of each movie.
belongs_to_collection: TMDB ID, name, poster, and backdrop URL of the collection the movie belongs to, in JSON format.
budget: Budget of the movie in dollars. Some rows contain 0, which means the budget is unknown.
genres: Names and TMDB IDs of all the movie's genres, in JSON format.
homepage: Official URL of the movie.
imdb_id: IMDb ID of the movie (string).
original_language: Two-letter code of the original language in which the movie was made.
original_title: Original title of the movie, in original_language.
overview: Brief description of the movie.
popularity: Popularity of the movie.
poster_path: Path of the movie poster. Append it to https://image.tmdb.org/t/p/original/ to see the full poster image.
production_companies: Names and TMDB IDs of all production companies of the movie, in JSON format.
production_countries: Two-letter code and full name of each production country, in JSON format.
release_date: Release date of the movie in mm/dd/yy format.
runtime: Total runtime of the movie in minutes (integer).
spoken_languages: Two-letter code and full name of each spoken language, in JSON format.
status: Whether the movie is released or only rumored.
tagline: Tagline of the movie.
title: English title of the movie.
Keywords: TMDB IDs and names of all keywords, in JSON format.
cast: TMDB ID, name, character name, and gender (1 = female, 2 = male) of each cast member, in JSON format.
crew: Name, TMDB ID, and profile path of crew members in various roles such as Director, Writer, Art, and Sound, in JSON format.
revenue: Total revenue earned by the movie in dollars.
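
The JSON columns above are stored as strings, so they need parsing before they can be used. Below is a minimal sketch, not the author's code, assuming the cells use single-quoted, Python-literal syntax such as [{'id': 35, 'name': 'Comedy'}] (which ast.literal_eval reads but json.loads rejects) and that poster_path values start with "/". The sample rows and the parse_names helper are hypothetical.

```python
import ast

import pandas as pd

# Hypothetical rows shaped like two train.csv columns (illustrative values, not real data).
sample = pd.DataFrame({
    "genres": [
        "[{'id': 35, 'name': 'Comedy'}]",
        "[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]",
        None,  # a movie with no genre entry
    ],
    "poster_path": ["/abc123.jpg", "/def456.jpg", "/ghi789.jpg"],
})

# Base URL from the field description, used without its trailing slash
# because poster_path is assumed to start with "/".
POSTER_BASE_URL = "https://image.tmdb.org/t/p/original"


def parse_names(cell):
    """Turn a serialized list of {'id': ..., 'name': ...} dicts into a list of names."""
    if not isinstance(cell, str):  # missing cells arrive as None/NaN
        return []
    # ast.literal_eval accepts single-quoted Python literals; json.loads would not.
    return [item["name"] for item in ast.literal_eval(cell)]


sample["genre_names"] = sample["genres"].apply(parse_names)
sample["poster_url"] = POSTER_BASE_URL + sample["poster_path"]
print(sample[["genre_names", "poster_url"]])
```

The same helper should apply to the other list-of-dicts columns that carry a name key, for example production_companies or Keywords.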

Let’s have a look at the sample data.

train.head()

As we can see, several columns hold dictionaries serialized as strings, and tagline, overview, and homepage are free text or URLs, so I am dropping all of these columns for now.

train = train.drop(['belongs_to_collection', 'genres', 'crew', 'cast', 'Keywords',
                    'spoken_languages', 'production_companies', 'production_countries',
                    'tagline', 'overview', 'homepage'], axis=1)
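
The drop above lists the columns by hand. A possible alternative, sketched here rather than taken from the article, is to detect the serialized columns by inspecting their values. The helper name json_like_columns, the 50-value sample size, and the demo frame are hypothetical choices.

```python
import pandas as pd


def json_like_columns(df: pd.DataFrame, sample_size: int = 50) -> list:
    """Return columns whose non-null values all look like serialized lists or dicts.

    Only the first `sample_size` non-null values of each column are checked.
    """
    found = []
    for col in df.columns:
        values = df[col].dropna().head(sample_size)
        if values.empty:
            continue
        # A column counts as dict-like only if every sampled value is a string
        # starting with "[" or "{" after leading whitespace is removed.
        if all(isinstance(v, str) and v.lstrip()[:1] in ("[", "{") for v in values):
            found.append(col)
    return found


# Hypothetical frame: one serialized column, one free-text column, one numeric column.
demo = pd.DataFrame({
    "genres": ["[{'id': 35, 'name': 'Comedy'}]", None, "[]"],
    "tagline": ["A tagline.", "Another one.", None],
    "budget": [0, 14_000_000, 40_000_000],
})
print(json_like_columns(demo))
```

Free-text columns such as tagline, overview, and homepage would not be detected by this check, so they would still have to be listed explicitly.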

Now it's time to look at some statistics of the data.

print("Shape of data is ")
train.shape

# Output
Shape of data is
(3000, 12)
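
Two quirks from the field descriptions affect these statistics: a budget of 0 means unknown, and release_date uses two-digit years. The sketch below, on hypothetical rows, masks zero budgets as missing and shifts back by a century any date that the %y rule pushes into the future (years 00 to 68 are read as 2000 to 2068). The cutoff year of 2019 is an assumption, not something stated in the article.

```python
import numpy as np
import pandas as pd

# Hypothetical rows illustrating two quirks of train.csv (values are made up).
toy = pd.DataFrame({
    "budget": [0, 14_000_000, 40_000_000],
    "release_date": ["2/20/15", "8/6/68", "10/10/98"],
})

# A budget of 0 means "unknown", so mark it as missing before computing statistics.
toy["budget"] = toy["budget"].replace(0, np.nan)

# With %y, years 00-68 become 2000-2068 and 69-99 become 1969-1999,
# so a film dated 8/6/68 is parsed as 2068.
dates = pd.to_datetime(toy["release_date"], format="%m/%d/%y")

# Hypothetical cutoff: assume no film in the data was released after 2019.
CUTOFF_YEAR = 2019
toy["release_date"] = dates.mask(dates.dt.year > CUTOFF_YEAR,
                                 dates - pd.DateOffset(years=100))

print(toy["budget"].describe())  # count and mean now skip the unknown budget
print(toy["release_date"].dt.year.tolist())
```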

DataFrame information.
