Kaggle: Analyzing the TMDB 5000 Movie Dataset

I. Some Rambling Thoughts

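    My job involved data analysis in Excel, and from there I gradually moved on to Python for analysis and data mining. Then I came across Kaggle and found it a great place to level up your skills. I have been looking for projects that train data analysis skills and thinking. Below I mainly present my analysis approach, visualizations, and code.
    My analysis environment: Windows 7 64-bit, Anaconda Spyder (Python 3.6).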

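    After reading the code and approaches that experienced Kagglers shared for this project, I was eager to try it myself. Finishing it gave me a deeper understanding of some syntax and functions, and refreshed my sense of how to work through an analysis project step by step. When I first met this project I attacked it head-on with all sorts of attempts and guesses; after all, working alone behind closed doors does not always get you anywhere.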

    This article also serves as a review and summary of the project.

    One reminder: registering a Kaggle account works fine, but activating it through the confirmation email required a VPN. (I used a 163 email account.)

II. Project Background

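The data files used in this article, tmdb_5000_movies.csv and tmdb_5000_credits.csv, come from the TMDB (The Movie Database) project on Kaggle. Together they cover 4803 movies, mostly American productions spanning roughly a century (1916 to 2017).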
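Because the movie attributes and the credits live in two separate files, the first step is joining them. Below is a minimal sketch using only the standard library. It assumes the movies file has `id`, `title` and `genres` columns, that the credits file has `movie_id` and `cast` columns, and that `genres` and `cast` are JSON-encoded lists (cast entries carrying `name` and `order`). Check these column names against your own copy of the data. The function name `load_movies` and the two in-memory rows at the end are hypothetical, for illustration only.

```python
import csv
import io
import json


def load_movies(movies_file, credits_file, top_n=3):
    """Join the two TMDB 5000 CSV files into one list of dicts.

    Assumed columns (verify against your copy of the data):
      movies file:  id, title, genres  (genres: JSON list of {"id", "name"})
      credits file: movie_id, cast     (cast: JSON list with "name", "order")
    Both arguments are open text files or io.StringIO objects.
    """
    # Index credits rows by movie id so the join becomes a dict lookup.
    credits = {row["movie_id"]: row for row in csv.DictReader(credits_file)}

    merged = []
    for row in csv.DictReader(movies_file):
        credit = credits.get(row["id"])
        if credit is None:
            continue  # no matching credits row: skip this movie
        cast = json.loads(credit["cast"])
        merged.append({
            "id": int(row["id"]),
            "title": row["title"],
            "genres": [g["name"] for g in json.loads(row["genres"])],
            # "order" is the billing position, so sorting by it puts top-billed actors first
            "top_cast": [c["name"] for c in sorted(cast, key=lambda c: c["order"])[:top_n]],
        })
    return merged


# Tiny in-memory example with made-up rows (hypothetical, not from the dataset).
movies_csv = io.StringIO(
    'id,title,genres\n'
    '1,Film A,"[{""id"": 28, ""name"": ""Action""}]"\n'
)
credits_csv = io.StringIO(
    'movie_id,title,cast\n'
    '1,Film A,"[{""name"": ""Y"", ""order"": 1}, {""name"": ""X"", ""order"": 0}]"\n'
)
example = load_movies(movies_csv, credits_csv)
print(example)
```

With the real files, open both with `open(path, encoding="utf-8", newline="")` and pass the file objects to `load_movies`. An inner join (skipping unmatched ids) is the conservative choice here; a left join would instead keep movies that lack credits.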

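This article analyzes the movie data and uses data visualization to uncover popularity trends and identify promising investment directions, offering some reference points for newcomers to the industry. It is also a way to sharpen my own data analysis skills so that I can apply them to similar projects.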
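One way to turn "popularity trends" and "investment directions" into numbers that can then be charted is to count genres per decade and compare a return ratio (revenue divided by budget) across genres. The sketch below is a hedged illustration, not the author's method. It assumes each movie is a dict with `genres` (a list of names), `release_date` (a `YYYY-MM-DD` string, possibly empty), `budget` and `revenue`, and it treats a budget or revenue of 0 as missing, which is an assumption about the data rather than something the dataset documents. The function names and sample records are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median


def genre_counts_by_decade(movies):
    """Count how many movies of each genre were released per decade.

    Assumes "release_date" is a "YYYY-MM-DD" string (possibly empty)
    and "genres" is a list of genre names.
    """
    counts = defaultdict(Counter)
    for m in movies:
        year = m["release_date"][:4]
        if not year.isdigit():
            continue  # missing or malformed date
        counts[int(year) // 10 * 10].update(m["genres"])
    return {decade: dict(c) for decade, c in sorted(counts.items())}


def median_roi_by_genre(movies):
    """Median revenue/budget ratio for each genre.

    A budget or revenue of 0 is treated as missing and the movie is
    skipped (an assumption, not documented by the dataset).
    """
    ratios = defaultdict(list)
    for m in movies:
        if m["budget"] <= 0 or m["revenue"] <= 0:
            continue
        roi = m["revenue"] / m["budget"]
        for genre in m["genres"]:
            ratios[genre].append(roi)
    return {genre: median(values) for genre, values in ratios.items()}


# Made-up records (hypothetical), only to show the call shape.
sample = [
    {"genres": ["Action", "Drama"], "budget": 100, "revenue": 300, "release_date": "1994-05-01"},
    {"genres": ["Action"], "budget": 50, "revenue": 50, "release_date": "1999-12-31"},
    {"genres": ["Drama"], "budget": 0, "revenue": 10, "release_date": ""},
]
print(genre_counts_by_decade(sample))  # {1990: {'Action': 2, 'Drama': 1}}
print(median_roi_by_genre(sample))     # {'Action': 2.0, 'Drama': 3.0}
```

The median is used instead of the mean because a few blockbusters can dominate an average return ratio; either summary can be plotted per genre or per decade.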

III. Project Overview

The corresponding project is hosted on Kaggle.

The official description reads as follows:

Background
What can we say about the success of a movie before it is released? Are there certain companies (Pixar?) that have found a consistent formula? Given that major films costing over $100 million to produce can still flop, this question is more important than ever to the industry. Film aficionados might have different interests. Can we predict which films will be highly rated, whether or not they are a commercial success?

This is a great place to start digging in to those questions, with data on the plot, cast, crew, budget, and revenues of several thousand films.

We (Kaggle) have removed the original version of this dataset per a DMCA takedown request from IMDB. In order to minimize the impact, we're replacing it with a similar set of films and data fields from The Movie Database (TMDb) in accordance with their terms of use. The bad news is that kernels built on the old dataset will most likely no longer work.

The good news is that:

  • You can port your existing kernels over with a bit of editing. This kernel offers functions and examples for doing so. You can also find a general introduction to the new format here.

  • The new dataset contains full credits for both the cast and the crew, rather than just the first three actors.

  • Actors and actresses are now listed in the order they appear in the credits. It's unclear what ordering the original dataset used; for the movies I spot checked it didn't line up with either the credits order or IMDB's stars order.

  • The revenues appear to be more current. For example, IMDB's figures for Ava