Python Data Analysis Case Study 2: Analyzing a Movie Rating Dataset


Author: JavaNewbie__

Last modified 2022-08-15 20:57:03


Category: Python Big Data Analysis
Tags: python, data analysis, pandas

First published 2020-06-13 22:32:57

Original link: https://blog.csdn.net/baidu_38591365/article/details/106736184


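This article collects study notes from a MOOC course at Nanjing University of Finance and Economics. It shows how to process a movie rating dataset with Python's pandas library. The data are first downloaded from the GroupLens website. The user, data and item files are then read and parsed separately, with special handling for the encoding problem in the item file. Finally the three datasets are joined and analyzed. The analysis covers average ratings by occupation and by gender and a ranking of movies by rating, with movies that have too few ratings filtered out.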

(Summary generated automatically by CSDN.)

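These are my personal study notes for a MOOC course from Nanjing University of Finance and Economics. The course page is https://www.icourse163.org/course/NJUE-1458311167. The course is free, and the instructor teaches it carefully and clearly, so it is well worth taking.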

Obtaining the dataset:
1. From the official GroupLens website
or

2. Direct download link: https://files.grouplens.org/datasets/movielens/ml-100k.zip
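You may prefer to fetch the archive from a script instead of the browser. The following is a minimal sketch that uses only the Python standard library. It assumes the download link above is still live and that the files sit in an `ml-100k/` folder inside the zip. The helper names `pick_files` and `download_ml100k` and the output folder `ml-100k` are illustrative choices, not part of the course material.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

ML100K_URL = "https://files.grouplens.org/datasets/movielens/ml-100k.zip"
# The three files used in this article; inside the archive they sit under ml-100k/
WANTED = ("u.data", "u.item", "u.user")


def pick_files(zip_bytes, wanted=WANTED):
    """Return {file name: raw bytes} for the wanted members of a zip archive."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for member in zf.namelist():
            name = member.rsplit("/", 1)[-1]  # strip the "ml-100k/" folder prefix
            if name in wanted:
                found[name] = zf.read(member)
    return found


def download_ml100k(dest="ml-100k", url=ML100K_URL):
    """Download the archive once and write the wanted files into `dest` (hypothetical helper)."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as resp:
        files = pick_files(resp.read())
    for name, data in files.items():
        # Write raw bytes: u.item is not UTF-8, so no decoding happens here
        (dest / name).write_bytes(data)
    return sorted(dest / name for name in files)
```

Calling `download_ml100k()` should leave `u.data`, `u.item` and `u.user` in a local `ml-100k` folder. The files are saved as raw bytes, so encoding only becomes an issue later, when `u.item` is parsed.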

We use three files from this archive: u.data, u.item and u.user.
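Once extracted, the files can be loaded with pandas. The sketch below is one possible way to do it and is not necessarily the course's own code. It follows the column order given in the README excerpt that follows, with two practical adjustments:

- Although the README calls `u.item` tab separated, its fields are actually separated by `|`.
- Some titles in `u.item` contain non-UTF-8 (Latin-1) characters, so reading it with pandas' default UTF-8 decoding fails. This is presumably the encoding problem the summary mentions.

The snake_case column names and the function names `read_ratings` and `read_items` are assumptions made for this example.

```python
import io
import pandas as pd

# Column names are our own snake_case choice (assumption), in README order
DATA_COLS = ["user_id", "item_id", "rating", "timestamp"]
GENRES = ["unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
          "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
          "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]
ITEM_COLS = ["item_id", "title", "release_date", "video_release_date",
             "imdb_url"] + GENRES  # 5 descriptive fields + 19 genre flags


def read_ratings(src):
    """u.data: tab-separated, no header row."""
    df = pd.read_csv(src, sep="\t", header=None, names=DATA_COLS)
    # Timestamps are Unix seconds since 1970-01-01 UTC
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")
    return df


def read_items(src):
    """u.item: '|'-separated in practice, and Latin-1 rather than UTF-8 encoded."""
    return pd.read_csv(src, sep="|", header=None, names=ITEM_COLS,
                       encoding="latin-1")
```

Typical usage would be `ratings = read_ratings("ml-100k/u.data")` and `items = read_items("ml-100k/u.item")`. Latin-1 maps every possible byte to a character, so decoding cannot fail. `"ISO-8859-1"` is an equivalent spelling of the same encoding.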
Descriptions of the three files (quoted from the dataset's README):

u.data     -- The full u data set, 100000 ratings by 943 users on 1682 items.
              Each user has rated at least 20 movies.  Users and items are
              numbered consecutively from 1.  The data is randomly
              ordered. This is a tab separated list of 
              user id | item id | rating | timestamp.
              The time stamps are unix seconds since 1/1/1970 UTC   


u.item     -- Information about the items (movies); this is a tab separated
              list of
              movie id | movie title | release date | video release date |
              IMDb URL | unknown | Action | Adventure | Animation |
              Children's | Comedy | Crime | Documentary | Drama | Fantasy |
              Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
              Thriller | War | Western |
              The last 19 fields are the genres, a 1 indicates the movie
              is of that genre, a 0 indicates it is not; movies can be in
              several genres at once.
              The movie ids are the ones used in the u.data data