[Machine Learning Open Project] Netflix Rating Dataset

The Netflix Prize data set gives 100 million records of the form “user X rated movie Y a 4.0 on 2/12/05”.
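
To make the record format concrete, here is a minimal Python sketch of one rating as a (user, movie, date, grade) quadruplet. The comma-separated line layout `user,movie,YYYY-MM-DD,grade` and the names `Rating` and `parse_rating` are hypothetical, chosen only for illustration; they are not the layout of the original Netflix Prize files.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rating:
    """One record: `user` rated `movie` with `grade` stars on `day`."""
    user: int
    movie: int
    day: date
    grade: int


def parse_rating(line: str) -> Rating:
    """Parse a hypothetical 'user,movie,YYYY-MM-DD,grade' line into a Rating."""
    user, movie, day, grade = line.strip().split(",")
    g = int(grade)
    if not 1 <= g <= 5:  # grades are whole stars from 1 to 5
        raise ValueError(f"grade out of range: {g}")
    return Rating(int(user), int(movie), date.fromisoformat(day), g)


# "user 42 rated movie 7 a 4 on 2/12/05"
r = parse_rating("42,7,2005-02-12,4")
print(r)
```

A frozen dataclass keeps records hashable, so they can be stored in sets or used as dictionary keys when grouping by user or movie.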

Project ideas:

Can you predict the rating a user will give a movie, based on the movies that user has rated in the past and the ratings similar users have given similar movies?
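
One classic way to approach this question is user-based neighbourhood collaborative filtering: start from the user's average grade and adjust it by how users with similar tastes rated the target movie relative to their own averages. The sketch below is a minimal illustration on made-up data; the user and movie names, and the choice of cosine similarity on mean-centred grades, are assumptions for the example. Comparing every pair of users like this would be far too slow for the full dataset.

```python
import math

# Hypothetical toy data: ratings[user][movie] = grade in 1..5
ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob":   {"m1": 4, "m2": 5, "m3": 2, "m4": 4},
    "carol": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}


def mean_rating(user):
    """Average grade the user has given."""
    r = ratings[user]
    return sum(r.values()) / len(r)


def similarity(u, v):
    """Cosine similarity of mean-centred grades over movies both users rated."""
    common = ratings[u].keys() & ratings[v].keys()
    mu, mv = mean_rating(u), mean_rating(v)
    pairs = [(ratings[u][m] - mu, ratings[v][m] - mv) for m in common]
    num = sum(a * b for a, b in pairs)
    den = math.sqrt(sum(a * a for a, _ in pairs)) * math.sqrt(sum(b * b for _, b in pairs))
    return num / den if den else 0.0


def predict(user, movie):
    """User's mean plus similarity-weighted deviations of neighbours who rated `movie`."""
    num = den = 0.0
    for v in ratings:
        if v == user or movie not in ratings[v]:
            continue
        s = similarity(user, v)
        num += s * (ratings[v][movie] - mean_rating(v))
        den += abs(s)
    pred = mean_rating(user) + (num / den if den else 0.0)
    return min(5.0, max(1.0, pred))  # keep the prediction on the 1-5 star scale


print(round(predict("alice", "m4"), 2))
```

Bob, whose tastes match Alice's, liked m4, while Carol, whose tastes are the opposite, disliked it; both signals push Alice's predicted grade above her average of about 3.33.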

Can you discover clusters of similar movies or users?
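
Clustering is one way to explore this. The sketch below runs plain k-means (Lloyd's algorithm), swapped in here as one possible technique, on a tiny hypothetical movie-by-user matrix. Filling unrated entries with 0 is a crude simplification that real Netflix data would not tolerate well, since most of the matrix is missing; it is used only to keep the example short.

```python
import numpy as np


def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared Euclidean distance from every row to every centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its rows (keep it if the cluster is empty)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Hypothetical movie x user grade matrix; 0 means "not rated"
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
])
labels, _ = kmeans(R, k=2)
print(labels)
```

Movies 0 and 1 (liked by the first two users) land in one cluster and movies 2 and 3 in the other; transposing `R` would cluster users instead of movies.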

Can you predict which users rated which movies in 2006?

In other words, your task is to predict the probability that each (user, movie) pair was rated in 2006.

Note that the actual rating is irrelevant; we only want to know whether the user rated the movie at some point in 2006.

The date in 2006 when the rating was given is also irrelevant.
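
Put concretely, the target for a (user, movie) pair is 1 if the pair appears anywhere in 2006 and 0 otherwise. The sketch below builds those targets from made-up (user, movie, date) records and fits a deliberately naive, hypothetical baseline: an independence model that scores a pair by the user's pre-2006 rating count times the movie's pre-2006 rating count, divided by the total number of pre-2006 ratings and capped at 1. It illustrates the shape of the task, not a recommended model.

```python
from collections import Counter
from datetime import date

# Hypothetical (user, movie, date) records; grades are dropped because the task ignores them
records = [
    (1, 10, date(2005, 3, 1)), (1, 11, date(2006, 5, 2)),
    (2, 10, date(2005, 7, 9)), (2, 12, date(2006, 1, 15)),
    (3, 11, date(2005, 9, 30)),
]


def rated_in(records, year):
    """(user, movie) pairs rated at some point in `year`; the exact date is ignored."""
    return {(u, m) for u, m, d in records if d.year == year}


def fit_baseline(records, cutoff=date(2006, 1, 1)):
    """Return prob(user, movie) built from counts of ratings made before `cutoff`."""
    past = [(u, m) for u, m, d in records if d < cutoff]
    n_user = Counter(u for u, _ in past)
    n_movie = Counter(m for _, m in past)
    total = len(past)

    def prob(user, movie):
        if total == 0:
            return 0.0
        # Expected count under independence, capped so it stays a probability
        return min(1.0, n_user[user] * n_movie[movie] / total)

    return prob


targets = rated_in(records, 2006)
prob = fit_baseline(records)
for pair in [(1, 11), (2, 12), (3, 10)]:
    print(pair, pair in targets, round(prob(*pair), 3))
```

The baseline gives pair (2, 12) zero probability because movie 12 had no ratings before 2006, even though it was rated in 2006; movies new to the catalogue are exactly where a popularity count like this breaks down.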

The test data can be found at the following website:

https://www.netflixprize.com/

This is the dataset used in the famous Netflix Prize, the million-dollar recommendation competition. Since the competition has closed, the data can no longer be downloaded from the Netflix website.

Netflix provided a training data set of 100,480,507 ratings that 480,189 users gave to 17,770 movies. Each training rating is a quadruplet of the form <user, movie, date of grade, grade>. The user and movie fields are integer IDs, while grades are from 1 to 5 (integral) stars.[3] The qualifying data set contains over 2,817,131 triplets of the form <user, movie, date of grade>, with grades known only to the jury.

A participating team's algorithm must predict grades on the entire qualifying set, but the team is only informed of the score for half of the data, the quiz set of 1,408,342 ratings. The other half is the test set of 1,408,789 ratings, and performance on this set is used by the jury to determine potential prize winners. Only the judges know which ratings are in the quiz set and which are in the test set; this arrangement is intended to make it difficult to hill-climb on the test set. Submitted predictions are scored against the true grades in terms of root mean squared error (RMSE), and the goal is to reduce this error as much as possible. Note that while the actual grades are integers in the range 1 to 5, submitted predictions need not be.

Netflix also identified a probe subset of 1,408,395 ratings within the training data set. The probe, quiz, and test data sets were chosen to have similar statistical properties.

In summary, the data used in the Netflix Prize looks as follows:

- Training set (99,072,112 ratings not including the probe set; 100,480,507 including the probe set)
  - Probe set (1,408,395 ratings)
- Qualifying set (2,817,131 ratings), consisting of:
  - Test set (1,408,789 ratings), used to determine winners
  - Quiz set (1,408,342 ratings), used to calculate leaderboard scores

For each movie, the title and year of release are provided in a separate dataset. No information at all is provided about users. In order to protect the privacy of customers, "some of the rating data for some customers in the training and qualifying …
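
Since submissions are scored by RMSE against integer grades, a small helper makes the metric concrete. The function names `rmse` and `clip` are hypothetical and not taken from any official scoring script, and the grades below are made up. Because every true grade lies between 1 and 5, clipping a real-valued prediction into that range can only move it closer to the true grade (or leave it unchanged), so clipping never increases RMSE.

```python
import math


def rmse(predictions, truths):
    """Root mean squared error between predicted and true grades."""
    if len(predictions) != len(truths) or not truths:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truths)) / len(truths))


def clip(pred, lo=1.0, hi=5.0):
    """Keep a real-valued prediction inside the 1-5 star range."""
    return max(lo, min(hi, pred))


truth = [4, 3, 5, 1]          # hypothetical true grades
preds = [3.8, 3.4, 5.6, 1.0]  # real-valued predictions are allowed
print(rmse(preds, truth), rmse([clip(p) for p in preds], truth))
```

Here only the 5.6 prediction falls outside the valid range; clipping it to 5.0 removes its error entirely and lowers the RMSE.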