Movie Recommendation System Based on MovieLens

This summer I was privileged to collaborate with Made With ML on a meaningful data science incubation. I chose the excellent MovieLens dataset and built a movie recommendation system that loosely mimics some of the most successful recommendation products, such as TikTok, YouTube, and Netflix.

This article explains how I worked through the entire life cycle of this project and presents my solutions to some of the technical issues I ran into.

Ideas

At first glance, the dataset contains three tables in total:

  • movies.csv: This table contains all the information about the movies, including title, tagline, description, etc. It has 21 features/columns in total, so we can either focus on a subset of them or try to use all of them.

  • ratings_small.csv: A table that records all users' rating behavior, covering each rating and the timestamp at which it was posted.

  • links.csv: A table that records each movie's unique ID in two movie databases: IMDB and TMDB.

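To make the table descriptions concrete, here is a minimal sketch that reads two of these tables with Python's standard `csv` module and indexes ratings by user, which is the shape collaborative filtering works with. It assumes the MovieLens column names `userId,movieId,rating,timestamp` for ratings_small.csv and `movieId,imdbId,tmdbId` for links.csv; the inline rows and ID values are hypothetical placeholders, not real data from the files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows standing in for ratings_small.csv and links.csv.
RATINGS_CSV = """userId,movieId,rating,timestamp
1,31,2.5,1260759144
1,1029,3.0,1260759179
2,31,4.0,835355493
2,10,4.0,835355681
"""

LINKS_CSV = """movieId,imdbId,tmdbId
10,2222222,202
31,1111111,101
1029,3333333,303
"""


def load_ratings(f):
    """Return {userId: {movieId: rating}} built from a ratings table."""
    by_user = defaultdict(dict)
    for row in csv.DictReader(f):
        by_user[int(row["userId"])][int(row["movieId"])] = float(row["rating"])
    return dict(by_user)


def load_links(f):
    """Return {movieId: (imdbId, tmdbId)}; IDs are kept as strings so a blank
    value does not break parsing (assumption: some IDs may be missing)."""
    return {int(row["movieId"]): (row["imdbId"], row["tmdbId"])
            for row in csv.DictReader(f)}


ratings = load_ratings(io.StringIO(RATINGS_CSV))
links = load_links(io.StringIO(LINKS_CSV))
print(ratings[1])
print(links[31])
```

With the real files, the same functions can be fed an open file handle, e.g. `with open("ratings_small.csv", newline="") as f: ratings = load_ratings(f)`.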
There are two common recommendation filtering techniques: collaborative filtering and content filtering. Collaborative filtering requires the model to learn the connections/similarity between users so that it can generate the best recommendations based on users' previous choices, preferences, or tastes. Content filtering, by contrast, needs profiles of both the users and the items so that the system can make recommendations according to the properties that users and items have in common.
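Both ideas can be sketched in a few lines of plain Python on made-up data. The collaborative part is user-based filtering: predict a user's rating for a movie as the similarity-weighted average of other users' ratings, with plain cosine similarity over co-rated movies (Pearson or mean-centered variants are common alternatives). The content part scores a movie by the Jaccard overlap between its genres and a profile built from genres of movies the user rated highly. The toy data, the genre-based profile, and the 4.0 "liked" threshold are illustrative assumptions, not the setup used in this project.

```python
import math

# Hypothetical toy data: user -> {movie: rating} and movie -> set of genres.
RATINGS = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}
GENRES = {
    "A": {"Action", "Sci-Fi"},
    "B": {"Romance", "Drama"},
    "C": {"Action", "Thriller"},
    "D": {"Sci-Fi", "Thriller"},
}


def cosine(u, v):
    """Cosine similarity between two users over the movies both rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    nu = math.sqrt(sum(u[m] ** 2 for m in common))
    nv = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (nu * nv)


def predict_cf(user, movie, ratings):
    """User-based CF: similarity-weighted average of other users' ratings.
    Returns None if nobody else rated the movie."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        s = cosine(ratings[user], theirs)
        num += s * theirs[movie]
        den += abs(s)
    return num / den if den else None


def user_profile(user_ratings, item_features, threshold=4.0):
    """Content profile: union of features of movies rated >= threshold."""
    profile = set()
    for movie, r in user_ratings.items():
        if r >= threshold:
            profile |= item_features[movie]
    return profile


def content_score(profile, features):
    """Jaccard overlap between a user profile and a movie's features."""
    union = profile | features
    return len(profile & features) / len(union) if union else 0.0


# Alice has not rated D: CF leans toward Bob's 4.0 because Bob is more similar.
print(round(predict_cf("alice", "D", RATINGS), 2))
profile = user_profile(RATINGS["alice"], GENRES)
for movie in ("B", "D"):
    print(movie, round(content_score(profile, GENRES[movie]), 2))
```

The contrast is the one described above: the collaborative score uses only the rating matrix, while the content score uses only item properties and a user profile derived from them.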

Now I am going to try both of them step by step.

Collaborative Filtering

Collaborative filtering only requires keeping track of users' previous behavior, for example how much they liked a movie in the past. Fortunately, this information is already provided: the data in the ratings_small.csv table reflects exactly this. To implement this technique, I used the excellent Python library Surprise, which provides a set of built-in algorithms commonly used in recommender-system development. I chose five of these algorithms and compared their accuracy using RMSE (root mean squared error) as the metric. The results are as follows:
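The five algorithms are not listed here, so this is only a stdlib sketch of the comparison procedure itself: fit each predictor on a training split, then score it on held-out ratings with RMSE = sqrt(mean((r - r̂)²)). The two predictors (a global mean and a damped user/movie bias baseline, similar in spirit to Surprise's `BaselineOnly` but not its implementation) and the toy data are assumptions for illustration.

```python
import math
from collections import defaultdict

# Hypothetical (user, movie, rating) triples: u1 rates high, u2 rates low.
train = [
    ("u1", "m1", 5.0), ("u1", "m2", 5.0), ("u1", "m3", 5.0),
    ("u2", "m1", 1.0), ("u2", "m2", 1.0), ("u2", "m3", 1.0),
]
test = [("u1", "m4", 5.0), ("u2", "m4", 1.0)]


def rmse(pairs):
    """Root mean squared error over (true, predicted) pairs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))


def fit_global_mean(train):
    """Predict the training-set mean rating for every (user, movie)."""
    mu = sum(r for _, _, r in train) / len(train)
    return lambda user, movie: mu


def fit_bias_baseline(train, reg=2.0):
    """Predict mu + b_user + b_movie, with biases shrunk toward 0 by `reg`."""
    mu = sum(r for _, _, r in train) / len(train)
    movie_dev = defaultdict(list)
    for _, m, r in train:
        movie_dev[m].append(r - mu)
    b_movie = {m: sum(d) / (reg + len(d)) for m, d in movie_dev.items()}
    user_dev = defaultdict(list)
    for u, m, r in train:
        user_dev[u].append(r - mu - b_movie[m])
    b_user = {u: sum(d) / (reg + len(d)) for u, d in user_dev.items()}
    # Unseen users or movies fall back to a bias of 0.
    return lambda user, movie: mu + b_user.get(user, 0.0) + b_movie.get(movie, 0.0)


models = {
    "global mean": fit_global_mean(train),
    "bias baseline": fit_bias_baseline(train),
}
results = {
    name: rmse([(r, predict(u, m)) for u, m, r in test])
    for name, predict in models.items()
}
# Print models from best (lowest RMSE) to worst.
for name, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.3f}")
```

With Surprise itself, the equivalent loop would load the ratings via `Dataset.load_from_df(df[["userId", "movieId", "rating"]], Reader(rating_scale=(0.5, 5)))` and call `surprise.model_selection.cross_validate(algo, data, measures=["RMSE"], cv=5)` for each candidate algorithm; the fold count here is an assumption, not necessarily the setting used in this project.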
