Movie Recommendation System Based on MovieLens

weixin_26752765

Original article: https://towardsdatascience.com/movie-recommendation-system-based-on-movielens-ef0df580cd0e

This summer I had the privilege of collaborating with Made With ML on a data science incubation project. I chose the MovieLens dataset and built a movie recommendation system that loosely imitates the recommendation engines behind some of the most successful products, such as TikTok, YouTube, and Netflix.

This article explains how I worked through the entire life cycle of this project and describes my solutions to some of the technical issues I ran into.

Ideas

At first glance, the dataset contains three tables:

movies.csv: The table that contains all the information about the movies, including title, tagline, description, etc. There are 21 features/columns in total, so we can either focus on a subset of them or try to use them all.
ratings_small.csv: A table that records all the users' rating behavior, covering the ratings they gave and the timestamps at which they posted them.
links.csv: A table that records each movie's unique ID in two movie databases: IMDb and TMDB.
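
As a rough illustration of how the ratings and links tables relate, here is a minimal sketch that parses two tiny in-memory CSV snippets and joins them on a shared movie ID. The column names (userId, movieId, rating, timestamp, imdbId, tmdbId) and every row value are assumptions made up for this example; check the headers of the actual files before relying on them.

```python
import csv
import io

# Made-up rows for illustration only; the column names are assumptions
# about the layout of ratings_small.csv and links.csv.
RATINGS_CSV = """userId,movieId,rating,timestamp
1,101,2.5,1000000001
1,102,3.0,1000000002
2,101,4.0,1000000003
"""
LINKS_CSV = """movieId,imdbId,tmdbId
101,0000001,9001
102,0000002,9002
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rating_rows = read_rows(RATINGS_CSV)
links = {row["movieId"]: row for row in read_rows(LINKS_CSV)}

# Accumulate (sum of ratings, number of ratings) per movie.
totals = {}
for row in rating_rows:
    total, count = totals.get(row["movieId"], (0.0, 0))
    totals[row["movieId"]] = (total + float(row["rating"]), count + 1)
mean_rating = {movie: total / count for movie, (total, count) in totals.items()}

# Attach the external TMDB ID from the links table to each rated movie.
for movie_id, mean in mean_rating.items():
    print(movie_id, links[movie_id]["tmdbId"], round(mean, 2))
```

With the real files, the same join would be done by replacing the in-memory strings with opened file handles.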

There are two common recommendation techniques: collaborative filtering and content filtering. Collaborative filtering requires the model to learn the connections or similarities between users, so that it can generate recommendations based on users' previous choices, preferences, or tastes. Content filtering needs profiles of both the users and the items, so that the system can make recommendations according to the properties that users and items have in common.
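
To make the distinction concrete, here is a minimal pure-Python sketch of both ideas on toy data. It is not the implementation used in this project: the users, ratings, and genre tags are hypothetical, the collaborative part uses a simple user-to-user cosine similarity, and the content part uses Jaccard overlap between a movie's genres and the genres of movies the user rated highly.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "alice": {"Heat": 5.0, "Alien": 4.0, "Up": 1.0},
    "bob": {"Heat": 4.0, "Alien": 5.0, "Se7en": 5.0},
    "carol": {"Up": 5.0, "Heat": 1.0, "Coco": 4.0},
}

# Hypothetical item profiles: movie -> set of genre tags.
genres = {
    "Heat": {"crime", "thriller"},
    "Alien": {"horror", "sci-fi"},
    "Up": {"animation", "family"},
    "Se7en": {"crime", "thriller"},
    "Coco": {"animation", "family"},
}


def cosine(u, v):
    """Cosine similarity between two users over the movies both have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = sqrt(sum(u[m] ** 2 for m in shared))
    norm_v = sqrt(sum(v[m] ** 2 for m in shared))
    return dot / (norm_u * norm_v)


def collaborative_scores(user):
    """Predict ratings for unseen movies as a similarity-weighted average."""
    weighted, total = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for movie, r in other_ratings.items():
            if movie not in ratings[user]:
                weighted[movie] = weighted.get(movie, 0.0) + sim * r
                total[movie] = total.get(movie, 0.0) + sim
    return {m: weighted[m] / total[m] for m in weighted}


def content_scores(user, liked_threshold=4.0):
    """Score unseen movies by genre overlap with the user's liked movies."""
    profile = set()
    for movie, r in ratings[user].items():
        if r >= liked_threshold:
            profile |= genres[movie]
    scores = {}
    for movie, tags in genres.items():
        if movie not in ratings[user]:
            union = profile | tags
            scores[movie] = len(profile & tags) / len(union) if union else 0.0
    return scores


def top(scores):
    """Return the movie with the highest score."""
    return max(scores, key=scores.get)


print(top(collaborative_scores("alice")))
print(top(content_scores("alice")))
```

The collaborative function never looks at genres, only at who rated what, while the content function never looks at other users, only at item properties. That is the practical difference between the two approaches.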

Now I am going to try both of them step by step.

Collaborative Filtering

Collaborative filtering only requires me to keep track of users' previous behavior, such as how much they liked a movie in the past. Fortunately, this information is already available, because ratings_small.csv records exactly that. To implement this technique, I used the Python library Surprise, which provides a set of built-in algorithms commonly used in recommender system development. I chose five of them and compared their accuracy using RMSE as the metric. The results are as follows:
