Recommender Systems (2): Building a Latent Factor Model with TensorFlow: MovieLens Movie Ratings

Latest recommended article published on 2023-04-13 20:36:47

InitialHeart2021

Category: 8. [Recommender Systems in Practice] Based on TensorFlow. Tags: tensorflow, python, recommender systems

Article link: https://blog.csdn.net/ITCLSJ/article/details/115728986

I. Theory

1.1 Paper

Paper title: "Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model"

Published: KDD 2008

Author and affiliation: Yehuda Koren (AT&T Labs – Research)

Paper URL: https://dl.acm.org/citation.cfm?id=1401944&preflayout=flat

Translation of the paper: to be added

1.2 Understanding

① Global mean, user bias, movie bias
② For example:
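
As a rough illustration of how the three terms in ① combine, the sketch below computes a baseline estimate (global mean + user bias + movie bias) and then adds a latent factor interaction, which is the general form of a biased latent factor model. All numbers and variable names here are made up for illustration; they are not taken from the MovieLens data or from this article's code.

```python
import numpy as np

# Illustrative values only (assumptions, not from the dataset)
mu = 3.7          # global mean rating over all movies
b_user = -0.3     # user bias: this user rates 0.3 stars below average
b_movie = 0.5     # movie bias: this movie is rated 0.5 stars above average

# Baseline estimate: global mean + user bias + movie bias
baseline = mu + b_user + b_movie

# Hypothetical 3-dimensional latent factors for the user and the movie
p_user = np.array([0.2, -0.1, 0.4])
q_movie = np.array([0.5, 0.3, 0.1])

# Latent factor prediction: baseline plus the dot product of the two factor vectors
prediction = baseline + float(np.dot(q_movie, p_user))

print(round(baseline, 2))
print(round(prediction, 2))
```

The baseline captures effects that do not depend on the specific user-movie pairing, while the dot product models how well this particular user's tastes match this particular movie.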

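II. Data Overview

Nine datasets commonly used in recommender-system research

A recommender-system dataset typically involves the following kinds of entities:
Item: the thing being recommended, such as a product, a movie, a web page, or a snippet of information.
User: the person who rates items and receives the system's recommendations.
Rating: a user's expression of preference for an item. Ratings can be binary (e.g., like or dislike), integer-valued (e.g., 1 to 5 stars), or continuous (any value within some interval). There is also implicit feedback, which only records whether a user interacted with an item.

2.1 Description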

These files contain 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users who joined MovieLens in 2000.

1. Ratings file description: ratings.dat

UserID	MovieID	Rating	Timestamp
1	1193	5	978300760
1	661	3	978302109
…	…	…	…

UserIDs range between 1 and 6040

MovieIDs range between 1 and 3952

Ratings are made on a 5-star scale (whole-star ratings only)

Timestamp is represented in seconds since the epoch as returned by time(2)

Each user has at least 20 ratings
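
For reference, here is a minimal sketch of parsing ratings records. It assumes the raw ratings.dat file separates fields with `::` (as described in the dataset README), and it uses the two sample rows shown above as in-memory strings rather than reading the file. The names `Rating` and `parse_rating` are hypothetical helpers introduced only for this example.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: int
    timestamp: datetime


def parse_rating(line: str) -> Rating:
    """Parse one 'UserID::MovieID::Rating::Timestamp' record."""
    user_id, movie_id, rating, ts = line.strip().split("::")
    return Rating(
        user_id=int(user_id),
        movie_id=int(movie_id),
        rating=int(rating),
        # Timestamps are seconds since the Unix epoch; interpret them as UTC
        timestamp=datetime.fromtimestamp(int(ts), tz=timezone.utc),
    )


# The two sample rows from the table above, in the assumed raw format
sample_lines = ["1::1193::5::978300760", "1::661::3::978302109"]
ratings = [parse_rating(line) for line in sample_lines]

for r in ratings:
    print(r.user_id, r.movie_id, r.rating, r.timestamp.isoformat())
```

Converting the timestamp to a timezone-aware `datetime` is optional; for training a latent factor model only the first three fields are needed.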

2. Users file description: users.dat

UserID	Gender	Age	Occupation	Zip-code
1	F	1	10	48067
2	M	56	16	70072
…	…	…	…	…

Gender is denoted by a “M” for male and “F” for female

Age is chosen from the following ranges:
* 1: “Under 18”
* 18: “18-24”
* 25: “25-34”
* 35: “35-44”
* 45: “45-49”
* 50: “50-55”
* 56: “56+”

Occupation is chosen from the following choices:
* 0: “other” or not specified
* 1: “academic/educator”
* 2: “artist”
* 3: “clerical/admin”
* 4: “college/grad student”
* 5: “customer service”
* 6: “doctor/health care”
* 7: “executive/managerial”
* 8: “farmer”
* 9: “homemaker”
* 10: “K-12 student”
* 11: “lawyer”
* 12: “programmer”
* 13: “retired”
* 14: “sales/marketing”
* 15: “scientist”
* 16: “self-employed”
* 17: “technician/engineer”
* 18: “tradesman/craftsman”
* 19: “unemployed”
* 20: “writer”
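
Because age and occupation are stored as numeric codes, a lookup step is needed to make user records readable. The sketch below decodes the two sample rows shown above; it assumes the raw users.dat file uses `::` as the field separator (per the dataset README), and `describe_user` is a hypothetical helper name.

```python
# Lookup tables copied from the code lists above
AGE_GROUPS = {
    1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
    45: "45-49", 50: "50-55", 56: "56+",
}
OCCUPATIONS = [
    "other or not specified", "academic/educator", "artist", "clerical/admin",
    "college/grad student", "customer service", "doctor/health care",
    "executive/managerial", "farmer", "homemaker", "K-12 student", "lawyer",
    "programmer", "retired", "sales/marketing", "scientist", "self-employed",
    "technician/engineer", "tradesman/craftsman", "unemployed", "writer",
]


def describe_user(line: str) -> dict:
    """Decode one 'UserID::Gender::Age::Occupation::Zip-code' record."""
    user_id, gender, age, occupation, zip_code = line.strip().split("::")
    return {
        "user_id": int(user_id),
        "gender": gender,
        "age_group": AGE_GROUPS[int(age)],
        "occupation": OCCUPATIONS[int(occupation)],
        "zip_code": zip_code,  # kept as a string so leading zeros survive
    }


# The two sample rows from the table above, in the assumed raw format
users = [describe_user(s) for s in ["1::F::1::10::48067", "2::M::56::16::70072"]]
for u in users:
    print(u)
```

Note that the age value is the lower bound of a range used as a code, not the user's actual age, so it is looked up in a dictionary rather than treated as a number.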

3. Movies file description: movies.dat

MovieID	Title	Genres
1	Toy Story (1995)	Animation|Children's|Comedy
2	Jumanji (1995)	Adventure|Children's|Fantasy
…	…	…

Titles are identical to titles provided by the IMDB (including year of release)

Genres are pipe-separated and are selected from the following genres:
* Action
* Adventure
* Animation
* Children’s
* Comedy
* Crime
* Documentary
* Drama
* Fantasy
* Film-Noir
* Horror
* Musical
* Mystery
* Romance
* Sci-Fi
* Thriller
* War
* Western

Some MovieIDs do not correspond to a movie due to accidental duplicate entries and/or test entries

Movies are mostly entered by hand, so errors and inconsistencies may exist
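
A movie record therefore carries two pieces of structure inside its fields: the release year embedded in the title and the pipe-separated genre list. The sketch below shows one way to pull both out. It assumes the raw movies.dat file uses `::` as the field separator (per the dataset README), the sample strings reflect how those rows appear in the raw file, and `parse_movie` is a hypothetical helper name.

```python
import re


def parse_movie(line: str) -> dict:
    """Parse one 'MovieID::Title::Genres' record."""
    movie_id, title, genres = line.strip().split("::")
    # The release year is the 4-digit number in parentheses at the end of the title
    match = re.search(r"\((\d{4})\)\s*$", title)
    return {
        "movie_id": int(movie_id),
        "title": title,
        "year": int(match.group(1)) if match else None,
        "genres": genres.split("|"),  # genres are pipe-separated
    }


sample_lines = [
    "1::Toy Story (1995)::Animation|Children's|Comedy",
    "2::Jumanji (1995)::Adventure|Children's|Fantasy",
]
movies = [parse_movie(line) for line in sample_lines]
for m in movies:
    print(m["movie_id"], m["title"], m["year"], m["genres"])
```

Since entries were made by hand, a defensive parser like this returns `None` for the year instead of failing when a title does not end with a parenthesized year.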

III. Code Implementation

3.1 Data Introduction

About 3,900 movies and 6,040 users

Dataset README: http://files.grouplens.org/datasets/movielens/ml-1m-README.txt

Dataset download: http://files.grouplens.org/datasets/movielens/ml-1m.zip

TensorFlow download: http://www.lfd.uci.edu/~gohlke/pythonlibs/#tensorflow

# Imports for data io operations
from collections import deque
from six import next
# Import the local helper module reader.py
import reader

# Main imports for training
import tensorflow as tf
import numpy as np

# Evaluate train times per epoch
import time

# Constant seed for replicating training results
np.random.seed(42)

# Number of users in the dataset (UserIDs range from 1 to 6040)
u_num = 6040

# Size of the MovieID space (MovieIDs range from 1 to 3952; only about 3,900 of them are real movies)
i_num = 3952

# Number of samples per batch
batch_size = 1000

# Number of latent factors (embedding dimensions) per user and movie; 5 here, 15 is an alternative setting
dims = 5

# Number of times the network sees all the training data
max_epochs = 50

# Device used for all computations
place_device = '/cpu:0'
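
To make the role of these constants concrete, here is a small NumPy sketch (not the TensorFlow graph used in this article) of how they could size the parameters of a biased latent factor model and how a mini-batch of predictions might be computed. The initialization scale, the placeholder global mean, the random batch, and the assumption that IDs have already been shifted to be zero-based are all illustrative choices rather than details from the source.

```python
import numpy as np

u_num, i_num = 6040, 3952   # sizes of the user and movie ID spaces
batch_size, dims = 1000, 5  # samples per batch and number of latent factors

rng = np.random.default_rng(42)

# Model parameters: one bias and one dims-long factor vector per user and per movie
global_mean = 3.5  # placeholder; normally the mean of the training ratings
user_bias = np.zeros(u_num)
item_bias = np.zeros(i_num)
user_factors = rng.normal(scale=0.02, size=(u_num, dims))
item_factors = rng.normal(scale=0.02, size=(i_num, dims))

# A random batch of zero-based (user, movie) index pairs, for illustration only
user_idx = rng.integers(0, u_num, size=batch_size)
item_idx = rng.integers(0, i_num, size=batch_size)

# Prediction for the whole batch: mu + b_u + b_i + <p_u, q_i>
interaction = np.sum(user_factors[user_idx] * item_factors[item_idx], axis=1)
pred = global_mean + user_bias[user_idx] + item_bias[item_idx] + interaction

print(user_factors.shape, item_factors.shape, pred.shape)
```

Indexing the factor tables with an array of IDs plays the same role as an embedding lookup: each row of the batch picks out one user vector and one movie vector, and their element-wise product summed over `dims` gives the interaction term.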