Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.
I. Theory
1.1 Paper
- Paper title: "Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model"
- Published: KDD 2008
- Author and affiliation: Yehuda Koren (AT&T Labs – Research)
- Paper URL: https://dl.acm.org/citation.cfm?id=1401944&preflayout=flat
- Translation of the paper: to be prepared
1.2 Key ideas
① Global mean, user bias, movie bias
② For example:
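The three terms above combine into a baseline estimate of the form b_ui = μ + b_u + b_i: the global mean rating, plus how far this user tends to rate above or below it, plus how far this movie tends to be rated above or below it. Below is a minimal sketch of that arithmetic. The function name and all numbers are illustrative assumptions, not values computed from the dataset.

```python
def baseline_estimate(mu, user_bias, item_bias):
    """Baseline rating estimate: global mean plus user and item offsets."""
    return mu + user_bias + item_bias


# Illustrative numbers only (assumptions, not measured from MovieLens):
mu = 3.7          # average rating over all movies and users
b_movie = 0.5     # this movie is rated 0.5 stars above average
b_user = -0.3     # this user rates 0.3 stars below average

estimate = baseline_estimate(mu, b_user, b_movie)
print(round(estimate, 2))
```

With these numbers the estimate comes out at about 3.9 stars. A factorization model then only has to explain the part of each rating that this baseline leaves over.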
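II. Data Overview
- Nine datasets commonly used in recommender-system research
- Key categories in recommender-system data:
Item: the thing being recommended, such as a product, a movie, a web page, or a snippet of information
User: the person who rates items and receives recommendations from the system
Rating: a user's expression of preference for an item. A rating can be binary (for example like or dislike), an integer (for example 1 to 5 stars), or continuous (any value within some interval). There is also implicit feedback, which records only whether a user interacted with an item.
2.1 Description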
These files contain 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users who joined MovieLens in 2000.
1. Ratings file description: ratings.dat
UserID | MovieID | Rating | Timestamp |
---|---|---|---|
1 | 1193 | 5 | 978300760 |
1 | 661 | 3 | 978302109 |
… | … | … | … |
- UserIDs range between 1 and 6040
- MovieIDs range between 1 and 3952
- Ratings are made on a 5-star scale (whole-star ratings only)
- Timestamp is represented in seconds since the epoch as returned by time(2)
- Each user has at least 20 ratings
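As a minimal sketch of how one record of this file could be read, the snippet below parses two sample rows and checks them against the constraints listed above. It assumes the raw file separates fields with `::` (the tables in this post show the fields, not the raw delimiter); `Rating` and `parse_rating` are hypothetical helper names, not part of the article's `reader.py`.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: int
    timestamp: int


def parse_rating(line, sep="::"):
    """Parse one ratings.dat record (separator is an assumption)."""
    user_id, movie_id, rating, timestamp = line.strip().split(sep)
    return Rating(int(user_id), int(movie_id), int(rating), int(timestamp))


# The two sample rows from the table above, written in the assumed raw format.
sample_lines = ["1::1193::5::978300760", "1::661::3::978302109"]
ratings = [parse_rating(line) for line in sample_lines]

for r in ratings:
    # Range checks taken from the dataset description.
    assert 1 <= r.user_id <= 6040
    assert 1 <= r.movie_id <= 3952
    assert 1 <= r.rating <= 5

# Timestamps are seconds since the epoch, so they convert directly.
first_rated_at = datetime.fromtimestamp(ratings[0].timestamp, tz=timezone.utc)
print(ratings[0], first_rated_at.isoformat())
```

Keeping the IDs as integers makes them usable as embedding indices later, although the 1-based IDs may need shifting if the model expects 0-based rows.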
2. Users file description: users.dat
UserID | Gender | Age | Occupation | Zip-code |
---|---|---|---|---|
1 | F | 1 | 10 | 48067 |
2 | M | 56 | 16 | 70072 |
… | … | … | … | … |
- Gender is denoted by a “M” for male and “F” for female
- Age is chosen from the following ranges:
* 1: “Under 18”
* 18: “18-24”
* 25: “25-34”
* 35: “35-44”
* 45: “45-49”
* 50: “50-55”
* 56: “56+”
- Occupation is chosen from the following choices:
* 0: “other” or not specified
* 1: “academic/educator”
* 2: “artist”
* 3: “clerical/admin”
* 4: “college/grad student”
* 5: “customer service”
* 6: “doctor/health care”
* 7: “executive/managerial”
* 8: “farmer”
* 9: “homemaker”
* 10: “K-12 student”
* 11: “lawyer”
* 12: “programmer”
* 13: “retired”
* 14: “sales/marketing”
* 15: “scientist”
* 16: “self-employed”
* 17: “technician/engineer”
* 18: “tradesman/craftsman”
* 19: “unemployed”
* 20: “writer”
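The age and occupation columns hold codes rather than labels, so a lookup table is handy when inspecting users. The sketch below decodes the two sample rows from the table above. It assumes the raw file uses `::` as the field separator; `parse_user` and the dictionary layout are hypothetical, while the code-to-label mappings are copied from the lists above.

```python
AGE_GROUPS = {
    1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
    45: "45-49", 50: "50-55", 56: "56+",
}

# Index in this list is the occupation code (0 to 20).
OCCUPATIONS = [
    "other or not specified", "academic/educator", "artist", "clerical/admin",
    "college/grad student", "customer service", "doctor/health care",
    "executive/managerial", "farmer", "homemaker", "K-12 student", "lawyer",
    "programmer", "retired", "sales/marketing", "scientist", "self-employed",
    "technician/engineer", "tradesman/craftsman", "unemployed", "writer",
]


def parse_user(line, sep="::"):
    """Parse one users.dat record and decode its coded fields."""
    user_id, gender, age, occupation, zip_code = line.strip().split(sep)
    return {
        "user_id": int(user_id),
        "gender": gender,
        "age_group": AGE_GROUPS[int(age)],
        "occupation": OCCUPATIONS[int(occupation)],
        "zip_code": zip_code,  # kept as text so leading zeros survive
    }


users = [parse_user(s) for s in ["1::F::1::10::48067", "2::M::56::16::70072"]]
for user in users:
    print(user)
```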
3. Movies file description: movies.dat
MovieID | Title | Genres |
---|---|---|
1 | Toy Story (1995) | Animation |
2 | Jumanji (1995) | Adventure |
… | … | … |
- Titles are identical to titles provided by the IMDB (including year of release)
- Genres are pipe-separated and are selected from the following genres:
* Action
* Adventure
* Animation
* Children’s
* Comedy
* Crime
* Documentary
* Drama
* Fantasy
* Film-Noir
* Horror
* Musical
* Mystery
* Romance
* Sci-Fi
* Thriller
* War
* Western
- Some MovieIDs do not correspond to a movie due to accidental duplicate entries and/or test entries
- Movies are mostly entered by hand, so errors and inconsistencies may exist
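Because genres are pipe-separated and the release year is embedded in the title, a movie record needs slightly more parsing than the other two files. The following is a rough sketch under stated assumptions: the `::` field separator, the `parse_movie` helper, and the multi-genre sample line are illustrative and not taken from the article.

```python
import re

# Matches titles of the form "Name (YYYY)".
TITLE_YEAR = re.compile(r"^(?P<title>.*)\s\((?P<year>\d{4})\)$")


def parse_movie(line, sep="::"):
    """Parse one movies.dat record into id, title, year, and genre list."""
    movie_id, title, genres = line.strip().split(sep)
    match = TITLE_YEAR.match(title)
    return {
        "movie_id": int(movie_id),
        # Fall back to the raw title when no year is found, since the
        # description warns that hand-entered data may be inconsistent.
        "title": match.group("title") if match else title,
        "year": int(match.group("year")) if match else None,
        "genres": genres.split("|"),
    }


# Illustrative record in the assumed raw format.
movie = parse_movie("2::Jumanji (1995)::Adventure|Children's|Fantasy")
print(movie)
```

Returning `None` for a missing year, instead of raising, keeps a single malformed title from aborting a full pass over the file.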
III. Code Implementation
3.1 About the data
- About 3,900 movies and 6,040 users
- Dataset README: http://files.grouplens.org/datasets/movielens/ml-1m-README.txt
- Dataset download: http://files.grouplens.org/datasets/movielens/ml-1m.zip
- TensorFlow download: http://www.lfd.uci.edu/~gohlke/pythonlibs/#tensorflow
# Imports for data io operations
from collections import deque
from six import next
# Local helper module reader.py (data loading utilities)
import reader
# Main imports for training
import tensorflow as tf
import numpy as np
# Evaluate train times per epoch
import time
# Constant seed for replicating training results
np.random.seed(42)
# Number of users in the dataset
u_num = 6040
# Largest MovieID in the dataset (IDs run from 1 to 3952; about 3,900 are real movies)
i_num = 3952
# Number of samples per batch
batch_size = 1000
# Dimensionality of the latent factors (5 here; the original note mentions 15)
dims = 5
# Number of times the network sees all the training data
max_epochs = 50
# Device used for all computations
place_device = '/cpu:0'