Recommender Systems (2): Building a Latent Factor Model with TensorFlow (MovieLens Movie Ratings)

Copyright notice: this is an original article by the author and may not be reproduced without the author's permission.



I. Theory

1.1 Paper

  • Title: "Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model"
  • Published: KDD 2008
  • Author and affiliation: Yehuda Koren (AT&T Labs – Research)
  • Link: https://dl.acm.org/citation.cfm?id=1401944&preflayout=flat
  • Translation of the paper: to be prepared

1.2 Understanding

[Figure: mean, user bias, and movie bias]
② For example: [figure]
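As a rough illustration of the three terms named above: in Koren's paper the baseline estimate adds the global mean μ, a user bias b_u, and an item bias b_i, and the latent factor model then adds the inner product of a user vector and an item vector. The sketch below uses the Titanic/Joe numbers from the paper for the baseline; the function names and the two-dimensional factor vectors are made up for illustration and are not taken from this article.

```python
def baseline_estimate(mu, b_u, b_i):
    """Baseline rating: global mean + user bias + item bias."""
    return mu + b_u + b_i


def latent_factor_estimate(mu, b_u, b_i, p_u, q_i):
    """Baseline plus the inner product of the user and item factor vectors."""
    interaction = sum(p * q for p, q in zip(p_u, q_i))
    return baseline_estimate(mu, b_u, b_i) + interaction


# Example from Koren's paper: mean 3.7, a critical user (-0.3), a good movie (+0.5).
mu, b_joe, b_titanic = 3.7, -0.3, 0.5
baseline = baseline_estimate(mu, b_joe, b_titanic)
print(round(baseline, 2))  # 3.9

# Hypothetical 2-dimensional factors, chosen only for illustration.
p_joe = [0.4, -0.2]
q_titanic = [0.5, 0.1]
full = latent_factor_estimate(mu, b_joe, b_titanic, p_joe, q_titanic)
print(round(full, 2))  # 4.08
```

The biases capture effects that do not depend on the user-item pairing, so the factor vectors only need to explain what is left over.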

II. Data Overview

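  • Nine datasets commonly used in recommender-system research
  • The main entities in recommender-system data:
    Item: the thing being recommended, such as a product, a movie, a web page, or a piece of information
    User: a person who rates items and receives recommendations from the system
    Rating: an expression of a user's preference for an item. A rating can be binary (e.g. like/dislike), an integer (e.g. 1 to 5 stars), or continuous (any value within some interval). There is also implicit feedback, which records only whether a user interacted with an item.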
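To make the distinction between the rating types concrete, here is a minimal sketch that derives a binary view and an implicit-feedback view from explicit star ratings. The `Rating` tuple, the third record, and the 4-star "liked" threshold are assumptions made for illustration.

```python
from collections import namedtuple

# One observed interaction: which user rated which item, and with what value.
Rating = namedtuple("Rating", ["user", "item", "value"])

# Explicit ratings on a 1-5 star scale (the third record is hypothetical).
explicit = [Rating(1, 1193, 5), Rating(1, 661, 3), Rating(2, 1193, 2)]

# Binary view: "liked" if the rating is at least 4 stars (threshold is an assumption).
liked = {(r.user, r.item): r.value >= 4 for r in explicit}

# Implicit view: record only that an interaction happened, ignoring its value.
interacted = {(r.user, r.item) for r in explicit}

print(liked[(1, 1193)], liked[(1, 661)])  # True False
print(len(interacted))  # 3
```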

2.1 Description

These files contain 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users who joined MovieLens in 2000.

1. Ratings file description: ratings.dat

UserID  MovieID  Rating  Timestamp
1       1193     5       978300760
1       661      3       978302109
  • UserIDs range between 1 and 6040
  • MovieIDs range between 1 and 3952
  • Ratings are made on a 5-star scale (whole-star ratings only)
  • Timestamp is represented in seconds since the epoch as returned by time(2)
  • Each user has at least 20 ratings
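In the MovieLens 1M distribution the fields of ratings.dat are separated by `::` rather than by whitespace. A minimal parsing sketch using only the standard library follows; `parse_rating` and the dictionary keys are hypothetical names, not something defined in this article.

```python
from datetime import datetime, timezone


def parse_rating(line):
    """Split one 'UserID::MovieID::Rating::Timestamp' line into typed fields."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return {
        "user": int(user_id),
        "item": int(movie_id),
        "rate": float(rating),
        # Timestamps are seconds since the Unix epoch; interpret them as UTC.
        "time": datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
    }


sample_lines = ["1::1193::5::978300760", "1::661::3::978302109"]
ratings = [parse_rating(line) for line in sample_lines]

for r in ratings:
    # IDs and ratings should stay within the documented ranges.
    assert 1 <= r["user"] <= 6040 and 1 <= r["item"] <= 3952
    assert 1 <= r["rate"] <= 5

print(ratings[0]["time"].isoformat())  # 2000-12-31T22:12:40+00:00
```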

2. Users file description: users.dat

UserID  Gender  Age  Occupation  Zip-code
1       F       1    10          48067
2       M       56   16          70072
  • Gender is denoted by a “M” for male and “F” for female
  • Age is chosen from the following ranges:
         * 1: “Under 18”
         * 18: “18-24”
         * 25: “25-34”
         * 35: “35-44”
         * 45: “45-49”
         * 50: “50-55”
         * 56: “56+”
  • Occupation is chosen from the following choices:
         * 0: “other” or not specified
         * 1: “academic/educator”
         * 2: “artist”
         * 3: “clerical/admin”
         * 4: “college/grad student”
         * 5: “customer service”
         * 6: “doctor/health care”
         * 7: “executive/managerial”
         * 8: “farmer”
         * 9: “homemaker”
         * 10: “K-12 student”
         * 11: “lawyer”
         * 12: “programmer”
         * 13: “retired”
         * 14: “sales/marketing”
         * 15: “scientist”
         * 16: “self-employed”
         * 17: “technician/engineer”
         * 18: “tradesman/craftsman”
         * 19: “unemployed”
         * 20: “writer”
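Because Age and Occupation are stored as codes, it can help to decode them when inspecting the data. The sketch below builds lookup tables from the lists above and decodes the two sample users; `decode_user` is a hypothetical helper, and the `::` separator is the one used in the MovieLens 1M files.

```python
AGE_GROUPS = {
    1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
    45: "45-49", 50: "50-55", 56: "56+",
}

# Index in this list equals the occupation code (0 to 20).
OCCUPATIONS = [
    "other or not specified", "academic/educator", "artist", "clerical/admin",
    "college/grad student", "customer service", "doctor/health care",
    "executive/managerial", "farmer", "homemaker", "K-12 student", "lawyer",
    "programmer", "retired", "sales/marketing", "scientist", "self-employed",
    "technician/engineer", "tradesman/craftsman", "unemployed", "writer",
]


def decode_user(line):
    """Turn one 'UserID::Gender::Age::Occupation::Zip-code' line into readable fields."""
    user_id, gender, age, occupation, zip_code = line.strip().split("::")
    return {
        "user": int(user_id),
        "gender": gender,
        "age_group": AGE_GROUPS[int(age)],
        "occupation": OCCUPATIONS[int(occupation)],
        "zip": zip_code,  # kept as a string so leading zeros survive
    }


users = [decode_user(s) for s in ["1::F::1::10::48067", "2::M::56::16::70072"]]
print(users[0]["age_group"], "/", users[0]["occupation"])  # Under 18 / K-12 student
print(users[1]["age_group"], "/", users[1]["occupation"])  # 56+ / self-employed
```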

3. Movies file description: movies.dat

MovieID  Title             Genres
1        Toy Story (1995)  Animation
2        Jumanji (1995)    Adventure
  • Titles are identical to titles provided by the IMDB (including year of release)
  • Genres are pipe-separated and are selected from the following genres:
         * Action
         * Adventure
         * Animation
         * Children’s
         * Comedy
         * Crime
         * Documentary
         * Drama
         * Fantasy
         * Film-Noir
         * Horror
         * Musical
         * Mystery
         * Romance
         * Sci-Fi
         * Thriller
         * War
         * Western
  • Some MovieIDs do not correspond to a movie due to accidental duplicate entries and/or test entries
  • Movies are mostly entered by hand, so errors and inconsistencies may exist
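Since a movie can carry several pipe-separated genres, a common way to use them as features is a multi-hot vector over the 18 genres listed above. The sketch below assumes the `::` field separator of the MovieLens 1M files and a sample row with three genres; `parse_movie` is a hypothetical helper.

```python
import re

# Genre order follows the list above; the data file uses a plain apostrophe.
GENRES = [
    "Action", "Adventure", "Animation", "Children's", "Comedy", "Crime",
    "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical",
    "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


def parse_movie(line):
    """Split one 'MovieID::Title::Genres' line and encode its genres as 0/1 flags."""
    movie_id, title, genres = line.strip().split("::")
    # The release year is the four digits in parentheses at the end of the title.
    match = re.search(r"\((\d{4})\)\s*$", title)
    genre_set = set(genres.split("|"))
    return {
        "item": int(movie_id),
        "title": title,
        "year": int(match.group(1)) if match else None,
        "genres": [1 if g in genre_set else 0 for g in GENRES],
    }


movie = parse_movie("1::Toy Story (1995)::Animation|Children's|Comedy")
print(movie["year"])    # 1995
print(movie["genres"])  # 1s at the positions of Animation, Children's, and Comedy
```

Because some MovieIDs have no entry, code that indexes by MovieID should not assume every ID between 1 and 3952 appears in the file.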

III. Implementation

3.1 Data Introduction

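# Imports for data I/O operations
from collections import deque
# Python 2/3 compatibility shim; on Python 3 the built-in next() is equivalent
from six import next
# Local helper module reader.py, used for loading the data
import reader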

# Main imports for training
import tensorflow as tf
import numpy as np

# Evaluate train times per epoch
import time
# Constant seed for replicating training results
np.random.seed(42)

# Number of users in the dataset
u_num = 6040

# Number of movies in the dataset
i_num = 3952

# Number of samples per batch
batch_size = 1000

# Number of latent dimensions per user/item vector
# (the original comment mentions 15, but the value set here is 5)
dims = 5

# Number of times the network sees all the training data
max_epochs = 50

# Device used for all computations
place_device = '/cpu:0'
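To show how these constants shape the model, here is a minimal NumPy sketch of the prediction step only (no training). It is not the article's TensorFlow code: the parameter names, the random initialisation, the placeholder global mean, and the clipping to the 1-5 range are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

u_num, i_num, dims = 6040, 3952, 5

# Randomly initialised parameters (hypothetical names; training would update them).
user_factors = rng.normal(scale=0.02, size=(u_num, dims))
item_factors = rng.normal(scale=0.02, size=(i_num, dims))
user_bias = np.zeros(u_num)
item_bias = np.zeros(i_num)
global_mean = 3.5  # placeholder; in practice the mean of the training ratings


def predict(user_ids, movie_ids):
    """Predict ratings for a batch of raw (1-based) UserIDs and MovieIDs."""
    u = np.asarray(user_ids) - 1  # raw IDs start at 1, array rows start at 0
    i = np.asarray(movie_ids) - 1
    # Row-wise inner product between each user vector and its item vector.
    interaction = np.sum(user_factors[u] * item_factors[i], axis=1)
    raw = global_mean + user_bias[u] + item_bias[i] + interaction
    return np.clip(raw, 1.0, 5.0)  # keep predictions on the 1-5 star scale


batch = predict([1, 1, 6040], [1193, 661, 3952])
print(batch.shape)  # (3,)
```

Sizing the tables by u_num and i_num lets every valid ID index a row directly, at the cost of a few unused rows for MovieIDs that have no movie.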

3.2 Loading the Data, Splitting…
