Building a Movie Recommender System with LightFM: A Code Walkthrough


Resources for this article:

LightFM official documentation -> Quickstart

Recommendation Systems - Learn Python for Data Science by Siraj Raval


import numpy as np
from lightfm.datasets import fetch_movielens
from lightfm import LightFM

# fetch data and format it
data = fetch_movielens(min_rating=4.0)  # keep only ratings of 4.0 or higher

# print training and testing data
print(repr(data['train']))
print(repr(data['test']))
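'''repr() converts an object into a form meant for the interpreter; for a sparse matrix it prints a summary of its shape, dtype and number of stored entries'''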

# create model
model = LightFM(loss='warp')  # warp = weighted approximate-rank pairwise
'''
WARP builds recommendations for each user from the existing user-item rating pairs: it learns to
rank the items a user rated positively above the ones they did not. Gradient descent iteratively
adjusts the weights so that the predicted rankings improve over time. LightFM itself is a hybrid
model that can combine content-based information (user and item features) with collaborative
information (ratings by similar users). Since this script passes no features to fit(), it behaves
here as a pure collaborative filtering model.

WARP is an implicit feedback model: all interactions in the training matrix are treated as positive
signals, and items a user did not interact with are treated as implicit negatives, i.e. assumed to
be disliked. The goal of the model is to score these implicit positives highly while assigning low
scores to implicit negatives.
'''
# train model
model.fit(data['train'], epochs=30, num_threads=2)
'''
parameters: the data set we want to train on,
            the number of epochs we want to run the training for,
            the number of threads we want to run it on
Model training is done via SGD (stochastic gradient descent). With every pass through the data
(an epoch) the model fits the data more closely. We run it for 30 epochs in this example. Training
can also use multiple cores, so we set num_threads to 2. (The dataset in this example is too small
for that to make a difference, but it will matter on bigger datasets.)
'''

def sample_recommendation(model, data, user_ids):
    # arguments: the trained model, the data dict and a list of user ids to generate recommendations for

    # number of users and movies in training data
    n_users, n_items = data['train'].shape

    # generate recommendation for each user we input
    '''
    Iterate over every user id passed in and collect that user's list of known positives.
    Because the data was fetched with min_rating=4.0, ratings of 4 or 5 count as positive and
    lower ratings are dropped, which makes the problem binary and much simpler.
    '''
    for user_id in user_ids:

        # movies they already like
        known_positives = data['item_labels'][data['train'].tocsr()[user_id].indices]
        '''
        data['item_labels'] has type <class 'numpy.ndarray'>
        data['train'] has type <class 'scipy.sparse.coo.coo_matrix'>, a sparse matrix in coordinate (COO) format
            # tocsr() returns a copy of this matrix in Compressed Sparse Row format
            # coo_matrix.tocsr() converts the coo_matrix into a csr_matrix, so
        data['train'].tocsr() has type <class 'scipy.sparse.csr.csr_matrix'>, a compressed sparse row matrix
        data['train'].tocsr()[user_id] is also a <class 'scipy.sparse.csr.csr_matrix'>
        data['train'].tocsr()[user_id].indices has type <class 'numpy.ndarray'>
            # the indices attribute is the CSR format index array of the matrix

        In short, data['train'].tocsr()[2].indices is the array of indices of the movies that the
        user with user_id=2 rated 4 or higher (because of min_rating=4.0)
        data['item_labels'][...] looks up the movie titles for that index array
        '''
        # movies our model predicts they will like
        scores = model.predict(user_id, np.arange(n_items))
        '''np.arange(n_items) creates the evenly spaced array 0, 1, ..., n_items-1 and returns it as an ndarray'''
        # rank them in order of most liked to least
        top_items = data['item_labels'][np.argsort(-scores)]
        '''np.argsort(x) returns the indices that sort x in ascending order; np.argsort(-x) gives descending order'''

        # print out the results
        print("User %s" % user_id)
        print("      Known positives:")

        for x in known_positives[:3]:
            print("         %s" % x)
        print("      Recommended:")
        for x in top_items[:3]:
            print("         %s" % x)


sample_recommendation(model, data, [3,25,450])
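
The two indexing idioms inside sample_recommendation (the CSR row lookup and the argsort ranking) can be tried in isolation on a toy matrix. The sketch below is only an illustration: the 3x4 interaction matrix, the movie labels, the scores and the helper names known_positives and rank_items are made up here. They are not MovieLens data or LightFM output.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical toy data: 3 users x 4 movies. A stored entry means
# "this user rated this movie at or above the minimum rating".
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([1, 3, 0, 2, 3])
vals = np.array([5, 4, 4, 5, 4])
train = coo_matrix((vals, (rows, cols)), shape=(3, 4))
item_labels = np.array(["Movie A", "Movie B", "Movie C", "Movie D"])


def known_positives(train, item_labels, user_id):
    # Slice one row out of the CSR matrix; .indices holds the column ids
    # (movie indices) of the entries stored in that row.
    return item_labels[train.tocsr()[user_id].indices]


def rank_items(scores, item_labels):
    # argsort sorts ascending, so negate the scores to get best-first order.
    return item_labels[np.argsort(-scores)]


# Made-up scores standing in for model.predict(user_id, np.arange(n_items)).
toy_scores = np.array([0.2, 1.5, -0.3, 0.9])

print(known_positives(train, item_labels, 0))
print(rank_items(toy_scores, item_labels))
```

Note that the ranking step does not exclude movies the user has already rated, so a known positive can reappear among the recommendations, both here and in the script above.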


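
The WARP comment near the top of the script describes scoring positives above negatives. One way to picture a single WARP step is the sampling loop below: for a (user, positive item) pair, negatives are drawn at random until one scores within a margin of the positive, and the number of draws this took gives a rough estimate of how poorly the positive is ranked. This is a simplified sketch of the general idea, not LightFM's implementation. The function name warp_sample, the margin of 1.0, the max_trials cap, the weighting formula and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def warp_sample(scores, positive_item, known_positives, rng, margin=1.0, max_trials=10):
    """Draw random negatives until one violates the margin against the positive.

    Returns (negative_item, n_trials, weight), or None if no violation was found.
    Hypothetical helper for illustration only.
    """
    n_items = len(scores)
    # Candidate negatives: every item the user has not interacted with.
    negatives = np.setdiff1d(np.arange(n_items), known_positives)
    for trial in range(1, max_trials + 1):
        candidate = rng.choice(negatives)
        # Violation: the negative scores above (positive score - margin).
        if scores[candidate] > scores[positive_item] - margin:
            # Finding a violator quickly suggests many violators exist, i.e. the
            # positive is ranked poorly, so the update gets a larger weight.
            approx_rank = (n_items - 1) // trial
            weight = np.log(max(approx_rank, 1))
            return int(candidate), trial, float(weight)
    return None  # no sampled negative came close to the positive: no update


rng = np.random.default_rng(0)
toy_scores = np.array([2.0, 0.1, 1.8, -0.5, 0.3])  # made-up scores for 5 items
print(warp_sample(toy_scores, positive_item=0, known_positives=[0], rng=rng))
```

In a real training loop the returned weight would scale a gradient step that pushes the positive's score up and the violating negative's score down. That update is left out here to keep the sketch short.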
