Building a Movie Recommender System with LightFM: A Code Walkthrough

Latest recommended article published on 2024-02-22 07:24:03

小饼干超人

Categories: python, LightFM, recommender systems. Tags: recommender systems, LightFM

Article link: https://blog.csdn.net/m0_37586991/article/details/79943400

Resources for this article:

LightFM official documentation -> Quickstart

Recommendation Systems - Learn Python for Data Science by Siraj Raval

import numpy as np
from lightfm.datasets import fetch_movielens
from lightfm import LightFM

# fetch data and format it
data = fetch_movielens(min_rating=4.0)  # keep only ratings of 4.0 or higher

# print training and testing data
print(repr(data['train']))
print(repr(data['test']))
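'''repr() returns a string representation of the object meant for the interpreter; for these sparse matrices it is a short summary rather than the full contents'''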

# create model
model = LightFM(loss='warp')  # warp = weighted approximate-rank pairwise
'''
WARP builds recommendations by ranking items for each user: for every observed (user, positive item)
pair it samples negative items until it finds one that violates the desired ranking, then takes a
gradient descent step to correct it. Repeating this improves the predicted rankings over time.

LightFM itself is a hybrid model: it can combine collaborative signals (interactions from similar users)
with content-based signals (user and item features). No features are passed in this script, so the model
here relies on interactions only and behaves as a pure collaborative filtering model.

WARP is an implicit feedback model: all interactions in the training matrix are treated as positive
signals, and items a user did not interact with are treated as implicit negatives. The goal of the
model is to score the implicit positives highly while assigning low scores to the implicit negatives.
'''
# train model
model.fit(data['train'], epochs=30, num_threads=2)
'''
Parameters: the dataset to train on,
            the number of epochs to train for,
            the number of threads to train with.
Model training is done via SGD (stochastic gradient descent): with every pass through the data
(an epoch), the model fits the training data more closely. This example runs 30 epochs.
Training can also use multiple cores, so num_threads is set to 2. (The dataset in this example is
too small for that to make a difference, but it will matter on bigger datasets.)
'''

def sample_recommendation(model, data, user_ids):
    # arguments: the trained model, the dataset, and a list of user ids to generate recommendations for

    # number of users and movies in training data
    n_users, n_items = data['train'].shape

    # generate recommendation for each user we input
    '''
    Iterate over every user id passed in and look up that user's known positives.
    Because the data was loaded with min_rating=4.0, only ratings of 4 or higher are kept, and
    every remaining interaction counts as a positive. Everything else is treated as a negative,
    which reduces the problem to a much simpler binary one.
    '''
    for user_id in user_ids:

        # movies they already like
        known_positives = data['item_labels'][data['train'].tocsr()[user_id].indices]
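        '''
        data['item_labels'] has type <class 'numpy.ndarray'>
        data['train'] has type <class 'scipy.sparse.coo.coo_matrix'>, a sparse matrix in coordinate (COO) format
            (the exact module path shown depends on the SciPy version)
            # tocsr() returns a copy of this matrix in Compressed Sparse Row format,
            # so coo_matrix.tocsr() converts the coo_matrix into a csr_matrix
        data['train'].tocsr() has type <class 'scipy.sparse.csr.csr_matrix'>, a compressed sparse row matrix
        data['train'].tocsr()[user_id] is also a <class 'scipy.sparse.csr.csr_matrix'> (a single row)
        data['train'].tocsr()[user_id].indices has type <class 'numpy.ndarray'>
            # the indices attribute is the CSR format index array of the matrix,
            # i.e. the column indices of the stored entries

        In short, data['train'].tocsr()[2].indices is the array of movie indices that the user in row 2
        rated 4 or higher (the ratings kept by min_rating=4.0), and
        data['item_labels'][...] maps that index array to the corresponding movie titles.
        '''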
        # movies our model predicts they will like
        scores = model.predict(user_id, np.arange(n_items))
        '''np.arange(n_items) creates the evenly spaced array [0, 1, ..., n_items - 1], i.e. every item id'''
        # rank them in order of most liked to least
        top_items = data['item_labels'][np.argsort(-scores)]
        '''np.argsort(x) returns the indices that sort x in ascending order, so np.argsort(-x) gives descending order'''

        # print out the results
        print("User %s" % user_id)
        print("      Known positives:")

        for x in known_positives[:3]:
            print("         %s" % x)
        print("      Recommended:")
        for x in top_items[:3]:
            print("         %s" % x)


sample_recommendation(model, data, [3,25,450])
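
The two lookups inside sample_recommendation can be tried without downloading MovieLens or training a model. The following is a minimal sketch under stated assumptions: the 3 x 5 ratings matrix, the movie titles, and the score vector are made-up toy values, and the helper names known_positive_ids and rank_items are hypothetical (they are not part of LightFM). The scores array stands in for the output of model.predict.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical toy data: 3 users x 5 movies, storing only ratings >= 4
# (mirrors what min_rating=4.0 keeps; all values here are made up).
item_labels = np.array(["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"])
rows = np.array([0, 0, 1, 2, 2, 2])  # user index of each stored rating
cols = np.array([1, 3, 0, 0, 2, 4])  # movie index of each stored rating
vals = np.array([5, 4, 4, 5, 5, 4])  # the rating itself
train = coo_matrix((vals, (rows, cols)), shape=(3, 5))


def known_positive_ids(interactions, user_id):
    """Column indices of the movies stored in this user's row."""
    return interactions.tocsr()[user_id].indices


def rank_items(scores):
    """Item ids ordered from highest score to lowest."""
    return np.argsort(-scores)


user_id = 2
positives = known_positive_ids(train, user_id)

# Stand-in for model.predict(user_id, np.arange(n_items)); values are made up.
scores = np.array([0.2, 1.5, -0.3, 0.9, 0.1])
ranking = rank_items(scores)

# Optional step that the original script does not perform:
# drop movies the user has already rated from the ranked list.
unseen = ranking[~np.isin(ranking, positives)]

print("Known positives:", item_labels[positives])
print("Top 3 by score: ", item_labels[ranking[:3]])
print("Unseen, ranked: ", item_labels[unseen])
```

Note that the original function prints the top-scored items without removing movies the user has already rated, so a known positive can show up again under "Recommended". The np.isin filter above is one possible way to exclude them; whether that is desirable depends on the use case.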