利用LightFM实现电影推荐系统——代码详解

1 篇文章 0 订阅
1 篇文章 0 订阅

有关本文的资源如下:

LightFM官方文档 -> Quickstar

Recommendation Systems - Learn Python for Data Science  by Siraj Raval


import numpy as np
from lightfm.datasets import fetch_movielens
from lightfm import LightFM

# fetch data and format it
data = fetch_movielens(min_rating=4.0)  # only collect the movies with a rating of 4 or higher

# print training and testing data
print(repr(data['train']))
print(repr(data['test']))
'''repr()函数将对象转化为供解释器读取的形式'''

# create model
model = LightFM(loss='warp')  # warp = weighted approximate-rank pairwise
'''
warp helps us create recommendations for each user by looking at the existing user rating pairs
and predicting rankings for each, it uses the gradient descent algorithm to iteratively find the
weights that improve our prediction over time. This takes into account both the user's past rating
history content based and similar user ratings collaborative, it's a hybrid system.

WARP is an implicit feedback model: all interactions in the training matrix are treated as positive 
signals, and products that users did not interact with they implicitly do not like. The goal of the 
model is to score these implicit positives highly while assigning low scores to implicit negatives.

'''
# train model
model.fit(data['train'], epochs=30, num_threads=2)
'''
parameters: the data set we want to train it on,
            the number of epochs we want to run the training for,
            the number of threads we want to run this on
Model training is accomplished via SGD (stochastic gradient descent). This means that for every pass through 
the data — an epoch — the model learns to fit the data more and more closely. We’ll run it for 10 epochs in 
this example. We can also run it on multiple cores, so we’ll set that to 2. (The dataset in this example is 
too small for that to make a difference, but it will matter on bigger datasets.)
'''

def sample_recommendation(model, data, user_ids):
    # our model, our data and a list of user ids(these are users we want to generate recommendations for)

    # number of users and movies in training data
    n_users, n_items = data['train'].shape

    # generate recommendation for each user we input
    '''
    iterate through every user id that we input and say that we want the list of known positives for each line
    if M considers ratings that are 5 positive and ratings that are 4 or below negative to make the problem binary 
    much simplers
    '''
    for user_id in user_ids:

        # movies they already like
        known_positives = data['item_labels'][data['train'].tocsr()[user_id].indices]
        '''
        data['item_labels']的类型是  <class 'numpy.ndarray'>
        data['train']的类型是  <class 'scipy.sparse.coo.coo_matrix'> 即 坐标形式的一种稀疏矩阵
            # tocsr() 的作用是  Return a copy of this matrix in Compressed Sparse Row format
            # coo_matrix.tocsr() 将把coo_matrix转化为csr_matrix,所以,
        data['train'].tocsr()的类型是 <class 'scipy.sparse.csr.csr_matrix'> 即 压缩的行稀疏矩阵
        data['train'].tocsr()[user_id] 的类型也是 <class 'scipy.sparse.csr.csr_matrix'>
        data['train'].tocsr()[user_id].indices 的类型是 <class 'numpy.ndarray'>
            # indices属性的作用是返回	CSR format index array of the matrix
            
        总之,data['train'].tocsr()[2].indices 获取  user_id=2 的观众打分为5的电影索引数组
        data['item_labels'][...]  根据索引数组,输出对应的电影名称
        
        '''
        # movies our model predicts they will like
        scores = model.predict(user_id, np.arange(n_items))
        '''np.arange()用于创建等差数组,返回一个array对象'''
        # rank them in order of most liked to least
        top_items = data['item_labels'][np.argsort(-scores)]
        '''np.argsort(x)返回数组值从小到大的索引值,np.argsort(-x)按降序排列'''

        # print out the results
        print("User %s" % user_id)
        print("      Known positives:")

        for x in known_positives[:3]:
            print("         %s" % x)
        print("      Recommended:")
        for x in top_items[:3]:
            print("         %s" % x)


sample_recommendation(model, data, [3,25,450])



评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值