User-Based Collaborative Filtering --- 《推荐系统实践》 --- Python Source Code (9)

1. Overview

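This code follows the framework of the code in the book 《推荐系统实践》, with modifications. The dataset is the ratings.dat file from the linked dataset (the script reads it from an ml-1m directory). For convenience, all the code lives in a single file; once the path to ratings.dat points to your local copy, it runs as-is and prints the result.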

2. Notation

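| Parameter | Type | Description |
|---|---|---|
| data | list | All records read from ratings.dat |
| M | int | data is split into roughly M parts: M-1 parts for training and 1 part for testing |
| seed | int | Random seed used by the function that splits data into training and test data |
| N | int | Number of recommendations per user |
| K | int | Number of neighbors selected by the neighborhood-based algorithm |
| W | dict<int, dict<int, float>> | Similarity matrix; the similarity s between user u and user v is stored as W[u][v] = s |
| allrank | dict<int, list> | Recommendations for all users, stored as <user, recommendations>, where the recommendations are a list |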
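
To make the layout of W concrete, here is a minimal sketch on made-up data. The user ids, item ids, and the helper name `cosine` are assumptions for illustration only; the similarity is the set-based cosine that the source code below computes.

```python
import math

# Hypothetical toy data: user id -> set of item ids the user has rated.
train = {
    1: {10, 20, 30},
    2: {20, 30, 40},
    3: {50},
}

def cosine(items_u, items_v):
    """Set-based cosine similarity: |A & B| / sqrt(|A| * |B|)."""
    common = len(items_u & items_v)
    return common / math.sqrt(len(items_u) * len(items_v))

# Nested-dict layout from the table: W[u][v] = s, with no entry for u == v.
W = {u: {v: cosine(train[u], train[v]) for v in train if v != u} for u in train}

print(W[1][2])  # users 1 and 2 share two of their three items
print(W[1][3])  # users 1 and 3 share nothing
```

Because the measure is symmetric, W[u][v] and W[v][u] always hold the same value.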

3. Functions

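| Function | Description |
|---|---|
| SplitData | Splits data into training data and test data |
| list2dic | Converts the training and test data from list to dict |
| UserSimilarity1 | Computes the cosine similarity between every pair of users u and v; speed is not optimized |
| Recommand | Recommends items for all users |
| Precision | Computes the precision of the recommendations |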
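
UserSimilarity1 is listed as unoptimized because it intersects the item sets of every pair of users, including pairs with nothing in common. One common speedup is to build an item-to-users inverted index first, so that only users who share at least one item are ever compared. The sketch below is one possible version, not part of the original code; the function name `user_similarity_inverted` and the toy data are assumptions.

```python
import math
from collections import defaultdict

def user_similarity_inverted(train):
    """Cosine similarity via an item -> users inverted index.

    train: dict mapping user -> set of items (the layout list2dic produces).
    Only user pairs that share at least one item get an entry.
    """
    # Step 1: inverted index, item -> set of users who rated it.
    item_users = defaultdict(set)
    for user, items in train.items():
        for item in items:
            item_users[item].add(user)

    # Step 2: count co-rated items, touching only users that co-occur on an item.
    common = defaultdict(lambda: defaultdict(int))
    for users in item_users.values():
        for u in users:
            for v in users:
                if u != v:
                    common[u][v] += 1

    # Step 3: normalise the counts into cosine similarities.
    W = {u: {} for u in train}
    for u, related in common.items():
        for v, cuv in related.items():
            W[u][v] = cuv / math.sqrt(len(train[u]) * len(train[v]))
    return W

# Hypothetical toy data for a quick check.
toy = {1: {10, 20, 30}, 2: {20, 30, 40}, 3: {50}}
W = user_similarity_inverted(toy)
print(W)
```

One difference to keep in mind: pairs with zero similarity are simply absent here, whereas UserSimilarity1 stores an explicit 0 for them, so a user with few overlapping neighbors may end up with fewer than K entries.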

4. Python Source Code

import random as rd, math as mt, operator as op
"""
SplitData(data, M, k, seed) splits data into roughly M parts: M-1 parts of training data and 1 part of test data.
To get M different train/test splits, call this function M times with k varying from 0 to M-1
while keeping seed constant.

"""
def SplitData(data, M, k, seed):
    test = []
    train = []
    rd.seed(seed)
    for user,item in data:
        if rd.randint(0, M - 1) == k:  # uniform random integer in [0, M-1]; randint includes both endpoints
            test.append([user,item])
        else:
            train.append([user,item])
    return train, test
"""
The train data and test data are each stored as a dictionary of the form {key: set}.
The key is a user; the set holds the movies that user has rated.
"""
def list2dic(listdata):
    dicdata = dict()
    for user, item in listdata:
        # create the user's set the first time the user appears
        if user not in dicdata:
            dicdata[user] = set()
        dicdata[user].add(item)
    return dicdata
    
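# cosine similarity for every pair of users (unoptimized: intersects the item sets of all pairs)
def UserSimilarity1(train):
    W = dict()
    for u in train.keys():
        W[u] = dict()
        for v in train.keys():
            if u == v:
                continue
            # |N(u) & N(v)| / sqrt(|N(u)| * |N(v)|)
            W[u][v] = len(train[u] & train[v])
            W[u][v] /= mt.sqrt(len(train[u]) * len(train[v]) * 1.0)
    return W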

# build a top-N recommendation list for every user in train
def Recommand(train, W, K, N):
    allrank = dict()
    for u in train.keys():
        rank = dict()
        # the K users most similar to u
        for v, wuv in sorted(W[u].items(), key = op.itemgetter(1), reverse = True)[0:K]:
            for i in train[v]:
                # skip items that u has already rated
                if i not in train[u]:
                    # each neighbor who rated i adds its similarity to i's score
                    rank[i] = rank.get(i, 0) + wuv
        ranklist = []
        for item, sims in sorted(rank.items(), key = op.itemgetter(1), reverse = True)[0:N]:
            ranklist.append(item)
        allrank[u] = ranklist
    return allrank
# compute precision: hits divided by the number of recommendation slots
def Precision(allrank, test, N):
    hit = 0
    total = 0  # renamed from "all" to avoid shadowing the built-in
    for user in test.keys():
        tu = test[user]
        if user in allrank:
            for item in allrank[user]:
                if item in tu:
                    hit += 1
            total += N
    return hit / (total * 1.0)
"""
main function
"""
filestring = '/home/sysu-hgavin/文档/ml-1m/ratings.dat'
data = []
# each line has "::"-separated fields; only the first two (user, item) are kept
with open(filestring, 'r') as f:
    for line in f:
        user, item = line.split("::")[:2]
        data.append([int(user), int(item)])
M = 8
seed = 1
N = 10 #top-N
K = 20 #the number of neighbors
# a full M-fold evaluation would loop "for k in range(M):"; here only fold k = 1 is used
train, test = SplitData(data, M, 1, seed)  # generate train data and test data
train = list2dic(train)
test = list2dic(test)

W = UserSimilarity1(train)
allrank = Recommand(train, W, K, N)
precision = Precision(allrank, test, N)
print (precision)
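
The script above evaluates a single fold (k = 1). The note on SplitData asks for M calls with k from 0 to M-1 and a constant seed; the sketch below illustrates why the seed must stay fixed. It assumes fold ids are drawn from 0 to M-1, and `split_data` is a hypothetical stand-in for SplitData that runs on made-up records instead of ratings.dat.

```python
import random

def split_data(data, M, k, seed):
    """Hypothetical stand-in for SplitData, drawing fold ids from 0..M-1."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    train, test = [], []
    for user, item in data:
        # the same seed yields the same fold id for each record on every call
        if rng.randint(0, M - 1) == k:
            test.append((user, item))
        else:
            train.append((user, item))
    return train, test

# Made-up records: 50 users x 4 items.
data = [(u, i) for u in range(50) for i in range(4)]
M, seed = 8, 1

seen = []
for k in range(M):
    train, test = split_data(data, M, k, seed)
    assert len(train) + len(test) == len(data)
    seen.extend(test)

# With a fixed seed, every record lands in exactly one test fold.
print(sorted(seen) == sorted(data))  # prints True
```

If the seed changed between calls, the fold ids would be redrawn each time, and a record could show up in several test sets or in none.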
