Personalized Ranking Metric Embedding for Next New POI Recommendation

Poke_Z

Tags: recommendation algorithms, prediction, POI

Article link: https://blog.csdn.net/zdwccc/article/details/78385420

Introduction:

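This paper uses metric embedding to map each POI into a low-dimensional Euclidean space, so that a Markov chain model can predict POI transitions effectively: the Euclidean distance between two POIs measures the strength of their sequential relationship. It then proposes a pair-wise ranking metric embedding that can rank the candidate POIs in the latent space, and finally the Personalized Ranking Metric Embedding (PRME) algorithm, which jointly considers sequential information and individual preference. Because people tend to visit POIs close to their current location, the model is further extended with a spatial factor, giving the PRME-G model.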

How the paper works:

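The paper uses two datasets: Foursquare check-ins within Singapore and Gowalla check-ins in California and Nevada. Before use, the data is preprocessed by removing users who visited fewer than 10 POIs and POIs visited by fewer than 10 users. Statistics on the data lead to the following three observations: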

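Users tend to explore new POIs.
Temporal locality: the time interval between a user's visits to two consecutive POIs is usually short.
Spatial locality: two POIs that a user visits consecutively are usually not far apart.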
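The preprocessing step described above (dropping users with fewer than 10 POIs and POIs with fewer than 10 visitors) could be sketched as follows. This is only an illustration: the `(user, poi)` pair format and the function name `filter_checkins` are assumptions, and the filter is applied once, since the post does not say whether it is repeated until both conditions hold.

```python
from collections import defaultdict


def filter_checkins(checkins, min_pois=10, min_users=10):
    """Single-pass filter over (user, poi) pairs (a hypothetical, simplified format).

    Keeps a check-in only if its user visited at least min_pois distinct POIs
    and its POI was visited by at least min_users distinct users.
    """
    pois_per_user = defaultdict(set)
    users_per_poi = defaultdict(set)
    for user, poi in checkins:
        pois_per_user[user].add(poi)
        users_per_poi[poi].add(user)
    return [
        (user, poi)
        for user, poi in checkins
        if len(pois_per_user[user]) >= min_pois
        and len(users_per_poi[poi]) >= min_users
    ]


# Toy data with thresholds of 2 instead of 10 so the effect is visible.
toy = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "c")]
kept = filter_checkins(toy, min_pois=2, min_users=2)
```

In the toy data, user `u3` and POI `c` fall below both thresholds, so only the four check-ins of `u1` and `u2` remain.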

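When two check-ins happen within a short time, it is reasonable to assume that the Markov property holds, that is, the next POI is strongly influenced by the current POI. Based on this short-term Markov property and people's tendency to explore new POIs, the recommendation problem of the paper is defined as follows: given a user u and the user's current location l, choose a new POI from those that u has not visited and recommend it to u. If the task were only to recommend the next POI, recommending the POIs that u visits most frequently could already give fairly high accuracy. That approach does not apply here, because the recommended POI must be new and the transition probabilities have to be estimated from sparser historical data. Next new POI recommendation is therefore harder than next POI recommendation.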

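We first introduce the pair-wise ranking metric embedding used to model location transitions. Metric embedding models are well suited to sparse data and to unobserved data. Each real-world POI is represented by a point in a latent space, and the Euclidean distance between two POIs in that space represents the probability of a transition between them: the smaller the distance, the larger the probability. With all POIs embedded in the latent space, the model can estimate transition probabilities and can also assign meaningful probabilities to transitions that were never observed. In the metric embedding model each POI l is represented by a K-dimensional vector X^S(l), and the task is to infer these vectors from the visit sequences. Writing D^S_{l_i,l_j} = ||X^S(l_i) - X^S(l_j)||^2 for the squared Euclidean distance between two POIs, the transition probability is:
$$P(l_j \mid l_i) = \frac{\exp\left(-D^S_{l_i,l_j}\right)}{Z(l_i)}, \qquad Z(l_i) = \sum_{n=1}^{|L|} \exp\left(-D^S_{l_i,l_n}\right)$$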
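As a small illustration of a distance-based transition probability, the sketch below assumes the softmax form exp(-d^2)/Z over squared Euclidean distances. The function name `transition_probs` and the random toy embedding are assumptions made for the example, not part of the post's code.

```python
import numpy as np


def transition_probs(X, i):
    """Probability of moving from POI i to every POI j (including i itself).

    X is an (n_pois, K) array of latent coordinates. The probability is
    proportional to exp(-squared Euclidean distance), normalized over all POIs.
    """
    d2 = np.sum((X - X[i]) ** 2, axis=1)
    scores = np.exp(-d2)
    return scores / scores.sum()


rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.01, size=(5, 3))  # 5 toy POIs in a 3-dimensional space
p = transition_probs(X, 0)
d2_from_0 = np.sum((X - X[0]) ** 2, axis=1)
```

Sorting POIs by decreasing probability gives the same order as sorting them by increasing distance, which is what makes a pure distance ranking sufficient for recommendation.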

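The expression above only describes POI transitions that were observed. Because the observed data is very sparse, the unobserved data must also be exploited so that the learned vectors reflect the transition probabilities between POIs. We assume that an observed next POI is more strongly related to the current POI than an unobserved one, so observed POIs should be ranked higher than unobserved POIs. This ordering is the basis for the ranking-based estimation.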

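The goal of POI recommendation is to rank all POIs and recommend the top-ranked one. Because the transition probability depends on a candidate only through its distance to the current POI, the probability comparison can be simplified to a distance comparison. For a current POI l^c, an observed next POI l_i and an unobserved POI l_j:
$$P(l_i \mid l^c) > P(l_j \mid l^c) \iff D^S_{l^c,l_i} < D^S_{l^c,l_j}$$
where D^S_{l^c,l} is the squared Euclidean distance between l^c and l in the latent space.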

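Next comes the personalized ranking metric embedding. The next POI depends not only on the current location but also on the user's preference, so a second latent space is introduced in which both users and POIs are embedded. The Euclidean distance between user u and location l in this space represents how much u likes l: the smaller the distance, the stronger the preference and the more likely the visit. Combining sequential information and personal preference, the score used to rank l as the user's next POI is the weighted distance:
$$D_{u,l^c,l} = \alpha D^P_{u,l} + (1-\alpha) D^S_{l^c,l}$$
where D^P_{u,l} is the squared Euclidean distance between u and l in the preference space, D^S_{l^c,l} is the squared Euclidean distance between the current POI l^c and l in the sequential space, and α in [0,1] balances the two (the code below uses α = 0.2).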

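As mentioned earlier, the Markov property only shows up when two visits are close in time. When the gap between the next visit and the current visit is large, the sequential information can be ignored and only the user's preference is considered, so the distance is refined to:
$$D_{u,l^c,l} = \begin{cases} D^P_{u,l} & \text{if } \Delta(l, l^c) > \tau \\ \alpha D^P_{u,l} + (1-\alpha) D^S_{l^c,l} & \text{otherwise} \end{cases}$$
where Δ(l, l^c) is the time difference between the two visits, τ is a time threshold (the code below uses 6 hours), D^P is the squared distance in the preference space and D^S is the squared distance in the sequential space.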
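The thresholded distance can be written as a small function. This is a sketch under assumptions: the function and argument names are invented for illustration, and the defaults α = 0.2 and τ = 6 hours are taken from the constants used in the post's code. Like that code, it uses the combined distance when the gap is strictly below τ.

```python
import numpy as np


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.dot(diff, diff))


def prme_distance(user_vec, cand_pref_vec, cur_seq_vec, cand_seq_vec,
                  dt_hours, alpha=0.2, tau=6.0):
    """Distance used to rank a candidate POI for a user at the current POI.

    user_vec / cand_pref_vec live in the preference space,
    cur_seq_vec / cand_seq_vec live in the sequential space.
    """
    d_pref = sq_dist(user_vec, cand_pref_vec)
    if dt_hours < tau:
        d_seq = sq_dist(cur_seq_vec, cand_seq_vec)
        return alpha * d_pref + (1.0 - alpha) * d_seq
    # Visits far apart in time: fall back to preference only.
    return d_pref


near_in_time = prme_distance([0, 0], [1, 0], [0, 0], [0, 2], dt_hours=1.0)
far_in_time = prme_distance([0, 0], [1, 0], [0, 0], [0, 2], dt_hours=10.0)
```

In the toy call the preference distance is 1 and the sequential distance is 4, so the combined value is 0.2 * 1 + 0.8 * 4 = 3.4, while the preference-only value is 1.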

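Finally, geography is added to the model. A geographic weight w is computed from the distance between the current POI and the candidate next POI: the closer the two are, the smaller w is and the more likely the visit. As before, when the time gap between the two visits is too large, the influence of the current POI on the next visit is ignored, and so is the geographic factor. The final distance is:
$$D^G_{u,l^c,l} = \begin{cases} D^P_{u,l} & \text{if } \Delta(l, l^c) > \tau \\ w_{l^c,l}\left(\alpha D^P_{u,l} + (1-\alpha) D^S_{l^c,l}\right) & \text{otherwise} \end{cases}, \qquad w_{l^c,l} = (1 + d_{l^c,l})^{0.25}$$
where d_{l^c,l} is the geographic distance between l^c and l (the code below uses the haversine distance in kilometres), D^P is the squared distance in the preference space, D^S is the squared distance in the sequential space, and Δ and τ are the time difference and time threshold.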
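To make the geographic weight concrete, here is a minimal sketch that assumes the form w = (1 + d)^0.25 used in the post's code, with d a haversine distance in kilometres. The names `haversine_km` and `geo_weight` are hypothetical, and note that this helper takes latitude first.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius


def geo_weight(dist_km, exponent=0.25):
    """Weight that grows slowly with distance and equals 1 at distance 0."""
    return (1.0 + dist_km) ** exponent


one_degree_at_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
w_here = geo_weight(0.0)
w_15km = geo_weight(15.0)
```

A candidate at the current location keeps its distance unchanged (w = 1), while a candidate 15 km away has its distance doubled (w = 2), which pushes far-away POIs down the ranking.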

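The optimization criterion follows Bayesian Personalized Ranking (BPR): the parameters are estimated by maximizing the posterior probability, with the logistic function σ modelling the conditional probability and a Gaussian prior on the parameters, which yields a regularization term that prevents overfitting. The objective is:
$$\Theta = \arg\max_{\Theta} \sum_{(u,\,l^c,\,l_i,\,l_j)} \ln \sigma\left(D_{u,l^c,l_j} - D_{u,l^c,l_i}\right) - \lambda \lVert \Theta \rVert^2$$
where the sum runs over training tuples in which l_i is the observed next POI and l_j is an unobserved one, D is the ranking distance of a candidate for user u at current POI l^c, Θ denotes all user and POI vectors, and λ is the regularization weight.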
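A BPR-style objective of this kind can be evaluated as in the sketch below, assuming the form sum of ln σ(z) minus λ times the squared norm of the parameters, where each z is the distance to an unobserved POI minus the distance to the observed one. The function names are illustrative assumptions.

```python
import numpy as np


def log_sigmoid(z):
    """Numerically stable ln(sigmoid(z)) = -ln(1 + exp(-z))."""
    return -np.logaddexp(0.0, -np.asarray(z, dtype=float))


def bpr_objective(z_values, params, lam):
    """Sum of ln sigmoid(z) over samples minus lam * squared L2 norm of all parameters.

    z_values: one z = D(unobserved) - D(observed) per training tuple.
    params:   iterable of parameter arrays (user and POI vectors).
    """
    reg = lam * sum(float(np.sum(np.square(p))) for p in params)
    return float(np.sum(log_sigmoid(z_values))) - reg


at_zero = bpr_objective([0.0], [np.zeros(2)], lam=0.003)
good_ranking = bpr_objective([2.0], [np.zeros(2)], lam=0.003)
bad_ranking = bpr_objective([-2.0], [np.zeros(2)], lam=0.003)
```

The objective is higher when observed POIs are closer than unobserved ones (positive z), and the regularization term pulls the value down as the vectors grow.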

Implementation:

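Computing the optimal parameters of the expression above directly with gradient descent is expensive, so the ranking principle described earlier is used instead. For a user u, a current location l^c and an observed next visit l_i, an unobserved location l_j is sampled at random. The probability that u visits the observed l_i from l^c should be larger than the probability of visiting the unobserved l_j, so for each sampled tuple the quantity to optimize becomes
$$z = D_{u,l^c,l_j} - D_{u,l^c,l_i}$$
where D is the ranking distance of a candidate POI. The larger z is, the smaller the distance to l_i (high probability) and the larger the distance to l_j (low probability), which is what we want. Stochastic gradient ascent on ln σ(z) with regularization therefore updates the parameters as:
$$\Theta \leftarrow \Theta + \gamma \frac{\partial}{\partial \Theta}\left(\ln \sigma(z) - \lambda \lVert \Theta \rVert^2\right)$$
where γ is the learning rate and λ is the regularization weight.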
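One stochastic update for the preference-only case could look like the sketch below. It assumes z = ||u - p_j||^2 - ||u - p_i||^2 and gradient ascent on ln σ(z); the learning rate 0.005 and the regularization factor 0.006 are the constants that appear in the post's code, and the function names are invented for the example. All gradients are computed from the old values before any vector is changed.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def z_pref(u, p_i, p_j):
    """z = D(u, unobserved p_j) - D(u, observed p_i), with squared Euclidean distances."""
    return float(np.sum((u - p_j) ** 2) - np.sum((u - p_i) ** 2))


def sgd_step(u, p_i, p_j, lr=0.005, reg=0.006):
    """One ascent step on ln(sigmoid(z)) with an L2 penalty."""
    d = 1.0 - sigmoid(z_pref(u, p_i, p_j))   # derivative of ln(sigmoid(z)) w.r.t. z
    g_u = d * 2.0 * (p_i - p_j) - reg * u    # dz/du   = 2 (p_i - p_j)
    g_i = d * 2.0 * (u - p_i) - reg * p_i    # dz/dp_i = 2 (u - p_i)
    g_j = d * 2.0 * (p_j - u) - reg * p_j    # dz/dp_j = 2 (p_j - u)
    return u + lr * g_u, p_i + lr * g_i, p_j + lr * g_j


u = np.array([0.0, 0.0])
p_i = np.array([1.0, 0.0])   # observed next POI, currently too far from the user
p_j = np.array([0.5, 0.0])   # unobserved POI, currently closer than p_i
z_before = z_pref(u, p_i, p_j)
u2, p_i2, p_j2 = sgd_step(u, p_i, p_j)
z_after = z_pref(u2, p_i2, p_j2)
```

After the step the observed POI has moved toward the user and the unobserved one away from it, so z increases.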

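In the implementation, the data tuples are built first: user, current location and observed next location, plus a randomly chosen unobserved location. The user and POI vectors are then randomly initialized from a normal distribution with mean 0 and variance 0.01 (note that the code below passes 0.01 to np.random.normal, which interprets it as the standard deviation) in two latent spaces, one for sequential relationships and one for user preference. The parameters are then updated with the update rule above until convergence, that is, until the loss stops decreasing (the code below simply runs a fixed 500 passes over the data). After training, the coordinates of users and POIs in the two spaces are returned.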
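The initialization and negative-sampling steps can be sketched as follows. The helper names `init_vectors` and `sample_negative` are hypothetical, and `std=0.01` mirrors how the post's code calls the normal sampler; treat both as assumptions rather than the paper's exact setup.

```python
import numpy as np


def init_vectors(names, dim, rng, std=0.01):
    """One random vector per name, drawn from N(0, std^2)."""
    return {name: rng.normal(0.0, std, size=dim) for name in names}


def sample_negative(all_pois, excluded, rng):
    """Draw a POI uniformly at random until it is not in the excluded set."""
    if all(p in excluded for p in all_pois):
        raise ValueError("no POI left to sample")
    while True:
        # rng.integers(n) returns an integer in [0, n).
        cand = all_pois[rng.integers(len(all_pois))]
        if cand not in excluded:
            return cand


rng = np.random.default_rng(42)
pois = ["a", "b", "c", "d"]
seq_space = init_vectors(pois, dim=60, rng=rng)
negative = sample_negative(pois, excluded={"a", "b", "c"}, rng=rng)
```

Rejection sampling like this is cheap when the excluded set is small compared with the number of POIs, which is the usual case for sparse check-in data.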

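At test time, to predict which POI user u will visit next, the distance D is computed for every POI the user has not visited, using the coordinates learned in the two spaces. The candidates are sorted by D and the POI with the smallest D is recommended to the user.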
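The ranking step at test time can also be written in vectorized form. The sketch below assumes POIs are stored as rows of two matrices (preference space and sequential space) and are referred to by row index; the function name `recommend` and this storage layout are assumptions, not the layout used in the post's code.

```python
import numpy as np


def recommend(user_vec, pref_mat, cur_seq_vec, seq_mat, visited_idx,
              dt_hours, alpha=0.2, tau=6.0, top_n=1):
    """Return the indices of the top_n unvisited POIs with the smallest distance."""
    d_pref = np.sum((pref_mat - user_vec) ** 2, axis=1)
    if dt_hours < tau:
        d_seq = np.sum((seq_mat - cur_seq_vec) ** 2, axis=1)
        score = alpha * d_pref + (1.0 - alpha) * d_seq
    else:
        score = d_pref.copy()
    # Already visited POIs can never be recommended as "new".
    score[np.array(sorted(visited_idx), dtype=int)] = np.inf
    return np.argsort(score)[:top_n]


pref_mat = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
seq_mat = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 5.0]])
user_vec = np.array([0.0, 0.0])
top = recommend(user_vec, pref_mat, seq_mat[0], seq_mat, visited_idx={0}, dt_hours=1.0)
```

With POI 0 marked as visited, POI 1 has distance 0.2 * 1 + 0.8 * 1 = 1.0 and POI 2 has 0.2 * 9 + 0.8 * 25 = 21.8, so POI 1 is recommended.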

The required dataset can be downloaded from http://www.ntu.edu.sg/home/gaocong/data/poidata.zip. The code is as follows:

import os
import numpy as np
from math import radians, cos, sin, asin, sqrt, pow, log

def getUser():
    fr=open("user.txt",'r')
    user=[]
    for line in fr.readlines():
        user.append(line.strip())
    fr.close()
    return user

def getShop():
    fr=open("shop.txt",'r')
    shop=[]
    for line in fr.readlines():
        shop.append(line.strip())
    fr.close()
    return shop

def getTrainTuple(fileName):
    data=[]
    observedPOI={}
    exUser=''
    exShop=''
    exTime=''
    fr=open(fileName)
    for line in fr.readlines():
        lineArr=line.strip().split('\t')
        user=lineArr[0]
        shop=lineArr[1]
        time=float(lineArr[4])*24+float(lineArr[3].split(':')[0])+float(lineArr[3].split(':')[1])/60.0
        if user==exUser:
            newTuple=[user,exShop,shop,exTime,time]
            data.append(newTuple)
            if user not in observedPOI.keys():
                observedPOI[user]={}
            if exShop not in observedPOI[user].keys():
                observedPOI[user][exShop]=[]
            observedPOI[user][exShop].append(shop)
            exShop=shop
            exTime=time
        else:
            exUser=user
            exShop=shop
            exTime=time
    fr.close()
    return data,observedPOI

def getTestTuple(fileName):
    data=[]
    exUser=''
    exShop=''
    exTime=''
    fr=open(fileName)
    for line in fr.readlines():
        lineArr=line.strip().split('\t')
        user=lineArr[0]
        shop=lineArr[1]
        time=float(lineArr[4])*24+float(lineArr[3].split(':')[0])+float(lineArr[3].split(':')[1])/60.0 
        if user==exUser:
            newTuple=[user,exShop,shop,exTime,time]
            data.append(newTuple)
            exShop=shop
            exTime=time
        else:
            exUser=user
            exShop=shop
            exTime=time
    fr.close()
    return data

def initVec():
    userP={}
    shopP={}
    shopS={}
    user=getUser()
    shop=getShop()
    for item in user:
        userP[item]=np.random.normal(0,0.01,60)
    for item in shop:
        shopP[item]=np.random.normal(0,0.01,60)
        shopS[item]=np.random.normal(0,0.01,60)
    return userP,shopP,shopS

def loadFileWithDic(fileName):
    fr=open(fileName,'r')
    data={}
    i=0
    arr=[]
    key=''
    for line in fr.readlines():
        if i==0:
            key=line.strip().split('\t')[0]
            temp=line.strip().split('\t')[1][1:].split(' ')
            for item in temp:
                if item!='':
                    arr.append(float(item))
            i=1
        else:
            temp=line.strip().split(' ')
            for item in temp:
                if item!='' and item!=']':
                    if item[-1]==']':
                        arr.append(float(item[:-1]))
                    else:
                        arr.append(float(item))
            if len(arr)==60:
                i=0
                data[key]=np.array(arr)
                arr=[]
    fr.close()
    return data

def getVisited(fileName):
    fr=open(fileName,'r')
    visited={}
    for line in fr.readlines():
        lineArr=line.strip().split('\t')
        user=lineArr[0]
        shop=lineArr[1]
        if user not in visited.keys():
            visited[user]=[]
        if shop not in visited[user]:
            visited[user].append(shop)
    fr.close()
    return visited

def sigmoid(x):
    return 1.0/(1.0+np.exp(float(-x)))

def Edis(a,b):
    # Squared Euclidean distance between two vectors (no square root is taken).
    diff=a-b
    return float(np.dot(diff,diff))

def train():
    userP,shopP,shopS=initVec()
    data,observedPOI=getTrainTuple('train.txt')
    shop=getShop()
    for i in range(500):
        print("The "+str(i+1)+" is done!")
        for item in data:
            (user,exShop,Cshop,exTime,time)=item
            # Sample an unobserved POI; np.random.randint(n) returns an integer in [0, n).
            shopJ=shop[np.random.randint(len(shop))]
            while shopJ==exShop or shopJ in observedPOI[user][exShop]:
                shopJ=shop[np.random.randint(len(shop))]
            if time-exTime<6:
                z=0.2*(Edis(userP[user],shopP[shopJ])-Edis(userP[user],shopP[Cshop]))+0.8*(Edis(shopS[exShop],shopS[shopJ])-Edis(shopS[exShop],shopS[Cshop]))
                d=1-sigmoid(z)
                userP[user]=userP[user]+0.005*(d*0.4*(shopP[Cshop]-shopP[shopJ])-0.006*userP[user])
                shopP[Cshop]=shopP[Cshop]+0.005*(d*0.4*(userP[user]-shopP[Cshop])-0.006*shopP[Cshop])
                shopP[shopJ]=shopP[shopJ]+0.005*(d*0.4*(shopP[shopJ]-userP[user])-0.006*shopP[shopJ])
                shopS[exShop]=shopS[exShop]+0.005*(d*1.6*(shopS[Cshop]-shopS[shopJ])-0.006*shopS[exShop])
                shopS[Cshop]=shopS[Cshop]+0.005*(d*1.6*(shopS[exShop]-shopS[Cshop])-0.006*shopS[Cshop])
                shopS[shopJ]=shopS[shopJ]+0.005*(d*1.6*(shopS[shopJ]-shopS[exShop])-0.006*shopS[shopJ])
            else:
                z=Edis(userP[user],shopP[shopJ])-Edis(userP[user],shopP[Cshop])
                d=1-sigmoid(z)
                userP[user]=userP[user]+0.005*(d*2*(shopP[Cshop]-shopP[shopJ])-0.006*userP[user])
                shopP[Cshop]=shopP[Cshop]+0.005*(d*2*(userP[user]-shopP[Cshop])-0.006*shopP[Cshop])
                shopP[shopJ]=shopP[shopJ]+0.005*(d*2*(shopP[shopJ]-userP[user])-0.006*shopP[shopJ])
    fr=open('userP1000.txt','w')
    for key in userP.keys():
        fr.write(str(key)+'\t'+str(userP[key])+'\n')
    fr.close()
    fr=open('shopP1000.txt','w')
    for key in shopP.keys():
        fr.write(str(key)+'\t'+str(shopP[key])+'\n')
    fr.close()
    fr=open('shopS1000.txt','w')
    for key in shopS.keys():
        fr.write(str(key)+'\t'+str(shopS[key])+'\n')
    fr.close()
    return userP,shopP,shopS

def test():
    userP,shopP,shopS=train()
    #userP=loadFileWithDic('userP.txt')
    #shopS=loadFileWithDic('shopS.txt')
    #shopP=loadFileWithDic('shopP.txt')
    data=getTestTuple("test.txt")
    visited=getVisited("train.txt")
    user=getUser()
    shop=getShop()
    allNum=0
    corNum=0
    count=0
    for item in data:
        (Cuser,exShop,Cshop,exTime,time)=item
        if Cuser not in user or exShop not in shop or Cshop not in shop or Cshop in visited[Cuser] or Cshop==exShop:
            continue
        allNum=allNum+1
        if exShop not in visited[Cuser]:
            visited[Cuser].append(exShop)
        poss={}
        count=count+1
        for pShop in shop:
            if pShop in visited[Cuser] or pShop==exShop:
                continue
            if (time-exTime)<6:
                poss[pShop]=0.2*Edis(userP[Cuser],shopP[pShop])+0.8*Edis(shopS[exShop],shopS[pShop])
            else:
                poss[pShop]=Edis(userP[Cuser],shopP[pShop])
        ans=min(poss.items(), key=lambda x: x[1])[0]
        if ans==Cshop:
            corNum=corNum+1
            print(str(corNum)+" : "+str(count))
    print("The correct rate is "+str((100.0*float(corNum))/float(allNum))+"%.")

def haversine(lon1, lat1, lon2, lat2):
    # Great-circle distance in kilometres; arguments are in degrees, longitude first.
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371  # mean Earth radius in km
    return c * r

def getPosition():
    fileList=['New/FourSquare/train.txt','New/FourSquare/test.txt','New/FourSquare/tune.txt']
    position={}
    for fileName in fileList:
        fr=open(fileName,'r')
        for line in fr.readlines():
            shop=line.strip().split('\t')[1]
            if shop not in position.keys():
                lat=float(line.strip().split('\t')[2].split(',')[0])
                lon=float(line.strip().split('\t')[2].split(',')[1])
                position[shop]={'lat':lat,'lon':lon}
        fr.close()
    return position

def trainG():
    userP,shopP,shopS=initVec()
    data,observedPOI=getTrainTuple('train.txt')
    position=getPosition()
    shop=getShop()
    for i in range(500):
        print("The "+str(i+1)+" is done!")
        for item in data:
            (user,exShop,Cshop,exTime,time)=item
            # Sample an unobserved POI; np.random.randint(n) returns an integer in [0, n).
            shopJ=shop[np.random.randint(len(shop))]
            while shopJ==exShop or shopJ in observedPOI[user][exShop]:
                shopJ=shop[np.random.randint(len(shop))]
            if time-exTime<6:
                # haversine expects longitude first, then latitude.
                d1=haversine(position[exShop]['lon'],position[exShop]['lat'],position[Cshop]['lon'],position[Cshop]['lat'])
                d2=haversine(position[exShop]['lon'],position[exShop]['lat'],position[shopJ]['lon'],position[shopJ]['lat'])
                w1=pow(1+d1,0.25)
                w2=pow(1+d2,0.25)
                z=0.2*(w2*Edis(userP[user],shopP[shopJ])-w1*Edis(userP[user],shopP[Cshop]))+0.8*(w2*Edis(shopS[exShop],shopS[shopJ])-w1*Edis(shopS[exShop],shopS[Cshop]))
                d=1-sigmoid(z)
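                # dz/du = 0.4*(w2*(u-p_j)-w1*(u-p_i)); the u terms only cancel when w1==w2.
                userP[user]=userP[user]+0.005*(d*0.4*(w2*(userP[user]-shopP[shopJ])-w1*(userP[user]-shopP[Cshop]))-0.006*userP[user])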
                shopP[Cshop]=shopP[Cshop]+0.005*(d*0.4*w1*(userP[user]-shopP[Cshop])-0.006*shopP[Cshop])
                shopP[shopJ]=shopP[shopJ]+0.005*(d*0.4*w2*(shopP[shopJ]-userP[user])-0.006*shopP[shopJ])
                shopS[exShop]=shopS[exShop]+0.005*(d*1.6*(w2*(shopS[exShop]-shopS[shopJ])-w1*(shopS[exShop]-shopS[Cshop]))-0.006*shopS[exShop])
                shopS[Cshop]=shopS[Cshop]+0.005*(d*1.6*w1*(shopS[exShop]-shopS[Cshop])-0.006*shopS[Cshop])
                shopS[shopJ]=shopS[shopJ]+0.005*(d*1.6*w2*(shopS[shopJ]-shopS[exShop])-0.006*shopS[shopJ])
            else:
                z=Edis(userP[user],shopP[shopJ])-Edis(userP[user],shopP[Cshop])
                d=1-sigmoid(z)
                userP[user]=userP[user]+0.005*(d*2*(shopP[Cshop]-shopP[shopJ])-0.006*userP[user])
                shopP[Cshop]=shopP[Cshop]+0.005*(d*2*(userP[user]-shopP[Cshop])-0.006*shopP[Cshop])
                shopP[shopJ]=shopP[shopJ]+0.005*(d*2*(shopP[shopJ]-userP[user])-0.006*shopP[shopJ])
    fr=open('userP.txt','w')
    for key in userP.keys():
        fr.write(str(key)+'\t'+str(userP[key])+'\n')
    fr.close()
    fr=open('shopP.txt','w')
    for key in shopP.keys():
        fr.write(str(key)+'\t'+str(shopP[key])+'\n')
    fr.close()
    fr=open('shopS.txt','w')
    for key in shopS.keys():
        fr.write(str(key)+'\t'+str(shopS[key])+'\n')
    fr.close()
    return userP,shopP,shopS

def testG():
    userP,shopP,shopS=trainG()
    #userP=loadFileWithDic('userP.txt')
    #shopS=loadFileWithDic('shopS.txt')
    #shopP=loadFileWithDic('shopP.txt')
    data=getTestTuple("test.txt")
    visited=getVisited("train.txt")
    user=getUser()
    shop=getShop()
    position=getPosition()
    allNum=0
    corNum=0
    count=0
    for item in data:
        (Cuser,exShop,Cshop,exTime,time)=item
        if Cuser not in user or exShop not in shop or Cshop not in shop or Cshop in visited[Cuser] or Cshop==exShop:
            continue
        allNum=allNum+1
        if exShop not in visited[Cuser]:
            visited[Cuser].append(exShop)
        poss={}
        count=count+1
        for pShop in shop:
            if pShop in visited[Cuser] or pShop==exShop:
                continue
            if (time-exTime)<6:
                d=haversine(position[exShop]['lon'],position[exShop]['lat'],position[pShop]['lon'],position[pShop]['lat'])
                w=pow(1+d,0.25)
                poss[pShop]=w*(0.2*Edis(userP[Cuser],shopP[pShop])+0.8*Edis(shopS[exShop],shopS[pShop]))
            else:
                poss[pShop]=Edis(userP[Cuser],shopP[pShop])
        ans=min(poss.items(), key=lambda x: x[1])[0]
        if ans==Cshop:
            corNum=corNum+1
            print(str(corNum)+" : "+str(count))
    print("The correct rate is "+str((100.0*float(corNum))/float(allNum))+"%.")