【转载】个性化PageRank的直观解释及其在推荐中的应用

在此博客文章中,我将讨论个性化页面排名(personalized pagerank),其定义和应用。 让我们从一些基本术语和定义开始。

定义

1. 随机游走

给定一个图,随机游走是一个从随机顶点开始的迭代过程,并且在每个步骤中,要么跟随当前顶点的随机输出边缘,要么跳到随机顶点。 跳跃部分很重要,因为某些顶点可能没有任何向外的边缘,因此遍历将在那些位置终止而不会跳转到另一个顶点。

在下图中,示例随机游走可以从A开始,沿着A到B的输出边缘,沿着B到E的输出边缘,沿着E的一条向外的边缘到达B,跳到D,然后沿着A的向外边缘 D到F,然后跳到C。

在这里插入图片描述

2. PageRank (PR)

PageRank (PR) 衡量一种特定类型的随机游走的平稳分布,该随机游走从随机顶点开始,并且在每次迭代中均以预定义的概率 p p p 跳至随机顶点,并且概率 1 − p 1-p 1p 跟随当前顶点的随机输出边缘。

PageRank通常是在具有均质边的图上进行的,例如,具有“A linksTo B”、“A reference B”、“A likes B”、“A endors B”,或者 “A readsBlogsWrittenBy B”、“A hasImpactOn B”等形式的边的图。

在图上运行PageRank算法将生成顶点的排名(PR值),并且数字PR值可以视为顶点的“重要性”或“相关性”。 通常将具有高PR值的顶点比具有低PR值的顶点更“重要”或更“有影响力”或具有更高的“相关性”。

3. 个性化PageRank (PPR)

个性化PageRank (PPR) 与PR相同,不同之处在于跳转会返回到给定的一组起始顶点之一。 在某种程度上,与PR中执行的随机遍历相比,PPR中的遍历偏向于这组起始顶点(或者说更个性化),并且更加局限。

推荐系统中的PPR

现在,我们已经了解了随机游走、PR和PPR的术语。 让我们看一下如何将PPR用作推荐。 假设我们有以下客户购买产品图(带有反向边)。 假设我们要从用户John开始PPR。 从直觉上讲,PPR中的随机游走很可能会接触到John购买的产品以及购买这些产品的其他用户,以及这些用户购买的产品,依此类推。 从某种意义上讲,这种游走方式能够吸引与John相似的用户,因为他们购买了相同(或相似或相关)的产品。 此外,PPR中的漫游会发现相似/相关产品,因为它们是由相同(或相似或相关)用户购买的。

在这里插入图片描述

使用Oracle大数据空间和图属性图的推荐工作流程

以下Java代码段可以在Oracle Big Data Spatial和Graph支持的Groovy环境中执行。

// First, execute gremlin-opg-hbase.sh or gremlin-opg-nosql.sh

// ...

// Get an instance of OraclePropertyGraph which is a key Java

// class to manage property graph data

opg = OraclePropertyGraph.getInstance(cfg);


opg.clearRepository();   // remove all vertices and edges


// Add vertices for the users and products

vJohn=opg.addVertex(1l); vJohn.setProperty("name","John");
vJohn.setProperty("age",10i);

vMary=opg.addVertex(2l); vMary.setProperty("name","Mary");
vMary.setProperty("sex","F");

vJill=opg.addVertex(3l); vJill.setProperty("name","Jill");
vJill.setProperty("city","Boston");

vTodd=opg.addVertex(4l); vTodd.setProperty("name","Todd");
vTodd.setProperty("student",true);



sdf = new java.text.SimpleDateFormat("mm/dd/yyyy");

vPhone=opg.addVertex(10l); vPhone.setProperty("type","Prod");
vPhone.setProperty("desc","iPhone5");

                          
vPhone.setProperty("released",sdf.parse("02/21/2012"));

vKindle=opg.addVertex(11l);
vKindle.setProperty("type","Prod");
vKindle.setProperty("desc","Kindle Fire");

vFitbit=opg.addVertex(12l);
vFitbit.setProperty("type","Prod");
vFitbit.setProperty("desc","Fitbit Flex Wireless");

                           
vFitbit.setProperty("rating","****");

vPotter=opg.addVertex(13l);
vPotter.setProperty("type","Prod");
vPotter.setProperty("desc","Harry Potter");

vHobbit=opg.addVertex(14l);
vHobbit.setProperty("type","Prod");
vHobbit.setProperty("desc","Hobbit");


// List the vertices

opg.getVertices()


==>Vertex ID 14 {type:str:Prod, desc:str:Hobbit}

==>Vertex ID 2 {name:str:Mary, sex:str:F}

==>Vertex ID 4 {name:str:Todd, student:bol:true}

==>Vertex ID 11 {type:str:Prod, desc:str:Kindle Fire}

==>Vertex ID 10 {type:str:Prod, desc:str:iPhone5,
released:dat:Sat Jan 21 00:02:00 EST 2012}

==>Vertex ID 12 {type:str:Prod, desc:str:Fitbit Flex
Wireless, rating:str:****}

==>Vertex ID 13 {type:str:Prod, desc:str:Harry Potter}

==>Vertex ID 1 {name:str:John, age:int:10}

==>Vertex ID 3 {name:str:Jill, city:str:Boston}


// Add edges with Blueprints Java APIs. Note that if the number of

// edges is much bigger, then use the parallel data loader & flat file format.


opg.addEdge(1l,  vJohn, vPhone, "purchased");

opg.addEdge(2l,  vPhone, vJohn, "purchased by");

opg.addEdge(3l,  vJohn, vKindle, "purchased");

opg.addEdge(4l,  vKindle, vJohn, "purchased by");

opg.addEdge(5l,  vMary, vPhone, "purchased");

opg.addEdge(6l,  vPhone, vMary, "purchased by");

opg.addEdge(7l,  vMary, vKindle, "purchased");

opg.addEdge(8l,  vKindle, vMary, "purchased by");

opg.addEdge(9l,  vMary, vFitbit, "purchased");

opg.addEdge(10l, vFitbit, vMary, "purchased by");

opg.addEdge(11l,  vJill, vPhone, "purchased");

opg.addEdge(12l,  vPhone, vJill, "purchased by");

opg.addEdge(13l,  vJill, vKindle, "purchased");

opg.addEdge(14l,  vKindle, vJill, "purchased by");

opg.addEdge(15l,  vJill, vFitbit, "purchased");

opg.addEdge(16l, vFitbit, vJill, "purchased by");

opg.addEdge(17l, vTodd, vFitbit, "purchased");

opg.addEdge(18l, vFitbit, vTodd, "purchased by");

opg.addEdge(19l, vTodd, vPotter, "purchased");

opg.addEdge(20l, vPotter, vTodd, "purchased by");

opg.addEdge(21l, vTodd, vHobbit, "purchased");

opg.addEdge(22l, vHobbit, vTodd, "purchased by");

opg.commit();

// Create in-memory analytics session and analyst

session=Pgx.createSession("session_ID_1");

analyst=session.createAnalyst();

// Read the graph from database into memory

pgxGraph = session.readGraphWithProperties(opg.getConfig());


// We are going to get a recommendation for user John.

// Find this vertex with a simple query
v=opg.getVertices("name", "John").iterator().next();


// Add John to the start vertex set
vertexSet=pgxGraph.createVertexSet();

vertexSet.addAll(v.getId());


// Run personalized page rank using John as the start vertex

ppr=analyst.personalizedPagerank(pgxGraph, vertexSet, \

            0.0001 /*maxError*/, 0.85 /*dampingFactor*/, 1000);


// Examine the top 9 entries of the PPR output

// The vertices for John, iPhone5, and Kindle Fire have the highest PR

// values (shown below in the first column) because John is the starting point of PPR

// and iPhone5 and Kindle Fire were purchased by John.

// Mary and Jill also have high PR values because they made similar purchases as John.

//

it=ppr.getTopKValues(9).iterator(); while
(it.hasNext()) {

     entry=it.next(); vid=entry.getKey().getId();

     System.out.format("ppr=%.4f  vertex=%s\n",
entry.getValue(), opg.getVertex(vid));}

==>

ppr=0.2496  vertex=Vertex ID 1 {name:str:John, age:int:10}

ppr=0.1758  vertex=Vertex ID 11 {type:str:Prod,
desc:str:Kindle Fire}

ppr=0.1758  vertex=Vertex ID 10 {type:str:Prod,
desc:str:iPhone5, released:dat:Sat Jan 21 00:02:00 EST 2012}

ppr=0.1229  vertex=Vertex ID 3 {name:str:Jill,
city:str:Boston}

ppr=0.1229  vertex=Vertex ID 2 {name:str:Mary, sex:str:F}

ppr=0.0824  vertex=Vertex ID 12 {type:str:Prod,
desc:str:Fitbit Flex Wireless, rating:str:****}

ppr=0.0451  vertex=Vertex ID 4 {name:str:Todd,
student:bol:true}

ppr=0.0128  vertex=Vertex ID 13 {type:str:Prod, desc:str:Harry
Potter}

ppr=0.0128  vertex=Vertex ID 14 {type:str:Prod,
desc:str:Hobbit}


// Now, let's filter out users from this list 
// (assume we only want to recommend products for John, 
// not other similar buyers)

// If we exclude the top 2 products that John purchased
// before, the remaining three, Fitbit, Harry Potter,

// and Hobbit, are our recommendations for John, in that order.
// Note that for John, Fitbit has a much higher PR value
// than Harry Potter and Hobbit.

//

it=ppr.getTopKValues(9).iterator(); while (it.hasNext()) {

     entry=it.next(); vid=entry.getKey().getId();
vertex=opg.getVertex(vid);

     if ("Prod".equals(vertex.getProperty("type")))

     System.out.format("ppr=%.4f  vertex=%s\n",
entry.getValue(), opg.getVertex(vid));}

=>

ppr=0.1758  vertex=Vertex ID 11 {type:str:Prod,
desc:str:Kindle Fire}

ppr=0.1758  vertex=Vertex ID 10 {type:str:Prod,
desc:str:iPhone5, released:dat:Sat Jan 21 00:02:00 EST 2012}

ppr=0.0824  vertex=Vertex ID 12 {type:str:Prod,
desc:str:Fitbit Flex Wireless, rating:str:****}

ppr=0.0128  vertex=Vertex ID 13 {type:str:Prod, desc:str:Harry
Potter}

ppr=0.0128  vertex=Vertex ID 14 {type:str:Prod,
desc:str:Hobbit}


// So the final recommended products for John are,
Fitbit, Harry Potter, and Hobbit
ppr=0.0824  vertex=Vertex ID 12 {type:str:Prod,
desc:str:Fitbit Flex Wireless, rating:str:****}

ppr=0.0128  vertex=Vertex ID 13 {type:str:Prod, desc:str:Harry
Potter}

ppr=0.0128  vertex=Vertex ID 14 {type:str:Prod,
desc:str:Hobbit}
  • 4
    点赞
  • 12
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值