I. Algorithm
II. Python implementation
# import the required packages
import numpy as np
from scipy import sparse


# Approximating the k-hop Neighborhood Using Random Walks
def approximate_k():
    # build a graph with 10 entities (entity 8 has no edges)
    n_entity = 10
    # build the adjacency matrix of the graph
    mat_a = sparse.dok_matrix((n_entity, n_entity), dtype=int)
    mat_a[0, 1] = 1
    mat_a[0, 6] = 1
    mat_a[0, 7] = 1
    mat_a[1, 0] = 1
    mat_a[1, 2] = 1
    mat_a[1, 6] = 1
    mat_a[1, 9] = 1
    mat_a[2, 1] = 1
    mat_a[2, 3] = 1
    mat_a[3, 2] = 1
    mat_a[3, 4] = 1
    mat_a[4, 3] = 1
    mat_a[4, 5] = 1
    mat_a[5, 4] = 1
    mat_a[5, 9] = 1
    mat_a[6, 0] = 1
    mat_a[6, 1] = 1
    mat_a[6, 7] = 1
    mat_a[7, 0] = 1
    mat_a[7, 6] = 1
    mat_a[7, 9] = 1
    mat_a[9, 1] = 1
    mat_a[9, 5] = 1
    mat_a[9, 7] = 1
    mat_a = mat_a.tocsr()
    # initialize the sparse k-hop approximate adjacency matrix K
    mat_k = sparse.dok_matrix((n_entity, n_entity), dtype=int)
    n_rw = 7  # ω: number of random walks per entity
    k_hop = 2  # k: number of hops per random walk
    for i in range(0, n_entity):
        # get the neighborhood of entity e
        neighbors = mat_a[i]
        if len(neighbors.indices) == 0:
            # no neighbors: generate up to ω random entities
            walker = np.random.randint(n_entity, size=n_rw)
            mat_k[i, walker] = 1
        else:
            # random walk algorithm
            for _ in range(0, n_rw):
                # restart every walk from entity i and its own neighborhood
                walker = i
                current = neighbors
                for _ in range(0, k_hop):
                    idx = np.random.randint(len(current.indices))
                    walker = current.indices[idx]
                    current = mat_a[walker]
                # record the entity reached after k hops
                mat_k[i, walker] += 1
    mat_k = mat_k.tocsr()
    return mat_k
Corresponding graph:
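The same graph can also be written as an edge list, which is easier to check by eye than 24 separate assignments. The sketch below assumes the graph is undirected (every adjacency entry in the code above has a mirrored entry); the names `EDGES` and `build_adjacency` are illustrative and do not come from the original code.

```python
from scipy import sparse

# Undirected edges read off the adjacency entries in approximate_k().
# Entity 8 appears in no edge, so it has no neighbors.
EDGES = [
    (0, 1), (0, 6), (0, 7), (1, 2), (1, 6), (1, 9),
    (2, 3), (3, 4), (4, 5), (5, 9), (6, 7), (7, 9),
]


def build_adjacency(edges, n_entity):
    """Build a symmetric CSR adjacency matrix from undirected edges."""
    mat_a = sparse.dok_matrix((n_entity, n_entity), dtype=int)
    for u, v in edges:
        mat_a[u, v] = 1
        mat_a[v, u] = 1
    return mat_a.tocsr()


mat_a = build_adjacency(EDGES, n_entity=10)
# Print the neighbor list of every entity.
for i in range(mat_a.shape[0]):
    print(i, "->", sorted(mat_a[i].indices.tolist()))
```

Entity 8 is the only entity without neighbors, so it is the only one that reaches the random-entity branch of the algorithm.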
III. Discussion
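In the algorithm, when the traversal reaches an entity e that has no neighbors, I do not think the else branch that handles this case is necessary (in the code above, this case is the `if len(neighbors.indices) == 0` branch). The algorithm includes it anyway because its purpose is to meet the specific needs of negative sampling (see the term in reference 1). If the goal is only to approximate the k-hop neighborhood, I suggest deleting that branch.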
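A minimal sketch of the approximation-only variant described above, with the branch for entities without neighbors removed. The function name `approximate_k_hop_no_fallback` and its parameters are hypothetical, and stopping a walk that hits a dead end without recording it is my own assumption, not something the algorithm specifies.

```python
import numpy as np
from scipy import sparse


def approximate_k_hop_no_fallback(mat_a, n_rw, k_hop, rng=None):
    """Count random-walk endpoints; rows of isolated entities stay empty."""
    rng = np.random.default_rng() if rng is None else rng
    mat_a = sparse.csr_matrix(mat_a)
    n_entity = mat_a.shape[0]
    mat_k = sparse.dok_matrix((n_entity, n_entity), dtype=int)
    for i in range(n_entity):
        if len(mat_a[i].indices) == 0:
            continue  # no random fallback: leave row i empty
        for _ in range(n_rw):
            walker = i
            for _ in range(k_hop):
                neighbors = mat_a[walker].indices
                if len(neighbors) == 0:
                    break  # dead end (assumption: drop this walk)
                walker = rng.choice(neighbors)
            else:
                # the walk completed all k hops: record its endpoint
                mat_k[i, walker] += 1
    return mat_k.tocsr()
```

With this variant, a row of the result is non-empty only when random walks starting from that entity completed all k hops, so every non-zero entry is a genuine k-hop neighbor.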
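In my experiments, the results were not very good. The randomness is also strong: the outcome differs from run to run; sometimes the approximation is close, but most of the time it is poor. The main causes are probably that the graph is too small and that the parameters (k, ω) are not well tuned. The figure below shows the quality of the approximation compared with the true values (0 means the approximate value equals the true value; any other number is the difference between the two). It is a screenshot of one of the better runs.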
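One way such a comparison could be computed is sketched below. It assumes that the "true value" means reachability in exactly k steps, that is, the non-zero entries of the k-th power of the adjacency matrix; the post does not state which reference matrix was used, so this is only an assumption. The helper names `exact_k_hop` and `compare_with_exact` are hypothetical.

```python
import numpy as np
from scipy import sparse


def exact_k_hop(mat_a, k_hop):
    """Boolean array: True where j is reachable from i in exactly k_hop steps."""
    mat_a = sparse.csr_matrix(mat_a)
    power = sparse.identity(mat_a.shape[0], dtype=np.int64, format="csr")
    for _ in range(k_hop):
        power = power @ mat_a
    return power.toarray() > 0


def compare_with_exact(mat_k, mat_a, k_hop):
    """Return (diff, coverage) for an approximate k-hop matrix.

    diff is approx - exact on 0/1 indicators: 0 = agreement,
    -1 = a true k-hop neighbor that was missed, +1 = an extra entry.
    coverage is the fraction of true k-hop pairs that were found.
    """
    exact = exact_k_hop(mat_a, k_hop)
    approx = sparse.csr_matrix(mat_k).toarray() > 0
    diff = approx.astype(int) - exact.astype(int)
    n_exact = exact.sum()
    coverage = (approx & exact).sum() / n_exact if n_exact else 1.0
    return diff, coverage
```

Here `mat_k` would be the matrix returned by `approximate_k()` and `mat_a` the adjacency matrix built inside it. Each entity gets only ω random walks, so a small ω such as 7 can easily miss some true k-hop neighbors, which is consistent with the run-to-run variation described above. Entities without neighbors produce +1 entries, because their rows are filled with random entities.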
IV. References / Attachments
2. Algorithm design document