Link Analysis Based on PageRank
1. PageRank Theory
In general, the PageRank value of a web page $A$ can be computed iteratively with the following formula:

$$PR_n(A)=\frac{1-d}{N}+d\times\sum_{i=1}^{m}\frac{PR_{n-1}(T_i)}{C(T_i)} \tag{1}$$
where $N$ is the total number of web pages; $T_1,\dots,T_m$ are the pages that link to $A$; $PR_{n-1}(T_i)$ is the PageRank value of $T_i$ at iteration $n-1$; $C(T_i)$ is the number of outbound links of $T_i$; and $d$ is the damping factor.
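As a minimal sketch of how the per-page formula can be applied, the snippet below performs one update for every page of a small graph. The function name `pagerank_step`, the dictionary-based `links` format, and the three-page graph are assumptions chosen for illustration and are not part of the original article.

```python
def pagerank_step(links, pr_prev, d):
    """Apply the per-page PageRank formula once to every page.

    links maps each page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    """
    n = len(links)
    pr_next = {}
    for page in links:
        # Sum PR_{n-1}(T_i) / C(T_i) over every page T_i that links to `page`.
        incoming = sum(
            pr_prev[src] / len(targets)
            for src, targets in links.items()
            if page in targets
        )
        pr_next[page] = (1 - d) / n + d * incoming
    return pr_next


# Illustrative three-page graph: A -> B, A -> C, B -> C, C -> A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr = {page: 1 / len(links) for page in links}  # uniform starting values
pr = pagerank_step(links, pr, d=0.5)
print(pr)
```

Calling `pagerank_step` repeatedly until the values stop changing gives the iterative scheme that the formula describes. The sketch assumes every page has at least one outbound link.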
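With a small rearrangement of the formula above, we obtain the matrix form of the PageRank computation:

$$PR_n=\frac{1-d}{N}\boldsymbol{1}+d \times \boldsymbol{T}PR_{n-1} \tag{2}$$

where $PR_n$ is the vector of PageRank values of all pages at iteration $n$, $\boldsymbol{1}$ is the all-ones vector of length $N$, and $\boldsymbol{T}$ is the transition matrix.
This gives us a way to compute the PageRank values of all pages in one batch, which the next section implements.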
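One way to see why the matrix form reproduces the per-page formula is to write out the entries of the transition matrix. The entry notation $T_{ij}$ below is introduced here for illustration and assumes that every page has at least one outbound link:

$$T_{ij}=\begin{cases}\dfrac{1}{C(j)}, & \text{if page } j \text{ links to page } i\\[4pt] 0, & \text{otherwise}\end{cases}$$

$$(\boldsymbol{T}PR_{n-1})_i=\sum_{j=1}^{N}T_{ij}\,PR_{n-1}(j)=\sum_{j\,\to\,i}\frac{PR_{n-1}(j)}{C(j)}$$

The right-hand side is the sum over the pages linking to page $i$ that appears in the per-page formula, so row $i$ of the matrix equation is the per-page update for page $i$.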
2. PageRank Implementation
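- First, we build a `ConstructMatrix()` function that converts the input adjacency matrix of the web pages into the transition matrix:

```python
import numpy as np

def ConstructMatrix(adj):
    # Column j of adj.T is divided by page j's out-degree (assumed non-zero).
    return adj.T / adj.sum(axis=1)
```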
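- Next, following the matrix form in equation (2), the PageRank algorithm is straightforward to write:

```python
def ComputeRank(matrix, d, threshold):
    n = matrix.shape[0]
    v = np.ones(n) / n  # start from the uniform distribution
    iteration = 0
    print('PageRank Runs....')
    while True:
        # One update of the matrix form: u = d * T v + (1 - d) / N * 1
        u = np.matmul(matrix, v) * d + (1 - d) * np.ones(n) / n
        # L1 distance between successive iterates
        delt = np.linalg.norm(u - v, ord=1)
        v = u
        iteration += 1
        print('iteration = {0}, delt = {1}'.format(iteration, delt))
        if delt < threshold:
            break
    print('Train Finished!')
    return v
```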
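Here `matrix` is the transition matrix, `d` is the damping factor, and `threshold` is the convergence threshold: iteration stops once the L1 change between successive iterates falls below it.
- Then, given a network graph whose adjacency matrix is `adj` (row $i$ has a 1 in column $j$ when page $i$ links to page $j$), we run the code:

```python
adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [1, 0, 0]])
matrix = ConstructMatrix(adj)
ComputeRank(matrix, 0.5, 1e-10)
```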
- The output shows that the iteration converges after 22 rounds:

```
PageRank Runs....
iteration = 1, delt = 0.16666666666666663
iteration = 2, delt = 0.08333333333333331
iteration = 3, delt = 0.04166666666666663
iteration = 4, delt = 0.010416666666666685
iteration = 5, delt = 0.0026041666666666297
iteration = 6, delt = 0.0013020833333333703
iteration = 7, delt = 0.0006510416666666852
iteration = 8, delt = 0.00016276041666674068
iteration = 9, delt = 4.069010416668517e-05
iteration = 10, delt = 2.034505208331483e-05
iteration = 11, delt = 1.017252604162966e-05
iteration = 12, delt = 2.5431315104351704e-06
iteration = 13, delt = 6.357828775671592e-07
iteration = 14, delt = 3.1789143883909077e-07
iteration = 15, delt = 1.5894571941954538e-07
iteration = 16, delt = 3.9736429924275285e-08
iteration = 17, delt = 9.934107481068821e-09
iteration = 18, delt = 4.967053712778835e-09
iteration = 19, delt = 2.483526828633842e-09
iteration = 20, delt = 6.208817349140361e-10
iteration = 21, delt = 1.552203920951456e-10
iteration = 22, delt = 7.761025155872403e-11
Train Finished!
array([0.35897436, 0.25641026, 0.38461538])
```
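As a cross-check on the iterative result, the fixed point of the matrix iteration can also be obtained directly by solving a linear system. The sketch below is not part of the original article; the variable names `T` and `v_exact` are chosen here for illustration, and the approach assumes $d<1$ and that every page has at least one outbound link.

```python
import numpy as np

adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [1, 0, 0]])
d = 0.5
n = adj.shape[0]

# Column-stochastic transition matrix: entry (i, j) is 1 / out-degree(j)
# when page j links to page i, and 0 otherwise.
T = adj.T / adj.sum(axis=1)

# At convergence v = d * T v + (1 - d) / n * 1, which rearranges to
# (I - d * T) v = (1 - d) / n * 1.
v_exact = np.linalg.solve(np.eye(n) - d * T, (1 - d) / n * np.ones(n))
print(v_exact)
```

For this graph the solution is $(14/39,\ 10/39,\ 15/39)$, which agrees with the vector reported by the iterative run to the printed precision. A direct solve is practical only for small graphs; for large link graphs the iterative update is the usual choice.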
This concludes the introduction to PageRank.
This article is the author's original work; please credit the source when reposting: https://blog.csdn.net/BrilliantAntonio/article/details/117165930