Random Walk
Flow Formulation
A page is more important if it has more in-links.
First, define the rank of page $j$:
$$r_{j}=\sum_{i \rightarrow j} \frac{r_{i}}{d_{i}}$$
where $d_{i}$ is the out-degree of node $i$.
For the graph above (three pages $y$, $a$, $m$), the flow equations are:
$$\begin{aligned} r_{y} &= r_{y}/2 + r_{a}/2 \\ r_{a} &= r_{y}/2 + r_{m} \\ r_{m} &= r_{a}/2 \end{aligned}$$
For the system to have a unique solution, we add the normalization constraint:
$$r_{y}+r_{a}+r_{m}=1$$
However, solving this linear system directly does not scale: for a large graph, the cost of solving the equations becomes prohibitive.
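For the small example it is still feasible. A minimal sketch, assuming the three-page graph ($y$, $a$, $m$) above: rewrite the flow equations plus the normalization constraint as a linear system and solve it with NumPy.

```python
import numpy as np

# Flow equations for the hypothetical 3-page graph (y, a, m),
# written as A @ r = 0 with one redundant equation replaced by
# the normalization r_y + r_a + r_m = 1.
A = np.array([
    [ 0.5, -0.5,  0.0],   # r_y - r_y/2 - r_a/2 = 0
    [-0.5,  1.0, -1.0],   # r_a - r_y/2 - r_m   = 0
    [ 1.0,  1.0,  1.0],   # r_y + r_a + r_m     = 1
])
b = np.array([0.0, 0.0, 1.0])

r = np.linalg.solve(A, b)
print(r)   # [0.4, 0.4, 0.2]
```

For $n$ pages this costs $O(n^3)$ time, which is why the iterative method below is used in practice.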
Matrix Formulation
Define a matrix $M$: if page $i$ has $d_{i}$ out-links and page $i$ links to page $j$, then $M_{ji}=\frac{1}{d_{i}}$; otherwise $M_{ji}=0$. The vector $r$ holds the score of each page and satisfies $\sum_{i}r_{i}=1$.
$$\boldsymbol{r}=\boldsymbol{M} \cdot \boldsymbol{r}$$
From this equation, $\boldsymbol{r}$ is an eigenvector of $\boldsymbol{M}$ with eigenvalue $1$. Moreover, $1$ is the largest eigenvalue: $\boldsymbol{M}$ is column-stochastic (each column sums to $1$), so every eigenvalue $\lambda$ satisfies $|\lambda| \leq \|\boldsymbol{M}\|_{1} = 1$.
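This can be checked numerically. A sketch, again assuming the hypothetical three-page graph: build the column-stochastic $M$ and recover $r$ as the eigenvector for the largest eigenvalue.

```python
import numpy as np

# Column i of M holds 1/d_i in each row j that page i links to.
# For the assumed graph: y -> {y, a}, a -> {y, m}, m -> {a}.
M = np.array([
    [0.5, 0.5, 0.0],   # y receives from y (1/2) and a (1/2)
    [0.5, 0.0, 1.0],   # a receives from y (1/2) and m (1/1)
    [0.0, 0.5, 0.0],   # m receives from a (1/2)
])

eigvals, eigvecs = np.linalg.eig(M)
k = np.argmax(eigvals.real)        # the largest eigenvalue is 1
r = eigvecs[:, k].real
r /= r.sum()                       # rescale so the scores sum to 1
print(r)                           # ≈ [0.4, 0.4, 0.2]
```

The recovered scores match the solution of the flow equations, as expected.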
We solve for $\boldsymbol{r}$ by power iteration, as follows:
Suppose there are $N$ web pages.
Initialize: $r^{(0)} = [1/N, \ldots, 1/N]^{T}$
Iterate: $r^{(t+1)} = M \cdot r^{(t)}$
Stop when $\|r^{(t+1)} - r^{(t)}\|_{1} < \mu$
Proof: why power iteration works
Assume matrix $\boldsymbol{M}$ has $n$ eigenvectors $x_{1}, x_{2}, \ldots, x_{n}$ with corresponding eigenvalues $\lambda_{1} > \lambda_{2} > \cdots > \lambda_{n}$. Since the eigenvectors are linearly independent, they form a basis, so the initial vector can be written as $r^{(0)} = c_{1} x_{1} + c_{2} x_{2} + \cdots + c_{n} x_{n}$.
$$\begin{aligned} \boldsymbol{M} \boldsymbol{r}^{(0)} &= \boldsymbol{M}\left(c_{1} x_{1}+c_{2} x_{2}+\cdots+c_{n} x_{n}\right) \\ &= c_{1}\left(\boldsymbol{M} x_{1}\right)+c_{2}\left(\boldsymbol{M} x_{2}\right)+\cdots+c_{n}\left(\boldsymbol{M} x_{n}\right) \\ &= c_{1}\left(\lambda_{1} x_{1}\right)+c_{2}\left(\lambda_{2} x_{2}\right)+\cdots+c_{n}\left(\lambda_{n} x_{n}\right) \end{aligned}$$