Graph as Matrix: PageRank, Random Walks and Embeddings
0. Graph as Matrix
Investigate graph analysis and learning from a matrix perspective to
- Determine node importance via random walk (PageRank)
- Obtain node embeddings via matrix factorization (MF)
- View other node embeddings (e.g., Node2Vec) as MF
1. PageRank: Google Algorithm
0). Example: the Web as a Graph
- Web as a graph: nodes $\to$ web pages, edges $\to$ hyperlinks
- In the early days of the Web, links were navigational; today many links are transactional (post, comment, like, buy, …)
- The Web is a directed graph
1). Link Analysis Algorithms
- PageRank
- Personalized PageRank (PPR)
- Random Walk with Restarts
2). PageRank: the “Flow” Model
Idea: links as votes (a page is more important if it has more in-links)
- Links from important pages count more
- Recursive question: a vote from an important page is worth more
- Each link’s vote is proportional to the importance of its source page
- If page $i$ with importance $r_i$ has $d_i$ out-links, each link gets $r_i/d_i$ votes
- Page $j$'s own importance $r_j$ is the sum of the votes on its in-links:
$$r_j=\sum_{i\to j} \frac{r_i}{d_i}$$
3). PageRank: Matrix Formulation
- Stochastic adjacency matrix $M$: if $j \to i$, then $M_{ij}=\frac{1}{d_j}$; $M$ is a column stochastic matrix (columns sum to 1)
- Rank vector $r$: an entry per page; $r_i$ is the importance score of page $i$ ($\sum_i r_i = 1$)
- The flow equation can be written as
$$r = M \cdot r$$
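The stochastic adjacency matrix can be built directly from out-link lists. A minimal sketch, assuming a tiny 3-page web that is not from the notes:

```python
import numpy as np

# Hypothetical 3-page web: page 0 links to 1 and 2, page 1 to 2, page 2 to 0.
out_links = {0: [1, 2], 1: [2], 2: [0]}
N = len(out_links)

# Stochastic adjacency matrix: M[i, j] = 1/d_j whenever j -> i.
M = np.zeros((N, N))
for j, targets in out_links.items():
    for i in targets:
        M[i, j] = 1.0 / len(targets)

# Every column sums to 1, i.e., M is column stochastic.
print(M.sum(axis=0))
```

Each column $j$ splits page $j$'s importance evenly over its $d_j$ out-links, matching $M_{ij}=1/d_j$.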
4). Connection to Random Walk
Imagine a random web surfer
a) At any time $t$, the surfer is on some page $i$
b) At time $t+1$, the surfer follows an out-link from $i$ uniformly at random
c) Ends up on some page $j$ linked from $i$
d) Process repeats indefinitely
Let $p(t)$ denote the vector whose $i^{th}$ coordinate is the probability that the surfer is at page $i$ at time $t$. So $p(t)$ is a probability distribution over pages
5). The Stationary Distribution
Follow a link uniformly at random:
$$p(t+1) = M \cdot p(t)$$
Suppose the random walk reaches a state where
$$p(t+1) = M \cdot p(t) = p(t)$$
then $p(t)$ is the stationary distribution of the random walk
Since $r = M \cdot r$, $r$ is a stationary distribution for the random walk
6). Eigenvector Formulation
The flow equation is $1 \cdot r = M \cdot r$, so the rank vector $r$ is an eigenvector of the stochastic adjacency matrix $M$ with eigenvalue 1
PageRank = limiting distribution = principal eigenvector of $M$; $r$ is the principal eigenvector of $M$ with eigenvalue 1
2. PageRank: How to Solve
Given a graph with $n$ nodes, we use an iterative procedure:
- Assign each node an initial page rank
- Repeat until convergence ($\sum_i |r_i^{t+1} - r_i^t| < \epsilon$), where $r_j^{t+1}=\sum_{i\to j} \frac{r_i^t}{d_i}$
1). Power Iteration Method
Given a web graph with $N$ nodes, where the nodes are pages and edges are hyperlinks
- Initialize: $r^0=[1/N, \dots, 1/N]^T$
- Iterate: $r^{t+1}=M \cdot r^t$
- Stop when $|r^{t+1} - r^t| < \epsilon$
About 50 iterations is sufficient to estimate the limiting solution
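The iteration above can be sketched in NumPy; the 3-page web (0 → {1, 2}, 1 → 2, 2 → 0) is an assumed toy example:

```python
import numpy as np

def pagerank_power(M, eps=1e-9, max_iter=1000):
    """Power iteration: r^{t+1} = M r^t until the L1 change is below eps."""
    N = M.shape[0]
    r = np.full(N, 1.0 / N)              # r^0 = [1/N, ..., 1/N]^T
    for _ in range(max_iter):
        r_next = M @ r
        if np.abs(r_next - r).sum() < eps:
            return r_next
        r = r_next
    return r

# 3-page example: 0 -> {1, 2}, 1 -> 2, 2 -> 0 (column-stochastic M)
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
r = pagerank_power(M)
```

For this graph the flow equations give $r = [0.4, 0.2, 0.4]$, which the iteration recovers.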
2). Problems
a). Dead ends
Some pages have no out-links $\to$ importance ‘leaks out’
Solution: from dead ends, follow random teleport links with total probability 1.0
- Adjust the matrix accordingly
b). Spider traps
All out-links of some group of pages stay within the group $\to$ the group eventually absorbs all importance
Solution: at each time step, the random surfer has two options
- With probability $\beta$, follow a link at random
- With probability $1-\beta$, jump to a random page
- Common values for $\beta$ are in [0.8, 0.9]
The surfer will teleport out of a spider trap within a few time steps
c). Why teleports solve the problems
Spider traps are not a mathematical problem, but with traps the PageRank scores are not what we want
Solution: never get stuck in a spider trap by teleporting out of it in a finite number of steps
Dead ends are a problem: the matrix is not column stochastic, so our initial assumptions are not met
Solution: make the matrix column stochastic by always teleporting when there is nowhere else to go
3). Solution: Random Teleports
PageRank equation:
$$r_j^{t+1}=\sum_{i\to j}\beta \frac{r_i^t}{d_i}+(1-\beta)\frac{1}{N}$$
The Google Matrix $G$:
$$G=\beta M+(1-\beta)\left[\frac{1}{N}\right]_{N\times N}$$
We have a recursive problem: $r = G \cdot r$, and the power method still works
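A sketch of building $G$ and iterating on it; the 3-page example (with page 2 as a dead end) and the helper name are assumptions, not from the notes:

```python
import numpy as np

def google_matrix(A, beta=0.85):
    """Build G = beta*M + (1-beta)*[1/N]; dead-end columns of M become 1/N."""
    N = A.shape[0]
    d = A.sum(axis=0)                     # out-degree of page j (A[i, j] = 1 iff j -> i)
    M = np.zeros((N, N))
    for j in range(N):
        if d[j] > 0:
            M[:, j] = A[:, j] / d[j]      # split importance over out-links
        else:
            M[:, j] = 1.0 / N             # dead end: always teleport
    return beta * M + (1 - beta) / N

# Page 2 is a dead end: 0 -> {1, 2}, 1 -> 2
A = np.array([[0., 0., 0.],
              [1., 0., 0.],
              [1., 1., 0.]])
G = google_matrix(A)

r = np.full(3, 1 / 3)
for _ in range(200):                      # power iteration: r = G · r
    r = G @ r
```

Because the dead-end column is replaced by uniform teleports, $G$ is column stochastic and the iteration converges to $r = G \cdot r$.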
3. Random Walk with Restarts and Personalized PageRank
1). Proximity on Graphs
- PageRank: teleports with uniform probability to any node in the network
- Personalized PageRank: ranks proximity of nodes to the teleport nodes $S$
- Proximity on graphs: random walks with restarts, i.e., teleport back to the starting node
2). Random Walks
Idea
- Every node has some importance
- Importance gets evenly split among all edges and pushed to the neighbors
Given a set of QUERY_NODES, we simulate a random walk
- Make a step to a random neighbor and record the visit (visit count)
- With probability $\alpha$, restart the walk at one of the QUERY_NODES
- The nodes with the highest visit count have highest proximity to the QUERY_NODES
Benefits: the “similarity” considers
- Multiple connections
- Multiple paths
- Direct and indirect connections
- Degree of the node
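The simulation described above can be sketched as follows; the small example graph, the $\alpha$ value, and the function name are illustrative assumptions:

```python
import random
from collections import Counter

def random_walk_with_restarts(neighbors, query_nodes, alpha=0.15,
                              num_steps=100_000, seed=0):
    """Estimate proximity to QUERY_NODES via visit counts of a restarting walk."""
    rng = random.Random(seed)
    visits = Counter()
    node = rng.choice(query_nodes)
    for _ in range(num_steps):
        if rng.random() < alpha or not neighbors[node]:
            node = rng.choice(query_nodes)       # restart the walk
        else:
            node = rng.choice(neighbors[node])   # step to a random neighbor
        visits[node] += 1                        # record the visit
    return visits

# Toy undirected graph as adjacency lists; node 0 is the single query node
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
counts = random_walk_with_restarts(adj, query_nodes=[0])
```

Nodes close to the query node accumulate more visits; here node 0 should clearly outrank the peripheral node 3.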
3). PageRank Variants
PageRank: teleports to any node; all nodes have the same probability of the surfer landing there
$$S=[0.2, 0.2, 0.2, 0.2, 0.2]$$
Topic-specific PageRank, aka Personalized PageRank: teleports to a specific set of nodes; nodes can have different probabilities of the surfer landing there
$$S=[0.3, 0, 0.5, 0.2, 0]$$
Random walks with restarts: topic-specific PageRank where the teleport is always to the same node
$$S=[0, 0, 0, 1, 0]$$
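All three variants run the same power iteration and differ only in the teleport vector $S$. A minimal sketch, assuming a 5-node directed cycle (not a graph from the notes):

```python
import numpy as np

def personalized_pagerank(M, S, beta=0.85, num_iter=200):
    """Power iteration r = beta*M·r + (1-beta)*S with teleport distribution S."""
    S = np.asarray(S, dtype=float)
    r = S.copy()
    for _ in range(num_iter):
        r = beta * (M @ r) + (1 - beta) * S
    return r

# 5-node directed cycle 0 -> 1 -> 2 -> 3 -> 4 -> 0 (column-stochastic M)
M = np.roll(np.eye(5), 1, axis=0)

# Random walk with restarts: always teleport back to node 3
r = personalized_pagerank(M, S=[0, 0, 0, 1, 0])
```

Scores decay with distance from the restart node along the cycle, so node 3 ranks highest and its successor 4 outranks the farther node 2.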
4. Matrix Factorization and Node Embeddings
0). Relationship between Node Embeddings and Matrix Factorization
Node embeddings
Objective: maximize $z_v^T z_u$ for node pairs $(u, v)$ that are similar
Matrix factorization
Simplest node similarity: nodes $u, v$ are similar if they are connected by an edge ($z_v^T z_u = A_{uv}$ and therefore $Z^T Z = A$)
1). Matrix Factorization
- The embedding dimension (number of rows in $Z$) is much smaller than the number of nodes $n$
- Exact factorization $A=Z^TZ$ is generally not possible
- However, we can learn $Z$ approximately
- Objective: $\min_Z \|A-Z^TZ\|_2$
- Conclusion: an inner product decoder with node similarity defined by edge connectivity is equivalent to matrix factorization of $A$
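As an illustrative sketch of the objective (the notes only state it; the gradient-descent optimizer and the 6-node cycle graph are my assumptions), $Z$ can be fit by minimizing $\|A - Z^TZ\|$ directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3                               # n nodes, embedding dimension d << n

A = np.zeros((n, n))                      # adjacency matrix of an undirected 6-cycle
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

Z = 0.1 * rng.standard_normal((d, n))     # one d-dimensional embedding per column
loss_init = np.linalg.norm(A - Z.T @ Z)

lr = 0.01
for _ in range(5000):
    E = Z.T @ Z - A                       # symmetric residual
    Z -= lr * 4 * Z @ E                   # gradient of ||A - Z^T Z||_F^2 w.r.t. Z

loss_final = np.linalg.norm(A - Z.T @ Z)
```

Because $A$ here has negative eigenvalues while $Z^TZ$ is positive semidefinite, the loss decreases but cannot reach zero, illustrating why exact factorization is generally not possible.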
2). Random Walk-based Similarity
DeepWalk and node2vec have a more complex node similarity definition based on random walks
- DeepWalk is equivalent to matrix factorization of the following matrix expression:
$$\log\left(\mathrm{vol}(G)\left(\frac{1}{T}\sum_{r=1}^T (D^{-1}A)^r\right)D^{-1}\right)-\log b$$
- Node2vec can also be formulated as a more complex matrix factorization
3). Limitations
- Cannot obtain embeddings for nodes not in the training set: if new nodes are added at test time (e.g., a new user in a social network), we need to recompute all node embeddings
- Cannot capture structural similarity: if two nodes are far from each other, they will have very different embeddings, because it is unlikely that a random walk will reach one node from the other
- Cannot utilize node, edge, and graph features
Solutions: Deep Representation Learning and Graph Neural Networks