Node Embeddings
1. Graph Representation Learning
Graph representation learning alleviates the need to do manual feature engineering every single time (the features are learned automatically)
Goal: efficient task-independent feature learning for machine learning with graphs
Why embeddings?
- Similarity of embeddings between nodes indicates their similarity in the network
- Encode network information
- Potentially useful for many downstream prediction tasks (node classification, link prediction, graph prediction, anomalous node detection, clustering…)
2. Node Embeddings: Encoder and Decoder
Goal: encode nodes so that similarity in the embedding space approximates similarity in the graph
a) Encoder ENC maps from nodes to embeddings (a low-dimensional vector)
b) Define a node similarity function (i.e., a measure of similarity in the original network)
c) Decoder DEC maps from embeddings to the similarity score
d) Optimize the parameters of the encoder so that $\text{similarity}(u, v) \approx z_v^T z_u$
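To make the encoder/decoder pairing concrete, here is a minimal sketch (not the notes' reference code) in which the encoder is a matrix of per-node embeddings and the decoder is the dot product; `num_nodes`, `embedding_dim`, and the random initialization are illustrative assumptions:

```python
import numpy as np

# Illustrative sizes; in practice these depend on the graph and the task.
num_nodes, embedding_dim = 5, 3
rng = np.random.default_rng(0)

# Encoder parameters: one d-dimensional embedding per node (learned during training).
Z = rng.normal(size=(embedding_dim, num_nodes))

def encode(node):
    """ENC: map a node index to its embedding vector (a column of Z)."""
    return Z[:, node]

def decode(z_u, z_v):
    """DEC: map a pair of embeddings to a similarity score via the dot product."""
    return float(z_u @ z_v)

# Training would adjust Z so that decode(encode(u), encode(v))
# approximates similarity(u, v) defined on the original graph.
print(decode(encode(0), encode(1)))
```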
1). “Shallow” Encoding
Simplest encoding approach: encoder is just an embedding-lookup so each node is assigned a unique embedding vector
$\text{ENC}(v) = z_v = Z \cdot v$
where $Z$ is a matrix whose columns are the node embeddings and $v$ is an indicator vector that is all zeroes except for a one in the column indicating node $v$
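A minimal sketch of this lookup, assuming a small toy $Z$: multiplying $Z$ by the one-hot indicator vector selects the column for node $v$.

```python
import numpy as np

num_nodes, embedding_dim = 4, 2  # toy sizes, for illustration only
Z = np.arange(embedding_dim * num_nodes, dtype=float).reshape(embedding_dim, num_nodes)

def one_hot(node, n=num_nodes):
    """Indicator vector: all zeroes except a one in the position of `node`."""
    v = np.zeros(n)
    v[node] = 1.0
    return v

z_v = Z @ one_hot(2)              # shallow encoding: ENC(v) = Z · v
assert np.allclose(z_v, Z[:, 2])  # identical to directly looking up column 2
```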
Methods: DeepWalk, node2vec
3. Random Walk Approaches for Node Embeddings
- Vector $z_u$ is the embedding of node $u$
- Probability $P(v \mid z_u)$ is the (predicted) probability of visiting node $v$ on random walks starting from node $u$
- Random walk: given a graph and a starting point, we select one of its neighbors at random and move to this neighbor; then we select a neighbor of this point at random and move to it, etc. The (random) sequence of points visited this way is a random walk on the graph.
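As a minimal sketch, assuming an adjacency-list representation of a toy undirected graph (the graph, `start`, and `walk_length` below are illustrative):

```python
import random

# Toy undirected graph as an adjacency list (illustrative only).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def random_walk(graph, start, walk_length, seed=None):
    """Uniform random walk: repeatedly hop to a uniformly random neighbor."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(walk_length):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

print(random_walk(graph, start=0, walk_length=5, seed=42))
```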
1). Random-walk Embeddings
$z_u^T z_v \approx$ probability that $u$ and $v$ co-occur on a random walk over the graph
- Estimate the probability $P_R(v \mid u)$ of visiting node $v$ on a random walk starting from node $u$ under the random walk strategy $R$ (a small empirical sketch follows this list)
- Optimize embeddings to encode these random walk statistics
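These statistics can be estimated empirically. A minimal sketch under the uniform-walk strategy on the same toy graph as above (`num_walks` and `walk_length` are illustrative assumptions):

```python
import random
from collections import Counter

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # same toy graph as above

def estimate_visit_prob(graph, u, num_walks=1000, walk_length=5, seed=0):
    """Empirical estimate of P_R(v | u): fraction of uniform walks from u that visit v."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_walks):
        node, visited = u, set()
        for _ in range(walk_length):
            node = rng.choice(graph[node])
            visited.add(node)
        counts.update(visited)
    return {v: c / num_walks for v, c in sorted(counts.items())}

print(estimate_visit_prob(graph, u=0))
```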
Why random walks?
- Expressivity: a flexible, stochastic definition of node similarity that incorporates both local and higher-order neighborhood information (if a random walk starting from node $u$ visits $v$ with high probability, then $u$ and $v$ are similar)
- Efficiency: do not need to consider all node pairs when training; only need to consider pairs that co-occur on random walks
2). Unsupervised Feature Learning
Intuition: find an embedding of nodes in $d$-dimensional space that preserves similarity
Idea: learn node embeddings such that nodes that are nearby in the network end up close together in the embedding space
$N_R(u)$: neighborhood of $u$ obtained by the strategy $R$
Goal: learn a mapping $f: u \rightarrow \mathbb{R}^d$ such that $f(u) = z_u$
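A common way to make this goal concrete (as in DeepWalk/node2vec-style training; stated here as a sketch rather than the notes' exact formulation) is to maximize the log-likelihood of each node's random-walk neighborhood given its embedding:

$\max_{f} \sum_{u \in V} \log P(N_R(u) \mid z_u)$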