Outline of this article
- M-NMF
- LANE
- LINE
What is Network Embedding?
LINE
- [Information Network] An information network is defined as $G = (V, E)$, where $V$ is the set of vertices, each representing a data object, and $E$ is the set of edges between the vertices, each representing a relationship between two data objects. Each edge $e \in E$ is an ordered pair $e = (u, v)$ and is associated with a weight $w_{uv} > 0$, which indicates the strength of the relation. If $G$ is undirected, we have $(u, v) \equiv (v, u)$ and $w_{uv} \equiv w_{vu}$; if $G$ is directed, we have $(u, v) \neq (v, u)$ and $w_{uv} \neq w_{vu}$.
- [First-order Proximity] The first-order proximity in a network is the local pairwise proximity between two vertices. For each pair of vertices linked by an edge $(u, v)$, the weight on that edge, $w_{uv}$, indicates the first-order proximity between $u$ and $v$. If no edge is observed between $u$ and $v$, their first-order proximity is 0. The first-order proximity usually implies the similarity of two nodes in a real-world network.
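As an illustration of this definition, the first-order proximity can be read directly off a weighted edge structure. The three-vertex graph and its weights below are hypothetical, chosen only to make the lookup concrete:

```python
# Toy undirected information network: each edge (u, v) carries a
# weight w_uv > 0. Vertices and weights are hypothetical examples.
edges = {(0, 1): 2.0, (1, 2): 1.0, (0, 2): 0.5}

def first_order_proximity(u, v, weighted_edges):
    """First-order proximity of (u, v): the edge weight w_uv if the
    edge exists (checked in either direction, since the graph is
    undirected and (u, v) ≡ (v, u)), and 0 if no edge is observed."""
    return weighted_edges.get((u, v), weighted_edges.get((v, u), 0.0))

print(first_order_proximity(2, 0, edges))  # 0.5, mirrors w_02
print(first_order_proximity(1, 3, edges))  # 0.0, no edge observed
```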
LINE with First-order Proximity: The first-order proximity refers to the local pairwise proximity between the vertices in the network. For each undirected edge $(i, j)$, the joint probability between vertices $v_i$ and $v_j$ is defined as:

$$p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{\,T} \cdot \vec{u}_j)}$$
where $\vec{u}_i \in \mathbb{R}^d$ is the low-dimensional vector representation of vertex $v_i$. The corresponding empirical probability can be defined as $\hat{p}_1(i, j) = \frac{w_{ij}}{W}$, where $W = \sum_{(i,j) \in E} w_{ij}$. To preserve the first-order proximity, we can minimize the following objective function:
$$O_1 = d(\hat{p}_1(\cdot, \cdot), p_1(\cdot, \cdot))$$
where $d(\cdot, \cdot)$ is the distance between two distributions. We choose to minimize the KL-divergence of the two probability distributions. Replacing $d(\cdot, \cdot)$ with KL-divergence and omitting some constants, we have:
$$O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)$$
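To make the objective concrete, the sketch below minimizes $O_1$ by plain full-batch gradient descent on a hypothetical three-vertex weighted graph. This is only an illustration of the loss: the actual LINE paper optimizes with asynchronous SGD, edge sampling, and negative sampling, all of which this toy omits.

```python
import math
import random

random.seed(0)

# Hypothetical undirected weighted edges (i, j, w_ij).
edges = [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 0.5)]
d, n = 2, 3  # embedding dimension, number of vertices
U = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n)]

def sigmoid(x):
    # p_1(v_i, v_j) = sigmoid(u_i . u_j), the joint probability above
    return 1.0 / (1.0 + math.exp(-x))

# Gradient descent on O_1 = -sum_{(i,j) in E} w_ij * log p_1(v_i, v_j).
# Since d log sigmoid(x)/dx = 1 - sigmoid(x), the gradient w.r.t. u_i
# for one edge is -w_ij * (1 - p_1) * u_j (and symmetrically for u_j).
lr = 0.1
for step in range(200):
    grads = [[0.0] * d for _ in range(n)]
    for i, j, w in edges:
        p = sigmoid(sum(a * b for a, b in zip(U[i], U[j])))
        coef = -w * (1.0 - p)
        for k in range(d):
            grads[i][k] += coef * U[j][k]
            grads[j][k] += coef * U[i][k]
    for i in range(n):
        for k in range(d):
            U[i][k] -= lr * grads[i][k]

# After training, linked vertices end up with high joint probability.
p01 = sigmoid(sum(a * b for a, b in zip(U[0], U[1])))
print(round(p01, 3))
```

Note that with only positive (observed) edges in the loss, the optimum simply pushes all linked embeddings together, which is exactly why the paper adds negative sampling in practice.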
- [Second-order Proximity] The second-order proximity between a pair of vertices $(u, v)$ in a network is the similarity between their neighborhood network structures. Mathematically, let $p_u = (w_{u,1}, \ldots, w_{u,|V|})$ denote the first-order proximity of $u$ with all the other vertices; then the second-order proximity between $u$ and $v$ is determined by the similarity between $p_u$ and $p_v$. If no vertex is linked from/to both $u$ and $v$, the second-order proximity between $u$ and $v$ is 0.
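One way to make this definition concrete is to compare the first-order proximity vectors $p_u$ directly, e.g. with cosine similarity. The 4-vertex weighted adjacency matrix below is a hypothetical toy, and cosine is just one possible similarity; LINE itself captures this notion through learned embeddings rather than by computing it explicitly:

```python
# Rows of W are the first-order proximity vectors p_u = (w_u1, ..., w_u|V|).
W = [
    [0.0, 1.0, 1.0, 0.0],  # vertex 0
    [1.0, 0.0, 0.0, 1.0],  # vertex 1
    [1.0, 0.0, 0.0, 1.0],  # vertex 2: same neighbours as vertex 1
    [0.0, 1.0, 1.0, 0.0],  # vertex 3
]

def cosine(p, q):
    """Cosine similarity between two proximity vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = sum(a * a for a in p) ** 0.5
    norm_q = sum(b * b for b in q) ** 0.5
    return dot / (norm_p * norm_q)

# Vertices 1 and 2 are not linked to each other, yet they share all
# their neighbours, so their second-order proximity is maximal (~1.0):
print(cosine(W[1], W[2]))
# Vertices 0 and 1 share no common neighbour, so it is 0:
print(cosine(W[0], W[1]))  # 0.0
```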
The second-order proximity assumes that vertices sharing many connections to other vertices are similar to each other. In this case, each vertex is also treated as a specific “context” and vertices with similar distributions over the “contexts” are assumed to be similar.
Therefore, each vertex plays two roles: the vertex itself and a specific "context" of other vertices. We introduce two vectors $\vec{u}_i$