Outline of this article
- M-NMF
- LANE
- LINE
What is Network Embedding?
LINE
- [Information Network] An information network is defined as $G = (V, E)$, where $V$ is the set of vertices, each representing a data object, and $E$ is the set of edges between the vertices, each representing a relationship between two data objects. Each edge $e \in E$ is an ordered pair $e = (u, v)$ and is associated with a weight $w_{uv} > 0$, which indicates the strength of the relation. If $G$ is undirected, we have $(u, v) \equiv (v, u)$ and $w_{uv} \equiv w_{vu}$; if $G$ is directed, we have $(u, v) \neq (v, u)$ and $w_{uv} \neq w_{vu}$.
- [First-order Proximity] The first-order proximity in a network is the local pairwise proximity between two vertices. For each pair of vertices linked by an edge $(u, v)$, the weight on that edge, $w_{uv}$, indicates the first-order proximity between $u$ and $v$. If no edge is observed between $u$ and $v$, their first-order proximity is 0. The first-order proximity usually implies the similarity of two nodes in a real-world network.
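As an illustration of this definition, the first-order proximity can be read directly off a weighted edge structure. The three-vertex graph and its weights below are hypothetical, chosen only to make the lookup concrete:

```python
# Toy undirected information network: each edge (u, v) carries a
# weight w_uv > 0. Vertices and weights are hypothetical examples.
edges = {(0, 1): 2.0, (1, 2): 1.0, (0, 2): 0.5}

def first_order_proximity(u, v, weighted_edges):
    """First-order proximity of (u, v): the edge weight w_uv if the
    edge exists (checked in either direction, since the graph is
    undirected and (u, v) ≡ (v, u)), and 0 if no edge is observed."""
    return weighted_edges.get((u, v), weighted_edges.get((v, u), 0.0))

print(first_order_proximity(2, 0, edges))  # 0.5, mirrors w_02
print(first_order_proximity(1, 3, edges))  # 0.0, no edge observed
```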
LINE with First-order Proximity: The first-order proximity refers to the local pairwise proximity between the vertices in the network. For each undirected edge $(i, j)$, the joint probability between vertices $v_i$ and $v_j$ is defined as:

$$p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{\,T} \cdot \vec{u}_j)}$$
where $\vec{u}_i \in \mathbb{R}^d$ is the low-dimensional vector representation of vertex $v_i$. The corresponding empirical probability can be defined as $\hat{p}_1(i, j) = \frac{w_{ij}}{W}$, where $W = \sum_{(i,j) \in E} w_{ij}$. To preserve the first-order proximity, we can minimize the following objective function:
$$O_1 = d(\hat{p}_1(\cdot, \cdot), p_1(\cdot, \cdot))$$
where $d(\cdot, \cdot)$ is the distance between two distributions. We choose to minimize the KL-divergence of the two probability distributions. Replacing $d(\cdot, \cdot)$ with KL-divergence and omitting some constants, we have:
$$O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)$$
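To make the objective concrete, the sketch below minimizes $O_1$ by plain full-batch gradient descent on a hypothetical three-vertex weighted graph. This is only an illustration of the loss: the actual LINE paper optimizes with asynchronous SGD, edge sampling, and negative sampling, all of which this toy omits.

```python
import math
import random

random.seed(0)

# Hypothetical undirected weighted edges (i, j, w_ij).
edges = [(0, 1, 2.0), (1, 2, 1.0), (0, 2, 0.5)]
d, n = 2, 3  # embedding dimension, number of vertices
U = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n)]

def sigmoid(x):
    # p_1(v_i, v_j) = sigmoid(u_i . u_j), the joint probability above
    return 1.0 / (1.0 + math.exp(-x))

# Gradient descent on O_1 = -sum_{(i,j) in E} w_ij * log p_1(v_i, v_j).
# Since d log sigmoid(x)/dx = 1 - sigmoid(x), the gradient w.r.t. u_i
# for one edge is -w_ij * (1 - p_1) * u_j (and symmetrically for u_j).
lr = 0.1
for step in range(200):
    grads = [[0.0] * d for _ in range(n)]
    for i, j, w in edges:
        p = sigmoid(sum(a * b for a, b in zip(U[i], U[j])))
        coef = -w * (1.0 - p)
        for k in range(d):
            grads[i][k] += coef * U[j][k]
            grads[j][k] += coef * U[i][k]
    for i in range(n):
        for k in range(d):
            U[i][k] -= lr * grads[i][k]

# After training, linked vertices end up with high joint probability.
p01 = sigmoid(sum(a * b for a, b in zip(U[0], U[1])))
print(round(p01, 3))
```

Note that with only positive (observed) edges in the loss, the optimum simply pushes all linked embeddings together, which is exactly why the paper adds negative sampling in practice.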
- [Second-order Proximity] The second-order proximity between a pair of vertices $(u, v)$ in a network is the similarity between their neighborhood network structures. Mathematically, let $p_u = (w_{u,1}, \ldots, w_{u,|V|})$ denote the first-order proximity of $u$ with all the other vertices; then the second-order proximity between $u$ and $v$ is determined by the similarity between $p_u$ and $p_v$. If no vertex is linked from/to both $u$ and $v$, the second-order proximity between $u$ and $v$ is 0.
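One way to make this definition concrete is to compare the first-order proximity vectors $p_u$ directly, e.g. with cosine similarity. The 4-vertex weighted adjacency matrix below is a hypothetical toy, and cosine is just one possible similarity; LINE itself captures this notion through learned embeddings rather than by computing it explicitly:

```python
# Rows of W are the first-order proximity vectors p_u = (w_u1, ..., w_u|V|).
W = [
    [0.0, 1.0, 1.0, 0.0],  # vertex 0
    [1.0, 0.0, 0.0, 1.0],  # vertex 1
    [1.0, 0.0, 0.0, 1.0],  # vertex 2: same neighbours as vertex 1
    [0.0, 1.0, 1.0, 0.0],  # vertex 3
]

def cosine(p, q):
    """Cosine similarity between two proximity vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = sum(a * a for a in p) ** 0.5
    norm_q = sum(b * b for b in q) ** 0.5
    return dot / (norm_p * norm_q)

# Vertices 1 and 2 are not linked to each other, yet they share all
# their neighbours, so their second-order proximity is maximal (~1.0):
print(cosine(W[1], W[2]))
# Vertices 0 and 1 share no common neighbour, so it is 0:
print(cosine(W[0], W[1]))  # 0.0
```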
The second-order proximity assumes that vertices sharing many connections to other vertices are similar to each other. In this case, each vertex is also treated as a specific “context” and vertices with similar distributions over the “contexts” are assumed to be similar.
Therefore, each vertex plays two roles: the vertex itself and a specific "context" of other vertices. We introduce two vectors $\vec{u}_i$