Graph Auto-encoder
Table of Contents
- Graph Auto-encoder
- 1 Structural Deep Network Embedding
- 2 Deep neural networks for learning graph representations
- 3 Variational Graph Auto-Encoders
- 4 MGAE: Marginalized Graph Autoencoder for Graph Clustering
- 5 Learning Deep Network Representations with Adversarially Regularized Autoencoders
- 6 Deep Recursive Network Embedding with Regular Equivalence
- 7 Adversarially Regularized Graph Autoencoder for Graph Embedding
- References
1 Structural Deep Network Embedding
[Wang D, 2016, 1] proposed SDNE, which embeds graph data with a deep autoencoder. The paper defines first-order and second-order proximity and requires that both be preserved between the original graph and the embedding space.
- First-Order Proximity: The first-order proximity describes the pairwise proximity between vertexes. For any pair of vertexes, if $a_{ij} > 0$, there exists positive first-order proximity between $v_i$ and $v_j$. Otherwise, the first-order proximity between $v_i$ and $v_j$ is 0.
- Second-Order Proximity: The second-order proximity between a pair of vertexes describes the proximity of the pair's neighborhood structure. Let $\mathcal{N}_u = \{ a_{u,1}, \cdots, a_{u,|\mathcal{V}|} \}$ denote the first-order proximity between $v_u$ and other vertexes. Then, second-order proximity is determined by the similarity of $\mathcal{N}_u$ and $\mathcal{N}_v$.
Encoder (layer-wise mapping):
$$
\begin{aligned}
\vec{y}_i^{(0)} &= \vec{x}_i, \\
\vec{y}_i^{(k)} &= \sigma \left( W^{(k)} \vec{y}_i^{(k-1)} + \vec{b}^{(k)} \right), \quad k = 1, \cdots, K.
\end{aligned} \tag{1.1}
$$
Reconstruction loss:
$$
\mathcal{L} = \sum_{i=1}^{n} \| \hat{\vec{x}}_i - \vec{x}_i \|_2^2. \tag{1.2}
$$
First-order proximity loss:
$$
\mathcal{L}_{\text{1st}} = \sum_{i,j=1}^{n} a_{i,j} \| \vec{y}_i^{(K)} - \vec{y}_j^{(K)} \|_2^2. \tag{1.3}
$$
Second-order proximity loss:
$$
\begin{aligned}
\mathcal{L}_{\text{2nd}} &= \sum_{i=1}^{n} \| \left( \hat{\vec{x}}_i - \vec{x}_i \right) \odot \vec{b}_i \|_2^2 \\
&= \| \left( \hat{X} - X \right) \odot B \|_F^2.
\end{aligned} \tag{1.4}
$$
If $a_{i,j} = 0$, the corresponding entry of $\vec{b}_i$ equals 1; if $a_{i,j} > 0$, it equals a hyperparameter $\beta > 1$. Through the autoencoder, the more similar two vertices' neighborhood structures are, the closer they end up in the representation space.
Regularization term:
$$
\mathcal{L}_{\text{reg}} = \frac{1}{2} \sum_{k=1}^{K} \left( \| W^{(k)} \|_F^2 + \| \hat{W}^{(k)} \|_F^2 \right). \tag{1.5}
$$
The total loss is:
$$
\mathcal{L}_{\text{mix}} = \mathcal{L}_{\text{2nd}} + \alpha \mathcal{L}_{\text{1st}} + \gamma \mathcal{L}_{\text{reg}}. \tag{1.6}
$$
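To make Eqs. (1.2)–(1.6) concrete, here is a minimal numpy sketch of the combined SDNE objective. It assumes the reconstructions `X_hat`, the top-layer embeddings `Y`, and the weight lists come from a separately trained autoencoder, and the hyperparameter values are illustrative, not the paper's.

```python
import numpy as np

def sdne_loss(X, X_hat, Y, A, W_list, W_hat_list, beta=5.0, alpha=0.1, gamma=1e-4):
    """Combined SDNE loss (Eq. 1.6) assembled from Eqs. (1.3)-(1.5).

    X, X_hat : (n, n) adjacency rows x_i and their reconstructions
    Y        : (n, f) top-layer embeddings y_i^{(K)}
    A        : (n, n) adjacency matrix
    """
    # Eq. (1.4): reweight errors on observed edges by beta > 1 (entries of b_i).
    B = np.where(X > 0, beta, 1.0)
    loss_2nd = np.sum(((X_hat - X) * B) ** 2)

    # Eq. (1.3): pull embeddings of connected vertices together.
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    loss_1st = np.sum(A * sq_dists)

    # Eq. (1.5): L2 penalty on encoder and decoder weights.
    loss_reg = 0.5 * sum(np.sum(W ** 2) for W in W_list + W_hat_list)

    return loss_2nd + alpha * loss_1st + gamma * loss_reg  # Eq. (1.6)
```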
2 Deep neural networks for learning graph representations
[Cao S, 2016, 2] DNGR combines random surfing with a deep autoencoder. The model consists of three components: random surfing, positive pointwise mutual information (PPMI) computation, and a stacked denoising autoencoder.
- Random surfing: given the maximum number of steps and the transition matrix $A$,
$$
p_k = \alpha p_{k-1} A + (1 - \alpha) p_0. \tag{2.1}
$$
where $\alpha$ is the smoothing parameter.
- PPMI matrix: compute the PPMI matrix from the probabilistic co-occurrence matrix produced by random surfing,
$$
\begin{aligned}
\text{PMI}_{w,c} &= \log \left( \frac{\#(w,c) \cdot \sum_{w,c} \#(w,c)}{\#(w) \cdot \#(c)} \right), \\
X_{w,c} &= \text{PPMI}_{w,c} = \max (\text{PMI}_{w,c}, 0).
\end{aligned} \tag{2.2}
$$
- Stacked denoising autoencoder: use $X$ as its input. The loss is:
$$
\min_{\theta_1, \theta_2} \sum_{i=1}^{N} \mathcal{L} \left( x^{(i)}, g_{\theta_2} ( f_{\theta_1} ( \tilde{x}^{(i)} ) ) \right). \tag{2.3}
$$
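The first two stages are easy to reproduce; below is a small numpy sketch, assuming a row-normalizable adjacency matrix and reading the co-occurrence matrix as the plain sum of the $p_k$ (the paper additionally weights the steps, so this is a simplification). The PPMI step follows Eq. (2.2).

```python
import numpy as np

def random_surfing(A, K=3, alpha=0.98):
    """Eq. (2.1): p_k = alpha * p_{k-1} P + (1 - alpha) * p_0, accumulated over k."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)    # row-normalized transition matrix
    p0 = np.eye(n)                          # one restart distribution per vertex
    pk, M = p0.copy(), np.zeros((n, n))
    for _ in range(K):
        pk = alpha * pk @ P + (1 - alpha) * p0
        M += pk                             # probabilistic co-occurrence matrix
    return M

def ppmi(M):
    """Eq. (2.2): positive pointwise mutual information of the co-occurrences."""
    total = M.sum()
    row, col = M.sum(axis=1, keepdims=True), M.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(M * total / (row @ col))   # log(#(w,c)·|D| / (#(w)·#(c)))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)                 # X = max(PMI, 0)
```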
3 Variational Graph Auto-Encoders
[Kipf T, 2016, 3] applies the VAE to graph data. For an unweighted graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix, $D \in \mathbb{R}^{N \times N}$ is the diagonal degree matrix, $X \in \mathbb{R}^{N \times D}$ is the vertex feature matrix, and $Z \in \mathbb{R}^{N \times F}$ is the encoded feature matrix.
- Inference model: the inference model produces the new representation of $X$, i.e., it performs the encoding.
$$
q(Z | X, A) = \prod_{i=1}^{N} q(\vec{z}_i | X, A). \tag{3.1}
$$
where:
$$
\begin{aligned}
q(\vec{z}_i | X, A) &= \mathcal{N} \left( \vec{z}_i | \vec{\mu}_i, \text{diag}(\vec{\sigma}_i^2) \right), \\
\vec{\mu} &= \text{GCN}_{\mu}(X, A), \\
\log \vec{\sigma} &= \text{GCN}_{\sigma}(X, A), \\
\text{GCN}(X, A) &= \tilde{A} \, \text{ReLU}(\tilde{A} X W_0) W_1, \\
\tilde{A} &= D^{-\frac{1}{2}} A D^{-\frac{1}{2}}.
\end{aligned} \tag{3.2}
$$
$\text{GCN}_{\mu}(X, A)$ and $\text{GCN}_{\sigma}(X, A)$ share the first-layer parameters $W_0$.
- Generative model: the generative model reconstructs the graph by generating the adjacency matrix from the encoding $Z$.
$$
P(A|Z) = \prod_{i=1}^{N} \prod_{j=1}^{N} p(A_{ij}|\vec{z}_i, \vec{z}_j). \tag{3.3}
$$
where:
$$
p(A_{ij} = 1 | \vec{z}_i, \vec{z}_j) = \sigma (\vec{z}_i^T \vec{z}_j). \tag{3.4}
$$
- Learning: the loss function is:
$$
\mathcal{L} = \mathbb{E}_{q(Z | X, A)} \left[ \log p (A|Z) \right] - \text{KL} \left[ q(Z | X, A) \,\|\, p(Z) \right]. \tag{3.5}
$$
The paper also proposes a non-probabilistic GAE:
$$
\begin{aligned}
\hat{A} &= \sigma(Z Z^T), \\
Z &= \text{GCN}(X, A).
\end{aligned} \tag{3.6}
$$
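Here is a compact numpy sketch of the non-probabilistic GAE of Eq. (3.6): a two-layer GCN encoder with the normalization of Eq. (3.2), followed by the inner-product decoder. The randomly initialized weights stand in for trained ones, and it assumes no isolated vertices.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2} from Eq. (3.2)."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))      # assumes every vertex has degree > 0
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gae_forward(X, A, W0, W1):
    """Eq. (3.6): Z = GCN(X, A), A_hat = sigma(Z Z^T)."""
    A_t = normalize_adj(A)
    Z = A_t @ np.maximum(A_t @ X @ W0, 0.0) @ W1   # A~ ReLU(A~ X W0) W1
    A_hat = 1.0 / (1.0 + np.exp(-Z @ Z.T))         # inner-product decoder
    return Z, A_hat

# Toy usage on a 4-vertex path graph with random weights.
A = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], dtype=float)
X = np.eye(4)                                       # featureless graph: identity features
rng = np.random.default_rng(0)
Z, A_hat = gae_forward(X, A, rng.normal(size=(4, 8)), rng.normal(size=(8, 2)))
```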
4 MGAE: Marginalized Graph Autoencoder for Graph Clustering
[Wang C, 2017, 4] explicitly gives the optimal parameter $W$ of a single convolution layer, which minimizes:
$$
\| X - \widetilde{D}^{-\frac{1}{2}} \widetilde{A} \widetilde{D}^{-\frac{1}{2}} X W \|^2 + \lambda \| W \|_F^2. \tag{4.1}
$$
Corrupt the input by randomly keeping a subset of the vertices, repeated $m$ times, to obtain $[\widetilde{X}_1, \cdots, \widetilde{X}_m]$; writing $\bar{X} = \frac{1}{m} \sum_{i=1}^{m} \widetilde{X}_i$, the closed-form solution is:
$$
W = \bar{X}^T \left( \widetilde{D}^{-\frac{1}{2}} \widetilde{A} \widetilde{D}^{-\frac{1}{2}} \right)^T X \left( \bar{X}^T \left( \widetilde{D}^{-\frac{1}{2}} \widetilde{A} \widetilde{D}^{-\frac{1}{2}} \right)^T \left( \widetilde{D}^{-\frac{1}{2}} \widetilde{A} \widetilde{D}^{-\frac{1}{2}} \right) X \right)^{-1}. \tag{4.2}
$$
The encoder is then:
$$
Z = \left( \widetilde{D}^{-\frac{1}{2}} \widetilde{A} \widetilde{D}^{-\frac{1}{2}} \right) X W. \tag{4.3}
$$
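The closed form is straightforward to transcribe; this numpy sketch evaluates Eqs. (4.2)–(4.3) literally as displayed above, with `S` standing for the filter $\widetilde{D}^{-\frac{1}{2}} \widetilde{A} \widetilde{D}^{-\frac{1}{2}}$. Reading the tilde as "adjacency with self-loops" and corrupting by randomly zeroing vertex rows are both assumptions on my part.

```python
import numpy as np

def mgae_layer(X, A, corrupt_p=0.3, m=10, seed=0):
    """One marginalized GAE layer: solve Eq. (4.2) for W, then encode via Eq. (4.3)."""
    rng = np.random.default_rng(seed)
    A_tilde = A + np.eye(A.shape[0])                    # self-loops (assumed convention)
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    S = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D~^{-1/2} A~ D~^{-1/2}

    # X_bar: average of m randomly corrupted copies of X (vertex rows zeroed out).
    X_bar = np.mean(
        [X * (rng.random(X.shape[0])[:, None] > corrupt_p) for _ in range(m)],
        axis=0,
    )

    # Eq. (4.2), transcribed literally; assumes the bracketed term is invertible.
    lhs = X_bar.T @ S.T @ X                              # X_bar^T S^T X
    rhs = X_bar.T @ S.T @ S @ X                          # X_bar^T S^T S X
    W = lhs @ np.linalg.inv(rhs)

    return S @ X @ W                                     # Eq. (4.3): Z = S X W
```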
5 Learning Deep Network Representations with Adversarially Regularized Autoencoders
[Yu W, 2018, 5] adds a generative adversarial network on top of the autoencoder.
The model first takes random-walk sequences as its input and then trains the encoder, decoder, generator, and discriminator; the training procedure is illustrated with a figure in the original paper.
6 Deep Recursive Network Embedding with Regular Equivalence
[Tu K, 2018, 6] proposes two definitions of equivalence.
- (Structural Equivalence) We denote $s(u) = s(v)$ if nodes $u$ and $v$ are structurally equivalent. Then $s(u) = s(v)$ if and only if $\mathcal{N}(u) = \mathcal{N}(v)$.
- (Regular Equivalence) We denote $r(u) = r(v)$ if nodes $u$ and $v$ are regularly equivalent. Then $r(u) = r(v)$ if and only if $\{ r(i) \mid i \in \mathcal{N}(u) \} = \{ r(j) \mid j \in \mathcal{N}(v) \}$.
Embedding procedure (a sketch follows the list):
- For each vertex, collect its neighbor vertices and sort their embedding vectors by vertex degree;
- feed the sorted sequence into an LSTM;
- the LSTM output serves as the reconstruction of the vertex.
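A toy PyTorch sketch of this aggregation step, under the assumption that `emb` is the current embedding table and the reconstruction error drives training; the layer sizes and the toy graph are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DRNELayer(nn.Module):
    """Degree-sorted neighbor aggregation with an LSTM, in the spirit of DRNE."""
    def __init__(self, dim=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=dim, hidden_size=dim, batch_first=True)

    def forward(self, emb, neighbors, degrees):
        # Sort one vertex's neighbor indices by their degree.
        order = sorted(neighbors, key=lambda j: degrees[j])
        seq = emb[order].unsqueeze(0)          # (1, num_neighbors, dim)
        _, (h, _) = self.lstm(seq)             # final hidden state
        return h.squeeze()                     # aggregated reconstruction of the vertex

# Usage: the reconstruction loss pulls a vertex's embedding toward the
# LSTM aggregation of its degree-sorted neighbors.
emb = torch.randn(5, 16, requires_grad=True)   # toy embedding table
degrees = [2, 3, 1, 4, 2]
layer = DRNELayer(dim=16)
recon = layer(emb, neighbors=[1, 3, 4], degrees=degrees)
loss = ((recon - emb[0]) ** 2).sum()
loss.backward()
```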
7 Adversarially Regularized Graph Autoencoder for Graph Embedding
[Pan S, 2018, 7] builds on [Kipf T, 2016, 3], swapping the convolution operation and adding an adversarial mechanism; unlike [Yu W, 2018, 5], it has only a discriminator and no separate generator (the encoder plays the generator's role).
Reconstruction loss of the non-probabilistic encoder:
$$
\mathcal{L}_0 = \mathbb{E}_{q(Z|(X,A))} [\log p(\hat{A}|Z)]. \tag{7.1}
$$
Reconstruction loss of the variational encoder:
$$
\mathcal{L}_1 = \mathbb{E}_{q(Z|X,A)} \left[ \log p(\hat{A}|Z) - \text{KL} \left[ q(Z | X, A) \,\|\, p(Z) \right] \right]. \tag{7.2}
$$
Adversarial loss:
$$
\min_{\mathcal{G}} \max_{\mathcal{D}} \ \mathbb{E}_{\vec{z} \sim p_z} \left[ \log \mathcal{D}(Z) \right] + \mathbb{E}_{\vec{x} \sim p(\vec{x})} \left[ \log \left( 1 - \mathcal{D}(\mathcal{G}(X,A)) \right) \right]. \tag{7.3}
$$
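A rough PyTorch sketch of Eq. (7.3): the discriminator `D` is trained to separate prior samples from encoder outputs `Z_fake`, while the encoder (acting as the generator) is trained to fool it. The MLP shape and the standard-Gaussian prior are assumptions here, not details from the paper.

```python
import torch
import torch.nn as nn

# Discriminator: decides whether a latent vector comes from the prior or the encoder.
D = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

def adversarial_losses(Z_fake, eps=1e-8):
    """Eq. (7.3): D maximizes log D(z) + log(1 - D(G(X, A))); the encoder minimizes it."""
    Z_real = torch.randn_like(Z_fake)                # samples from the prior p_z
    d_loss = -(torch.log(D(Z_real) + eps).mean()
               + torch.log(1 - D(Z_fake.detach()) + eps).mean())
    g_loss = torch.log(1 - D(Z_fake) + eps).mean()   # encoder-as-generator term
    return d_loss, g_loss
```

In practice `d_loss` and `g_loss` would be stepped alternately with the reconstruction losses (7.1) or (7.2), so the latent space is pushed toward the prior while still encoding the graph.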