Recurrent Graph Neural Networks
Recurrent graph neural networks (RecGNNs) are mostly pioneering works in graph neural networks. RecGNNs aim to learn node representations with recurrent neural architectures. They assume that each node in a graph constantly exchanges information/messages with its neighbors until a stable equilibrium is reached. RecGNNs are conceptually important and inspired later research on convolutional graph neural networks. In particular, the idea of message passing was inherited by spatial-based convolutional graph neural networks. [1]
1 A New Model for Learning in Graph Domains
[Marco Gori, 2005, 2] was among the earliest works to propose the concept of the graph neural network (GNN).
Let $n$ denote a node of the graph, $\vec{x}_n$ the state of node $n$, and $\vec{l}_n$ the label of node $n$. Correspondingly, $\vec{x}_{\text{ne}[n]}$ and $\vec{l}_{\text{ne}[n]}$ are the states and labels of the neighbors of node $n$.
- transition function:
$$\vec{x}_n = f_w \left( \vec{l}_n, \vec{x}_{\text{ne}[n]}, \vec{l}_{\text{ne}[n]} \right), \qquad n \in \mathcal{N} \tag{1.1}$$
- output function:
$$\vec{o}_n = g_w \left( \vec{x}_n, \vec{l}_n \right), \qquad n \in \mathcal{N} \tag{1.2}$$
Equation (1.1) is then replaced by the following:
$$\vec{x}_n = \sum_{u \in \text{ne}[n]} h_w \left( \vec{l}_n, \vec{x}_u, \vec{l}_u \right), \qquad n \in \mathcal{N} \tag{1.3}$$
where $h_w$ can be either an explicit linear function or a neural network.
- Linear GNN (a code sketch follows this list):
$$\begin{aligned} h_w \left( \vec{l}_n, \vec{x}_u, \vec{l}_u \right) &= A_{n,u} \vec{x}_u + \vec{b}_n \\ A_{n,u} &= \frac{\mu}{s \cdot |\text{ne}[u]|} \cdot \operatorname{Resize}\left( \phi_w (\vec{l}_n, \vec{l}_u) \right) \\ \vec{b}_n &= \rho_w (\vec{l}_n) \\ \phi_w &: \mathbb{R}^{2q \times 1} \rightarrow \mathbb{R}^{s^2 \times 1} \\ \rho_w &: \mathbb{R}^{q} \rightarrow \mathbb{R}^{s} \\ \mu &\in (0, 1) \end{aligned} \tag{1.4}$$
- Neural GNN: $h_w$ is implemented by a neural network.
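The following is a minimal NumPy sketch of the Linear GNN variant: $\phi_w$ and $\rho_w$ are replaced by random linear maps, and equation (1.3) is iterated until the node states stop changing. The toy 4-node graph, all dimensions, and the random weights are illustrative assumptions rather than the original implementation; in the paper the bounded outputs of $\phi_w$ guarantee a contraction mapping, whereas here small random weights play that role.

```python
import numpy as np

rng = np.random.default_rng(0)
q, s, mu = 3, 4, 0.9                         # label dim, state dim, contraction factor (assumed)

# Toy undirected graph: node -> neighbours (assumed example).
adjacency = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
labels = rng.normal(size=(4, q))             # node labels l_n

# phi_w and rho_w as random linear maps (stand-ins for trained networks).
Phi = rng.normal(scale=0.1, size=(s * s, 2 * q))    # phi_w: R^{2q} -> R^{s^2}
Rho = rng.normal(scale=0.1, size=(s, q))            # rho_w: R^q  -> R^s

def h_w(l_n, x_u, l_u, deg_u):
    """Linear GNN transition term of eq. (1.4)."""
    A_nu = (mu / (s * deg_u)) * (Phi @ np.concatenate([l_n, l_u])).reshape(s, s)
    b_n = Rho @ l_n
    return A_nu @ x_u + b_n

# Iterate eq. (1.3) until the node states reach a fixed point.
states = np.zeros((4, s))
for _ in range(200):
    new_states = np.zeros_like(states)
    for n, neigh in adjacency.items():
        new_states[n] = sum(h_w(labels[n], states[u], labels[u], len(adjacency[u]))
                            for u in neigh)
    if np.linalg.norm(new_states - states) < 1e-6:
        break
    states = new_states
```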
2 The Graph Neural Network Model
[Franco Scarselli, 2009, 3], compared with [Marco Gori, 2005, 2], additionally uses the edge information $\vec{l}_{\text{co}[n]}$:
$$\begin{aligned} \vec{x}_n &= f_w \left( \vec{l}_n, \vec{l}_{\text{co}[n]}, \vec{x}_{\text{ne}[n]}, \vec{l}_{\text{ne}[n]} \right) \\ \vec{o}_n &= g_w \left( \vec{x}_n, \vec{l}_n \right), \qquad n \in \mathcal{N} \end{aligned} \tag{2.1}$$
Correspondingly,
$$\vec{x}_n = \sum_{u \in \text{ne}[n]} h_w \left( \vec{l}_n, \vec{l}_{(n,u)}, \vec{x}_u, \vec{l}_u \right), \qquad n \in \mathcal{N} \tag{2.2}$$
For training, equation (2.2) is first iterated until $\|\vec{x}_n(t) - \vec{x}_n(t-1)\| \leq \epsilon$, i.e., until a stable point is reached; backpropagation is then performed to update the parameters, after which the iteration of equation (2.2) resumes.
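A rough PyTorch sketch of this training loop is given below: the states are relaxed toward the fixed point without tracking gradients, a final differentiable sweep of (2.2) is performed, and ordinary backpropagation updates the parameters. The toy graph, the stand-in `f_w`/`g_w` modules, the regression loss, and the single differentiable sweep are all simplifying assumptions (the paper computes the fixed-point gradient with an Almeida-Pineda-style scheme, and edge labels are omitted here for brevity).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_nodes, label_dim, state_dim = 4, 3, 8
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]   # assumed toy graph
labels = torch.randn(n_nodes, label_dim)
targets = torch.randn(n_nodes, 1)                           # assumed supervision

f_w = nn.Sequential(nn.Linear(2 * label_dim + state_dim, state_dim), nn.Tanh())
g_w = nn.Linear(state_dim + label_dim, 1)
opt = torch.optim.Adam(list(f_w.parameters()) + list(g_w.parameters()), lr=1e-2)

def transition(x):
    """One sweep of eq. (2.2): each node sums messages from its neighbours."""
    msgs = []
    for v in range(n_nodes):
        inc = [f_w(torch.cat([labels[v], x[u], labels[u]])) for u, w in edges if w == v]
        msgs.append(torch.stack(inc).sum(dim=0) if inc else torch.zeros(state_dim))
    return torch.stack(msgs)

for epoch in range(50):
    # 1) Relax toward the fixed point without tracking gradients.
    x = torch.zeros(n_nodes, state_dim)
    with torch.no_grad():
        for _ in range(100):
            new_x = transition(x)
            if (new_x - x).norm() <= 1e-4:   # the epsilon from the text
                break
            x = new_x
    # 2) One final differentiable sweep, then ordinary backprop on the outputs.
    x = transition(x.detach())
    out = g_w(torch.cat([x, labels], dim=1))
    loss = nn.functional.mse_loss(out, targets)
    opt.zero_grad()
    loss.backward()
    opt.step()
```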
3 Graph Echo State Networks
[Claudio Gallicchio, 2010, 4] splits the transition function into:
- local state transition function:
$$\begin{aligned} x_t(v) &= \tau \left( \vec{u}(v), x_{t-1}\left( \mathcal{N}(v) \right)\right) \\ &= f \left( W_{\text{in}} \vec{u}(v), \hat{W}_{\mathcal{N}} x_{t-1}\left( \mathcal{N}(v) \right) \right) \end{aligned} \tag{3.1}$$
- global state transition function:
$$\begin{aligned} x_t(g) &= \hat{\tau} \left( g, x_{t-1}(g) \right) \\ &= \begin{pmatrix} f \left( W_{\text{in}} \vec{u}(v_1) + \hat{W}_{v_1} x_{t-1}(g) \right) \\ \vdots \\ f \left( W_{\text{in}} \vec{u}(v_{|\mathcal{V}|}) + \hat{W}_{v_{|\mathcal{V}|}} x_{t-1}(g) \right) \end{pmatrix} \end{aligned} \tag{3.2}$$
The output function depends on the task (a code sketch combining the reservoir iteration and both readouts follows this list):
- structure-to-structure:
$$\vec{y}(v) = g_{\text{out}}(\vec{x}(v)) = W_{\text{out}} \vec{x}(v) \tag{3.3}$$
- structure-to-element:
$$\vec{y}(g) = g_{\text{out}} \left( \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \vec{x}(v) \right) = W_{\text{out}} \left( \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \vec{x}(v) \right) \tag{3.4}$$
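Below is a small NumPy sketch of the echo-state pipeline under the usual ESN assumptions: $W_{\text{in}}$ and $\hat{W}$ are random, fixed, and rescaled so the global transition (3.2) is contractive; the state is iterated to convergence; and only $W_{\text{out}}$ would be trained (typically by ridge regression). The ring graph, all dimensions, and the scaling constant are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, in_dim, res_dim = 5, 3, 20                 # assumed sizes
adj = np.zeros((n_nodes, n_nodes))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]:   # assumed ring graph
    adj[i, j] = adj[j, i] = 1.0

U = rng.normal(size=(n_nodes, in_dim))              # node input labels u(v)

# Echo-state reservoir: W_in and W_hat are random and never trained;
# W_hat is rescaled so that the global transition (3.2) is contractive.
W_in = rng.uniform(-0.1, 0.1, size=(res_dim, in_dim))
W_hat = rng.uniform(-1.0, 1.0, size=(res_dim, res_dim))
W_hat *= 0.9 / (np.max(np.sum(adj, axis=1)) * np.linalg.norm(W_hat, 2))

# Iterate the state transition over the whole graph until it settles.
X = np.zeros((n_nodes, res_dim))
for _ in range(200):
    X_new = np.tanh(U @ W_in.T + adj @ X @ W_hat.T)
    if np.linalg.norm(X_new - X) < 1e-6:
        break
    X = X_new

# Only the readout W_out is trained in an ESN (typically ridge regression);
# here it is left random just to show the shapes of (3.3) and (3.4).
W_out = rng.normal(size=(1, res_dim))
y_per_node = X @ W_out.T                 # structure-to-structure, eq. (3.3)
y_per_graph = W_out @ X.mean(axis=0)     # structure-to-element,   eq. (3.4)
```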
4 Gated Graph Sequence Neural Networks
[Yujia Li, 2015, 5], building on [Franco Scarselli, 2009, 3], splits $\vec{l}_{\text{co}[n]}$ into outgoing and incoming edges:
$$\begin{aligned} \vec{h}_{v}^{(t)} &= f^{*}\left( \vec{l}_v, \vec{l}_{\text{co}[v]}, \vec{l}_{\text{ne}[v]}, \vec{h}_{\text{ne}[v]}^{(t-1)} \right) \\ &= \sum_{v' \in \text{IN}[v]} f \left( \vec{l}_v, \vec{l}_{(v', v)}, \vec{l}_{v'}, \vec{h}_{v'}^{(t-1)} \right) + \sum_{v' \in \text{OUT}[v]} f \left( \vec{l}_v, \vec{l}_{(v, v')}, \vec{l}_{v'}, \vec{h}_{v'}^{(t-1)} \right) \end{aligned} \tag{4.1}$$
$\vec{h}_v$ is updated with a GRU cell:
$$\begin{aligned} \vec{h}_v^{(1)} &= \left[\vec{x}_v^T, \vec{0} \right]^T \\ A &= \left[A^{\text{(out)}}, A^{\text{(in)}} \right] \\ \vec{a}_v^{(t)} &= A_{v:}^T \left[\vec{h}_{1}^{(t-1)}, \cdots, \vec{h}_{|\mathcal{V}|}^{(t-1)} \right]^T + \vec{b} \\ \vec{z}_v^{(t)} &= \sigma \left( W^{z} \vec{a}_v^{(t)} + U^{z} \vec{h}_v^{(t-1)} \right) \\ \vec{r}_v^{(t)} &= \sigma \left( W^{r} \vec{a}_v^{(t)} + U^{r} \vec{h}_v^{(t-1)} \right) \\ \widetilde{\vec{h}_v^{(t)}} &= \tanh \left( W \vec{a}_v^{(t)} + U \left( \vec{r}_v^{(t)} \odot \vec{h}_v^{(t-1)} \right) \right) \\ \vec{h}_v^{(t)} &= \left( 1 - \vec{z}_v^{(t)} \right) \odot \vec{h}_v^{(t-1)} + \vec{z}_v^{(t)} \odot \widetilde{\vec{h}_v^{(t)}} \end{aligned} \tag{4.2}$$
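The propagation step can be sketched with dense adjacency matrices and `torch.nn.GRUCell`, as below. A single edge type, shared linear message transforms for the two edge directions, and the omission of the bias $\vec{b}$ are simplifying assumptions; the toy graph and all sizes are placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_nodes, annot_dim, state_dim, T = 4, 3, 6, 5   # assumed sizes (state_dim >= annot_dim)

# Assumed toy directed graph; A_out[v, u] = 1 if there is an edge v -> u.
A_out = torch.zeros(n_nodes, n_nodes)
for v, u in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A_out[v, u] = 1.0
A_in = A_out.t()                                 # A_in[v, u] = 1 if u -> v

# One edge type with separate transforms for incoming and outgoing edges.
msg_in = nn.Linear(state_dim, state_dim)
msg_out = nn.Linear(state_dim, state_dim)
gru = nn.GRUCell(input_size=2 * state_dim, hidden_size=state_dim)

# h_v^(1) = [x_v, 0]: node annotations padded with zeros to the state size.
x = torch.randn(n_nodes, annot_dim)
h = torch.cat([x, torch.zeros(n_nodes, state_dim - annot_dim)], dim=1)

for _ in range(T):
    # a_v^(t): messages along incoming and outgoing edges (eq. 4.1).
    a = torch.cat([A_in @ msg_in(h), A_out @ msg_out(h)], dim=1)
    # GRU update of eq. (4.2), with a as the "input" and h as the hidden state.
    h = gru(a, h)
```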
output function:
$$\vec{h}_{\mathcal{G}} = \tanh \left( \sum_{v \in \mathcal{V}} \sigma \left( i\left( [\vec{h}_v^{(T)}, \vec{x}_v] \right) \right) \odot \tanh \left( j\left( [\vec{h}_v^{(T)}, \vec{x}_v] \right) \right) \right)$$
where $i(\cdot)$ and $j(\cdot)$ are both neural networks that take $[\vec{h}_v^{(T)}, \vec{x}_v]$ as input.
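A minimal sketch of this readout, assuming `i` and `j` are single linear layers and the final node states are placeholders: $\sigma(i(\cdot))$ acts as a soft attention gate over nodes, the gated terms are summed, and $\tanh$ is applied to obtain the graph representation $\vec{h}_{\mathcal{G}}$.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_nodes, annot_dim, state_dim, out_dim = 4, 3, 6, 8   # assumed sizes

# Final node states h_v^(T) and annotations x_v (placeholders for this sketch).
h_T = torch.randn(n_nodes, state_dim)
x = torch.randn(n_nodes, annot_dim)

# i(.) and j(.): small networks over the concatenation [h_v^(T), x_v].
i_net = nn.Linear(state_dim + annot_dim, out_dim)
j_net = nn.Linear(state_dim + annot_dim, out_dim)

hx = torch.cat([h_T, x], dim=1)
gate = torch.sigmoid(i_net(hx))          # soft attention over nodes
h_G = torch.tanh((gate * torch.tanh(j_net(hx))).sum(dim=0))
```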
For an output sequence $\vec{o}^{(1)}, \cdots, \vec{o}^{(K)}$: for the $k$-th output step, write $\mathcal{X}^{(k)} = \left[ \vec{x}_{1}^{(k)}, \cdots, \vec{x}_{|\mathcal{V}|}^{(k)} \right] \in \mathbb{R}^{|\mathcal{V}| \times L_{\mathcal{V}}}$ for the node annotation matrix, and $\mathcal{H}^{(k,t)} = \left[ \vec{h}_{1}^{(k,t)}, \cdots, \vec{h}_{|\mathcal{V}|}^{(k,t)} \right] \in \mathbb{R}^{|\mathcal{V}| \times D}$ for the node states at propagation step $t$.
When $\mathcal{H}^{(k,T)}$ is used to predict $\mathcal{X}^{(k+1)}$, node annotations are fed back into the model. Each node's prediction is made independently, using a neural network $j\left( [\vec{h}_v^{(k,T)}, \vec{x}_v^{(k)}] \right)$:
$$\vec{x}_v^{(k+1)} = \sigma \left( j\left( [\vec{h}_v^{(k,T)}, \vec{x}_v^{(k)}] \right) \right)$$
Training also differs from [Franco Scarselli, 2009, 3]: instead of iterating to a fixed point, backpropagation through time is performed over the unrolled propagation steps.
5 Learning Steady-States of Iterative Algorithms over Graphs
[Hanjun Dai, 2018, 6] likewise builds on the fixed-point principle; the intermediate iteration is:
$$\begin{aligned} \vec{h}_v^{(0)} &\leftarrow \text{constant}, &\forall v \in \mathcal{V} \\ \vec{h}_v^{(t+1)} &\leftarrow \mathcal{T} \left( \{ \vec{h}_u^{(t)} \}_{u \in \mathcal{N}(v)} \right), &\forall t \geq 1 \end{aligned} \tag{5.1}$$
The steady state satisfies:
$$\vec{h}_v^{*} = \mathcal{T} \left( \{ \vec{h}_u^{*} \}_{u \in \mathcal{N}(v)} \right), \quad \forall v \in \mathcal{V} \tag{5.2}$$
The $\mathcal{T}$ in equation (5.1) is exactly the transition function (a code sketch of both functions follows the list below):
- transition function:
$$\mathcal{T}_{\Theta} \left[ \{ \widehat{h}_u \}_{u \in \mathcal{N}(v)} \right] = W_1 \sigma \left( W_2 \left[ \vec{x}_v, \sum_{u \in \mathcal{N}(v)} \left[ \widehat{h}_u, \vec{x}_u \right] \right] \right) \tag{5.3}$$
- output function:
$$g(\widehat{h}_v) = \sigma \left( V_2^T \operatorname{ReLU} \left( V_1^T \widehat{h}_v \right) \right) \tag{5.4}$$
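A NumPy sketch of the two functions is given below; the weight shapes, the choice of $\tanh$ for the inner nonlinearity $\sigma$ in (5.3), and the scalar output in (5.4) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, state_dim, hid_dim = 3, 8, 16          # assumed sizes

# Parameters of the transition (5.3) and output (5.4) functions (random here).
W1 = rng.normal(scale=0.1, size=(state_dim, hid_dim))
W2 = rng.normal(scale=0.1, size=(hid_dim, feat_dim + state_dim + feat_dim))
V1 = rng.normal(scale=0.1, size=(state_dim, hid_dim))
V2 = rng.normal(scale=0.1, size=(hid_dim, 1))

def transition(x_v, neigh):
    """Eq. (5.3): neigh is a list of (h_u, x_u) pairs for u in N(v)."""
    agg = sum(np.concatenate([h_u, x_u]) for h_u, x_u in neigh)
    return W1 @ np.tanh(W2 @ np.concatenate([x_v, agg]))   # sigma taken as tanh

def output(h_v):
    """Eq. (5.4): a two-layer readout on the steady-state embedding."""
    z = V2.T @ np.maximum(V1.T @ h_v, 0.0)       # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-z))              # sigmoid output
```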
During training, a sampling strategy is used: draw $\widetilde{\mathcal{V}} = \{ v_1, \cdots, v_N \} \subseteq \mathcal{V}$, and update $\hat{h}_{v_i}$ as:
$$\hat{h}_{v_i} \leftarrow (1 - \alpha)\hat{h}_{v_i} + \alpha \, \mathcal{T}_{\Theta} \left[ \{ \hat{h}_u \}_{u \in \mathcal{N}(v_i)} \right], \quad \forall v_i \in \widetilde{\mathcal{V}} \tag{5.5}$$
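The sampled update can be sketched as follows: a table of embeddings $\hat{h}_v$ is kept for every node, and at each step a random subset of nodes is refreshed by mixing the old embedding with $\mathcal{T}_{\Theta}$ applied to the neighbours' current embeddings at rate $\alpha$. The stand-in linear transition, the toy graph, and the omission of the interleaved gradient step on $\Theta$ are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
n_nodes, feat_dim, state_dim = 6, 3, 8            # assumed sizes
alpha, n_steps, batch = 0.3, 200, 3               # assumed mixing rate / schedule
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 5], 4: [2], 5: [3]}
X = rng.normal(size=(n_nodes, feat_dim))          # node features x_v

# A stand-in transition T_Theta (single linear layer + tanh), only so the
# update rule below runs; the real SSE transition is eq. (5.3).
W = rng.normal(scale=0.1, size=(state_dim, feat_dim + state_dim + feat_dim))

def transition(x_v, neigh):
    agg = sum(np.concatenate([h_u, x_u]) for h_u, x_u in neigh)
    return np.tanh(W @ np.concatenate([x_v, agg]))

H = np.zeros((n_nodes, state_dim))                # persistent embeddings h_hat

for _ in range(n_steps):
    sampled = rng.choice(n_nodes, size=batch, replace=False)   # sampled subset of V
    for v in sampled:
        neigh = [(H[u], X[u]) for u in adjacency[v]]
        # Eq. (5.5): move h_hat_v toward T(neighbour embeddings) with rate alpha.
        H[v] = (1 - alpha) * H[v] + alpha * transition(X[v], neigh)
    # In SSE, gradient steps on Theta are interleaved with these embedding
    # refreshes; the parameter update is omitted from this sketch.
```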
References
- [1] Wu Z, Pan S, Chen F, et al. A Comprehensive Survey on Graph Neural Networks[J]. arXiv: Learning, 2019.
- [2] Gori M, Monfardini G, Scarselli F. A New Model for Learning in Graph Domains[C]. Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, 2005: 729-734.
- [3] Scarselli F, Gori M, Tsoi A C, et al. The Graph Neural Network Model[J]. IEEE Transactions on Neural Networks, 2009, 20(1): 61-80.
- [4] Gallicchio C, Micheli A. Graph Echo State Networks[C]. International Joint Conference on Neural Networks, 2010: 1-8.
- [5] Li Y, Tarlow D, Brockschmidt M, et al. Gated Graph Sequence Neural Networks[J]. arXiv: Learning, 2016.
- [6] Dai H, Kozareva Z, Dai B, et al. Learning Steady-States of Iterative Algorithms over Graphs[C]. International Conference on Machine Learning, 2018: 1106-1114.