Graph Convolutional Neural Networks (GCN): Representative Recurrent GNN Works

Recurrent Graph Neural Networks

Recurrent graph neural networks (RecGNNs) are mostly pioneering works in graph neural networks. RecGNNs aim to learn node representations with recurrent neural architectures. They assume that a node in a graph constantly exchanges information/messages with its neighbors until a stable equilibrium is reached. RecGNNs are conceptually important and inspired the later research on convolutional graph neural networks. In particular, the idea of message passing was inherited by spatial-based convolutional graph neural networks. [1]

1 A New Model for Learning in Graph Domains

[Marco Gori, 2005, 2] was the first to propose the concept of the graph neural network (GNN).

Let $n$ be a vertex of the graph, $\vec{x}_n$ the state of vertex $n$, and $\vec{l}_n$ the label of vertex $n$. Correspondingly, $\vec{x}_{\text{ne}[n]}$ and $\vec{l}_{\text{ne}[n]}$ denote the states and labels of the neighbors of vertex $n$.

  • transition function:
    $$\vec{x}_n = f_w \left( \vec{l}_n, \vec{x}_{\text{ne}[n]}, \vec{l}_{\text{ne}[n]} \right), \qquad n \in \mathcal{N} \tag{1.1}$$

  • output function:
    $$\vec{o}_n = g_w \left( \vec{x}_n, \vec{l}_n \right), \qquad n \in \mathcal{N} \tag{1.2}$$


Equation (1.1) is then replaced with:
$$\vec{x}_n = \sum_{u \in \mathcal{N}} h_w \left( \vec{l}_n, \vec{x}_u, \vec{l}_u \right), \qquad n \in \mathcal{N} \tag{1.3}$$
where $h_w$ can be either an explicit linear function or a neural network; a minimal sketch of the linear variant follows the list below.

  • Linear GNN:
    $$\begin{aligned} h_w \left( \vec{l}_n, \vec{x}_u, \vec{l}_u \right) &= A_{n,u} \vec{x}_u + \vec{b}_n \\ A_{n,u} &= \frac{\mu}{s \cdot |\text{ne}[u]|} \cdot \text{Resize}\left( \phi_w (\vec{l}_n, \vec{l}_u) \right) \\ \vec{b}_n &= \rho_w (\vec{l}_n)\\ \phi_w &: \reals^{2q \times 1} \rightarrow \reals^{s^2 \times 1} \\ \rho_w &: \reals^{q} \rightarrow \reals^{s} \\ \mu &\in (0, 1) \end{aligned} \tag{1.4}$$

  • Neural GNN:
    $h_w$ is implemented with a feedforward neural network.
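
To make equations (1.3) and (1.4) concrete, here is a minimal NumPy sketch of one synchronous application of the linear transition function. The toy graph, the dimensions `q` and `s`, and the random matrices standing in for the learned maps $\phi_w$ and $\rho_w$ are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy undirected 4-node graph (illustrative assumption).
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
q, s, mu = 3, 4, 0.9                       # label dim, state dim, contraction factor
labels = rng.normal(size=(4, q))           # l_n for every node
states = np.zeros((4, s))                  # x_n, initialised arbitrarily

# Random stand-ins for the learned maps phi_w: R^{2q} -> R^{s^2}, rho_w: R^q -> R^s.
Phi = rng.normal(size=(s * s, 2 * q))
Rho = rng.normal(size=(s, q))

def transition(states):
    """One synchronous pass of eq. (1.3) with the linear h_w of eq. (1.4)."""
    new_states = np.zeros_like(states)
    for n, neigh in neighbors.items():
        x_n = Rho @ labels[n]                                   # b_n = rho_w(l_n)
        for u in neigh:
            phi = Phi @ np.concatenate([labels[n], labels[u]])  # phi_w(l_n, l_u)
            A_nu = mu / (s * len(neighbors[u])) * phi.reshape(s, s)  # Resize(...)
            x_n += A_nu @ states[u]                             # A_{n,u} x_u
        new_states[n] = x_n
    return new_states

# Iterate the contractive map toward its fixed point.
for _ in range(50):
    states = transition(states)
```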

2 The Graph Neural Network Model

Compared with [Marco Gori, 2005, 2], [Franco Scarselli, 2009, 3] additionally uses the labels of the edges incident to each node, $\vec{l}_{\text{co}[n]}$:

$$\begin{aligned} \vec{x}_n &= f_w \left( \vec{l}_n, \vec{l}_{\text{co}[n]}, \vec{x}_{\text{ne}[n]}, \vec{l}_{\text{ne}[n]} \right) \\ \vec{o}_n &= g_w \left( \vec{x}_n, \vec{l}_n \right), \qquad n \in \mathcal{N} \end{aligned} \tag{2.1}$$


Correspondingly,
$$\vec{x}_n = \sum_{u \in \mathcal{N}} h_w \left( \vec{l}_n, \vec{l}_{(n,u)}, \vec{x}_u, \vec{l}_u \right), \qquad n \in \mathcal{N} \tag{2.2}$$

For training, equation (2.2) is first iterated until $\|\vec{x}_n(t) - \vec{x}_n(t-1)\| \leq \epsilon$, i.e., until a stable fixed point is reached; backpropagation is then performed to update the parameters, after which the iteration of (2.2) resumes. A minimal sketch of this loop follows.
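
The sketch below assumes a generic `transition` callable standing in for eq. (2.2); the threshold `eps`, the step limit, and the outlined training loop are illustrative simplifications rather than the paper's exact gradient computation.

```python
import numpy as np

def iterate_to_fixed_point(transition, states, eps=1e-4, max_steps=200):
    """Repeat eq. (2.2) until max_n ||x_n(t) - x_n(t-1)|| <= eps (the stopping
    criterion above); `transition` maps the whole state matrix to the next one."""
    for _ in range(max_steps):
        new_states = transition(states)
        if np.linalg.norm(new_states - states, axis=1).max() <= eps:
            return new_states
        states = new_states
    return states

# Tiny demo with a contractive linear map standing in for f_w (assumption).
A = 0.5 * np.eye(3)
states = iterate_to_fixed_point(lambda x: x @ A + 1.0, np.zeros((4, 3)))

# Training outline from the text (parameter update itself omitted):
#   1. relax the states with iterate_to_fixed_point      (forward pass)
#   2. compute the loss from the outputs g_w(x_n, l_n)   (eq. 2.1)
#   3. backpropagate to update w, then repeat from 1
```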

3 Graph Echo State Networks

[Claudio Gallicchio, 2010, 4] splits the transition function into:

  • local state transition function :
    $$\begin{aligned} x_t(v) &= \tau \left( \vec{u}(v), x_{t-1}\left( \mathcal{N}(v) \right)\right) \\ &= f \left( W_{\text{in}} \vec{u}(v), \hat{W}_{\mathcal{N}}\, x_{t-1}\left( \mathcal{N}(v) \right) \right) \end{aligned} \tag{3.1}$$

  • global state transition function :
    $$\begin{aligned} x_t(g) &= \hat{\tau} \left( g, x_{t-1}(g) \right) \\ &= \begin{pmatrix} f \left( W_{\text{in}} \vec{u}(v_1) + \hat{W}_{v_1} x_{t-1}(g) \right) \\ \vdots \\ f \left( W_{\text{in}} \vec{u}(v_{|\mathcal{V}|}) + \hat{W}_{v_{|\mathcal{V}|}} x_{t-1}(g) \right) \end{pmatrix}. \end{aligned} \tag{3.2}$$
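
The defining feature of the Graph Echo State Network is that $W_{\text{in}}$ and $\hat{W}$ are fixed random (untrained) reservoir weights, rescaled so that the global transition is contractive. Below is a minimal NumPy sketch that iterates eq. (3.1) over all nodes, which is exactly what eq. (3.2) stacks into one global map; the toy graph, the sizes, the scaling constant, and summation over neighbors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

num_nodes, in_dim, res_dim = 5, 3, 16          # illustrative sizes
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}  # toy chain graph

U = rng.normal(size=(num_nodes, in_dim))       # node input labels u(v)
X = np.zeros((num_nodes, res_dim))             # reservoir states x_t(v)

# Fixed (untrained) reservoir weights; W_hat is rescaled so the global map is
# contractive, which is what guarantees the states converge.
W_in = rng.uniform(-0.1, 0.1, size=(res_dim, in_dim))
W_hat = rng.uniform(-1.0, 1.0, size=(res_dim, res_dim))
max_degree = max(len(n) for n in neighbors.values())
W_hat *= 0.9 / (max_degree * np.linalg.norm(W_hat, 2))

for _ in range(30):                            # iterate eq. (3.1) to a stable encoding
    X_new = np.empty_like(X)
    for v, neigh in neighbors.items():
        agg = sum(X[u] for u in neigh)         # x_{t-1}(N(v)), aggregated by summation
        X_new[v] = np.tanh(W_in @ U[v] + W_hat @ agg)
    X = X_new
```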

The output function depends on the task:

  • structure-to-structure:

$$\vec{y}(v) = g_{\text{out}}(\vec{x}(v)) = W_{\text{out}} \vec{x}(v). \tag{3.3}$$

  • structure-to-element:

$$\vec{y}(g) = g_{\text{out}} \left( \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \vec{x}(v) \right) = W_{\text{out}} \left( \frac{1}{|\mathcal{V}|} \sum_{v \in \mathcal{V}} \vec{x}(v) \right). \tag{3.4}$$
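
Only $W_{\text{out}}$ is trained in either readout. A small sketch of both variants under assumed dimensions, with `X` standing for the converged reservoir states:

```python
import numpy as np

rng = np.random.default_rng(2)
num_nodes, res_dim, out_dim = 5, 16, 2
X = rng.normal(size=(num_nodes, res_dim))      # converged reservoir states x(v)
W_out = rng.normal(size=(out_dim, res_dim))    # the only trained parameters

# structure-to-structure (eq. 3.3): one output per node
y_nodes = X @ W_out.T                          # shape (num_nodes, out_dim)

# structure-to-element (eq. 3.4): one output for the whole graph via mean pooling
y_graph = W_out @ X.mean(axis=0)               # shape (out_dim,)
```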

4 Gated Graph Sequence Neural Networks

Building on [Franco Scarselli, 2009, 3], [Yujia Li, 2015, 5] splits $\vec{l}_{\text{co}[n]}$ into outgoing and incoming edges:

$$\begin{aligned} \vec{h}_{v}^{(t)} &= f^{*}\left( \vec{l}_v, \vec{l}_{\text{co}[v]}, \vec{l}_{\text{ne}[v]}, \vec{h}_{\text{ne}[v]}^{(t-1)} \right) \\ &= \sum_{v' \in \text{IN}[v]} f \left( \vec{l}_v, \vec{l}_{(v', v)}, \vec{l}_{v'}, \vec{h}_{v'}^{(t-1)} \right) + \sum_{v' \in \text{OUT}[v]} f \left( \vec{l}_v, \vec{l}_{(v, v')}, \vec{l}_{v'}, \vec{h}_{v'}^{(t-1)} \right) \end{aligned} \tag{4.1}$$

$\vec{h}_v$ is updated with a GRU cell:
$$\begin{aligned} \vec{h}_v^{(1)} &= \left[\vec{x}_v^T, \vec{0} \right]^T \\ A &= \left[A^{\text{(out)}}, A^{\text{(in)}} \right] \\ \vec{a}_v^{(t)} &= A_{v:}^T\left[\vec{h}_{1}^{(t-1)}, \cdots, \vec{h}_{|\mathcal{V}|}^{(t-1)} \right]^T + \vec{b} \\ \vec{z}_v^{(t)} &= \sigma \left( W^{z} \vec{a}_v^{(t)} + U^{z} \vec{h}_v^{(t-1)} \right) \\ \vec{r}_v^{(t)} &= \sigma \left( W^{r} \vec{a}_v^{(t)} + U^{r} \vec{h}_v^{(t-1)} \right) \\ \widetilde{\vec{h}_v^{(t)}} &= \tanh \left( W \vec{a}_v^{(t)} + U \left( \vec{r}_v^{(t)} \odot \vec{h}_v^{(t-1)} \right) \right) \\ \vec{h}_v^{(t)} &= \left( 1 - \vec{z}_v^{(t)} \right) \odot \vec{h}_v^{(t-1)} + \vec{z}_v^{(t)} \odot \widetilde{\vec{h}_v^{(t)}}. \end{aligned} \tag{4.2}$$
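
A minimal NumPy sketch of one propagation step of eq. (4.2) on a toy directed graph. The sizes, the zero-padding of $\vec{x}_v$ into $\vec{h}_v^{(1)}$, and the single shared weight matrix per edge direction (instead of the per-edge-type blocks of $A$) are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

num_nodes, annot_dim, hid_dim = 4, 2, 6
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]       # toy directed edges (u -> v)

x = rng.normal(size=(num_nodes, annot_dim))    # node annotations x_v
H = np.zeros((num_nodes, hid_dim))
H[:, :annot_dim] = x                           # h_v^{(1)} = [x_v, 0] (zero padding)

# Random stand-ins for the learned propagation and GRU parameters.
W_out_e = 0.1 * rng.normal(size=(hid_dim, hid_dim))   # weights for outgoing edges
W_in_e = 0.1 * rng.normal(size=(hid_dim, hid_dim))    # weights for incoming edges
b = np.zeros(hid_dim)
Wz, Uz = 0.1 * rng.normal(size=(hid_dim, hid_dim)), 0.1 * rng.normal(size=(hid_dim, hid_dim))
Wr, Ur = 0.1 * rng.normal(size=(hid_dim, hid_dim)), 0.1 * rng.normal(size=(hid_dim, hid_dim))
W, U = 0.1 * rng.normal(size=(hid_dim, hid_dim)), 0.1 * rng.normal(size=(hid_dim, hid_dim))

def propagate(H):
    """One GGNN step: aggregate a_v^{(t)} over both edge directions, then apply the GRU of eq. (4.2)."""
    a = np.tile(b, (num_nodes, 1))
    for u, v in edges:
        a[v] += W_in_e @ H[u]                  # messages along incoming edges of v
        a[u] += W_out_e @ H[v]                 # messages along outgoing edges of u
    z = sigmoid(a @ Wz.T + H @ Uz.T)           # update gate
    r = sigmoid(a @ Wr.T + H @ Ur.T)           # reset gate
    h_tilde = np.tanh(a @ W.T + (r * H) @ U.T) # candidate state
    return (1 - z) * H + z * h_tilde

for _ in range(5):                             # T propagation steps (T = 5 here)
    H = propagate(H)
```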

The graph-level output function is:
$$\vec{h}_{\mathcal{G}} = \tanh \left( \sum_{v \in \mathcal{V}} \sigma \left( i\left( [\vec{h}_v^{(T)}, \vec{x}_v]\right) \right) \odot \tanh\left( j\left( [\vec{h}_v^{(T)}, \vec{x}_v]\right) \right) \right)$$
where $i(\cdot)$ and $j(\cdot)$ are neural networks that take $[\vec{h}_v^{(T)}, \vec{x}_v]$ as input.
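
A small sketch of this graph-level readout, with $i(\cdot)$ and $j(\cdot)$ modelled as single linear layers over $[\vec{h}_v^{(T)}, \vec{x}_v]$ (an assumption; any networks can be used):

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

num_nodes, annot_dim, hid_dim, out_dim = 4, 2, 6, 3
H_T = rng.normal(size=(num_nodes, hid_dim))      # final node states h_v^{(T)}
x = rng.normal(size=(num_nodes, annot_dim))      # node annotations x_v

Wi = rng.normal(size=(out_dim, hid_dim + annot_dim))   # stand-in for i(.)
Wj = rng.normal(size=(out_dim, hid_dim + annot_dim))   # stand-in for j(.)

cat = np.concatenate([H_T, x], axis=1)           # [h_v^{(T)}, x_v] for every node
gate = sigmoid(cat @ Wi.T)                       # soft attention over nodes
h_G = np.tanh((gate * np.tanh(cat @ Wj.T)).sum(axis=0))  # graph representation
```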

For an output sequence $\vec{o}^{(1)}, \cdots, \vec{o}^{(K)}$: for the $k$-th output step, write the node annotation matrix as $\mathcal{X}^{(k)} = \left[ \vec{x}_{1}^{(k)}, \cdots, \vec{x}_{|\mathcal{V}|}^{(k)} \right] \in \reals^{|\mathcal{V}| \times L_{\mathcal{V}}}$ and the node state matrix at propagation step $t$ as $\mathcal{H}^{(k,t)} = \left[ \vec{h}_{1}^{(k,t)}, \cdots, \vec{h}_{|\mathcal{V}|}^{(k,t)} \right] \in \reals^{|\mathcal{V}| \times D}$.


When using $\mathcal{H}^{(k,T)}$ to predict $\mathcal{X}^{(k+1)}$, node annotations are fed back into the model. Each node's prediction is made independently by a neural network $j\left( [\vec{h}_v^{(k,T)}, \vec{x}_v^{(k)}]\right)$:
$$\vec{x}_v^{(k+1)} = \sigma \left( j\left( [\vec{h}_v^{(k,T)}, \vec{x}_v^{(k)}]\right) \right)$$

Training also differs from [Franco Scarselli, 2009, 3]: instead of relaxing to a fixed point, the model unrolls a fixed number of propagation steps and applies backpropagation through time.

5 Learning Steady-States of Iterative Algorithms over Graphs

[Hanjun Dai, 2018, 6] likewise builds on the fixed-point principle. The intermediate iteration is:
$$\begin{aligned} \vec{h}_v^{(0)} &\leftarrow \text{constant}, &\forall v \in \mathcal{V} \\ \vec{h}_v^{(t+1)} &\leftarrow \mathcal{T} \left( \{ \vec{h}_u^{(t)} \}_{u \in \mathcal{N}(v)} \right), &\forall t \geq 1. \end{aligned} \tag{5.1}$$
The steady state satisfies:
$$\vec{h}_v^{*} = \mathcal{T} \left( \{ \vec{h}_u^{*} \}_{u \in \mathcal{N}(v)} \right), \quad \forall v \in \mathcal{V}. \tag{5.2}$$

The operator $\mathcal{T}$ in equation (5.1) is exactly the transition function.

  • transition function :
    $$\mathcal{T}_{\Theta} \left[ \{ \hat{h}_u \}_{u \in \mathcal{N}(v)} \right] = W_1 \sigma \left( W_2 \left[ \vec{x}_v, \sum_{u \in \mathcal{N}(v)} \left[ \hat{h}_u, \vec{x}_u \right] \right] \right). \tag{5.3}$$

  • output function :
    $$g(\hat{h}_v) = \sigma \left( V_2^T\, \text{ReLU} \left( V_1^T \hat{h}_v \right) \right). \tag{5.4}$$

Training uses a sampling step: take $\widetilde{\mathcal{V}} = \{ v_1, \cdots, v_N \} \subseteq \mathcal{V}$, and update $\hat{h}_{v_i}$ by:
$$\hat{h}_{v_i} \leftarrow (1 - \alpha)\,\hat{h}_{v_i} + \alpha\, \mathcal{T}_{\Theta} \left[ \{ \hat{h}_u \}_{u \in \mathcal{N}(v_i)} \right], \quad \forall v_i \in \widetilde{\mathcal{V}}. \tag{5.5}$$
A minimal sketch of this update is given below.
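
The sketch covers the transition operator of eq. (5.3) and the damped, sampled update of eq. (5.5); the toy graph, the dimensions, $\alpha$, the ReLU choice for $\sigma$, and the omission of the parameter-update step are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

num_nodes, feat_dim, hid_dim, alpha = 6, 3, 8, 0.1
neighbors = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}

x = rng.normal(size=(num_nodes, feat_dim))        # node features x_v
h = np.zeros((num_nodes, hid_dim))                # steady-state embeddings h_v

# Random stand-ins for the learned weights of eq. (5.3).
W1 = 0.1 * rng.normal(size=(hid_dim, hid_dim))
W2 = 0.1 * rng.normal(size=(hid_dim, feat_dim + hid_dim + feat_dim))

def T_theta(v):
    """Transition operator of eq. (5.3) for a single node v (sigma taken as ReLU)."""
    agg = sum(np.concatenate([h[u], x[u]]) for u in neighbors[v])  # sum of [h_u, x_u]
    inp = np.concatenate([x[v], agg])                              # [x_v, sum(...)]
    return W1 @ np.maximum(W2 @ inp, 0.0)

# Stochastic steady-state embedding loop (parameter updates omitted):
for _ in range(100):
    batch = rng.choice(num_nodes, size=3, replace=False)   # sampled node set V~
    for v in batch:
        h[v] = (1 - alpha) * h[v] + alpha * T_theta(v)      # damped update, eq. (5.5)
```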


References

[1] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, P. S. Yu. A Comprehensive Survey on Graph Neural Networks.
[2] M. Gori, G. Monfardini, F. Scarselli. A New Model for Learning in Graph Domains. IJCNN 2005.
[3] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini. The Graph Neural Network Model. IEEE Transactions on Neural Networks, 2009.
[4] C. Gallicchio, A. Micheli. Graph Echo State Networks. IJCNN 2010.
[5] Y. Li, D. Tarlow, M. Brockschmidt, R. Zemel. Gated Graph Sequence Neural Networks. ICLR 2016.
[6] H. Dai, Z. Kozareva, B. Dai, A. Smola, L. Song. Learning Steady-States of Iterative Algorithms over Graphs. ICML 2018.
