1.GCN
A graph signal on the graph:
$$x=[1,2,3,4,4]$$
First filter, then apply a nonlinear transformation:
$$x' = g_{\theta}* x_{k}$$
$$x_{k+1} = \sigma(wx')$$
Filtering with a graph convolution:
$$g_{\theta}* x=U g_{\theta}U^T x$$
where $L=I_N-D^{-1/2}AD^{-1/2}=U\Lambda U^T$ and $g_{\theta} = \mathrm{diag}(\theta)$.
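The spectral filtering pipeline above can be sketched numerically. The graph below is a hypothetical 5-node path graph (not from the text), and `theta` is an arbitrary example filter response:

```python
import numpy as np

# Hypothetical 5-node path graph; x is the example signal above.
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0, 4.0])

# Symmetric normalized Laplacian: L = I_N - D^{-1/2} A D^{-1/2}
d_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L = np.eye(5) - d_inv_sqrt @ A @ d_inv_sqrt

# Eigendecomposition L = U Lambda U^T (L is real symmetric, so eigh applies)
lam, U = np.linalg.eigh(L)

# Spectral filtering: g_theta * x = U diag(theta) U^T x
theta = np.exp(-lam)          # example low-pass response; any theta works
x_filtered = U @ np.diag(theta) @ U.T @ x
print(x_filtered)
```

Note that this requires a full eigendecomposition, which is exactly the cost the Chebyshev approximation below avoids.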
We now simplify this filtering step.
1.1 Chebyshev approximation
Following Hammond et al. (2011), the filter admits a Chebyshev polynomial approximation:
$$g_{\theta'}(\Lambda) \approx \sum^{K}_{k=0} \theta'_k T_k(\tilde{\Lambda})$$
where $\tilde{\Lambda}=\frac{2}{\lambda_{max}}\Lambda -I_N$, $T_k(x)=2xT_{k-1}(x)-T_{k-2}(x)$, $T_0(x)=1$, $T_1(x)=x$.
Therefore:
$$g_{\theta'} * x =U g_{\theta'}(\Lambda) U^T x \approx U \sum^{K}_{k=0} \theta'_k T_k(\tilde{\Lambda}) U^Tx = \sum^{K}_{k=0} \theta'_k U T_k(\tilde{\Lambda} )U^T x = \sum^{K}_{k=0} \theta'_k T_k(U\tilde{\Lambda} U^T) x = \sum^{K}_{k=0} \theta'_k T_k(\tilde{L}) x$$
where $\tilde{L}=\frac{2}{\lambda_{max}}L -I_N$.
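A minimal numerical check of this expansion, again on a hypothetical 5-node path graph: the recursion computes $\sum_k \theta'_k T_k(\tilde{L})x$ without any eigendecomposition, and we compare it against the exact spectral evaluation $T_k(y)=\cos(k\arccos y)$. The coefficients $\theta'_k$ are placeholders:

```python
import numpy as np

# Hypothetical 5-node path graph and signal.
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0, 4.0])

d_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L = np.eye(5) - d_inv_sqrt @ A @ d_inv_sqrt
lam, U = np.linalg.eigh(L)
lam_max = lam[-1]

# Rescaled Laplacian L~ = (2 / lam_max) L - I_N; its eigenvalues lie in [-1, 1]
L_t = (2.0 / lam_max) * L - np.eye(5)

# Chebyshev recursion on vectors:
# T_0(L~)x = x, T_1(L~)x = L~x, T_k(L~)x = 2 L~ T_{k-1}(L~)x - T_{k-2}(L~)x
K = 3
theta = np.array([0.5, -0.3, 0.2, 0.1])   # placeholder coefficients theta'_k
t_prev, t_curr = x, L_t @ x
out = theta[0] * t_prev + theta[1] * t_curr
for k in range(2, K + 1):
    t_prev, t_curr = t_curr, 2.0 * L_t @ t_curr - t_prev
    out += theta[k] * t_curr

# Spectral reference: apply the same polynomial to the eigenvalues of L~
y = np.clip((2.0 / lam_max) * lam - 1.0, -1.0, 1.0)
resp = sum(theta[k] * np.cos(k * np.arccos(y)) for k in range(K + 1))
out_spectral = U @ (resp * (U.T @ x))
```

The two results agree, confirming the step $\sum_k \theta'_k U T_k(\tilde{\Lambda})U^T x = \sum_k \theta'_k T_k(\tilde{L})x$.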
1.2 Restricting the order to K=1
Let $K=1$. Since $T_0(x)=1$ and $T_1(x)=x$, we have
$$g_{\theta'} * x = \sum^{K}_{k=0} \theta'_k T_k(\tilde{L}) x \approx (\theta_0T_0(\tilde{L})+\theta_1T_1(\tilde{L}))x = (\theta_0+\theta_1\tilde{L})x$$
where $\tilde{L}=\frac{2}{\lambda_{max}}L -I_N$.
1.3 Assuming $\lambda_{max}=2$
Assuming $\lambda_{max}=2$, we have $\tilde{L}=L -I_N$, and therefore:
$$g_{\theta'} * x = (\theta_0+\theta_1\tilde{L})x =(\theta_0+\theta_1 (L -I_N))x$$
1.4 Setting $\theta_0, \theta_1$
Setting $\theta_0=-\theta_1=\theta$ (a single shared parameter), we have
$$g_{\theta'} * x =(\theta_0+\theta_1 (L -I_N))x =\theta(I_N-L+I_N)x =\theta(I_N-(I_N-D^{-1/2}AD^{-1/2})+I_N)x =\theta(I_N+D^{-1/2}AD^{-1/2})x$$
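A quick sanity check that the parameter tying $\theta_0=-\theta_1=\theta$ collapses the two-parameter filter to $\theta(I_N+D^{-1/2}AD^{-1/2})$ (hypothetical 5-node path graph, arbitrary $\theta$):

```python
import numpy as np

# Hypothetical 5-node path graph and signal.
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0, 4.0])

d_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
S = d_inv_sqrt @ A @ d_inv_sqrt           # D^{-1/2} A D^{-1/2}
L = np.eye(5) - S

theta = 0.7                               # arbitrary shared parameter
theta0, theta1 = theta, -theta            # theta_0 = -theta_1

two_param = (theta0 * np.eye(5) + theta1 * (L - np.eye(5))) @ x
one_param = theta * (np.eye(5) + S) @ x
```

The two filter outputs coincide exactly, matching the derivation above.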
1.5 renormalization trick
$$g_{\theta'} * x =\theta(I_N+D^{-1/2}AD^{-1/2})x \approx \theta(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2})x$$
where $\tilde{A}=A+I_N$ and $\tilde{D}$ is the degree matrix of $\tilde{A}$.
1.6 Summary
Putting it all together:
$$x' = g_{\theta'} * x \approx \theta(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2})x$$
Substituting into the nonlinear update gives:
$$x_{k+1}=\sigma(wx') \approx\sigma(w(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2})x_{k})$$
For a feature matrix $X$ (with the weights $\Theta$ multiplying the features from the right):
$$X_{k+1}\approx\sigma(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}\, X_{k}\,\Theta)$$
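The final layer rule can be sketched as a single forward pass. The graph, feature dimensions, and weights below are made up for illustration, and $\sigma$ is taken to be ReLU, which is an assumption (the derivation leaves $\sigma$ unspecified):

```python
import numpy as np

# Hypothetical 5-node path graph.
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
N = 5

A_t = A + np.eye(N)                           # A~ = A + I_N (renormalization trick)
d_t_inv_sqrt = np.diag(A_t.sum(axis=1) ** -0.5)
S_hat = d_t_inv_sqrt @ A_t @ d_t_inv_sqrt     # D~^{-1/2} A~ D~^{-1/2}

rng = np.random.default_rng(0)
X = rng.normal(size=(N, 4))                   # node features (4 dims, made up)
Theta = rng.normal(size=(4, 2))               # layer weights (shapes assumed)

# X_{k+1} = sigma(S_hat X_k Theta), here with sigma = ReLU
X_next = np.maximum(0.0, S_hat @ X @ Theta)
```

One propagation step mixes each node's features with its neighbors' via `S_hat`, then applies the shared linear map `Theta` and the nonlinearity.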
2.SGC
2.1 First-order Chebyshev filter
In GCN, after the approximations above, the first-order ($K=1$) Chebyshev filter corresponds to the propagation matrix:
$$S_{1-order} =I_N+D^{-1/2}AD^{-1/2}$$
Since $L=I_N-D^{-1/2}AD^{-1/2}$, we have
$$S_{1-order}=2I_N-L$$
$$x'=S_{1-order} \;x =(I_N+D^{-1/2}AD^{-1/2}) x =(2I_N-L)x =(2I_N-U\Lambda U^T)x =(2UU^{-1}-U\Lambda U^T)x =U(2I-\Lambda)U^T x$$
Here $U^T=U^{-1}$ because $L$ is a real symmetric matrix. The equivalent filter function is therefore
$$g_\theta(\Lambda)=2I-\Lambda$$
i.e.
$$g_\theta(\lambda)=2-\lambda$$
where $\lambda$ is an eigenvalue of the Laplacian $L$ and represents a frequency.
After stacking $K$ layers ($K$ applications of the filter), the overall response is
$$g_\theta(\lambda)^K=(2-\lambda)^K$$
[Figure omitted: plot of $(2-\lambda)^K$ as a function of $\lambda$.]
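This filter identity is easy to confirm numerically: the eigenvalues of $S_{1-order}$ are exactly $2-\lambda$ for the eigenvalues $\lambda$ of $L$ (hypothetical 5-node path graph):

```python
import numpy as np

# Hypothetical 5-node path graph.
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

d_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
S = d_inv_sqrt @ A @ d_inv_sqrt
L = np.eye(5) - S
S1 = np.eye(5) + S                 # S_{1-order} = I_N + D^{-1/2} A D^{-1/2} = 2 I_N - L

lam_L = np.linalg.eigvalsh(L)      # frequencies of the graph
lam_S1 = np.linalg.eigvalsh(S1)    # spectrum of the propagation matrix
```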
2.2 Augmented normalized adjacency matrix
When GCN applies the renormalization trick, the propagation matrix changes from $S_{1-order}$ to $\tilde{S}_{adj}$, where
$$\tilde{S}_{adj} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$$
with $\tilde{A}=A+I$ and $\tilde{D}=D+I$.
Correspondingly, define the augmented normalized Laplacian
$$\tilde{L}=I_N-\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$$
with eigenvalues $\tilde{\lambda}$. Using $\tilde{S}_{adj}$ as the propagation matrix (with $U$ now the eigenvector matrix of $\tilde{L}$) gives
$$x'=\tilde{S}_{adj} \;x = (\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2})x =(I_N-\tilde{L})x =(I_N-U\tilde{\Lambda} U^T)x =(UU^{-1}-U\tilde{\Lambda} U^T)x =U(I-\tilde{\Lambda})U^T x$$
That is,
$$g_\theta(\tilde{\lambda})=1-\tilde{\lambda}$$
where $\tilde{\lambda}$ is an eigenvalue of the augmented Laplacian $\tilde{L}$ and represents a frequency. After stacking $K$ layers, the response is
$$g_\theta(\tilde{\lambda})^K=(1-\tilde{\lambda})^K$$
SGC proves that
$$0=\lambda_0=\tilde{\lambda}_0, \qquad \tilde{\lambda}_n<\lambda_n$$
[Figure omitted: response curves of $(2-\lambda)^K$ and $(1-\tilde{\lambda})^K$.]
In other words, the renormalization trick shrinks the largest eigenvalue of the underlying Laplacian to around 1.6, rather than the original 2.
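This shrinkage can be observed numerically. For a connected bipartite graph, such as the hypothetical path graph used here, the normalized Laplacian attains $\lambda_{max}=2$ exactly, while adding self-loops makes the largest eigenvalue of $\tilde{L}$ strictly smaller:

```python
import numpy as np

# Hypothetical 5-node path graph (bipartite, so lambda_max = 2 exactly).
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
N = 5

d_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
L = np.eye(N) - d_inv_sqrt @ A @ d_inv_sqrt

A_t = A + np.eye(N)                             # renormalization trick
d_t_inv_sqrt = np.diag(A_t.sum(axis=1) ** -0.5)
L_t = np.eye(N) - d_t_inv_sqrt @ A_t @ d_t_inv_sqrt

lam_max = np.linalg.eigvalsh(L)[-1]      # largest eigenvalue of L
lam_t_max = np.linalg.eigvalsh(L_t)[-1]  # largest eigenvalue of L~, strictly smaller
print(lam_max, lam_t_max)
```

A smaller $\tilde{\lambda}_n$ keeps $|1-\tilde{\lambda}|$ bounded away from 1 at the high-frequency end, so stacking layers does not amplify the highest frequencies.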
2.3 Normalized adjacency matrix
To see why this helps, first consider using
$$S_{adj}=D^{-1/2}AD^{-1/2}$$
as the propagation matrix. Then
$$x'=S_{adj} \;x = (D^{-1/2}AD^{-1/2})x =(I_N-L)x =U(I-\Lambda)U^T x$$
That is,
$$g_\theta(\lambda)=1-\lambda$$
and after stacking $K$ layers,
$$g_\theta(\lambda)^K=(1-\lambda)^K$$
In summary, the three propagation matrices $S_{1-order}$, $S_{adj}$, and $\tilde{S}_{adj}$ correspond to the filter functions $(2-\lambda)^K$, $(1-\lambda)^K$, and $(1-\tilde{\lambda})^K$, respectively.
3.FAGCN
FAGCN designs two propagation matrices, a low-pass one and a high-pass one:
$$\mathcal{F}_L=\alpha I+D^{-1/2}AD^{-1/2} =(\alpha+1)I-L$$
$$\mathcal{F}_H=\alpha I-D^{-1/2}AD^{-1/2} =(\alpha-1)I+L$$
These correspond to the filter functions
$$g_1(\lambda)=1-\lambda+\alpha, \qquad g_2(\lambda)=\lambda-1+\alpha$$
[Figures omitted: the two response curves.]
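The two responses can be verified by comparing the spectra of $\mathcal{F}_L$ and $\mathcal{F}_H$ against $g_1$ and $g_2$ evaluated at the eigenvalues of $L$ (hypothetical 5-node path graph; $\alpha=0.3$ is an arbitrary choice):

```python
import numpy as np

# Hypothetical 5-node path graph.
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

d_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
S = d_inv_sqrt @ A @ d_inv_sqrt
L = np.eye(5) - S
alpha = 0.3                                # arbitrary FAGCN hyperparameter

F_low = alpha * np.eye(5) + S              # (alpha + 1) I - L : low-pass
F_high = alpha * np.eye(5) - S             # (alpha - 1) I + L : high-pass

lam = np.linalg.eigvalsh(L)
g1 = 1.0 - lam + alpha                     # response of F_low at frequency lam
g2 = lam - 1.0 + alpha                     # response of F_high at frequency lam
```

`g1` is largest at low frequencies and `g2` at high frequencies, which is what lets FAGCN mix low- and high-frequency information.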