@article{wang2019orthogonal,
  title={Orthogonal Convolutional Neural Networks},
  author={Wang, Jiayun and Chen, Yubei and Chakraborty, Rudrasis and Yu, Stella X.},
  journal={arXiv: Computer Vision and Pattern Recognition},
  year={2019}
}
Summary
This paper proposes a method for orthogonalizing CNNs.
Main Content
Notation
$X \in \mathbb{R}^{N \times C \times H \times W}$: the input
$K \in \mathbb{R}^{M \times C \times k \times k}$: the convolution kernel
$Y \in \mathbb{R}^{N \times M \times H' \times W'}$: the output
$Y = \mathrm{Conv}(K, X)$: the convolution operation.

Two representations of $Y = \mathrm{Conv}(K, X)$:
$$Y = K\tilde{X}$$

Here $K \in \mathbb{R}^{M \times Ck^2}$, with each row corresponding to one convolution kernel; $\tilde{X} \in \mathbb{R}^{Ck^2 \times H'W'}$ stacks the $k \times k$ patches of $X$ as columns; and $Y \in \mathbb{R}^{M \times H'W'}$.
$$Y = \mathcal{K}X$$

Here $X \in \mathbb{R}^{CHW}$ is the image flattened into a single vector, $\mathcal{K} \in \mathbb{R}^{MH'W' \times CHW}$ is the corresponding (doubly block-Toeplitz) matrix, each inner product between a row of $\mathcal{K}$ and $X$ again amounts to one convolution step, and $Y \in \mathbb{R}^{MH'W'}$.
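Both views are easy to check numerically with `torch.nn.functional.unfold` (im2col). A minimal sketch; the shapes and variable names below are illustrative, not taken from the authors' code:

```python
import torch
import torch.nn.functional as F

N, C, H, W = 2, 3, 8, 8
M, k = 4, 3
X = torch.randn(N, C, H, W)
K = torch.randn(M, C, k, k)

# Reference: ordinary convolution (stride 1, no padding).
Y_ref = F.conv2d(X, K)                    # (N, M, H', W') with H' = H-k+1

# View 1: Y = K @ X_tilde, with K flattened to (M, C k^2) and
# X_tilde the matrix of k x k patches, shape (C k^2, H' W').
X_tilde = F.unfold(X, kernel_size=k)      # (N, C k^2, H' W')
Y_mat = K.reshape(M, -1) @ X_tilde        # (N, M, H' W')

Hp, Wp = H - k + 1, W - k + 1
assert torch.allclose(Y_ref, Y_mat.reshape(N, M, Hp, Wp), atol=1e-5)
```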
kernel orthogonal regularization
This amounts to requiring $KK^T = I$ (row orthogonality) or $K^TK = I$ (column orthogonality), with the regularization terms
$$L_{korth\text{-}row} = \|KK^T - I\|_F, \qquad L_{korth\text{-}col} = \|K^TK - I\|_F.$$
In the latest version of the paper, the authors show that these two are equivalent.
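In code, the penalty is a one-liner on the flattened kernel. A minimal sketch, assuming $K$ is stored as an $(M, C, k, k)$ tensor; the function name is mine:

```python
import torch

def kernel_orth_loss(K: torch.Tensor) -> torch.Tensor:
    """||K K^T - I||_F on the (M, C k^2) flattening of the kernel."""
    M = K.shape[0]
    Kf = K.reshape(M, -1)                          # (M, C k^2)
    gram = Kf @ Kf.t()                             # (M, M) row Gram matrix
    I = torch.eye(M, device=K.device, dtype=K.dtype)
    return torch.linalg.norm(gram - I)             # Frobenius norm
```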
orthogonal convolution
What the authors actually want is $\mathcal{K}\mathcal{K}^T = I$ or $\mathcal{K}^T\mathcal{K} = I$.
Write $\mathcal{K}(ihw, \cdot)$ for row $(i-1)H'W' + (h-1)W' + w$ of $\mathcal{K}$, and correspondingly $\mathcal{K}(\cdot, ihw)$ for column $(i-1)HW + (h-1)W + w$.
Then $\mathcal{K}\mathcal{K}^T = I$ is equivalent to
$$\tag{5} \langle \mathcal{K}(ih_1w_1, \cdot), \mathcal{K}(jh_2w_2, \cdot)\rangle = \begin{cases} 1, & (i,h_1,w_1) = (j,h_2,w_2), \\ 0, & \text{else}, \end{cases}$$
and $\mathcal{K}^T\mathcal{K} = I$ is equivalent to
$$\tag{10} \langle \mathcal{K}(\cdot, ih_1w_1), \mathcal{K}(\cdot, jh_2w_2)\rangle = \begin{cases} 1, & (i,h_1,w_1) = (j,h_2,w_2), \\ 0, & \text{else}. \end{cases}$$
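The matrix view $\mathcal{K}$ itself can be built by brute force, which makes (5) and (10) checkable numerically: push each standard basis vector of $\mathbb{R}^{CHW}$ through the convolution and collect the results as columns. A small sketch of my own, not the authors' code:

```python
import torch
import torch.nn.functional as F

C, H, W, M, k = 2, 5, 5, 3, 3
K = torch.randn(M, C, k, k)

# Column i of K_cal is Conv(K, e_i): feed the CHW standard basis as a batch.
E = torch.eye(C * H * W).reshape(C * H * W, C, H, W)
cols = F.conv2d(E, K)                          # (CHW, M, H', W')
K_cal = cols.reshape(C * H * W, -1).t()        # (M H' W', C H W)

# Sanity check: K_cal @ vec(X) equals vec(Conv(K, X)).
X = torch.randn(1, C, H, W)
Y = F.conv2d(X, K)
assert torch.allclose(K_cal @ X.reshape(-1), Y.reshape(-1), atol=1e-4)
```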
In practice, checking these conditions entry by entry involves a great deal of redundancy; they can be reduced to a much simpler form.
Condition (5) is equivalent to
$$\tag{7} \mathrm{Conv}(K, K, \mathrm{padding}=P, \mathrm{stride}=S) = I_{r0},$$
where $I_{r0} \in \mathbb{R}^{M \times M \times (2P/S+1) \times (2P/S+1)}$ equals $1$ at the entries $[i, i, \lfloor \frac{k-1}{S} \rfloor + 1, \lfloor \frac{k-1}{S} \rfloor + 1]$, $i = 1, \ldots, M$, and $0$ everywhere else, and
$$P = \left\lfloor \frac{k-1}{S} \right\rfloor \cdot S.$$
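Condition (7) is directly computable: treat $K$ both as a batch of $M$ inputs and as the filter bank, convolve it with itself, and compare against $I_{r0}$. A minimal sketch under the shapes defined above; the function name and structure are mine, not the authors' released code:

```python
import torch
import torch.nn.functional as F

def orth_conv_loss_row(K: torch.Tensor, S: int = 1) -> torch.Tensor:
    M, C, k, _ = K.shape
    P = ((k - 1) // S) * S                     # P = floor((k-1)/S) * S
    # K acts both as a batch of M "images" and as the filter bank.
    Z = F.conv2d(K, K, padding=P, stride=S)    # (M, M, 2P/S+1, 2P/S+1)
    target = torch.zeros_like(Z)               # I_r0
    c = (k - 1) // S                           # 0-based centre index
    target[torch.arange(M), torch.arange(M), c, c] = 1.0
    return torch.linalg.norm(Z - target)       # ||Conv(K,K,P,S) - I_r0||_F
```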
The derivation can be found in the paper (it is genuinely hard to write out clearly here).
For $\mathcal{K}^T\mathcal{K}$, in the special case $S = 1$, condition (10) is equivalent to
$$\tag{11} \mathrm{Conv}(K^T, K^T, \mathrm{padding}=k-1, \mathrm{stride}=1) = I_{c0},$$
where $I_{c0} \in \mathbb{R}^{C \times C \times (2k-1) \times (2k-1)}$ likewise equals $1$ only at $(i, i, k, k)$ and $0$ everywhere else.
Here $K^T \in \mathbb{R}^{C \times M \times k \times k}$ denotes $K$ with its first and second axes swapped.
Likewise,
$$\min_K \|\mathcal{K}\mathcal{K}^T - I\|_F \quad \text{and} \quad \min_K \|\mathcal{K}^T\mathcal{K} - I\|_F$$
are equivalent.
On the other hand, the kernel orthogonal regularization introduced at the beginning is a necessary (but not sufficient) condition for orthogonal convolution.
$KK^T = I$ and $K^TK = I$ are respectively equivalent to
$$\mathrm{Conv}(K, K, \mathrm{padding}=0) = I_{r0}, \qquad \mathrm{Conv}(K^T, K^T, \mathrm{padding}=0) = I_{c0},$$
where $I_{r0} \in \mathbb{R}^{M \times M \times 1 \times 1}$ and $I_{c0} \in \mathbb{R}^{C \times C \times 1 \times 1}$.
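This last claim is easy to verify numerically: with padding $0$ the self-convolution collapses to the kernel Gram matrix, so kernel orthogonality is exactly the central slice of condition (7). A quick check under the same illustrative shapes as above:

```python
import torch
import torch.nn.functional as F

M, C, k = 4, 3, 3
K = torch.randn(M, C, k, k)

# With padding=0 the output is (M, M, 1, 1): entry [i, j] is <K_i, K_j>,
# i.e. exactly the row Gram matrix K K^T of the flattened kernel.
Z = F.conv2d(K, K, padding=0)
Kf = K.reshape(M, -1)
assert torch.allclose(Z.squeeze(), Kf @ Kf.t(), atol=1e-5)
```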