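Summary
Reduces computational cost by factorizing the CNN convolution: each standard convolution kernel is replaced by a depthwise convolution followed by a 1x1 convolution.
Details
The model is designed to trade off accuracy against resource consumption.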
For a standard CNN layer, the input is $D_F \times D_F \times M$ (where $D_F$ is the height and width of the input image and $M$ is the number of input channels, i.e. the input depth), and the output is $D_G \times D_G \times N$ (where $D_G$ is the height and width of the output and $N$ is the output depth).
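To produce this output, a standard convolution uses $N$ kernels of size $D_K \times D_K \times M$. Assuming stride 1 and same padding, so that the output spatial size equals the input ($D_G = D_F$), the computational cost is:
$$D_F \cdot D_F \cdot M \cdot D_K \cdot D_K \cdot N$$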
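A minimal sketch of this count in Python, assuming stride 1 and same padding (so $D_G = D_F$). The function name `standard_conv_cost` and the 14×14×512 example layer are illustrative choices, not taken from the original.

```python
def standard_conv_cost(d_f: int, m: int, n: int, d_k: int) -> int:
    """Multiply-adds of a standard D_K x D_K convolution.

    Assumes stride 1 and 'same' padding, so the output is D_F x D_F x N.
    """
    # Each of the D_F * D_F * N output values needs D_K * D_K * M multiply-adds.
    return d_f * d_f * m * d_k * d_k * n


# Illustrative layer: 14 x 14 x 512 input, 512 output channels, 3 x 3 kernels.
print(standard_conv_cost(d_f=14, m=512, n=512, d_k=3))  # 462422016
```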
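A standard convolution both filters (extracts) features and combines them in a single step. A depthwise separable convolution splits this into two separate steps: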
To extract features per channel, use a depthwise convolution. Each kernel has size $D_K \times D_K \times 1$, and there are $M$ of them (one per input channel). Its computational cost is:
$$D_F \cdot D_F \cdot M \cdot D_K \cdot D_K$$
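The following pure-Python sketch of a depthwise convolution shows where this count comes from. Each channel is filtered only by its own $D_K \times D_K$ kernel. The `[row][col][channel]` layout, zero "same" padding, stride 1, and an odd $D_K$ are all assumptions of this sketch. Padded taps are counted as multiplications so that the total matches the formula exactly.

```python
from typing import List, Tuple

# Hypothetical layout: a feature map is indexed as [row][col][channel].
Tensor3 = List[List[List[float]]]


def depthwise_conv(x: Tensor3, kernels: Tensor3) -> Tuple[Tensor3, int]:
    """Depthwise convolution with stride 1 and zero 'same' padding.

    x:       D_F x D_F x M input.
    kernels: D_K x D_K x M, i.e. one D_K x D_K x 1 filter per input channel
             (D_K is assumed to be odd).
    Returns the D_F x D_F x M output and the number of multiplications,
    counting padded taps so the total matches D_F * D_F * M * D_K * D_K.
    """
    d_f, m = len(x), len(x[0][0])
    d_k = len(kernels)
    pad = d_k // 2
    out = [[[0.0] * m for _ in range(d_f)] for _ in range(d_f)]
    mults = 0
    for i in range(d_f):
        for j in range(d_f):
            for c in range(m):
                # Channel c only ever sees filter c: no mixing across channels.
                acc = 0.0
                for u in range(d_k):
                    for v in range(d_k):
                        mults += 1
                        r, s = i + u - pad, j + v - pad
                        if 0 <= r < d_f and 0 <= s < d_f:
                            acc += x[r][s][c] * kernels[u][v][c]
                out[i][j][c] = acc
    return out, mults


# Usage: 5 x 5 x 3 input of ones, 3 x 3 x 3 filters of ones.
x = [[[1.0] * 3 for _ in range(5)] for _ in range(5)]
k = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
y, n_mults = depthwise_conv(x, k)
print(n_mults)  # 675 = 5 * 5 * 3 * 3 * 3
```

The output keeps all $M$ channels. Combining them is left to the next step.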
To combine features, use a pointwise convolution, i.e. a 1x1 convolution. Each kernel has size $1 \times 1 \times M$, and there are $N$ of them. Its computational cost is:
$$D_F \cdot D_F \cdot M \cdot N$$
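A pointwise convolution amounts to multiplying each pixel's $M$-channel vector by an $M \times N$ weight matrix. The pure-Python sketch below assumes a `[row][col][channel]` layout and stores the $N$ kernels as the columns of `weights`. Both conventions are choices made for this illustration.

```python
from typing import List, Tuple

Tensor3 = List[List[List[float]]]  # indexed [row][col][channel]


def pointwise_conv(x: Tensor3, weights: List[List[float]]) -> Tuple[Tensor3, int]:
    """1x1 (pointwise) convolution.

    x:       D_F x D_F x M input.
    weights: M x N matrix; column o is the 1 x 1 x M kernel of output channel o.
    Returns the D_F x D_F x N output and the number of multiplications
    (D_F * D_F * M * N).
    """
    d_f = len(x)
    m, n = len(weights), len(weights[0])
    out = [[[0.0] * n for _ in range(d_f)] for _ in range(d_f)]
    mults = 0
    for i in range(d_f):
        for j in range(d_f):
            for o in range(n):
                # Output channel o is a weighted sum of the M channels at this pixel.
                out[i][j][o] = sum(x[i][j][c] * weights[c][o] for c in range(m))
                mults += m
    return out, mults


# Usage: 4 x 4 x 3 input of ones, combined into 2 output channels.
x = [[[1.0] * 3 for _ in range(4)] for _ in range(4)]
w = [[1.0, 0.5], [2.0, 0.5], [3.0, 0.5]]
y, n_mults = pointwise_conv(x, w)
print(n_mults, y[0][0])  # 96 [6.0, 1.5]
```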
The combined cost of the two steps is:
$$D_F \cdot D_F \cdot M \cdot D_K \cdot D_K + D_F \cdot D_F \cdot M \cdot N$$
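As a sketch, the two terms can be computed separately and summed. The function name and the 14×14×512 example layer are illustrative assumptions, and stride 1 with same padding is assumed throughout.

```python
def separable_conv_cost(d_f: int, m: int, n: int, d_k: int) -> int:
    """Multiply-adds of a depthwise separable convolution (stride 1, 'same' padding)."""
    depthwise = d_f * d_f * m * d_k * d_k  # M filters of size D_K x D_K x 1
    pointwise = d_f * d_f * m * n          # N filters of size 1 x 1 x M
    return depthwise + pointwise


# Illustrative layer: 14 x 14 x 512 -> 14 x 14 x 512 with 3 x 3 kernels.
print(separable_conv_cost(d_f=14, m=512, n=512, d_k=3))  # 52283392
```

For this layer the pointwise term (51,380,224) is far larger than the depthwise term (903,168), so most of the remaining computation sits in the 1x1 convolution.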
Relative to a standard convolution, the cost is reduced to the fraction:
$$\frac{D_F \cdot D_F \cdot M \cdot D_K \cdot D_K + D_F \cdot D_F \cdot M \cdot N}{D_F \cdot D_F \cdot M \cdot D_K \cdot D_K \cdot N} = \frac{1}{N} + \frac{1}{D_K^2}$$
For example, with 3x3 kernels ($D_K = 3$) and $N$ in the hundreds, this is slightly above $1/9$, i.e. about 8 to 9 times less computation.
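The simplification can be checked numerically with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction`. The layer shapes in the loop are arbitrary illustrative values. The check confirms that $D_F$ and $M$ cancel, so the ratio depends only on $N$ and $D_K$.

```python
from fractions import Fraction


def cost_ratio(d_f: int, m: int, n: int, d_k: int) -> Fraction:
    """Exact ratio of depthwise separable cost to standard convolution cost."""
    standard = d_f * d_f * m * d_k * d_k * n
    separable = d_f * d_f * m * d_k * d_k + d_f * d_f * m * n
    return Fraction(separable, standard)


# D_F and M cancel out: the ratio equals 1/N + 1/D_K^2 for any layer shape.
for d_f, m, n, d_k in [(14, 512, 512, 3), (7, 1024, 1024, 3), (28, 256, 128, 5)]:
    r = cost_ratio(d_f, m, n, d_k)
    assert r == Fraction(1, n) + Fraction(1, d_k * d_k)
    print(d_f, m, n, d_k, float(r), float(1 / r))
```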
tricks
Uses batch norm and ReLU
avg-pooling
Variants
thinner
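Apply a width multiplier $\alpha \in (0, 1]$ to the kernel depth, i.e. to the number of input and output channels, so that $M$ becomes $\alpha M$ and $N$ becomes $\alpha N$. The final cost is:
$$D_F \cdot D_F \cdot \alpha M \cdot D_K \cdot D_K + D_F \cdot D_F \cdot \alpha M \cdot \alpha N$$
The multiplier $\alpha$ reduces the computational cost by roughly $\alpha^2$. The pointwise term scales with $\alpha^2$ and dominates, while the depthwise term scales only with $\alpha$.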
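A small sketch of the width multiplier's effect on the cost formula. Rounding $\alpha M$ and $\alpha N$ to the nearest integer is an assumption of this sketch, since real implementations may round channel counts differently. The 14×14×512 layer is illustrative.

```python
def width_multiplied_cost(d_f: int, m: int, n: int, d_k: int, alpha: float) -> int:
    """Depthwise separable cost with width multiplier alpha in (0, 1].

    Channel counts are rounded to the nearest integer here; real models may
    round differently (an assumption of this sketch).
    """
    am, an = round(alpha * m), round(alpha * n)
    return d_f * d_f * am * d_k * d_k + d_f * d_f * am * an


base = width_multiplied_cost(14, 512, 512, 3, alpha=1.0)
half = width_multiplied_cost(14, 512, 512, 3, alpha=0.5)
# The pointwise term shrinks by alpha^2 and dominates, the depthwise term only
# by alpha, so the ratio is close to (slightly above) alpha^2 = 0.25.
print(half / base)
```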
A more resource-saving variant applies a resolution multiplier $\rho$ to the width and height, so that $D_F$ becomes $\rho D_F$. The final cost is:
$$\rho D_F \cdot \rho D_F \cdot \alpha M \cdot D_K \cdot D_K + \rho D_F \cdot \rho D_F \cdot \alpha M \cdot \alpha N$$
The multiplier $\rho$ reduces the computational cost by a factor of $\rho^2$, since both terms scale with $\rho^2$.
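A sketch combining both multipliers in one cost function. The function name, the default values, and rounding $\rho D_F$, $\alpha M$, $\alpha N$ to integers are assumptions made for illustration. Unlike $\alpha$, the multiplier $\rho$ scales every term, so the reduction is exactly $\rho^2$ whenever $\rho D_F$ is an integer.

```python
from fractions import Fraction


def mobilenet_layer_cost(d_f: int, m: int, n: int, d_k: int,
                         alpha: float = 1.0, rho: float = 1.0) -> int:
    """Depthwise separable cost with width multiplier alpha and resolution
    multiplier rho. Rounding rho*D_F and alpha*M, alpha*N to integers is an
    assumption of this sketch."""
    rd = round(rho * d_f)
    am, an = round(alpha * m), round(alpha * n)
    return rd * rd * am * d_k * d_k + rd * rd * am * an


base = mobilenet_layer_cost(14, 512, 512, 3)
half_res = mobilenet_layer_cost(14, 512, 512, 3, rho=0.5)
# Both terms scale with rho^2, so the ratio is exactly rho^2 here (7 = 0.5 * 14).
print(Fraction(half_res, base))  # 1/4
```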
Experiments
Datasets: ImageNet, Stanford Dogs dataset (
Evaluation metric: accuracy