![在这里插入图片描述](https://i-blog.csdnimg.cn/blog_migrate/c557cb778da586978bfb7337d1eabea7.png)
The MobileNet Framework
Depthwise Separable Convolution
A standard convolution both filters the input and combines the results into a new set of outputs in a single step. Depthwise separable convolution splits this into two stages: the standard convolution of Figure 2(a) is factorized into the depthwise convolution of Figure 2(b) and the 1×1 convolution of Figure 2(c).
Why factorize? To reduce the computational cost.
First, consider a standard convolution.
Input: $D_F \times D_F \times M$, where $D_F$ is the width and height of the input feature map and $M$ is the number of input channels.
Kernel: $D_K \times D_K \times M \times N$, where $D_K$ is the kernel size, $M$ is the number of input channels, and $N$ is the number of output channels.
Output: $D_G \times D_G \times N$, where $D_G$ is the width and height of the output feature map and $N$ is the number of output channels.
The computational cost of a standard convolution is therefore:
$$D_K \times D_K \times M \times N \times D_F \times D_F$$
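For concreteness, this cost can be evaluated for a hypothetical layer. The sizes below are purely illustrative, not taken from the paper:

```python
# Hypothetical layer sizes, for illustration only
D_F, D_K, M, N = 14, 3, 512, 512

# Multiply-adds of a standard convolution: D_K * D_K * M * N * D_F * D_F
standard_cost = D_K * D_K * M * N * D_F * D_F
print(standard_cost)  # 462422016
```

Even a single mid-network layer of this size costs hundreds of millions of multiply-adds, which is what the factorization below attacks.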
Depthwise separable convolution = depthwise convolution + pointwise convolution ($1\times 1$ convolution). Its computational cost is the sum of the two stages:
$$D_K \times D_K \times M \times D_F \times D_F + M \times N \times D_F \times D_F$$
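The two stages can be sketched in plain NumPy (naive loops, "valid" padding, stride 1; the function names and sizes are illustrative, not the paper's implementation):

```python
import numpy as np

def depthwise_conv(x, dw_kernel):
    """x: (H, W, M); dw_kernel: (k, k, M) -- one k*k filter per input channel."""
    H, W, M = x.shape
    k = dw_kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1, M))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + k, j:j + k, :]                    # (k, k, M)
            out[i, j] = (patch * dw_kernel).sum(axis=(0, 1))  # filter each channel separately
    return out

def pointwise_conv(x, pw_kernel):
    """x: (H, W, M); pw_kernel: (M, N). A 1x1 convolution is a per-pixel matmul."""
    return x @ pw_kernel

x = np.random.rand(8, 8, 3)    # D_F = 8, M = 3
dw = np.random.rand(3, 3, 3)   # D_K = 3
pw = np.random.rand(3, 16)     # N = 16
y = pointwise_conv(depthwise_conv(x, dw), pw)
print(y.shape)  # (6, 6, 16)
```

The depthwise stage filters each input channel on its own, and only the cheap 1×1 stage mixes channels, which is exactly where the cost saving comes from.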
Dividing the two costs gives the reduction factor:
$$\frac{D_K \times D_K \times M \times D_F \times D_F + M \times N \times D_F \times D_F}{D_K \times D_K \times M \times N \times D_F \times D_F} = \frac{1}{N} + \frac{1}{D_K^2}$$
Since $N$ is typically large and $D_K$ is usually 3, $\frac{1}{N} + \frac{1}{D_K^2}$ is much less than one, which confirms the earlier claim: the factorization sharply reduces computation. With $3\times 3$ kernels, a depthwise separable convolution costs roughly 8 to 9 times less than a standard convolution.
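Plugging in concrete (again illustrative) numbers confirms the ratio:

```python
# Illustrative sizes: 3x3 kernels, 512 input/output channels, 14x14 feature map
D_F, D_K, M, N = 14, 3, 512, 512

standard = D_K * D_K * M * N * D_F * D_F
separable = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F
ratio = separable / standard

# The measured ratio matches 1/N + 1/D_K^2: roughly an 8-9x reduction
assert abs(ratio - (1 / N + 1 / D_K**2)) < 1e-12
print(round(ratio, 4))  # 0.1131
```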
Network Structure
The MobileNet architecture is shown in the table below:
Conv dw:
Conv:
Resources consumed by each layer type:
Width Multiplier: Thinner Models
MobileNet introduces a width multiplier $\alpha$: the number of input channels $M$ becomes $\alpha M$, and the number of output channels $N$ becomes $\alpha N$.
With the width multiplier, the cost of a depthwise separable convolution becomes:
$$D_K \times D_K \times \alpha M \times D_F \times D_F + \alpha M \times \alpha N \times D_F \times D_F, \quad \alpha \in (0, 1]$$
The width multiplier can be applied to any model structure to define a new, smaller model with a reasonable trade-off between accuracy, latency, and size.
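A quick numeric check of how $\alpha$ scales the cost (illustrative sizes; since the pointwise term dominates and carries $\alpha^2$, the total shrinks by roughly $\alpha^2$):

```python
D_F, D_K, M, N = 14, 3, 512, 512  # illustrative sizes

def separable_cost(alpha):
    """Depthwise separable cost under width multiplier alpha."""
    return (D_K * D_K * (alpha * M) * D_F * D_F
            + (alpha * M) * (alpha * N) * D_F * D_F)

shrink = separable_cost(0.75) / separable_cost(1.0)
print(round(shrink, 3))  # 0.566, close to 0.75**2 = 0.5625
```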
Resolution Multiplier: Reduced Representation
The paper also introduces a second hyperparameter, the resolution multiplier $\rho$. It is applied to the input image, and the internal representation of every layer is then reduced by the same factor.
The resulting computational cost is:
$$D_K \times D_K \times \alpha M \times \rho D_F \times \rho D_F + \alpha M \times \alpha N \times \rho D_F \times \rho D_F, \quad \alpha \in (0, 1],\ \rho \in (0, 1]$$
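Combining both multipliers can also be checked numerically (illustrative sizes; every term carries $\rho^2$, so the total cost scales roughly as $\alpha^2 \rho^2$):

```python
D_K, D_F, M, N = 3, 14, 512, 512  # illustrative sizes

def cost(alpha, rho):
    """Depthwise separable cost under width multiplier alpha and resolution multiplier rho."""
    dF = rho * D_F  # the resolution multiplier shrinks every feature map
    return (D_K * D_K * (alpha * M) * dF * dF
            + (alpha * M) * (alpha * N) * dF * dF)

# rho = 160/224 corresponds to shrinking the 224 input down to 160
shrink = cost(0.75, 160 / 224) / cost(1.0, 1.0)
print(round(shrink, 3))  # 0.289
```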
The paper notes that $\rho$ is usually set implicitly by choosing the network input resolution: 224, 192, 160, or 128.
Experimental Results
First, as the table above shows, MobileNet loses only about 1% accuracy compared to the model built from full convolutions, while its computation (mult-adds) is reduced by nearly 9× and its parameter count by about 7×.
Next, with width multiplier $\alpha = 0.75$, the table below shows that, at comparable cost, the thinner network is about 3% more accurate than a correspondingly shallower network.
Tables 6 and 7 show how $\alpha$ and $\rho$ trade accuracy against computation and model size for the MobileNet architecture.
Comparison with Other Networks
Face attribute classification
COCO dataset
Comparison with the FaceNet face recognition model
Code
The snippet below is a TensorFlow 1.x slim implementation, restored here with its indentation. The imports and the `mobilenet(...)` signature are added to make it self-contained, so those argument names are assumptions rather than part of the original excerpt.

```python
import tensorflow as tf
import tensorflow.contrib.slim as slim  # TensorFlow 1.x


def _depthwise_separable_conv(inputs,
                              num_pwc_filters,
                              width_multiplier,
                              sc,
                              downsample=False):
    """Helper function to build the depthwise separable convolution layer."""
    num_pwc_filters = round(num_pwc_filters * width_multiplier)
    _stride = 2 if downsample else 1

    # Skip the pointwise stage by setting num_outputs=None, leaving only the
    # depthwise (channel-wise) 3x3 convolution.
    depthwise_conv = slim.separable_convolution2d(inputs,
                                                  num_outputs=None,
                                                  stride=_stride,
                                                  depth_multiplier=1,
                                                  kernel_size=[3, 3],
                                                  scope=sc + '/depthwise_conv')
    bn = slim.batch_norm(depthwise_conv, scope=sc + '/dw_batch_norm')
    # Pointwise (1x1) convolution mixes the filtered channels into
    # num_pwc_filters output channels.
    pointwise_conv = slim.convolution2d(bn,
                                        num_pwc_filters,
                                        kernel_size=[1, 1],
                                        scope=sc + '/pointwise_conv')
    bn = slim.batch_norm(pointwise_conv, scope=sc + '/pw_batch_norm')
    return bn


def mobilenet(inputs, num_classes=1000, is_training=True,
              width_multiplier=1.0, scope='MobileNet'):
    with tf.variable_scope(scope) as sc:
        end_points_collection = sc.name + '_end_points'
        with slim.arg_scope([slim.convolution2d, slim.separable_convolution2d],
                            activation_fn=None,
                            outputs_collections=[end_points_collection]):
            # Batch norm applies the ReLU, so the conv layers carry no activation.
            with slim.arg_scope([slim.batch_norm],
                                is_training=is_training,
                                activation_fn=tf.nn.relu,
                                fused=True):
                # First layer: a standard 3x3 convolution with stride 2.
                net = slim.convolution2d(inputs, round(32 * width_multiplier),
                                         [3, 3], stride=2, padding='SAME',
                                         scope='conv_1')
                net = slim.batch_norm(net, scope='conv_1/batch_norm')
                # 13 depthwise separable blocks; downsample=True uses stride 2.
                net = _depthwise_separable_conv(net, 64, width_multiplier, sc='conv_ds_2')
                net = _depthwise_separable_conv(net, 128, width_multiplier, downsample=True, sc='conv_ds_3')
                net = _depthwise_separable_conv(net, 128, width_multiplier, sc='conv_ds_4')
                net = _depthwise_separable_conv(net, 256, width_multiplier, downsample=True, sc='conv_ds_5')
                net = _depthwise_separable_conv(net, 256, width_multiplier, sc='conv_ds_6')
                net = _depthwise_separable_conv(net, 512, width_multiplier, downsample=True, sc='conv_ds_7')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_8')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_9')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_10')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_11')
                net = _depthwise_separable_conv(net, 512, width_multiplier, sc='conv_ds_12')
                net = _depthwise_separable_conv(net, 1024, width_multiplier, downsample=True, sc='conv_ds_13')
                net = _depthwise_separable_conv(net, 1024, width_multiplier, sc='conv_ds_14')
                net = slim.avg_pool2d(net, [7, 7], scope='avg_pool_15')
        end_points = slim.utils.convert_collection_to_dict(end_points_collection)
        net = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
        end_points['squeeze'] = net
        logits = slim.fully_connected(net, num_classes, activation_fn=None,
                                      scope='fc_16')
        predictions = slim.softmax(logits, scope='Predictions')
        end_points['Logits'] = logits
        end_points['Predictions'] = predictions
    return logits, end_points


mobilenet.default_image_size = 224
```
Summary
The paper contributes three things: depthwise separable convolutions, plus two hyperparameters, the width multiplier and the resolution multiplier. Depthwise separable convolution clearly reduces computation and parameters on its own, while the width and resolution multipliers must be tuned to balance model size against accuracy, yielding a much smaller model with only a small loss in accuracy.