MobileNet V1: a retro, straight-through architecture (revisiting the old to learn the new)

Lebhoryi

Article link: https://blog.csdn.net/weixin_37598106/article/details/107859280

Original author: Lebhoryi@gmail.com
Date: 2020/08/06

Contents

0x00 Paper
0x01 Preface
0x02 The exciting part
0x03 References

0x00 Paper

paper: MobileNet
code: mobilenet.py

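0x01 Preface

1.1 What problem does it solve?

To push accuracy higher, the usual approach is to make networks deeper and more complex, with little regard for model size or inference speed.

When the goal is a smaller and more efficient network, the usual approaches are to compress a pretrained model or to train a small model directly. These methods care a great deal about model size but pay less attention to inference speed.

The MobileNet paper optimizes model size and inference speed together, aiming for a network that is both smaller and faster.

1.2 Schools of thought

Small and efficient neural networks

1.2.1 Compressing pretrained models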

Method	Paper	Read
quantization	Quantized Convolutional neural networks for mobile devices. arXiv preprint arXiv:1512.06473 , 2015
hash	Compressing neural networks with the hashing trick. CoRR, abs/1504.04788 , 2015.
pruning, quantization, huffman coding	Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. CoRR, abs/1510.00149 , 2, 2015

1.2.2 Designing smaller neural networks

Method	Paper	Read
factorizations	J. Jin, A. Dundar, and E. Culurciello. Flattened Convolutional neural networks for feedforward acceleration. arXiv preprint arXiv:1412.5474 , 2014
	Factorized Convolutional neural networks. arXiv preprint arXiv:1608.04337, 2016
	Speeding up Convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866 , 2014.
	Speeding-up Convolutional neural networks using fine-tuned CP-decomposition. arXiv preprint arXiv:1412.6553 , 2014
low bit networks	Training deep neural networks with low precision multiplications. arXiv preprint arXiv:1412.7024 , 2014.
	Quantized neural networks: Training neural networks with low precision weights and activations. arXiv preprint arXiv:1609.07061 , 2016.
	Xnornet: Imagenet classification using binary Convolutional neural networks. arXiv preprint arXiv:1603.05279, 2016
SqueezeNet	SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. arXiv preprint arXiv:1602.07360 , 2016
distillation	Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 , 2015

1.2.3 other

Other	Read
Structured transforms for small-footprint deep learning. In Advances in Neural Information Processing Systems , pages 3088–3096, 2015.
Deep fried convnets. In Proceedings of the IEEE International Conference on Computer Vision , pages 1476–1483, 2015

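0x02 The exciting part

Thanks to factorized convolutions, MobileNet splits an ordinary convolution into two steps: Depthwise Convolution and Pointwise Convolution.

2.1 The ordinary convolution

One kernel, 2D input

Input size $5 * 5 * 2$, one kernel of size $3 * 3$, with the convolution computed accordingly.

One kernel, 3D input

Once the input has three channels, things start to change.

Input size $6 * 6 * 3$, one kernel of size $3 * 3 * 3$.

Multiple kernels, 3D input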

$Params = (k_w * k_h * C_{in} + 1) * C_{out}$

$FLOPs = (k_w * k_h * C_{in} + 1) * C_{out} *H * W$

Ignoring the bias:

$Params = k_w * k_h * C_{in} * C_{out}$

$FLOPs = k_w * k_h * C_{in} * C_{out} *H * W$

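$k_h$ and $k_w$ are the height and width of the kernel, $C_{in}$ is the number of input channels, $C_{out}$ is the number of kernels (that is, the number of output channels), and $H$ and $W$ are the height and width of the output feature map.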
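The two formulas for the ordinary convolution can be checked with a few lines of Python. This is only a sketch: the function name `conv2d_cost` and the layer sizes in the example are made up for illustration, and FLOPs are counted as in this post (one operation per weight, plus one per bias, at every output position).

```python
def conv2d_cost(k_h, k_w, c_in, c_out, out_h, out_w, use_bias=True):
    """Return (params, flops) of an ordinary convolution layer.

    Follows the formulas in this post:
      params = (k_w * k_h * c_in + 1) * c_out
      flops  = params * out_h * out_w
    The "+ 1" term is dropped when use_bias is False.
    """
    per_kernel = k_h * k_w * c_in + (1 if use_bias else 0)
    params = per_kernel * c_out
    flops = params * out_h * out_w
    return params, flops


# Hypothetical layer: 3x3 kernels, 3 input channels, 64 kernels, 112x112 output.
print(conv2d_cost(3, 3, 3, 64, 112, 112))                  # with bias
print(conv2d_cost(3, 3, 3, 64, 112, 112, use_bias=False))  # without bias
```

For this toy layer the bias adds only 64 parameters, which is why the rest of the post drops it.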

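Once multiple kernels are involved, the computation consists of two steps:

1. Convolution
2. Combine

This is the core idea of the whole MobileNet paper: split these two steps apart, deliberately stop them from happening in one shot, and compute them one after the other. That is where Depthwise Convolution and Pointwise Convolution come from.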

2.2 Depthwise Convolution

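Depthwise Convolution works much like an ordinary convolution, with two differences:

- There is no Combine step, only the convolution itself
- Each input channel gets exactly one kernel

Ordinary convolution (if this part is unclear, go back to the previous section):

Suppose first that there is one kernel (one output channel) and M input channels. A single kernel then has M channels, always matching the input. Convolving this M-channel kernel with the input and summing across channels (convolution + combine) yields a feature map of size $H * W * 1$. With N kernels, the output feature map has size $H * W * N$.

Depthwise convolution:

With M input channels, the trick is to take the convolution + combine of a single kernel and drop the summation, keeping only the per-channel convolution. The number of output channels is then M, so the output feature map is $H * W * M$ rather than $H * W * 1$.

If that is still hard to picture, think of M as the number of kernels, with the number of input channels seen by each kernel fixed at 1. In other words, each channel is convolved on its own, instead of all channels being convolved together in one pass.

If this still feels fuzzy, read on. The second difference is stressed because it matters for the parameter count that follows; getting this concept straight makes that calculation much easier.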

We use Depthwise Convolutions to apply a single filter per each input channel (input depth).
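To make "convolution without the combine step" concrete, here is a minimal pure-Python sketch on nested lists. The helper names (`conv2d_single`, `depthwise_conv2d`, `standard_conv2d_one_kernel`) and the toy input are assumptions made for illustration; stride 1 and no padding are assumed, and this is not how TensorFlow implements the layer.

```python
def conv2d_single(channel, kernel):
    """Valid 2D cross-correlation of one channel with one 2D kernel (stride 1)."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(channel) - k_h + 1
    out_w = len(channel[0]) - k_w + 1
    return [
        [
            sum(channel[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h) for dj in range(k_w))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def depthwise_conv2d(channels, kernels):
    """Apply kernel m to channel m only; the M results are never summed."""
    assert len(channels) == len(kernels)
    return [conv2d_single(c, k) for c, k in zip(channels, kernels)]


def standard_conv2d_one_kernel(channels, kernels):
    """One ordinary M-channel kernel: per-channel convolution, then sum over channels."""
    per_channel = depthwise_conv2d(channels, kernels)
    out_h, out_w = len(per_channel[0]), len(per_channel[0][0])
    return [[sum(p[i][j] for p in per_channel) for j in range(out_w)]
            for i in range(out_h)]


# Toy input: M = 2 channels of size 3x3, one 2x2 kernel slice per channel.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 0, 1], [0, 1, 0], [1, 0, 1]]]
w = [[[1, 0], [0, 1]],
     [[1, 1], [1, 1]]]

dw = depthwise_conv2d(x, w)             # M = 2 output channels
std = standard_conv2d_one_kernel(x, w)  # a single output channel
print(len(dw), dw)
print(std)
```

The depthwise result keeps one feature map per input channel, while the ordinary kernel collapses them into a single map by summing.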

The Params and FLOPs of this layer are therefore:

$Params = (k_w * k_h * 1+ 1) *M$

$FLOPs = (k_w * k_h * 1+ 1) * M *H * W$

Ignoring the bias:

$Params = k_w * k_h * 1 * M$

$FLOPs = k_w * k_h * 1* M *H * W$

The TensorFlow 2 signature is shown below; the number of output channels is filters_in * depth_multiplier:

tf.keras.layers.DepthwiseConv2D(
    kernel_size, strides=(1, 1), padding='valid', depth_multiplier=1,
    data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True,
    depthwise_initializer='glorot_uniform', bias_initializer='zeros',
    depthwise_regularizer=None, bias_regularizer=None, activity_regularizer=None,
    depthwise_constraint=None, bias_constraint=None, **kwargs
)

2.3 Pointwise Convolution

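Most of the computation is concentrated in this layer.

Pointwise Convolution supplies the weighted-sum step that was skipped earlier, the Combine step: it forms weighted combinations along the depth direction to produce new feature maps.

Each kernel in this layer has size $1 * 1 * M$, where M is the number of output channels of the previous layer. With N kernels in this layer, the output is $H * W * N$.

The Params and FLOPs of this layer (ignoring the bias) are: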

$Params = 1 * 1 * M * N$

$FLOPs = 1 * 1 * M * N * H * W$
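A pointwise convolution is just a per-pixel weighted sum over channels, which the following pure-Python sketch tries to show. The function name `pointwise_conv2d`, the weight layout `weights[n][m]`, and the toy numbers are all assumptions chosen for illustration.

```python
def pointwise_conv2d(channels, weights):
    """1x1 convolution: weights[n][m] mixes input channel m into output channel n."""
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [
            [sum(w_m * channels[m][i][j] for m, w_m in enumerate(kernel))
             for j in range(width)]
            for i in range(height)
        ]
        for kernel in weights
    ]


# Toy input: M = 2 channels of size 2x2; N = 3 kernels, each of shape 1x1xM.
features = [[[6, 8], [12, 14]],
            [[2, 2], [2, 2]]]
weights = [[1, 1],   # plain sum of both channels
           [1, 0],   # pass channel 0 through unchanged
           [0, 2]]   # channel 1, doubled

out = pointwise_conv2d(features, weights)
print(len(out), out)
```

The spatial size stays $H * W$; only the channel count changes, from M to N.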

Adding the two layers together, the Params and FLOPs of a depthwise separable convolution are:

$Params = k_w * k_h * 1 * M + 1 * 1 * M * N$

$FLOPs = k_w * k_h * 1* M *H * W + 1 * 1 * M * N *H * W$
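The bias-free formulas for the two layers translate directly into code. The function names and the layer sizes below are hypothetical; the sketch also prints what share of the FLOPs falls on the pointwise layer.

```python
def depthwise_cost(k_h, k_w, m, out_h, out_w):
    """(params, flops) of the depthwise layer, ignoring the bias."""
    params = k_h * k_w * 1 * m
    return params, params * out_h * out_w


def pointwise_cost(m, n, out_h, out_w):
    """(params, flops) of the 1x1 pointwise layer, ignoring the bias."""
    params = 1 * 1 * m * n
    return params, params * out_h * out_w


def separable_cost(k_h, k_w, m, n, out_h, out_w):
    """(params, flops) of depthwise + pointwise together."""
    dw_params, dw_flops = depthwise_cost(k_h, k_w, m, out_h, out_w)
    pw_params, pw_flops = pointwise_cost(m, n, out_h, out_w)
    return dw_params + pw_params, dw_flops + pw_flops


# Hypothetical layer: 3x3 kernels, M = 32, N = 64, 112x112 output.
dw = depthwise_cost(3, 3, 32, 112, 112)
pw = pointwise_cost(32, 64, 112, 112)
total = separable_cost(3, 3, 32, 64, 112, 112)
print(dw, pw, total)
print("pointwise share of FLOPs:", pw[1] / total[1])
```

For these sizes the pointwise layer accounts for well over half of the FLOPs, in line with the remark that most of the computation sits in that layer.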

2.4 DSCNN's moment of glory

A sharp drop in computation

Ordinary convolution:

$FLOPs = k_w * k_h * M * N *H * W$

DSCNN:

$FLOPs = k_w * k_h * 1* M *H * W + 1 * 1 * M * N *H * W$

Ratio of the two:

$\frac{k_w * k_h * 1* M *H * W + 1 * 1 * M * N *H * W}{k_w * k_h * M * N *H * W} = \frac{1}{N} + \frac{1}{k_w * k_h}$

Usually $N \gg k^2$, so the ratio is approximately $1 / (k_w * k_h)$.

With a $3 * 3$ kernel, MobileNet therefore needs roughly 8 to 9 times less computation than the ordinary convolution.
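The "8 to 9 times" figure can be sanity-checked numerically. The sketch below uses exact fractions; the function name `flops_ratio` and the channel counts and feature-map size are arbitrary choices for illustration.

```python
from fractions import Fraction


def flops_ratio(k, m, n, out_h, out_w):
    """Separable FLOPs divided by ordinary-convolution FLOPs (bias ignored)."""
    standard = k * k * m * n * out_h * out_w
    separable = k * k * m * out_h * out_w + m * n * out_h * out_w
    return Fraction(separable, standard)


# Hypothetical sizes: 3x3 kernels, M = 32 input channels, 14x14 output.
for n in (128, 512, 1024):
    ratio = flops_ratio(3, 32, n, 14, 14)
    # The ratio depends only on N and k: 1/N + 1/k^2.
    assert ratio == Fraction(1, n) + Fraction(1, 9)
    print(n, ratio, "speed-up:", float(1 / ratio))
```

As N grows, the speed-up approaches $k_w * k_h = 9$ from below, so it never exceeds 9 for a $3 * 3$ kernel.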

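Q: How are MobileNet and DSCNN related, and how do they differ?

A: MobileNet is a 28-layer network built with DSCNN (depthwise separable convolution) as its core building block.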

By defining the network in such simple terms we are able to easily explore network topologies to find a good network. … Counting depthwise and Pointwise Convolutions as separate layers, MobileNet has 28 layers.

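2.5 MobileNet's two hyperparameters

In the abstract, the authors introduce two global hyperparameters that trade off latency against accuracy. I could not make sense of them until I read Sections 3.3 and 3.4 of the paper.

Width Multiplier
- Reduces computation by acting on the number of channels
Resolution Multiplier
- Reduces computation by acting on the input image, and the internal representation of every layer is reduced by the same factor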
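As a rough sketch of how the two multipliers enter the cost, the function below scales the channel counts by a width multiplier and the feature-map size by a resolution multiplier before applying the depthwise separable FLOPs formula from this post. The names `alpha` and `rho` follow the paper's notation as far as I recall; the function name, the `int()` rounding, and the layer sizes are assumptions made for illustration.

```python
def separable_flops(k, m, n, h, w, alpha=1.0, rho=1.0):
    """FLOPs of one depthwise separable layer (bias ignored).

    alpha: width multiplier, scales input and output channel counts.
    rho:   resolution multiplier, scales feature-map height and width.
    """
    m_a, n_a = int(alpha * m), int(alpha * n)
    h_r, w_r = int(rho * h), int(rho * w)
    depthwise = k * k * m_a * h_r * w_r
    pointwise = m_a * n_a * h_r * w_r
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernels, M = N = 512, 14x14 feature map.
base = separable_flops(3, 512, 512, 14, 14)
thin = separable_flops(3, 512, 512, 14, 14, alpha=0.5)
small = separable_flops(3, 512, 512, 14, 14, rho=0.5)
print(base, thin, small)
print("alpha=0.5 ratio:", thin / base)
print("rho=0.5 ratio:", small / base)
```

Halving the resolution cuts the cost by exactly a factor of 4 here, while halving the width cuts it by slightly less than 4, because the depthwise term shrinks only linearly with the channel count.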

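Comparison of results:

The rest is experimental comparisons. The short version is that it works well; a close reading of those sections is not really necessary.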

0x03 References

Depthwise separable convolution
