https://arxiv.org/pdf/1704.04861.pdf
- Abstract
	- MobileNet is a streamlined architecture built on depthwise separable convolutions
	- Two global hyperparameters trade off latency against accuracy: the width multiplier and the resolution multiplier
- prior work
- Developers can pick a small model that fits their resource constraints
- MobileNets primarily focus on optimizing for latency but also yield small networks.
- Small models:
- MobileNets are built primarily from depthwise separable convolutions initially introduced in [26] and subsequently used in Inception models [13] to reduce the computation in the first few layers.
- The model is built on depthwise separable convolutions, which were later used in Inception models to reduce computation in the first few layers
- Flattened networks [16] build a network out of fully factorized convolutions and showed the potential of extremely factorized networks. Independent of this current paper, Factorized Networks[34] introduces a similar factorized convolution as well as the use of topological connections.
- the Xception network [3] demonstrated how to scale up depthwise separable filters to outperform Inception V3 networks.
- Squeezenet [12] which uses a bottleneck approach to design a very small network.
- Other reduced computation networks include structured transform networks [28] and deep fried convnets [37].
- MobileNet architecture
- depthwise separable filters
- depthwise separable convolution is a form of factorized convolutions which factorize a standard convolution into a depthwise convolution and a 1×1 convolution called a pointwise convolution.
- a depthwise convolution
- Each input channel gets exactly one filter; a standard convolution has as many filters as output channels, each spanning all input channels
- a pointwise convolution
- 1×1 filters combine the outputs of the depthwise convolution across channels
- Standard convolution
	- Filtering and combining happen in a single step; a depthwise separable filter splits them into two steps
- This factorization drastically reduces computation and model size: relative to a standard convolution the cost ratio is 1/N + 1/D_K² (N output channels, D_K × D_K kernel), which for 3×3 kernels means roughly 8–9× fewer mult-adds
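- As a concrete sketch of the two steps above (a naive NumPy version I wrote to check my understanding, not the paper's optimized kernels; function and variable names are mine):

	```python
	import numpy as np

	def depthwise_separable_conv(x, depthwise_filters, pointwise_filters):
	    """x: (H, W, M); depthwise_filters: (K, K, M); pointwise_filters: (M, N).

	    Step 1 (depthwise): one K x K filter per input channel, no channel mixing.
	    Step 2 (pointwise): a 1x1 convolution combines the M channels into N outputs.
	    """
	    H, W, M = x.shape
	    K = depthwise_filters.shape[0]
	    pad = K // 2
	    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # "same" padding
	    dw = np.zeros((H, W, M))
	    for i in range(H):
	        for j in range(W):
	            patch = xp[i:i + K, j:j + K, :]            # (K, K, M)
	            # Each channel is filtered independently by its own K x K filter.
	            dw[i, j, :] = np.sum(patch * depthwise_filters, axis=(0, 1))
	    # Pointwise 1x1 convolution: a per-pixel matrix multiply over channels.
	    return dw @ pointwise_filters                       # (H, W, N)
	```

	This is numerically equivalent to a standard convolution whose 4-D kernel is the rank-1 product of the depthwise and pointwise filters, which is exactly the factorization the paper exploits.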
- Model architecture and training
- Every layer is followed by batch norm and ReLU
- Model architecture:
- 1×1 convolutions can be optimized with GEMM
- Optimizations in MobileNet:
- Our model structure puts nearly all of the computation into dense 1 × 1 convolutions. This can be implemented with highly optimized general matrix multiply (GEMM) functions. Often convolutions are implemented by a GEMM but require an initial reordering in memory called im2col in order to map it to a GEMM. For instance, this approach is used in the popular Caffe package [15].
- 1×1 convolutions do not require this reordering in memory and can be implemented directly with GEMM which is one of the most optimized numerical linear algebra algorithms.
- MobileNet spends 95% of its computation time in 1 × 1 convolutions, which also hold 75% of the parameters, as can be seen in Table 2. Nearly all of the remaining parameters are in the fully connected layer.
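- A minimal illustration of why a 1×1 convolution maps directly onto a GEMM with no im2col reordering (my own sketch, assuming a channels-last layout; names are mine):

	```python
	import numpy as np

	def pointwise_conv_as_gemm(x, w):
	    """1x1 convolution: x (H, W, M), weights w (M, N) -> (H, W, N).

	    The spatial dimensions are just flattened into the rows of one matrix
	    multiply; no patch extraction (im2col) is needed because the kernel
	    touches a single pixel.
	    """
	    H, W, M = x.shape
	    return (x.reshape(H * W, M) @ w).reshape(H, W, -1)
	```

	Contrast with a K×K convolution, where each output pixel reads a K×K×M patch, so im2col must first copy overlapping patches into matrix rows before a GEMM can run.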
- Training uses RMSprop with asynchronous gradient descent (I didn't fully understand this part)
- Width Multiplier: Thinner Models
- The width multiplier α scales the number of input and output channels of every layer, including the depthwise separable filters
- Resolution Multiplier: Reduced Representation
- The resolution multiplier ρ scales the height and width of every layer's feature maps, applied in practice by shrinking the input image
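- The per-layer mult-add cost with both multipliers follows the paper's formula D_K·D_K·αM·ρD_F·ρD_F + αM·αN·ρD_F·ρD_F; a small helper (the function name is mine) makes the quadratic effect of α and ρ easy to see:

	```python
	def mobilenet_layer_cost(D_K, M, N, D_F, alpha=1.0, rho=1.0):
	    """Mult-adds of one depthwise separable layer.

	    D_K: kernel size, M/N: input/output channels, D_F: feature map size.
	    alpha (width multiplier) scales channel counts; rho (resolution
	    multiplier) scales the spatial size. Both cut cost roughly quadratically.
	    """
	    aM, aN = alpha * M, alpha * N
	    rDF2 = (rho * D_F) ** 2
	    depthwise = D_K * D_K * aM * rDF2   # one filter per input channel
	    pointwise = aM * aN * rDF2          # 1x1 channel-mixing convolution
	    return depthwise + pointwise
	```

	For a 14 × 14 × 512 layer with 3 × 3 kernels, the pointwise term dominates, which matches the note above that nearly all computation lives in the 1×1 convolutions.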
- experiment
- model choice
- depthwise separable convolutions vs full convolution
- thinner model vs shallower model
- We next show results comparing thinner models with width multiplier to shallower models using fewer layers. To make MobileNet shallower, the 5 layers of separable filters with feature size 14 × 14 × 512 in Table 1 are removed. Table 5 shows that at similar computation and number of parameters, making MobileNets thinner is 3% better than making them shallower.
- Model Shrinking Hyperparameters
...the rest of the paper compares the model's results against existing models