Paper Reading Notes
Abstract
This observation leads us to propose a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions. We show that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset, and significantly outperforms Inception V3 on a larger image classification dataset comprising 350 million images and 17,000 classes.
Since the Xception architecture has the same number of parameters as Inception V3, the performance gains are not due to increased capacity but rather to a more efficient use of model parameters.
Introduction
The introduction briefly reviews the history of convolutional neural networks: from early LeNet-style models to AlexNet in 2012. Driven by the ILSVRC competition, the trend was then to build such networks deeper and deeper: first Zeiler and Fergus in 2013, then VGG in 2014, Inception V1 in 2014, followed by Inception V2, Inception V3, and Inception-ResNet.
An Inception model can be understood as a stack of such modules(Figure 1). This is a departure from earlier VGG-style networks which were stacks of simple convolution layers.
While Inception modules are conceptually similar to convolutions (they are convolutional feature extractors), they empirically appear to be capable of learning richer representations with less parameters. How do they work, and how do they differ from regular convolutions? What design strategies come after Inception?
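The claim above that such factorizations can learn richer representations with fewer parameters can be made concrete with a quick parameter count. The channel and kernel sizes below are illustrative choices, not numbers from the paper:

```python
# Parameter-count comparison (illustrative sizes, not from the paper):
# a regular 3x3 convolution mapping 256 input channels to 256 output channels,
# versus a depthwise separable factorization of the same mapping.
c_in, c_out, k = 256, 256, 3

regular = k * k * c_in * c_out   # one k x k filter per (input, output) channel pair
depthwise = k * k * c_in         # one k x k spatial filter per input channel
pointwise = c_in * c_out         # 1x1 convolution mixing channels
separable = depthwise + pointwise

print(regular)              # 589824
print(separable)            # 67840
print(regular / separable)  # roughly 8.7x fewer parameters
```

The gap widens as channel counts grow, since the `k * k` spatial cost no longer multiplies the full `c_in * c_out` cross-channel cost.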
The Inception hypothesis
In effect, the fundamental hypothesis behind Inception is that cross-channel correlations and spatial correlations are sufficiently decoupled that it is preferable not to map them jointly.
Would it be reasonable to make a much stronger hypothesis than the Inception hypothesis, and assume that cross-channel correlations and spatial correlations can be mapped completely separately?
An “extreme” version of an Inception module, based on this stronger hypothesis, would first use a 1x1 convolution to map cross-channel correlations, and would then separately map the spatial correlations of every output channel.
This is shown in figure 4. We remark that this extreme form of an Inception module is almost identical to a depthwise separable convolution.
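The extreme module described above can be sketched in a few lines of NumPy. All shapes and weights here are hypothetical, for illustration only; `scipy.signal.correlate2d` stands in for a stride-1, same-padded convolution:

```python
import numpy as np
from scipy.signal import correlate2d

# Sketch of the "extreme" Inception module: a 1x1 convolution first maps
# cross-channel correlations, then each resulting channel gets its own
# independent 3x3 spatial convolution.

def extreme_inception(x, w_pointwise, w_spatial):
    """x: (H, W, C_in); w_pointwise: (C_in, C_out); w_spatial: (C_out, 3, 3)."""
    y = x @ w_pointwise  # 1x1 convolution == per-pixel channel mixing
    return np.stack(
        [correlate2d(y[:, :, ch], w_spatial[ch], mode="same")  # per-channel 3x3
         for ch in range(y.shape[-1])],
        axis=-1)

x = np.random.rand(8, 8, 4)
out = extreme_inception(x, np.random.rand(4, 6), np.random.rand(6, 3, 3))
print(out.shape)  # (8, 8, 6)
```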
A depthwise separable convolution, commonly called “separable convolution” in deep learning frameworks such as TensorFlow and Keras, consists in a depthwise convolution, i.e. a spatial convolution performed independently over each channel of an input, followed by a pointwise convolution, i.e. a 1x1 convolution, projecting the channels output by the depthwise convolution onto a new channel space. This is not to be confused with a spatially separable convolution, which is also commonly called “separable convolution” in the image processing community.
Note: separable convolutions come in two kinds, spatially separable convolutions and depthwise separable convolutions.
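The depthwise separable convolution itself can be sketched in the depthwise-then-pointwise order used by TensorFlow and Keras. Shapes are hypothetical, and explicit loops are used for clarity rather than speed:

```python
import numpy as np

# Sketch of a depthwise separable convolution: a 3x3 depthwise convolution
# applied independently to each input channel, followed by a 1x1 pointwise
# convolution that mixes channels. Note the order is the reverse of the
# "extreme" Inception module above.

def depthwise_separable(x, w_depth, w_point):
    """x: (H, W, C_in); w_depth: (C_in, 3, 3); w_point: (C_in, C_out)."""
    h, w, c = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # "same" padding, stride 1
    spatial = np.zeros_like(x)
    for ch in range(c):                           # one spatial filter per channel
        for i in range(h):
            for j in range(w):
                spatial[i, j, ch] = np.sum(padded[i:i + 3, j:j + 3, ch] * w_depth[ch])
    return spatial @ w_point                      # pointwise 1x1 projection

x = np.random.rand(8, 8, 4)
y = depthwise_separable(x, np.random.rand(4, 3, 3), np.random.rand(4, 6))
print(y.shape)  # (8, 8, 6)
```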
Two minor differences between an “extreme” version of an Inception module and a depthwise separable convolution would be:
- The order of the operations: depthwise separable convolutions as usually implemented (e.g. in TensorFlow) perform first channel-wise spatial convolution and then perform 1x1 convolution, whereas Inception performs the 1x1 convolution first.
- The presence or absence of a non-linearity after the first operation. In Inception, both operations are followed by a ReLU non-linearity, whereas depthwise separable convolutions are usually implemented without non-linearities.
The Xception architecture
We propose a convolutional neural network architecture based entirely on depthwise separable convolution layers. In effect, we make the following hypothesis: that the mapping of cross-channels correlations and spatial correlations in the feature maps of convolutional neural networks can be entirely decoupled. Because this hypothesis is a stronger version of the hypothesis underlying the Inception architecture, we name our proposed architecture Xception, which stands for “Extreme Inception”.
In short, the Xception architecture is a linear stack of depthwise separable convolution layers with residual connections.
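That one-sentence summary can be sketched as a single building block. Sizes below are hypothetical, and the channel count is kept constant so the identity shortcut needs no 1x1 projection:

```python
import numpy as np
from scipy.signal import correlate2d

# Minimal sketch of one Xception-style block: a depthwise separable
# convolution wrapped in a residual (identity) connection.

def depthwise_separable(x, w_depth, w_point):
    """x: (H, W, C); w_depth: (C, 3, 3); w_point: (C, C)."""
    spatial = np.stack(
        [correlate2d(x[:, :, ch], w_depth[ch], mode="same")  # per-channel 3x3
         for ch in range(x.shape[-1])],
        axis=-1)
    return spatial @ w_point                                 # pointwise 1x1

def xception_block(x, w_depth, w_point):
    return x + depthwise_separable(x, w_depth, w_point)      # residual add

x = np.random.rand(8, 8, 16)
y = xception_block(x, np.random.rand(16, 3, 3), np.random.rand(16, 16))
print(y.shape)  # (8, 8, 16)
```

The full architecture stacks many such blocks; the paper's actual blocks also include ReLU, batch normalization, and occasional strided shortcuts with 1x1 projections, all omitted here for brevity.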
Words
emerge: to appear, to come into being
The fundamental building block of Inception-style models is the Inception module
empirically: based on experience or observation
canonical: standard; serving as a model
explicitly: in a clear and direct manner
extensively: widely, to a large extent
cornerstone: foundation, basis
Network not converging (notes unrelated to this paper)
- Check that the trained weights (h5 or protobuf) are loaded correctly, to avoid running with randomly initialized parameters.
- Check that the training data is actually being fed in.