Paper Reading Notes
Abstract
This observation leads us to propose a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions. We show that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset, and significantly outperforms Inception V3 on a larger image classification dataset comprising 350 million images and 17,000 classes.
Since the Xception architecture has the same number of parameters as Inception V3, the performance gains are not due to increased capacity but rather to a more efficient use of model parameters.
Introduction
The introduction briefly reviews the history of convolutional neural networks: from early LeNet-style models to AlexNet in 2012. Driven by the ILSVRC competition, the trend was then to build such networks deeper and deeper: first Zeiler and Fergus in 2013, then VGG in 2014, Inception V1 in 2014, followed by Inception V2, Inception V3, and Inception-ResNet.
An Inception model can be understood as a stack of such modules(Figure 1). This is a departure from earlier VGG-style networks which were stacks of simple convolution layers.
While Inception modules are conceptually similar to convolutions (they are convolutional feature extractors), they empirically appear to be capable of learning richer representations with less parameters. How do they work, and how do they differ from regular convolutions? What design strategies come after Inception?
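The claim above that such factorizations can learn richer representations with fewer parameters can be made concrete with a quick parameter count. The channel and kernel sizes below are illustrative choices, not numbers from the paper:

```python
# Parameter-count comparison (illustrative sizes, not from the paper):
# a regular 3x3 convolution mapping 256 input channels to 256 output channels,
# versus a depthwise separable factorization of the same mapping.
c_in, c_out, k = 256, 256, 3

regular = k * k * c_in * c_out   # one k x k filter per (input, output) channel pair
depthwise = k * k * c_in         # one k x k spatial filter per input channel
pointwise = c_in * c_out         # 1x1 convolution mixing channels
separable = depthwise + pointwise

print(regular)              # 589824
print(separable)            # 67840
print(regular / separable)  # roughly 8.7x fewer parameters
```

The gap widens as channel counts grow, since the `k * k` spatial cost no longer multiplies the full `c_in * c_out` cross-channel cost.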
The Inception hypothesis
In effect, the fundamental hypothesis behind Inception is that cross-channel correlations and spatial correlations are sufficiently decoupled that it is preferable not to map them jointly.
Would it be reasonable to make a much stronger hypothesis than the Inception hypothesis, and assume that cross-channel correlations and spatial correlations can be mapped completely separately?
An “extreme” version of an Inception module, based on this stronger hypothesis, would first use a 1x1 convolution to map cross-channel correlations, and would then separately map the spatial correlations of every output channel.
This is shown in figure 4. We remark that this extreme form of an Inception module is almost identical to a depthwise separable convolution.
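The extreme module described above can be sketched in a few lines of NumPy. All shapes and weights here are hypothetical, for illustration only; `scipy.signal.correlate2d` stands in for a stride-1, same-padded convolution:

```python
import numpy as np
from scipy.signal import correlate2d

# Sketch of the "extreme" Inception module: a 1x1 convolution first maps
# cross-channel correlations, then each resulting channel gets its own
# independent 3x3 spatial convolution.

def extreme_inception(x, w_pointwise, w_spatial):
    """x: (H, W, C_in); w_pointwise: (C_in, C_out); w_spatial: (C_out, 3, 3)."""
    y = x @ w_pointwise  # 1x1 convolution == per-pixel channel mixing
    return np.stack(
        [correlate2d(y[:, :, ch], w_spatial[ch], mode="same")  # per-channel 3x3
         for ch in range(y.shape[-1])],
        axis=-1)

x = np.random.rand(8, 8, 4)
out = extreme_inception(x, np.random.rand(4, 6), np.random.rand(6, 3, 3))
print(out.shape)  # (8, 8, 6)
```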
A depthwise separable convolution, commonly called “separable convolution” in deep learning frameworks such as TensorFlow and Keras, consists in a depthwise convolution, i.e. a spatial convolution performed independently over each channel of an input, followed by a pointwise convolution, i.e. a 1x1 convolution, projecting the channels output by the depthwise convolution onto a new channel space. This is not to be confused with a spatially separable convolution, which is also commonly called “separable convolution” in the image processing community.
Note: separable convolutions come in two kinds, spatially separable convolutions and depthwise separable convolutions.
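The depthwise separable convolution itself can be sketched in the depthwise-then-pointwise order used by TensorFlow and Keras. Shapes are hypothetical, and explicit loops are used for clarity rather than speed:

```python
import numpy as np

# Sketch of a depthwise separable convolution: a 3x3 depthwise convolution
# applied independently to each input channel, followed by a 1x1 pointwise
# convolution that mixes channels. Note the order is the reverse of the
# "extreme" Inception module above.

def depthwise_separable(x, w_depth, w_point):
    """x: (H, W, C_in); w_depth: (C_in, 3, 3); w_point: (C_in, C_out)."""
    h, w, c = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # "same" padding, stride 1
    spatial = np.zeros_like(x)
    for ch in range(c):                           # one spatial filter per channel
        for i in range(h):
            for j in range(w):
                spatial[i, j, ch] = np.sum(padded[i:i + 3, j:j + 3, ch] * w_depth[ch])
    return spatial @ w_point                      # pointwise 1x1 projection

x = np.random.rand(8, 8, 4)
y = depthwise_separable(x, np.random.rand(4, 3, 3), np.random.rand(4, 6))
print(y.shape)  # (8, 8, 6)
```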
Two minor differences between an “extreme” version of an Inception module and a depthwise separable convolution would be:
- The order of the operations: depthwise separable convolutions as usually implemented (e.g. in TensorFlow) perform first channel-wise spatial convolution and then perform 1x1 convolution, whereas Inception performs the 1x1 convolution first.
- The presence or absence of a non-linearity after the first operation. In Inception, both operations are followed by a ReLU non-linearity, whereas depthwise separable convolutions are usually implemented without non-linearities.
The Xception architecture
We propose a convolutional neural network architecture based entirely on depthwise separable convolution layers. In effect, we make the following hypothesis: that the mapping of cross-channels correlations and spatial correlations in the feature maps of convolutional neural networks can be entirely decoupled. Because this hypothesis is a stronger version of the hypothesis underlying the Inception architecture, we name our proposed architecture Xception, which stands for “Extreme Inception”.
In short, the Xception architecture is a linear stack of depthwise separable convolution layers with residual connections.
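That one-sentence summary can be sketched as a single building block. Sizes below are hypothetical, and the channel count is kept constant so the identity shortcut needs no 1x1 projection:

```python
import numpy as np
from scipy.signal import correlate2d

# Minimal sketch of one Xception-style block: a depthwise separable
# convolution wrapped in a residual (identity) connection.

def depthwise_separable(x, w_depth, w_point):
    """x: (H, W, C); w_depth: (C, 3, 3); w_point: (C, C)."""
    spatial = np.stack(
        [correlate2d(x[:, :, ch], w_depth[ch], mode="same")  # per-channel 3x3
         for ch in range(x.shape[-1])],
        axis=-1)
    return spatial @ w_point                                 # pointwise 1x1

def xception_block(x, w_depth, w_point):
    return x + depthwise_separable(x, w_depth, w_point)      # residual add

x = np.random.rand(8, 8, 16)
y = xception_block(x, np.random.rand(16, 3, 3), np.random.rand(16, 16))
print(y.shape)  # (8, 8, 16)
```

The full architecture stacks many such blocks; the paper's actual blocks also include ReLU, batch normalization, and occasional strided shortcuts with 1x1 projections, all omitted here for brevity.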
Words
emerge: to appear, to come into being
The fundamental building block of Inception-style models is the Inception module
empirically: based on experience or observation
canonical: standard; serving as a model
explicitly: in a clear and direct manner
extensively: widely, to a large extent
cornerstone: foundation, basis
Network not converging (notes unrelated to this paper)
- Check that the trained weights (h5 or protobuf) are loaded correctly, to avoid running with randomly initialized parameters.
- Check that the training data is actually being fed in.