Xception:Deep Learning with Depthwise Separable Convolutions

论文阅读笔记

论文链接:https://openaccess.thecvf.com/content_cvpr_2017/html/Chollet_Xception_Deep_Learning_CVPR_2017_paper.html

Abstract

This observation leads us to propose a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions. We show that this architecture, dubbed Xception.

Since the Xception architecture has the same number of parameters as Inception V3, the performance gains are not due to increased capacity but rather to a more efficient use of model parameters.

Introduction

摘要简要介绍卷集神经网络的历史,从早期LeNet-style models,到2012年的AleNet。之后由ILSVRC的驱动,随之而来的是建立这种网络的趋势越来越深:首先是2013的Zeiler和Fergus, 2014年的VGG,2014 Inception V1,...Inception V2, Inception V3, Inception-ResNet。

An Inception model can be understood as a stack of such modules(Figure 1). This is a departure from earlier VGG-style networks which were stacks of simple convolution layers.

While Inception modules are conceptually similar to convolutions (they are convolutional feature extractors), they empirically appear to be capable of learning richer representations with less parameters. How do they work, and how do they differ from regular convolutions? What design strategies come after Inception?

The Inception hypothesis

In effect, the fundamental hypothesis behind Inception is that cross-channel correlations and spatial correlations are sufficiently decoupled that it is preferable not to map them jointly.

Would it be reasonable to make a much stronger hypothesis than the Inception hypothesis, and assume that cross-channel correlations and spatial correlations can be mapped completely separately?

An “extreme” version of an Inception module, based on this stronger hypothesis, would first use a 1x1 convolution to map cross-channel correlations, and would then separately map the spatial correlations of every output channel.

This is shown in figure 4. We remark that this extreme form of an Inception module is almost identical to a depthwise separable convolution.

 

A depthwise separable convolution, commonly called “separable convolution” in deep learning frameworks such as TensorFlow and Keras, consists in a depthwise convolution, i.e. a spatial convolution performed independently over each channel of an input, followed by a pointwise convolution, i.e. a 1x1 convolution, projecting the channels output by the depthwise convolution onto a new channel space. This is not to be confused with a spatially separable convolution, which is also commonly called “separable convolution” in the image processing community.

注:separable convolution分为spatial separable convolution和depthwise separable convolution.

Two minor differences between and “extreme” version of an Inception module and a depthwise separable convolution would be:

  • The order of the operations: depthwise separable convolutions as usually implemented (e.g. in TensorFlow) perform first channel-wise spatial convolution and then perform 1x1 convolution, whereas Inception performs the 1x1 convolution first.
  • The presence or absence of a non-linearity after the first operation. In Inception, both operations are followed by a ReLU non-linearity, however depthwiseseparable convolutions are usually implemented without non-linearities.

The Xception architecture

We propose a convolutional neural network architecture based entirely on depthwise separable convolution layers. In effect, we make the following hypothesis: that the mapping of cross-channels correlations and spatial correlations in the feature maps of convolutional neural networks can be entirely decoupled. Because this hypothesis is a stronger version of the hypothesis underlying the Inception architecture, we name our proposed architecture Xception, which stands for “Extreme Inception”.

In short, the Xception architecture is a linear stack of depthwise separable convolution layers with residual connections.

Reference

  1. https://helicqin.github.io/2018/10/02/Xception-Deep%20Learning%20with%20Depthwise%20Separable%20Convolutions/
  2. https://www.jianshu.com/p/9a0ba830af37
  3. https://blog.csdn.net/u014061630/article/details/80797477
  4. https://bbs.cvmart.net/topics/3407/vote_count?
  5. https://zhuanlan.zhihu.com/p/32746221

Words

emerge 出现,产生

We show that this architecture, dubbed Xception.

The fundamental building block of Inception-style models is the Inception module

empirically 凭经验

canonical 典范

explicitly 明确地

extensively 广泛地

cornerstone 基石

网络不收敛(与论文不相关内容)

  1. 看看训练参数(h5 or protobuf)有没有加载正确,以免使用随机初始化的参数。
  2. 训练数据有没有喂进去。
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
### 回答1: Xception是一种深度学习模型,它使用深度可分离卷积来提高模型的效率和准确性。深度可分离卷积是一种卷积操作,它将标准卷积分成两个步骤:深度卷积和逐点卷积。深度卷积在每个输入通道上执行卷积,而逐点卷积在每个通道之间执行卷积。这种方法可以减少计算量和参数数量,同时提高模型的准确性。Xception模型在图像分类、目标检测和语义分割等任务中表现出色。 ### 回答2: Xception是一个基于深度可分离卷积的深度学习架构。深度学习在计算机视觉和自然语言处理等领域取得了巨大成功,但也面临着计算复杂性和模型尺寸庞大的问题。Xception通过引入深度可分离卷积来解决这些问题。 深度可分离卷积由分离卷积和逐点卷积两个步骤组成。首先,分离卷积将输入张量分别应用于空间和通道维度上的低秩张量。通过这种方式,模型可以分别学习特征的空间位置和通道之间的依赖关系。其次,逐点卷积将通道维度上的低秩张量应用于输出特征图。逐点卷积允许每个通道单独学习特征。 通过使用深度可分离卷积,Xception减少了参数的数量,并提高了模型的计算效率。与传统卷积相比,深度可分离卷积在减少计算量的同时,还可以提高模型的表示能力。这意味着Xception可以更好地捕捉和表示输入数据中的特征。 在实践中,Xception在图像分类、目标检测和语义分割等任务上都取得了非常好的表现。由于其较小的模型尺寸和高效的计算性能,Xception成为了很多研究者和工程师首选的深度学习架构之一。 总而言之,Xception通过引入深度可分离卷积来解决深度学习中的计算复杂性和模型尺寸庞大的问题。它减少了模型参数的数量、提高了模型的计算效率,同时又保持了较高的表示能力。作为一种强大的深度学习架构,Xception在多个领域具有广泛的应用和研究价值。 ### 回答3: Xception是一种深度学习模型,使用深度可分离卷积的方法来提高模型的准确性和效率。深度可分离卷积是一种卷积操作,由分离卷积和逐元素卷积两个步骤组成。 在传统卷积中,输入图像通过一个卷积核进行卷积操作,得到特征图。而在深度可分离卷积中,卷积操作被分解成两个步骤。首先,输入图像通过一个分离卷积核进行深度卷积,从而获取特征深度信息。然后,逐元素卷积操作被应用于分离卷积的输出,以获取空间信息。这种分离的方式减少了计算量和参数量,提高了模型的效率。 Xception模型使用了这种深度可分离卷积的结构。相比于传统的卷积方式,Xception模型能够更好地捕捉到输入图像中的细节信息。同时,由于深度卷积和逐元素卷积的分离,Xception模型的参数量大大减少,使得模型更加轻量化,便于在移动设备等资源受限的场景中应用。 通过对ImageNet大规模图像数据库进行训练,Xception模型取得了很好的性能。它在图像分类、目标检测和语义分割等任务上都取得了优秀的结果。同时,Xception模型也为其他相关任务,如迁移学习和特征提取等,提供了一个有力的基础。 总的来说,Xception是一种利用深度可分离卷积的深度学习模型,它在提高准确性和效率方面取得了显著的进展。它的设计和性能使得它成为了计算机视觉领域一个重要的技术突破。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值