READING NOTE: Rethinking the Inception Architecture for Computer Vision

Original post, 2016-08-29 23:20:02

TITLE: Rethinking the Inception Architecture for Computer Vision

AUTHOR: Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna

AFFILIATION: Google Inc., University College London

FROM: arXiv:1512.00567


  1. Several general and specific design principles are discussed.

Design Choices

General Design Principles

  1. Avoid representational bottlenecks, especially early in the network. One should avoid bottlenecks with extreme compression. In general the representation size should gently decrease from the inputs to the outputs before reaching the final representation used for the task at hand.
  2. Higher dimensional representations are easier to process locally within a network. Increasing the activations per tile in a convolutional network allows for more disentangled features. The resulting networks will train faster.
  3. Spatial aggregation can be done over lower dimensional embeddings without much or any loss in representational power.
  4. Balance the width and depth of the network.

Specific Design Strategy

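  1. Factorizing convolutions with large filter sizes. This covers both factorization into smaller convolutions (e.g. replacing a 5x5 convolution with two stacked 3x3 convolutions) and spatial factorization into asymmetric convolutions (replacing an nxn convolution with a 1xn convolution followed by an nx1 convolution). Both cut the computational cost, which speeds up the network, and the nonlinearity between the factorized layers adds to the complexity of the function the network can learn.
  2. Auxiliary classifiers act as regularizers rather than helping the low-level features evolve. Near the end of training, the network with the auxiliary branches starts to overtake the accuracy of the network without any auxiliary branch and reaches a slightly higher plateau.
  3. Efficient grid size reduction lowers the computational cost without introducing a representational bottleneck. It runs a stride-2 convolution branch and a pooling branch in parallel and concatenates their outputs.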
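To make the savings from factorization concrete, here is a minimal sketch that counts convolution weights before and after factorizing. It assumes equal input and output channel counts and ignores biases; the helper names and the channel count of 64 are chosen for illustration and do not come from the paper. With the output grid held fixed, the multiply-add count scales the same way as the weight count.

```python
def conv_weights(kh, kw, c_in, c_out):
    """Number of weights in a kh x kw convolution (bias ignored)."""
    return kh * kw * c_in * c_out


def factorization_savings(c=64):
    """Relative weight savings for two factorizations, with c channels in and out.

    The channel count c is an illustrative assumption; the ratios do not depend on it.
    """
    full_5x5 = conv_weights(5, 5, c, c)
    two_3x3 = 2 * conv_weights(3, 3, c, c)  # two stacked 3x3 layers

    full_3x3 = conv_weights(3, 3, c, c)
    asymmetric = conv_weights(1, 3, c, c) + conv_weights(3, 1, c, c)  # 1x3 then 3x1

    return {
        "5x5 -> two 3x3": 1 - two_3x3 / full_5x5,
        "3x3 -> 1x3 + 3x1": 1 - asymmetric / full_3x3,
    }


if __name__ == "__main__":
    for name, saving in factorization_savings().items():
        print(f"{name}: {saving:.0%} fewer weights")
```

Under these assumptions the two factorizations save about 28% and 33% of the weights, respectively, while covering the same spatial extent as the convolution they replace.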

Some Other Ideas

One experiment in this paper is particularly interesting and worth noting: networks with different receptive field sizes can achieve similar results as long as their computational cost is kept roughly constant.

In my own trials with SSD, I found that networks of similar computational cost but different receptive field sizes give very different results on the detection task. For example, Network A has a receptive field of 112x112, while Network B has one of 170x170. Network B performs slightly better than Network A on the classification task. In contrast, after the two networks are fine-tuned on 200x200 images for the detection task, Network A is better. So what if we train a network with a receptive field of, say, 56x56 and fine-tune it on 100x100 images? Would it achieve a comparable result?
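For readers who want to check the receptive field of their own network, the following is a small sketch of the usual layer-by-layer recurrence for a plain stack of convolution or pooling layers. The function name and the example layer lists are hypothetical; they are not the configurations of Network A or Network B, and dilation and padding effects are ignored.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between adjacent units of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    # Two stacked 3x3 convolutions cover the same area as a single 5x5.
    print(receptive_field([(3, 1), (3, 1)]))
    # Striding early makes later layers grow the receptive field faster.
    print(receptive_field([(3, 2), (3, 1)]))
```

The recurrence also shows why networks with the same cost can differ in receptive field: moving a stride-2 layer earlier or later changes `jump` for every layer after it.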

