READING NOTE: Rethinking the Inception Architecture for Computer Vision

Original post, 2016-08-29 23:20:02

TITLE: Rethinking the Inception Architecture for Computer Vision

AUTHOR: Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna

ASSOCIATION: Google Inc., University College London

FROM: arXiv:1512.00567


  1. Several general and specific design principles are discussed.

Design Choices

General Design Principles

  1. Avoid representational bottlenecks, especially early in the network; in particular, avoid bottlenecks with extreme compression. In general, the representation size should decrease gently from the inputs to the outputs before reaching the final representation used for the task at hand.
  2. Higher dimensional representations are easier to process locally within a network. Increasing the activations per tile in a convolutional network allows for more disentangled features. The resulting networks will train faster.
  3. Spatial aggregation can be done over lower dimensional embeddings without much or any loss in representational power.
  4. Balance the width and depth of the network.
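As a toy illustration of the first principle, one can trace how the spatial grid shrinks through the early layers of Inception V3. The sizes below follow Table 1 of the paper; the exact kernel/stride/padding list is my own reconstruction, not code from the paper:

```python
def out_size(n, k, stride, pad=0):
    """Spatial width after a convolution or pooling with a k x k kernel."""
    return (n + 2 * pad - k) // stride + 1

# Stem of Inception V3: the grid decreases gently (299 -> 149 -> 147 -> 147
# -> 73 -> 71 -> 35), with no single step performing an extreme compression.
n = 299
for k, s, p in [(3, 2, 0), (3, 1, 0), (3, 1, 1), (3, 2, 0), (3, 1, 0), (3, 2, 0)]:
    n = out_size(n, k, s, p)
    print(n)  # 149, 147, 147, 73, 71, 35
```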

Specific Design Strategies

  1. Factorizing convolutions with large filter sizes covers factorization into smaller convolutions (e.g. replacing a 5x5 convolution with two stacked 3x3 convolutions) and spatial factorization into asymmetric convolutions (e.g. replacing a 3x3 convolution with a 1x3 followed by a 3x1). Both reduce computational cost while preserving the receptive field and the expressiveness of the learnt function.
  2. Auxiliary classifiers act as regularizers rather than helping low-level features evolve. Near the end of training, the network with the auxiliary branches starts to overtake the accuracy of the network without any auxiliary branch and reaches a slightly higher plateau.
  3. Efficient grid size reduction lowers the computational cost while avoiding a representational bottleneck.
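The savings from the two factorizations in item 1 can be checked with simple per-position weight counts (my own back-of-the-envelope arithmetic; it matches the roughly 28% and 33% reductions reported in the paper):

```python
# Weights per filter position, per (input, output) channel pair.
k5 = 5 * 5                      # one 5x5 convolution
two_k3 = 2 * (3 * 3)            # two stacked 3x3 convs (same 5x5 receptive field)
print(k5, two_k3, 1 - two_k3 / k5)   # 25 18 0.28  -> 28% cheaper

k3 = 3 * 3                      # one 3x3 convolution
asym = (1 * 3) + (3 * 1)        # 1x3 then 3x1 (same 3x3 receptive field)
print(k3, asym, round(1 - asym / k3, 2))  # 9 6 0.33 -> 33% cheaper
```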
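For item 3, the cost trade-off can be sketched with a crude multiply-accumulate count. The grid size d = 35 and channel count k = 288 below are illustrative, and I use 1x1 convolutions for simplicity, mirroring the paper's 2d²k² vs 2(d/2)²k² comparison rather than any exact Inception module:

```python
def conv_macs(out_h, out_w, c_in, c_out, k):
    """Multiply-accumulates for a k x k convolution with the given shapes."""
    return out_h * out_w * c_in * c_out * k * k

d, k = 35, 288
half = d // 2  # 17

# Option 1: expand channels at full resolution, then pool.
# No bottleneck, but expensive (~2 d^2 k^2 operations).
conv_then_pool = conv_macs(d, d, k, 2 * k, 1)

# Option 2: pool first, then expand channels at half resolution.
# 4x cheaper, but the pooling squeezes the representation first (a bottleneck).
pool_then_conv = conv_macs(half, half, k, 2 * k, 1)

# Option 3 (the paper's): two parallel stride-2 branches (conv and pooling),
# each emitting k channels, concatenated to 2k channels at half resolution.
# Only the conv branch costs compute; no bottleneck is introduced.
parallel = conv_macs(half, half, k, k, 1)

print(conv_then_pool, pool_then_conv, parallel)
```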

Some Other Ideas

In this paper, a very interesting experiment is worth noting: networks with different receptive field sizes can achieve similar results as long as their computational cost is kept constant.

In my own trials with SSD, I found that networks of similar computational cost but different receptive field sizes perform very differently on detection. For example, Network A has a receptive field of 112x112, while Network B's is 170x170. Network B performs slightly better than Network A on classification. On the contrary, after the two networks are fine-tuned on 200x200 images for detection, Network A is better. So what if we trained a network with a receptive field of, say, 56x56 and fine-tuned it on 100x100 images? Would it achieve a comparable result?
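For reference, receptive field sizes like the 112x112 and 170x170 figures above can be computed with the standard recurrence over a stack of conv/pool layers (a generic sketch, not tied to any particular network):

```python
def receptive_field(layers):
    """Receptive field of a stack of (kernel, stride) conv/pool layers."""
    r, j = 1, 1  # field size and jump (effective stride) seen at the input
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Two stacked 3x3 convs see the same 5x5 window as one 5x5 conv.
print(receptive_field([(3, 1), (3, 1)]))  # 5
print(receptive_field([(5, 1)]))          # 5
```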


