Inception v1: Paper and Source Code

This post gives a detailed reading of the Inception v1 paper, covering Network-in-network, R-CNN, and the motivation and high-level considerations. It points out that increasing a network's width and depth can improve performance, but may also lead to overfitting and higher computational cost. The Inception architecture looks for an optimal local sparse structure and approximates it with dense components. The post also analyzes the model's architectural details and source code, including the role of the key function truncated_normal_initializer.

Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed,
Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich.
http://arxiv.org/pdf/1409.4842v1.pdf.

Paper Walkthrough

Network-in-network

R-CNN

R-CNN is currently the leading approach to object detection. It decomposes the overall detection problem into two subproblems: first, use low-level cues such as color and superpixel consistency to generate potential object proposals; then, use a CNN classifier to identify the object category at each proposed location.
This two-stage approach balances low-level, color- and pixel-based segmentation of the image into region proposals against a much more powerful high-level classifier. The paper adopts a similar pipeline, but explores enhancements in both stages, such as multi-box prediction for higher bounding-box recall and ensembles for better categorization of the proposals.

The current leading approach for object detection is the Regions with Convolutional Neural Networks (R-CNN) proposed by Girshick et al. [6]. R-CNN decomposes the overall detection problem into two subproblems: to first utilize low-level cues such as color and superpixel consistency for potential object proposals in a category-agnostic fashion, and to then use CNN classifiers to identify object categories at those locations. Such a two stage approach leverages the accuracy of bounding box segmentation with low-level cues, as well as the highly powerful classification power of state-of-the-art CNNs. We adopted a similar pipeline in our detection submissions, but have explored enhancements in both stages, such as multi-box [5] prediction for higher object bounding box recall, and ensemble approaches for better categorization of bounding box proposals.

Motivation and High Level Considerations

Although the most straightforward way to improve a network's performance is to increase its width and depth, both approaches have drawbacks: more parameters make the model prone to overfitting, and they consume far more computational resources.

For example, in a deep vision network, if two convolutional layers are chained, any uniform increase in the number of their filters results in a quadratic increase of computation.
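To see where the quadratic factor comes from, here is a rough back-of-the-envelope check (the feature-map size and filter counts below are made up for illustration and are not taken from the paper):

def conv_madds(h, w, k, c_in, c_out):
    """Approximate multiply-adds for a k x k convolution on an h x w feature map."""
    return h * w * k * k * c_in * c_out

# Two chained 3x3 convolutions on a 28x28 feature map, 64 filters each.
base = conv_madds(28, 28, 3, 64, 64) + conv_madds(28, 28, 3, 64, 64)

# Uniformly double the number of filters in both layers: the second layer's
# input and output channels both double, so its cost grows 4x (quadratically).
doubled = conv_madds(28, 28, 3, 64, 128) + conv_madds(28, 28, 3, 128, 128)

print(doubled / base)  # 3.0 overall here; the second (chained) layer alone grows 4x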

Both problems can be addressed by replacing fully connected layers with sparsely connected architectures, even inside the convolutions.
The main idea is to wire together units whose outputs are highly correlated.

if the probability distribution of the data-set is representable by a large, very sparse deep neural network, then the optimal network topology can be constructed layer by layer by analyzing the correlation statistics of the activations of the last layer and clustering neurons with highly correlated outputs.

Although the strict mathematical proof requires very strong conditions, the fact that this statement resonates with the well known Hebbian principle – neurons that fire together, wire together – suggests that the underlying idea is applicable even under less strict conditions, in practice.

However, today's hardware is poorly suited to sparse networks: sparse data structures do not map well onto parallel numerical computing.

The vast literature on sparse matrix computations (e.g. [3]) suggests that clustering sparse matrices into relatively dense submatrices tends to give state of the art practical performance for sparse matrix multiplication.

Architectural Details

The Inception architecture is built on the idea of how closely an optimal, sparse local structure in a CNN can be approximated by dense components.

The main idea of the Inception architecture is based on finding out how an optimal local sparse structure in a convolutional vision network can be approximated and covered by readily available dense components.

Assuming translation invariance implies building the model out of convolutional building blocks.
Note that assuming translation invariance means that our network will be built from convolutional building blocks.

What we need is to find this optimal local construction and then reuse it repeatedly across spatial positions.
All we need is to find the optimal local construction and to repeat it spatially.

Arora et al. [2] suggests a layer-by-layer construction in which one should analyze the correlation statistics of the last layer and cluster them into groups of units with high correlation.

We assume that the units of these earlier layers correspond to certain regions of the input image.
We assume that each unit from the earlier layer corresponds to some region of the input image and these units are grouped into filter banks.
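In practice, the dense components that approximate this local sparse structure are the Inception branches: 1x1, 3x3 and 5x5 convolutions plus a pooling path, concatenated along the channel axis. Below is a minimal sketch of such a module in TF-Slim style; the branch widths are illustrative placeholders and the scope names only mimic the naming convention of the source analyzed below, which follows the same pattern with its own channel counts.

import tensorflow as tf

slim = tf.contrib.slim  # TF 1.x

def inception_module(net, scope='Mixed'):
    """A minimal Inception branch-and-concatenate block (illustrative widths)."""
    with tf.variable_scope(scope):
        with tf.variable_scope('Branch_0'):
            branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
        with tf.variable_scope('Branch_1'):
            # A 1x1 convolution first reduces channels, then the expensive 3x3 runs.
            branch_1 = slim.conv2d(net, 96, [1, 1], scope='Conv2d_0a_1x1')
            branch_1 = slim.conv2d(branch_1, 128, [3, 3], scope='Conv2d_0b_3x3')
        with tf.variable_scope('Branch_2'):
            branch_2 = slim.conv2d(net, 16, [1, 1], scope='Conv2d_0a_1x1')
            branch_2 = slim.conv2d(branch_2, 32, [5, 5], scope='Conv2d_0b_5x5')
        with tf.variable_scope('Branch_3'):
            branch_3 = slim.max_pool2d(net, [3, 3], stride=1, padding='SAME',
                                       scope='MaxPool_0a_3x3')
            branch_3 = slim.conv2d(branch_3, 32, [1, 1], scope='Conv2d_0b_1x1')
        # "Filter concatenation": stack all branches along the channel axis (NHWC).
        return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])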

Source Code Analysis

Functions

variable_scope

def variable_scope(name_or_scope,
                   default_name=None,
                   values=None,
                   initializer=None,
                   regularizer=None,
                   caching_device=None,
                   partitioner=None,
                   custom_getter=None,
                   reuse=None,
                   dtype=None,
                   use_resource=None,
                   constraint=None):

Args:
name_or_scope: string or VariableScope: the scope to open.
default_name: The default name to use if the name_or_scope argument is
None, this name will be uniquified. If name_or_scope is provided it
won’t be used and therefore it is not required and can be None.
values: The list of Tensor arguments that are passed to the op function.
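A minimal usage sketch of variable_scope (TF 1.x; the scope and variable names here are made up for illustration):

import tensorflow as tf

# Variables created via get_variable inside a scope are prefixed with its name.
with tf.variable_scope('InceptionV1'):
    weights = tf.get_variable('weights', shape=[3, 3, 64, 128])
    print(weights.name)  # InceptionV1/weights:0

# Re-opening the same scope with reuse=True returns the existing variable
# instead of creating a new one.
with tf.variable_scope('InceptionV1', reuse=True):
    same_weights = tf.get_variable('weights')
    assert same_weights is weights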

truncated_normal_initializer
This is just the class TruncatedNormal(Initializer):
Initializer that generates a truncated normal distribution.
An initializer similar to random_normal_initializer:
These values are similar to values from a random_normal_initializer
except that values more than two standard deviations from the mean
are discarded and re-drawn. This is the recommended initializer for
neural network weights and filters.
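A minimal usage sketch (TF 1.x; the variable name and shape are made up for illustration, and stddev=0.01 is just an example value):

import tensorflow as tf

# Samples from N(0, 0.01^2); values more than two standard deviations from
# the mean are discarded and re-drawn.
init = tf.truncated_normal_initializer(mean=0.0, stddev=0.01)

weights = tf.get_variable('conv_weights',       # illustrative name
                          shape=[7, 7, 3, 64],  # e.g. a 7x7 stem conv: RGB in, 64 filters out
                          initializer=init)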

Source Code

The analysis is added as comments in the code.

def inception_v1_base(inputs, final_endpoint='Mixed_5c', scope='InceptionV1'):
  """Defines the Inception V1 base architecture.

  This architecture is defined in:
    Going deeper with convolutions
    Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed,
    Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich.
    http://arxiv.org/pdf/1409.4842v1.pdf.

  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    final_endpoint: specifies the endpoint to construct the network up to. It
      can be one of ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
      'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c',
      'MaxPool_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e',
      'Mixed_4f', 'MaxPool_5a_2x2', 'Mixed_5b', 'Mixed_5c']
    scope: Optional variable_scope.

  Returns:
    A dictionary from components of the network to the corresponding activation.

  Raises:
    ValueError: if final_endpoint is not set to one of the predefined values.
  """
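A minimal usage sketch of the function above, assuming it is importable as in the tensorflow/models "slim" repository (adjust the import path to your checkout); note that the slim implementation returns both the final tensor and the end_points dictionary:

import tensorflow as tf
from nets import inception_v1  # module path in the tensorflow/models "slim" repo

# Build the base network up to the default 'Mixed_5c' endpoint.
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
net, end_points = inception_v1.inception_v1_base(images)

# end_points maps endpoint names to activations, e.g. an intermediate mixed block.
print(end_points['Mixed_3b'].get_shape())  # expected (?, 28, 28, 256) for 224x224 inputs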