Paper Reading: Visualizing and Understanding Convolutional Networks

Author: Matthew D. Zeiler
Venue: ECCV 2014
Tags: ZF-Net, deconvolution, convolutional layer visualization
Paper link: PDF

1 Problem

There is no clear understanding of why large convolutional network models work so well.

2 The proposed method

  1. A novel way to visualize the activity within the model.

  Other contributions:
  • They also perform a sensitivity analysis of the classifier output through a series of occlusion experiments, revealing which parts of the scene are important for classification.
  • They show the power of an ImageNet pre-trained model as a generic feature extractor.

Related Work

  • Challenges:
    1. There are limited methods to visualize features in higher layers.
  • Prior work: Erhan, D. et al. Visualizing higher-layer features of a deep network. Technical report, University of Montreal, 2009.
    Shortcomings: requires a careful initialization and does not give any information about the unit's invariance.

2 Approach

2.1 Visualization with a Deconvnet

The authors present a novel way to map intermediate activations back to the input pixel space, showing what input pattern originally caused a given activation in the feature maps.

  • Deconvnet: Max Unpooling (using the recorded switches) --> Rectification --> Filtering. A sketch follows.
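A minimal PyTorch sketch of one such reversal step, assuming a single conv + ReLU + max-pool layer (the toy shapes, padding, and the `forward_layer`/`deconv_layer` names are my own; the paper attaches a deconvnet like this to every layer of the trained convnet):

```python
import torch
import torch.nn.functional as F

def forward_layer(x, weight, pool_size=2):
    """Conv -> ReLU -> max-pool, keeping the pooling switches."""
    feat = F.relu(F.conv2d(x, weight, padding=1))
    pooled, switches = F.max_pool2d(feat, pool_size, return_indices=True)
    return pooled, switches, feat.shape

def deconv_layer(pooled, switches, weight, feat_shape, pool_size=2):
    """Unpool (using the switches) -> rectify -> filter with transposed weights."""
    unpooled = F.max_unpool2d(pooled, switches, pool_size,
                              output_size=feat_shape[-2:])
    rectified = F.relu(unpooled)  # pass only positive signals back down
    # "Filtering": apply the transposed (flipped) versions of the same
    # learned filters, which conv_transpose2d implements directly.
    return F.conv_transpose2d(rectified, weight, padding=1)

x = torch.randn(1, 3, 8, 8)   # toy input
w = torch.randn(4, 3, 3, 3)   # toy learned filters
pooled, switches, shape = forward_layer(x, w)
reconstruction = deconv_layer(pooled, switches, w, shape)
print(reconstruction.shape)   # torch.Size([1, 3, 8, 8])
```

The switches are the key trick: recording the argmax positions during pooling makes the otherwise non-invertible max-pool approximately reversible.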

3 Training Details

  1. Data augmentation (crops and horizontal flips, etc.); see the sketch after this list.
  2. Hyperparameters.
  3. Suppressing overfitting (dropout in the fully-connected layers).
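A sketch of the augmentation described, in torchvision (the 256 resize and 224 crop sizes follow the paper; the exact composition of this pipeline is my own assumption):

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.Resize(256),             # smaller side to 256
    transforms.RandomCrop(224),         # random 224x224 sub-crop
    transforms.RandomHorizontalFlip(),  # horizontal flips
    transforms.ToTensor(),
])
```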

4 Convnet Visualization

  • Feature Visualization: for a given feature map they pick the top 9 strongest activations and visualize them (presumably "strongest" means the largest activation values of that feature map across the held-out set).
  • Feature Evolution during Training: lower layers converge quickly, while upper layers need more epochs to converge.
  • Feature Invariance: they experiment on translated, rotated and scaled image samples, finding that the output of the network is stable to translations and scalings, but not to rotations. A probe along these lines is sketched below.
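A simple way to probe this kind of invariance (my own sketch, not the paper's protocol; the angle set and `extract` callable are assumptions):

```python
import torch
import torchvision.transforms.functional as TF

def feature_drift(extract, image, angles=(0, 45, 90, 135, 180)):
    """extract: any callable mapping a (1,3,H,W) batch to a flat feature vector.
    Returns the feature-space distance between the image and rotated copies."""
    base = extract(image.unsqueeze(0))
    return {a: torch.dist(base, extract(TF.rotate(image, a).unsqueeze(0))).item()
            for a in angles}
```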

4.1 Architecture Selection

Using the visualization techniques, the authors find problems in AlexNet by inspecting the visualizations of its first and second layers.

  • Problems:
    1. The first-layer filters are a mix of extremely high and low frequency information, with little coverage of the mid frequencies.
    2. The large stride of 4 used in the first-layer convolutions causes aliasing artifacts.
  • Solutions:
    1. In the first convolutional layer, the kernel size was changed from 11x11 to 7x7 and the stride from 4 to 2 (compared below). The authors argue that this new architecture retains much more information in the 1st- and 2nd-layer features.
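The change in PyTorch terms (kernel/stride values from the paper; 96 output channels as in AlexNet):

```python
import torch.nn as nn

alexnet_conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=4)  # original
zfnet_conv1   = nn.Conv2d(3, 96, kernel_size=7,  stride=2)  # revised
```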

4.2 Occlusion Sensitivity

  • Question: with image classification approaches, is the model truly identifying the location of the object, or just using the surrounding context?
  • Experiment: systematically occlude different portions of the input image with a grey square and monitor the output of the classifier; a sketch follows this list.
  • Conclusion: the model is indeed localizing the objects within the scene.
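A minimal sketch of such an occlusion sweep (the patch size, stride and grey value are my own choices, not the paper's):

```python
import torch

def occlusion_map(model, image, label, patch=32, stride=16, grey=0.5):
    """Slide a grey square over `image` (3,H,W) and record the true-class
    probability at each position; low values mark regions the model relies on."""
    _, H, W = image.shape
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    heat = torch.zeros(rows, cols)
    model.eval()
    with torch.no_grad():
        for i in range(rows):
            for j in range(cols):
                occluded = image.clone()
                y, x = i * stride, j * stride
                occluded[:, y:y + patch, x:x + patch] = grey
                probs = model(occluded.unsqueeze(0)).softmax(dim=1)
                heat[i, j] = probs[0, label]
    return heat
```

If the probability collapses only when the square covers the object itself, the model is localizing rather than reading the background.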

4.3 Correspondence Analysis

  • Question: to explore the correspondence between specific object parts in different images (e.g. faces have a particular spatial configuration of the eyes and nose).
  • Experiment: they choose five dog pictures in frontal pose and systematically mask out the same part of the face in each image (e.g. all left eyes, see Fig. 8), then measure the consistency of the within-image feature differences with their proposed formula (reconstructed below).
  • Conclusion: for a specific part (e.g. left eyes), the lower the score, the higher the consistency. Table 1 shows the comparisons for layer 5 and layer 7 respectively.
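As best I can reconstruct the paper's measure (notation from memory, so treat the exact indices as an approximation): for each image $i$, let $\epsilon_i^l = x_i^l - \tilde{x}_i^l$ be the difference between the layer-$l$ feature vectors of the original and the masked image; the consistency score is then

$$\Delta_l = \sum_{i,j=1,\ i \neq j}^{5} \mathcal{H}\big(\operatorname{sign}(\epsilon_i^l), \operatorname{sign}(\epsilon_j^l)\big),$$

where $\mathcal{H}$ is the Hamming distance. A low $\Delta_l$ means masking the same part changes the features of all five images in a consistent way, i.e. the model has implicitly established a correspondence between those parts.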

5 Experiments

5.1 ImageNet 2012

  • Ablation experiments (varying ImageNet model sizes): the authors run ablations by removing convolutional and fully-connected layers and observing the model's performance.
  • Conclusions: removing one or two arbitrary layers causes only a slight drop in accuracy, but removing four layers (two conv layers and two fc layers) leads to a dramatic drop, which indicates that the overall depth of the network really matters.

5.2 Feature Generalization

The authors validate their ImageNet-trained model on Caltech-101, Caltech-256 and PASCAL 2012 by fixing all layers except the top softmax classifier (sketched below), achieving the best performance on Caltech and a rather good score on PASCAL.

  • Conclusion: pre-training on ImageNet builds very strong feature extractors that generalize well to other datasets, while a model trained from scratch does terribly. See Table 4.
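A minimal sketch of this transfer setup, using torchvision's AlexNet as a stand-in for ZF-Net (the weights string requires torchvision >= 0.13, and the 101-way head for Caltech-101 is my assumption):

```python
import torch.nn as nn
from torchvision import models

model = models.alexnet(weights="IMAGENET1K_V1")  # pre-trained feature extractor
for p in model.parameters():
    p.requires_grad = False                      # freeze every pre-trained layer
model.classifier[6] = nn.Linear(4096, 101)       # fresh softmax head for Caltech-101
# train only model.classifier[6] with the usual cross-entropy loss
```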

6 Datasets

  • ILSVRC2012
  • Caltech-101
  • Caltech-256
  • PASCAL2012

7 Weaknesses

  1. Training is a multi-stage pipeline.
  2. Training is expensive in space and time.
  3. Object detection is slow.
  4. Fixed-size region proposals: all region proposals are reshaped to a fixed size, which distorts the objects in the images and may lose some information.

8 Future work

Open question: their model generalizes less well to the PASCAL data; they conjecture that performance would improve if a different loss function were used that permitted multiple objects per image.

9 What do other researchers say about this work?

  • Simonyan et al. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.

    • For instance, the best-performing submissions to the ILSVRC-2013 (Zeiler & Fergus, 2013; Sermanet et al., 2014) utilised smaller receptive window size and smaller stride of the first convolutional layer.
  • Christian Szegedy et al. Going Deeper with Convolutions. CVPR 2015.

    • For larger datasets such as Imagenet, the recent trend has been to increase the number of layers [12] and layer size [21, 14], while using dropout [7] to address the problem of overfitting
  • Kaiming He et al. Deep residual learning for image recognition. CVPR 2016.

    • Deep networks naturally integrate low/mid/high-level features [50] and classifiers in an end-to-end multi-layer fashion, and the "levels" of features can be enriched by the number of stacked layers (depth).
