Paper Reading: Visualizing and Understanding Convolutional Networks

Author: Matthew D. Zeiler
Venue: ECCV 2014
Tags: ZF-Net, deconvolution, convolutional layer visualization
Paper link: PDF

1 Problem

There is no clear understanding of why large convolutional network models work so well.

2 The proposed method

  1. A novel way to visualize the activity within the model.

  Other contributions:
  • They also perform a sensitivity analysis of the classifier output through a series of occlusion experiments, revealing which parts of the scene are important for classification.
  • They show the power of an ImageNet pre-trained model as a generic feature extractor.

Related Work

  • Challenges:
    1. There are limited methods to visualize features in higher layers.
  • Prior work: Erhan, D. et al. Visualizing higher-layer features of a deep network. Technical report, University of Montreal, 2009.
    Shortcomings: requires a careful initialization and does not give any information about the unit's invariance.

2 Approach

2.1 Visualization with a Deconvnet

The authors present a novel way to map intermediate activations back to the input pixel space, showing what input pattern originally caused a given activation in the feature maps.

  • Deconvnet: Max Unpooling (using the recorded switches) --> Rectification --> Filtering. A sketch follows.
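A minimal PyTorch sketch of one such reversal step, assuming a single conv + ReLU + max-pool layer (the toy shapes, padding, and the `forward_layer`/`deconv_layer` names are my own; the paper attaches a deconvnet like this to every layer of the trained convnet):

```python
import torch
import torch.nn.functional as F

def forward_layer(x, weight, pool_size=2):
    """Conv -> ReLU -> max-pool, keeping the pooling switches."""
    feat = F.relu(F.conv2d(x, weight, padding=1))
    pooled, switches = F.max_pool2d(feat, pool_size, return_indices=True)
    return pooled, switches, feat.shape

def deconv_layer(pooled, switches, weight, feat_shape, pool_size=2):
    """Unpool (using the switches) -> rectify -> filter with transposed weights."""
    unpooled = F.max_unpool2d(pooled, switches, pool_size,
                              output_size=feat_shape[-2:])
    rectified = F.relu(unpooled)  # pass only positive signals back down
    # "Filtering": apply the transposed (flipped) versions of the same
    # learned filters, which conv_transpose2d implements directly.
    return F.conv_transpose2d(rectified, weight, padding=1)

x = torch.randn(1, 3, 8, 8)   # toy input
w = torch.randn(4, 3, 3, 3)   # toy learned filters
pooled, switches, shape = forward_layer(x, w)
reconstruction = deconv_layer(pooled, switches, w, shape)
print(reconstruction.shape)   # torch.Size([1, 3, 8, 8])
```

The switches are the key trick: recording the argmax positions during pooling makes the otherwise non-invertible max-pool approximately reversible.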

3 Training Details

  1. Data augmentation (crops and horizontal flips, etc.); see the sketch after this list.
  2. Hyperparameters.
  3. Suppressing overfitting (dropout in the fully-connected layers).
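A sketch of the augmentation described, in torchvision (the 256 resize and 224 crop sizes follow the paper; the exact composition of this pipeline is my own assumption):

```python
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.Resize(256),             # smaller side to 256
    transforms.RandomCrop(224),         # random 224x224 sub-crop
    transforms.RandomHorizontalFlip(),  # horizontal flips
    transforms.ToTensor(),
])
```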

4 Convnet Visualization

  • Feature Visualization: for a given feature map they pick the top 9 strongest activations and visualize them (presumably "strongest" means the largest activation values of that feature map across the held-out set).
  • Feature Evolution during Training: lower layers converge quickly, while upper layers need more epochs to converge.
  • Feature Invariance: they experiment on translated, rotated and scaled image samples, finding that the output of the network is stable to translations and scalings, but not to rotations. A probe along these lines is sketched below.
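A simple way to probe this kind of invariance (my own sketch, not the paper's protocol; the angle set and `extract` callable are assumptions):

```python
import torch
import torchvision.transforms.functional as TF

def feature_drift(extract, image, angles=(0, 45, 90, 135, 180)):
    """extract: any callable mapping a (1,3,H,W) batch to a flat feature vector.
    Returns the feature-space distance between the image and rotated copies."""
    base = extract(image.unsqueeze(0))
    return {a: torch.dist(base, extract(TF.rotate(image, a).unsqueeze(0))).item()
            for a in angles}
```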

4.1 Architecture Selection

Using the visualization techniques, the authors find problems in AlexNet by inspecting the visualizations of its first and second layers.

  • Problems:
    1. The first-layer filters are a mix of extremely high and low frequency information, with little coverage of the mid frequencies.
    2. The large stride of 4 used in the first-layer convolutions causes aliasing artifacts.
  • Solutions:
    1. In the first convolutional layer, the kernel size was changed from 11x11 to 7x7 and the stride from 4 to 2 (compared below). The authors argue that this new architecture retains much more information in the 1st- and 2nd-layer features.
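The change in PyTorch terms (kernel/stride values from the paper; 96 output channels as in AlexNet):

```python
import torch.nn as nn

alexnet_conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=4)  # original
zfnet_conv1   = nn.Conv2d(3, 96, kernel_size=7,  stride=2)  # revised
```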

4.2 Occlusion Sensitivity

  • Question: with image classification approaches, is the model truly identifying the location of the object, or just using the surrounding context?
  • Experiment: systematically occlude different portions of the input image with a grey square and monitor the output of the classifier; a sketch follows this list.
  • Conclusion: the model is indeed localizing the objects within the scene.
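A minimal sketch of such an occlusion sweep (the patch size, stride and grey value are my own choices, not the paper's):

```python
import torch

def occlusion_map(model, image, label, patch=32, stride=16, grey=0.5):
    """Slide a grey square over `image` (3,H,W) and record the true-class
    probability at each position; low values mark regions the model relies on."""
    _, H, W = image.shape
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    heat = torch.zeros(rows, cols)
    model.eval()
    with torch.no_grad():
        for i in range(rows):
            for j in range(cols):
                occluded = image.clone()
                y, x = i * stride, j * stride
                occluded[:, y:y + patch, x:x + patch] = grey
                probs = model(occluded.unsqueeze(0)).softmax(dim=1)
                heat[i, j] = probs[0, label]
    return heat
```

If the probability collapses only when the square covers the object itself, the model is localizing rather than reading the background.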

4.3 Correspondence Analysis

  • Question: to explore the correspondence between specific object parts in different images (e.g. faces have a particular spatial configuration of the eyes and nose).
  • Experiment: they choose five dog pictures in frontal pose and systematically mask out the same part of the face in each image (e.g. all left eyes, see Fig. 8), then measure the consistency of the within-image feature differences with their proposed formula (reconstructed below).
  • Conclusion: for a specific part (e.g. left eyes), the lower the score, the higher the consistency. Table 1 shows the comparisons for layer 5 and layer 7 respectively.
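As best I can reconstruct the paper's measure (notation from memory, so treat the exact indices as an approximation): for each image $i$, let $\epsilon_i^l = x_i^l - \tilde{x}_i^l$ be the difference between the layer-$l$ feature vectors of the original and the masked image; the consistency score is then

$$\Delta_l = \sum_{i,j=1,\ i \neq j}^{5} \mathcal{H}\big(\operatorname{sign}(\epsilon_i^l), \operatorname{sign}(\epsilon_j^l)\big),$$

where $\mathcal{H}$ is the Hamming distance. A low $\Delta_l$ means masking the same part changes the features of all five images in a consistent way, i.e. the model has implicitly established a correspondence between those parts.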

5 Experiments

5.1 ImageNet 2012

  • Ablation experiments (varying ImageNet model sizes): the authors run ablations by removing convolutional and fully-connected layers and observing the model's performance.
  • Conclusions: removing one or two arbitrary layers causes only a slight drop in accuracy, but removing four layers (two conv layers and two fc layers) leads to a dramatic drop, which indicates that the overall depth of the network really matters.

5.2 Feature Generalization

The authors validate their ImageNet-trained model on Caltech-101, Caltech-256 and PASCAL 2012 by fixing all layers except the top softmax classifier (sketched below), achieving the best performance on Caltech and a rather good score on PASCAL.

  • Conclusion: pre-training on ImageNet builds very strong feature extractors that generalize well to other datasets, while a model trained from scratch does terribly. See Table 4.
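A minimal sketch of this transfer setup, using torchvision's AlexNet as a stand-in for ZF-Net (the weights string requires torchvision >= 0.13, and the 101-way head for Caltech-101 is my assumption):

```python
import torch.nn as nn
from torchvision import models

model = models.alexnet(weights="IMAGENET1K_V1")  # pre-trained feature extractor
for p in model.parameters():
    p.requires_grad = False                      # freeze every pre-trained layer
model.classifier[6] = nn.Linear(4096, 101)       # fresh softmax head for Caltech-101
# train only model.classifier[6] with the usual cross-entropy loss
```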

6 Datasets

  • ILSVRC2012
  • Caltech-101
  • Caltech-256
  • PASCAL2012

7 Weaknesses

  1. Training is a multi-stage pipeline.
  2. Training is expensive in space and time.
  3. Object detection is slow.
  4. Fixed-size region proposals: all region proposals are reshaped to a fixed size, which distorts the objects in the images and may lose some information.

8 Future work

Open question: their model generalizes less well to the PASCAL data; they conjecture that performance would improve if a different loss function were used that permitted multiple objects per image.

9 What do other researchers say about this work?

  • Simonyan et al. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR 2015.

    • For instance, the best-performing submissions to the ILSVRC-2013 (Zeiler & Fergus, 2013; Sermanet et al., 2014) utilised smaller receptive window size and smaller stride of the first convolutional layer.
  • Christian Szegedy et al. Going Deeper with Convolutions. CVPR 2015.

    • For larger datasets such as Imagenet, the recent trend has been to increase the number of layers [12] and layer size [21, 14], while using dropout [7] to address the problem of overfitting
  • Kaiming He et al. Deep residual learning for image recognition. CVPR 2016.

    • Deep networks naturally integrate low/mid/high-level features [50] and classifiers in an end-to-end multi-layer fashion, and the "levels" of features can be enriched by the number of stacked layers (depth).
