[CNN][CV] 9 Neural Networks You Should Know: Understanding CNNs - Important Papers and Networks
Reference: this post is a summary based on the referenced article.
AlexNet (2012)
The first CNN to win the ImageNet (ILSVRC) competition, in 2012.
ZF Net (2013)
"Visualizing and Understanding Convolutional Networks" by Matt Zeiler and Rob Fergus.
Winner of ImageNet in 2013. Introduced the inspiring idea of using a deconvnet to visualize feature maps and the input structures that most excite a given feature map, building intuition about what each layer learns.
VGG Net (2014)
Input is a 224*224 RGB image; uses only 3*3 filters throughout. Very deep (16-19 weight layers).
Works well on both image classification and localization.
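A quick back-of-the-envelope sketch of why stacking 3*3 filters is attractive: two stacked 3*3 conv layers cover the same 5*5 receptive field as one 5*5 layer, but with fewer weights and an extra non-linearity. The channel count below is illustrative, not from the paper.

```python
# Sketch: parameter count of stacked 3x3 convs vs. one 5x5 conv.
# C = 64 is an assumed, illustrative channel count.
def conv_params(k, c_in, c_out):
    """Weights of a single k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

C = 64
stacked_3x3 = 2 * conv_params(3, C, C)   # two stacked 3x3 layers
single_5x5 = conv_params(5, C, C)        # one 5x5 layer, same receptive field
print(stacked_3x3, single_5x5)           # 73728 102400
```

The stacked version uses fewer parameters while adding an extra ReLU between the two layers, which is part of why very deep stacks of small filters work so well.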
GoogLeNet (2015)
Departs from the usual approach of simply stacking conv and pooling layers on top of each other in a sequential structure; the design explicitly accounts for compute and memory cost.
-Inception module
Applies several filter sizes in parallel with "same" padding and concatenates the resulting feature maps along the depth dimension.
From the naïve Inception module to the full Inception module: add 1*1 filters before the expensive convolutions.
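The depth-concatenation idea can be sketched in a few lines. The branch shapes below are assumed for illustration, not the exact GoogLeNet configuration; the point is that "same" padding keeps spatial sizes equal so only the channel counts add up.

```python
import numpy as np

# Sketch of the Inception concatenation (assumed branch widths): parallel
# branches preserve the spatial size, then outputs are stacked along depth.
H, W = 28, 28
branch_1x1 = np.zeros((H, W, 64))    # stand-in for a 1x1-conv branch output
branch_3x3 = np.zeros((H, W, 128))   # stand-in for a 3x3-conv branch output
branch_5x5 = np.zeros((H, W, 32))    # stand-in for a 5x5-conv branch output
branch_pool = np.zeros((H, W, 32))   # stand-in for a pooling branch output

# Depth concatenation: spatial dims must match, channels add up (64+128+32+32).
out = np.concatenate([branch_1x1, branch_3x3, branch_5x5, branch_pool], axis=-1)
print(out.shape)  # (28, 28, 256)
```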
-1*1 filter:
A method of dimensionality reduction, e.g.
$100 \times 100 \times 60 \xrightarrow{20\ 1 \times 1\ \text{filters}} 100 \times 100 \times 20$
This can be thought of as a “pooling of features” because we are reducing the depth of the volume, similar to how we reduce the dimensions of height and width with normal max pooling layers.
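Because a 1*1 convolution acts independently at each pixel, it is just a linear map over the channel dimension, which makes the 60 → 20 reduction above easy to sketch with a single matrix multiply:

```python
import numpy as np

# Sketch: a 1x1 convolution is a per-pixel linear map over channels,
# so reducing depth 60 -> 20 is one matrix multiply per pixel.
rng = np.random.default_rng(0)
x = rng.standard_normal((100, 100, 60))  # H x W x C_in feature volume
w = rng.standard_normal((60, 20))        # 20 filters, each of shape 1x1x60

y = x @ w                                # applied independently at every pixel
print(y.shape)  # (100, 100, 20)
```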
ResNet (2015)
152 layers.
Residual Block: instead of learning a direct mapping H(x), each block learns a residual F(x) and outputs F(x) + x through an identity shortcut connection.
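A minimal sketch of the residual idea, using 1-D features and a toy two-layer residual function F (real ResNet blocks use convolutions and batch norm; the shapes and scaling here are assumptions for illustration):

```python
import numpy as np

# Sketch of a residual block: the output is F(x) + x, so the layers only
# have to learn the residual F. With F near zero the block is near-identity,
# which is what makes very deep stacks trainable.
def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    f = relu(x @ w1) @ w2  # the residual mapping F(x)
    return relu(f + x)     # identity shortcut, then non-linearity

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
w1 = rng.standard_normal((16, 16)) * 0.01  # small weights: F(x) ~ 0
w2 = rng.standard_normal((16, 16)) * 0.01
y = residual_block(x, w1, w2)
print(y.shape)  # (4, 16)
```

With zero weights the block reduces exactly to relu(x), illustrating why adding more residual blocks cannot easily hurt the network: they can default to (near-)identity.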
Region-Based CNNs (R-CNNs)
Solves the problem of object detection. The process splits into two general components: the region proposal step (Selective Search) and the classification step.
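The two-stage flow can be sketched as plain control flow. Everything named here (`propose_regions`, `classify_region`, the box format) is a hypothetical stand-in, not a real API; the point is only the proposal-then-classify structure.

```python
# High-level sketch of the R-CNN pipeline. `propose_regions` (e.g. Selective
# Search) and `classify_region` are hypothetical stand-ins for illustration.
def detect_objects(image, propose_regions, classify_region, threshold=0.5):
    """Stage 1: class-agnostic region proposals. Stage 2: classify each region."""
    detections = []
    for box in propose_regions(image):              # stage 1: candidate boxes
        label, score = classify_region(image, box)  # stage 2: CNN features + classifier
        if label != "background" and score >= threshold:
            detections.append((box, label, score))
    return detections

# Toy usage with stub stages (boxes are (x, y, w, h) tuples):
boxes = lambda img: [(0, 0, 10, 10), (5, 5, 20, 20)]
clf = lambda img, box: ("cat", 0.9) if box[2] > 15 else ("background", 0.8)
print(detect_objects("img", boxes, clf))  # [((5, 5, 20, 20), 'cat', 0.9)]
```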
Generative Adversarial Networks (2014)
For example, let’s consider a trained CNN that works well on ImageNet data. Let’s take an example image and apply a perturbation, or a slight modification, so that the prediction error is maximized. Thus, the object category of the prediction changes, while the image itself looks the same when compared to the image without the perturbation. From the highest level, adversarial examples are basically the images that fool ConvNets.
GAN: Let's think of two models, a generative model and a discriminative model. The discriminative model has the task of determining whether a given image looks natural (an image from the dataset) or looks like it has been artificially created. The task of the generator is to create images that the discriminator mistakes for natural ones. This can be thought of as a zero-sum or minimax two-player game. The analogy used in the paper is that the generative model is like "a team of counterfeiters, trying to produce and use fake currency" while the discriminative model is like "the police, trying to detect the counterfeit currency". The generator is trying to fool the discriminator while the discriminator is trying to not get fooled by the generator. As the models train, both improve until a point where the "counterfeits are indistinguishable from the genuine articles".
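The zero-sum game above is exactly the minimax objective of the original GAN paper:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

The discriminator $D$ maximizes this value (correctly scoring real samples near 1 and generated samples near 0), while the generator $G$ minimizes it (pushing $D(G(z))$ toward 1).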
Generating Image Descriptions (2014)
Combines CNNs with RNNs.
Given images and their text descriptions (weak labels).
-Alignment Model
Aims to learn to align the visual and textual data.
Uses an R-CNN to detect objects in the image and embeds them into a 500-dimensional space. A bidirectional RNN embeds the words of the description into the same multimodal space. Similarity is computed via inner products.
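The alignment scoring step is easy to sketch: once regions and words live in the same 500-dimensional space, every region-word similarity is a single inner product. The counts (5 regions, 8 words) are assumed for illustration.

```python
import numpy as np

# Sketch of the alignment model's scoring: image regions and words embedded
# in one multimodal space, compared by inner products. Sizes are illustrative.
rng = np.random.default_rng(0)
regions = rng.standard_normal((5, 500))  # 5 detected regions, 500-d each
words = rng.standard_normal((8, 500))    # 8 words embedded in the same space

scores = regions @ words.T                    # (5, 8) grid of inner products
best_region_per_word = scores.argmax(axis=0)  # best-matching region per word
print(scores.shape)  # (5, 8)
```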
-Generation Model
Learns from the aligned dataset created by the alignment model to generate descriptions.
Spatial Transformer Networks (2015)
The network learns transformations of the feature volumes and applies the transformation (a warp) between layers.
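The sampling step of a spatial transformer can be sketched as follows. This is a simplified nearest-neighbour version with assumed shapes (the paper uses differentiable bilinear sampling): a learned 2*3 affine matrix maps each output coordinate back to a source coordinate in the input feature map.

```python
import numpy as np

# Sketch of spatial-transformer sampling (nearest-neighbour, single-channel;
# the real module uses differentiable bilinear sampling).
def affine_warp(feat, theta):
    """Warp an H x W feature map with a 2x3 affine matrix theta."""
    H, W = feat.shape
    out = np.zeros_like(feat)
    for i in range(H):
        for j in range(W):
            # map output pixel (i, j) back to a source location in the input
            src = theta @ np.array([i, j, 1.0])
            si, sj = int(round(src[0])), int(round(src[1]))
            if 0 <= si < H and 0 <= sj < W:  # outside the input -> zero
                out[i, j] = feat[si, sj]
    return out

feat = np.arange(16, dtype=float).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # no-op transform
shift = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])      # sample one row down
print(np.allclose(affine_warp(feat, identity), feat))  # True
```

Because theta is produced by a small "localisation network" conditioned on the input, the network can learn to undo translations, rotations, and scalings of its own feature maps.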