Understanding CNN


[CNN][CV] Nine neural networks you need to know: Understanding CNN - important papers and networks

Reference: this article is a summary of the linked post.

Machine Learning Glossary

AlexNet (2012)


The first CNN to win the ImageNet (ILSVRC) competition.

ZF Net (2013)


Visualizing and Understanding Deep Neural Networks by Matt Zeiler

Winner of ImageNet in 2013. Its inspiring idea was visualizing feature maps, and the input structures that excite a given feature map, using a deconvnet - giving intuition about what each layer learns.

VGG Net (2014)


Input is a 224*224 RGB image; uses only 3*3 filters throughout. Very deep.

Works well on both image classification and localization.
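The reason VGG can get away with only 3*3 filters is that stacking them grows the effective receptive field: two stacked 3*3 convs cover a 5*5 region, three cover 7*7, with fewer parameters and more nonlinearities than one large filter. A minimal sketch of that arithmetic (the function name is mine, not from the paper):

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Effective receptive field of num_layers stacked stride-1 convolutions."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1  # each stride-1 layer widens the field by (kernel - 1)
    return rf

print(stacked_receptive_field(2))  # 5, i.e. two 3x3 convs see a 5x5 patch
print(stacked_receptive_field(3))  # 7, i.e. three 3x3 convs see a 7x7 patch
```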

GoogLeNet (2015)


Departs from the usual approach of simply stacking conv and pooling layers on top of each other in a sequential structure; explicitly takes computational power and memory cost into account.

-Inception module

Uses "same" convolutions of several sizes in parallel, and concatenates the resulting feature maps along the depth dimension.

From the naïve Inception module to the full Inception module: add 1*1 filters before the expensive convolutions.

–1*1 filter: a method of dimensionality reduction, e.g.

$$100 \times 100 \times 60 \xrightarrow{\;20\ (1\times 1)\ \text{filters}\;} 100 \times 100 \times 20$$

This can be thought of as a “pooling of features” because we are reducing the depth of the volume, similar to how we reduce the dimensions of height and width with normal max pooling layers.
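Because a 1*1 convolution touches each spatial position independently, it is equivalent to a per-pixel matrix multiply over the channel axis. A sketch of the 60-to-20 reduction above in numpy (random data, just to show the shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((100, 100, 60))   # H x W x C feature volume
w = rng.random((60, 20))         # twenty 1x1x60 filters, flattened to a matrix

# 1x1 convolution == matrix multiply applied at every spatial position
y = x @ w

print(y.shape)  # (100, 100, 20): depth reduced, height/width untouched
```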

ResNet (2015)


152 layers; winner of ILSVRC 2015.

Residual Block
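The residual block computes y = ReLU(F(x) + x): the stacked layers learn a residual F(x), and an identity shortcut adds the input back before the final activation. A minimal numpy sketch with dense layers standing in for the convolutions (function names are mine):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x), where F is two weight layers with a ReLU in between."""
    f = relu(x @ w1) @ w2   # the learned residual F(x)
    return relu(f + x)      # identity shortcut: add the input back

x = np.array([[1.0, -2.0, 3.0]])
w1 = np.zeros((3, 3))
w2 = np.zeros((3, 3))
# With zero weights F(x) = 0, so the block passes ReLU(x) straight through.
print(residual_block(x, w1, w2))
```

The shortcut is what makes very deep training feasible: if a layer has nothing useful to add, it only needs to drive F(x) toward zero rather than learn an identity mapping.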

Region-Based CNNs (R-CNNs)


R-CNN

Fast R-CNN

Faster R-CNN

Solves the problem of object detection. The process splits into two general components: a region-proposal step (Selective Search in the original R-CNN) and a classification step.
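In all three detectors, proposed boxes are matched against ground-truth boxes by intersection-over-union (IoU), both for assigning training labels and for evaluation. A minimal sketch (box format assumed to be corner coordinates):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
```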

Generative Adversarial Networks (2014)


For example, let’s consider a trained CNN that works well on ImageNet data. Let’s take an example image and apply a perturbation, or a slight modification, so that the prediction error is maximized. Thus, the object category of the prediction changes, while the image itself looks the same when compared to the image without the perturbation. From the highest level, adversarial examples are basically the images that fool ConvNets.

GAN: think of two models, a generative model and a discriminative model. The discriminative model has the task of determining whether a given image looks natural (an image from the dataset) or looks like it has been artificially created. The task of the generator is to create images that look natural enough to fool the discriminator. This can be thought of as a zero-sum or minimax two-player game. The analogy used in the paper is that the generative model is like “a team of counterfeiters, trying to produce and use fake currency” while the discriminative model is like “the police, trying to detect the counterfeit currency”. The generator tries to fool the discriminator while the discriminator tries not to get fooled. As training proceeds, both models improve until the “counterfeits are indistinguishable from the genuine articles”.

GANs can thus be used to create artificial images.
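The minimax game above can be written down as per-example losses: the discriminator maximizes log D(x) + log(1 - D(G(z))), while the minimax generator minimizes log(1 - D(G(z))). A sketch in plain Python (function names are mine; D's output is assumed to be a probability in (0, 1)):

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator loss: push D(real) toward 1 and D(fake) toward 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss(d_fake):
    """Minimax generator loss: minimize log(1 - D(G(z)))."""
    return math.log(1.0 - d_fake)

# A discriminator at chance (0.5 on both) pays 2*log(2) per pair.
print(d_loss(0.5, 0.5))
# The better the generator fools D, the lower its loss.
print(g_loss(0.9), "<", g_loss(0.1))
```

(The paper also suggests training G to maximize log D(G(z)) instead, since the minimax form saturates early in training.)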

Generating Image Descriptions (2014)


Combines CNNs with RNNs.

Trained on images paired with text descriptions (weak labels).

-Alignment Model

Aims to learn to align the visual and textual data.

Uses an R-CNN to detect objects in the image and embeds them into a 500-dimensional space. A bidirectional RNN embeds the words into the same multimodal space. Similarity is computed by inner products.
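Once both modalities live in the shared 500-dimensional space, scoring an alignment is just a matrix of inner products between region embeddings and word embeddings. A shape-level sketch in numpy (random vectors and the region/word counts are hypothetical, only the 500-d space comes from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
region_emb = rng.normal(size=(20, 500))  # embedded R-CNN detections
word_emb = rng.normal(size=(8, 500))     # BRNN word embeddings, same space

scores = region_emb @ word_emb.T         # inner-product similarity, regions x words
best_region = scores.argmax(axis=0)      # most compatible region for each word

print(scores.shape)       # (20, 8)
print(best_region.shape)  # one region index per word
```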

-Generation Model

Learns from the dataset created by the alignment model to generate descriptions.

Spatial Transformer Networks (2015)


The network learns transformations of the feature volumes and applies the transformation (a warp) between layers.
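Concretely, a localization sub-network predicts the parameters of a transformation (e.g. a 2x3 affine matrix), which is then used to build a sampling grid over the input feature map. A minimal sketch of the grid generation over normalized [-1, 1] coordinates (the function name is mine; a real network would follow this with bilinear sampling):

```python
import numpy as np

def affine_grid(theta, height, width):
    """Sampling grid for a 2x3 affine matrix theta, over normalized coords [-1, 1]."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # H x W x 3 homogeneous
    return coords @ theta.T                                 # H x W x 2 sample points

identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
grid = affine_grid(identity, 4, 4)
print(grid[0, 0], grid[-1, -1])  # corners map to themselves under identity
```

Because the grid is a differentiable function of theta, the warp can be trained end-to-end along with the rest of the network.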

