AlexNet
CONV - MAXPOOL - NORM (NORM is no longer common)
CONV - MAXPOOL - NORM
CONV - CONV - CONV - MAXPOOL
FC - FC - FC
NORM = local response normalization
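A minimal sketch of local response normalization at a single spatial position, to make the NORM step concrete. The function name `local_response_norm` is made up for illustration, and the defaults (n=5, k=2, alpha=1e-4, beta=0.75) are the values reported in the AlexNet paper, not something stated in these notes.

```python
def local_response_norm(acts, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize each channel by the squared activity of its neighbours.

    acts: activations of all channels at ONE spatial position (list of floats).
    n, k, alpha, beta: hyperparameters (assumed defaults, see lead-in).
    """
    num_channels = len(acts)
    half = n // 2
    out = []
    for i, a in enumerate(acts):
        # Window of up to n adjacent channels centred on channel i,
        # clipped at the first and last channel.
        lo = max(0, i - half)
        hi = min(num_channels - 1, i + half)
        sq_sum = sum(x * x for x in acts[lo:hi + 1])
        out.append(a / (k + alpha * sq_sum) ** beta)
    return out


if __name__ == "__main__":
    print(local_response_norm([1.0, -2.0, 3.0, 4.0, -5.0, 6.0, 7.0]))
```

Each channel is damped more when its neighbouring channels are strongly active; the sign of the activation is unchanged.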
VGG
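- deeper (16-19 layers)
- small filters (3x3 CONV, stride 1, pad 1)
- 2x2 MAX POOL, stride 2
- a stack of three 3x3 CONV layers has the same effective receptive field as one 7x7 CONV layer
- more nonlinearities (three activations instead of one)
- fewer parameters (3 * 3^2 * C^2 = 27C^2 vs 7^2 * C^2 = 49C^2 for C channels in and out)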
- 2nd in classification, 1st in localization (ILSVRC 2014)
- VGG16, VGG19
- FC7 features (4096-d) generalize well to other tasks
- localization: one instance per image
- detection: multiple instances per image
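The VGG design points above (size-preserving 3x3 CONV, halving 2x2 POOL, 7x7 receptive field from three 3x3 layers, fewer parameters) can be checked with a little arithmetic. This is a sketch only: the helper names are hypothetical, the 224-pixel input and C=64 channels are assumed example values, and biases are ignored in the weight count.

```python
def conv_output_size(n, kernel, stride, pad):
    """Spatial output size of a CONV or POOL layer: (N - F + 2P) / S + 1."""
    return (n - kernel + 2 * pad) // stride + 1


def stacked_receptive_field(num_layers, kernel=3, stride=1):
    """Effective receptive field of num_layers identical stacked CONV layers."""
    rf = 1    # a single output pixel sees itself
    jump = 1  # distance in input pixels between adjacent outputs
    for _ in range(num_layers):
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def conv_weight_count(kernel, channels, num_layers=1):
    """Weights (no biases) of stacked CONV layers with C channels in and out."""
    return num_layers * kernel * kernel * channels * channels


if __name__ == "__main__":
    # 3x3 CONV, stride 1, pad 1 keeps the spatial size (assumed 224 input).
    print(conv_output_size(224, kernel=3, stride=1, pad=1))
    # 2x2 MAX POOL, stride 2 halves it.
    print(conv_output_size(224, kernel=2, stride=2, pad=0))
    # Three stacked 3x3 layers vs one 7x7 layer.
    print(stacked_receptive_field(3), stacked_receptive_field(1, kernel=7))
    # 27 * C^2 vs 49 * C^2 weights, here with C = 64 as an example.
    print(conv_weight_count(3, 64, num_layers=3), conv_weight_count(7, 64))
```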
GoogLeNet
- 22 layers
- no FC layers (apart from the final classifier)
- “Inception” module
- a good local network topology (network within a network)
- stack inception modules on top of each other
- input stem: CONV - POOL - CONV - POOL, followed by the stack of inception modules
- auxiliary classification outputs inject additional gradient at lower layers (AvgPool - 1x1 CONV - FC - FC - Softmax)
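One way the auxiliary outputs can enter training is as extra weighted terms in the loss, so gradient reaches the lower layers directly. A minimal sketch: `total_training_loss` is a hypothetical helper, and the weight 0.3 is the value reported in the GoogLeNet paper, not something stated in these notes.

```python
def total_training_loss(main_loss, aux_losses, aux_weight=0.3):
    """Combine the main classifier loss with down-weighted auxiliary losses.

    main_loss: loss at the final softmax output.
    aux_losses: losses at the auxiliary softmax outputs (list of floats).
    aux_weight: assumed weighting of the auxiliary terms (see lead-in).
    """
    return main_loss + aux_weight * sum(aux_losses)


if __name__ == "__main__":
    # Two auxiliary heads, as an example.
    print(total_training_loss(1.0, [2.0, 3.0]))
```

The auxiliary heads matter only during training; at inference only the main output is used.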
Inception
Naive
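- feature map depth gets too high (pooling preserves depth, so the concatenated output can only get deeper), which makes computation expensive
- fix: use 1x1 CONV bottleneck layers to reduce depth before the expensive 3x3 and 5x5 CONVs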
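The depth and cost problem of the naive module, and the effect of the bottleneck, can be sketched by counting output depth and multiplications. All sizes below (28x28x256 input, 128/192/96 filters, 64-filter 1x1 bottlenecks) are assumed example values, not taken from these notes, and the helper names are hypothetical.

```python
def conv_mults(h, w, in_depth, out_depth, kernel):
    """Multiplications of a stride-1 CONV layer that keeps the h x w size."""
    return h * w * out_depth * kernel * kernel * in_depth


def naive_inception(h, w, in_depth, n1, n3, n5):
    """Parallel 1x1, 3x3, 5x5 CONV and 3x3 POOL, concatenated along depth."""
    mults = (conv_mults(h, w, in_depth, n1, 1)
             + conv_mults(h, w, in_depth, n3, 3)
             + conv_mults(h, w, in_depth, n5, 5))
    # The POOL branch keeps the full input depth.
    out_depth = n1 + n3 + n5 + in_depth
    return out_depth, mults


def bottleneck_inception(h, w, in_depth, n1, n3, n5,
                         reduce3, reduce5, pool_proj):
    """Same module with 1x1 CONV bottlenecks before 3x3/5x5 and after POOL."""
    mults = (conv_mults(h, w, in_depth, n1, 1)
             + conv_mults(h, w, in_depth, reduce3, 1)
             + conv_mults(h, w, reduce3, n3, 3)
             + conv_mults(h, w, in_depth, reduce5, 1)
             + conv_mults(h, w, reduce5, n5, 5)
             + conv_mults(h, w, in_depth, pool_proj, 1))
    out_depth = n1 + n3 + n5 + pool_proj
    return out_depth, mults


if __name__ == "__main__":
    print(naive_inception(28, 28, 256, 128, 192, 96))
    print(bottleneck_inception(28, 28, 256, 128, 192, 96, 64, 64, 64))
```

With these example sizes the naive module outputs depth 672 (deeper than its 256-deep input), while the bottleneck version outputs depth 480 with far fewer multiplications.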