Recap of last lecture: deep learning frameworks (TensorFlow, PyTorch, Caffe)
LeNet-5
LeCun, 1998
Architecture: CONV-POOL-CONV-POOL-FC-FC
Conv filters: 5x5, stride=1
Subsampling (pooling) layers: 2x2, stride=2
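As a concrete reference, here is a minimal PyTorch sketch of this architecture (a modern approximation: the original used trainable subsampling and saturating nonlinearities; the `LeNet5` class name and the tanh/avg-pool choices are mine, and the final 10-way output layer is included):

import torch
import torch.nn as nn

# Minimal LeNet-5 sketch: CONV-POOL-CONV-POOL-FC-FC (plus the output layer),
# assuming 32x32 single-channel inputs as in the 1998 paper.
class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1),   # 32x32x1 -> 28x28x6
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),      # -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5, stride=1),  # -> 10x10x16
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),      # -> 5x5x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])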
AlexNet
Krizhevsky et al., 2012, ImageNet challenge
Architecture: CONV1-MAXPOOL1-NORM1-CONV2-MAXPOOL2-NORM2-CONV3-CONV4-CONV5-MAXPOOL3-FC6-FC7-FC8
Assume input: 227x227x3
CONV1: 96 11x11 filters, stride=4, pad=0
Q: output volume size?
(227 - 11)/4 + 1 = 55, so the output is 55x55x96.
Q: number of parameters?
11*11*3*96 = 34,848 ≈ 35K (ignoring biases)
POOL1: 3x3 filters, stride=2
Q: output volume size?
(55 - 3)/2 + 1 = 27, so the output is 27x27x96.
Q: number of parameters?
None: pooling layers have no learnable parameters.
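Both answers follow from the standard formulas output = (W - F + 2P)/S + 1 and params = F*F*C_in*K (biases ignored); a quick sanity check in Python (helper names are my own):

def conv_output_size(w, f, stride, pad):
    """Spatial output size of a conv or pool layer: (W - F + 2P) / S + 1."""
    return (w - f + 2 * pad) // stride + 1

def conv_params(f, c_in, k):
    """Learnable weights: F*F*C_in per filter, times K filters (no biases)."""
    return f * f * c_in * k

print(conv_output_size(227, 11, stride=4, pad=0))  # CONV1 output: 55
print(conv_params(11, 3, 96))                      # CONV1 params: 34848 ~= 35K
print(conv_output_size(55, 3, stride=2, pad=0))    # POOL1 output: 27
# Pooling has no weights, so POOL1 contributes 0 parameters.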
Full (simplified) AlexNet architecture:
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAXPOOL1: 3x3 filters at stride 2
[27x27x96] NORM1: Normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAXPOOL2: 3x3 filters at stride 2
[13x13x256] NORM2: Normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAXPOOL3: 3x3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)
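The bracketed sizes can be verified by pushing a dummy 227x227x3 input through an equivalent PyTorch stack (a sketch: the NORM layers are omitted since they change no shapes, and FC6 would take the flattened 6*6*256 = 9216 values):

import torch
import torch.nn as nn

# Conv/pool stack matching the simplified AlexNet listed above.
layers = nn.Sequential(
    nn.Conv2d(3, 96, 11, stride=4, padding=0),    # CONV1 -> 55x55x96
    nn.MaxPool2d(3, stride=2),                    # MAXPOOL1 -> 27x27x96
    nn.Conv2d(96, 256, 5, stride=1, padding=2),   # CONV2 -> 27x27x256
    nn.MaxPool2d(3, stride=2),                    # MAXPOOL2 -> 13x13x256
    nn.Conv2d(256, 384, 3, stride=1, padding=1),  # CONV3 -> 13x13x384
    nn.Conv2d(384, 384, 3, stride=1, padding=1),  # CONV4 -> 13x13x384
    nn.Conv2d(384, 256, 3, stride=1, padding=1),  # CONV5 -> 13x13x256
    nn.MaxPool2d(3, stride=2),                    # MAXPOOL3 -> 6x6x256
)

x = torch.randn(1, 3, 227, 227)
for layer in layers:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))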
Details/Retrospectives:
- first use of ReLU
- used Norm layers (not common anymore)
- heavy data augmentation
- dropout 0.5
- batch size 128
- SGD Momentum 0.9
- Learning rate 1e-2, reduced by 10 manually when val accuracy plateaus
- L2 weight decay 5e-4
- 7 CNN ensemble: 18.2% -> 15.4% (top-5 error)
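These hyperparameters map almost one-to-one onto a modern PyTorch setup (a sketch; `model` is a placeholder, and the training/validation loop is assumed):

import torch
import torch.nn as nn

model = nn.Linear(9216, 1000)  # placeholder for the actual network

# SGD with momentum 0.9, lr 1e-2, and L2 weight decay 5e-4, as listed above.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=5e-4)

# "Reduce by 10 when val accuracy plateaus": the authors did this by hand;
# ReduceLROnPlateau in 'max' mode automates the same schedule.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1)

# Each epoch: train, evaluate, then step the scheduler on val accuracy:
# scheduler.step(val_accuracy)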
ZFNet
Zeiler and Fergus, 2013, ImageNet Challenge
Small tweaks on top of AlexNet.
AlexNet but:
CONV1: change from (11x11 stride 4) to (7x7 stride 2)
CONV3,4,5: instead of 384, 384, 256 filters use 512, 1024, 512
Reduced the top-5 error rate from 16.4% to 11.7%.
VGGNet
Simonyan and Zisserman, 2014, ImageNet Challenge
Key ideas: small filters, deeper networks
8 layers (AlexNet) -> 16-19 layers (VGG16/VGG19)
Only 3x3 CONV stride 1, pad 1 and 2x2 MAX POOL stride 2
Top-5 error dropped from 11.7% to 7.3%.
Q: Why use smaller filters?
A stack of three 3x3 conv layers (stride 1) has the same effective receptive field as a single 7x7 conv layer, but with fewer parameters and more nonlinearities.
The first layer sees a 3x3 region; the second layer is also 3x3, but its inputs already overlap, so its effective receptive field grows to 5x5; a third layer extends it to 7x7.
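A quick numeric check of both the receptive-field claim and the parameter savings (the helper function and the channel count C are mine; C is assumed equal for input and output):

def stacked_rf(n, f=3):
    """Effective receptive field of n stacked f x f stride-1 conv layers."""
    rf = 1
    for _ in range(n):
        rf += f - 1  # each stride-1 layer widens the field by f - 1
    return rf

print([stacked_rf(n) for n in (1, 2, 3)])  # [3, 5, 7]

# Parameter comparison for C channels in and out (biases ignored):
C = 256
print(3 * (3 * 3 * C * C))  # three 3x3 layers: 27*C^2 = 1,769,472
print(7 * 7 * C * C)        # one 7x7 layer:    49*C^2 = 3,211,264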