1. Introduction
classification -> localization -> semantic segmentation -> instance segmentation
2. Terminology and Background Concepts
2.1 Common Deep Network Architectures
AlexNet
(The figure above is from the original AlexNet paper, which used two GPUs.)
- five convolutional layers
- max-pooling layers
- ReLUs
- three fully-connected layers
- dropout
Detailed analysis:
1. conv1:
224*224*3(RGB)
–> preprocessing (?)
–> 227*227*3
–> 96 filters (filter1), size=11*11*3 (three channels, matching RGB), stride=4
–> (227-11)/4+1=55, output=55*55*96
–> ReLU1
–> max pooling1, size=3, stride=2
–> (55-3)/2+1=27, output=27*27*96
–> norm1, local size=5 (cross-channel normalization)
–> output=27*27*96
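The conv1 arithmetic follows the usual output-size rule, (n + 2*pad - k) / stride + 1. A minimal sketch in Python; the helper name out_size is made up for illustration and is not from the original notes:

```python
def out_size(n, k, stride=1, pad=0):
    """Spatial output size of a conv or pooling layer (hypothetical helper)."""
    return (n + 2 * pad - k) // stride + 1

conv1 = out_size(227, 11, stride=4)   # (227 - 11) / 4 + 1
pool1 = out_size(conv1, 3, stride=2)  # (55 - 3) / 2 + 1
print(conv1, pool1)  # 55 27

# With a 224 input the stride does not divide evenly: (224 - 11) % 4 == 1.
print((224 - 11) % 4)
```

The non-zero remainder for 224 is probably why these notes move to a 227*227 input before conv1, though the preprocessing step itself is left open above.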
2. conv2:
–> 256 filters (filter2) (computed differently?), size=5*5, stride=1, zero padding=2, (27+2*2-5)/1+1=27, output=27*27*256
–> ReLU2
–> max pooling2, size=3, stride=2, (27-3)/2+1=13, output=13*13*256
–> norm2, local size=5
–> output=13*13*256
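One possible reading of the open question attached to the conv2 filters is the two-GPU split: in the original paper, conv2 kernels connect only to the feature maps held on the same GPU, so each filter sees 48 input channels instead of 96. That reading is an assumption, not something these notes state. A rough sketch of the effect on the parameter count, with a hypothetical conv_params helper:

```python
def conv_params(in_ch, out_ch, k, groups=1):
    """Weights + biases of a k*k convolution (hypothetical helper).

    With groups > 1, each filter only sees in_ch // groups input channels.
    """
    return out_ch * (in_ch // groups) * k * k + out_ch

full = conv_params(96, 256, 5)             # every filter sees all 96 channels
split = conv_params(96, 256, 5, groups=2)  # each filter sees 48 channels
print(full, split)  # 614656 307456
```

The output shape stays 27*27*256 either way; only the filter depth, and with it the weight count, changes.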
3. conv3:
–> 384 filters (filter3), size=3*3, stride=1, zero padding=1, (13+2*1-3)/1+1=13, output=13*13*384
–> ReLU3
–> output=13*13*384
4. conv4:
–> 384 filters (filter4), size=3*3, stride=1, zero padding=1, (13+2*1-3)/1+1=13, output=13*13*384
–> ReLU4
–> output=13*13*384
5. conv5:
–> 256 filters (filter5), size=3*3, stride=1, zero padding=1, (13+2*1-3)/1+1=13, output=13*13*256
–> ReLU5
–> max pooling3, size=3, stride=2, (13-3)/2+1=6
–> output=6*6*256
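The shapes from conv1 through pool3 can be checked end to end by chaining the same output-size rule. A small sketch; the LAYERS table and trace function are illustrative names, and ReLU and norm layers are left out because they do not change the shape:

```python
def out_size(n, k, stride=1, pad=0):
    return (n + 2 * pad - k) // stride + 1

# (name, kernel, stride, pad, out_channels); None keeps the channel count (pooling)
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool3", 3, 2, 0, None),
]

def trace(size=227, channels=3):
    """Return (name, height, width, channels) after each layer."""
    shapes = []
    for name, k, s, p, out_ch in LAYERS:
        size = out_size(size, k, s, p)
        channels = out_ch if out_ch is not None else channels
        shapes.append((name, size, size, channels))
    return shapes

for name, h, w, c in trace():
    print(f"{name}: {h}*{w}*{c}")
```

The last line printed is pool3: 6*6*256, matching the input that fc6 receives.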
6. fc6:
–> fc6, node=4096
–> output=4096
–> ReLU6
–> dropout6
–> output=4096
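The ReLU6 and dropout6 steps keep the 4096-wide shape and only change the values. A minimal sketch using inverted dropout, a common variant that rescales at training time; the original paper instead scales outputs at test time, and the rate p=0.5 comes from the paper, not from these notes. The activations are random stand-ins, not real data:

```python
import random

def relu(xs):
    return [max(0.0, x) for x in xs]

def dropout(xs, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

rng = random.Random(0)
fc6_out = [rng.uniform(-1, 1) for _ in range(4096)]  # stand-in activations
activated = relu(fc6_out)                            # ReLU6
dropped = dropout(activated, p=0.5, rng=rng)         # dropout6
print(len(dropped))  # 4096
```

At inference time, calling dropout(..., training=False) returns the activations unchanged.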
7. fc7:
–> fc7, node=4096
–> ReLU7
–> dropout7
–> output=4096
8. fc8:
–> fc8, node=1000
–> output=1000
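To get a feel for where the weights sit, the three fully-connected layers can be sized from the numbers above, assuming the 6*6*256 output of pool3 is flattened into a 9216-long vector before fc6. The fc_params helper is a hypothetical name used only for this sketch:

```python
def fc_params(n_in, n_out):
    """Weights + biases of a fully-connected layer (hypothetical helper)."""
    return n_in * n_out + n_out

flat = 6 * 6 * 256                # flattened pool3 output
sizes = [flat, 4096, 4096, 1000]  # fc6, fc7, fc8 widths from the walkthrough

counts = {
    f"fc{6 + i}": fc_params(n_in, n_out)
    for i, (n_in, n_out) in enumerate(zip(sizes, sizes[1:]))
}
for name, n in counts.items():
    print(name, n)
print("total", sum(counts.values()))
```

fc6 alone accounts for roughly 37.8 million parameters, far more than any of the convolutional layers.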