The pooling layer: 0 parameters! Why?
The parameters are the weights we're trying to learn. A pooling layer only applies a fixed rule (e.g. take the max) over each region, so it has nothing to learn.
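A quick way to check the "0 parameters" point is to count learnable values by hand. The sketch below is only an illustration: the helper names (`conv_params`, `pool_params`, `max_pool_1d`) are made up for these notes, and the 11x11, 3-in, 96-out conv shape is just an example size.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Learnable values in a conv layer: one weight per filter entry, plus biases."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    biases = out_channels if bias else 0
    return weights + biases


def pool_params():
    """A pooling layer applies a fixed rule, so it has no learnable values."""
    return 0


def max_pool_1d(values, size, stride):
    """Max pooling over a 1-D list: note that no weights appear anywhere."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, stride)]


print(conv_params(11, 11, 3, 96))                          # 34944
print(pool_params())                                       # 0
print(max_pool_1d([1, 3, 2, 5, 4, 0], size=2, stride=2))   # [3, 5, 4]
```

The pooling output depends only on the input and the hyperparameters (window size, stride), which are chosen by hand rather than learned.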
AlexNet
Why use smaller filters? (3x3)
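A stack of three 3x3 conv (stride 1) layers has the same effective receptive field as one 7x7 conv layer.
What is the effective receptive field of three 3x3 conv (stride 1) layers?
7x7: each extra 3x3 layer grows the receptive field by 2 (3 -> 5 -> 7). The stack is deeper, so it has more non-linearities, and it also has fewer parameters: 3 * (3^2 * C^2) = 27C^2 vs. 7^2 * C^2 = 49C^2 for C channels per layer (biases ignored).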
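The receptive-field and parameter comparison can be checked with a few lines of arithmetic. This is a minimal sketch with hypothetical helper names; it assumes every layer has C input and C output channels, ignores biases, and uses C = 64 only as an example value.

```python
def stacked_receptive_field(kernel_sizes, strides):
    """Effective receptive field of a stack of conv layers on the input."""
    rf = 1      # a single output unit sees 1 input pixel before any conv
    jump = 1    # distance in input pixels between neighbouring units
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


def stack_weights(kernel_sizes, channels):
    """Weights in a stack where each layer maps `channels` -> `channels`."""
    return sum(k * k * channels * channels for k in kernel_sizes)


C = 64  # assumed channel count, for illustration only

print(stacked_receptive_field([3, 3, 3], [1, 1, 1]))  # 7
print(stacked_receptive_field([7], [1]))              # 7
print(stack_weights([3, 3, 3], C))                    # 110592  (27 * C^2)
print(stack_weights([7], C))                          # 200704  (49 * C^2)
```

Same receptive field, roughly 45% fewer weights, and three non-linearities instead of one.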
Parameter details
Inception module
Solution: "bottleneck" layers that use 1x1 convolutions to reduce feature depth
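To see why reducing feature depth first helps, it is enough to count multiplications. The sketch below is a rough estimate under stated assumptions: the sizes (28x28 spatial, 256 input channels, 96 output channels, bottleneck width 64) are illustrative values picked for these notes, padding is assumed to keep the spatial size unchanged, and `conv_mults` is a hypothetical helper.

```python
def conv_mults(out_h, out_w, kernel, in_channels, out_channels):
    """Multiplications for one conv layer: one k*k*in_channels dot product
    per output position and per output channel."""
    return out_h * out_w * out_channels * kernel * kernel * in_channels


H = W = 28       # assumed spatial size (unchanged by the convs)
IN_C = 256       # assumed input depth
OUT_C = 96       # assumed number of 5x5 filters
BOTTLENECK = 64  # assumed depth after the 1x1 reduction

# 5x5 conv applied directly to the full-depth input
direct = conv_mults(H, W, 5, IN_C, OUT_C)

# 1x1 conv to shrink the depth, then the same 5x5 conv on the thinner volume
reduce_step = conv_mults(H, W, 1, IN_C, BOTTLENECK)
conv_step = conv_mults(H, W, 5, BOTTLENECK, OUT_C)
with_bottleneck = reduce_step + conv_step

print(direct)           # 481689600
print(with_bottleneck)  # 133267456
```

With these example sizes the 1x1 reduction itself is cheap (about 13M multiplications) and cuts the total by more than a factor of 3.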
ResNet!!
gradient blow up
Setting details